I have decided to use VLC instead of mplayer/mencoder for one single reason: VLC supports MPEG-2 TS muxing (through libdvbpsi) while mencoder does not.
Of course it may be possible to use this lib with mplayer too, but I don't want to lose my time hacking it ...
update 17/08/2007
Mmm, it seems that it is supported in ffmpeg (and so mencoder) too but it is not documented.
The output format name option is "mpegts".
Alternatively there is apparently the Avidemux software that can do the muxing job.
I cannot test right now ... So it's a note for later ...
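For later, the ffmpeg route mentioned in the update could look like the sketch below. It is as untested as the note above says. The function name mux_ts and the FFMPEG override variable are made up for this example, and "-c copy" (stream copy, no re-encoding) is the spelling used by recent ffmpeg builds, so an old build may want different codec options.

```sh
#!/bin/sh
# Hypothetical helper: remux a file into an MPEG-2 transport stream.
# FFMPEG can be overridden (e.g. FFMPEG=echo for a dry run).
mux_ts() {
    # $1: input file, $2: output .ts file
    "${FFMPEG:-ffmpeg}" -i "$1" -c copy -f mpegts "$2"
}

# usage: mux_ts input.avi output.ts
```

Setting FFMPEG=echo prints the arguments instead of running anything, which is handy for checking the command first.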
Friday 10 August 2007
Tuesday 7 August 2007
minitutorial : video.yuv -> video.yuv.avi
Well just a small tutorial note for people working with me on video coding ...
To obtain a yuv video from any video source that mplayer can read, execute:
mplayer file.ext -ao null -vo yuv4mpeg:file=file.yuv
and to pack a yuv video file into an avi container, use:
mencoder file.yuv -ovc copy -o file.yuv.avi
To keep the sound in the uncompressed avi, you can do:
mplayer file.ext -ao pcm -vo yuv4mpeg
mencoder stream.yuv -audiofile audiodump.wav -ovc copy -o file.yuvpcm.avi
(with no file name given, mplayer writes the video to stream.yuv and the audio to audiodump.wav)
The two actions can be done in one single command like this:
mencoder file.ext -ovc raw -nosound -of avi -o file.raw.avi
or
mencoder file.ext -ovc raw -oac pcm -of avi -o file.raw.avi
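To convert several files in one go, the one-step command above can be wrapped in a small loop. This is only a sketch: the function name to_raw_avi and the MENCODER override variable are invented for the example, and the output name is simply the input name with its last extension replaced by .raw.avi.

```sh
#!/bin/sh
# Hypothetical batch helper around the one-step mencoder command.
# MENCODER can be overridden (e.g. MENCODER=echo for a dry run).
to_raw_avi() {
    for f in "$@"
    do
        # strip the last extension: clip.mov -> clip.raw.avi
        "${MENCODER:-mencoder}" "$f" -ovc raw -oac pcm -of avi -o "${f%.*}.raw.avi"
    done
}

# usage: to_raw_avi *.mov
```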
Tuesday 31 July 2007
gcc -a
in man pages of gcc 3
-a Generate extra code to write profile information for
basic blocks, which will record the number of times
each basic block is executed, the basic block start
address, and the function name containing the basic
block. If -g is used, the line number and filename of
the start of the basic block will also be recorded.
If not overridden by the machine description, the
default action is to append to the text file bb.out.
This data could be analyzed by a program like "tcov".
Note, however, that the format of the data is not what
"tcov" expects. Eventually GNU "gprof" should be
extended to process this data.
This option was removed from gcc 4!
From http://gcc.gnu.org/gcc-4.1/changes.html:
Transition of basic block profiling to tree level implementation has been completed. The new implementation should be considerably more reliable (hopefully avoiding profile mismatch errors when using -fprofile-use or -fbranch-probabilities) and can be used to drive higher level optimizations, such as inlining. The -ftree-based-profiling command line option was removed and -fprofile-use now implies disabling old RTL level loop optimizer (-fno-loop-optimize). Speculative prefetching optimization (originally enabled by -fspeculative-prefetching) was removed.
Mmmh, I'm thinking about installing gcc-3.4
... done
Argh, it's the same.
There are also these threads:
http://www.cygwin.com/ml/binutils/2007-05/msg00345.html
and
http://gcc.gnu.org/ml/gcc/2001-08/msg01385.html
So I have to use gcov!
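A note on what the gcov route gives: after building with -fprofile-arcs -ftest-coverage and running the program, "gcov file.c" writes file.c.gcov, where every line has the form "count:lineno:source". The sketch below lists the most executed lines of such a file. The function name hot_lines is made up for this memo; it is not part of gcov.

```sh
#!/bin/sh
# Hypothetical helper: show the most executed lines of a .gcov file.
hot_lines() {
    # $1: a .gcov file, $2: how many lines to show (default 10)
    awk -F: '
        $1 + 0 > 0 {                        # skip "-" and "#####" lines
            src = $0
            sub(/^[^:]*:[^:]*:/, "", src)   # drop the "count:lineno:" prefix
            printf "%d\t%d\t%s\n", $1, $2, src
        }' "$1" | sort -rn | head -n "${2:-10}"
}

# usage: hot_lines h264.c.gcov 20
```

Each output line holds the execution count, the line number and the source text, separated by tabs.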
Saturday 28 July 2007
Tuesday 17 July 2007
MPlayer H.264 profiling (further ...)
Some interesting links
profiling line by line
annotated source listing
So, let's do it!
... working ...
gcc doesn't understand the "-a" option
Here is the new gprof command line to test:
gprof --line \
    --annotated-source=h264.c:decode_residual \
    --separate-files \
    --directory-path="$HOME/local/src/mplayer/libavcodec" \
    --flat-profile --no-graph \
    ~/local/bin/mplayer \
    *.gmon > sumup.prof
Here is the head of the command result in sumup.prof
So, what do we see?
Mmmh, everything is mixed together, not just the decode_residual function as wanted, and this last gprof command takes a long time...
Also, only the function name line is annotated, not every basic block as desired.
Something is missing ...
... thinking ...
rtfm gcc
Did the -a option disappear in gcc-4?
Is it replaced by -fprofile-arcs? :/
... mmmh, no, it doesn't seem so.
Monday 16 July 2007
script to sum up profiling result not needed.
rtfm gprof
or http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html#SEC4
You can give more than one profile data file by entering all their names after the executable file name; then the statistics in all the data files are summed together.
No, we'll just need to find a representative set of videos.
for f in *.mov
do
    mplayer "$f"
    mv gmon.out "${f%.mov}.gmon"
done
gprof ~/local/bin/mplayer *.gmon > sumup.prof
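The renaming step in the loop relies on the shell's suffix removal, ${f%.mov}, and on quoting, since trailer names often contain spaces. A tiny sketch to check the naming on its own (the function name gmon_name is invented for this example):

```sh
#!/bin/sh
# Hypothetical helper: derive the per-video profile name used in the loop.
gmon_name() {
    # ${1%.mov} removes a trailing ".mov" from the first argument
    printf '%s\n' "${1%.mov}.gmon"
}

# usage: mv gmon.out "$(gmon_name "$f")"
```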
MPlayer H.264 profiling (continued ...)
Read the previous article
So the first MPlayer H.264 candidate function for HW acceleration (i.e. the most time-consuming procedure) is decode_residual (in libavcodec/h264.c, part of FFmpeg). [geeknote: I use emacs with etags to browse and jump to source code].
Well, after analyzing the source code of this function, I've seen that it contains a lot of calls to inline procedures. So these inline procedures are the most likely code parts to be HW accelerated, but the profiling did not count the time consumed in these procedures separately since they are inlined. Let's go further in the profiling using some more gcc compile options, i.e. -fno-inline.
sed 's/OPTFLAGS.*/& -pg -fno-inline/' config.mak > tmp
mv tmp config.mak
make clean && make && make install
We run mplayer on the "1408" movie trailer and then analyze the profile.
Before removing inlining, we had no detail of the decode_residual subfunctions.
After, there are more details:
Profiling extract of the "Ducaty" trailer decompression (without inlined functions).
Finally, it seems the inlined procedures do not account for that much of the decode_residual function. For the "1408" movie trailer, the total execution time of decode_residual calls is 8.19 s, of which 2.10 s is spent in the de-inlined functions.
For the "Ducaty" trailer, the total execution time of decode_residual calls is 2.2 s, of which 1.1 s is spent in the de-inlined functions.
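To put the two measurements side by side, the share of decode_residual time spent in the de-inlined functions can be computed directly from the figures above. The helper name share is made up for this note.

```sh
#!/bin/sh
# Hypothetical helper: percentage of a total, to one decimal place.
share() {
    # $1: time in the de-inlined functions (s), $2: total decode_residual time (s)
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

# usage:
#   share 2.10 8.19    # "1408" trailer
#   share 1.1 2.2      # "Ducaty" trailer
```

With the figures above this gives about 25.6 % for the "1408" trailer and 50.0 % for the "Ducaty" trailer.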
Monday 9 July 2007
animated pdf figure : beamer + xfig + mpost/mptopdf
Minitutorial
This is a summary of useful information I have found on the web and some tricks I've found myself.
You can have a look at http://www.xfig.org/userman/latex_and_xfig.html
(Xfig and pdf / section 4)
If your animation just consists of successive layers of a figure, you can define these layers with xfig, provided that you export the figure in the multimetapost format (.mmp files).
It is not so easy to define the different layers. With xfig you have to set the depths of your elements. Elements of consecutive depths will be part of the same layer. Layers of the deepest elements will be shown first.
Once you have generated the .mmp file, you may need to edit it.
For example, I have added some latex code just for the accents.
% +MP-ADDITIONAL-HEADER
verbatimtex
%&latex
\documentclass{article}
\usepackage[french]{babel}
\usepackage[latin9]{inputenc}
\usepackage[cyr]{aeguill}
\usepackage[T1]{fontenc}
\begin{document}
etex
% -MP-ADDITIONAL-HEADER
Then use mpost + mptopdf, or just mptopdf, which will call mpost (I had problems using just mptopdf):
mpost -tex=latex myfile.mmp
mptopdf myfile.*
or just
mptopdf --latex myfile.mmp
In beamer you have to deactivate the covered feature if you normally use it.
For example:
\documentclass{beamer}
%\usepackage ...
\usepackage{xmpmulti}
% \DeclareGraphicsRule{*}{mps}{*}{}
\begin{document}
\setbeamercovered{transparent}
% some frames with transparent
\setbeamercovered{invisible} % deactivate transparent covered mode
\begin{frame}
\frametitle{my frame with the pdf animated figure}
\multiinclude[graphics={width=\textwidth},format=pdf]{figures/myfile}
\end{frame}
\setbeamercovered{transparent} % back to transparent covered mode
\end{document}
With xfig, I suggest you first decide on the number of layers you want for your figure, let's say ten. Then set the depth of the elements you want in the first layer to 91, 92, ..., 99; the depth of the elements you want in the second layer to 81, 82, ..., 89; and so on, down to 1, 2, ..., 9 for the elements you want in the last layer.
In this way the tens digit indicates the layer of the element, since you will probably not need more than 9 depths for any layer (a diagram basically consists of a background shape, text, arrows, and possibly a foreground shape).
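The depth convention above can be written as a one-line rule: with N layers, an element of depth d lands in layer N - d/10 (integer division), so the deepest elements come first. A small sketch to check a depth before typing it into xfig follows. The function name layer_of_depth is made up, and the default of ten layers is just the example used above.

```sh
#!/bin/sh
# Hypothetical helper: which layer does an xfig depth belong to?
layer_of_depth() {
    # $1: xfig depth (1..99), $2: number of layers (default 10)
    echo $(( ${2:-10} - $1 / 10 ))
}

# usage:
#   layer_of_depth 95    # an element of the first layer
#   layer_of_depth 5     # an element of the last layer
```

Depths that are multiples of ten (10, 20, ...) are not used by the convention; the function just groups them with the layer above.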
That's it.
It's just a memo for me, but at the same time I hope it can help ...
Don't hesitate to give feedback.
Today I prefer to use tikz with overlays and pause to do such animations in pdf documents generated by latex. It's geekier, more of a hacker's tool!
I may post some examples of what I have done with it here later.
update 15/08/2007
Tomorrow (another day :), I will test the animate package by Alexander Grahn
http://tug.ctan.org/macros/latex/contrib/animate/doc/animate.pdf
which auto-animates sets of graphics files or inline graphics.
Even if you don't need automatic animation, one feature of the package is that the running animation stays on the same pdf page, whereas without this package an animation built as explained in this article spans several pages.
One drawback is that animations created with this package can only be viewed with Adobe Reader or Acrobat.
Nicogeek
Friday 6 July 2007
MPlayer H.264 profiling (update)
The article "MPlayer H.264 profiling" has been updated.
click here to jump to it.
Tuesday 3 July 2007
MPlayer H.264 profiling
#define HW HardWare
alias rtfm=man
Well, I do not know what the --enable-profile option really does, but I succeeded in profiling the application like this:
#!/bin/sh
mkdir -p ~/local/src
svn checkout svn://svn.mplayerhq.hu/mplayer/trunk/ \
    ~/local/src/mplayer
cd ~/local/src/mplayer
./configure --prefix=$HOME/local --disable-mencoder \
    --enable-debug --enable-profile --extra-libs=-pg
# the -pg option is for profiling with gprof
# -pg should also be added to the CFLAGS (or OPTFLAGS).
# As I did not find such an option in the configure script,
# I modify the generated config.mak file like this:
sed 's/OPTFLAGS = /OPTFLAGS = -pg /' config.mak > tmp
mv tmp config.mak
make
make install
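The config.mak edit in the script is easy to get wrong, so here is a sketch that can be tried on a throwaway copy first. The function name add_optflags is invented; it assumes the file has a line starting with "OPTFLAGS" followed by "=", and that the flags contain no "/" or "&" (which would confuse sed).

```sh
#!/bin/sh
# Hypothetical helper: append flags to the OPTFLAGS line of a config.mak.
add_optflags() {
    # $1: path to config.mak, $2: flags to append, e.g. "-pg -fno-inline"
    # "&" in the replacement stands for the whole matched OPTFLAGS line
    sed "s/^OPTFLAGS[[:space:]]*=.*/& $2/" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# usage: add_optflags config.mak "-pg"
```

Appending at the end of the line, rather than rewriting its beginning, keeps whatever the configure script generated.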
I profile H.264 full HD (1080p) videos because, before designing a full HW decoder chip, I want to start with a HW accelerator for MPlayer that could be mapped onto an FPGA, so I need to know which part of the code should be mapped to HW first.
Now I have some results on film trailers (full HD) downloaded from the QuickTime web page http://www.apple.com/trailers/.
Here is how to do it.
1. Download a set of films for the benchmark into, let's say, ~/HDV
cd ~/HDV
2. View all the videos at native resolution (even if your screen does not support full HD resolution, since your video card does)
3. A gmon.out file is generated each time mplayer finishes playing a video; it is used as input for gprof (rtfm gprof).
#!/bin/sh
mkdir -p profiling
for f in *.mov
do
    ~/local/bin/mplayer "$f"
    gprof ~/local/bin/mplayer > "profiling/${f%.mov}.prof"
    rm gmon.out
done
I am thinking about writing a script to sum up all the benchmark results in a human-readable table, keeping for example the 10 most time-consuming procedures.
I will update this article with the script and table when done (if I do).
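A possible starting point for such a summary script: pull the first N rows out of the flat profile of each .prof file and keep only the percentage and the function name. This is a sketch, not the promised script. The function name top_functions is made up, and it assumes the usual gprof flat profile layout, where data rows start with a percentage and the table ends at the first blank line.

```sh
#!/bin/sh
# Hypothetical helper: print "% time" and name of the top N functions
# found in the flat profile section of a gprof report.
top_functions() {
    # $1: a .prof file written by gprof, $2: number of rows (default 10)
    awk -v n="${2:-10}" '
        $1 ~ /^[0-9]+\.[0-9]+$/ && NF >= 4 {
            seen = 1
            if (shown++ < n) printf "%6s%%  %s\n", $1, $NF
            next
        }
        seen && NF == 0 { exit }   # blank line after the table: stop
    ' "$1"
}

# usage:
#   for p in profiling/*.prof; do echo "== $p"; top_functions "$p" 10; done
```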
update 06/07/2007
Waiting for the script? Here is just a summary of some results.
(FIXME: But the output in this blog looks horrible.
Currently I don't know how to hack it to control the line breaks and multiple spaces; it might just need some html code :)
nicolas@iBook-Nicolas$ head -15 1408.prof
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative       self                 self    total
 time    seconds    seconds      calls   ms/call  ms/call  name
18.08       9.78       9.78  205612091      0.00     0.00  decode_residual
12.10      16.32       6.54   21891240      0.00     0.00  decode_mb_cavlc
 8.90      21.13       4.81    5875112      0.00     0.00  fast_memcpy
 7.69      25.29       4.16   21891240      0.00     0.00  hl_decode_mb
 6.52      28.81       3.53   21891240      0.00     0.00  fill_caches
 4.11      31.03       2.22   16343918      0.00     0.00  put_h264_qpel8or16_v_lowpass_mmx2
 3.38      32.86       1.83   18682094      0.00     0.00  put_h264_qpel8_h_lowpass_l2_mmx2
 3.18      34.58       1.72    4928113      0.00     0.00  put_h264_qpel8or16_hv_lowpass_mmx2
 2.98      36.19       1.61   27761182      0.00     0.00  put_h264_chroma_mc8_mmx
 2.66      37.63       1.44   75903672      0.00     0.00  ff_h264_idct_add_mmx
nicolas@iBook-Nicolas$ head -14 xmanIII.prof
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative       self                 self    total
 time    seconds    seconds      calls   ms/call  ms/call  name
10.59       5.61       5.61   19396320      0.00     0.00  hl_decode_mb
 7.72       9.70       4.09    5171328      0.00     0.00  fast_memcpy
 6.38      13.08       3.38   38792640      0.00     0.00  fill_caches
 5.93      16.22       3.14   19396320      0.00     0.00  decode_mb_cabac
 4.13      18.41       2.19   11993739      0.00     0.00  decode_mb_skip
 3.85      20.45       2.04   20047404      0.00     0.00  filter_mb_edgeh
 3.73      22.43       1.98   20045573      0.00     0.00  filter_mb_edgev
 3.42      24.24       1.81   14893574      0.00     0.00  h264_h_loop_filter_luma_mmx2
 2.76      25.70       1.46   12726619      0.00     0.00  put_pixels16_mmx
nicolas@iBook-Nicolas$ head -14 The\ Bourne\ Ultimatum.prof
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative       self                 self    total
 time    seconds    seconds      calls   ms/call  ms/call  name
17.58       5.49       5.49  101873518      0.00     0.00  decode_residual
 9.87       8.57       3.08   13078440      0.00     0.00  decode_mb_cavlc
 8.58      11.25       2.68   13078440      0.00     0.00  hl_decode_mb
 8.46      13.89       2.64    3508832      0.00     0.00  fast_memcpy
 6.82      16.02       2.13   13727160      0.00     0.00  fill_caches
 3.80      17.21       1.19    7748360      0.00     0.00  put_h264_qpel8or16_v_lowpass_mmx2
 2.98      18.14       0.93   12297988      0.00     0.00  mc_part
 2.66      18.97       0.83   45176732      0.00     0.00  ff_h264_idct_add_mmx
 2.53      19.76       0.79    9036911      0.00     0.00  put_h264_qpel8_h_lowpass_l2_mmx2
nicolas@iBook-Nicolas$
Monday 2 July 2007
Geek blog 4 geek
I am now creating another blog to write about things more related to my (non-)work.