#define HW HardWare
alias rtfm=man
Well, I do not know what the --enable-profile option really does, but I succeeded in profiling the application like this:
#!/bin/sh
mkdir -p ~/local/src
svn checkout svn://svn.mplayerhq.hu/mplayer/trunk/ \
    ~/local/src/mplayer
cd ~/local/src/mplayer
./configure --prefix=$HOME/local --disable-mencoder \
    --enable-debug --enable-profile --extra-libs=-pg
# the -pg option is for profiling with gprof
# -pg should also be added to the CFLAGS (or OPTFLAGS).
# As I did not find such an option in the configure script,
# I modify the generated config.mak file like this:
sed 's/OPTFLAGS = /OPTFLAGS = -pg /' config.mak > tmp
mv tmp config.mak
make
make install
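The sed command above leaves config.mak unchanged without any warning if the OPTFLAGS line is spelled differently, so the build would quietly lack -pg. A minimal sketch of a more defensive version, assuming (as the sed call does) that the line starts with "OPTFLAGS = "; add_pg_to_optflags is a hypothetical helper, not part of MPlayer:

```sh
#!/bin/sh
# Sketch (add_pg_to_optflags is a hypothetical helper, not part of
# MPlayer): prepend -pg to the OPTFLAGS line of a generated config.mak.
# Assumes the line starts with "OPTFLAGS = ", as in the sed call above.
add_pg_to_optflags() {
    mak=${1:-config.mak}
    # fail loudly instead of silently leaving the file unchanged
    if ! grep -q '^OPTFLAGS = ' "$mak"; then
        echo "no 'OPTFLAGS = ' line in $mak" >&2
        return 1
    fi
    # already patched: do nothing, so re-running the script is harmless
    if grep -q '^OPTFLAGS = .*-pg' "$mak"; then
        return 0
    fi
    sed 's/^OPTFLAGS = /OPTFLAGS = -pg /' "$mak" > "$mak.tmp" &&
        mv "$mak.tmp" "$mak"
}
```

In the build script, `add_pg_to_optflags config.mak || exit 1` would then replace the sed and mv lines.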
I profile H.264 full HD (1080p) videos because, before designing a full HW decoder chip, I want to start with an HW accelerator for MPlayer that could be mapped onto an FPGA, so I need to know which parts of the code should be mapped to HW first.
Now I have some results on full HD film trailers downloaded from the QuickTime web page http://www.apple.com/trailers/.
Here is how to do it:
1. Download a set of films for benchmarking into, let's say, ~/HDV:
cd ~/HDV
2. View all the videos at their native resolution (this works even if your screen does not support full HD, since your video card does).
3. A gmon.out file is generated each time mplayer finishes playing a video; it is the input for gprof (rtfm gprof).
#!/bin/sh
mkdir -p profiling
for f in *.mov
do
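# quotes are needed because some trailer names contain spaces
~/local/bin/mplayer "$f"
# gprof reads gmon.out from the current directory by default
gprof ~/local/bin/mplayer > "profiling/${f%.mov}.prof"
rm -f gmon.out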
done
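The loop above writes a .prof file even when mplayer did not exit normally and left no gmon.out behind, in which case gprof only reports an error. A minimal sketch of a per-video helper that checks for gmon.out first; profile_one is a hypothetical name, and -b (brief, no column explanations) and -p (flat profile only) are GNU gprof options that keep just the part shown in the listings below:

```sh
#!/bin/sh
# Sketch (profile_one is a hypothetical helper): play one video with an
# instrumented player, then save gprof's flat profile for it.
# -b (brief, no column explanations) and -p (flat profile only) are
# GNU gprof options.
profile_one() {
    player=$1    # binary built with -pg, e.g. ~/local/bin/mplayer
    video=$2     # file name, may contain spaces
    mkdir -p profiling
    rm -f gmon.out    # never pick up data left over from an earlier run
    "$player" "$video"
    # gmon.out is written at normal exit; it is missing if the player
    # crashed or was killed
    if [ ! -f gmon.out ]; then
        echo "no gmon.out for $video, skipping" >&2
        return 1
    fi
    gprof -b -p "$player" gmon.out > "profiling/${video%.mov}.prof"
    rm -f gmon.out
}
```

The loop body then shrinks to a single call: `for f in *.mov; do profile_one ~/local/bin/mplayer "$f"; done`.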
I am thinking about writing a script to sum up all the benchmark results in a human-readable table, keeping, for example, the 10 most time-consuming procedures.
I will update this article with the script and table when done (if I do).
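That summary script is not written yet. As one possible sketch (top_self and sum_profiles.sh are hypothetical names), the function below reads the flat-profile section of several .prof files, adds up the self seconds column per function, and prints the N largest totals. It assumes plain gprof text and function names without spaces (the name is taken as the last field). Summing seconds gives longer trailers more weight; averaging the % time column instead would weight every trailer equally.

```sh
#!/bin/sh
# Sketch (top_self and sum_profiles.sh are hypothetical names): add up
# the "self seconds" column of the gprof flat profiles given as
# arguments and print the N functions with the largest totals.
# Assumes plain gprof flat-profile text and function names without
# spaces (true for MPlayer's C code; the name is the last field).
top_self() {
    n=$1
    shift
    awk '
        FNR == 1           { rows = 0 }          # new file: reset state
        /^ *time +seconds/ { rows = 1; next }    # column header line
        rows && NF == 0    { rows = 0; next }    # blank line ends table
        rows               { self[$NF] += $3 }   # $3 = self seconds
        END { for (f in self) printf "%10.2f  %s\n", self[f], f }
    ' "$@" | sort -k1,1nr | head -n "$n"
}

# e.g.  sh sum_profiles.sh profiling/*.prof
if [ "$#" -gt 0 ]; then
    top_self 10 "$@"
fi
```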
Update 06/07/2007
Waiting for the script? Here is just a summary of some results.
(FIXME: the output looks horrible on this blog.
I don't currently know how to control line breaks and multiple spaces here; it might just need some HTML code :)
nicolas@iBook-Nicolas$ head -15 1408.prof
Flat profile:

Each sample counts as 0.01 seconds.
   %  cumulative     self               self    total
  time   seconds  seconds     calls  ms/call  ms/call  name
 18.08      9.78     9.78 205612091     0.00     0.00  decode_residual
 12.10     16.32     6.54  21891240     0.00     0.00  decode_mb_cavlc
  8.90     21.13     4.81   5875112     0.00     0.00  fast_memcpy
  7.69     25.29     4.16  21891240     0.00     0.00  hl_decode_mb
  6.52     28.81     3.53  21891240     0.00     0.00  fill_caches
  4.11     31.03     2.22  16343918     0.00     0.00  put_h264_qpel8or16_v_lowpass_mmx2
  3.38     32.86     1.83  18682094     0.00     0.00  put_h264_qpel8_h_lowpass_l2_mmx2
  3.18     34.58     1.72   4928113     0.00     0.00  put_h264_qpel8or16_hv_lowpass_mmx2
  2.98     36.19     1.61  27761182     0.00     0.00  put_h264_chroma_mc8_mmx
  2.66     37.63     1.44  75903672     0.00     0.00  ff_h264_idct_add_mmx
nicolas@iBook-Nicolas$ head -14 xmanIII.prof
Flat profile:

Each sample counts as 0.01 seconds.
   %  cumulative     self               self    total
  time   seconds  seconds     calls  ms/call  ms/call  name
 10.59      5.61     5.61  19396320     0.00     0.00  hl_decode_mb
  7.72      9.70     4.09   5171328     0.00     0.00  fast_memcpy
  6.38     13.08     3.38  38792640     0.00     0.00  fill_caches
  5.93     16.22     3.14  19396320     0.00     0.00  decode_mb_cabac
  4.13     18.41     2.19  11993739     0.00     0.00  decode_mb_skip
  3.85     20.45     2.04  20047404     0.00     0.00  filter_mb_edgeh
  3.73     22.43     1.98  20045573     0.00     0.00  filter_mb_edgev
  3.42     24.24     1.81  14893574     0.00     0.00  h264_h_loop_filter_luma_mmx2
  2.76     25.70     1.46  12726619     0.00     0.00  put_pixels16_mmx
nicolas@iBook-Nicolas$ head -14 The\ Bourne\ Ultimatum.prof
Flat profile:

Each sample counts as 0.01 seconds.
   %  cumulative     self               self    total
  time   seconds  seconds     calls  ms/call  ms/call  name
 17.58      5.49     5.49 101873518     0.00     0.00  decode_residual
  9.87      8.57     3.08  13078440     0.00     0.00  decode_mb_cavlc
  8.58     11.25     2.68  13078440     0.00     0.00  hl_decode_mb
  8.46     13.89     2.64   3508832     0.00     0.00  fast_memcpy
  6.82     16.02     2.13  13727160     0.00     0.00  fill_caches
  3.80     17.21     1.19   7748360     0.00     0.00  put_h264_qpel8or16_v_lowpass_mmx2
  2.98     18.14     0.93  12297988     0.00     0.00  mc_part
  2.66     18.97     0.83  45176732     0.00     0.00  ff_h264_idct_add_mmx
  2.53     19.76     0.79   9036911     0.00     0.00  put_h264_qpel8_h_lowpass_l2_mmx2
nicolas@iBook-Nicolas$
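Since the goal is to decide which part of the decoder to map to HW first, it can help to know how much of the run time the first few functions of one profile cover together. A small sketch (top_share is a hypothetical helper) that adds up the % time column of the first N rows of a flat profile, relying on gprof already sorting those rows by self time, largest first:

```sh
#!/bin/sh
# Sketch (top_share is a hypothetical helper): sum the "% time" column
# of the first N rows of one gprof flat profile. gprof already sorts
# these rows by self time, largest first.
top_share() {
    n=$1
    awk -v n="$n" '
        /^ *time +seconds/ { rows = 1; next }   # column header line
        rows && NF == 0    { exit }             # blank line ends table
        rows && seen < n   { share += $1; seen++ }
        END { printf "%.2f\n", share }
    ' "$2"
}
```

For the 1408 listing above, `top_share 5 1408.prof` adds 18.08 + 12.10 + 8.90 + 7.69 + 6.52 and prints 53.29.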