Making an H.264 video start instantly with progressive download

Here’s a tip on something people have asked me about quite a few times concerning playback of H.264 videos using progressive download:
Why do I have to download the whole video file before I can start playback or skip in the video file?

Well, when you serve videos on a site using plain HTTP – known as progressive download – the position of the header becomes very important.
The header is placed either at the beginning of the file or at the end of the file. In case of the latter, you’ll have to download the whole thing before you can begin playback – because without the header, the player can’t start decoding.
When you transcode H.264 files in an MP4 container using FFmpeg, the header will by default be placed at the end and needs to be moved.
So, how do we move the header from the end of the file to the beginning of the file?

Well, I always transcode the video part with FFmpeg and the audio part with NeroAAC and mux the two parts together using MP4Box. MP4Box places the header at the beginning of the file – and all is good. If you have files that need the header moved, you can use the small tool Qt-faststart. You can find Qt-faststart for Windows here.
Recent development in FFmpeg makes it possible to have FFmpeg move the header to the front of the file using the -movflags faststart option, but it can be rather slow, since FFmpeg rewrites the finished file in an extra pass to move the header. Here’s how you would do it:

ffmpeg -i input.mp4 -c:a copy -c:v copy -movflags +faststart output.mp4

For further reading, including how to check whether your header is placed correctly for progressive download, you can read this article from Adobe.
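If you would rather check the header position yourself, here is a minimal Python sketch. It assumes the usual MP4 layout of top-level boxes (a 4-byte big-endian size followed by a 4-byte type) and simply reports whether the moov box (the header) comes before the mdat box (the media data). The function names are my own and not part of any tool mentioned here.

```python
import io
import struct


def top_level_boxes(f):
    """Yield (type, offset, size) for each top-level box in a binary MP4 file object."""
    f.seek(0, io.SEEK_END)
    end = f.tell()
    offset = 0
    while offset + 8 <= end:
        f.seek(offset)
        size, box_type = struct.unpack(">I4s", f.read(8))
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > end:
                break
            size = struct.unpack(">Q", f.read(8))[0]
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            size = end - offset
        if size < header:
            break  # malformed box: stop instead of looping forever
        yield box_type.decode("latin-1"), offset, size
        offset += size


def moov_before_mdat(f):
    """Return True if the header (moov) is found before the media data (mdat)."""
    for box_type, _, _ in top_level_boxes(f):
        if box_type == "moov":
            return True
        if box_type == "mdat":
            return False
    return False
```

To use it, open the file in binary mode, for example with open("output.mp4", "rb") as f, and pass f to moov_before_mdat. Only the box headers are read, so it is fast even on large files.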

 

If you stream using a streaming server or HTTP Live Streaming (HLS), the position of the header shouldn’t matter. There might be a performance boost from having the header at the front of the file, but I haven’t tested this. If any of you readers have further info, please don’t be shy – let us know by leaving a comment…

/Fred

Merging VOB files

A couple of years ago I was faced with the challenge of transcoding a DVD into an optimized H.264 file for use on the web. First of all I had to remove the copy protection using DVD Decrypter. Now I had a bunch of VOB (Video Object) files. A DVD movie is one continuous MPEG-2 program stream, but each VOB file is limited to 1 GB, so the movie is split across several files. Before I could input the VOB files into FFmpeg, I had to find a way to merge them into one file. Thankfully this is easy: all you have to do is concatenate the files, and you can use Windows’ copy function for this, as long as you remember to set the binary switch (/b):

copy /b vts_01_2.vob + vts_01_3.vob + vts_01_4.vob + vts_01_5.vob mergedfile.vob

I omitted the first file vts_01_1.vob since it contained the menu.

If you prefer a GUI, you can also go for DVD Merge.
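If you are not on Windows, or want to do the merge from a script, the same binary concatenation can be sketched in a few lines of Python. This is only an illustration of what copy /b does; the function name merge_files is my own.

```python
import shutil


def merge_files(parts, destination):
    """Concatenate the files in `parts`, in order, into `destination` (binary mode)."""
    with open(destination, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream the data in chunks so large VOB files never sit in memory.
                shutil.copyfileobj(src, out)
```

Calling merge_files(["vts_01_2.vob", "vts_01_3.vob"], "mergedfile.vob") would do the same as the copy command above for those two files.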

Optimal H.264 encoding for Flash and HTML5

During the years I have been involved in broadcasting and web development, I have done a lot of trial and error regarding encoding for the web. I have always used FFmpeg for the video part of the encoding, and recent improvements in FFmpeg have made the syntax much more straightforward, so let me show you how to make the best possible encoding for playing back video on the web using HTML5/Flash:

First of all – find out what resolution you want to target. If you’re advanced, you will want to encode multiple resolutions so the server or the user can choose whichever resolution is best suited. However, in some cases it’s too complicated, so finding one target size that represents the best compromise, is often preferable.
Finding the right resolution is a matter of finding the right balance between visual quality and performance – the more pixels, the harder it is for the computer to decode and display the signal. My choice is 768 x 432 pixels at 1 megabit per second (plus audio). It gives a decent image – not far away from DVD quality, and just about all computers will display it without stuttering and without dropping frames. We will encode the audio so well that it will be hard to tell it apart from the original, which will actually make the viewer perceive the image quality as being better than it is (I guess this report shows my point).
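If your source is not 16:9, you need a height that matches its aspect ratio. The small Python sketch below computes a target size for a given width, rounded to even numbers, since H.264 with the usual 4:2:0 pixel format needs even dimensions. The function name is my own and it assumes square pixels in the source.

```python
def scaled_size(src_width, src_height, target_width):
    """Return (width, height) that keeps the aspect ratio, both rounded to even numbers."""
    width = target_width - (target_width % 2)
    # Round to the nearest multiple of 2.
    height = 2 * round(src_height * width / (2 * src_width))
    return width, height
```

For a 1920 x 1080 source and a target width of 768 this gives 768 x 432. FFmpeg’s scale filter can do the same calculation for you if you write scale=768:-2.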

1-pass or 2-pass encoding?
So, when encoding, we’re left with two choices: 1-pass or 2-pass encoding. 2-pass encoding is the obvious choice if you plan to stream using a streaming server such as Adobe FMS or Wowza over a protocol like RTMP. 2-pass encoding lets the encoder hit the target average bitrate closely, so the stream has a predictable data rate, but without the artifacts and drawbacks known from CBR (Constant Bitrate). This is mainly relevant when using a real streaming server, since it makes load balancing easier: you always know how many streams each server can handle, which is usually limited by the network card in the server.
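To make the load balancing point concrete, here is a rough Python sketch of the arithmetic. The numbers are assumptions for illustration only: a 1 Gbit/s network card, 128 kbit/s of audio, and 20% of the link kept free as headroom.

```python
def max_streams(link_kbps, video_kbps, audio_kbps, headroom=0.8):
    """Estimate how many simultaneous streams fit on a network link.

    `headroom` is the share of the link we are willing to fill (an assumed value).
    """
    per_stream_kbps = video_kbps + audio_kbps
    return int(link_kbps * headroom // per_stream_kbps)
```

With a predictable 1000 kbit/s video stream plus the assumed audio, max_streams(1_000_000, 1000, 128) tells you how many viewers one such server could carry before the network card becomes the bottleneck.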

If you just plan to stream using a regular webserver – also known as progressive download – then you’re better off using 1-pass encoding, since 1-pass is faster to encode (roughly 40% faster in my experience) and gives you better visual quality for the same amount of data.

Let’s transcode!
Now it’s time for the actual transcoding using FFmpeg. What I do is as follows:

#1 – decode the audio of the input video file to wav (uncompressed).
#2 – encode the wav file to AAC using Nero AAC.
#3 – encode  the video using FFmpeg
#4 – mux (combine) the video and audio together using MP4Box.

Update! Since FFmpeg now offers a good AAC encoder, it’s no longer necessary to use Nero’s AAC encoder. Thus you can skip steps 1, 2 and 4 and go straight to step 3. All you have to do is omit the -an parameter. Nero would however still be my choice.

I use Windows and the following works great:

Video encoding (size 768×432 pixels, 1 megabit per second. I use -tune film as my default; use -tune animation for animated inputs):

Video encoding using 2-pass:
Pass 1:

ffmpeg -i "input.mov" -vf scale=768:432 -pass 1 -sws_flags lanczos -vcodec libx264 -preset slow -tune film -y -an -b:v 1000k -bufsize 2000k -f rawvideo NUL

Pass 2:

ffmpeg -i "input.mov" -vf scale=768:432 -pass 2 -sws_flags lanczos -vcodec libx264 -preset slow -tune film -y -an -b:v 1000k -bufsize 2000k -f mp4 temp.mp4

Video encoding using 1-pass (the rest of the steps are the same for both 1-pass and 2-pass encoding):
ffmpeg -i "input.mov" -vf scale=768:432 -vcodec libx264 -preset slow -tune film -y -an -crf 22 temp.mp4

Audio encoding:
ffmpeg -i "input.mov" -y -ac 2 -f wav temp.wav
neroAacEnc -q 0.35 -if temp.wav -of temp.m4a

Muxing together:
mp4box -add temp.m4a#audio "out.mp4"
mp4box -add temp.mp4#video "out.mp4"

Voila! You have the best H.264 encoding in town!

Notice that I use “-f rawvideo NUL” for my first pass. This tells FFmpeg not to write an output file, since all we want to do is build a stat file for the second pass. This speeds up the first pass a bit. Also notice the -an parameter, which tells FFmpeg not to encode audio since we do that with Nero instead – again a minor performance gain.
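If you automate the two passes from a script, the commands can be built as argument lists, which also sidesteps quoting problems with file names. The Python sketch below only builds the lists and does not run anything. It uses the same options as the two-pass commands above, and os.devnull stands in for NUL so the first pass also works outside Windows. The function name and its defaults are my own.

```python
import os


def two_pass_commands(src, dst, width=768, height=432, bitrate="1000k", bufsize="2000k"):
    """Build the FFmpeg argument lists for pass 1 and pass 2 of a 2-pass x264 encode."""
    common = [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale={width}:{height}", "-sws_flags", "lanczos",
        "-vcodec", "libx264", "-preset", "slow", "-tune", "film",
        "-an", "-b:v", bitrate, "-bufsize", bufsize,
    ]
    # Pass 1 only builds the stat file, so its video output is thrown away.
    pass1 = common + ["-pass", "1", "-f", "rawvideo", os.devnull]
    # Pass 2 reads the stat file and writes the real MP4.
    pass2 = common + ["-pass", "2", "-f", "mp4", dst]
    return pass1, pass2
```

Each list can then be handed to subprocess.run(cmd, check=True), first pass 1 and then pass 2.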

A great bonus of using MP4Box is that it places the moov atom at the beginning of the file. This causes the file to play immediately when served using progressive download. FFmpeg, on the other hand, places the moov atom at the end of the file – hence you have to download the whole file before being able to start it, because only the moov atom can tell the player how to interpret the H.264 file. If you want to know the deeper explanation behind this, you can get it here and here.
(Update August 2013: FFmpeg now has some support for faststart)
When doing the single pass video encoding, we use the CRF parameter instead of a fixed data rate. CRF stands for constant rate factor, a constant quality mode, and the value denotes the quality of the encoding. x264 accepts values from 0 to 51, but the useful range is roughly 15 (best) to 31 (worst) – and you can use decimals if you like. I often use 22, which gives a fairly small file size while maintaining great visual quality.

Why not let FFmpeg encode to AAC? Well, in short, because FFmpeg is lacking a good AAC encoder – but more on this issue in a later blog post. Update! This has now changed and the built in AAC encoder is comparable with the Nero AAC encoder.

ProRes support in FFmpeg – I love it!

FFmpeg added support for ProRes in October 2011 – which made my life a little easier. The added support for ProRes closed a big gap for me, as I quite often get exports in ProRes and need to transcode them into something more lossy like H.264 for use on the web.
FFmpeg’s support for ProRes will also lead to VLC supporting ProRes in their upcoming 1.2 release. Great!
(Update: It was later renamed to version 2.0 and was released in February 2012.)

A quick note on how to transcode to Apple ProRes (the standard 422 flavor is often denoted by its FourCC, apcn) using a recent build of FFmpeg:

ffmpeg -i input.mov -vcodec prores -profile:v NUMBER  output.mov

For different flavors of ProRes, replace NUMBER with a number from 0 to 3 where:
0 : ProRes422 (Proxy)
1 : ProRes422 (LT)
2 : ProRes422 (Standard)
3 : ProRes422 (HQ)
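If you build the command from a script, it is easy to mix up the profile numbers. Here is a small Python sketch that maps readable names to the numbers in the list above and builds the argument list for the command shown earlier. The names in the dictionary and the function name are my own choice, not FFmpeg’s.

```python
# Profile numbers as listed above; the keys are illustrative names.
PRORES_PROFILES = {"proxy": 0, "lt": 1, "standard": 2, "hq": 3}


def prores_command(src, dst, profile="standard"):
    """Build the FFmpeg argument list for a ProRes 422 transcode."""
    key = profile.lower()
    if key not in PRORES_PROFILES:
        raise ValueError(f"unknown ProRes profile: {profile!r}")
    return [
        "ffmpeg", "-i", src,
        "-vcodec", "prores", "-profile:v", str(PRORES_PROFILES[key]),
        dst,
    ]
```

For example, prores_command("input.mov", "output.mov", "hq") gives the same arguments as the command line above with NUMBER set to 3.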

The following was taken from FFmbc’s wiki site:

The encoder behaves differently based on 3 options:
-qscale <value> or -cqp <value>
Specify a fixed quantizer that will be used for every frame. This is a VBR encoding method.

If bitrate is not specified, the bitrate will be automatically chosen based on video resolution and will be similar to the reference encoder for the same profile.
-b <bitrate>
Specify an approximately constant bit rate to use during encoding.
444 encoding: add -pix_fmt yuv444p10 to your command line options.

Update: ProRes 444 doesn’t seem to work, but people are working on a patch as learned from this thread:
http://ffmpeg.org/pipermail/ffmpeg-user/2012-September/009521.html
I must admit I haven’t used the codec lately, but people write that Final Cut Pro often gives the warning that ProRes files made with FFmpeg are not optimized for FCP. All that means is that the file wasn’t compressed using FCP; the file should work fine.

If you use the ProRes encoder, you might want to read this blog post by its author, Kostya.

Careful with audio resampling using FFmpeg

In my line of work transcoding videos for dr.dk/pirattv I use FFmpeg extensively. I have written a tool in C# that automates this task, and in doing so I discovered that FFmpeg is not a good choice for downsampling audio. The downsampling does not suffer from aliasing, because the signal is properly attenuated at the Nyquist frequency – but the quality of the lowpass filter is terrible. The filter is by no means steep enough, so it unnecessarily cuts a lot of high frequencies fairly far below the Nyquist frequency. The only good thing about the filter is that the resampling is really fast. I have not found a setting in FFmpeg that forces it to use a better filter, so I wanted to find a better way:

From looking at this awesome site I learned that one of the very best resamplers around, SSRC, is free and even open source! So now I treat the audio separately from the video and do all downsampling using SSRC, which preserves the treble. The difference is indeed audible when transcoding music.

Original sweep
A 96 kHz sweep taken from infinitewave visualized using Audacity.

This is the same sweep downsampled to 44.1 kHz using FFmpeg.
Notice the severe attenuation already at 17-18 kHz.

Here the same downsampling is done using SSRC. The filter is extremely steep, as we want it to be. In fact so steep that you can’t even see the attenuation right before Nyquist – but it’s there!
Notice that the y-axis differs from the one on the FFmpeg version.
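To put a number on how much is lost, here is a small Python sketch of the arithmetic. The Nyquist frequency is half the sample rate, and the roll-off point of 17.5 kHz is my own assumption, taken as the middle of the 17-18 kHz range noted for the FFmpeg sweep above.

```python
def nyquist_hz(sample_rate_hz):
    """The highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def lost_band_fraction(rolloff_hz, sample_rate_hz):
    """Share of the band below Nyquist that lies above the filter's roll-off point."""
    return 1 - rolloff_hz / nyquist_hz(sample_rate_hz)
```

For a 44.1 kHz target the Nyquist frequency is 22.05 kHz, so a filter that starts cutting at the assumed 17.5 kHz gives up roughly a fifth of the available bandwidth, all of it in the treble.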

Update: I ended up using SoX, because I needed more than just great resampling. SSRC is in theory slightly better but the difference is inaudible to me and SoX provides me with an array of additional possibilities.

Update II – FFmpeg now includes the SoX resampler, but only uses it if you tell it to (and only if your build was compiled with libsoxr). The default resampler is still the simple one with its limitations.
To use the better resampler from SoX, e.g. when resampling to 44.1 kHz, add the following to your command line:

-af aresample=resampler=soxr -ar 44100