This is a simple introduction to the technology of Dirac. It is intended for those who are new to the subject or who just want a simple overview. The main information for developers is on the SourceForge website (SourceForge is one of the leading development sites for Open Source software).
Dirac consists of:
- a compression specification for the bytestream and the decoder
- software for compression and decompression
- algorithms designed to support simple and efficient hardware implementations
Unlike the software for many other standard video compression systems, the Dirac software is intended not simply as a reference coder and decoder, but also as a prototype implementation that can be freely modified, enhanced and deployed. The decoder implementation in particular is designed to provide fast decoding whilst remaining portable across software platforms.
Real-time decoding of modern compression systems is difficult without extensively exploiting hardware support (in coprocessors and video cards) or assembly-language code, but these features can easily be added to Dirac’s modular codebase.
Architecture
Dirac is similar to many of the established video coding systems. However, we have deliberately chosen well-established technologies that combine effectiveness, efficiency, and simplicity. Together, these choices give us a high-quality system which is not encumbered with patents.
First, we use motion compensation to exploit the correlation between picture frames. Good motion compensation can dramatically reduce the amount of data required to code a picture.
Then we use wavelets (rather than the more conventional DCT) to transform the residual error signal.
Motion Compensation
In Dirac, frames have two essential properties. Firstly, they are either predicted from other frames (Inter) or not (Intra). Secondly, they either can be used to predict other frames (Reference) or cannot (Non-reference). All combinations of these properties are possible, and any Inter frame can be predicted from up to two reference frames. This means that Dirac can support conventional MPEG-style structures (Group of Pictures or GOP), but also any other prediction structure that may give better performance.
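To make the two frame properties concrete, here is a minimal sketch in Python. The `Frame` class and the `validate` helper are hypothetical names invented for this illustration and are not part of the Dirac software. They only encode the rules stated above: a frame is Intra when it has no references, an Inter frame has at most two, and only Reference frames may be predicted from.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    number: int
    is_reference: bool                              # may other frames be predicted from it?
    refs: List[int] = field(default_factory=list)   # frames this one is predicted from

    @property
    def is_intra(self) -> bool:
        # Intra frames are not predicted from any other frame.
        return not self.refs


def validate(frames: List[Frame]) -> None:
    """Raise ValueError if the prediction structure breaks the stated rules."""
    by_number = {f.number: f for f in frames}
    for f in frames:
        if len(f.refs) > 2:
            raise ValueError(f"frame {f.number} uses more than two references")
        for r in f.refs:
            if r not in by_number or not by_number[r].is_reference:
                raise ValueError(f"frame {f.number} predicts from non-reference frame {r}")


# An MPEG-style I B B P pattern expressed with the two properties.
gop = [
    Frame(0, is_reference=True),                  # I: Intra, Reference
    Frame(1, is_reference=False, refs=[0, 3]),    # B: Inter, Non-reference
    Frame(2, is_reference=False, refs=[0, 3]),    # B: Inter, Non-reference
    Frame(3, is_reference=True, refs=[0]),        # P: Inter, Reference
]
validate(gop)  # passes silently for a legal structure
```

Other structures (for example an Intra, Non-reference frame) are expressed the same way, simply by choosing different property combinations.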
At the pixel level, we can define the reference pixels for motion compensation either globally (through parameters such as pan, tilt, zoom and rotate) or locally, using motion vectors calculated for each block of pixels.
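As an illustration of the local case, the sketch below finds a motion vector for one block by exhaustive search, using the sum of absolute differences (SAD) as the matching cost. This is a generic textbook technique chosen for clarity and is an assumption of this example. It is not a description of the search strategy used by the Dirac encoder, and the function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in `cur` and a shifted block in `ref`."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def best_vector(cur, ref, bx, by, size, search):
    """Try every (dx, dy) within +/- `search` pixels and return (vector, cost)."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block falls outside the picture.
            if by + dy < 0 or by + dy + size > height:
                continue
            if bx + dx < 0 or bx + dx + size > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


# Synthetic 16x16 reference picture, and a current picture whose content
# is the reference shifted by 2 pixels horizontally and 1 vertically.
ref = [[x * x + 3 * y * y + x * y for x in range(16)] for y in range(16)]
cur = [[ref[min(y + 1, 15)][min(x + 2, 15)] for x in range(16)] for y in range(16)]

vector, cost = best_vector(cur, ref, bx=4, by=4, size=4, search=2)
```

With a perfect match the residual for that block is zero, which is why good motion compensation leaves so little data to code.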
Transform coding
Wavelets have commonly been used for still image compression (a recent example is the core of JPEG 2000). Now the power of modern chips allows us to use wavelets for motion pictures. The wavelet transform repeatedly filters the signal into low and high frequency parts, each time splitting the low frequency part again. This repeated split concentrates the important data in one subband, which can be efficiently encoded. We apply different degrees of quantisation to the transformed data. The human eye appears to be insensitive to coarse quantisation in some of the higher wavelet bands, and we exploit this ruthlessly to achieve high compression efficiency.
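The repeated low/high split and the band-dependent quantisation can be sketched in one dimension as follows. The example assumes the integer Haar wavelet because it is the simplest filter to write down. The text above does not say which wavelet filters Dirac uses, and the function names and quantiser are hypothetical.

```python
def haar_forward(signal):
    """Split an even-length integer signal into low and high frequency halves."""
    low, high = [], []
    for i in range(0, len(signal), 2):
        a, b = signal[i], signal[i + 1]
        d = b - a                 # high band: difference of the pair
        low.append(a + (d >> 1))  # low band: integer average of the pair
        high.append(d)
    return low, high


def haar_inverse(low, high):
    """Exactly undo haar_forward."""
    out = []
    for s, d in zip(low, high):
        a = s - (d >> 1)
        out.extend([a, a + d])
    return out


def decompose(signal, levels):
    """Repeatedly split the low band; returns (final low band, high bands, finest first)."""
    low, bands = list(signal), []
    for _ in range(levels):
        low, high = haar_forward(low)
        bands.append(high)
    return low, bands


def reconstruct(low, bands):
    for high in reversed(bands):
        low = haar_inverse(low, high)
    return low


def quantise(coeffs, step):
    """Uniform quantiser that truncates towards zero (small values become 0)."""
    return [(abs(c) // step) * (1 if c >= 0 else -1) for c in coeffs]


signal = [10, 12, 11, 13, 50, 52, 51, 49]
low, bands = decompose(signal, levels=3)

# Coarser steps for the finer (higher frequency) bands, as an illustrative choice.
steps = [8, 4, 2]
quantised = [quantise(band, step) for band, step in zip(bands, steps)]
```

Without quantisation, `reconstruct(low, bands)` returns the original signal exactly. With it, most fine-band coefficients collapse to zero, which is what makes the later entropy coding effective.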
One of the weaknesses of MPEG-2 is the way that the picture goes blocky when the coder is being worked hard. The use of the Discrete Cosine Transform (DCT) to transform the residual error limits the flexibility of the blocks used in the processing. Because wavelets are not tied to a fixed block grid, we can use motion compensation blocks of varying sizes, and overlap them to mitigate the impact of block edges. This block structure also results in better motion predictions, again yielding improved compression.
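To show why overlapping blocks soften block edges, the sketch below builds a one-dimensional weighting window for each block and checks that neighbouring windows always add up to the same total, so the blended prediction has no seam. The linear ramp and the names `window` and `total_weight` are assumptions made for this illustration. The actual window shape used by Dirac is not given in this introduction.

```python
def window(step, overlap):
    """Integer weights for one block: ramp up, flat top, ramp down.

    Blocks are placed every `step` pixels and are `step + overlap` pixels long.
    Weights are expressed out of `scale` to keep the arithmetic exact.
    """
    if not 0 < overlap <= step:
        raise ValueError("overlap must be between 1 and step")
    scale = 2 * overlap
    up = [2 * i + 1 for i in range(overlap)]
    flat = [scale] * (step - overlap)
    down = [scale - u for u in up]
    return up + flat + down, scale


def total_weight(n_blocks, step, overlap):
    """Accumulate the windows of `n_blocks` adjacent blocks along one row."""
    weights, scale = window(step, overlap)
    total = [0] * (n_blocks * step + overlap)
    for k in range(n_blocks):
        for i, w in enumerate(weights):
            total[k * step + i] += w
    return total, scale


total, scale = total_weight(n_blocks=3, step=4, overlap=2)
```

Away from the picture edges every pixel receives a total weight of `scale`, so each predicted pixel is a smooth mix of the predictions from the blocks that cover it.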
Entropy coding
The transformed data still has redundancy, so entropy coding is used to reduce the bit rate further. The entropy coding technique used in Dirac is arithmetic coding, which is efficient and flexible. Arithmetic coding separates statistical modelling from the compression process itself, and better compression is achieved when the inter-dependence of data is exploited by switching between models based on previously-coded data.
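The benefit of switching between models can be estimated without writing a full arithmetic coder. An ideal arithmetic coder spends about -log2(p) bits on a symbol that the model predicted with probability p. The sketch below sums that cost for an adaptive binary model, once with a single model and once with a model selected by the previously coded bit. The class and function names are hypothetical, and the simple counting model is an assumption of this example rather than Dirac's actual modelling.

```python
import math


class AdaptiveBitModel:
    """Estimates bit probabilities from counts of the bits seen so far."""

    def __init__(self):
        self.counts = [1, 1]  # start with no preference for 0 or 1

    def cost(self, bit):
        # Ideal arithmetic-coding cost, in bits, of coding `bit` now.
        return -math.log2(self.counts[bit] / sum(self.counts))

    def update(self, bit):
        self.counts[bit] += 1


def coded_length(bits, use_context):
    """Total ideal cost in bits; optionally pick the model by the previous bit."""
    models = {}
    total, previous = 0.0, 0
    for bit in bits:
        context = previous if use_context else 0
        model = models.setdefault(context, AdaptiveBitModel())
        total += model.cost(bit)
        model.update(bit)
        previous = bit
    return total


bits = [0, 1] * 50  # strongly dependent on the previous bit
plain = coded_length(bits, use_context=False)
contextual = coded_length(bits, use_context=True)
```

For this alternating input the single model sees equal numbers of zeros and ones and cannot compress, while the context-switched models quickly learn that each bit is the opposite of the last, so `contextual` is far smaller than `plain`.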
Dirac applies entropy coding to the motion vectors and the output of the wavelet transform process.
Bytestream
The whole of the compressed data is packaged in a simple bytestream. This includes synchronisation information, permitting access to any frame quickly and efficiently, which makes editing simple. The structure is such that the whole bytestream can be packaged in many of the existing transport and container formats, such as MPEG, MXF, IP and Ogg.
This feature means that we are able to use a wide range of sound coding options, and that we have easy access to all the other data transport systems required for production or broadcast metadata.
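The idea of synchronisation giving quick access to any frame can be sketched with a toy bytestream. The layout used here (a 4-byte `SYNC` marker followed by a 4-byte big-endian payload length) is purely hypothetical and is not the Dirac bytestream syntax, which is defined in the specification. The point is only that a reader can hop from header to header without decoding any picture data.

```python
import struct

MARKER = b"SYNC"                  # hypothetical synchronisation marker
HEADER = struct.Struct(">4sI")    # marker, then payload length (big-endian)


def pack_units(payloads):
    """Concatenate payloads, each preceded by a marker and its length."""
    return b"".join(HEADER.pack(MARKER, len(p)) + p for p in payloads)


def index_units(stream):
    """Return (payload offset, payload length) for every unit in the stream."""
    index, pos = [], 0
    while pos + HEADER.size <= len(stream):
        marker, length = HEADER.unpack_from(stream, pos)
        if marker != MARKER:
            raise ValueError(f"lost synchronisation at byte {pos}")
        index.append((pos + HEADER.size, length))
        pos += HEADER.size + length   # jump straight to the next header
    return index


def read_unit(stream, index, n):
    """Fetch the n-th payload directly, without touching the others."""
    start, length = index[n]
    return stream[start:start + length]


stream = pack_units([b"frame0", b"f1", b"frame-two"])
index = index_units(stream)
third = read_unit(stream, index, 2)
```

Because the headers carry everything needed to skip ahead, an editor can cut or rearrange units by copying byte ranges, and a wrapper format can carry each unit as an opaque payload.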
SourceForge – http://dirac.sourceforge.net/index.html
Schrodinger – http://schrodinger.sourceforge.net/