Macroblocks and variable-sized block matching

A macroblock (MB) consists of a 4x4 array of blocks, and there are three possible ways of splitting an MB:

Splitting level 0: no split, a single MV per reference frame for the MB;

Splitting level 1: split into four sub-macroblocks (sub-MBs), each a 2x2 array of blocks, one MV per reference frame per sub-MB;

Splitting level 2: split into the 16 constituent blocks, one MV per reference frame per block.

Figure: Macroblock splitting modes
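The effect of each splitting level on how many prediction units, and therefore how many motion vectors, an MB carries can be tabulated directly. This is a minimal sketch, assuming an MB is always a 4x4 array of blocks and that each prediction unit carries one MV per reference frame; the names `UNITS_PER_LEVEL`, `prediction_units` and `max_motion_vectors` are hypothetical, not taken from the Dirac code base. The MV count is only an upper bound, since INTRA or single-reference units need fewer.

```python
# Hypothetical helpers, assuming an MB is a 4x4 array of blocks and that every
# prediction unit carries one MV per reference frame (INTRA units carry none).
UNITS_PER_LEVEL = {
    0: (1, 4),   # no split: one unit of 4x4 blocks
    1: (4, 2),   # four sub-MBs of 2x2 blocks each
    2: (16, 1),  # sixteen individual blocks
}

def prediction_units(split_level):
    """Return (number of prediction units, blocks per side of each unit)."""
    if split_level not in UNITS_PER_LEVEL:
        raise ValueError("splitting level must be 0, 1 or 2")
    return UNITS_PER_LEVEL[split_level]

def max_motion_vectors(split_level, num_refs):
    """Upper bound on the MVs one MB can need at a given splitting level."""
    units, _ = prediction_units(split_level)
    return units * num_refs

for level in range(3):
    units, side = prediction_units(level)
    print(f"level {level}: {units} unit(s) of {side}x{side} blocks, "
          f"up to {max_motion_vectors(level, num_refs=2)} MVs")
```

With two references this gives at most 2, 8 and 32 MVs per MB for splitting levels 0, 1 and 2 respectively.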

The splitting mode is chosen by redoing motion estimation for the sub-MBs and for the MB as a whole, again using the RDO metric, suitably scaled to take account of the different block sizes. At the same time, the best prediction mode is chosen for each prediction unit (block, sub-MB or MB). Four prediction modes are available:

INTRA: intra coded, predicted by DC value;

REF1_ONLY: only predict from the first reference;

REF2_ONLY: only predict from the second reference (if one exists);

REF1AND2: bi-directional prediction.
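Choosing a prediction mode for a single prediction unit amounts to taking the cheapest mode among those that are legal for the picture. The sketch below assumes the per-mode costs have already been computed and scaled for the unit's size (the scaling itself is not specified here); `PredMode`, `available_modes` and `best_mode` are illustrative names, and the cost figures are made up.

```python
from enum import Enum

class PredMode(Enum):
    INTRA = 0      # intra coded, predicted by DC value
    REF1_ONLY = 1  # predict from the first reference only
    REF2_ONLY = 2  # predict from the second reference only
    REF1AND2 = 3   # bi-directional prediction

def available_modes(num_refs):
    """Modes that are meaningful for a picture with num_refs references."""
    modes = [PredMode.INTRA]
    if num_refs >= 1:
        modes.append(PredMode.REF1_ONLY)
    if num_refs >= 2:
        modes += [PredMode.REF2_ONLY, PredMode.REF1AND2]
    return modes

def best_mode(costs, num_refs):
    """Return the available mode with the lowest precomputed cost.

    costs maps PredMode -> cost for one prediction unit; each cost is assumed
    to be an RDO-style figure already scaled for the unit's size.
    """
    candidates = [m for m in available_modes(num_refs) if m in costs]
    return min(candidates, key=lambda m: costs[m])

costs = {PredMode.INTRA: 900.0, PredMode.REF1_ONLY: 410.0,
         PredMode.REF2_ONLY: 380.0, PredMode.REF1AND2: 395.0}
print(best_mode(costs, num_refs=2))  # PredMode.REF2_ONLY
print(best_mode(costs, num_refs=1))  # PredMode.REF1_ONLY
```

Restricting the candidates by reference count reflects the note that REF2_ONLY (and, by extension, REF1AND2) only applies when a second reference exists.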

A further complication is that the mode data itself costs bits. So a further MB parameter is defined, which records whether a common prediction mode is used for the MB. If so, every prediction unit has the same mode, and the mode need only be recorded once for that MB. Otherwise, the prediction modes of the units may all differ.

Of course, if the splitting level is 0, the MB consists of a single prediction unit anyway, so there is no need to specify whether there is a common mode.

The result is a hierarchy of parameters: the splitting level determines whether a common mode parameter is needed; the MB parameters together determine which modes need to be transmitted; and the mode of each prediction unit in turn determines which motion vectors and block DC values (in the case of INTRA) need to be present.
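The parameter hierarchy can be made concrete by listing which fields would have to be coded for one MB. This is a simplified sketch: the field names are hypothetical, a single `dc` field stands in for whatever DC data an INTRA unit actually carries, and no entropy coding or prediction of the fields is modelled.

```python
# Motion vectors needed by each inter mode; INTRA sends a DC value instead.
MVS_FOR_MODE = {"REF1_ONLY": ["mv1"], "REF2_ONLY": ["mv2"],
                "REF1AND2": ["mv1", "mv2"]}
NUM_UNITS = {0: 1, 1: 4, 2: 16}

def mb_fields(split_level, common_mode, unit_modes):
    """List the (hypothetical) fields that must be coded for one MB.

    unit_modes holds one mode name per prediction unit. common_mode is
    ignored at splitting level 0, where the MB is a single unit anyway.
    """
    n = NUM_UNITS[split_level]
    if len(unit_modes) != n:
        raise ValueError(f"expected {n} modes, got {len(unit_modes)}")
    fields = ["split_level"]
    if split_level > 0:
        fields.append("common_mode")
        if common_mode and len(set(unit_modes)) != 1:
            raise ValueError("a common mode needs identical unit modes")
    if split_level == 0 or common_mode:
        fields.append("mode")  # one mode for the whole MB
    else:
        fields += [f"unit{i}.mode" for i in range(n)]  # one mode per unit
    for i, mode in enumerate(unit_modes):
        needed = ["dc"] if mode == "INTRA" else MVS_FOR_MODE[mode]
        fields += [f"unit{i}.{name}" for name in needed]
    return fields

print(mb_fields(0, False, ["REF2_ONLY"]))
# ['split_level', 'mode', 'unit0.mv2']
```

Each level of the hierarchy gates the next: the splitting level decides whether `common_mode` appears, the pair of MB parameters decides how many modes are sent, and each mode decides which MVs or DC values follow.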

In motion estimation, an overall cost for each MB is computed and compared across every legal combination of these parameters. This is a tricky operation that has a very significant effect on performance. The decisions interact very heavily with those made in coding the wavelet coefficients of the resulting residuals, and the best results probably depend on picture material, bit rate, the block size and its relationship to the size of the video frames, and the degree of perceptual weighting used in selecting quantisers for wavelet coefficients. Parameters for controlling the mode decision have been selected essentially using the tried and tested engineering process of a wet finger in the air, and we know there are some sequences where this works poorly. As Dirac develops we would like to introduce more systematic, adaptive techniques, which should substantially improve performance.
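A brute-force version of the MB-level comparison might look like the following. It is a sketch under strong assumptions, not the Dirac encoder's actual heuristic: per-unit costs for each mode are given and already comparable across levels, signalling a mode or the common-mode flag has a fixed hypothetical cost (`mode_cost`, `flag_cost`), the cost of signalling the splitting level is ignored, and mode costs are treated as independent between units, which lets each unit pick its own cheapest mode in the individual-mode case.

```python
def mb_decision(unit_costs, mode_cost, flag_cost):
    """Choose splitting level and common-mode flag for one MB.

    unit_costs[level] is a list (one entry per prediction unit) of dicts
    mapping mode -> cost for that unit. mode_cost and flag_cost are
    hypothetical costs of signalling one mode / the common-mode flag.
    Returns (total_cost, level, common_mode); common_mode is None at level 0.
    """
    best = None
    for level, units in unit_costs.items():
        if len(units) == 1:
            # Level 0: one prediction unit, so no common-mode flag is sent.
            options = [(min(units[0].values()) + mode_cost, None)]
        else:
            modes = units[0].keys()
            # Common mode: one mode shared by every unit, signalled once.
            common = (min(sum(u[m] for u in units) for m in modes)
                      + mode_cost + flag_cost)
            # Individual modes: each unit takes its own cheapest mode.
            individual = sum(min(u.values()) + mode_cost for u in units) + flag_cost
            options = [(common, True), (individual, False)]
        for cost, common_flag in options:
            if best is None or cost < best[0]:
                best = (cost, level, common_flag)
    return best

unit_costs = {
    0: [{"INTRA": 500, "REF1_ONLY": 300}],
    1: [{"INTRA": 60, "REF1_ONLY": 70}, {"INTRA": 90, "REF1_ONLY": 50},
        {"INTRA": 80, "REF1_ONLY": 55}, {"INTRA": 65, "REF1_ONLY": 60}],
}
print(mb_decision(unit_costs, mode_cost=10, flag_cost=1))  # (246, 1, True)
```

With these made-up numbers a common REF1_ONLY mode at splitting level 1 wins; setting the signalling costs to zero instead favours individual modes, which illustrates how strongly the outcome depends on how mode data is costed.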

Choice of block sizes

Dirac can use any block size: the input frames are padded so that an integral number of macroblocks fits both horizontally and vertically. The padding uses edge values and is applied to the right-hand side and bottom of the frames. Sometimes additional padding is needed so that the wavelet transform can be applied. In this case the frames are padded to satisfy both requirements, but the number of blocks is not increased to cover the transform padding area, since the data there is not displayed and can be set to zero after motion compensation.

As an example, consider a picture of width 100 pixels, with the horizontal block separation set to 10 pixels. A macroblock is 4 blocks wide, so it spans 40 pixels horizontally, and the picture must be padded to 120 pixels to give 3 full macroblocks. To apply a 4-level wavelet transform, the width must also be a multiple of 2^4 = 16, so the picture is further padded to 128 pixels, but the number of macroblocks is not increased. Motion compensation therefore covers all of the original picture area but not the fully padded picture area.
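The arithmetic of the worked example can be reproduced with a couple of integer helpers. This sketch assumes that each wavelet level requires the dimension to be divisible by 2, so an L-level transform needs a multiple of 2^L, and that the transform padding is applied on top of the macroblock padding; `round_up` and `padded_width` are hypothetical names, and the same calculation applies vertically with the vertical block separation.

```python
def round_up(n, multiple):
    """Smallest multiple of `multiple` that is >= n."""
    return -(-n // multiple) * multiple

def padded_width(width, block_sep, blocks_per_mb=4, wavelet_levels=4):
    """Return (mc_width, transform_width, num_mbs) for one picture dimension."""
    mb_size = blocks_per_mb * block_sep
    mc_width = round_up(width, mb_size)                        # whole MBs
    transform_width = round_up(mc_width, 2 ** wavelet_levels)  # wavelet levels
    return mc_width, transform_width, mc_width // mb_size

print(padded_width(100, 10))  # (120, 128, 3)
print(padded_width(100, 8))   # (128, 128, 4)
```

With an 8-pixel separation the macroblock padding already reaches 128 pixels, so no extra transform padding is needed; a 96-pixel-wide picture with the same separation would need no padding at all.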

Although Dirac is entirely flexible in terms of block size, poorly chosen block sizes will introduce overhead through the padding process.

Block parameters do have to meet some constraints, however, so that the overlapping process works properly, especially in conjunction with subsampled chroma components (for which the blocks are correspondingly smaller). For example, each block length must differ from the corresponding block separation by a multiple of two, so that the overlap is symmetric. Normally this is enforced by the encoder, which may recompute unsatisfactory block parameters.
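The symmetric-overlap rule, and its interaction with chroma subsampling, can be checked mechanically. This is a sketch under stated assumptions: chroma block dimensions are taken to be the luma ones divided exactly by a subsampling factor (2 here, as for 4:2:0 in that dimension), and `fix_length` is a hypothetical fix-up that simply grows the block length; it is not a description of how the Dirac encoder actually recomputes parameters.

```python
def overlap_ok(length, sep):
    """True if length - sep is a non-negative even number (symmetric overlap)."""
    return length >= sep and (length - sep) % 2 == 0

def params_ok(length, sep, chroma_factor=2):
    """Check luma block parameters and the derived chroma ones (assumed rule)."""
    if length % chroma_factor or sep % chroma_factor:
        return False  # chroma block dimensions must be whole pixels
    return (overlap_ok(length, sep)
            and overlap_ok(length // chroma_factor, sep // chroma_factor))

def fix_length(length, sep, chroma_factor=2):
    """Hypothetical fix-up: grow the block length until the params are valid."""
    if sep % chroma_factor:
        raise ValueError("separation must divide evenly for chroma")
    while not params_ok(length, sep, chroma_factor):
        length += 1
    return length

print(params_ok(10, 8))   # False: luma overlap 2 is fine, chroma overlap 1 is not
print(fix_length(10, 8))  # 12
```

The example shows why chroma matters: a luma length of 10 with separation 8 satisfies the rule on its own, but the halved chroma blocks (5 and 4) would overlap asymmetrically.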
