Wavelet Coefficient Coding

The entropy coding used by Dirac in wavelet subband coefficient coding is based on three stages: binarisation, context modelling and adaptive arithmetic coding.

Figure: Entropy coding block diagram

The purpose of the first stage is to provide a bitstream with easily analysable statistics that can be encoded using arithmetic coding, which can adapt to those statistics, reflecting any local statistical features.

Binarization

Binarization is the process of transforming the multi-valued coefficient symbols into bits. The resulting bitstream can then be arithmetic coded. The original symbol stream could have been coded directly, using a multi-symbol arithmetic coder, but this tends to suffer from 'context dilution': most symbols occur very rarely and so only sparse statistics can be gathered, which reduces coding efficiency.

The most obvious way to binarize a symbol is directly: a symbol is encoded by encoding the constituent bits of the binary representation of its magnitude, followed by a sign bit. This is termed bit-plane coding. Unfortunately, modelling the resulting bitstream in order to code it efficiently is complicated. Each bit-plane has different statistics, and needs to be modelled separately. More importantly, there are interdependencies between bit-planes, which cannot be known in advance, and which introduce conditional probabilities in the bit-plane models. Modelling these is possible, but for the most part the models do not well represent the statistics of transform coefficients.

Transform coefficients tend to have a roughly Laplacian distribution, which decays exponentially with magnitude. This suits so-called unary binarization. Unary codes are simple VLCs in which every non-negative number N is mapped to N zeroes followed by a 1:

    U(0)    =   1
    U(1)    =   0   1
    U(2)    =   0   0   1
    U(3)    =   0   0   0   1
    U(4)    =   0   0   0   0   1
    U(5)    =   0   0   0   0   0   1
    U(6)    =   0   0   0   0   0   0   1
Bins:           1   2   3   4   5   6   7

Figure: Unary encoding tree

For Laplacian distributed values, the probability of N occurring is 2-(|N|+1), so the probability of a zero or a 1 occurring in any unary bin is constant. So for an ideal only one context would be needed for all the bins, leading to a very compact and reliable description of the statistics. In practice, the coefficients do deviate from the Laplacian ideal and so the lower bins are modelled separately and the larger bins lumped into one context.

The process is best explained by example. Suppose one wished to encode the sequence: -3 0 1 0 -1

When binarized, the sequence to be encoded is: 0 0 0 1 | 0 | 1 | 0 1 | 1 | 1 | 0 1 | 0

The first 4 bits encode the magnitude, 3. The first bit is encoded using the statistics for Bin1, the second using those for Bin 2 and so on. When a 1 is detected, the magnitude is decoded and a sign bit is expected. This is encoded using the sign context statistics; here it is 0 to signify a negative sign. The next bit must be a magnitude bit and is encoded using the Bin 1 contexts; since it is 1 the value is 0 and there is no need for a subsequent sign bit. And so on.

Context modelling

The context modelling in Dirac is based on the principle that whether a coefficient is small (or zero, in particular) or not is well-predicted by its neighbours and its parents. Therefore the codec conditions the probabilities used by the arithmetic coder for coding bins 1 and 2 on the size of the neighbouring coefficients and the parent coefficient.

The reason for this approach is that, whereas the wavelet transform largely removes correlation between a coefficient and its neighbours, they may not be statistically independent even if they are uncorrelated. The main reason for this is that small and especially zero coefficients in wavelet subbands tend to clump together, located at points corresponding to smooth areas in the image, and as discussed elsewhere, are grouped together across subbands in the parent-child relationship.

To compute the context, two pieces of information are used. First, a value nhood_sum is calculated at each point (x,y) of each subband, as the sum of two previously coded quantised neighbouring coefficients:

(NB: nhood_sum depends on the size of the of the predicted neighbouring coefficients in the case of Intra DC band coding.) Secondly, it is determined whether the parent of the coefficient is zero or not.

There are sixteen contexts used in frame coding. They are:

0. SIGN0_CTX - sign context, previous symbol is 0

1. SIGN_POS_CTX - sign constext, previous symbol is +ve

2. SIGN_NEG_CTX - sign context, previous symbol is -ve

3. Z_BIN1z_CTX - bin 1, parent is zero, neighbours zero

4. Z_BIN1nz_CTX - bin 1, parent is zero, neighbours non-zero

5. Z_BIN2_CTX - bin 2, parent is zero

6. Z_BIN3_CTX - bin 3, parent is zero

7. Z_BIN4_CTX - bin 4, parent is zero

8. Z_BIN5plus_CTX - bins 5 plus, parent is zero

9. NZ_BIN1z_CTX - bin 1, parent is non-zero, neighbours zero

10. NZ_BIN1a_CTX - bin 1, parent is non-zero, neighbours small

11. NZ_BIN1b_CTX - bin 1, parent is non-zero, neighbours large

12. NZ_BIN2_CTX - bin 2, parent is non-zero

13. NZ_BIN3_CTX - bin 3, parent is non-zero

14. NZ_BIN4_CTX - bin 4, parent is non-zero

15. NZ_BIN5plus_CTX - bins 5 plus, parent is non-zero

What 'small' means depends on the subband, since the wavelet transform (as implemented in Dirac) has a gain of 2 for each level of decomposition. So a threshold is set individually based on the subband type.

After binarization, a context is selected, and the probabilities for 0 and 1 that are maintained in the appropriate context will be fed to the arithmetic coding function along with the value itself to be coded.

So in the example of the previous section, when coding the first value, -3, the encoder then checks the values of neighbouring coefficients and the parent coefficient. Based on this data, a different statistical model (that is, a count of 1 and a count of zero) is used to code the first two bins. So the coder maintains, for example, the probabilities that Bin 1 is 0 or 1, given that the value of neighbouring coefficients is 0 and the parent is 0 - this is contained in Z_BIN1z_CTX. These are fed to the arithmetic coding engine for encoding the bit in Bin 1, and the context probabilities are updated after encoding.

Arithmetic coding

Conceptually, an arithmetic coder can be thought of a progressive way of producing variable-length codes for entire sequences of symbols based on the probabilities of their constituent symbols. For example, if we know the probability of 0 and 1 in a binary sequence, we also know the probability of the sequence itself occurring. So if

P(0)=0.2, P(1)=0.8

then

P(11101111111011110101)=(0.2)3(0.8)17=1.8x10-4 (assuming independent occurrences).

Information theory then says that optimal entropy coding of this sequence requires log2(1/P)=12.4 bits. AC produces a code-word very close to this optimal length, and implementations can do so progressively, outputting bits when possible as more arrive.

All arithmetic coding (AC) requires are estimates of the probabilities of symbols as they occur, and this is where context modelling fits in. Since AC can, in effect, assign a fractional number of bits to a symbol, it is very efficient for coding symbols with probabilities very close to 1, without the additional complication of run-length coding. The aim of context modelling within Dirac is to use information about the symbol stream to be encoded to produce accurate probabilities as close to 1 as possible.

Dirac computes these estimates for each context simply by counting their occurrences. In order for the decoder to be in the same state as the encoder, these statistics cannot be updated until after a binary symbol has been encoded. This means that the contexts must be initialised with a count for both 0 and 1, which is used for encoding the first symbol in that context.

An additional source of redundancy lies in the local nature of the statistics. If the contexts are not refreshed periodically then later data has less influence in shaping the statistics than earlier data, resulting in bias, and local statistics are not exploited. Dirac adopts a simple way of refreshing the contexts by halving the the counts of 0 and 1 for that context at regular intervals. The effect is to maintain the probabilities to a reasonable level of accuracy, but to keep the influence of all coefficients roughly constant. Local adaption is likely to be developed further in subsequent versions of Dirac.

The software uses an abstract class to encapsulate the basic functions of both coding and decoding. Particular classes to code the subband data are derived from this.By using common context selection and other functions, sync between coder and decoder can be enforced.

Previous: Lagrangian parameter control of subband quantisation

Table of contents Back to Transform Coding