Pizer’s Weblog

programming, DSP, math

Optimal Integer Linear Prediction Coefficients

with one comment

Most lossless audio codecs make use of linear prediction as a means to decorrelate signals. Since encoder and decoder need to work with the same set of prediction coefficients they either have to be chosen by the encoder and included in the compressed data stream before they are used (forward adaptive) or there must be a deterministic algorithm to compute these coefficients based on previous data (backward adaptive). FLAC and TAK are two examples of codecs that use forward adaptive linear prediction. This is the case I’m dealing with in this article.

Computing good real-valued prediction coefficients isn’t that difficult, actually. But these coefficients also need to be coded somehow, which involves quantization. One option is to simply quantize the optimal real-valued coefficients to rational numbers that share a certain denominator. FLAC’s format specification, for example, allows transmission of rational coefficients that have a power of two as denominator. However, simple per-coefficient rounding to the closest rational number doesn’t necessarily lead to the best set of quantized coefficients. What does “best” mean anyway? Suppose we fix the prediction order and the prediction coefficients’ accuracy (denominator). We’d like to choose the set of quantized prediction coefficients that minimizes the number of bits spent on the residual samples. Since this is a really complicated minimization problem we need to simplify our objective function. The natural choice is to minimize the sum of squared (not-yet-quantized) prediction errors instead. This objective function turns the problem into an “integer least squares” problem.
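For contrast, here is what the naive approach looks like in C: each real-valued coefficient is scaled by the common denominator p = 2^shift and rounded on its own. (A minimal sketch; the function and variable names are mine.)

```c
#include <math.h>

/* Naive per-coefficient quantization: round each real-valued
 * coefficient to the nearest multiple of 1/p, with p = 2^shift.
 * This is the baseline the rest of the article improves on. */
void quantize_naive(const double *lpc, int n, int shift, int *q)
{
    double p = (double)(1 << shift);    /* common denominator */
    for (int j = 0; j < n; ++j)
        q[j] = (int)lround(lpc[j] * p); /* independent rounding */
}
```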

Suppose you have a block of m samples s_k (1 <= k <= m), you want to compute n prediction coefficients (n << m) for the remaining m-n predictions (the first n samples are the “warm-up” samples), and the coefficients share the denominator p (for “precision”). This leads to the following integer least squares problem ILS(A,b):

A = (a_{i,j}) \in \mathbb{R}^{(m-n) \times n} \, , \, \, b \in \mathbb{R}^{m-n}

a_{i,j} = s_{i+j-1} \, , \, \, b_i = p \cdot s_{i+n}

for 1 \le i \le m-n and 1 \le j \le n

find x \in \mathbb{Z}^n that minimizes \| A \cdot x - b \|
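Setting up A and b in code is straightforward. A minimal C sketch (a 0-based translation of the 1-based formulas above, with A stored row-major in a flat array; the names are mine):

```c
/* Build the ILS(A, b) instance from a block of m samples:
 * A[i][j] = s[i + j] and b[i] = p * s[i + n] in 0-based indexing,
 * for 0 <= i < m - n and 0 <= j < n. */
void build_ils(const int *s, int m, int n, int p, double *A, double *b)
{
    for (int i = 0; i < m - n; ++i) {
        for (int j = 0; j < n; ++j)
            A[i * n + j] = (double)s[i + j]; /* n past samples */
        b[i] = (double)p * s[i + n];         /* scaled target sample */
    }
}
```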

ILS (integer least squares) is known to be NP-hard, which in practice means that computing the optimal solution for a large problem (a matrix with many columns) can take quite some time. However, for typical linear prediction problems (small prediction orders n) the complexity of computing the optimal set of coefficients is rather moderate.

How do we solve an ILS problem? There are a couple of approaches. Many involve a popular tool known as LLL lattice reduction. With the help of orthogonal transforms and the LLL lattice reduction algorithm the above problem can be reduced to another instance ILS(R,y), where R is a square upper triangular matrix of order n with near-orthogonal columns of low norm. A by-product of the reduction process is an integer matrix M which connects the two problem instances: assuming w is an optimal solution for ILS(R,y), x = M·w will be an optimal solution for ILS(A,b). Since M contains integers only, x will also be an integer vector. This reduction is a kind of preconditioning, which means the problem ILS(R,y) is “easier” to solve than ILS(A,b).
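LLL itself is too long to show here, but the orthogonal-transform half of the reduction is easy to sketch: a thin QR factorization A = QR already turns ILS(A,b) into the equivalent triangular problem ILS(R,y) with y = Q^T b, because ||Ax - b||^2 = ||Rx - y||^2 + const. In this simplified version M is just the identity; a real implementation would additionally run LLL on R to get the near-orthogonal, low-norm columns mentioned above. A C sketch using modified Gram-Schmidt:

```c
#include <math.h>

/* Reduce ILS(A, b) to the equivalent triangular problem ILS(R, y)
 * via thin QR (modified Gram-Schmidt). A (rows x n, row-major) is
 * overwritten with Q; R (n x n) and y (length n) are outputs.
 * Assumes A has full column rank. No LLL step here, so M = I. */
void qr_reduce(double *A, const double *b, int rows, int n,
               double *R, double *y)
{
    for (int i = 0; i < n * n; ++i)
        R[i] = 0.0;
    for (int k = 0; k < n; ++k) {
        double nrm = 0.0;
        for (int i = 0; i < rows; ++i)
            nrm += A[i * n + k] * A[i * n + k];
        R[k * n + k] = sqrt(nrm);
        for (int i = 0; i < rows; ++i)
            A[i * n + k] /= R[k * n + k];           /* normalize column k */
        for (int j = k + 1; j < n; ++j) {
            double dot = 0.0;
            for (int i = 0; i < rows; ++i)
                dot += A[i * n + k] * A[i * n + j];
            R[k * n + j] = dot;
            for (int i = 0; i < rows; ++i)
                A[i * n + j] -= dot * A[i * n + k]; /* orthogonalize */
        }
    }
    for (int k = 0; k < n; ++k) {                   /* y = Q^T b */
        double dot = 0.0;
        for (int i = 0; i < rows; ++i)
            dot += A[i * n + k] * b[i];
        y[k] = dot;
    }
}
```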

Still, we need to solve ILS(R,y). A simple approximation of the optimal solution is called Nulling and Cancelling:

w'_j = \left\lceil \frac{y_j - \sum_{t=j+1}^{n} R_{j,t} \, w'_t}{R_{j,j}} \right\rfloor \quad \text{for } j = n, n-1, \dots, 1

The approximation’s coefficients are computed one at a time. Except for the rounding part (\lceil \cdot \rfloor denotes rounding to the nearest integer) this is exactly what you do to solve a triangular system. Note that previously quantized coefficients are reused when solving for the next coefficient. If we’re lucky we have already found the optimal solution (w’ = w), but there’s no easy way to tell.
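In C, nulling and cancelling is just back-substitution with a rounding step (a sketch; R and y come from the reduction step above, with R stored row-major):

```c
#include <math.h>

/* Nulling and cancelling: solve the triangular system R w = y from
 * the bottom up, rounding each coefficient to the nearest integer
 * and reusing the already-rounded ones in later steps. */
void null_cancel(const double *R, const double *y, int n, int *w)
{
    for (int j = n - 1; j >= 0; --j) {
        double acc = y[j];
        for (int t = j + 1; t < n; ++t)
            acc -= R[j * n + t] * w[t]; /* cancel fixed coefficients */
        w[j] = (int)lround(acc / R[j * n + j]);
    }
}
```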

The optimal solution can be computed with an informed tree search where each branch corresponds to a certain quantization decision. At the first level w'_n is quantized, next in line is w'_{n-1}, and so on until we reach a leaf node where all coefficients have been determined. For my C implementation I coupled a depth-first search with branch and bound, always picking the most promising path first.
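Here is a sketch of such a search in C, written in the Schnorr-Euchner sphere-decoding style (an assumption on my part; the downloadable implementation may order and bound branches differently). At each level the candidate integers are visited nearest-first, so a branch can be cut as soon as its partial squared error reaches the best complete solution found so far. Note that the first leaf reached is exactly the nulling-and-cancelling solution, which provides an immediate upper bound.

```c
#include <math.h>
#include <string.h>

/* Depth-first branch and bound on ILS(R, y): at level j the integer
 * candidates are tried nearest-first (zig-zag around the real-valued
 * optimum), and a branch is abandoned once its partial squared error
 * reaches the best complete solution so far. Assumes R is upper
 * triangular with a nonzero diagonal (A had full column rank). */
static void search(const double *R, const double *y, int n, int j,
                   double cost, int *w, double *best_cost, int *best_w)
{
    if (j < 0) {                       /* leaf: all coefficients fixed */
        *best_cost = cost;             /* guaranteed < previous best */
        memcpy(best_w, w, (size_t)n * sizeof *w);
        return;
    }
    double acc = y[j];
    for (int t = j + 1; t < n; ++t)
        acc -= R[j * n + t] * w[t];    /* cancel fixed coefficients */
    double c = acc / R[j * n + j];     /* real-valued optimum at level j */
    int z = (int)lround(c);
    int d = (c >= z) ? 1 : -1;         /* side of the second-best integer */
    for (;;) {
        double e = acc - R[j * n + j] * z;
        double new_cost = cost + e * e;
        if (new_cost >= *best_cost)    /* later candidates only get worse */
            break;
        w[j] = z;
        search(R, y, n, j - 1, new_cost, w, best_cost, best_w);
        z += d;                        /* zig-zag: z0, z0+d, z0-d, z0+2d, ... */
        d = -d - (d > 0 ? 1 : -1);
    }
}

/* Convenience driver; best_w receives the optimal integer solution. */
void solve_ils(const double *R, const double *y, int n, int *best_w)
{
    int w[64];                         /* sketch limit: assumes n <= 64 */
    double best_cost = HUGE_VAL;
    search(R, y, n, n - 1, 0.0, w, &best_cost, best_w);
}
```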

You can download my C implementation here. It’s released under a two-clause BSD license. Have fun!

– P


Written by pizer

September 28, 2008 at 4:24 pm

One Response


  1. Hi,

    that’s a really interesting article. One wonders what kind of compression gains are possible with “more optimal” approaches like this one. A second interesting question is whether there is a lossless audio codec that optimizes the coefficients with the quantization in mind, like you did. Sadly enough, the strongest codecs are closed source: SAC (maybe not even alpha state), OptimFrog, LA and Tak.

    Your implementation code is not online anymore. Would it be possible to reupload the code or mail it to me?
    That would be great.

    Seeing your compression FAQ (which is of course kind of basic, but a good overview), I would be pleased if you would continue writing articles about the connection between compression and optimization. I think this topic is kind of hot today (e.g. PAQ compressors/context mixing; some machine-learning approaches there).

    Thanks for reading.

    Sascha

    November 7, 2011 at 6:55 pm

