
OII Guide to Image Compression

This guide, which is taken from an appendix of the 1993 IMSTAND report prepared by PIRA International for the Commission of the European Communities, summarizes the roles of the following data compression techniques:

  • run length encoding
  • Huffman encoding
  • entropy coding (Lempel/Ziv)
  • area coding
  • transform coding (DCT/Wavelets/Gabor)
  • vector quantisation
  • segmentation and approximation methods
  • spline approximation methods
  • fractal coding

Introduction

An image of 1024 x 1024 pixels at 24 bits per pixel requires, without compression, 3 MB of storage and about 7 minutes for transmission over a high speed 64 kbit/s ISDN line. If the image is compressed at a 10:1 compression ratio, the storage requirement is reduced to 300 KB and the transmission time drops to around 40 seconds. Seven 1 MB images can be compressed and transferred to a floppy disk in less time than it takes to send one of the original files, uncompressed, over an AppleTalk network.
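The arithmetic behind these figures is easy to reproduce. The following Python sketch is purely illustrative: it assumes a 64 kbit/s line, ignores protocol overhead, and the function name and values are only for the example.

    # Illustrative arithmetic only: storage and ISDN transmission time for the
    # 1024 x 1024 x 24 bit example image, before and after 10:1 compression.

    def storage_and_transfer(width, height, bits_per_pixel, ratio, line_bps):
        size_bits = width * height * bits_per_pixel
        compressed_bits = size_bits / ratio
        return (size_bits / 8 / 2**20,          # uncompressed size in MB
                compressed_bits / 8 / 2**10,    # compressed size in KB
                size_bits / line_bps,           # uncompressed transfer time (s)
                compressed_bits / line_bps)     # compressed transfer time (s)

    mb, kb, t_raw, t_comp = storage_and_transfer(1024, 1024, 24, 10, 64_000)
    print(f"{mb:.0f} MB uncompressed, {kb:.0f} KB compressed")
    print(f"{t_raw / 60:.1f} min uncompressed, {t_comp:.0f} s at 10:1")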

In a distributed environment large image files remain a major bottleneck within systems. Compression is an important component of the solutions available for creating file sizes of manageable and transmittable dimensions. Increasing the bandwidth is another method, but the cost sometimes makes this a less attractive solution.

Platform portability and performance are important in the selection of the compression/decompression technique to be employed. Compression solutions today are more portable due to the change from proprietary high end solutions to accepted and implemented international standards. JPEG is evolving as the industry standard technique for the compression of continuous tone images.

Two categories of data compression algorithm can be distinguished:

  • lossless and
  • 'lossy'

Lossy techniques cause image quality degradation in each compression/decompression step. Careful consideration of human visual perception ensures that the degradation is often unrecognisable, though this depends on the selected compression ratio. In general, lossy techniques provide far greater compression ratios than lossless techniques.

Lossless coding techniques

Lossless coding guarantees that the decompressed image is absolutely identical to the image before compression. This is an important requirement for some application domains, e.g. medical imaging, where not only high quality is in demand but unaltered archiving is a legal requirement. Lossless techniques can also be used for the compression of other data types where loss of information is not acceptable, e.g. text documents and program executables.

Some compression methods can be made more effective by adding a 1D or 2D delta coding step to the process of compression. These deltas make run length encoding more effective, have (statistically) higher maxima in code tables (leading to better results in Huffman and general entropy codings), and produce larger areas of equal value that are usable for area coding.

Some of these methods can easily be modified to be lossy: a lossy element fits naturally into a 1D/2D run length search, and logarithmic quantisation may be inserted to provide better or more effective results. A sketch of 1D delta coding follows.
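The Python sketch below illustrates 1D delta coding along a single image row; the wrap-around to one byte and the sample values are assumptions of the example.

    # Minimal sketch of 1D delta (difference) coding along an image row.
    # Neighbouring pixels are usually similar, so the differences cluster around
    # zero and are far more compressible by run length, Huffman or LZ coders.

    def delta_encode(row):
        prev, out = 0, []
        for value in row:
            out.append((value - prev) % 256)   # wrap to stay within one byte
            prev = value
        return out

    def delta_decode(deltas):
        prev, out = 0, []
        for d in deltas:
            prev = (prev + d) % 256
            out.append(prev)
        return out

    row = [100, 101, 101, 102, 104, 104, 104, 103]
    assert delta_decode(delta_encode(row)) == row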

Run length encoding

Run length encoding is a very simple method for the compression of sequential data. It takes advantage of the fact that, in many data streams, consecutive tokens are often identical. Run length encoding checks the stream for this and inserts a special token each time a chain of more than two equal input tokens is found. This special token advises the decoder to insert the following token n times into its output stream. Below is a short example of this method:

Clock   Input   Coder output   Decoder output
  1       A
  2       B          A
  3       C          B               A
  4       C          Ø               B
  5       C          Ø               Ø
  6       C          Ø               Ø
  7       C          Ø               Ø
  8       D         %5C              Ø
  9       E          D             CCCCC
 10       Ø          E               D
 11       Ø          Ø               E

(Ø denotes 'no token'; at clock 8 the coder emits the special token, the run length 5 and the repeated token C, which the decoder expands at clock 9.)

In this example, there are 9 tokens going into the coder, but just 7 going out. The effectiveness of run length encoding is a function of the number of equal tokens in a row in relation to the total number of input tokens. This ratio is very high in undithered two tone images of the type used for facsimile. Effectiveness degrades when the input contains few runs of equal tokens: with rising information density, the likelihood of two consecutive tokens being identical sinks significantly, as there is always some noise distortion in the input.

Run length coding is easily implemented, either in software or in hardware. It is fast and very easy to verify, but its compression ability is very limited.
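A minimal software sketch of such a coder is given below. It is illustrative only: the escape value, the run threshold of three and the token representation are assumptions of the example, not part of any standard.

    # Minimal run length coder sketch. A run of three or more identical tokens is
    # replaced by (MARK, count, token); shorter runs are copied through unchanged.
    # MARK is an arbitrary escape value chosen for illustration.

    MARK = object()   # stands in for the special escape token

    def rle_encode(tokens):
        out, i = [], 0
        while i < len(tokens):
            j = i
            while j < len(tokens) and tokens[j] == tokens[i]:
                j += 1
            run = j - i
            if run > 2:
                out.extend([MARK, run, tokens[i]])
            else:
                out.extend(tokens[i:j])
            i = j
        return out

    def rle_decode(coded):
        out, i = [], 0
        while i < len(coded):
            if coded[i] is MARK:
                out.extend([coded[i + 2]] * coded[i + 1])
                i += 3
            else:
                out.append(coded[i])
                i += 1
        return out

    data = list("ABCCCCCDE")
    assert rle_decode(rle_encode(data)) == data
    print(len(rle_encode(data)))   # 9 tokens in, 7 tokens out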

Huffman encoding

This algorithm, developed by D.A. Huffman, is based on the fact that in an input stream certain tokens occur more often than others. Based on this knowledge, the algorithm builds up a weighted binary tree of the tokens according to their rate of occurrence. Each element of this tree is assigned a new code word, where the length of the code word is determined by its position in the tree: the most frequent token sits closest to the root of the tree and is assigned the shortest code, while each less common element is assigned a longer code word. The least frequent element is assigned a code word which may become considerably longer than the input token.

The compression ratio achievable by Huffman encoding on uncorrelated data is something like 1:2. On slightly correlated data, such as images, the compression rate may become much higher, the absolute maximum being defined by the size of a single input token and the size of the shortest possible output token (max. compression = token size [bits] / 2 [bits]). While standard palettised images with a limit of 256 colours may be compressed by 1:4 if they use only one colour, more typical images give results in the range of 1:1.2 to 1:2.5.
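The following Python sketch builds a Huffman code table from token frequencies; the example string and the tie-breaking rule are arbitrary choices made for illustration.

    # Minimal Huffman code construction sketch using a priority queue.
    # Frequent tokens receive short code words, rare tokens long ones.

    import heapq
    from collections import Counter

    def huffman_code(data):
        freq = Counter(data)
        # each heap entry: (weight, tie_breaker, {token: codeword})
        heap = [(w, i, {tok: ""}) for i, (tok, w) in enumerate(freq.items())]
        heapq.heapify(heap)
        if len(heap) == 1:                       # degenerate single-symbol input
            return {tok: "0" for tok in heap[0][2]}
        count = len(heap)
        while len(heap) > 1:
            w1, _, t1 = heapq.heappop(heap)
            w2, _, t2 = heapq.heappop(heap)
            merged = {tok: "0" + code for tok, code in t1.items()}
            merged.update({tok: "1" + code for tok, code in t2.items()})
            heapq.heappush(heap, (w1 + w2, count, merged))
            count += 1
        return heap[0][2]

    text = "AAAABBBCCD"
    code = huffman_code(text)
    bits = sum(len(code[c]) for c in text)
    print(code, f"{bits} bits vs {8 * len(text)} bits uncoded")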

Entropy coding (Lempel/Ziv)

The typical implementation of an entropy coder follows J. Ziv and A. Lempel's approach. Nowadays there is a wide range of so called modified Lempel/Ziv codings. These algorithms all work in a similar way: the coder and the decoder both build up an equivalent dictionary of metasymbols, each of which represents a whole sequence of input tokens. If a sequence is repeated after a symbol has been found for it, then only the symbol becomes part of the coded data, and the sequence of tokens referenced by the symbol becomes part of the decoded data later. As the dictionary is built up from the data itself, it is not necessary to include it in the coded data, as it is with the tables of a Huffman coder.

This method is very efficient on a wide variety of data. The average compression on text and program data is about 1:2, and the ratio on image data comes up to 1:8 on the average GIF image. Here again, a high level of input noise degrades the efficiency significantly.

Entropy coders are a little tricky to implement, as there are usually a few tables, all growing while the algorithm runs.
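To make the dictionary mechanism concrete, here is a minimal LZW-style encoder sketch, one member of the modified Lempel/Ziv family; the 256-entry initial alphabet and the test string are assumptions of the example.

    # Minimal LZW-style dictionary coder sketch. Coder and decoder grow
    # equivalent dictionaries, so the dictionary itself never has to be sent.

    def lzw_encode(data, alphabet_size=256):
        dictionary = {bytes([i]): i for i in range(alphabet_size)}
        w, out = b"", []
        for byte in data:
            wc = w + bytes([byte])
            if wc in dictionary:
                w = wc
            else:
                out.append(dictionary[w])
                dictionary[wc] = len(dictionary)   # register the new sequence
                w = bytes([byte])
        if w:
            out.append(dictionary[w])
        return out

    codes = lzw_encode(b"TOBEORNOTTOBEORTOBEORNOT")
    print(len(codes), "codes for", 24, "input bytes")   # fewer codes than bytes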

LZ coding is subject to patents owned by IBM and Unisys (formerly Sperry). A standardized version of LZ level 1 coding exists in the form of the ECMA-222 Adaptive Lossless Data Compression (ALDC) algorithm. In January 1996 this standard was submitted to ISO/IEC for fast-track approval as ISO/IEC 15200.

Area coding

Area coding is an enhanced form of run length coding that reflects the two dimensional character of images. This is a significant advance over the other lossless methods. For coding an image it does not make much sense to interpret it as a sequential stream, as it is in fact an array of sequences building up a two dimensional object. Since the two dimensions are independent and of equal importance, a coding scheme that is aware of this has obvious advantages. The algorithms for area coding try to find rectangular regions with the same characteristics. These regions are coded in a descriptive form, as an element with two points and a certain structure. The whole input image has to be described in this form to allow lossless decoding afterwards.

The achievable performance of this coding method is limited mostly by the very high complexity of the task of finding the largest areas with the same characteristics. Practical implementations use recursive algorithms that reduce the whole area to equal sized subrectangles until a rectangle fulfils the criterion of having the same characteristic for every pixel, as in the sketch below.
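The following Python sketch illustrates this recursive subdivision in a quadtree-like fashion; the tiny test image and the element format (corner, size, value) are assumptions of the example.

    # Minimal sketch of recursive area coding: a rectangle that is uniform is
    # emitted as a single element (corner, size, value); otherwise it is split
    # into four sub-rectangles and the process repeats.

    def area_encode(img, x0, y0, w, h, out):
        block = [img[y][x] for y in range(y0, y0 + h) for x in range(x0, x0 + w)]
        if len(set(block)) == 1 or (w == 1 and h == 1):
            out.append((x0, y0, w, h, block[0]))     # uniform area: one element
            return
        w2, h2 = max(w // 2, 1), max(h // 2, 1)
        for dx, dy, sw, sh in ((0, 0, w2, h2), (w2, 0, w - w2, h2),
                               (0, h2, w2, h - h2), (w2, h2, w - w2, h - h2)):
            if sw and sh:
                area_encode(img, x0 + dx, y0 + dy, sw, sh, out)

    img = [[0, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 1, 1],
           [0, 0, 1, 1]]
    elements = []
    area_encode(img, 0, 0, 4, 4, elements)
    print(elements)   # 4 elements instead of 16 pixels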

This type of coding can be highly effective, but it bears the problems of a nonlinear method, which cannot easily be implemented in hardware. Therefore the performance in terms of compression time is not competitive, although the compression ratio is.

Lossy coding techniques

Lossy image coding techniques normally have three components:

  • image modelling which defines such things as the transformation to be applied to the image
  • parameter quantisation whereby the data generated by the transformation is quantised to reduce the amount of information
  • encoding, where a code is generated by associating appropriate codewords to the raw data produced by the quantiser.

Each of these operations is in some part responsible for the compression. Image modelling is aimed at the exploitation of the statistical characteristics of the image (i.e. high correlation, redundancy). Typical examples are transform coding methods, in which the data is represented in a different domain (for example, frequency in the case of the Fourier Transform [FT], the Discrete Cosine Transform [DCT], the Karhunen-Loève Transform [KLT], and so on), where a reduced number of coefficients contains most of the original information. In many cases this first phase does not result in any loss of information.

The aim of quantisation is to reduce the amount of data used to represent the information within the new domain. Quantisation is in most cases not a reversible operation: therefore, it belongs to the so called 'lossy' methods.

Encoding is usually error free. It optimises the representation of the information (helping, sometimes, to further reduce the bit rate), and may introduce some error detection codes.

In the following sections, a review of the most important coding schemes for lossy compression is provided. Some methods are described in their canonical form (transform coding, region based approximations, fractal coding, wavelets, hybrid methods) and some variations and improvements presented in the scientific literature are reported and discussed.

Transform coding (DCT/Wavelets/Gabor)

A general transform coding scheme involves subdividing an N x N image into smaller n x n blocks and performing a unitary transform on each subimage. A unitary transform is a reversible linear transform whose kernel describes a set of complete, orthonormal discrete basis functions. The goal of the transform is to decorrelate the original signal, and this decorrelation generally results in the signal energy being redistributed among only a small set of transform coefficients. In this way, many coefficients may be discarded after quantisation and prior to encoding. Also, visually lossless compression can often be achieved by incorporating the contrast sensitivity function of the human visual system (HVS) in the quantisation of the coefficients.

Transform coding can be generalized into four stages:

  • image subdivision
  • image transformation
  • coefficient quantisation
  • Huffman encoding.

For a transform coding scheme, logical modelling is done in two steps: a segmentation step, in which the image is subdivided into bidimensional vectors (possibly of different sizes), and a transformation step, in which the chosen transform (e.g. KLT, DCT, Hadamard) is applied.

Quantisation can be performed in several ways. Most classical approaches use 'zonal coding', consisting of the scalar quantisation of the coefficients belonging to a predefined area (with a fixed bit allocation), and 'threshold coding', consisting of the choice of the coefficients of each block whose absolute value exceeds a predefined threshold. Another possibility, which leads to higher compression factors, is to apply a vector quantisation scheme to the transformed coefficients.

The same type of encoding is used for each coding method: in most cases a classical Huffman code can be used successfully.

The JPEG and MPEG standards are examples of standards based on transform coding. The sketch below illustrates the basic mechanism.
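The following Python/NumPy sketch shows the core of a block transform coder on a single 8x8 block: an orthonormal DCT-II, crude threshold coding of the coefficients, and the inverse transform. The block size, the threshold value and the synthetic test block are assumptions of the example; real standards such as JPEG add perceptual quantisation tables, zig-zag scanning and entropy coding on top of this.

    # Block transform coding sketch: 8x8 DCT-II, threshold coding, inverse DCT.

    import numpy as np

    N = 8

    def dct_matrix(n):
        # orthonormal DCT-II basis: C @ block @ C.T transforms, C.T @ X @ C inverts
        k = np.arange(n)
        C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
        C[0, :] = np.sqrt(1.0 / n)
        return C

    C = dct_matrix(N)
    block = 100.0 + 4.0 * np.add.outer(np.arange(N), np.arange(N))   # smooth test block

    coeffs = C @ block @ C.T                               # forward transform (decorrelation)
    kept = np.where(np.abs(coeffs) > 20.0, coeffs, 0.0)    # crude threshold coding
    reconstructed = C.T @ kept @ C                         # inverse transform

    print("coefficients kept:", int(np.count_nonzero(kept)), "of", N * N)
    print("max reconstruction error:", round(float(np.abs(block - reconstructed).max()), 2))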

Vector quantisation

A vector quantiser can be defined mathematically as a transform operator T from a K-dimensional Euclidean space R^K to a finite subset X in R^K made up of N vectors. This subset X becomes the vector codebook or, more generally, the codebook.

Clearly, the choice of the set of vectors is of major importance. The level of distortion due to the transformation T is generally computed as the mean squared error (MSE) between the "real" vector x in R^K and the corresponding vector x' = T(x) in X. This error should be such as to minimise the Euclidean distance d.

An optimum scalar quantiser was proposed by Lloyd and Max. Later on, Linde, Buzo and Gray resumed and generalised this method, extending it to the case of a vector quantiser (the LBG algorithm). The algorithm that they proposed is derived from k-means clustering and is performed by iterating the following basic operations:

  • subdivide the training set into N groups (called 'partitions' or 'Voronoi regions'), which are associated with the N codebook
    vectors, according to a minimum distance criterion;
  • the centroids of the Voronoi regions become the updated codebook vectors;
  • compute the average distortion: if the percent reduction in the distortion (as compared with the previous step) is below a certain
    threshold, then STOP.

Once the codebook has been designed, the coding process simply consists of applying the T operator to the vectors of the original image. In practice, each group of n pixels will be coded as an address in the vector codebook, that is, as a number from 1 to N.
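A minimal sketch of the codebook design and coding steps is given below. For brevity it runs a fixed number of iterations instead of the distortion-based stopping rule described above, and the block size, codebook size and random training data are illustrative assumptions.

    # LBG-style codebook design sketch (essentially k-means on training vectors).
    # Each image vector is then coded as the index of its nearest codebook entry.

    import numpy as np

    def lbg(training, n_codes, iterations=20, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        # start from a random subset of the training vectors (a simple, common choice)
        codebook = training[rng.choice(len(training), n_codes, replace=False)].copy()
        for _ in range(iterations):
            # assign each training vector to its nearest code vector (Voronoi regions)
            d = np.linalg.norm(training[:, None, :] - codebook[None, :, :], axis=2)
            nearest = d.argmin(axis=1)
            # move each code vector to the centroid of its region
            for j in range(n_codes):
                members = training[nearest == j]
                if len(members):
                    codebook[j] = members.mean(axis=0)
        return codebook

    def vq_encode(vectors, codebook):
        d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        return d.argmin(axis=1)          # one codebook address per input vector

    rng = np.random.default_rng(1)
    training = rng.random((500, 4))      # e.g. 2x2 pixel blocks flattened to 4-vectors
    codebook = lbg(training, n_codes=16, rng=rng)
    print(vq_encode(training[:10], codebook))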

The LBG algorithm for the design of a vector codebook always reaches a local minimum of the distortion function, but often this solution is not the optimal one. A careful analysis of the LBG algorithm's behaviour allows one to detect two critical points: the choice of the starting codebook and the uniformity of the Voronoi regions' dimensions. For this reason some algorithms have been designed that give better performance. With respect to the initialisation of the LBG algorithm, for instance, one can observe that a random choice of the starting codebook requires a large number of iterations before reaching an acceptable amount of distortion. Moreover, if the starting point leads to a local minimum solution, the relative stopping criterion prevents further optimisation steps.

Segmentation and approximation methods

With segmentation and approximation coding methods, the image is modelled as a mosaic of regions, each one characterised by a sufficient degree of uniformity of its pixels with respect to a certain feature (e.g. grey level, texture); each region then has some parameters related to the characterising feature associated with it.

The operations of finding a suitable segmentation and an optimum set of approximating parameters are highly correlated, since the segmentation algorithm must take into account the error produced by the region reconstruction (in order to limit this value within determined bounds). These two operations constitute the logical modelling for this class of coding schemes; quantisation and encoding are strongly dependent on the statistical characteristics of the parameters of this approximation (and, therefore, on the approximation itself).

Classical examples are polynomial approximation and texture approximation. For polynomial approximation, regions are reconstructed by means of polynomial functions in (x, y); the task of the encoder is to find the optimum coefficients. In texture approximation, regions are filled by synthesising a parametrised texture based on some model (e.g. fractals, statistical methods, Markov Random Fields [MRF]). It must be pointed out that, while in polynomial approximations the problem of finding optimum coefficients is quite simple (it is possible to use least squares approximation or similar exact formulations), for texture based techniques this problem can be very complex.
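To make the polynomial case concrete, here is a small least squares sketch; the first order polynomial, the synthetic 8 x 8 "region" and the noise level are all assumptions of the example.

    # Polynomial approximation of one region: fit g(x, y) = a + b*x + c*y to the
    # region's grey levels by least squares and keep only (a, b, c) as parameters.

    import numpy as np

    def fit_plane(coords, values):
        # coords: (n, 2) pixel positions in the region, values: (n,) grey levels
        x, y = coords[:, 0], coords[:, 1]
        A = np.column_stack([np.ones_like(x), x, y])
        params, *_ = np.linalg.lstsq(A, values, rcond=None)
        return params                     # [a, b, c]

    def reconstruct(coords, params):
        a, b, c = params
        return a + b * coords[:, 0] + c * coords[:, 1]

    # a toy "region": a gentle grey-level ramp plus a little noise
    rng = np.random.default_rng(0)
    coords = np.array([(x, y) for y in range(8) for x in range(8)], dtype=float)
    values = 100 + 2 * coords[:, 0] + 0.5 * coords[:, 1] + rng.normal(0, 1, 64)

    params = fit_plane(coords, values)
    err = np.abs(values - reconstruct(coords, params)).max()
    print(params, f"max reconstruction error {err:.2f}")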

Spline approximation methods (Bilinear Interpolation/Regularisation)

These methodologies fall into the more general category of image reconstruction or sparse data interpolation. The basic concept is to interpolate data from a set of points, either coming from the original pixel data or calculated in order to match some error criterion. The problem of interpolating a set of sparse data is generally ill-posed, so some regularisation algorithm must be adopted in order to obtain a unique solution. The problem is well documented, and many interpolation algorithms have been proposed.

In order to apply this kind of technique to image coding, a good interpolant must be used to match visual criteria. Spline interpolation provides a good visual interpolant, notwithstanding the great computational effort it requires. Bilinear interpolation is easier to implement, while maintaining very good visual quality. Regularisation involves the minimisation of an energy function in order to obtain an interpolant which satisfies some smoothness constraints; it can be combined with discontinuities along edges in order to preserve contour quality during reconstruction. Generally, all interpolant computations require the solution of very large sets of linear equations, albeit with very sparse matrices. This leads to the use of iterative solutions such as relaxation (suitable for a highly parallel implementation) or gradient descent algorithms.
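A minimal bilinear reconstruction sketch follows. The coarse sample grid and the upsampling factor are assumptions of the example; a real coder would choose the retained points so as to satisfy an error criterion.

    # Bilinear interpolation sketch: reconstruct a full-resolution image from a
    # coarse grid of retained sample points (the "sparse" data).

    import numpy as np

    def bilinear_upsample(samples, factor):
        h, w = samples.shape
        out = np.empty(((h - 1) * factor + 1, (w - 1) * factor + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                y, x = i / factor, j / factor
                y0, x0 = int(y), int(x)
                y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
                dy, dx = y - y0, x - x0
                out[i, j] = ((1 - dy) * (1 - dx) * samples[y0, x0]
                             + (1 - dy) * dx * samples[y0, x1]
                             + dy * (1 - dx) * samples[y1, x0]
                             + dy * dx * samples[y1, x1])
        return out

    coarse = np.array([[10.0, 20.0, 30.0],
                       [20.0, 40.0, 60.0],
                       [30.0, 60.0, 90.0]])
    full = bilinear_upsample(coarse, factor=4)
    print(full.shape, full[0, :5])   # smooth ramp between the retained samples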

The use of an interpolation algorithm for image coding is more interesting when related to techniques such as two source decomposition, where the image is modelled as the sum of two sources: one is the stationary part (which can be considered related to the low frequency content), the second is the residual content coming from non-stationarities such as edges. The first of the two sources is coded by means of a prediction scheme that can be one of the previously described interpolants. The second source (the residual) can be coded through the use of a classical coding method (transform coding, vector or scalar quantisation, etc.). Moreover, two source decomposition is a very effective coding scheme in that it shows little of the tile effect that affects all block coding techniques when compression factors become higher.

Fractal coding (texture synthesis, iterated functions system [IFS], recursive IFS [RIFS])

Fractal parameters, including fractal dimension and lacunarity, as well as others described below, have the potential to provide efficient methods of describing imagery in a highly compact fashion for both intra- and interframe applications. Fractal methods have been developed for both noisy and noise free coding methods.

Images of natural scenes are likely candidates because of the fractal structure of the scene content, but results are reported to be applicable to a variety of binary, monochrome and colour scenes.

In the mid-eighties Dr Michael Barnsley reported the use of "Iterated Function Systems" (IFS) for image compression and synthesis. Using sets of affine transformations developed for a given image, and a principal result known as the "collage theorem", intraframe compressions in excess of 10,000:1 and interframe compressions in excess of 1,000,000:1 were reported. The collage theorem states that if an image can be covered (approximately) with compressed affine transformations of itself, then the image can be (approximately) reconstructed by computing the attractor (in the sense of non linear dynamical systems) of this set of affine transformations.
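To illustrate the decoding side of this idea only, the Python sketch below approximates the attractor of a fixed, textbook set of affine maps (the Sierpinski triangle IFS) with the random iteration algorithm; it is not Barnsley's proprietary method, which derives image-specific maps instead.

    # Decoding an IFS: approximate the attractor of a set of contractive affine
    # maps with the random iteration algorithm.

    import random

    # each map: (a, b, c, d, e, f) applied as (x, y) -> (a*x + b*y + e, c*x + d*y + f)
    MAPS = [
        (0.5, 0.0, 0.0, 0.5, 0.0, 0.0),
        (0.5, 0.0, 0.0, 0.5, 0.5, 0.0),
        (0.5, 0.0, 0.0, 0.5, 0.25, 0.5),
    ]

    def ifs_attractor(maps, n_points=20000, seed=0):
        random.seed(seed)
        x, y = 0.0, 0.0
        points = []
        for i in range(n_points):
            a, b, c, d, e, f = random.choice(maps)
            x, y = a * x + b * y + e, c * x + d * y + f
            if i > 20:                    # discard the initial transient
                points.append((x, y))
        return points

    points = ifs_attractor(MAPS)
    print(len(points), "attractor points; plotting them reveals the encoded shape")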

This convergence was extremely slow, about 100 hours on a Cray unless assisted by a person, and was presented as an illustration of a scientific possibility, not as a commercial reality. In 1987 Barnsley and Sloan formed Iterated Systems Inc. to develop a product that would function in a commercial environment. By the end of 1988 Iterated Systems had developed the patented technique called the 'Fractal Transform', which has become the basis of their current product range. The development allowed a real world image to be reduced to a set of fractal equations based on the image being processed, rather than a huge library of pre-calculated reference fractal patterns. Noise free image compression algorithms based on this transform have been reported for real time automatic image compression at ratios between 10:1 and 100:1.

Barnsley has not divulged any technical information about the Fractal Transform beyond that in his book "Fractals Everywhere". The book explains the idea of IFS codes at length but is vague about the application of the collage theorem to specific compression problems.

Researchers at BellCore have developed a compression method that incorporates a Peano Scan (the Peano curve is a "space filling" fractal curve) with a fractal based coding scheme for intraframe compression, making it possible to achieve high picture quality at bit rates of less than one bit per pixel. A fractal based method was reported earlier by Walach, while the concept of using a Peano Scan had been used for storage, compression and display. The BellCore researchers have combined these two fractal concepts into an efficient implementation which is now being incorporated into a VLSI chip. Compression results show good quality imagery at a rate of 0.8-1 bit per pixel. The technique may be implemented in an adaptive fashion, since a local estimate of fractal dimension provides an objective measure of image complexity.

An alternative approach to image compression, also incorporating some features of fractal geometry, is based on orthogonal pyramid transforms. These transforms decompose the image as a weighted sum of basis functions. The basis functions have several attractive features, including localisation in space and spatial frequency as well as self similarity. This latter feature reflects the notion that the basis functions are scaled versions of one another, and thus are able to capture image information at all scales in a uniform way. The pyramid transform techniques are based, in part, on the earlier concept of quadrature mirror filters. The pyramid structure results from dealing with the image at multiple resolutions. Results reported show high quality imagery at rates of 0.65 bits per pixel. The MSE is comparable to that of 16 x 16 block DCT transform coding algorithms, but the visual image quality is superior.

Efficiency and quality of different lossy compression techniques

The performance of lossy picture coding algorithms is usually evaluated on the basis of two parameters:

  1. the compression factor (or analogously the bit rate) and
  2. the distortion produced on the reconstruction.

The first is an objective parameter, while the second strongly depends on the usage of the coded image. Nevertheless, a rough evaluation of the performance of a method can be made by considering an objective measure of the error, such as MSE or SNR, as sketched below.
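Definitions of SNR vary between authors; the small Python sketch below uses the mean signal power relative to the MSE, and the synthetic "coding error" is simply additive noise introduced for the example.

    # Objective error measures: MSE between original and reconstructed image,
    # and SNR in decibels.

    import numpy as np

    def mse(original, reconstructed):
        diff = original.astype(float) - reconstructed.astype(float)
        return float(np.mean(diff ** 2))

    def snr_db(original, reconstructed):
        signal_power = float(np.mean(original.astype(float) ** 2))
        return 10.0 * np.log10(signal_power / mse(original, reconstructed))

    rng = np.random.default_rng(0)
    original = rng.integers(0, 256, size=(64, 64))
    reconstructed = original + rng.normal(0, 2, size=original.shape)  # mock coding error
    print(f"MSE = {mse(original, reconstructed):.2f}, "
          f"SNR = {snr_db(original, reconstructed):.1f} dB")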

For the methods described in the previous pages, the average compression ratios and SNR values obtainable are presented in the following table:

Method           VQ        DCT-SQ    DCT-VQ     AP               Spline TSD   Fractals
Bit Rate (bpp)   0.8-0.4   0.8-0.3   0.3-0.08   0.3-0.1          0.4-0.1      0.8-0.0
SNR (dB)         36-30     36-31     30-25      image dependent  36-32        image dependent

Comparison of Different Compression Methods

In recent years some standardisation processes based on transform coding, such as JPEG, have been started. The performance of such a standard is quite good if the compression factor is kept below a given threshold (about 20:1). Above this threshold, artifacts become visible in the reconstruction and the tile effect seriously affects the decoded images, due to quantisation of the DCT coefficients.

On the other hand, there are two advantages: first, it is a standard, and second, dedicated hardware implementations exist. For applications which require higher compression factors, with some minor loss of accuracy when compared with JPEG, different techniques should be selected, such as wavelet coding or spline interpolation, followed by an efficient entropy encoder such as Huffman coding, arithmetic coding or vector quantisation. Some of these coding schemes are suitable for progressive reconstruction (Pyramidal Wavelet Coding, Two Source Decomposition, etc.). This property can be exploited by applications such as the coding of images in a database, for previewing purposes or for transmission over a limited bandwidth channel.




This information set on OII standards is maintained by Martin Bryan of The SGML Centre and Man-Sze Li of IC Focus on behalf of European Commission DGXIII/E.

File last updated: January 1998
