## OII Guide to Image CompressionThis guide, which is taken from an appendix of the 1993 IMSTAND report prepared by PIRA International for the Commission of the European Communities, summarizes the roles of the following data compression techniques: - Lossless coding techniques
- Lossy coding techniques
- Transform coding (DCT/Wavelets/Gabor)
- Vector quantisation
- Segmentation and approximation methods
- Spline approximation methods (Bilinear Interpolation/Regularisation)
- Fractal coding (texture synthesis, iterated functions system [IFS], recursive IFS [RIFS])
- Efficiency and quality of different lossy compression techniques
## IntroductionAn image, 1024 pixel x 1024 pixel x 24 bit, without compression, would require 3 MB of storage and 7 minutes for transmission, utilising a high speed, 64 Kbit/s, ISDN line. If the image is compressed at a 10:1 compression ratio, the storage requirement is reduced to 300 KB and the transmission time drops to under 6 seconds. Seven 1 MB images can be compressed and transferred to a floppy disk in less time than it takes to send one of the original files, uncompressed, over an AppleTalk network. In a distributed environment large image files remain a major bottleneck within systems. Compression is an important component of the solutions available for creating file sizes of manageable and transmittable dimensions. Increasing the bandwidth is another method, but the cost sometimes makes this a less attractive solution. Platform portability and performance are important in the selection of the compression/decompression technique to be employed. Compression solutions today are more portable due to the change from proprietary high end solutions to accepted and implemented international standards. JPEG is evolving as the industry standard technique for the compression of continuous tone images. Two categories of data compression algorithm can be distinguished: - lossless and
- 'lossy'
Lossy techniques cause image quality degradation in each compression/decompression step. Careful consideration of the human visual perception ensures that the degradation is often unrecognisable, though this depends on the selected compression ratio. In general, lossy techniques provide far greater compression ratios than lossless techniques. ## Lossless coding techniquesLossless coding guaranties that the decompressed image is absolutely identical to the image before compression. This is an important requirement for some application domains, e.g. medial imaging, where not only high quality is in demand, but unaltered archiving is a legal requirement. Lossless techniques can also used for the compression of other data types where loss of information is not acceptable, e.g. text documents and program executables. Some compression methods can be made more effective by adding a 1D or 2D delta coding to the process of compression. These deltas make more effectively use of run length encoding, have (statistically) higher maxima in code tables (leading to better results in Huffman and general entropy codings), and build greater equal value areas usable for area coding. Some of these methods can easily be modified to be lossy. Lossy element fits perfectly into 1D/2D run length search. Also, logarithmic quantisation may be inserted to provide better or more effective results. ## Run length encodingRun length encoding is a very simple method for compression of sequential data. It takes advantage of the fact that, in many data streams, consecutive single tokens are often identical. Run length encoding checks the stream for this fact and inserts a special token each time a chain of more than two equal input tokens are found. This special input advises the decoder to insert the following token times into his output stream. Following is a short example of this method: Clock Input Coder Decoder Output Output 1 A 2 B A 3 C B A 4 C Ø B 5 C Ø Ø 6 C Ø Ø 7 C Ø Ø 8 D %5C Ø 9 E D CCCCC 10 Ø E D 11 Ø Ø E In this example, there are 9 tokens going into the coder, but just 7 going out. The effectivity of run length encoding is a function of the number of equal tokens in a row in relation to the total number of input tokens. This relation is very high in undithered two tone images of the type used for facsimile. Obviously, effectivity degrades when the input does not contain too many equal tokens. With a rising density of information, the likelihood of two following tokens being the same does sink significantly, as there is always some noise distortion in the input. Run length coding is easily implemented, either in software or in hardware. It is fast and very well verifiable, but its compression ability is very limited. ## Huffman encodingThis algorithm, developed by D.A. Huffman, is based on the fact that in an input stream certain tokens occur more often than others. Based on this knowledge, the algorithm builds up a weighted binary tree according to their rate of occurrence. Each element of this tree is assigned a new code word, whereat the length of the code word is determined by its position in the tree. Therefore, the token which is most frequent and becomes the root of the tree is assigned the shortest code. Each less common element is assigned a longer code word. The least frequent element is assigned a code word which may have become twice as long as the input token. The compression ratio achieved by Huffman encoding uncorrelated data becomes something like 1:2. On slightly correlated data, as on images, the compression rate may become much higher, the absolute maximum being defined by the size of a single input token and the size of the shortest possible output token (max. compression = token size[bits]/2[bits]). While standard palletised images with a limit of 256 colours may be compressed by 1:4 if they use only one colour, more typical images give results in the range of 1:1.2 to 1:2.5. ## Entropy coding (Lempel/Ziv)The typical implementation of an entropy coder follows J. Ziv/A. Lempel's approach. Nowadays, there is a wide range of so called modified Lempel/Ziv codings. These algorithms all have a common way of working. The coder and the decoder both build up an equivalent dictionary of metasymbols, each of which represents a whole sequence of input tokens. If a sequence is repeated after a symbol was found for it, then only the symbol becomes part of the coded data and the sequence of tokens referenced by the symbol becomes part of the decoded data later. As the dictionary is build up based on the data, it is not necessary to put it into the coded data, as it is with the tables in a Huffman coder. This method becomes very efficient even on virtually random data. The average compression on text and program data is about 1:2, the ratio on image data comes up to 1:8 on the average GIF image. Here again, a high level of input noise degrades the efficiency significantly. Entropy coders are a little tricky to implement, as there are usually a few tables, all growing while the algorithm runs. LZ coding is subject to patents owned by IBM and Unisys (formerly Sperry). A standardized version of LZ level 1 coding exists in the form of the ECMA-222 Adaptive Lossless Data Compression (ALDC) algorithm. In January 1996 this standard was submitted to ISO/IEC for fast-track approval as ISO/IEC 15200. ## Area codingArea coding is an enhanced form of run length coding, reflecting the two dimensional character of images. This is a significant advance over the other lossless methods. For coding an image it does not make too much sense to interpret it as a sequential stream, as it is in fact an array of sequences, building up a two dimensional object. Therefore, as the two dimensions are independent and of same importance, it is obvious that a coding scheme aware of this has some advantages. The algorithms for area coding try to find rectangular regions with the same characteristics. These regions are coded in a descriptive form as an Element with two points and a certain structure. The whole input image has to be described in this form to allow lossless decoding afterwards. The possible performance of this coding method is limited mostly by the very high complexity of the task of finding largest areas with the same characteristics. Practical implementations use recursive algorithms for reducing the whole area to equal sized subrectangles until a rectangle does fulfill the criteria defined as having the same characteristic for every pixel. This type of coding can be highly effective but it bears the problem of a nonlinear method, which cannot be implemented in hardware. Therefore, the performance in terms of compression time is not competitive, although the compression ratio is. ## Lossy coding techniquesLossy image coding techniques normally have three components: *image modelling*which defines such things as the transformation to be applied to the image*parameter quantisation*whereby the data generated by the transformation is quantised to reduce the amount of information*encoding*, where a code is generated by associating appropriate codewords to the raw data produced by the quantiser.
Each of these operations is in some part responsible of the compression. Image modelling is aimed at the exploitation of statistical characteristics of the image (i.e. high correlation, redundancy). Typical examples are transform coding methods, in which the data is represented in a different domain (for example, frequency in the case of the Fourier Transform [FT], the Discrete Cosine Transform [DCT], the Kahrunen-Loewe Transform [KLT], and so on), where a reduced number of coefficients contains most of the original information. In many cases this first phase does not result in any loss of information. The aim of quantisation is to reduce the amount of data used to represent the information within the new domain. Quantisation is in most cases not a reversible operation: therefore, it belongs to the so called 'lossy' methods. Encoding is usually error free. It optimises the representation of the information (helping, sometimes, to further reduce the bit rate), and may introduce some error detection codes. In the following sections, a review of the most important coding schemes for lossy compression is provided. Some methods are described in their canonical form (transform coding, region based approximations, fractal coding, wavelets, hybrid methods) and some variations and improvements presented in the scientific literature are reported and discussed. ## Transform coding (DCT/Wavelets/Gabor)A general transform coding scheme involves subdividing an on each subimage. A unitary transform is a reversible linear transform whose kernel describes a set of complete, orthonormal discrete basic functions. The goal of the transform is to decorrelate the original signal, and this decorrelation generally results in the signal energy being redistributed among only a small set of transform coefficients. In this way, many coefficients may be discarded after quantisation and prior to encoding. Also, visually lossless compression can often be achieved by incorporating the HVS contrast sensitivity function in the quantisation of the coefficients. Transform coding can be generalized into four stages: *image subdivision**image transformation**coefficient quantisation**Huffman encoding*.
For a transform coding scheme, logical modelling is done in two steps: a segmentation one, in which the image is subdivided in bidimensional vectors (possibly of different sizes) and a transformation step, in which the chosen transform (e.g. KLT, DCT, Hadamard) is applied. Quantisation can be performed in several ways. Most classical approaches use 'zonal coding', consisting in the scalar quantisation of the coefficients belonging to a predefined area (with a fixed bit allocation), and 'threshold coding', consisting in the choice of the coefficients of each block characterised by an absolute value exceeding a predefined threshold. Another possibility, that leads to higher compression factors, is to apply a vector quantisation scheme to the transformed coefficients. The same type of encoding is used for each coding method. In most cases a classicalHuffman code can be used successfully. The JPEG and MPEG standards are examples of standards based on transform coding. ## Vector quantisationA vector quantiser can be defined mathematically as a transform operator subset Clearly, the choice of the set of vectors is of major importance. The level of distortion due to the transformation computed as the most significant error (MSE) between the "real" vector error should be such as to minimise the Euclidean distance An optimum scalar quantiser was proposed by Lloyd and Max. Later on, Linde, Buzo and Gray resumed and generalized this method, extending it to the case of a vector quantiser. The algorithm that they proposed is derived from the KNN clusterisation method, and is performed by iterating the following basic operations: - subdivide the training set into
*N*groups (called 'partitions' or 'Voronoi regions'), which are associated with the*N*codebook letters, according to a minimum distance criterion; - the centroids of the Voronoi regions become the updated codebook vectors;
- compute the average distortion: if the percent reduction in the distortion (as compared with the previous step) is below a certain
threshold, then STOP.
Once the codebook has been designed, the coding process simply consists in the application of the original image. In practice, each group of The LBG algorithm for the design of a vector codebook always reaches a local minimum for the distortion function, but often this solution is not the optimal one. A careful analysis of the LBG algorithm's behaviour allows one to detect two critical points: the choice of the starting codebook and the uniformity of the Voronoi regions' dimensions. For this reason some algorithms have been designed that give better performances. With respect to the initialisation of the LBG algorithm, for instance, one can observe that a random choice of the starting codebook requires a large number of iterations before reaching an acceptable amount of distortion. Moreover, if the starting point leads to a local minimum solution, the relative stopping criterion prevents further optimisation steps. ## Segmentation and approximation methodsWith segmentation and approximation coding methods, the image is modelled as a mosaic of regions, each one characterised by a sufficient degree of uniformity of its pixels with respect to a certain feature (e.g. grey level, texture); each region then has some parameters related to the characterising feature associated with it. The operations of finding a suitable segmentation and an optimum set of approximating parameters are highly correlated, since the segmentation algorithm must take into account the error produced by the region reconstruction (in order to limit this value within determined bounds). These two operations constitute the logical modelling for this class of coding schemes; quantisation and encoding are strongly dependent on the statistical characteristics of the parameters of this approximation (and, therefore, on the approximation itself). Classical examples are reconstructed by means of polynomial functions in approximation, regions are filled by synthesising a parametrised texture based on some model (e.g. fractals, statistical methods, Markov Random Fields [MRF]). It must be pointed out that, while in polynomial approximations the problem of finding optimum coefficients is quite simple (it is possible to use least squares approximation or similar exact formulations), for texture based techniques this problem can be very complex. ## Spline approximation methods (Bilinear Interpolation/Regularisation)These methodologies fall in the more general category of image reconstruction or sparse data interpolation. The basic concept is to interpolate data from a set of points coming from original pixel data or calculated in order to match some error criteria. The problem of interpolating a set of sparse data is generally ill posed, so some regularization algorithm must be adopted in order to obtain a unique solution. The problem is well documented, and many interpolation algorithms have been proposed. In order to apply this kind of technique to image coding, a good interpolant must be used to match visual criteria. Spline interpolation provides a good visual interpolant, notwithstanding its requiring a great computational effort. Bilinear interpolation is more easy to implement, while maintaining a very good visual quality. Regularization involves the minimisation of an energy function in order to obtain an interpolant which presents some smoothness constraints; it can be combined with non-continuities along edges in order to preserve contour quality during reconstruction. Generally all interpolants computations require the solution of very large linear equation sets, even if related to very sparse matrices. This leads to the use of recursive solution such as relaxation (suitable for a large parallel implementation), or to the use of gradient descent algorithm. The use of an interpolation algorithm for image coding is more interesting when related to techniques such as two source decomposition, where the image is modelled as the sum of two sources; one is the stationary part (it can be considered related to the low frequency content), the second is the residual content coming from non-stationaries such as edges. The first of the two sources is coded by means of a prediction scheme that can be one of the previous described interpolants. The second source (the residual) can be coded trough the use of a classical coding method (transform coding, vector or scalar quantisation etc). Moreover two source decomposition is a very effective coding scheme as far as it shows a low tile effect that affects all block coding techniques when compression factors become higher. ## Fractal coding (texture synthesis, iterated functions system [IFS], recursive IFS [RIFS])Fractal parameters, including fractal dimension, lacunarity, as well as others described below, have the potential to provide efficient methods of describing imagery in a highly compact fashion for both intra- and interframe applications. Fractal methods have been developed for both noisy and noise free coding methods. Images of natural scenes are likely candidates because of the fractal structure of the scene content, but results are reported to be applicable to a variety of binary, monochrome, and colour scenes. In the mid-eighties Dr Michael Barnsley reported the use of "Iterated Function System" for image compression and synthesis. Using sets of affine transformations developed for a given image, and a principal result known as the "collage theorem", intraframe compressions in excess of 10,000:1 and interframe compression in excess of 1,000,000:1 were reported. The collage theorem states that if an image can be covered (approximately) with compressed affine transformations of itself, then the image can be (approximately) reconstructed by computing the attractor (in the sense of non linear dynamic systems) of this set of affine transformations. This convergence was extremely slow, about 100 hours on a Cray, unless assisted by a person and was presented as an illustration of a scientific possibility, not as a commercial reality. In 1987 Barnsley and Sloan formed Iterated Systems Inc. to develop a product that would function in a commercial environment. By the end of 1988 Iterated Systems had developed the patented technique called the 'Fractal Transform' which has become the basis of their current product range. The development allowed a real world image to be reduced to a set of fractal equations based on the image being processed, rather than a huge library of pre-calculated, reference, fractal patterns. Image compression algorithms which are noise free have been reported to be developed from this transform for real time automatic image compression at ratios between 10:1 and 100:1 Barnsley has not divulged any technical information about the Fractal Transform beyond that in his book "Fractals Everywhere". The book explains the idea of IFS codes at length but is vague about the application of the collage theorem to specific compression problems. Researchers at BellCore have developed a compression method that incorporates a Peano Scan (the Peano curve is a "space filling" fractal curve) with a fractal based coding schema for intraframe compression, so making it possible to archive high picture quality with bit rates of less than one bit per pixel. A fractal based method was reported earlier by Walach while the concepts of using a Peano Scan had been used for storage, compression and display. The BellCore researchers have combined these two fractal concepts into an efficient implementation which is now being incorporated into a VLSI chip. Compression results show good quality imagery at a rate of 0.8-1 bit per pixel. The technique may be implemented in an adaptive fashion since a local estimate of fractal dimension provides an objective measure of image complexity. An alternative approach to image compression, also incorporating some features of fractal geometry, is based on orthogonal pyramid transforms. These transforms decompose the image as a weighted sum of basis functions. The basis functions have several attractive features including localisation in space and spatial frequency as well as self similarity. This latter feature reflects the notion that basis functions are scaled version of one another, and thus are able to capture image information at all scales in a uniform way. The pyramid transform techniques are based, in part, on an earlier concept of quadrature mirror filters. The pyramid structure results from dealing with the image at multiple resolutions. Results reported show high quality imagery at rates of 0.65 bits per pixel. The MSE is comparable to 16 X 16 block DCT transform coding algorithms but superior in visual image quality. ## Efficiency and quality of different lossy compression techniquesThe performances of lossy picture coding algorithms is usually evaluated on the basis of two parameters: - the compression factor (or analogously the bit rate) and
- the distortion produced on the reconstruction.
The first is an objective parameter, while the second strongly depends on the usage of the coded image. Nevertheless, a rough evaluation of the performances of a method can be made by considering an objective measure of the error, like MSE or SNR. For the methods described in the previous pages, average compression ratios and SNR values obtainable are presented in the following table: Method VQ DCT-SQ DCT-VQ AP SplineTSD Fractals ---------------------------- --------------------------------------- Bit Rate 0.8-0.4 0.8-0.3 0.3-0.08 0.3-0.1 0.4-0.1 0.8-0.0 (bpp) SNR (dB) 36-30 36-31 30-25 image 36-32 image dependent dependent Comparison of Different Compression Methods
During the last years, some standardisation processes based on transform coding, such as JPEG, have been started. Performances of such a standard are quite good if compression factors are maintained under a given threshold (about 20 times). Over this threshold, artifacts become visible in the reconstruction and tile effect affects seriously the images decoded, due to quantisation effects of the DCT coefficients. On the other hand, there are two advantages: first, it is a standard, and second, dedicated hardware implementations exist. For applications which require higher compression factors with some minor loss of accuracy when compared with JPEG, different techniques should be selected such as wavelets coding or spline interpolation, followed by an efficient entropy encoder such as Huffman, arithmetic coding or vector quantisation. Some of this coding schemes, are suitable for progressive reconstruction (Pyramidal Wavelet Coding, Two Source Decomposition, etc). This property can be exploited by applications such as coding of images in a database, for previewing purposes or for transmission on a limited bandwidth channel. |

This information set on OII standards is maintained by Martin Bryan of The SGML Centre and Man-Sze Li of IC Focus on behalf of European Commission DGXIII/E. File last updated: January 1998 Home - Gate - Back - Top - Compress - Relevant |