Introduction

There is a lot of confusion surrounding the terms audio compression, audio encoding, and audio decoding. This section gives you an overview of what audio coding (another one of these terms...) is all about.

The purpose of audio compression

Up to the advent of audio compression, high-quality digital audio data took a lot of hard disk space to store. Let us go through a short example.

Suppose you want to sample your favorite 1-minute song and store it on your hard disk. Because you want CD quality, you sample at 44.1 kHz, stereo, with 16 bits per sample.

44100 Hz means that 44100 values per second come in from your sound card (or input file). Multiply that by two because you have two channels, and by another factor of two because each value takes two bytes (that is what 16 bits means). The song will take up 44100 samples/s · 2 channels · 2 bytes/sample · 60 s/min ≈ 10.6 million bytes, or roughly 10 MBytes, of storage space on your hard disk.
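The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation in the text; the function name `pcm_size_bytes` is made up for illustration.

```python
def pcm_size_bytes(sample_rate_hz, channels, bytes_per_sample, duration_s):
    """Size of uncompressed PCM audio in bytes (hypothetical helper)."""
    return sample_rate_hz * channels * bytes_per_sample * duration_s


# CD quality: 44.1 kHz, stereo, 16 bits (2 bytes) per sample, one minute.
size = pcm_size_bytes(44100, 2, 2, 60)
print(size)                         # 10584000
print(round(size / 1_000_000, 1))   # 10.6 (million bytes)
```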

If you wanted to download that over the Internet, given an average 56k modem connected at 44k (which is a typical case), it would take you (at least) 10000000 bytes · 8 bits/byte / (44000 bits/s) / (60 s/min) ≈ 30 minutes

Just to download one minute of music!
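The same download estimate, written out as a small Python sketch. The helper name `download_time_minutes` is an assumption for illustration, and the calculation ignores protocol overhead, which is why the text says "at least".

```python
def download_time_minutes(size_bytes, link_bits_per_s):
    """Lower bound on transfer time in minutes (hypothetical helper)."""
    seconds = size_bytes * 8 / link_bits_per_s
    return seconds / 60


# About 10 million bytes over a modem connected at 44000 bits/s.
minutes = download_time_minutes(10_000_000, 44_000)
print(f"{minutes:.1f} minutes")  # 30.3 minutes
```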

Digital audio coding, which in this context is also called digital audio compression, is the art of minimizing storage space (or channel bandwidth) requirements for audio data. Modern perceptual audio coding techniques (like MPEG Layer III) exploit the properties of the human ear (the perception of sound) to achieve a size reduction by a factor of 11 with little or no perceptible loss of quality.

Such schemes are therefore the key technology for high-quality, low-bitrate applications, like soundtracks for CD-ROM games, solid-state sound memories, Internet audio, digital audio broadcasting systems, and the like.

The two parts of audio compression

Audio compression really consists of two parts. The first part, called encoding, transforms the digital audio data that resides, say, in a WAVE file, into a highly compressed form called a bitstream. To play the bitstream on your sound card, you need the second part, called decoding. Decoding takes the bitstream and re-expands it to a WAVE file.

The program that performs the first part is called an audio encoder. LAME is such an encoder. The program that performs the second part is called an audio decoder. One well-known MPEG Layer III decoder is Xmms; another is mpg123. Both can be found on www.mp3tech.org.

Compression ratios, bitrate and quality

It has not been explicitly mentioned up to now, but what you end up with after encoding and decoding is not the same sound file anymore. All superfluous information has been squeezed out, so to speak. It is not the same file, but it will sound the same, more or less, depending on how much compression was applied to it.

Generally speaking, the lower the compression ratio achieved, the better the sound quality will be in the end, and vice versa. Table 1.1 gives you an overview of the quality achievable.

Because compression ratio is a somewhat unwieldy measure, experts use the term bitrate when speaking of the strength of compression. Bitrate denotes the average number of bits that one second of audio data will take up in your compressed bitstream. The usual unit is kbps, meaning kbits/s, where 1 kbit/s is 1000 bits/s. To calculate the number of bytes per second of audio data, simply divide the number of bits per second by eight.
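As a quick illustration of the bitrate definition, the following Python sketch converts a bitrate in kbps to bytes per second and to an approximate file size. Both function names are hypothetical, and the size is an estimate of the audio payload only, since bitrate is an average.

```python
def bytes_per_second(bitrate_kbps):
    """Convert kbps (1000 bits/s) to bytes per second (hypothetical helper)."""
    return bitrate_kbps * 1000 / 8


def compressed_size_bytes(bitrate_kbps, duration_s):
    """Approximate size of a bitstream at a given average bitrate."""
    return bytes_per_second(bitrate_kbps) * duration_s


print(bytes_per_second(128))           # 16000.0
print(compressed_size_bytes(128, 60))  # 960000.0
```

At 128 kbps, one minute of audio therefore takes a little under 1 MByte, compared with roughly 10 MBytes uncompressed.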


   
Table 1.1: Bitrate versus sound quality

Bitrate     Bandwidth   Quality comparable to or better than
8 kbps      2.5 kHz     POTS (telephone sound)
16 kbps     4.5 kHz     shortwave radio
32 kbps     7.5 kHz     AM radio
64 kbps     11 kHz      FM radio
128 kbps    16 kHz      CD
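
To relate the bitrates in Table 1.1 back to compression ratios, the sketch below divides the uncompressed CD bitrate by each bitrate in the table. It assumes the reference is always full CD-quality stereo PCM (44.1 kHz, 2 channels, 16 bits), which is a simplification: the text does not say what input the lower-bitrate rows were encoded from. The names used are illustrative only.

```python
# Uncompressed CD-quality PCM: 44100 samples/s * 2 channels * 16 bits.
CD_PCM_KBPS = 44100 * 2 * 16 / 1000  # 1411.2 kbps


def compression_ratio(bitrate_kbps, source_kbps=CD_PCM_KBPS):
    """Ratio of source bitrate to compressed bitrate (hypothetical helper)."""
    return source_kbps / bitrate_kbps


# Bitrates taken from Table 1.1.
for kbps in (8, 16, 32, 64, 128):
    print(f"{kbps:>3} kbps -> about {compression_ratio(kbps):.1f}:1")
```

For 128 kbps this gives a ratio of about 11, which matches the size reduction quoted earlier for MPEG Layer III.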