Optimal Pulse Code Modulation (oPCM) and its application as an audio quality parameter


The introduction of digital audio (Compact Disc) has boosted audio quality compared to the old analog media. Since then, the term "digital" has become a synonym for quality. This quality is bought at the price of an enormous amount of data - over 700'000 bits per second (700 kbps) are necessary for encoding a mono signal.

If you reduce this data rate, the sound quality suffers - because you simply throw away some information. But this reduction is needed for low-bandwidth channels such as a telephone line, the internet or terrestrial broadcasts.

Standard digital sound is stored as PCM data (pulse code modulation): The audio signal is measured at a fixed rate ("sampling rate") with a given precision ("resolution"). On a CD, each stereo channel is measured 44'100 times a second with a precision of 16 bits, resulting in 705'600 bits per second per channel. The data rate can be decreased by lowering the sampling rate or the resolution, but it is best to lower both. For each bit rate there is a best compromise between sampling rate and resolution. These best combinations are called "optimal PCM" (oPCM).
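The CD figure quoted above is simply the sampling rate multiplied by the resolution. A minimal Python sketch of that arithmetic (the function name `pcm_bit_rate` is only illustrative and does not come from the original text):

```python
def pcm_bit_rate(sampling_rate_hz, resolution_bits, channels=1):
    """Bit rate of uncompressed PCM data in bits per second."""
    return sampling_rate_hz * resolution_bits * channels

# One CD channel: 44'100 samples per second at 16 bits each
print(pcm_bit_rate(44_100, 16))

# Halving both the sampling rate and the resolution quarters the bit rate
print(pcm_bit_rate(22_050, 8))
```

Because the two factors multiply, a given bit rate can be reached with many different pairs of sampling rate and resolution, which is why a best compromise exists.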

But there are much more intelligent ways of decreasing the data rate with (hopefully!) less audible quality loss. Special audio formats ("codecs") were introduced for this purpose, for example ADPCM (adaptive differential pulse code modulation), MPEG audio ("layer I, II, III") and RealAudio.

How much quality do they gain at a given bit rate compared to uncompressed digital data? In the following table, various codecs are compared with oPCM. From the right to the left, the bit rate decreases (logarithmic scale). At the top of the table, the sound quality is best and decreases to the bottom. The quality parameter is defined by the reference oPCM files on the white diagonal fields.

The bigger the difference between the conventional oPCM files on the diagonal and the test items, the more "intelligent" the codec. The best codecs are on the blue fields.



Audio codec quality comparison chart

Compare for yourself: Just click on the test items below.

mono bit rate [kbps] >
16 20 24 28 32 40 48 56 64 80 96 112 128 160 192 224 256 320 384 448 550
^ quality [Q]
9.1                                                         L2                     PCM 9.1
8.8                     L3 L3+ L3                 8.8
8.6                                     PCM     8.6
8.3                   L3     L2                 8.3
8.0                                 PCM         8.0
7.8               L3+ L3+ RA3 L2   ADP                 7.8
7.6                 L3 L2         PCM             7.6
7.3             L3   L2*                         7.3
7.0         L3     L2*         PCM                 7.0
6.8     L3                                     6.8
6.6           RA3     ADP   PCM                     6.6
6.3             L2                             6.3
6.0                 PCM                         6.0
5.8 L3       L2*       L2                         5.8
5.6         ADP   PCM                             5.6
5.3 RA3   ADP                                     5.3
5.0         PCM                                 5.0
4.8                                           4.8
4.6     PCM                                     4.6
4.3                                           4.3
4.0 PCM                                         4.0
16 20 24 28 32 40 48 56 64 80 96 112 128 160 192 224 256 320 384 448 550
mono bit rate [kbps] >

Q gain

> 2.0 net compression ratio ~ 1:4 or better
+ 1.8 net compression ratio ~ 1:3.5
+ 1.6 net compression ratio ~ 1:3
+ 1.3 net compression ratio ~ 1:2.5
+ 1.0 net compression ratio ~ 1:2
+ 0.8 net compression ratio ~ 1:1.75
+ 0.6 net compression ratio ~ 1:1.5
+ 0.3 net compression ratio ~ 1:1.25
0.0 no compression (reference oPCM)
< 0 blow-up
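
The ratios in this legend can be reproduced from the Q gain, assuming (as defined further down in this document) that Q is the binary logarithm of the bit rate, so that a gain of g corresponds to a ratio of 1 : 2^g. The following Python sketch is only an illustration; `net_compression_ratio` is a hypothetical helper name, and the legend values are rounded.

```python
def net_compression_ratio(q_gain):
    """Factor r in the ratio 1:r that corresponds to a given Q gain."""
    return 2.0 ** q_gain

for gain in (2.0, 1.8, 1.6, 1.3, 1.0, 0.8, 0.6, 0.3, 0.0):
    ratio = net_compression_ratio(gain)
    print(f"Q gain +{gain:.1f}  ->  about 1:{ratio:.2f}")
```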

Audio codecs and player sources

PCM  reference optimal PCM files (simulated with standard rates)
L2   Layer II, with CoolEdit/model #2; player (Maplay), encoder
L2*  Layer II, pre-filtered spectrum; filtered at 4; 8; 10 kHz
L3   Layer III with l3enc Ver. 2.0; player (Winplay3), encoder
L3+  Layer III with l3enc Ver. 2.6; encoded with hq option
RA3  Real Audio 3.0; player, encoder
ADP  ADPCM (Microsoft Windows)

Remarks

  1. For playing the sample sounds, you must install the appropriate players first. Just follow the links in the box above. With other players, the sound quality may suffer.
  2. The codec comparison tests were performed with cheap PC loudspeakers ("Typhoon 20 W PMPO"). The RA3 and L3+ codecs rated at Q = 7.8 sound significantly better when listened to through high quality headphones (relative to the calibration oPCM files): Q = 8.3. This seems to be caused by the psychoacoustic modelling, which fails with nonlinear loudspeakers.
  3. Since Layer II and III are open standards, the results may vary slightly from encoder to encoder. All you can learn from this comparison test is how a specific encoder performs with the given sample sound.
  4. Below 192 kbps mono, the Layer III encoder used here (l3enc V 2.0) cannot encode a 16.5 kHz tone, which is well within most people's perception limit (e.g. cricket chirp - [Grillenzirpen]). However, the sample sound CEMBALO sounds transparent.
  5. Please let me know if you disagree with a rating. Above Q = 7.8 the offered quality is very close to CD, but you can hear the difference.
  6. The test sound CEMBALO is the beginning of J. S. Bach's canon per tonus from his Musikalisches Opfer. You can find the full length version on "Acoustic illusions".

Evaluating audio compression is complicated because there is no objective quality parameter. This is why the comparison with uncompressed raw data (PCM files) is introduced. The maximum quality that a PCM file can deliver at a given bit rate is determined first. Since there are two parameters that define a PCM file (sampling rate and resolution) the best combination of these two values has to be worked out. This is how optimal PCM (oPCM) streams are defined: They are the best compromise between sampling rate and resolution at a fixed bitrate. oPCM files depend on a single parameter, the perceptional quality parameter Q.

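'Loss' and 'net compression'

Sound data compression consists of two steps: 'loss', which reduces the data rate by lowering the quality, and 'net compression', which reduces the data rate further without lowering the quality.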

oPCM files can compress sound data only at the cost of quality (provided that the raw data is also oPCM*). They do not offer any net compression. Comparing a codec candidate with the oPCM file of equal quality therefore reveals the codec's net compression, which is the figure that matters. This is why oPCM files serve as reference files.

For example, Real Audio 3.0 sounds at 40 kbps ("ISDN mono") roughly like oPCM at 96 kbps (Q=6.6). This means a net compression ratio of 1:2.4 (Q gain = +1.3). However, most of the total compression is loss.
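
The Real Audio figures above can be split into a loss part and a net part. The sketch below is a hedged illustration in Python: it assumes the 550 kbps "CD" oPCM rate from the parameter table further down as the uncompressed reference, and the function name `split_compression` is hypothetical.

```python
import math

def split_compression(ref_kbps, codec_kbps, equiv_opcm_kbps):
    """Split the total compression of a codec into loss and net compression.

    ref_kbps        : bit rate of the uncompressed reference (assumed: 550, "CD" oPCM)
    codec_kbps      : bit rate of the codec stream
    equiv_opcm_kbps : bit rate of the oPCM file that sounds roughly the same
    """
    total = ref_kbps / codec_kbps        # overall reduction
    loss = ref_kbps / equiv_opcm_kbps    # part explained by lower quality
    net = equiv_opcm_kbps / codec_kbps   # part achieved at equal quality
    q_gain = math.log2(net)              # the same net ratio on the Q scale
    return total, loss, net, q_gain

total, loss, net, q_gain = split_compression(550, 40, 96)
print(f"total 1:{total:.2f}, loss 1:{loss:.2f}, net 1:{net:.2f}, Q gain +{q_gain:.1f}")
```

Under these assumptions the loss factor is larger than the net factor, which matches the remark that most of the total compression is loss.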

There is one technical obstacle when generating oPCM files: Standard sound devices do not allow "odd" sampling rates and resolutions. I have therefore simulated the oPCM quality by upsampling to a higher rate and resolution.
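
One way to picture the resolution part of such a simulation is to keep the samples in a standard 16-bit container but allow only every n-th level, where n corresponds to the "Divisor" column of the parameter table further down (read here as 2^16 divided by the number of levels). The Python sketch below is an assumption about how this could be done, not the procedure actually used for the test files; `requantize` is a hypothetical helper, and the sampling-rate part (low-pass filtering and resampling) is not shown.

```python
def requantize(samples, divisor):
    """Reduce the resolution of signed 16-bit samples.

    Only multiples of `divisor` survive, so 2**16 / divisor distinct levels
    remain while the data still fits a standard 16-bit sound format.
    """
    return [(s // divisor) * divisor for s in samples]

samples = [-32768, -1000, -1, 0, 1, 255, 256, 1000, 32767]
print(requantize(samples, 256))   # 256 levels remain, i.e. 8 bit

# Count the levels that remain over the full 16-bit range for divisor 1024
levels = len(set(requantize(range(-32768, 32768), 1024)))
print(levels)                     # number of remaining levels (6 bit)
```

A divisor that is not a power of two gives a fractional resolution, which is how "odd" values such as 6.4 bit can be approximated.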

In these first tests, the sample sound CEMBALO was used. It has a very broad frequency spectrum and reveals most compression artifacts (distortion, dynamic hiss). It is much more suitable for codec evaluation than narrow-band signals (e.g. string orchestra), which pose no severe problem for codecs. However, CEMBALO has a very limited dynamic range (no silent intervals) and no isolated high frequencies. The spectrum of CEMBALO is given below.

Spectrum of CEMBALO

The following preliminary data were obtained in 3 listening tests with headsets (Sennheiser Reference II) and the test file CEMBALO. The tests were performed at 10, 25, 37.5, 50, 75, 100, 150, 200, 300, 400 kbps in steps of 1 bit. The plot shows the interpolated curve. The estimated standard deviation is ± 0.3 bit (1 sigma).

Precision measurements are currently being performed with other test sounds (including speech), higher resolution and more listeners.

        
  RESOLUTION [bit]     1 kHz   2 kHz   4 kHz   8 kHz   16 kHz  
       16 + . . . . . . . : . . . : . . . : . . . : . . . +   X 
          . . . . . . . . : . . . : . . . : . . . : . . . +  CD
       15 + . . . . . . . : . . . : . . . : . . . : . . . + 
          . . . . . . . . : . . . : . . . : . . . : . . .O+ 
       14 + . . . . . . . : . . . : . . . : . . . : . . .O+ 
          . . . . . . . 80 dB . .:. . . .:. . . .:. . . O+. 
       13 + + + + + + + +:+ + + +:+ + + +:+ + + +:+ + + O+. 
          . . . . . . . : . . . : . . . : . . . : . . .O+ . 
       12 + . . . . . . : . . . : . . . : . . . : . . .O+ . 
          . . . . . . .:. . . .:. . . .:. . . .:. . . O+. . 
       11 + . . . . . : . . . : . . . : . . . : . . .O+ . . 
          . . . . . . : . . . : . . . : . . . : . . O + . . 
       10 + . . . . .:. . . .:. . . .:. . . .:. . OO.+. . . 
          . . . . . .:. . . .:. . . .:. . . .:. OO. .+. . . 
        9 + . . . . : . . . : . . . : . . . :OOO. . + . . . 
          . . . . .:. . . .:. .ISDN:. . .OOOO . . .+. . . . 
        8 + . . . : . . . : . . . X OOOOO : . . . + . . . . 
          . . . . : . . . : . OOOOOO. . . : . . . + . . . . 
        7 + . . .:. . . .OOOOO. .:. . . .:. . . .+. . . . . 
          . . . : . .OOOO . . . : . . . : . . . + . . . . . 
        6 + . .:.OOOO .:. . . .:. . . .:. . . .+. . . . . . 
          . . :OO . . : . . . : . . . : . . . + . . . . . . 
        5 + .OO . . .:. . . .:. . . .:. . . .+. . . . . . . 
          .OO . . . : . . . : . . . : . . . + . . . . . . . 
        4 . . . . ::. . . ::. . . ::. . . + .16 kHz . . . . 
          . . . ::. . . ::. . . ::. . . + . . . . . . . . . 
        3 + . ::. . . ::. . . ::. . . + . . . . . . . . . . 
          . ::. . . ::. . . ::. . . + . . . . . . . . . . . 
        2 +:. + . +:. + . +:. + . + . + . + . + . + . + . + 
          8       16      32      64     128     256     512 -> RATE [kbps]
         3.0     4.0     5.0     6.0     7.0     8.0     9.0 -> Q
                                             

       optimal quality at given bitrate   OOOOOOO (oPCM)  (± 0.3 bit)
       perception limit (loud music)      + + + + (16kHz/ 80dB)
       lines of constant audio bandwidth  ::::::: (1/2/4/8 kHz)

From this plot, the following parameters of the oPCM files were derived. These values are used for the calibration oPCM files in the audio codec evaluation at the beginning of this document.


perceptional quality Q

Q = log2(RATE / kbps)
     Q   RATE   RES.  BW    Divisor
         kbps   bit   kHz  2^16/levels

     4.0   16   6.0   1.333  1024 
     4.6   24   6.4   1.888   800
     5.0   32   6.7   2.396   640
     5.6   48   7.4   3.263   400  "telephone"
     6.0   64   7.7   4.168   320  "AM/MW"
     6.6   96   8.0   6.000   256 
     7.0  128   8.4   7.659   200
     7.6  192   9.4  10.261   100  "Dolby B"
     8.0  256  10.0  12.800    64  "FM/UKW"
     8.6  384  12.7  15.144    10  "DSR" 
     9.0  512  15.0  17.067     2
     9.1  550  16.0  17.188     1  "CD" 

     RATE: bit rate of oPCM data, per channel
     RES.: sampling resolution
     BW  : bandwidth = 1/2 smpl-rate  
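
The columns of this table are tied together by simple relations: the sampling rate is twice the bandwidth, the bit rate is the sampling rate times the resolution, and Q is the binary logarithm of the bit rate in kbps. The Python sketch below recomputes these from the table values as a consistency check. Reading the divisor as 2^(16 - RES) is an interpretation of the column header, and the table values are rounded, so only approximate agreement is expected.

```python
import math

# (Q, RATE kbps, RES. bit, BW kHz, Divisor), copied from the table above
OPCM_TABLE = [
    (4.0,  16,  6.0,  1.333, 1024),
    (4.6,  24,  6.4,  1.888,  800),
    (5.0,  32,  6.7,  2.396,  640),
    (5.6,  48,  7.4,  3.263,  400),
    (6.0,  64,  7.7,  4.168,  320),
    (6.6,  96,  8.0,  6.000,  256),
    (7.0, 128,  8.4,  7.659,  200),
    (7.6, 192,  9.4, 10.261,  100),
    (8.0, 256, 10.0, 12.800,   64),
    (8.6, 384, 12.7, 15.144,   10),
    (9.0, 512, 15.0, 17.067,    2),
    (9.1, 550, 16.0, 17.188,    1),
]

for q, rate, res, bw, div in OPCM_TABLE:
    rate_check = 2 * bw * res       # sampling rate (2 x BW) times resolution
    q_check = math.log2(rate)       # Q = log2(RATE / kbps)
    div_check = 2 ** (16 - res)     # assumed meaning of the divisor column
    print(f"Q={q:.1f}  RATE={rate:3d}  2*BW*RES={rate_check:6.1f}  "
          f"log2(RATE)={q_check:.2f}  2^(16-RES)={div_check:7.1f} (table: {div})")
```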

Example: The sample sound A has the quality of Q=6.8 (or 68 dQ, deci-Q) when it offers a better quality than an oPCM file at Q=6.6 (96 kbps per channel), but a lower quality than oPCM at Q=7.0 (128 kbps per channel). If A is a stereo sound, it must be compared with stereo oPCM sounds. The Q parameter is defined as the binary logarithm of the oPCM bitrate per channel. You must indicate whether the Q value refers to a mono or a stereo sound.
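
The example can be retraced numerically. The sketch below assumes that an item falling between two reference files is assigned the midpoint of their Q values, rounded to one decimal; this midpoint rule and the helper names `q_of_rate`, `bracket_rating` and `to_deci_q` are illustrative assumptions, not part of the method as stated.

```python
import math

def q_of_rate(rate_kbps):
    """Q of an oPCM file: binary logarithm of its bit rate per channel in kbps."""
    return math.log2(rate_kbps)

def bracket_rating(lower_kbps, upper_kbps):
    """Q for an item that sounds better than oPCM at lower_kbps
    but worse than oPCM at upper_kbps (assumed: midpoint on the Q scale)."""
    return round((q_of_rate(lower_kbps) + q_of_rate(upper_kbps)) / 2, 1)

def to_deci_q(q):
    """Express a Q value in deci-Q (dQ)."""
    return round(q * 10)

q_a = bracket_rating(96, 128)
print(q_a, to_deci_q(q_a))
```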


The oPCM evaluation method compared to the old ISO methods

Sound quality evaluation is not a new topic. Until now, audio quality tests were performed as follows: The "expert listener" was presented with a CD quality file and a test file. Then, the listener had to rate the test item with school grades ranging from 5.0 ("no difference to CD") to 1.0 ("extremely annoying distortions").

The shortcomings of this method (used by ISO) are obvious: The rating is rather subjective and the only reference item is the perfect CD quality. That is why the listener only knows what the rating "5.0" means. The lower ratings are strongly a matter of taste because they are not strictly defined. As a consequence, the results vary from listener to listener.

The oPCM method overcomes these problems: The listener knows exactly what a rating of e.g. Q=6.7 means: The test item sounds better than the Q=6.6 oPCM reference file, but worse than the Q=6.8 oPCM file.

As an illustration, think about "sound intensity evaluation". A fictitious "intensity scale" could be: "5.0 = as loud as reference item", "4.0 = a little bit weaker", "3.0 = weaker", "2.0 = much weaker", "1.0 = very much weaker". It is obvious that this fictitious scale would produce very subjective results. The objective decibel (dB) scale would be much better. The oPCM Q-scale for quality evaluation is like the decibel scale for intensity evaluation.


Thanks to: Nils Ehlert (Germany) and Bernhard Weber (Germany) for listening tests; Ross Lewis (New Zealand) for some Layer III streams.


Stefan Scheller, webmaster@sciencesite.de, 18.10.96