Macdoc Digital Audio Primer v.1
prepared by Nicholas Ursa (music@macdoc.com)
Last updated Apr 10, 2002
Acoustics
Sound is a disturbance of air pressure. Normally, the air is equally dense everywhere, but a sound source makes the air alternately denser and thinner at a certain place, and this disturbance then spreads out spherically, like a ripple in a pond.
Our ear is a sensitive measuring device particularly capable of interpreting rapid changes in air pressure. This information is sent to the brain for further processing, where we interpret the data and construct an acoustic reality based on previous experience. We are particularly good at hearing relationships between various sounds. For example, frequencies of 440 and 660 Hz (A and E) "sound" like a perfect fifth. We can hear the relative simplicity of a bass guitar, and the complexity of the clatter from hitting a garbage can with a hammer. But our hearing ability is limited to a certain range of frequencies. Although it varies from individual to individual, deteriorating as we get older, it is roughly the frequencies between 20 Hz and 20,000 Hz (or 20 kHz) that we hear with any loudness. Most sound equipment is built to these specifications. If you can hear audio on your computer, click the links below to hear pure sine waves at various frequencies.
100 Hz | 500 Hz | 2000 Hz | 10 000 Hz
Here is a picture of the above sine wave, shown as air pressure intensity over time:
Hertz (Hz) is a measure of how quickly a steady wave swings up and down, measured in cycles per second. A cycle is one complete swing of the wave, and its "period" is how long the wave takes to get back to the same point of its cycle.
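The relationship between frequency and period can be worked out directly: the period is simply one divided by the frequency. The short Python sketch below applies this to the four test tones listed above. The function name period_seconds is made up for this illustration.

```python
def period_seconds(frequency_hz: float) -> float:
    """Return how long one cycle of a steady wave lasts, in seconds."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency_hz


# The four test tones from the list above.
for frequency in (100, 500, 2000, 10_000):
    milliseconds = period_seconds(frequency) * 1000
    print(f"{frequency} Hz -> one cycle every {milliseconds:.3f} ms")
```

A 100 Hz tone repeats every 10 milliseconds, while a 10,000 Hz tone repeats every tenth of a millisecond.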
Electricity is the medium with which we store, transmit, and manipulate sound in the analog world. Using a microphone, we can create an electrical impulse in the same shape as the original sound pressure impulse. Using an amplifier and loudspeaker, the electrical impulse can then move a speaker cone back and forth, recreating the original sound. Using a mixer, we can add together various sounds and create a combined waveform from many of them. We can also store electrical impulses on magnetic tape, altering the polarity of the magnetic particles according to the intensity of the sound. This is the basis of tape-based recording, the precursor to digital recording.
When we use the term "digital audio", we are speaking of the digitization of sound. We try to represent the sound as a series of numbers (digits) instead of a smooth, but imprecisely defined, wave. Why would we wish to do this? A series of numbers can be transmitted from location to location with no signal loss. If you took a tape recording of a concert and made a dub of it, then made a dub of that, and continued the process, you would notice the signal getting worse and worse with each dub. This is because the noise and faults of each transfer are recorded, and then added to the next generation. A digital system, however, sends a series of numbers like 10323, 10548, 23292 down the wire, and with a little error checking to make sure nothing got garbled, the result is a perfect copy each time. Digital information can also be stored and manipulated in a computer very easily.
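The difference between the two kinds of copying can be imitated with a few lines of Python. This is only a rough sketch: the analog dub is modelled as "add a little random noise to every value", which is a simplifying assumption and not a model of real tape, and the noise level of 50 is arbitrary. The first three sample values come from the text; the others are made up.

```python
import random


def analog_dub(signal, noise_level, rng):
    """Copy a signal the analog way: each copy picks up a little random noise."""
    return [value + rng.gauss(0.0, noise_level) for value in signal]


def digital_copy(samples):
    """Copy a list of sample numbers exactly."""
    return list(samples)


def rms_error(reference, copy):
    """Root-mean-square difference between a copy and the reference."""
    total = sum((a - b) ** 2 for a, b in zip(reference, copy))
    return (total / len(reference)) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
original = [10323, 10548, 23292, 18011, 9120, 4402]

analog = original
digital = original
for generation in range(1, 6):
    analog = analog_dub(analog, noise_level=50.0, rng=rng)
    digital = digital_copy(digital)
    print(generation, round(rms_error(original, analog), 1), rms_error(original, digital))
```

The analog error tends to grow from one generation to the next because each dub adds its own noise on top of the noise already recorded, while the digital error stays at zero.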
Sampling
The basis for making audio digital is a process known as sampling. A measurement of the signal level is taken (sampled) at regular intervals and assigned a number based on how high or low the signal is at that particular point. The following chart illustrates the process. A stream of numbers is generated, which can be stored in a file, mixed with other streams of numbers (the equivalent of an analog mixer), or processed by the computer in more sophisticated ways, such as compression or reverberation. This is called analog to digital conversion, or A/D conversion for short. To play back this data, the computer basically reverses the process: it plots the data and connects the points in between to make a wave again. This is known as D/A, or digital to analog, conversion. Every CD player contains a digital to analog converter.
The picture above shows how an analog signal, like that coming from a microphone or mixing board, is converted into a series of numbers. The circuit that does this is called the analog to digital converter, or A/D chip, and it measures the voltage of the signal at a given time. For the purposes of sampling, the chip measures the voltage at an even rate, called the sample rate. On the left side of the above picture, the A/D converter has been sampling every 0.002 seconds, a sample rate of 500 Hz. It is sampling, in this case, a pure sine wave of about 25 Hz. The result is a stream of numbers, shown in the yellow box.
To get an analog signal back, so that it can be sent to an amplifier for example, a reverse chip, the D/A converter, is used. This device generates an analog signal by connecting the points of recorded data with a smooth line.
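The same measurement can be imitated in software. The Python sketch below "samples" a 25 Hz sine wave every 0.002 seconds (500 Hz), the figures used in the description above. The function name and the amplitude scale of -1.0 to 1.0 are assumptions made for the illustration; a real A/D chip measures a voltage.

```python
import math


def sample_sine(frequency_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Measure a sine wave at evenly spaced instants, as an A/D converter would."""
    count = round(duration_s * sample_rate_hz)
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(count)
    ]


# One full cycle of a 25 Hz wave lasts 0.04 s; at 500 Hz that is 20 samples.
samples = sample_sine(frequency_hz=25, sample_rate_hz=500, duration_s=0.04)
print(len(samples))                        # 20
print([round(s, 3) for s in samples[:6]])  # [0.0, 0.309, 0.588, 0.809, 0.951, 1.0]
```

Each number in the list is the height of the wave at one sampling instant, which is all the digital system ever stores.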
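"Connecting the points" can be sketched in Python with straight lines between neighbouring samples. This is a deliberate simplification: the text describes a smooth line, and real converters smooth the output with filtering, whereas the hypothetical function below only draws straight segments.

```python
def connect_the_dots(samples, sample_rate_hz, time_s):
    """Estimate the signal at time_s with a straight line between two samples."""
    position = time_s * sample_rate_hz  # where time_s falls, in units of samples
    if not 0 <= position <= len(samples) - 1:
        raise ValueError("time is outside the recorded samples")
    left = int(position)  # index of the sample at or just before time_s
    if left == len(samples) - 1:
        return float(samples[left])
    fraction = position - left  # how far along we are toward the next sample
    return samples[left] + fraction * (samples[left + 1] - samples[left])


# Five made-up samples taken every 0.002 s (500 Hz).
recorded = [0, 100, 200, 100, 0]
print(connect_the_dots(recorded, 500, 0.002))  # exactly on the second sample
print(connect_the_dots(recorded, 500, 0.003))  # halfway between two samples
```

Any instant that falls between two stored samples gets a value somewhere between them, which turns the list of numbers back into a continuous line.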
Sample Rate
The choice of sample rate is important because if we choose too low a rate, we may not capture all the higher frequencies present in a signal. This can be demonstrated as follows. The following signal, in blue, was sampled at too low a rate to catch all the ups and downs in the signal. It is the equivalent of opening your eyes only every ten seconds and then trying to describe a basketball game. You miss most of the detail.
When the signal is reconstructed, the D/A converter just connects the dots, and all the busy, high-frequency information is lost. This is what the effect sounds like:
There are a few standard sampling frequencies in use. The most common is 44,100 Hz, or 44.1 kHz, which is what CDs were standardized on. There is a rule called Nyquist's Law, which states that for any given sampling rate, the greatest frequency that can be captured is half of that rate. So 44.1 kHz gives us the ability to capture frequencies from 0 to 22,050 Hz. This number was decided on by Philips and a consortium of audio manufacturers in the late 70s for use in the Audio CD standard. 22 kHz was believed to encompass the range of human hearing. Film and television, however, usually use a rate of 48 kHz, affording an upper frequency limit of 24 kHz. Some high-end digital audio systems are touting 96 kHz, though it is debatable whether the sound is improved at all.
Other than sampling rate, there is another value we should be concerned with:
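The loss can also be seen in the numbers themselves. The Python sketch below samples a 450 Hz sine wave at only 500 Hz; both figures are chosen purely for illustration. The samples that come out are indistinguishable from those of a 50 Hz wave turned upside down, so the fast wave has effectively disappeared and a slow one has taken its place.

```python
import math


def sample_tone(frequency_hz, sample_rate_hz, count):
    """Return `count` evenly spaced samples of a sine wave."""
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(count)
    ]


rate = 500                         # far too slow for a 450 Hz signal
fast = sample_tone(450, rate, 20)  # the signal we meant to record
slow = sample_tone(50, rate, 20)   # the signal the samples actually describe

# Every sample of the 450 Hz wave equals minus the matching 50 Hz sample,
# so the largest value of (fast + slow) is zero apart from rounding error.
largest_difference = max(abs(f + s) for f, s in zip(fast, slow))
print(largest_difference)
```

Once the samples are taken, there is no way to tell which of the two waves was really present, which is why the sample rate has to be chosen high enough beforehand.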
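The arithmetic behind these figures is a single division, shown here as a small Python sketch. The function name nyquist_limit_hz is made up for this illustration.

```python
def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency a given sample rate can capture: half the rate."""
    return sample_rate_hz / 2


# The three sample rates mentioned above.
for rate in (44_100, 48_000, 96_000):
    print(f"{rate} Hz sampling -> frequencies up to {nyquist_limit_hz(rate):.0f} Hz")
```

Read the other way around, capturing the full 20 kHz range of human hearing needs a sample rate of at least 40 kHz, which both 44.1 kHz and 48 kHz comfortably exceed.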
Bit Depth
A bit of knowledge of binary math would help. Here is a link to a simple explanation of how it works:
http://www.fhi-berlin.mpg.de/amiga/ar/ar119/p1-10.HTML
But for now, we should be aware that if a computer uses 16 bits to assign a value to a particular sample, then there are 65536 possible levels. If the computer uses only 8 bits, there are only 256 possible levels. This is the equivalent of truncating decimal numbers. Consider the problem of measuring the length of something. Let us say that the real length of this object is 1.722388124 meters. Now I give you a piece of paper and say you can only write two digits to express the length. You would have to write 1.7. If I say four digits, then you can write 1.722. The more digits I give you, the more accurately you can record the length: each successive digit at the end of the decimal allows a more accurate description. In the same way, computers can more accurately determine the "height" of an audio signal at any given point if they can measure to a greater degree of accuracy. The standard for CDs is 16 bits, or 65536 "levels" between silence and full volume. Many audio cards now feature 24-bit recording, which allows a greater amount of detail in the signal. In old-fashioned analog terms, higher bit depth affords a better dynamic range. 24-bit recording captures small changes in level more faithfully, especially in quiet passages. It also means you do not have to track as hot (and risk clipping), since even if you turn up the volume later on, there is still a lot of good dynamic information.
Here are two audio examples. While I cannot show you 24-bit recording, because you likely don't have the equipment to play it back, the noise that 8-bit introduces to the signal is very apparent in music with a large dynamic range, like this orchestral hit and the quiet that follows:
The difference between 24-bit and 16-bit is much more subtle, but present. 24-bit gear has always been more expensive because of the greater processing power and lower noise requirements in the hardware.
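The number of levels, and a rough figure for the dynamic range they afford, can be computed in a few lines of Python. The decibel figure uses the common rule of thumb of about 6 dB per bit (20 times the base-10 logarithm of the number of levels); it is a theoretical ceiling, not a measurement of any particular sound card. Both function names are made up for this illustration.

```python
import math


def levels(bits):
    """Number of distinct values a sample of the given bit depth can take."""
    return 2 ** bits


def dynamic_range_db(bits):
    """Approximate theoretical dynamic range in decibels (about 6 dB per bit)."""
    return 20 * math.log10(levels(bits))


for bits in (8, 16, 24):
    print(f"{bits}-bit: {levels(bits)} levels, about {dynamic_range_db(bits):.1f} dB")
```

Every extra bit doubles the number of levels, so going from 16 to 24 bits multiplies the available detail by 256.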
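The effect of bit depth on a quiet signal can be estimated numerically. The Python sketch below rounds a quiet 440 Hz sine wave to the nearest available level and measures how far the rounded values stray from the true ones. The tone, its amplitude of 1% of full scale, and the function names are all assumptions chosen for the illustration, not measurements of real equipment.

```python
import math


def quantize(value, bits):
    """Round a value in the range -1.0..1.0 to the nearest of 2**bits levels."""
    half = 2 ** (bits - 1)
    code = round(value * half)
    code = max(-half, min(half - 1, code))  # stay inside the available codes
    return code / half


def rounding_noise(bits, amplitude=0.01, frequency_hz=440,
                   sample_rate_hz=44_100, count=4410):
    """RMS error introduced by storing a quiet sine wave at the given bit depth."""
    total = 0.0
    for n in range(count):
        true_value = amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        total += (quantize(true_value, bits) - true_value) ** 2
    return math.sqrt(total / count)


for bits in (8, 16, 24):
    print(f"{bits}-bit rounding noise: {rounding_noise(bits):.2e}")
```

At 8 bits the quiet tone only reaches the first level or so on either side of zero, so the rounding error is a large fraction of the signal itself; at 16 and 24 bits the error shrinks dramatically.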
Let me know if you found this useful.