Simple sound generation

We can generate the simplest of all sounds, a sine wave, by writing a program that places sample values in a file, which can then be sent as raw data to the sound card’s DAC.

The choices we will have to make are:

The sampling rate. Let’s call it sr. We will often use 44,100 samples per second, the rate at which CDs are recorded. Other common rates are 48,000 in recording studios and 22,050 or 11,025 for sound in computer games.
The sample size. We will use 16 bits, or 2 bytes. Call this ss.
The sample format. There are many varieties of this, including ones that decrease noise. We will choose a simple signed integer between -32,768 and 32,767. This gives us linear pulse-code modulation, or linear PCM. This is the most common format and is the one found in most .wav files and on CDs.
The amplitude of the signal. Call this A.
The frequency of the wave. Call this f cycles per second.
The number of channels. We will just use mono here.
The duration of the sound in seconds. Call this T.

Space requirements

First, how much computer space does sound take up? If we use sr = 44,100 and ss = 2, we get sr × ss = 88,200 bytes per second, or 5,292,000 bytes per minute of mono sound. We can double this for stereo to give about 10MB per minute. A CD that holds about 600MB can therefore store about 60 minutes of sound or music.
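The arithmetic above can be checked with a few lines of Python. This is only a sketch; the function name pcm_bytes is made up for illustration and is not part of any library.

```python
# Storage needed for uncompressed linear PCM audio.
# pcm_bytes is a hypothetical helper name chosen for this example.
def pcm_bytes(sr, ss, channels, seconds):
    """Bytes needed for `seconds` of sound at sr samples/s and ss bytes/sample."""
    return sr * ss * channels * seconds

mono_per_second = pcm_bytes(44_100, 2, 1, 1)
mono_per_minute = pcm_bytes(44_100, 2, 1, 60)
stereo_per_minute = pcm_bytes(44_100, 2, 2, 60)

print(mono_per_second, mono_per_minute, stereo_per_minute)
```

The three printed values are the bytes per second of mono, per minute of mono, and per minute of stereo.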

Generating a sine wave

In the continuous (real) world, we use the formula

a_t = A × sin(2πft)

to represent a simple sinusoidal waveform. The function is zero at t = 0, t = 1/(2f), and t = 1/f (and at every other integer multiple of 1/(2f) as well). This is because sin(0) = sin(π) = sin(2π) = 0. The interval of 1/f seconds represents one complete cycle of the waveform, as below.

In the graph we have taken f = 1000, and the x axis is in milliseconds. The maximum amplitude is 1.

Digitizing the sine wave

When we digitize a continuous signal like the sine wave, we take samples every 1/sr seconds, where sr is the sampling rate. The continuous time, t, is replaced by the digitized time n/sr, where n is the sample number. We can thus use the formula s_n = A × sin(2πfn/sr), where n is a non-negative integer, to generate samples for a frequency of f cycles per second. To make the arithmetic easier, imagine sr = 10,000 and f = 100. Then n should range over 0 to 100 for one complete cycle to be output. This is pictured below; the x axis is now the sample number.

If f = 1000, with the same sampling rate, then n ranges from 0 to 10 for one cycle, as below.
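As a rough sketch, the sampling formula can be tried out in Python for both cases, using sr = 10,000 and an amplitude of 1. The helper name one_cycle is an assumption made for this example.

```python
import math

def one_cycle(f, sr, A=1.0):
    """Samples s_n = A * sin(2*pi*f*n/sr) for n = 0 .. sr/f (one full cycle)."""
    samples_per_cycle = round(sr / f)
    return [A * math.sin(2 * math.pi * f * n / sr)
            for n in range(samples_per_cycle + 1)]

low = one_cycle(100, 10_000)     # n runs from 0 to 100
high = one_cycle(1000, 10_000)   # n runs from 0 to 10

print(len(low), len(high))
```

The lists include both end points of the cycle, so they hold 101 and 11 samples respectively.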

Clearly, the higher the frequency, the fewer samples we get per cycle and the poorer the sampling is, unless we raise the sampling rate as well. At the Nyquist cut-off frequency of 5,000 (half the sampling rate), n ranges from 0 to 2, just as we expect. In fact, at this frequency the three samples are all zero, which is why we can only represent frequencies less than half the sampling rate.
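The Nyquist case can be illustrated with a short sketch. It assumes sr = 10,000 as above and rounds each sample to a 16-bit integer with a peak of 32,767; the function name quantized is made up for the example.

```python
import math

sr = 10_000
A = 32_767   # assumed peak value for signed 16-bit samples

def quantized(f, count):
    """First `count` samples of a sine at frequency f, rounded to integers."""
    return [round(A * math.sin(2 * math.pi * f * n / sr)) for n in range(count)]

at_nyquist = quantized(5_000, 3)   # exactly half the sampling rate
below = quantized(2_500, 5)        # a quarter of the sampling rate

print(at_nyquist)
print(below)
```

At 5,000 every sample falls on a zero crossing, so the rounded values are all zero. At 2,500 there are four samples per cycle and the peaks are still captured.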

The amplitude A depends on the sample size, ss. For signed samples, the maximum value of A is 2^(b-1) - 1, where b is the number of bits in each sample. For our 16-bit samples this is 2^15 - 1 = 32,767.
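Putting these pieces together, a minimal Python sketch of the generating program might look like the following. The function name write_sine, the file name out.raw, the 440 cycles per second tone, the 2 second duration and the 0.8 level factor are all assumptions chosen for illustration. The samples are written as signed 16-bit little-endian integers.

```python
import math
import struct

MAX_AMP = 2 ** 15 - 1   # 32767, the largest signed 16-bit value

def write_sine(path, f=440.0, T=2.0, sr=44_100, level=0.8):
    """Write T seconds of a mono sine wave as raw 16-bit linear PCM."""
    A = level * MAX_AMP               # stay a little below full scale
    n_samples = int(sr * T)
    with open(path, "wb") as out:
        for n in range(n_samples):
            s = round(A * math.sin(2 * math.pi * f * n / sr))
            out.write(struct.pack("<h", s))   # "<h": little-endian signed 16-bit
    return n_samples

count = write_sine("out.raw")   # hypothetical output file name
print(count, "samples written")
```

With these defaults the file holds sr × T samples of ss = 2 bytes each.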

Playing the file

Assuming we have written samples representing T seconds of our sine wave, we need to send this file to the DAC on the sound card. The command line program aplay can do this. We need to give it the parameters we have set in the file, namely sr, ss and the format. To play a file out.raw of linear PCM samples at 44,100 samples per second in one channel, we need:

aplay -c1 -fS16_LE -r44100 out.raw

S16_LE is the format for signed 16-bit integers stored in ‘little-endian’ byte order, with the least significant byte first. This is the normal byte order on most PCs; some architectures store the bytes the other way round and are ‘big-endian’. With luck, the file will sound out the sine wave at whatever frequency was used in the formula.
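The difference between the two byte orders can be seen by packing one sample both ways with Python's struct module. This is just an illustration; the sample value 1000 is arbitrary.

```python
import struct

sample = 1000                         # 0x03E8 in hexadecimal

little = struct.pack("<h", sample)    # little-endian: low byte first
big = struct.pack(">h", sample)       # big-endian: high byte first

print(little.hex(), big.hex())
```

A file written in the little-endian layout matches S16_LE. If the samples were written big-endian instead, the format given to aplay would need to be the big-endian one (S16_BE) for the file to play correctly.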