Part I: Fourier Transforms and Sampling

This section is concerned with: (1) the relationship between the time variation of a signal and its frequency spectrum, and (2) digital signals, such as music on a compact disk. This material is useful background for Part II, digital filters. However, the theory is not essential for understanding the advantages of digital filters, which can be directly appreciated by viewing the performance comparison of one analog and three digital filters in Part II. Part III, maximum length sequences, considers a specialized type of digital filter used in audio measurements.

Another application of digital signal processing is to compensate for room acoustics. Commercial products are available, I believe, but the links I had here before no longer work. (Thanks to Don Maurer for raising the issue of room compensation.) Don also mentioned a book on digital signal processing which he has found useful.

Back to the main Table of Contents

Part I consists of:

Continuous Signals and the Fourier Transform
Impulse Response
Sampled Signals and the Discrete Fourier Transform
Resolution, Bandwidth, and Sample Time
Periodicity
Leakage and Allowable Time-delays
Aliasing and the Sampling Theorem
Signal Reconstruction
Relationship of Fourier Transforms and Fourier Series
Appendix A: Derivation of the Sampling Theorem
Appendix B: The Convolution Theorem and Windowing

Continuous Signals and the Fourier Transform

A physical signal, such as sound pressure, or the output voltage of an amplifier, can be represented as a continuous function of time. For example, a pure tone produces a sound pressure proportional to f(t)=sin(ωot). This is the time domain representation of the sound. There is an equally valid frequency domain representation of a pure tone. There are two points that can be conceptually difficult. First of all, in the real world the tone can't last for an infinite period of time. As shown in Appendix B, a "pure" tone will therefore contain an infinite spectrum of frequencies. Second, even if the tone were infinite in time, the spectrum would consist of spikes at both ωo and -ωo. Note that "frequency" can mean Hz (cycles per second), or radians per second, the latter usually indicated by use of the symbol ω.

The mathematical expression sin(ωt) = [exp(jωt) - exp(-jωt)]/2j for a sine wave illustrates why negative frequencies are necessary. The positive and negative parts of the spectrum combine to produce a real function at a positive frequency. If the negative frequency were not there, the time function would contain an imaginary part. Negative frequencies are simply a result of the way the mathematical analysis is structured.
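To see why the spectrum of an ideal, infinitely long tone consists of spikes at both ωo and -ωo, transform the two exponentials term by term. A sketch, using the transform convention written out below (the normalization is an assumption here):

\[
\sin(\omega_o t) = \frac{e^{j\omega_o t} - e^{-j\omega_o t}}{2j}
\quad\Longrightarrow\quad
F(\omega) = \frac{\pi}{j}\Big[\delta(\omega - \omega_o) - \delta(\omega + \omega_o)\Big].
\]

Each exponential contributes one spike, and both are needed for the sum to be real.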

The significance of the duration of the tone is quite real. I originally included a file consisting of a single cycle of a 440 Hz tone, and subsequently found out that many computer sound systems did not reproduce any sound at all! It sounds kind of like a feeble thunk or tap. Here is a 1/2 second sample [86 kb] of the tone. It takes around 10 cycles [4kb] before the signal begins to sound at all like a tone, and with 25 cycles [10kb] it begins to sound much like the continuous tone. The spectrum of one cycle of a signal is discussed further below.

The frequency domain representation is very convenient for filter design and analysis, among other applications. The magnitude of this function is normally called the "frequency response," but it is important to realize that the phase must also be specified for the frequency domain representation to be complete.

A Fourier transform pair mathematically relates the time and frequency domain representations,
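for example, in one common convention (an assumption here; other texts place the factor of 2π differently):

\[
F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt,
\qquad
f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\omega)\, e^{j\omega t}\, d\omega .
\]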

In the real world f(t) must be a real function, which implies that F must satisfy F(-ω)=F*(ω), where * denotes the complex conjugate. This is the mathematical shorthand for the need for negative frequencies.

As noted above, if f(t) exists for a finite time window, the spectrum spreads over an infinite frequency range. As an example, see the spectrum of one cycle of a 440 Hz tone [42 kb]. Not only is the spectrum spread out, but the maximum occurs at 368 Hz (the red circle) rather than 440. This shift is a consequence of the sidelobes of the spectrum peak at -ωo spilling into the positive frequency region (see the derivation in Appendix B).
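The shifted peak is easy to reproduce numerically. Below is a minimal sketch (not the code used to generate the linked plot): it builds one cycle of a 440 Hz tone at the CD sample rate, zero-pads it heavily so the spectrum is sampled finely, and locates the peak, which should land near the 368 Hz noted above rather than at 440 Hz.

    import numpy as np

    # A sketch, not the author's code: spectrum of one cycle of a 440 Hz tone
    fs = 44100                        # CD sample rate, Hz
    f0 = 440.0                        # tone frequency, Hz
    n = int(round(fs / f0))           # samples in one cycle (about 100)
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * f0 * t)    # a single cycle, zero everywhere else

    N = 2 ** 18                       # heavy zero padding gives a finely sampled spectrum
    spectrum = np.abs(np.fft.rfft(x, N))
    freqs = np.fft.rfftfreq(N, d=1.0 / fs)
    print("peak at %.0f Hz" % freqs[np.argmax(spectrum)])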

A Fourier transform is a linear relationship, but this should not be confused with possible non-linearities in the system that produces f(t). The only place that linearity of the system itself arises in the context of this analysis is in connection with convolution, as also discussed in Appendix B.

Back to the list of subsections

Impulse Response

If f(t)=δ(t), (a delta function, an infinitely narrow voltage spike at t=0), then F(ω)=1. This is an ideal "flat" frequency response, and δ(t) is the corresponding ideal time domain response. The starting point for measuring time is totally arbitrary, so if the function f is delayed by a time to, it is really the same signal. For any function, a time delay multiplies the frequency domain response by exp(-jωto). Conversely, phase that varies linearly with frequency simply represents a shift in the time origin.
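The time-delay property follows in one line from the transform pair above (a sketch, using the same convention): substituting t' = t - to,

\[
\int_{-\infty}^{\infty} f(t - t_o)\, e^{-j\omega t}\, dt
= e^{-j\omega t_o}\int_{-\infty}^{\infty} f(t')\, e^{-j\omega t'}\, dt'
= e^{-j\omega t_o}\, F(\omega).
\]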

The response of a system to a delta function input is called its impulse response. The frequency response of a filter, or any system, is equal to the Fourier transform of its impulse response. This is an exact relationship, and is commonly used by measurement systems (such as CLIO) to determine the amplitude and phase response of a loudspeaker.

Back to the list of subsections

Sampled Signals and the Discrete Fourier Transform

A compact disk represents a music signal f(t) by specifying its value at a sample rate of Fs=44,100 times per second. In addition to being sampled in time, the signal is quantized to 16 bits, giving 2^16 = 65,536 discrete amplitude levels, corresponding to a dynamic range of about 96 dB.
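The 96 dB figure is the ratio of the largest representable amplitude to one quantization step (a rule-of-thumb estimate that ignores refinements such as dither):

\[
20\log_{10}\!\big(2^{16}\big) \approx 16 \times 6.02\ \mathrm{dB} \approx 96\ \mathrm{dB}.
\]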

Discrete time and frequency representations are related by the discrete Fourier transform (DFT) pair. The fast Fourier transform (FFT) is a very efficient numerical method for computing a discrete Fourier transform, and is an extremely important factor in modern digital signal processing.

To derive the DFT, we begin with a continuous function f(t) defined by

 where the frequency bins are specified by

Then define

where the sample times are specified by

The basis for a discrete Fourier transform is to relate the sampling interval Δt and the frequency interval Δω by

Then writing

The inside summation can be re-written in the form of a geometric series

Using equation (6), the numerator for the m≠p terms is always zero, while the denominator is never zero as long as 0≤m≤N-1 and 0≤p≤N-1. Therefore F(ωp)=ap. Then for the discrete values tn in equation (2) and ωm in equation (4), the two equations form a discrete Fourier transform pair.
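In outline, with ωm = mΔω, tn = nΔt, and ΔωΔt = 2π/N, the steps work out as follows (a sketch; the normalization chosen here is an assumption):

\[
f(t) = \sum_{p=0}^{N-1} a_p\, e^{j\omega_p t},
\qquad
F(\omega_m) \equiv \frac{1}{N}\sum_{n=0}^{N-1} f(t_n)\, e^{-j\omega_m t_n}
= \frac{1}{N}\sum_{p=0}^{N-1} a_p \sum_{n=0}^{N-1} e^{j(\omega_p - \omega_m)\, n \Delta t}.
\]

The inner sum is the geometric series

\[
\sum_{n=0}^{N-1} e^{j(\omega_p - \omega_m)\, n \Delta t}
= \frac{1 - e^{j(\omega_p - \omega_m) N \Delta t}}{1 - e^{j(\omega_p - \omega_m)\Delta t}},
\]

and since (ωp - ωm)NΔt = 2π(p - m), the numerator vanishes for p≠m while the denominator does not, leaving only the p=m term, which sums to N. Hence F(ωm)=am, and

\[
f(t_n) = \sum_{m=0}^{N-1} F(\omega_m)\, e^{j\omega_m t_n}
\]

recovers the time samples from the frequency samples.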

For a sampled system, the impulse is a time sequence that is zero for every sample except one; the F(ωm) that results is again flat. Shifting the time of the non-zero sample linearly shifts the phase of F, analogous to the continuous case.
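A quick numerical check of this statement (a sketch, not tied to any particular measurement system): the FFT of a unit sample at n = 0 is flat with zero phase, and delaying the sample by three intervals multiplies each bin by a linear-phase factor.

    import numpy as np

    # A sketch: flat spectrum of a unit sample, and linear phase from a delay
    N = 16
    x0 = np.zeros(N); x0[0] = 1.0     # unit sample at n = 0
    x3 = np.zeros(N); x3[3] = 1.0     # the same sample delayed by 3 intervals

    m = np.arange(N)
    print(np.allclose(np.fft.fft(x0), np.ones(N)))                        # True: flat, zero phase
    print(np.allclose(np.fft.fft(x3), np.exp(-2j * np.pi * 3 * m / N)))   # True: linear phase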

Back to the list of subsections

Resolution, Bandwidth, and Sample Time

The intervals Δω and Δt represent the resolution, or "graininess" of the responses, and represent a kind of uncertainty principle. The frequency interval Δω/2π Hz is equal to 1/(NΔt), the reciprocal of the total sample time. This can be interpreted as follows: frequency can be determined by counting zero-crossings, each corresponding to a half-cycle. Suppose that in a one-second interval there are 6 zero-crossings. Depending on the phase of the signal at the start and finish, this could be produced by any frequency between 2.5 and 3.5 Hz. In a two-second interval, there would be 12 zero-crossings, which could be produced by any frequency between 2.75 and 3.25 Hz. So there is always an uncertainty in frequency equal to the reciprocal of the total time sample length.
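As a worked example (the record length here is arbitrary): a record of N = 4410 samples at the CD rate lasts NΔt = 0.1 s, so the frequency resolution is

\[
\Delta f = \frac{1}{N\,\Delta t} = \frac{1}{0.1\ \mathrm{s}} = 10\ \mathrm{Hz}.
\]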

There is a similar relationship between the time interval and the total bandwidth. The total bandwidth in Hz equals 1/Δt. This corresponds to the maximum value of ωm in equation (2), and for a compact disk it is 44,100 Hz. But as discussed below, the "real" bandwidth is 22,050 Hz.

Frequency resolution can be improved to whatever degree is desired by zero padding, appending a string of zeros to the time domain sequence to increase the total sample time. The cost is increased data processing.

Back to the list of subsections

Periodicity

A unique feature of the discrete Fourier transform, in contrast to the Fourier transform, is that both resulting representations are periodic: f(t) repeats with a period equal to the total sample time NΔt, and F(ω) repeats with a period equal to the total bandwidth NΔω. For any integer p

Equation (9) can be used to re-write the summation over frequency in terms of negative and positive frequencies. The most efficient form of an FFT requires that N be a power of two, so we will assume that N is even. We then can write the summation

In this form we see that for f(t) to be real we must again have F(-ωm)=F*(ωm), and F(0) and F(ωN/2) must be real. It is also clear that the highest frequency is 1/(2Δt)=Fs/2, or 22,050 Hz for a compact disk. What appear to be higher frequencies in equation (2) are really just the negative frequencies shifted one period higher. (But the full bandwidth, covering positive and negative frequencies, is still 44,100 Hz).
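A plausible explicit form of these relations, under the conventions sketched earlier (assumed here, since normalizations vary), is

\[
f(t + pN\Delta t) = f(t), \qquad F(\omega + pN\Delta\omega) = F(\omega),
\]

and, for even N,

\[
f(t_n) = \sum_{m=-N/2+1}^{N/2} F(\omega_m)\, e^{j\omega_m t_n}
= F(0) + \sum_{m=1}^{N/2-1}\Big[ F(\omega_m)\, e^{j\omega_m t_n} + F(-\omega_m)\, e^{-j\omega_m t_n}\Big] + F(\omega_{N/2})\,(-1)^n ,
\]

where the last term uses exp(jωN/2 tn) = exp(jπn) = (-1)^n.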

Back to the list of subsections

Leakage and Allowable Time-delays

The periodicity and symmetry of F result in constraints on time-delays. For a delay to, F(ωm) would be multiplied by exp(-jωmto). But this term will not have the correct periodicity unless to is an integer multiple of Δt. If F is phase-shifted in violation of this condition, drastic changes occur if f(t) is computed using equation (2), with all positive values of ω. If equation (10) is used, the differences are smaller, but f(t) is still not perfectly time-shifted unless this condition is satisfied. Normally time can simply be defined to be restricted to these discrete intervals, and this is not a problem.
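To see the constraint explicitly (a short sketch): the shifted spectrum must repeat with the same period NΔω as F itself, and

\[
e^{-j(\omega_m + N\Delta\omega)\, t_o} = e^{-j\omega_m t_o}\, e^{-j 2\pi t_o/\Delta t},
\]

which equals exp(-jωmto) only when to/Δt is an integer.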

The equivalent situation in the frequency domain is more problematic. If we are given a signal that contains a frequency that doesn't fall exactly on a multiple of Δω, the FFT of the signal will in general contain responses in all of the frequency bins. This is called leakage. Leakage can be minimized by multiplying the time waveform by a window that tapers the amplitudes at each end of the full time sample, as discussed in Appendix B. Leakage also reduces the peak of the FFT response, an effect called processing loss, or scalloping loss.
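A small numerical illustration (a sketch; the tone frequency and the Hanning window are arbitrary choices): a tone that falls exactly between two bins spreads energy across the whole spectrum, and windowing pulls the distant leakage far down.

    import numpy as np

    # A sketch of leakage: a tone halfway between bins, with and without a window
    N = 1024
    fs = 1024.0                        # chosen so each bin is exactly 1 Hz wide
    t = np.arange(N) / fs
    x = np.sin(2 * np.pi * 100.5 * t)  # 100.5 Hz falls exactly between two bins

    raw = np.abs(np.fft.rfft(x))                    # boxcar (no window)
    win = np.abs(np.fft.rfft(x * np.hanning(N)))    # Hanning window

    far = slice(300, 500)              # bins far away from the tone
    print("boxcar  leakage: %6.1f dB" % (20 * np.log10(raw[far].max() / raw.max())))
    print("hanning leakage: %6.1f dB" % (20 * np.log10(win[far].max() / win.max())))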

Any operation on F, for example if F is modified by a filter response, must preserve the conditions stated following equation (10), in order to keep the response real.

 Back to the list of subsections

Aliasing and the Sampling Theorem

A rather remarkable theorem1 states that if a music sample (or any signal) does not contain any frequencies higher than fo Hz, it can be perfectly reproduced by sampling the signal at intervals of Δt = 1/(2fo), i.e. at a rate of 2fo samples per second. It is not surprising that a signal can be closely approximated by sampling, but it's amazing that it can be done with as few as two samples per cycle (almost - see discussion below), and with no error at all! A simple derivation of this important theorem is included in Appendix A. There are three practical qualifications which spoil the perfection: (1) A signal that is finite in time cannot be perfectly band-limited; (2) digitizing the amplitude levels introduces error; and (3) the theorem is based on signal reconstruction utilizing a series of sinc functions (see Appendix A), each of which is infinite in time, which in reality must of course be truncated.

The first qualification is related to the flip side of this theorem: when there actually are higher frequencies present, they will be falsely reproduced as lower frequencies - an effect called aliasing. This is easily shown by the relationship
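between a tone at frequency f and its image at 2fo - f, evaluated at the sample times tn = n/(2fo) (a sketch, written here for a cosine):

\[
\cos\!\big(2\pi(2f_o - f)\, t_n\big) = \cos\!\big(2\pi n - 2\pi f\, t_n\big) = \cos\!\big(2\pi f\, t_n\big).
\]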

Therefore at the sample points tn, a signal of frequency 2fo-f will have exactly the same values as a signal of frequency f. For example, for a compact disk where fo=22,050 Hz, a signal at a frequency of 34,100 Hz will look exactly like a signal at a frequency of 44,100-34,100=10,000 Hz. This can be shown graphically as well [44kb]. To avoid this problem, music is passed through an anti-aliasing filter, to attenuate frequencies above 22,050 Hz, prior to sampling. This requires a steep cutoff (brickwall) filter that can affect the sound quality if not done carefully.

The second qualification means that enough bits must be used to make the quantization error negligible.

Back to the list of subsections

Signal Reconstruction

The third qualification means that the digital-to-analog (D/A) converter must closely approximate the sinc series to obtain accurate reconstruction. It may not be obvious, but the sample values themselves do not necessarily faithfully follow the waveform contour. An example of an 8 Hz tone sampled at 20 Hz [45kb] shows that the sample values fluctuate at a 4 Hz rate. The black curve is the tone; the small red circles are the sample values; the red curves are the sinc functions associated with each sample value. When all of the red curves are summed, the result overlays the black curve. (Actually there is a minor imperfection due to the finite 5-second length of the sample, but it is visible only within a cycle or two from the end points).
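The sinc summation is easy to verify numerically. The sketch below (not the code behind the linked figure; only the 8 Hz, 20 Hz, and 5 second values are taken from the example above) sums one sinc function per sample and compares the result with the original tone away from the end points.

    import numpy as np

    # A sketch of sinc reconstruction of an 8 Hz tone sampled at 20 Hz
    fs, f0, T = 20.0, 8.0, 5.0                 # sample rate, tone frequency, record length
    tn = np.arange(0, T, 1 / fs)               # sample times
    samples = np.sin(2 * np.pi * f0 * tn)      # the sampled values

    t = np.linspace(0, T, 4001)                # fine time grid for the reconstruction
    recon = sum(s * np.sinc((t - tk) * fs) for s, tk in zip(samples, tn))

    mid = (t > 1.0) & (t < 4.0)                # stay away from the end points
    err = np.max(np.abs(recon[mid] - np.sin(2 * np.pi * f0 * t[mid])))
    print("max error away from the ends: %.3g" % err)   # small; the residual is the end effect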

An ideal D/A would reproduce the signal in the following way: each output sample is converted into a very short voltage pulse, which is put through a low pass filter with a sinc function time response. The difficulty is that such a filter would have a response of 1.0 for frequencies less than Fs/2, and zero above Fs/2 - i.e. an infinite cutoff slope, impossible to achieve in the real world. Early CD players did strive to approach this response with analog brickwall filters, exactly like the anti-aliasing input filters (in this case called anti-imaging filters). Current practice is to oversample the output sequence. As an example, consider oversampling by a factor of four; three new samples of value zero are added between every pair of old samples. For a CD this makes the new output sample rate 176,400 per second. Then the sequence can be digitally filtered, where the ideal response can be approached with much less difficulty.

Consider the sum of the sinc functions in the 8 Hz example at time to. The red curves for samples at future times t>to are just as significant in the summation as those for past values. In practice this doesn't violate causality, it just means that the digital filter output needs to be delayed to include these contributions. The oversampling filter coefficients are simply the values of the desired sinc function at the sample times, and the series is truncated when the sinc function is sufficiently decayed. The truncation point determines the delay.
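As a concrete sketch of this procedure (the truncation length is an arbitrary choice here, and no additional window is applied, so this is not a description of any particular player): zero-stuff the sequence by four, convolve with a truncated sinc whose zeros fall on the original sample positions, and remove the filter delay.

    import numpy as np

    def oversample4(x, taps_per_side=16):
        """Zero-stuff by 4 and filter with a truncated-sinc interpolator (a sketch)."""
        up = np.zeros(4 * len(x))
        up[::4] = x                                   # three zeros between old samples
        k = np.arange(-4 * taps_per_side, 4 * taps_per_side + 1)
        h = np.sinc(k / 4.0)                          # ideal low-pass at the original Fs/2, truncated
        y = np.convolve(up, h)                        # filtering introduces a delay of 4*taps_per_side
        return y[4 * taps_per_side : 4 * taps_per_side + len(up)]

    fs = 44100
    n = np.arange(256)
    x = np.sin(2 * np.pi * 1000 * n / fs)             # a 1 kHz tone at the CD rate
    y = oversample4(x)                                # the same tone at 176,400 samples per second
    print(len(x), len(y))                             # 256 1024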

The resulting spectrum still repeats periodically, but now every 176,400 Hz instead of 44,100 Hz. Therefore a final analog filter is still required to eliminate frequencies above 88,200 Hz, but now the cutoff slope can be relatively gentle.

A subtle (and potentially confusing) point is that the highest frequency must be slightly less than, not equal to, Fs/2. This is obvious from the fact that the samples of a sine wave of frequency exactly equal to Fs/2 could all be zero. Mathematically the spectrum in this case consists of delta functions that fall exactly at the end points of the integral of equation (1a) in Appendix A. This is invalid and will not yield the correct result for f(t). But with an infinitesimal reduction in frequency all is well. Studying the example of the 8 Hz tone sampled at 20 Hz [45 kb] makes it clear that as the signal frequency approaches Fs/2, more and more sinc terms are required. Therefore if one attempts to approximate the sinc function with a truncated version, the required truncation point stretches out as the frequency approaches the limit. For this reason it is highly desirable to maintain a guard-band between the highest frequency and the theoretical limit.

Back to the list of subsections

Relationship of Fourier Transforms and Fourier Series

For a Fourier series the time function is periodic, but the frequency function is not. The Fourier Series is a limiting case of the discrete Fourier transform, where the sample interval Δt → 0. Then the bandwidth becomes infinite, and there is no periodicity in the frequency domain.

If in addition, NΔt → ∞ , then Δω → 0, and the result is a Fourier transform.

Back to the list of subsections

Appendix A: Derivation of the Sampling Theorem

We assume all functions are mathematically well behaved (e.g. continuous). A function f(t) band-limited to ±ωo means, by definition, that it can be expressed by a Fourier integral over this range,

Within this range F(ω) can be expressed as a Fourier series,

Then

Which is easily evaluated to be

Then since

The result is
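In outline, the steps work out as follows, using the transform convention from Part I with Δt = π/ωo = 1/(2fo) (a sketch; the normalization and intermediate constants are assumptions here). The band-limited signal and the Fourier series for its spectrum are

\[
f(t) = \frac{1}{2\pi}\int_{-\omega_o}^{\omega_o} F(\omega)\, e^{j\omega t}\, d\omega,
\qquad
F(\omega) = \sum_{n=-\infty}^{\infty} c_n\, e^{-j n\pi\omega/\omega_o},
\qquad
c_n = \frac{1}{2\omega_o}\int_{-\omega_o}^{\omega_o} F(\omega)\, e^{j n\pi\omega/\omega_o}\, d\omega
= \frac{\pi}{\omega_o}\, f(n\Delta t).
\]

Substituting the series back into the integral and integrating term by term gives

\[
f(t) = \sum_{n=-\infty}^{\infty} f(n\Delta t)\,
\frac{\sin\!\big(\omega_o (t - n\Delta t)\big)}{\omega_o (t - n\Delta t)},
\]

so the signal is completely determined by its samples, each weighting one sinc function.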

 Back to the list of subsections

Appendix B: The Convolution Theorem and Windowing

An important signal processing tool is the convolution theorem. Given two time domain functions f(t) and h(t), and their Fourier transforms F(ω) and H(ω),

convolution is defined by

Convolution is a linear process, so g(t) must be a linear function of f(t) to be expressed by equation (1b). The convolution theorem states that the Fourier transform of g(t) is

Similarly, the Fourier transform of the product of f(t) and h(t) is equal to the convolution of F(ω) and H(ω).
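Explicitly, in the convention used in Part I (a sketch; the placement of the 1/2π factor in the second statement depends on the normalization):

\[
g(t) = \int_{-\infty}^{\infty} f(\tau)\, h(t - \tau)\, d\tau
\;\Longleftrightarrow\;
G(\omega) = F(\omega)\, H(\omega),
\qquad
f(t)\, h(t)
\;\Longleftrightarrow\;
\frac{1}{2\pi}\int_{-\infty}^{\infty} F(\nu)\, H(\omega - \nu)\, d\nu .
\]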

There is exactly the same kind of relationship for sampled signals; the integral in equation (1b) is replaced by a summation, and the continuous times and frequencies are replaced by discrete values.

This relationship between convolution in one domain and multiplication in the other is useful in filter design. If H(ω) is the response of a filter, h(t) is its impulse response, and equation (1b) shows that the filter response g(t) is the convolution of its impulse response and its input f(t). For the sampled case each input sample generates an impulse response weighted by the input value f(tn), and the convolution theorem simply states that the total response is the superposition of these impulse responses.

The convolution theorem also helps illustrate the behavior of windowing. Consider the case of a sine wave and a time window h(t):

The Fourier transforms are

Therefore, the Fourier transform of a sine wave that exists only during a time period of length T is the convolution of F(ω) and H(ω)
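With f(t) = sin(ωot) and a boxcar window h(t) equal to one for 0 ≤ t ≤ T and zero elsewhere, this works out to (a sketch; the phase factor depends on where the window is centered):

\[
H(\omega) = T\,\frac{\sin(\omega T/2)}{\omega T/2}\, e^{-j\omega T/2},
\qquad
F_w(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\nu)\, H(\omega - \nu)\, d\nu
= \frac{1}{2j}\Big[H(\omega - \omega_o) - H(\omega + \omega_o)\Big],
\]

where F(ω) = (π/j)[δ(ω - ωo) - δ(ω + ωo)] is the transform of the infinite sine wave from Part I.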

The example of this type of function mentioned in the text, one cycle of a 440 Hz tone [42kb], exhibits a spectrum with sidelobes that extend from each maximum to ±∞. As noted above, the spectrum peak occurs at 368 Hz (red circle) instead of 440 Hz due to the sidelobes of the negative frequency peak spilling over into the positive frequency region.

The convolution theorem also proves that a signal that is finite in time has an infinite spectrum: the response can always be expressed as a convolution with a sinc function, which extends the spectrum to infinity. The symmetrical case is that a strictly band-limited function must be infinite in time.

A time window which is unity over the interval T and zero elsewhere is called a boxcar, and produces sidelobes defined by the sinc function. If the window is replaced by a function that is tapered over the interval, the sidelobes can be very much reduced. There are many such window functions, some of the most popular being the Hanning, Hamming, Blackman, Chebyshev, and Kaiser functions.

Back to the list of subsections

______________________

1. The sampling theorem is commonly attributed to either Harry Nyquist or Claude Shannon. Pohlmann refers to even earlier proofs by E. T. Whittaker and K. Ogura.