Non-linear Behavior of the Ear

The more I learn about the ear, the more I am amazed. My most recent learning experience was reading a very interesting review, "Nonlinear Cochlear Signal Processing" by Jont Allen (Chapter 19 of Physiology of the Ear, Jahn et al.), which is the source for some of the material here.

Non-linearity and Distortion

Non-linearity in a sound system produces unwanted harmonic and intermodulation distortion, and perfect linearity is the ideal. For the ear, however, non-linear behavior is far from being a flaw; it is a critical feature that enables the large dynamic range of human hearing. The inner hair cells (IHC) of the cochlea, which convert sound to nerve impulses, have a dynamic range of less than 50 dB. But we can hear over a 120 dB dynamic range! How is this possible? It turns out that the ears have a built-in sound level compression system, created by the outer hair cells (OHC) of the cochlea. In the most active region of the cochlear basilar membrane, a 4 dB increase in sound pressure at the eardrum increases the membrane motion by as little as 1 dB, due to the mechanical action of the OHC.

The non-linearity of the ear has been known for over a century, but only relatively recently were the OHC of the cochlea identified as the primary cause. The middle ear is quite linear over sound pressures of 40 to 110 dB SPL, and does not produce noticeable distortion at normal listening levels (Hartmann page 512). The inner ear non-linearity does produce distortion, which can be heard, and can be measured in the ear canal. In fact, the measurement of distortion products in the ear canal is used as a hearing test for newborn infants, since the distortion products are absent for certain forms of hearing impairment!

Intermodulation Products

In principle the ear non-linearity could be expressed as a power series; that is, a response with a term linearly proportional to the sound pressure, plus a term proportional to the square of the pressure, a term proportional to the cube of the pressure, and so on. If two frequencies f1 and f2 are present, a square term produces intermodulation products at the sum and difference frequencies, f1+f2 and f2-f1. A cubic term produces products at twice one frequency plus or minus the other frequency, such as 2f1-f2 and 2f1+f2.
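The products of the square and cubic terms can be checked numerically. The sketch below is only an illustration: the tone frequencies (1 kHz and 1.2 kHz), the 48 kHz sample rate, the unit tone amplitudes, and the helper name tone_amplitude are all assumptions chosen for convenience, not anything taken from the references.

```python
import math

FS = 48_000            # sample rate in Hz (assumed)
N = 2_400              # 50 ms window: a whole number of cycles of every product
F1, F2 = 1_000, 1_200  # example tone frequencies in Hz (assumed)


def tone_amplitude(samples, freq_hz, fs_hz):
    """Amplitude of the freq_hz component, from a single-frequency DFT."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / fs_hz)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / fs_hz)
             for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


# Two equal-amplitude tones, then the square and cubic terms of a power series.
two_tone = [math.cos(2 * math.pi * F1 * i / FS) + math.cos(2 * math.pi * F2 * i / FS)
            for i in range(N)]
square_term = [x ** 2 for x in two_tone]
cubic_term = [x ** 3 for x in two_tone]


def products(samples, candidates):
    """Map each candidate frequency to its amplitude, rounded to 3 places."""
    return {f: round(tone_amplitude(samples, f, FS), 3) for f in candidates}


# Square term: sum and difference frequencies.
square_products = products(square_term, [F2 - F1, F1 + F2])
# Cubic term: twice one frequency plus and minus the other.
cubic_products = products(cubic_term,
                          [2 * F1 - F2, 2 * F1 + F2, 2 * F2 - F1, 2 * F2 + F1])

print("square term:", square_products)
print("cubic term: ", cubic_products)
```

The square term also contains a constant offset and the second harmonics 2f1 and 2f2, and the cubic term also contains the third harmonics and extra energy at f1 and f2; those are left out of the candidate lists to keep the focus on the intermodulation products.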

The ear response is approximately proportional to the cube root of the sound pressure. Offhand I don't know of an analytical expression for the intermod products in this case, but I ran an FFT of a cube-root response with excitation frequencies of 1 kHz and 1.2 kHz, and I got spectral lines at every multiple of 200 Hz, i.e. 200, 400, 600, etc. So this type of response produces a rich harvest of intermodulation products.

An intriguing question is: if there are two frequencies above the limit of human hearing, say at 23 kHz and 24 kHz, would the non-linearity cause audible intermodulation products below 20 kHz? Everest states (page 55) that a difference tone of 1 kHz can be heard in this case. I did several tests to see if I could hear such a product, and I could not. Hartmann also states (page 514), without mentioning the frequencies, that a difference tone can be heard for tone levels greater than 50 dB SPL. I also could not hear any difference tones when listening to 4 kHz and 5 kHz tones. I thought this indicated that the non-linearities of the ear were virtually inaudible. Not true! It turns out that the difference tone at 2f1-f2 is quite audible, for certain choices of frequencies.

The ear apparently also produces harmonic distortion. This is difficult to detect directly, but there is convincing indirect evidence from tests on the audibility of phase differences between a tone and its second harmonic (see the section on audibility of phase).

Localization on the Basilar Membrane

The basilar membrane of the cochlea is about 35 mm long. A pure tone creates a vibration that is highly localized along this length. The region of maximum excitation is narrow, and its position moves about 5 mm per octave, with the lowest frequencies furthest along the membrane. The vibration can decay by as much as 100 dB per mm away from the point of maximum excitation.
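The 5 mm per octave figure gives a quick estimate of how far apart two tones land on the membrane. The sketch below assumes that figure holds uniformly over the range of interest, which is a simplification; the function name is invented for the example.

```python
import math

MM_PER_OCTAVE = 5.0  # figure quoted above; assumed uniform along the membrane


def place_separation_mm(f1_hz, f2_hz, mm_per_octave=MM_PER_OCTAVE):
    """Approximate distance between the excitation peaks of two pure tones."""
    octaves = abs(math.log2(f2_hz / f1_hz))
    return mm_per_octave * octaves


octave_gap = place_separation_mm(1_000, 2_000)  # one octave apart
gap_4k_5k = place_separation_mm(4_000, 5_000)   # a major third apart

print(f"1 kHz vs 2 kHz: {octave_gap:.2f} mm")
print(f"4 kHz vs 5 kHz: {gap_4k_5k:.2f} mm")
```

Under this assumption, tones at 4 kHz and 5 kHz peak roughly 1.6 mm apart.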

This means that when I did my test using 4 kHz and 5 kHz tones, there were two narrow, separated regions of the membrane that were vibrating. Therefore there was essentially no OHC interaction between the two frequencies, and thus no intermod products. A necessary condition to obtain distortion products is to have two frequencies fairly close together, such as 1 kHz and 1.2 kHz. Hartmann (page 257) states that two frequencies have to be within a critical band to produce distortion products. The critical band at 4 kHz is 456 Hz, using Hartmann's Cambridge equation.
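The 456 Hz figure can be reproduced with the equivalent rectangular bandwidth formula of Glasberg and Moore, ERB = 24.7(4.37F + 1) Hz with F in kHz. That this is the "Cambridge equation" meant here is an assumption, although it does give the quoted value; the function name is invented for the example.

```python
def cambridge_erb_hz(freq_hz):
    """Critical bandwidth in Hz, assuming ERB = 24.7 * (4.37 * F + 1), F in kHz."""
    freq_khz = freq_hz / 1000.0
    return 24.7 * (4.37 * freq_khz + 1.0)


erb_4k = cambridge_erb_hz(4_000)
spacing_hz = 5_000 - 4_000          # spacing of the 4 kHz and 5 kHz test tones
within_one_band = spacing_hz <= erb_4k

print(f"critical band at 4 kHz: {erb_4k:.0f} Hz")
print(f"tone spacing {spacing_hz} Hz within one band: {within_one_band}")
```

With these numbers the 1 kHz spacing of the 4 kHz and 5 kHz tones is more than twice the critical bandwidth at 4 kHz.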

Frequencies above 20 kHz fall outside of the range of active regions on the membrane, and thus create little OHC excitation. Therefore intermodulation products between very high frequencies seem unlikely. If anyone does a test that indicates that Everest is correct regarding the audibility of these intermods, please let me know.

A New Experiment

Using CoolEdit it is easy to generate a signal containing equal parts of a 1 kHz and a 1.2 kHz tone, and with headphones I can clearly hear an 800 Hz distortion product, which is the 2f1-f2 product (2 x 1 kHz - 1.2 kHz). But is the non-linearity in my ear or somewhere in the sound system? I repeated the test using two separate signal generators as inputs to the left and right channels of my sound system. Thus the two tones are never together until they are radiated into the room. The result was not as clear-cut as with headphones. Each tone by itself has a very clear bell-like sound; the two tones together sound like a murky mess. I can't really say that I could identify a particular distortion tone, but I do believe I was hearing distortion.
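For anyone who wants to try something similar without CoolEdit or signal generators, the sketch below builds a stereo test signal with one tone per channel, so that the tones are not mixed electrically. It is only a sketch under assumed settings: the 44.1 kHz sample rate, 16-bit samples, two-second duration, playback level, and function names are all choices made for the example.

```python
import math
import struct
import wave


def cubic_difference_tone(f1_hz, f2_hz):
    """Frequency of the 2f1-f2 distortion product."""
    return 2 * f1_hz - f2_hz


def two_tone_frames(f1_hz, f2_hz, fs_hz=44_100, seconds=2.0, level=0.4):
    """16-bit stereo frames: f1 only in the left channel, f2 only in the right."""
    frames = bytearray()
    for i in range(int(fs_hz * seconds)):
        t = i / fs_hz
        left = int(32767 * level * math.sin(2 * math.pi * f1_hz * t))
        right = int(32767 * level * math.sin(2 * math.pi * f2_hz * t))
        frames += struct.pack("<hh", left, right)  # little-endian signed 16-bit pair
    return bytes(frames)


def write_wav(path, frames, fs_hz=44_100):
    """Write interleaved 16-bit stereo frames to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(fs_hz)
        wav.writeframes(frames)


expected_product_hz = cubic_difference_tone(1_000, 1_200)
print(f"listen for a distortion product near {expected_product_hz} Hz")
```

Passing the result of two_tone_frames(1000, 1200) to write_wav with a file name of your choice gives a file to play over loudspeakers. The two channels still share a sound card and amplifier, so this is not as strict a separation as two independent generators.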


Masking

It has also been known for a long time that one (masker) tone can make a second, weaker (probe) tone inaudible, even though the probe would be audible by itself. The greatest effect occurs when both frequencies are within the same critical band. The non-linear behavior of the ear has a major effect on masking. For probe frequencies lower than the masker frequency, it takes more than a 1 dB increase in masker intensity to cause a 1 dB increase in the masking threshold. For probe frequencies above the masker frequency, a 1 dB increase in masker intensity can cause a 2.4 dB increase in masking threshold. There is also a minimum masker level below which there is almost no masking, and this level increases as the probe frequency increases. For example, for a masker at 400 Hz and a probe frequency of 450 Hz, there is almost no masking effect until the masker intensity is above 16 dB SPL. For a probe frequency of 3 kHz the minimum masker level is 60 dB SPL. Masking is sensitive to the relative phase of the two tones, as noted in the section on audibility of phase.
