Sound Demo Files, HRTFs

Each music file is roughly 2MB in size, and to avoid a surcharge from my ISP, I am only posting a few of them on this site. If you would like to obtain the files that are not posted, send me an e-mail, and I will send them to you. Please ask for a maximum of 4 specific files per request. (If anyone has storage to spare and would be willing to host the files, let me know.) There is a summary list of all of the available demo files at the bottom of this page, which indicates the files that can be immediately downloaded.

I recommend good quality headphones for listening to these files (I use Sennheiser HD-545's). Otherwise effects will be washed out by the quality of computer speakers or the room you are sitting in. Standard computer sound cards are not the ultimate in fidelity, but they are adequate.

Arny Krueger has posted free downloadable software for performing ABX tests to compare two .wav files. His site also includes a selection of sample files to illustrate the effect of filtering out high frequencies, truncating bits, etc. One way to make A-B comparisons of multiple sound files is to open multiple Media Player windows, one for each file, and then play the files quickly in sequence. But Arny's ABX tester is much better.

Reference Files

All music demos are based on a comparison of a reference source vs. the same music perturbed in some way. The two reference sources are: (1) a tidbit from Vivaldi's Four Seasons, [1.5 MB] Archiv Produktion CD 400 045-2; and (2) a jazz selection from Joe Sample, [1.7 MB] Warner Bros. CD 946572-2. Both are 16-bit .wav files sampled at 44,100 samples per second, each about 10 seconds long.

Amplifier Distortion

This demo is a set of three music files that mimic the distortion produced by three amplifiers. The average output power in each case is 16% of the maximum power that the amp can deliver. But the total harmonic distortion (THD) spec at this power level varies greatly in the three cases: (1) a single-ended triode tube amp at 4.0% THD; (2) a push-pull pentode tube amp at 0.2% THD, and (3) a solid state amp at <0.001% THD. Details of how the files are created can be found in the section on amplifier distortion.

High Frequencies

Any decent tweeter will produce frequencies up to 20kHz. A lot of music doesn't have much spectral content above 10kHz, but the two reference files used here do have a lot of high frequency content, right up to the 22kHz limit. A good way to test the high-frequency response of your ears is to compare two files containing tones at different frequencies. If you can distinguish between the two, you can hear at least one of them. I did this recently (July 2006) with tones at 13kHz and 14kHz. I am happy to say that my 67-year-old male ears still function up to 13kHz, but that's it.

I applied a brick-wall filter that totally eliminates frequencies above 6, 7, 8, or 10 kHz to the two music reference files. I found the 6 kHz cutoff very easy to detect, and I can reliably detect the 7 kHz cutoff as well. For the 8 kHz cutoff my success rate is not statistically significant.
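As a rough illustration of what a brick-wall filter does (this is a Python sketch, not the code used to make the demo files), one simple approach is to zero every FFT bin above the cutoff and transform back. The function name `brick_wall_lowpass` and the two-tone test signal are assumptions made for the example.

```python
import numpy as np

def brick_wall_lowpass(x, fs, cutoff_hz):
    """Zero every FFT bin above cutoff_hz and return the filtered signal."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0          # remove everything above the cutoff
    return np.fft.irfft(spectrum, n=len(x))

fs = 44100
t = np.arange(fs) / fs                          # one second of samples
# Test signal: a 1 kHz tone plus a 9 kHz tone (stand-in for music).
x = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 9000 * t)
y = brick_wall_lowpass(x, fs, cutoff_hz=8000)   # the 9 kHz tone is removed
```

Filtering a whole file in one FFT like this is fine for a 10-second clip; for long recordings a block-based or FIR approach would be the more usual choice.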

Frequency Discrimination

There are three files: each consists of a pair of tone bursts. The first file contains two bursts at 400 Hz; the second file contains a burst at 400 Hz followed by a burst at 402 Hz; the third file contains a burst at 400 Hz followed by a burst at 404 Hz. See if you can discriminate between the files.
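A minimal sketch of how such tone-burst pairs could be generated is shown here. The burst length, the gap between bursts, and the short fade at each end (to avoid clicks) are all assumptions; the source does not give those details.

```python
import numpy as np

FS = 44100  # samples per second

def tone_burst(freq_hz, dur_s=0.5, ramp_s=0.01, fs=FS):
    """Sine burst with a raised-cosine fade-in and fade-out (assumed values)."""
    n = int(round(dur_s * fs))
    t = np.arange(n) / fs
    burst = np.sin(2 * np.pi * freq_hz * t)
    n_ramp = int(round(ramp_s * fs))
    ramp = 0.5 * (1 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    burst[:n_ramp] *= ramp
    burst[-n_ramp:] *= ramp[::-1]
    return burst

def burst_pair(f1_hz, f2_hz, gap_s=0.25, fs=FS):
    """First burst, a silent gap, then the second burst."""
    gap = np.zeros(int(round(gap_s * fs)))
    return np.concatenate([tone_burst(f1_hz, fs=fs), gap, tone_burst(f2_hz, fs=fs)])

# The three pairs described in the text.
pairs = {
    "400-400": burst_pair(400, 400),
    "400-402": burst_pair(400, 402),
    "400-404": burst_pair(400, 404),
}
```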

Crossover Effects

This is a test of an ideal analog crossover, meaning that: (1) each filter sees a pure resistive load, and (2) the filter outputs sum perfectly and coherently. The first assumption is fairly realistic for an active crossover, but not for a passive crossover. The second assumption requires perfect time-alignment, and even then is only true for a small "sweet spot." Graphs of the crossover responses and more design details can be found here. An ideal Linkwitz-Riley crossover only affects the phase of the summed signals. When I apply the phase change to the reference files I can't hear any difference. Arny also has a Linkwitz-Riley crossover test file on his site.
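The claim that the summed outputs differ from the input only in phase can be checked numerically. The sketch below assumes a fourth-order Linkwitz-Riley crossover (two cascaded second-order Butterworth sections per branch) and a 2.5 kHz crossover frequency; the text does not state the order or the frequency, so both are assumptions.

```python
import cmath
import math

def lr4_sections(f_hz, fc_hz):
    """Low-pass and high-pass responses of an ideal 4th-order Linkwitz-Riley pair."""
    s = 1j * f_hz / fc_hz                      # normalized complex frequency
    butter2 = 1 + math.sqrt(2) * s + s * s     # 2nd-order Butterworth denominator
    low = 1 / butter2 ** 2
    high = s ** 4 / butter2 ** 2
    return low, high

def summed_response(f_hz, fc_hz):
    """Magnitude and phase (degrees) of the low-pass plus high-pass sum."""
    low, high = lr4_sections(f_hz, fc_hz)
    total = low + high
    return abs(total), math.degrees(cmath.phase(total))

fc = 2500.0  # assumed crossover frequency
for f in (100.0, 1000.0, 2500.0, 10000.0):
    mag, phase = summed_response(f, fc)
    print(f"{f:8.0f} Hz  magnitude {mag:.6f}  phase {phase:8.2f} deg")
```

The magnitude comes out as 1 at every frequency while the phase rotates through the crossover region, which is the all-pass behavior described above.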

Volume

Buying a 100 watt amplifier instead of a 200 watt amplifier might cause you to play your music at half the power level, or 3dB lower. A 3dB decrease in the reference files shows what you would be missing (hint: not much, but I paid for lots of watts for my system anyway).
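The arithmetic behind the 3 dB figure is short enough to write out. This sketch converts a power ratio to decibels and then to the scale factor that would be applied to the sample values; the function names are made up for the example.

```python
import math

def power_ratio_to_db(power, power_ref):
    """Level difference in dB between two powers."""
    return 10 * math.log10(power / power_ref)

def db_to_amplitude_factor(db):
    """Scale factor for sample amplitudes that gives the requested dB change."""
    return 10 ** (db / 20)

change_db = power_ratio_to_db(100, 200)       # 100 W instead of 200 W
scale = db_to_amplitude_factor(change_db)     # multiply every sample by this
print(f"{change_db:.2f} dB, amplitude factor {scale:.4f}")
```

Halving the power corresponds to multiplying the waveform by 1/sqrt(2), not by 1/2, because power goes as amplitude squared.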

HRTFs

I really had a lot of fun with this! The starting point is a monophonic sound segment, recorded with 16-bit resolution and 44,100 samples per second. The segment is then convolved with the HRTF impulse response for left and right ears. (More on HRTFs below.) This yields stereo files which are (theoretically) what the ears would receive if the sound source were located at angles of -60, -30, 0, 30, and 60 degrees, where negative angles are to the left and positive angles are to the right. The source is 20 degrees above line of sight. The demo file sequentially plays the sound synthesized for each of the five angles.
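The convolution step can be sketched in a few lines of Python. The impulse responses below are toy stand-ins (a single delayed, scaled sample per ear), not measured KEMAR data, and the name `spatialize` is hypothetical; real use would load the measured left and right impulse responses for the chosen angle.

```python
import numpy as np

def spatialize(mono, hrir_left, hrir_right):
    """Convolve a mono signal with each ear's impulse response; return N x 2 stereo."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.column_stack([left, right])    # requires equal-length responses

# Toy impulse responses for a source on the right: the right ear hears the
# sound earlier and louder than the left ear (values are illustrative only).
hrir_left = np.zeros(64)
hrir_left[12] = 0.5
hrir_right = np.zeros(64)
hrir_right[3] = 1.0

rng = np.random.default_rng(0)
mono = 0.1 * rng.standard_normal(1000)       # stand-in for the mono segment
stereo = spatialize(mono, hrir_left, hrir_right)
```

Repeating this for each angle and concatenating the results would give a file that steps the source across the five positions.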

When I listen to this file I definitely hear the sound coming from different locations, but the illusion is not perfect. The primary reason for this is that my brain has had many years of experience with my own HRTFs, whereas the HRTFs used in the processing were measured using a KEMAR dummy head. (I am a bit surprised that this demo works pretty well with computer speakers as well as headphones.)

There is an HRTF for every angular direction; one example is shown here [48 kb]. This is a graph of the sound level arriving at each eardrum, as a function of time, for a sound source consisting of a single pulse. If the ear received an "ideal" flat frequency response, this graph would consist of one extremely narrow spike for each ear. The fact that the peak of the red curve occurs before the peak of the blue curve reflects the fact that the sound source is closer to the right ear and sound arrives there first. The same geometry results in a stronger response (higher peak) for the right ear.

These two effects, the relative time delay and the relative amplitude response between the left and right ears, are the two primary clues that the human brain uses to determine the location of a sound source. It is not difficult to manipulate the HRTFs to remove the relative time delay, or remove the relative amplitude difference, and that is what I did next to create two new files. The time delay was removed by a simple time-shift, and the amplitude difference by equalizing power. When I listen to these files I find it fascinating that I perceive direction almost as well with only one clue present as with both clues present. This is one example of the redundancy built into data processing in our brains.
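One plausible way to do these two manipulations is sketched below. Aligning the peaks of the two impulse responses is an assumed reading of "a simple time-shift", and scaling both ears to the same total energy is an assumed reading of "equalizing power"; the exact procedure is not given in the text. The impulse responses are toy values.

```python
import numpy as np

def remove_time_delay(h_left, h_right):
    """Shift the later ear's response earlier so both peaks line up (assumed method)."""
    lag = int(np.argmax(np.abs(h_left))) - int(np.argmax(np.abs(h_right)))
    if lag > 0:      # left ear is later: advance it by `lag` samples
        h_left = np.concatenate([h_left[lag:], np.zeros(lag)])
    elif lag < 0:    # right ear is later: advance it instead
        h_right = np.concatenate([h_right[-lag:], np.zeros(-lag)])
    return h_left, h_right

def equalize_power(h_left, h_right):
    """Scale each response so both carry the same total energy (assumed method)."""
    p_left = np.sum(h_left ** 2)
    p_right = np.sum(h_right ** 2)
    target = np.sqrt((p_left + p_right) / 2)
    return h_left * target / np.sqrt(p_left), h_right * target / np.sqrt(p_right)

# Toy responses: the right ear is earlier and stronger than the left.
h_right = np.array([0.0, 0.0, 1.0, 0.3, 0.0, 0.0, 0.0, 0.0])
h_left = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.2, 0.0])

no_delay_left, no_delay_right = remove_time_delay(h_left, h_right)
no_level_left, no_level_right = equalize_power(h_left, h_right)
```

The first pair keeps only the amplitude clue and the second keeps only the timing clue, matching the two extra files described above.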

As a final test I created 3 files similar to the above, but where the HRTFs were replaced with a single spike; therefore the effect is a pure time delay and/or amplitude change. This works fairly well, but the sense of direction is less convincing than using the real HRTFs, and the amplitude-only case was marginal. This would indicate that a simple time shift of the real HRTFs still leaves some directional timing information - I wonder if this is the effect of the pinna?

Reflections and Echoes

These files are based on the -30 degree HRTF-processed file described above. A scaled and time-shifted copy of the file is added to the original to simulate a single reflection. The reflected sound is 3 dB down relative to the original sound, which is roughly the level of reflection from a typical wall. The demo file sequentially plays the original sound, and then the sound plus one echo, at each of the delays.
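Adding a single reflection amounts to summing the signal with a delayed, attenuated copy of itself. The following is a minimal sketch for one channel (a stereo file would be processed one channel at a time); the function name `add_reflection` and the impulse test input are assumptions for the example.

```python
import numpy as np

def add_reflection(x, fs, delay_s, level_db=-3.0):
    """Return x plus one copy of x delayed by delay_s and scaled by level_db."""
    delay_samples = int(round(delay_s * fs))
    gain = 10 ** (level_db / 20)               # -3 dB is a factor of about 0.708
    out = np.zeros(len(x) + delay_samples)
    out[:len(x)] += x                          # direct sound
    out[delay_samples:] += gain * x            # single reflection
    return out

fs = 44100
click = np.zeros(1000)
click[0] = 1.0                                 # a unit impulse makes the echo easy to see
with_echo = add_reflection(click, fs, delay_s=0.001)
```

Note that the delay is rounded to a whole number of samples, so at 44,100 samples per second a 0.1 ms delay becomes 4 samples (about 0.09 ms).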

The first set of files uses relatively long time delays of 25, 50, 100, and 400 milliseconds. These delays correspond to path-length differences of 8.6 to 138 meters between the direct and reflected sound, so they do not have much relevance for home audio systems. They are included here as an interesting experiment regarding the human hearing process. The only case where a distinct "echo" is heard, meaning a repetition of the original sound, is for the longest delay. Shorter delays sound as though you are inside a cave.

The second set of files uses delays of 0.1, 0.5, 1, and 2 milliseconds, corresponding to path-length differences of 3.4 to 69 centimeters, which are relevant for home audio systems. The two shortest delays could arise from diffracted fields from the rim of a tweeter and the edge of a cabinet; the longer delays could arise from wall reflections. A -3 dB relative magnitude for a diffracted wave is really too high, but makes the effect a lot easier to hear. The apparent location of the source moves around, and the frequency response is altered due to comb-filtering. The sound is significantly different, but it is a bit surprising that the effects are not even more audible.
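The path-length figures and the comb-filtering effect both follow from the delay alone. This sketch assumes a speed of sound of 344 m/s (a value that reproduces the distances quoted above) and uses the standard result that a single delayed copy produces notches at odd multiples of 1/(2 x delay).

```python
SPEED_OF_SOUND_M_PER_S = 344.0   # assumed; depends on temperature

def path_difference_m(delay_s):
    """Extra distance traveled by the reflected sound."""
    return SPEED_OF_SOUND_M_PER_S * delay_s

def notch_frequencies_hz(delay_s, count=3):
    """First few comb-filter notch frequencies for a single reflection."""
    return [(2 * k + 1) / (2 * delay_s) for k in range(count)]

for delay_ms in (0.1, 0.5, 1, 2, 25, 50, 100, 400):
    delay_s = delay_ms / 1000
    print(f"{delay_ms:6} ms  path difference {path_difference_m(delay_s):8.3f} m  "
          f"first notch {notch_frequencies_hz(delay_s, 1)[0]:8.1f} Hz")
```

For the short delays the first notch lands well inside the audio band (for example, near 500 Hz for a 1 ms delay), which is why the tonal balance changes; for the long delays the notches are so closely spaced that the ear hears a separate echo or reverberation instead.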

General Stuff Regarding Sound Processing

The computer age has provided us with incredible capabilities for manipulating sound files. This is my procedure: (1) sound from a music CD or other source is recorded into my computer; (2) the sound .wav file is then read into Matlab, and manipulated to simulate one or more effects on the sound; and (3) a new .wav file is created so it can be listened to in comparison with the original file.
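For readers without Matlab, the same read, manipulate, and write loop can be sketched in Python using the standard `wave` module and NumPy. This is a stand-in for the procedure described above, not the original code; the helper names, the temporary file names, and the synthetic 440 Hz "reference" are assumptions for the example, and only 16-bit files are handled.

```python
import os
import tempfile
import wave

import numpy as np

def read_wav(path):
    """Return (sample_rate, samples); samples are floats in [-1, 1), shape N x channels."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit files")
        fs = wf.getframerate()
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    data = np.frombuffer(raw, dtype="<i2").reshape(-1, channels)
    return fs, data / 32768.0

def write_wav(path, fs, samples):
    """Write floats in [-1, 1] as a 16-bit .wav file (1-D mono or N x channels)."""
    samples = np.asarray(samples, dtype=float)
    if samples.ndim == 1:
        samples = samples[:, np.newaxis]
    pcm = np.round(np.clip(samples, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(pcm.shape[1])
        wf.setsampwidth(2)
        wf.setframerate(fs)
        wf.writeframes(pcm.tobytes())

workdir = tempfile.mkdtemp()
ref_path = os.path.join(workdir, "reference.wav")
out_path = os.path.join(workdir, "processed.wav")

# Step (1) stand-in: create a one-second reference file instead of recording a CD.
fs = 44100
t = np.arange(fs) / fs
write_wav(ref_path, fs, 0.5 * np.sin(2 * np.pi * 440 * t))

fs_in, x = read_wav(ref_path)        # step (2): read the file ...
y = x * 10 ** (-3.0 / 20)            # ... and manipulate it (here, 3 dB quieter)
write_wav(out_path, fs_in, y)        # step (3): write a new file for comparison
```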

Once in the computer, the sound can also be analyzed up the kazoo (see, for example, peak vs. RMS power in the section on music and ears). All kinds of sound system designs can be mathematically simulated, and a new sound file written out which can be listened to, to actually hear the influence of the design on a particular piece of music. Crossover networks can be designed and heard without physically constructing them. The acoustic effects of various room shapes can be heard. Different levels of distortion produced by tube and transistor amps can be compared. Wow!! I feel like I'm in engineer's heaven!

One of the most sophisticated and exciting aspects of current research involves modeling human hearing. The relationship between a sound wave traveling in open air and the sound at the eardrum is represented mathematically by a head-related transfer function (HRTF). There are research efforts dedicated to both numerically computing, and measuring, HRTFs. Bill Gardner and Keith Martin of the MIT Media Lab have generously made their measurements of head-related transfer functions available to the world by posting them on the web (link sometimes disappears). All of my results involving HRTFs are derived from their diffuse-field equalized data.

The HRTF is a function of frequency and the angle at which the sound originates, with respect to the head. The sound level angle dependence for the right-ear HRTF is shown here [35.7 kb]. Zero degrees elevation means the source is at ear level; 90 degrees means the source is directly overhead. Zero degrees azimuth means the source is in front of your nose; 180 degrees means it is behind you. For most elevation angles there is nearly 20 dB isolation between the left ear at -90 degrees and the right ear at +90 degrees azimuth, which is a lot.

In principle these data can be used to synthesize a virtual listening space. In other words, the sound at the eardrum can be made to mimic what you would hear if you were inside an Egyptian pyramid, or wherever. In fact there are several commercially available software suites for doing exactly this. Three that I am aware of are EASE/EARS, Ramsete/Aurora, and Catt. The Odeon site contains sound demos and other material. The University of Southampton has an interesting sound demo file which is designed for two closely spaced stereo speakers, and creates a sound stage much wider than the speaker spacing. Many other interesting links regarding virtual 3D sound are also available (unfortunately, many of them are broken). This seems to be a very active research field, and some fascinating stuff is going on.

Summary of Available Files

Each music file is 1.5 MB for the Vivaldi and 1.7 MB for Joe Sample. "Available for download" means you can click on the link and download the file immediately; all other files are available by request.

Amplifier distortion: the reference file and the three distortion files (single-ended triode, push-pull pentode, and solid state) are available for download for the Vivaldi segment. The Joe Sample versions are available by request.

Music with high frequencies above 6, 7, 8, or 10 kHz filtered out.

Music with ideal Linkwitz-Riley crossover phase effects.

Music with the power reduced by half (volume reduced by 3.01 dB).

High frequencies: Single frequency tones at 13 kHz and 14 kHz, [860 kb each].

Frequency discrimination: tone burst pairs at 400-400 and 400-404 Hz are available for download; the 400-402 Hz pair is available by request. All are [176kb].

HRTF files are available for download [630 kb each]. The first file uses complete HRTFs for -60, -30, 0, 30, and 60 degrees in azimuth and 20 degrees elevation. The second file uses HRTFs where the relative time difference between the right and left ears has been removed, and the third file removes the relative amplitude differences. There are 3 additional files where the real HRTFs are replaced with a single spike for each ear.

Reflections and echoes [630 kb each]. The original sound is the -30 degree HRTF file. The demo file sequentially plays the original sound, and then the sound plus one echo, at each of the delays. The echo is 3 dB down from the original. For the first file the delays are 25, 50, 100, and 400 milliseconds. For the second file the delays are 0.1, 0.5, 1, and 2 milliseconds.
