Showing entries with the label "cerebro y escucha".

A new study confirms it: sight and hearing are senses that complement each other.

We have discussed at length the use of blind tests to evaluate a piece of audio equipment or a system, that is, listening without seeing. The logic of those who defend such tests is that sight cannot then bias or corrupt the auditory judgment. But what if the senses work together to produce a better cognitive and sensory process? Can you imagine tasting something without a proper sense of smell? Horrible! It's like when you have the flu.

Summary: New research reveals that people who lose their vision before the age of 10 have more difficulty judging the distance of a sound than those who lose their sight later in life. This difficulty in perceiving where sounds are located has important implications for safety and navigation.

The study highlights the need for personalized healthcare solutions for people with early vision loss, in order to improve their quality of life.

Key facts:

Early vision loss affects the ability to judge sound distances accurately.

Participants with early vision loss perceived nearby sounds as being farther away.

The study underscores the importance of understanding sensory dependence in people with vision loss.


https://neurosciencenews.com/vision-loss-sound-distance-26470/



Supranormal hearing achieved by boosting the ear's synapses

 https://neurosciencenews.com/supranormal-hearing-neuroscience-26389/ 







Music synchronizes the brains of performers and their audience


Incredible. Unfortunately, I could no longer get access to the article.

https://www.scientificamerican.com/article/music-synchronizes-the-brains-of-performers-and-their-audience/

The more people enjoy the music, the more their brain activity resembles the musician's


June 2, 2020 - Robert Martone





Neuroscientists re-create a Pink Floyd song from listeners' brain activity


Artificial intelligence has turned the brain's electrical signals into classic rock, albeit a somewhat muddled version of it.

https://www.scientificamerican.com/article/neuroscientists-re-create-pink-floyd-song-from-listeners-brain-activity/

There is life above 40 kHz

 http://www.cco.caltech.edu/~boyk/spectra/spectra.htm


 

 
Author's Notes, May 4, 2000
At the request of people involved in standards-setting for audio, who wanted this information made available as soon as possible, I published this original paper here, rather than in a professional journal.

Because I use figures 1(a,b,c) not only as data but to explain my reasoning, I include them in the paper itself. To save download time, other figures are given as links. After you look at one of these figures, your browser's "Back" button may return you to exactly where you were in the paper. If it doesn't, please note what section of the paper you are in before you link to the figure, then return by using the section links supplied with each figure.

The footnotes have links to return you to where they were cited.

All of the figures are 900 pixels wide. Viewing will be easiest on a monitor screen of 1024 x 768 or higher resolution, and with 256 or more colors.

 

 

There's Life Above 20 Kilohertz!
A Survey of Musical Instrument Spectra to 102.4 KHz


James Boyk
California Institute of Technology

Music Lab, 0-51 Caltech, Pasadena, CA 91125, USA
Tel: +626 395-4590, E-mail: boyk@caltech.edu
Home: http://www.cco.caltech.edu/~musiclab

Copyright © 1992, 1997 James Boyk. All rights reserved.

 

Abstract
      At least one member of each instrument family (strings, woodwinds, brass and percussion) produces energy to 40 kHz or above, and the spectra of some instruments reach this work's measurement limit of 102.4 kHz. Harmonics of muted trumpet extend to 80 kHz; violin and oboe, to above 40 kHz; and a cymbal crash was still strong at 100 kHz. In these particular examples, the proportion of energy above 20 kHz is, for the muted trumpet, 2 percent; violin, 0.04 percent; oboe, 0.01 percent; and cymbals, 40 percent. Instruments surveyed are trumpet with Harmon ("wah-wah") and straight mutes; French horn muted, unmuted and bell up; violin sul ponticello and double-stopped; oboe; claves; triangle; a drum rimshot; crash cymbals; piano; jangling keys; and sibilant speech. A discussion of the significance of these results describes others' work on perception of air- and bone-conducted ultrasound; and points out that even if ultrasound be taken as having no effect on perception of live sound, yet its presence may still pose a problem to the audio equipment designer and recording engineer.


I. Introduction
Each musical instrument family — strings, winds, brass and percussion — has at least one member which produces energy to 40 kHz or above. Some of the spectra reach this work's measurement limit of 102.4 kHz.
       Harmonics of French horn can extend to above 90 kHz; trumpet, to above 80; violin and oboe, to above 40; and a cymbal crash shows no sign of running out of energy at 100 kHz. Also shown in this paper are samples from sibilant speech, claves, a drum rimshot, triangle, jangling keys, and piano.
       The proportion of energy above 20 kilohertz is low for most instruments; but for one trumpet sample it is 2%; for another, 0.5%; for claves, 3.8%; for a speech sibilant, 1.7%; and for the cymbal crash, 40%. The cymbal's energy shows no sign of stopping at the measurement limit, so its percentage may be much higher.
      The spectra in this paper were found by recording each instrument's sound into a spectrum analyzer, then "prospecting" moment by moment through the recordings. Two instruments (clarinet and vibraphone) showed no ultrasonics, and so are absent here. Other instruments' sounds extended high up though at low energy. A few combined ultrasonic extension with power.
      The mere existence of this energy is the point of this paper, and most of the discussion just explains why I think that the spectra are correct, within the limits described below. At the end, however, I cite others' work on perception of air- and bone-conducted ultrasound, and offer a few remarks on the possible relevance of our spectra to human perception and music recording.
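The "prospecting" procedure and the energy percentages quoted above can be sketched numerically. The following Python snippet is an illustrative reconstruction, not the author's H-P analyzer workflow; the function and variable names are my own, and the synthetic signal simply stands in for a recording sampled fast enough to resolve ultrasonics:

```python
import numpy as np

def energy_above(signal, fs, cutoff_hz=20_000.0):
    """Fraction of total spectral energy above cutoff_hz.

    signal: 1-D array of samples; fs: sample rate in Hz.
    A Hann window is applied, as in the paper's analysis.
    """
    windowed = signal * np.hanning(len(signal))
    power = np.abs(np.fft.rfft(windowed)) ** 2          # power spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return power[freqs > cutoff_hz].sum() / power.sum()

# Synthetic check: a 1 kHz tone plus a 25 kHz tone 20 dB weaker.
fs = 204_800                      # high enough to resolve ~100 kHz
t = np.arange(fs) / fs            # one second of samples
sig = np.sin(2 * np.pi * 1_000 * t) + 0.1 * np.sin(2 * np.pi * 25_000 * t)
print(round(energy_above(sig, fs), 3))    # ~0.01, i.e. about 1% above 20 kHz
```

The same ratio, restricted to a band (say, 20 to 50 kHz), is what the "percentage of power above 20 kHz" figures in Table I report.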
 

II. Explanation of trumpet spectra in Figures 1(a) & 1(b)
The upper trace in Figure 1(a) shows the spectrum of a concert B-flat played on a trumpet with a Harmon ("wah-wah") mute, as captured by an Aco/Pacific quarter-inch microphone four feet away and analyzed with a Hewlett-Packard model 3567A FFT spectrum analyzer. This and all other instruments were played in normal concert fashion. (For details of instruments and players, see Appendix A.)
      The lower trace shows the background with the trumpet silent; this is dominated by the microphone's "self-noise," as shown in section VIII, below. Of course this background is present when the trumpet plays; and that is why the upper trace is identified as "Trumpet + Background."
      Are the trumpet peaks actually harmonics? To find out, we'd like to place markers at harmonic frequencies. To be easily readable, though, such a graph would have to be huge, so Figure 1(b) provides excerpts from it.
      The first excerpt shows the spectrum up to 8 kHz; the second, from 15 to 32 kHz; the third, from 38 to 53 kHz. Note the 100th harmonic at 46,560 Hz and the 108th at 50,263 Hz. (The vertical scale has been adjusted separately in each excerpt to make it easier to judge the presence or absence of harmonics. Figure 1(a) shows the overall relationships of level.) It is clear that the peaks are indeed harmonics (and equally clear in the omitted portions of the frequency spectrum, for this and the other spectra).
      The fourth excerpt shows that by 55 kHz, the harmonics are vanishing. Note that, as seen in Figure 1(a), the trumpet is still 12 to 15 dB above the background at this frequency; so the energy seen at 55 kHz, though non-harmonic, is still trumpet sound. To be conservative, however, I don't claim this portion of the spectrum as part of the sound; and Table I says only that harmonics are visible to "above 50 kHz." Similarly, the last column in Table I shows that 0.5% of the total energy is above 20 kHz; this is calculated only to the 50 kHz limit given for the harmonics.
      In Figures 2(b) through 9(b), as in this one, the last excerpt will show the region where visible harmonics vanish.

 
III. More trumpet, horn, violin, and oboe
In the same way as just described for Figure 1, Figures 2 through 9 give information about other instruments whose sound has harmonics.
      Skipping Figure 1(c) for the moment, in Figure 2 we see another sample of trumpet with Harmon mute, 20 dB lower in level than the sample in Figure 1, yet with harmonics extending higher, and with a higher percentage of its total energy in the harmonics. (See Table I.) Figure 3 shows trumpet with straight mute. Here the harmonics extend higher yet, to above 85 kHz.
      Figures 4, 5, and 6 give three examples of French horn, played respectively "bell up," with mute, and in normal fashion. One hundred or more harmonics are visible in each!
      Figure 7 shows a violin "double-stop", that is, two notes played simultaneously. Since each note produces its own harmonic series, Figure 7(b) uses markers of two different shapes to show the two harmonic series.
      Figure 8 shows a single violin note played sul ponticello, that is, with the bow very close to the bridge. This gives a distinctive squeaky-scratchy sound which composers sometimes specify, as for example Beethoven in the C-sharp Minor string quartet, Opus 131. Even in this mezzo-piano (medium-soft) note, harmonics are still visible past 40 kHz. (Due to absence of mind, I took no sample of a single violin note played in normal fashion.)
      Figure 9 shows an oboe note. It is striking how the harmonics suddenly drop in level after the 40th at 43 kHz.
      Not shown are any clarinet or vibraphone samples, because, as mentioned above, I could find no harmonic activity above 20 kHz anywhere in several samples of each, despite the closest "prospecting" with spectrum analyzer. These were the only instruments of the group that did not show such activity.
 

IV. Microphone and analyzer distortion
We return now to Figure 1 to ask whether the harmonics are spurious. Are they perhaps caused by overload of the microphone or analyzer? The waveform from which the spectrum was derived is shown in Figure 1(c), between the "Begin" and "End" points. Gross microphone overload would be shown by "flat-topping," which is absent. Nor was the analyzer overloaded on this or other samples. [1]
      Microphone distortion short of gross overload is not a factor, either, according to information supplied by the makers of the microphones. [2] Capsule distortion is primarily 2nd harmonic, and falls 20 dB with every 20 dB drop in level down to 136 dB SPL (23 dB higher than any of my samples), continuing to fall below that level.
      Since distortion is predominantly second harmonic [2], a spectral peak at 50 kHz, if due to distortion, would be the second harmonic of 25 kHz. If the 50 kHz peak were found to be at the 0.1% level, that is, 60 dB below the 25 kHz peak, then it might be due to distortion—if the distortion were indeed as high as 0.1%.
      But in fact, the 50 kHz region in Figure 1(a) is 25 dB higher than this. Coupling this with the undoubted fact that the distortion is lower than 0.1%, the energy seen in the 50 kHz region is certainly not due to microphone distortion.
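The dB arithmetic in this argument is easy to check. These are generic level conversions, not code from the paper:

```python
import math

def percent_to_db(pct):
    """Amplitude ratio in percent -> level in dB re the fundamental."""
    return 20 * math.log10(pct / 100.0)

def db_to_percent(db):
    """Level in dB re the fundamental -> amplitude ratio in percent."""
    return 100.0 * 10 ** (db / 20.0)

print(percent_to_db(0.1))            # -60.0 dB: the 0.1% distortion level
print(round(db_to_percent(-35), 2))  # 1.78: a peak 25 dB above -60 dB
```

So attributing the 50 kHz energy to distortion would require roughly 1.8% distortion, far beyond the sub-0.1% the capsule specifications allow.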
      Similar reasoning applied to Figures 2 through 9 leads to the conclusion that microphone distortion is not a factor in any of them, nor by extension in Figures 10-16. However, in addition to such reasoning, I wished to test the microphones directly. Verifying the performance of the capsules (the diaphragm assemblies) is beyond the capability of my equipment; however, these units are widely regarded as a "gold standard" and their performance claims universally accepted as true. I rely on this.
      I was able to test the preamps, however. I injected test signals into them via a B&K adapter, with a small capacitance to mimic the presence of the microphone capsule. Test signals were the following:

      a. B&K 2639: preamp: A pure tone at 550 Hz, at a level 1 dB higher than that of the loudest musical-instrument sample.
      b. B&K 2639: The same, but 1 dB lower than the softest musical sample.
      c. B&K 2639 and Aco 4012 preamps: Tone cluster at 500, 1000 and 1500 Hz, 1 dB higher than the loudest musical sample.
      d. B&K 2639 and Aco 4012 preamps: The same cluster, 1 dB lower than the softest musical sample.

      Figures 18(a) and (b) show the performance of the B&K preamp to signals (c) and (d) respectively. The preamp is clearly free of harmonics at both high and low levels. The small bump at 85 kHz in the low-level test is breakthrough from the switching power supply. [3] I don't know the source of the even smaller bump at 50 kHz. Both are so small that they may be ignored, however.
      The behavior of the Aco 4012 preamp was indistinguishable from the B&K 2639 at the higher level, and superior at the lower.
      Note that these tests of the preamps are also tests of the H-P 3567A FFT analyzer. From the clean results, one may conclude that neither preamps nor analyzer are creating a false appearance in any of the spectra in this paper.

 
V. Room acoustics and rattles
I assume that the room acoustics are linear, and thus cannot create spurious frequencies in the spectrum. On the other hand, the room does contain objects which could conceivably rattle at ultrasonic frequencies, including loudspeakers, vacuum tubes, fluorescent light fixtures, metal chassis, and so on. What is more, the time samples analyzed for instruments with harmonics were generally long enough (31.25 milliseconds) for the microphone to pick up not only the direct sound of the instrument but also many reflections, and conceivably rattles, from around the room. (This is another reason that I do not claim greater high-frequency extension for these spectra than validated by visible harmonics.)
      It is impossible however that these hypothetical rattles would fall precisely at frequencies that were harmonics for all the variety of fundamental pitches shown in the various Figures, so I discount rattles as a source of spurious harmonics.

 
VI. Instruments without harmonics
For sounds with no harmonics, the argument just given cannot eliminate the possibility of room contamination; so instead I attempted to eliminate the room from the sound. To do this, I analyzed only the very beginning of a sound (so the room was not already excited), and cut short the time record before the first reflection could return from the nearest surface. A shorter time record means coarser resolution of frequency; but since we're no longer looking for harmonic peaks, this does not matter. This procedure was followed for Figures 11 through 16; but first I present Figure 10, in which the speech sibilant of interest happened to come after the beginning of the sound, so any rattles might already have been excited. I believe Figure 10 to be all right nonetheless, for two reasons. First, the microphone was much closer to the desired source than to any possible rattle except those in the microphone mount, boom, or cable. Second, the spectrum presents a coherent picture. One would expect rattles to be at one or a few specific frequencies or narrow frequency bands; but this spectrum smoothly covers a very broad band. I rely on the latter point also to support my ignoring the possibility of rattles from the microphone cable, shock mount and boom in Figures 10-16.
       In Figure 10, since the room was already excited by the sound preceding the analyzed segment, there was no point in limiting the length of the time record; so I used a 31.25-millisecond record for a high-resolution analysis of 32 Hz per spectral line. I hoped this might reveal rattles more clearly; but none showed up. In Figures 11-16, as described, I analyzed the beginning of the sounds and cut off the time records before the first reflections.
      Figure 11(a) shows the spectrum of a claves strike; 11(b) shows the 60-microsecond rise from a standing start to 104 dB. If one discounts the first tiny wiggle, the rise takes just 30 microseconds.
      Figures 12(a) and (b) show a jazz-style rimshot on a remarkably beautiful-sounding drum. (See Appendix A for identification of instruments and musicians.)
      Figure 13 shows crash cymbals. Note that the energy at 20, 30 and 40 kHz is higher than at 2, 3 and 4 kHz respectively; and that at 100 kHz it is still far above the background. I had never heard crash cymbals up close before, but I now think that this sound would be adequate for a sound-track of either the Big Bang or the Apocalypse!
      Figure 14 shows a strike of a ten-inch triangle.
      Figure 15 shows keys jangling. Recording engineers often use this sound to test their equipment, and one can see that it is indeed demanding, with the energy at an elevated level from 7 kHz to above 40 kHz.
      Figure 16 shows a high note on the piano (G-sharp 72, where the notes are numbered 1 to 88 from lowest to highest). I took the hardwood floor of the concert room as part of the instrument, since a piano is always on a floor; so I cut off the time-capture not before the floor reflection but before the reflection from the nearest wall, 12 feet away. Note that the partials are not harmonic, as one sees clearly in 16(b). I'm not sure how to divide responsibility for this inharmonicity between the strings and the soundboard, both of which can vibrate non-harmonically (the former acting as a 'bar' rather than an ideal vibrating string; the latter, because the solutions for a two-dimensional system, as the soundboard essentially is, are inherently non-harmonic).
      Whatever the cause, even at middle C on the piano (not shown), the first seven partials do look harmonic; but higher partials of middle C do go increasingly sharp, and the 17th partial is where the 18th harmonic would be. (The way this non-harmonicity functions in piano sound, and perhaps in the meaning of piano music, might make an interesting study.)
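The stretched partials described here are commonly modeled by the stiff-string formula f_n = n * f1 * sqrt(1 + B * n^2), where B is an inharmonicity coefficient; the model is textbook material, not taken from the paper. The observation that the 17th partial falls where the 18th harmonic would be pins down an implied B:

```python
import math

def partial_freq(n, f1, B):
    """Stiff-string partial: f_n = n * f1 * sqrt(1 + B * n**2)."""
    return n * f1 * math.sqrt(1 + B * n * n)

# Solve for the B implied by "17th partial at the 18th harmonic":
# 17 * sqrt(1 + B * 17**2) = 18  =>  B = ((18/17)**2 - 1) / 17**2
B = ((18 / 17) ** 2 - 1) / 17 ** 2
f1 = 261.6                                  # middle C, Hz
print(round(B, 5))                          # ~0.00042
print(round(partial_freq(17, f1, B) / f1, 3))  # 18.0, by construction
print(round(partial_freq(7, f1, B) / f1, 3))   # ~7.07: only ~1% sharp
```

With this implied B the 7th partial is only about 1% sharp (roughly 18 Hz at middle C), which is consistent with the first seven partials "looking" harmonic at the analyzer's resolution. Real pianos vary; B here is only the value this single observation implies.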

 
VII. Aliasing and "window splatter"
Returning to the topic of possible errors in the spectra, note that a spectrum may be corrupted in more subtle ways than those already mentioned. Aliasing, for example, is the spurious appearance below the Nyquist frequency of energy actually above that frequency.
      In the case of the H-P 3567A spectrum analyzer, the Nyquist frequency is 131,072 Hz; that is, the analyzer samples 262,144 times per second. Of course the analyzer has an 'anti-alias' filter, which eliminates aliasing as a problem. But it's interesting to note that, since my point is the mere existence of ultrasonic energy from musical instruments, aliasing would be no problem even in the absence of the filter; for the presence of aliased energy would mean that the musical instrument sound extends above 131 kHz, which would make my point even more strongly.
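Folding arithmetic makes this concrete. A minimal sketch (illustrative only, since the analyzer's filter prevents this in practice) of where an unfiltered out-of-band component would appear:

```python
fs = 262_144            # H-P 3567A sample rate cited in the paper
nyquist = fs / 2        # 131,072 Hz

def alias_of(f, fs):
    """Apparent frequency of a component at f Hz when sampled at fs
    with no anti-alias filter (folding about fs/2)."""
    f = f % fs
    return fs - f if f > fs / 2 else f

print(alias_of(150_000, fs))  # 112,144 Hz: still ultrasonic when folded
print(alias_of(30_000, fs))   # 30,000 Hz: in-band energy is unaffected
```

Any folded energy appearing in these spectra would itself imply instrument output above 131 kHz, which is the paper's point.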
      Besides aliasing, I also considered "window splatter." It is well known that a pure sine wave appears in Fourier analysis not as a single narrow spectral line but broadened, with bumps decreasing in level on either side. Extending the idea, one can imagine that even if a spectrum had no content above 20 kHz, it might nevertheless look as though it did because of the adding-up of "bumps" from energy below 20 kHz. I call this putative phenomenon "window-splatter." (More generally, window-splatter means that every point of the spectrum is potentially affected by every other point; but we are concerned here only with the issue of energy above 20 kHz.)
      To calculate the effect of window-splatter, we should construct an artificial spectrum that is flat up to 20 kHz, then convolve this truncated spectrum with the Fourier Transform of the Hann window. The result will be a spectrum whose energy above 20 kHz will be due entirely to window-splatter.
      Luckily, it turns out that window-splatter is an insignificant source of corruption for this work. (Not non-existent, but insignificant.) Using the "Math" function of the H-P 3567A analyzer, I created a spectrum whose value was 1 at all points up to just short of 20 kHz. At 20 kHz and above, the value was 0. After the convolution, the point at 20 kHz was indeed raised to -12 dB relative to the constant spectrum, but points at higher frequencies were at -150 dB, indistinguishable from the computation noise. (Figure 17.)
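The conclusion can be reproduced approximately with an FFT library. This sketch is a stand-in for the analyzer's "Math" convolution: it Hann-windows a tone just below 20 kHz and checks how far down the leakage sits well above the tone:

```python
import numpy as np

N = 4096
fs = 262_144                        # analyzer sample rate from the paper
t = np.arange(N) / fs
tone = np.sin(2 * np.pi * 19_000 * t)   # tone just below 20 kHz

spectrum = np.abs(np.fft.rfft(tone * np.hanning(N)))
spectrum /= spectrum.max()
freqs = np.fft.rfftfreq(N, 1 / fs)
db = 20 * np.log10(spectrum + 1e-12)    # small floor avoids log(0)

# Worst leakage bin well above the tone (40 kHz up to Nyquist):
print(db[freqs > 40_000].max())         # far below -100 dB
```

Hann-window leakage hundreds of bins away from a tone is well over 100 dB down, so it cannot mimic the ultrasonic content seen in the instrument spectra, in agreement with the -150 dB result reported above.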

 
VIII. Correcting for the microphones' response curves
With regard for the points already discussed — microphone and input overload, microphone distortion, room acoustics and rattles, aliasing and "window-splatter" — I see no reason to doubt the existence of the ultrasonic energy. I did however correct the figures to allow for the unflatness of the microphones' responses.
      Each microphone has not one but a family of responses: on-axis or random-incidence, each with protective grid on or off. The on-axis response with the grid off is very flat to 100 kHz; with grid on, it is not flat, nor even known beyond 70 kHz. The random-incidence response is not flat with grid on or off; and the two curves differ.
      I first considered how to correct the background spectra (the lower curve in Figure 1(a), for example). I reflected that if the microphone had no "self-noise," then the background I measured would consist only of ambient sound in the room. Since this comes from all angles in a random fashion (as verified in measurements not shown), and is therefore captured according to the microphone's random-incidence response curve, the correct spectrum would be obtained by applying the opposite of that curve to the measured background. In other words, if the random-incidence response curve supplied by the manufacturer is down by 5 decibels at a certain frequency, I should raise the background spectrum by 5 dB at that frequency to get a correct reading.
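In code, this correction is just a per-frequency subtraction of the response curve. A toy sketch with hypothetical numbers, using the 5-dB example from the text:

```python
def correct_spectrum(measured_db, response_db):
    """Undo a microphone's frequency response: where the response is
    down by r dB (response_db = -r), raise the measured level by r dB,
    i.e. subtract the response from the measurement, bin by bin."""
    return [m - r for m, r in zip(measured_db, response_db)]

# Hypothetical values: flat at one frequency, down 5 dB at another.
print(correct_spectrum([-80.0, -85.0], [0.0, -5.0]))  # [-80.0, -80.0]
```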
      If on the other hand the room were silent, so that the measured background came entirely from microphone self-noise, then no correction would be necessary, since self-noise is generated electrically in the microphone, and has nothing to do with the presence or absence of the grid, nor with the angle of a source relative to the microphone.
      Since the room is quiet and the microphone, like any quarter-inch microphone, is noisy, I thought it likely that microphone noise dominated. To check this, I compared two background measurements taken with the quarter-inch B&K 4135 microphone, one measurement with grid on and the other with grid off. These two spectra were identical; they superimposed even when viewed using a vertical scale of only 0.6 dB per division.
      If I assumed that the background were dominated by room sound, I would "correct" each trace to allow for the unflatness of its particular microphone frequency response. The grid-on and grid-off random-incidence responses of the B&K 4135 differ by 3 dB at 30 kHz, 4 dB at 40 kHz, 1½ dB at 50 kHz and presumably by substantial amounts at higher frequencies, where the calibration of one of the curves is not known (that is, not supplied by the manufacturer).
      Since the background spectra I obtained are identical, "correcting" them based on two different curves would give different spectra for the same acoustic background. This nonsensical result implies that the assumption was wrong and the measured background is indeed dominated by microphone self-noise.
      I conclude that microphone self-noise indeed dominates the background measurements, and therefore that they are good over the full band without correction.

Now I consider the musical instrument spectra; for example, the upper trace in Figure 1(a). Here I take the applicable curve to be the on-axis microphone response, since the microphones were always pointed at the instruments. While the on-axis "grid-off" curve is very flat to 100 kHz, the "grid-on" curve deviates by as much as 6 dB, and moreover is not known beyond 70 kHz. When correcting the response of a "grid-on" measurement, I freeze the correction at the 70 kHz level; that is, I make it constant from 70 kHz on up.
      It is likely that the grid-on response continues to roll off beyond 70 kHz, and that therefore a true correction would not freeze at 70 kHz but would continue increasing. This would raise the apparent level of the high-frequency energy. Thus, by freezing the correction, I am taking a conservative approach.
      It is even possible, contrary to my assumption of two paragraphs ago, that the on-axis curves may not be the appropriate ones to use at all. At very high frequencies, because of the shortness of the wavelengths and the acoustic "liveness" of the room, the instrument may be picked up more in a random-incidence fashion. If this be the case, then because the random-incidence response is down 9½ dB at 70 kHz, I should raise the measured spectrum by 9½ dB at that frequency, and by appropriate amounts at other frequencies.
      Thus, at very high frequencies the true spectra may be 10 or even 20 dB higher than the curves shown here. As I do not have facilities to decide in which fashion the microphone is picking up the instruments, I use the on-axis curves to be conservative.
      Using the "Math" facility of the H-P 3567A analyzer, all the spectra have been corrected as described here, to within ±0.5 dB.

 



Table I.  Ultrasonic Extension and Energy of Some Musical Instruments

A summary of this paper's findings. Column one refers to the figure showing the spectrum in question. Column two identifies the instrument. Column three gives the sound pressure level measured at the microphone. Column four gives the measured frequency extension: For instruments with harmonics, this is the highest frequency where harmonics are still present; for those without harmonics, the highest frequency where the sound is still at least 10 dB above the background. (See text.) The last column tells what percentage of the total energy is contained in the range between 20 kHz and the limit given in the previous column.

 
Instruments With Harmonics

Fig.   Instrument         SPL     Harmonics       Percentage
                          (dB)    Visible To       of Power
                                  What Freq.?    Above 20 kHz

 1. Trumpet (Harmon mute)     96.    >50 kHz       0.5
 2. Trumpet (Harmon mute)     76.    >80  "        2.
 3. Trumpet (straight mute)   83.    >85  "        0.7
 4. French horn (bell up)    113.    >90  "        0.03
 5. French horn (mute)        99.    >65  "        0.05
 6. French horn              105.    >55  "        0.1
 7. Violin (double-stop)      87.    >50  "        0.04
 8. Violin (sul ponticello)   77.    >35  "        0.02
 9. Oboe                      84.    >40  "        0.01

 
Instruments Without Harmonics

Fig.   Instrument           SPL   10 dB Above    Percentage
                            (dB)   Bkgnd. to      of Power
                                  What Freq.?   Above 20 kHz

10. Speech Sibilant           72.    >40 kHz       1.7
11. Claves                   104.   >102  "        3.8
12. Rimshot                   73.    >90  "        6.
13. Crash Cymbal             108.   >102  "       40.
14. Triangle                  96.    >90  "        1.
15. Keys jangling             71.    >60  "       68.
16. Piano                    111.    >70  "        0.02



 
IX. Results
Table I summarizes the results. Instruments with harmonics (Figures 1 to 9) are claimed to have energy to the highest frequencies where harmonics are still visible. Those without harmonics (Figures 10 to 16) are claimed to have energy to the frequency where they are still 10 dB above the background. These frequencies are listed in the fourth column of the table, while the last column tells what percentage of the total energy of each sample lies below these frequencies but above 20 kHz. That is, a figure of 0.5 in the last column means that half of one percent of the energy is above 20 kHz. As described above, every step has been taken to make these figures conservative, and the real figures may well be substantially higher.
      For the samples which include room reflections (Figures 1 to 10), I do not claim that our spectra are the "absolute" spectra that would be found in anechoic measurement, because the spectra may have been altered by room resonances. As my point is simply the existence of the ultrasonic energy, however, this does not matter.
      Since Figures 11 to 16 exclude room reflections, their spectra should indeed be quantitatively accurate to within the few-decibel total error of the analysis chain.

 
X. Significance of the results
Given the existence of musical-instrument energy above 20 kilohertz, it is natural to ask whether the energy matters to human perception or music recording. The common view is that energy above 20 kHz does not matter, but AES preprint 3207 by Oohashi et al. claims that reproduced sound above 26 kHz "induces activation of alpha-EEG (electroencephalogram) rhythms that persist in the absence of high frequency stimulation, and can affect perception of sound quality." [4]
      Oohashi and his colleagues recorded gamelan to a bandwidth of 60 kHz, and played back the recording to listeners through a speaker system with an extra tweeter for the range above 26 kHz. This tweeter was driven by its own amplifier, and the 26 kHz electronic crossover before the amplifier used steep filters. The experimenters found that the listeners' EEGs and their subjective ratings of the sound quality were affected by whether this "ultra-tweeter" was on or off, even though the listeners explicitly denied that the reproduced sound was affected by the ultra-tweeter, and also denied, when presented with the ultrasonics alone, that any sound at all was being played.
      From the fact that changes in subjects' EEGs "persist in the absence of high frequency stimulation," Oohashi and his colleagues infer that in audio comparisons, a substantial silent period is required between successive samples to avoid the second evaluation's being corrupted by "hangover" of reaction to the first.
      The preprint gives photos of EEG results for only three of sixteen subjects. I hope that more will be published.

In a paper published in Science, Lenhardt et al. report that "bone-conducted ultrasonic hearing has been found capable of supporting frequency discrimination and speech detection in normal, older hearing-impaired, and profoundly deaf human subjects." [5] They speculate that the saccule may be involved, this being "an otolithic organ that responds to acceleration and gravity and may be responsible for transduction of sound after destruction of the cochlea," and they further point out that the saccule has neural cross-connections with the cochlea. [6]

Even if we assume that air-conducted ultrasound does not affect direct perception of live sound, it might still affect us indirectly through interfering with the recording process. Every recording engineer knows that speech sibilants (Figure 10), jangling key rings (Figure 15), and muted trumpets (Figures 1 to 3) can expose problems in recording equipment. If the problems come from energy below 20 kHz, then the recording engineer simply needs better equipment. But if the problems prove to come from the energy beyond 20 kHz, then what's needed is either filtering, which is difficult to carry out without sonically harmful side effects; or wider bandwidth in the entire recording chain, including the storage medium; or a combination of the two.
      On the other hand, if the assumption of the previous paragraph be wrong — if it is determined that sound components beyond 20 kHz do matter to human musical perception and pleasure — then for highest fidelity, the option of filtering would have to be rejected, and recording chains and storage media of wider bandwidth would be needed.

 
XI. What Next?
A natural next step would be to measure the ultrasonic content of orchestral sound as heard from normal listening or recording distances. This will automatically allow for the absorption of ultrasonics by the air. The project will be expensive, because musicians' union rules require players to be paid at recording rates, which are several times ordinary "scale," whenever a live microphone is present; and I anticipate difficulty in having these rules waived for our research. We solicit funding for this project!

 
Acknowledgements
This project went from a long-held idea of mine to reality because of the enthusiasm of Scott Kelly, Sandee Perez and Hovel Babikian, students in my Caltech course "Projects in Music & Science," EE/Mu 107. I am grateful for their substantial help in getting started.
      Without the help of my friend Prof. Gerald Jay Sussman of MIT, the work would not have been finished. As a visiting faculty member at Caltech during 1991-92, and continuing since then, he has contributed his deep knowledge, lucid teaching, and experienced counsel.
      Sincere gratitude also to:
      Caltech's Robert McEliece, Barry Megdal and the late Ed Posner for interest and support;
      Hewlett Packard Corporate Gifts; Rick Walker (former student in EE/Mu 107) for his good offices; Mac MacDonald for patient and knowledgeable field support; and Fred Cruger, Paul Gallagher, Norm Olsen and Bob Youden;
      Aco/Pacific Co. and Noland Lewis for equipment and information;
      Bruel & Kjaer Instruments, and Erling Frederiksen and Joe Chou for information;
      "Anonymous," whose $10,000 gift allowed the purchase of crucial equipment;
      Doug Sax, Ernst Schoenthal;
      Denise Bovet, for calculations of inharmonicity of piano strings;
      the late Bart Locanthi, who was generous with his interest and knowledge;
      Julie Sussman, for careful reading and useful comments;
      and Daniel W. Martin, editor-in-chief of the Journal of the Acoustical Society of America; Patricia M. Macdonald, executive editor of the Audio Engineering Society Journal; and the anonymous AES reviewers; for their generous assistance and good suggestions. (This does not imply endorsement of this paper by these individuals or by their organizations or publications.)
      This paper is being published long after the work was completed because of the difficulty of creating publication-quality graphs from the data. I spent a lot of time and money discovering half a dozen programs that would not make acceptable graphs. Finally, Caltech undergrad Peter Oakley learned to use Matlab to do the job, and carried out the work reliably and creatively.
      The Web (HTML) programming of this paper was done by David Boyk, of MegaHard Design.

 
Appendix A. Musicians and Instruments
Measurements of all instruments except piano were carried out in the Music Lab at California Institute of Technology; piano was measured in Dabney Lounge, Caltech's superb small concert room. Musicians were asked to play as in performance, and to avoid artificial effects.
      Figures 1 to 3. Trumpeter William Bing, Director of the Wind Ensemble and Jazz Band at California Institute of Technology, playing a Yamaha YTR 6335H trumpet with 135H lead pipe, YLH bell, and Yamaha 17D4 mouthpiece with Malone back bore. Mutes: EMO Harmon mute, stem in, played "straight out" (that is, uncovered); Dennis Wick straight mute.
      Figures 4 to 6. Hornist Jeff Greif playing Conn 8D French horn. Mute: Humes & Berg, stone-lined. Though an amateur, Dr. Greif is an excellent and experienced player.
      Figures 7 and 8. Violinist Linda Rose playing a Nicolas Gagliano violin, ca. 1782. Sartory bow, tourte mute.
      Figure 9. Oboe student Katja Pelzer playing a Loree oboe with "mix & match" sections.
      Figure 10. Caltech graduate student Paul Sivilotti speaking.
      Figure 11. James Boyk playing an inexpensive claves of unknown origin.
      Figures 12 to 14. Percussionist David Johnson playing, in Figure 12, a jazz-style rimshot on a Ludwig Super-Sensitive maple-shell snare drum from the 1920's; in Figure 13, a pair of Sabian 19-inch Germanic crash cymbals; and in Figure 14, a Grover 10-inch triangle using a Stoessel beater.
      Figure 15. Professor F. Brock Fuller jangling his own ring of keys.
      Figure 16. The author, who is Pianist in Residence at California Institute of Technology, playing Steinway Concert Grand CD 25 in Dabney Lounge at Caltech. The piano had just been tuned by concert tuner and former Steinway Concert Technician Kenyon Brown.

 
Appendix B. Measurement Equipment
A Hewlett Packard 3567A FFT analyzer, which captures 262,144 samples per second, with a dynamic range of 150 dB and a signal-to-noise ratio of over 80 dB, was used with two quarter-inch microphones: one a Bruel & Kjaer 4135 with 2639 preamp and 2807 power supply, the other an Aco/Pacific model 7016 with 4012 preamp and PS9200 supply. A half-inch Aco/Pacific 7012 was used for collateral measurements.

 
References
[1] Personal communications from Mac MacDonald and Steven Bye, Hewlett Packard: Overload in one portion of a time capture does not corrupt analysis of other segments. (The H-P 3567A analyzer shows overload by an "OVLD" legend on the time trace and by changing the overloaded portion to red; so it is very easy to tell whether any given segment is usable or not. A few of my captures had tiny overloaded segments, but I did not use these for analysis.) [back]

[2] Personal communications from:
      Noland Lewis of Aco/Pacific: The specifications of the Aco 7016/4012 (mike/preamp) are the same as Bruel & Kjaer 4135/2639.
      Erling Frederiksen of Bruel & Kjaer: Distortion of the B&K 2639 preamp is negligible at any level found in this work. Distortion of the B&K 4135 microphone capsule is predominantly second harmonic, with magnitude 0.1% at 136 dB SPL; below 136 dB, the distortion continues to fall by a factor of ten for each 20 dB of level reduction. 
[back]

[3] Personal communication from Joe Chou of B&K. [back]

[4] Tsutomu Oohashi, Emi Nishina, Norie Kawai, Yoshitaka Fuwamoto, Hiroshi Imai, High-Frequency Sound Above the Audible Range Affects Brain Electric Activity and Sound Perception. Audio Engineering Society preprint No. 3207 (91st convention, New York City). Abstract, page 2. [back]

[5] Martin L. Lenhardt, Ruth Skellett, Peter Wang, Alex M. Clarke, Human Ultrasonic Speech Perception. Science, Vol. 253, 5 July 1991, pp. 82-85. Abstract, p. 82. [back]

[6] Ibid., p. 84, last paragraph of main text. [back]

 

A great article on music, human hearing, and audio

I hope to translate it soon; it strikes me as concrete and very informative.

The original article is at: http://www.silcom.com/~aludwig/EARS.htm


Music and The Human Ear


This section contains information on the softest and loudest sounds we can hear, the range of frequencies we can hear, subjective vs. objective loudness, how we locate the source of a sound, and sound distortion. This section focuses mainly on the ear itself, but the brain is an integral part of the human hearing system. A separate section considers the function of the brain in more detail.
The human ear is a truly remarkable instrument. At one point in my life I designed Electronic Counter Measures (ECM) systems for the U. S. military. The primary function of an ECM system is to detect an enemy before he (it's rarely a she) detects you, for self-defense. It is interesting to compare the characteristics of a good ECM system and human hearing:
Comparison of characteristics

Characteristic                                              Good ECM system      Human hearing
Directional coverage                                        All directions       All directions
Source location accuracy                                    Within 1-5 degrees   About 5 degrees
Ratio of highest to lowest frequency (the bigger the better) 20 : 1              1000 : 1
Ratio of strongest to weakest signal (the bigger the better) Million : one       32 trillion : one

Human hearing is a superior defensive system in every respect except source location accuracy. Note: Jourdain (page 23) states that human accuracy is 1-2 degrees in azimuth.
 In contrast, a military system designed for communications (rather than detection) would typically have a much smaller ratio of highest-to-lowest frequency, no source location capability, and often a narrow directional coverage. For human communication a frequency ratio of 10:1 and a ratio of strongest to weakest signal of 10,000:1 would suffice. The far larger actual ratios strongly imply a purpose other than communication.
All of this tells me that the ear evolved primarily for self-defense (or perhaps hunting, as one reader pointed out), and language and enjoyment of music are delightful evolutionary by-products. A defensive purpose also suggests some direct hard-wiring between the ears and primitive parts of the brain, which may account for the powerful emotional impact of music - and its virtual universality among human cultures. A few years after writing this paragraph I found the very interesting book This is Your Brain on Music which confirms the speculation on wiring to primitive parts of the brain, but argues that music has a definite evolutionary function.
Soft Sounds and Loud Sounds
Acknowledgment: a good part of the material in the remainder of this section is derived from an excellent book The Master Handbook of Acoustics by F. Alton Everest, and from the chapter he contributed to the Handbook for Sound Engineers. See references. These sources also contain much additional interesting material. David Worrall has posted his course notes of Physics and Psychophysics of Music on the web, which includes an informative section on the physiology of hearing. A series of tutorial papers on hearing and other related topics has also been posted by HeadWize.
Sound pressure level (SPL) is given in dB SPL. This is a scale defined such that the threshold of hearing is close to 0 dB. The threshold of pain is about 135 dB. This is a logarithmic scale where power doubles for each 3 dB increase; the 135 dB difference between the thresholds of hearing and pain means the power doubles about 45 times - an increase of 32 trillion (32×10^12) in the power level. This is an incredible dynamic range, and totally blows away anything human engineers are capable of creating. (Actually, in a Dec 99 newsgroup post Dick Pierce states that B&K 4138 microphones have a dynamic range of 140 dB, so I was underrating human engineers). At the low end of the range the ears lose function due to background noise. At 0 dB SPL, noise created by blood flow in the ear is one source. It is shown elsewhere that the noise of molecules colliding with the eardrum is not far below this level. At the threshold sound level of 0 dB SPL, Everest states that the eardrum moves a distance smaller than the diameter of a hydrogen molecule! (Correction: it is the stereocilia in the inner ear that move this much, not the eardrum). At first I was incredulous when I read this, but it is consistent with the change in diameter of the balloon example used in the previous section. For a 0 dB SPL the change in balloon diameter is 6×10^-10 inches, which is about 1/10 of the diameter of a hydrogen atom. The sensitivity of the ear is truly mind-boggling.
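The decibel arithmetic in this paragraph is easy to check. A minimal sketch in plain Python (no audio involved):

```python
import math

def db_to_power_ratio(db):
    """Convert a decibel difference to a linear power ratio."""
    return 10 ** (db / 10)

def doublings(db):
    """Number of power doublings in a dB difference (one per ~3.01 dB)."""
    return db / (10 * math.log10(2))

# The 135 dB span between the thresholds of hearing and pain:
span = db_to_power_ratio(135)   # ~3.2e13, the "32 trillion" in the text
n = doublings(135)              # ~44.8, the "about 45 times"
```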
Pressure is an objective physical parameter. The relationship of SPL to the subjective sensitivity to sound is discussed below. The human ear is most sensitive in a band from about 2,000-5,000 Hz. This is an important region for understanding speech, and could be construed to imply that hearing evolved to match speech. However, did the ear evolve to be sensitive to the speech frequency band, or did human speech evolve to match the band where the ear is most sensitive? (I read somewhere that babies cry in the frequency band where the ear is most sensitive). As measured by Voss and Allen, a typical eardrum absorbs about 75% of the incident sound energy at 5 kHz. The sensitivity vs. frequency behavior has a fair resemblance to the response of a piston load matched to the impedance of air, as shown in the physics section. Music levels vary from about 50 dB for quiet background music to maybe 120 dB for a very loud rock band. Subjectively, a 2-3 dB change in sound level is barely perceptible; if someone asks you to "turn up the volume a little," you will probably increase the sound by at least 3 dB. (Note that if you have a 100-Watt amplifier and it doesn't play loud enough, you need a 200-Watt amplifier to turn up the volume 3 dB. This can get very expensive very quickly). Interestingly there were some ABX test results on the web which indicate that a 0.3 dB difference in level can be detected (link no longer exists). However the test procedure allows switching between the two levels as much as you want before making a decision, and the test used pink noise for the sound. You can hear what a 3 dB difference sounds like yourself with sound files in the sound demo section.
A full orchestra can also hit a sound level of 110 dB and more, and then play a quiet passage at 20-30 dB. To reproduce this faithfully requires a recorded sound source capable of covering this 80+ dB dynamic range. (Everest quotes one researcher who claims a 118 dB range is required). A vinyl record is good for about 50-70 dB; a standard compact disc with 16-bit encoding can cover a 96 dB range, and the 24-bit DVD disk format a 144 dB range - in theory. Real D/A converters tend to be noise limited to a somewhat lower range.
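The quoted bit-depth figures follow from the standard rule that each bit of linear PCM adds about 6 dB of range, i.e. 20·log10(2^bits). A quick check:

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range of linear PCM: 20*log10(2**bits)."""
    return 20 * math.log10(2 ** bits)

cd = dynamic_range_db(16)    # ~96 dB, the CD figure in the text
dvd = dynamic_range_db(24)   # ~144 dB, the 24-bit DVD figure
```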
A problematic aspect of music for a sound system designer is that there are brief transients ("spikes") in sound level that far exceed average power levels. Usually people talk about average, or root-mean-square (RMS) power. RMS power is really only important with respect to the generation of heat. In my opinion, peak power is far more important, since this is when a speaker could be driven into a non-linear region, and when an amplifier would clip. These two effects are major causes of distortion. Using Cool Edit 96, I recorded 10-20 second segments from Talking Heads "Burning Down the House," Diana Krall "All or Nothing at All," and Shostakovich Symphony #5. I then processed the cuts in Matlab, to generate the outputs of a 3-way crossover. The crossover frequencies are 300 and 3000 Hz. Both 1st order Butterworth and 4th order Linkwitz-Riley filters were modeled. Finally I calculated the average and peak power in each driver band, with results as shown in the tables below.
Average Power

Driver      Talking Heads           Diana Krall             Shostakovich
            1st order   4th order   1st order   4th order   1st order   4th order
Tweeter     7%          3.4%        4%          1.6%        14%         4.4%
Midrange    28%         22%         32%         25%         66%         64%
Woofer      65%         66%         64%         60%         20%         4.6%

Peak Power

Driver      Talking Heads           Diana Krall             Shostakovich
            1st order   4th order   1st order   4th order   1st order   4th order
Tweeter     18%         13%         53%         15%         15%         8%
Midrange    45%         35%         83%         89%         53%         55%
Woofer      81%         88%         40%         31%         16%         4.3%

All powers are shown as a percentage of the same quantity in the unfiltered music. Note that the average power for the Butterworth adds to 100%, but the Linkwitz-Riley adds to less than 100%. The voltage output of a Linkwitz-Riley coherently adds to unity, but the power addition is less than unity. The peak power is obtained by computing the time-domain waveform of the signal output by the crossover. Then the peak value is found. Typically the peaks occur at different times for the tweeter, midrange, and woofer, so there is no physical significance to the sum of the three powers in this case. The startling result is that by far the greatest demands on peak power are in the midrange for the Krall and Shostakovich. The 4th order reduces the demands in the high and low bands, but there is little difference in the mid-band. Only the Talking Heads cut has a greater demand in the bass. It is also quite significant that even though the average tweeter power is low, the peak tweeter power is not all that much lower than other bands, and in fact is greater than the woofer in some cases!
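The band-split computation described above can be sketched in a few lines. This is not the author's Matlab code; it is a numpy stand-in that applies ideal first-order Butterworth magnitude responses in the frequency domain to white noise (the real analysis used the music excerpts and also modeled 4th-order Linkwitz-Riley filters):

```python
import numpy as np

def band_power_fractions(x, fs, f1=300.0, f2=3000.0):
    """Woofer/mid/tweeter shares of average power, using ideal 1st-order
    Butterworth magnitude responses applied in the frequency domain."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    lp1 = 1.0 / np.sqrt(1.0 + (f / f1) ** 2)   # low-pass at f1
    hp1 = (f / f1) * lp1                        # complementary high-pass at f1
    lp2 = 1.0 / np.sqrt(1.0 + (f / f2) ** 2)
    hp2 = (f / f2) * lp2
    total = np.sum(np.abs(X) ** 2)
    woofer = np.sum(np.abs(X * lp1) ** 2) / total
    mid = np.sum(np.abs(X * hp1 * lp2) ** 2) / total
    # Tweeter branch is high-passed at both crossover frequencies, so the
    # three power responses sum exactly to 1 at every frequency.
    tweeter = np.sum(np.abs(X * hp1 * hp2) ** 2) / total
    return woofer, mid, tweeter

rng = np.random.default_rng(0)
noise = rng.standard_normal(2 ** 16)           # white-noise stand-in signal
w, m, t = band_power_fractions(noise, fs=44100)
```

For a 1st-order Butterworth crossover the three power fractions sum to 100% of the input power, matching the behavior noted in the text.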
When I play the Talking Heads cut, my CLIO sound measurement system shows a peak sound level of 100 dB SPL in the room, and an average of around 95 dB. Judging from the oscilloscope connected to the amp outputs, the average amplifier output power appears to be about 17 watts. The ratio of peak power to RMS power was 40:1, 40:1 and 30:1 for the Talking Heads, Diana Krall, and Shostakovich cuts respectively. Therefore, for 17 watts RMS, the peak power demands are on the order of 700 watts. This indicates that either my amps can put out peaks much higher than their rated power (possible, but I'm not sure), or they are clipping. There are demo files in the sound demo section which simulate clipping by tube and solid-state amplifiers. For more on this subject see the section on amplifier distortion.
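The peak-to-RMS ("crest factor") figures quoted above are simple to compute from any waveform. A minimal numpy sketch, using a sine wave (whose ratio is exactly 2, i.e. 3 dB) rather than the music excerpts:

```python
import numpy as np

def peak_to_rms_power(x):
    """Ratio of peak instantaneous power to mean (RMS squared) power."""
    return float(np.max(x ** 2) / np.mean(x ** 2))

# A pure sine has a peak-to-RMS power ratio of exactly 2; the music
# excerpts discussed in the text measured 30:1 to 40:1.
t = np.linspace(0, 1, 48000, endpoint=False)
sine_ratio = peak_to_rms_power(np.sin(2 * np.pi * 100 * t))
```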
Jourdain (page 41) states that an orchestra produces 67 watts of acoustic power at full blast. Loudspeakers have efficiencies on the order of 0.5 to 2% converting electrical power to acoustic power. Even at 2% efficiency this implies that well over 3,000 watts of electrical power would be required to duplicate this sound level. Of course an orchestra plays in a large auditorium, and no doubt less power is needed for a small room. This still indicates that power requirements should not be underestimated.
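The power estimate in this paragraph is one line of arithmetic; a sketch using Jourdain's 67-watt figure and the stated 0.5%-2% efficiency range as inputs:

```python
def electrical_watts(acoustic_watts, efficiency):
    """Electrical input power needed for a given acoustic power at a
    given electro-acoustic efficiency (e.g. 0.02 for 2%)."""
    return acoustic_watts / efficiency

best_case = electrical_watts(67, 0.02)     # 3350 W: "well over 3,000 watts"
worst_case = electrical_watts(67, 0.005)   # 13400 W at 0.5% efficiency
```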
A major criterion of a good sound system is its frequency response. The usual frequency range considered "hi-fi" is 20-20,000 Hz. These sample tones are audible with good loudspeakers or headphones, but many computer speakers will not reproduce them at all: a 100 Hz tone, (12 kb wav file) and a 10,000 Hz tone (44 kb wav file). Yesterday I did a test using the very accurate signal generator built into my CLIO system. I can clearly hear, and certainly can feel, a 10 Hz tone. My sound system totally poops out below 10 Hz, so I can't test any lower than that. The lowest notes on organs and pianos are 16.4 and 24.5 Hz respectively. Testing at the other extreme, as a 61 year-old male (when I originally wrote this) I can hear a 13,500 Hz tone, but no higher. (It is generally agreed that women are more sensitive to high frequencies). However, good high frequency response is required to produce sharp transients, such as a snap of the fingers. I performed a test using a Ry Cooder CD, "Talking Timbuktu." Track 10 on this disk has some very sharp transients that just leap out at you from a good sound system. My pre-amp has a filter that cuts off frequencies above 12,000 Hz. With this filter in, the transients limp out rather than leap out. This shows that even though I cannot hear a pure tone in most of the range of frequencies cut out by the filter, I can clearly hear the difference in the sound quality of the transients. I repeated this test recently (at age 67) with a segment of this cut recorded as a .wav file, and digitally processed with a 12kHz filter. This time the test was a double-blind ABX test, and I can't reliably detect any difference (I can still hear a 13,000 Hz tone). I now doubt the validity of the earlier test. See the discussion on high frequency tests in the section on sound demos.
James Boyk at Caltech has posted an interesting paper on the frequencies generated by musical instruments between 20kHz and 102 kHz! He also cites a paper that states that people react to sounds above 26 kHz even when they cannot consciously hear the sound. Jourdain (page 42) states that sound can be heard up to 40 kHz if sufficiently loud (A knowledgeable reviewer of the book is skeptical about this claim. Unfortunately the link to the review no longer works).
The ear tends to combine the sound within critical bandwidths, which are about 1/6 octave wide (historically thought to be 1/3 octave). This has led to the practice of averaging frequency response over 1/3 octave bands to produce beautiful-looking frequency response curves. In my opinion this is misleading. Suppose a loudspeaker has a bad dropout (very weak response) over a narrow frequency range; the dropout will be totally obscured by averaging. But when a musical instrument plays a note that just happens to fall in the dropout notch, you will not be able to hear the note. See the example of a warts-and-all response (28.2 kb) vs. a 1/3 octave smoothed response (24.5 kb) from my final system measurements section. Since we can barely hear a 2-dB difference in sound level, it is reasonable to accept ±2 dB as an excellent level of performance for frequency response. In fact this is impossible to achieve in the real world, due to room acoustics. (see the section on room acoustics). Personally I would say a more-or-less practical goal for a sound system installed in a room is a frequency response ±5 dB from 200-20,000 Hz, and maybe ±10 dB from 10-200 Hz. It is also worth noting that the ear itself has a quite variable frequency response, as shown by measured data on head-related transfer functions, and as discussed in the next section.
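The masking effect of 1/3-octave averaging is easy to reproduce. The sketch below (my own illustration, not the author's measurement) averages a synthetic flat response containing a sharp -30 dB notch over 1/3-octave windows and shows the notch largely disappearing:

```python
import numpy as np

def fractional_octave_smooth(freqs, resp_db, fraction=3.0):
    """Average a dB response over 1/fraction-octave windows (1/3 default)."""
    out = np.empty_like(resp_db)
    for i, f0 in enumerate(freqs):
        lo = f0 * 2.0 ** (-1.0 / (2.0 * fraction))
        hi = f0 * 2.0 ** (1.0 / (2.0 * fraction))
        window = (freqs >= lo) & (freqs <= hi)
        out[i] = resp_db[window].mean()
    return out

# Flat response with a sharp -30 dB dropout around 1 kHz:
freqs = np.logspace(np.log10(20), np.log10(20000), 2000)
resp = np.zeros_like(freqs)
resp[(freqs > 980) & (freqs < 1020)] = -30.0
smoothed = fractional_octave_smooth(freqs, resp)
# The -30 dB dropout survives smoothing only as a shallow dip of a few dB.
```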
What is the minimum audible change in frequency? I created two .wav files: case #1 was a series of 1/2 second tone bursts, all at a frequency of 800 Hz; for case #2 the bursts alternated between 800 and 805 Hz. I can reliably distinguish between these two cases in a double-blind test. This difference in frequency is less than 1/100 of an octave. I could also distinguish between 400 and 402 Hz. According to Jourdain (page 18) this is about normal for a young person; at age 61 I'm not supposed to be able to detect a difference of less than about 8 Hz at 400 Hz. But I can. (I repeated this test at age 67, and I still can do it). Sample files are described in the sound demo section. An interesting detail is that tone bursts that start and stop abruptly are easier to discriminate than bursts with a fade-in fade-out. I don't know if this is simply a timing issue, or if the brain is making use of the higher Fourier transform sidelobes that occur for a square window (the spectrum for a tapered burst is extremely narrow, the square burst spectrum has extensive sidelobes about 40 dB below the peak).
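The test files described above are straightforward to generate. A numpy sketch (burst length and frequencies are the ones given in the text; the gap length and the Hann taper, standing in for the "fade-in fade-out" variant, are my assumptions):

```python
import numpy as np

def tone_bursts(freqs_hz, fs=44100, burst_s=0.5, gap_s=0.25, taper=False):
    """Concatenate one tone burst per listed frequency, separated by
    silence. With taper=True, a Hann envelope fades each burst in/out."""
    gap = np.zeros(int(fs * gap_s))
    pieces = []
    for f in freqs_hz:
        t = np.arange(int(fs * burst_s)) / fs
        burst = np.sin(2 * np.pi * f * t)
        if taper:
            n = len(burst)
            burst = burst * (0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n))
        pieces.extend([burst, gap])
    return np.concatenate(pieces)

# Case 1: every burst at 800 Hz.  Case 2: bursts alternate 800 / 805 Hz.
case1 = tone_bursts([800, 800, 800, 800])
case2 = tone_bursts([800, 805, 800, 805])
```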
For music the audio spectrum is divided into discrete notes. A brief discussion of the interesting subject of musical scales is given in a separate section.
SPL is an objective measurement of sound pressure, or power in watts, and is independent of frequency. In 1933 Fletcher and Munson of Bell Labs did a study showing that subjective sound levels varied significantly from the SPL level. That is, when two tones were played at exactly the same SPL level, one sounded louder than the other. And the results were very dependent on how loud the tones were to begin with. This is illustrated by the set of Fletcher-Munson curves [102 Kb]. The vertical axis is the objective SPL sound level. Each of the curves in the graph represents a constant subjective sound level, in units called "phons." The lowest curve is the minimum audible level of sound. As noted above, the ear is most sensitive around 2-5 kHz. To be audible at this minimum level, a sound at 20 Hz must be 80 dB (100 million times!) more powerful than a sound at 3 kHz.
Near the top, the curve at 100 phons represents a fairly loud level. To sound equally loud at this level, the sound at 20 Hz must be about 40 dB more powerful. This change in subjective level for different loudness levels means that music played softly will seem to be lacking in bass. For years pre-amps have come equipped with "loudness" controls to compensate for this. For me, part of "hi-fidelity" means playing music at the same level it was originally played, so this is all academic - but interesting nonetheless.
An important characteristic of a sound system is the "sound image." An ideal system would create a vivid illusion of the location of each musical instrument. In designing a system it is important to understand, as well as current knowledge permits, how we locate the source of a sound. One thing that is clear is that the brain processes several different types of data to extract directional information. The data include:
  • shape of the sound spectrum at the eardrum
  • difference in sound intensity between the left and right ears
  • difference in time-of-arrival between the left and right ears
  • difference in time-of-arrival between reflections from the ear itself
A remarkable fact is that the pinna, the cartilage-filled structure surrounding the ear canal (commonly simply called the "ear"), is a vital part of direction sensing. Test subjects can be trained to locate sound using only one ear. But when the ridges of the pinna are gradually filled in, the ability is lost, in proportion to the filled in area. Apparently the brain uses reflections from the ridges of the pinna (19.4 kb) to determine direction. The head and pinna have a major effect on the sound that arrives at the ear. This effect is mathematically represented by a head-related transfer function (HRTF). There are files in the sound demo section where a monophonic sound source is processed with HRTFs to synthesize sound arriving from various directions. The full HRTFs contain both the difference in sound intensity, and difference in time-of-arrival. There are two other demo files where only one of these two differences are retained. When I listen to these files I perceive the apparent direction almost equally well with all three files, indicating that the brain has a remarkable capability of making good use of whatever information it gets.
The significance of the pinna reflection experiments for a sound system designer is that time delays on the order of 0.1 millisecond can affect sound imaging. Time delays between the left and right ear are on the order of 0.5 milliseconds, and are quite important. On the other hand, researchers have found that echoes in the range of 1 to 50 milliseconds are lumped together by the brain with the direct sound, so they are not actually heard as distinct echoes. Delays greater than 50 milliseconds are heard as echoes. My own echo research is described in the sound demo section, and you can listen to the results yourself. Echoes in the range of 25 to 100 milliseconds give a "cavernous" quality to the sound. What is commonly called an "echo," a distinct repetition of the original sound, only occurs for echoes of 400 milliseconds or longer. Echoes in the range of 0.1 to 2 milliseconds do cause changes in the apparent direction of the source.
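The echo experiments can be reproduced by mixing a delayed copy of a signal back into itself. A minimal numpy sketch (noise stands in for real program material; the two delays are the perceptual boundaries quoted above, and the 0.5 gain is an arbitrary choice):

```python
import numpy as np

def add_echo(x, fs, delay_ms, gain=0.5):
    """Mix a delayed, attenuated copy of x back into the signal."""
    d = int(fs * delay_ms / 1000.0)
    y = np.concatenate([x, np.zeros(d)])
    y[d:] += gain * x
    return y

fs = 44100
rng = np.random.default_rng(1)
click = rng.standard_normal(fs)       # 1 s of noise as stand-in material
fused = add_echo(click, fs, 30)       # ~30 ms: fused with the direct sound
echo = add_echo(click, fs, 400)       # ~400 ms: heard as a distinct echo
```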
A regular CD sampled at 44.1 kHz is theoretically capable of reproducing frequencies up to 22 kHz, which corresponds to a transient duration of .05 milliseconds. However, as discussed in a recent paper by Mike Story (e-mail mstory@dcsltd.co.uk to request a copy) the anti-aliasing filters required to record within this band cause the transients to be blurred, in effect smearing the ability of our ears to distinguish direction. Mike reports that in listening tests 96 kHz recordings provide notably better spatial resolution. In the Handbook for Sound Engineers Steve Dove says anti-aliasing filters "....exhibit serious frequency dependent delay and convoluted frequency/phase characteristics... leaving mangled audio in their wake". He also advocates sampling around 100 kHz, and says the result is a more open and spacious sound. Humans perceive left-right direction more accurately than up-down direction. Presumably this is due to the fact that we generally move in two dimensions along a more-or-less level surface. All of this information is important for the sound system designer, particularly regarding the control of sound diffraction and reflection, both of which can muddle the sound image.
Distortion is a commonly accepted criterion for evaluating high-fidelity sound equipment. It is usually understood to mean the tones in the reproduced sound that were not present in the original sound. An ideal sound system component has a perfectly linear response. This means that the ratio of the output and the input signal magnitude is always exactly the same, and the relative phase is constant, regardless of the strength of the signal. For a non-linear response (anything other than a linear response), distortion will occur. It is commonly categorized as total harmonic distortion (THD) and intermodulation distortion. Harmonic distortion means that a pure 1000 Hz input tone results in spurious outputs at 2000 Hz, 3000 Hz, and other integer multiples of the input frequency. Intermodulation distortion means two input tones at 1000 Hz and 100 Hz result in spurious outputs at 900 Hz, and 1100 Hz, among others.
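Both kinds of distortion can be demonstrated by pushing test tones through a small non-linear transfer function and inspecting the spectrum. The sketch below (illustrative coefficients, not a model of any real device) reproduces the harmonic and intermodulation products named in the text:

```python
import numpy as np

def spectrum_peaks(x, fs, floor=1e-3):
    """Map frequency (Hz, rounded) -> amplitude for bins above `floor`."""
    amps = np.abs(np.fft.rfft(x)) / (len(x) / 2)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    return {round(float(freqs[i])): float(amps[i])
            for i in np.nonzero(amps > floor)[0]}

fs = 48000
t = np.arange(fs) / fs                 # one second: 1 Hz bin resolution

def device(x):                         # mildly non-linear "component"
    return x + 0.05 * x ** 2 + 0.02 * x ** 3

# Harmonic distortion: a 1000 Hz tone yields products at 2000 and 3000 Hz.
harm = spectrum_peaks(device(np.sin(2 * np.pi * 1000 * t)), fs)
# Intermodulation: 1000 Hz + 100 Hz tones yield 900 and 1100 Hz, among others.
imd = spectrum_peaks(device(np.sin(2 * np.pi * 1000 * t)
                            + np.sin(2 * np.pi * 100 * t)), fs)
```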
The audibility of phase distortion is controversial. Some loudspeaker manufacturers, such as Dunlavy (apparently now out of business), cite flat phase response as a significant feature of their products. There is no question that under some artificial circumstances phase distortion is audible. Further discussion on the interesting topic of phase audibility can be found here.
So called "Doppler" distortion is produced by the motion of the loudspeaker cone itself. This creates some harmonic distortion, but the most significant effect is intermodulation distortion. This class of distortion can only be reduced by reducing the cone motion. A large surface, such as the membrane of an electrostatic speaker, will produce very little Doppler distortion. See the analysis for a piston in a tube for technical details.
Also see the discussion above on "clipping."
Everest quotes research indicating that amplitude distortion has to reach a level of 3% to be audible. However this varies greatly depending on the distortion harmonic products, and on the sound source. More on this below. Good CD players, amplifiers and pre-amplifiers typically have distortion levels of 0.1% or less. (Tube amps typically have higher distortion). Loudspeakers are the weak link regarding distortion. It is hard to even get information on loudspeaker distortion since it looks embarrassing compared to the values advertised for electronics. I measured 2nd and 3rd harmonic distortion of my sound system end-to-end using my CLIO sound measuring system. Since speaker distortion dominates, this is essentially a measurement of speaker distortion. The measurement was made using one speaker; with two speakers the distortion would be the same, but the SPL levels would increase 6 dB for the two lower frequency bands, and 3 dB for the upper bands. The entire measured distortion curve at the higher power level is shown in the section on final system measurements.
Measured harmonic distortion

Frequency          80 dB SPL   86 dB SPL   92 dB SPL
20 - 40 Hz         6%          8%          10%
100 - 200 Hz       1%          1%          1%
1000 - 5000 Hz     0.4%        1%          1.5%
5 - 10 kHz         0.14%       0.25%       0.44%
Distortion is universally considered to be bad, and it is perhaps not generally realized that musical instruments introduce overtones that have similarities to distortion. I imagine most music lovers are aware that all musical instruments produce a fundamental tone (the "note"), and a series of overtones. The overtones are at frequencies higher than the fundamental tone, and give the sound a rich quality not possessed by a pure tone. Overtones are generally harmonics (integer multiples) of the fundamental frequency. The relative strength of the various harmonics gives the instrument its characteristic sound. You can hear a comparison of a real piano note [42kb] and a tone [42kb] with the same fundamental frequency, but lacking in overtones. There is also additional description of the spectrum of this note.
The ear is not perfectly linear and produces distortion. A short discussion of the non-linear behavior of the ear can be found in a separate section. Finally, air itself is non-linear, and harmonic distortion grows steadily as a wave propagates (see plane waves in the physics section). This is usually a very small effect, but can be significant in the throat of a horn speaker.

The subject of sound quality is not at all clear-cut. Even though tube amplifiers have higher measured distortion, a lot of knowledgeable people swear that they sound better. I finally dove into this subject in August 2006. I can clearly hear THD at 0.5% for a pure 440 Hz tone and the type of harmonics produced by a typical solid-state amp; for the type of harmonics produced by a single-ended triode amp I could not detect distortion until it reached a level of 10%. This amazing difference is covered in detail in the section on amplifier distortion. For music samples the difference is not quite as big, but is still quite significant. Many people have come to the conclusion that THD is a terrible way to judge amplifier quality, and I totally agree. Norman Koren, an advocate of tube amplifiers, has posted a very interesting commentary on the subject of distortion and the effect of feedback.
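THD itself is just the RMS sum of the harmonics relative to the fundamental, which is part of why it hides the differences described above. A sketch with made-up harmonic profiles, one second-harmonic-dominated ("triode-like") and one with low-level high-order products ("solid-state-like"):

```python
import math

def thd(harmonic_amps):
    """Total harmonic distortion: RMS of harmonics 2..N over the fundamental.
    harmonic_amps[0] is the fundamental; the rest follow in order."""
    fund, rest = harmonic_amps[0], harmonic_amps[1:]
    return math.sqrt(sum(a * a for a in rest)) / fund

# Illustrative made-up profiles, NOT measurements:
triode_like = [1.0, 0.05, 0.01, 0.002]                        # 2nd dominates
solid_state_like = [1.0, 0.001, 0.004, 0.003, 0.002, 0.002]   # high-order spread
```

Here the triode-like profile measures about 5.1% THD and the solid-state-like one about 0.6%, yet by the listening results in the text the higher THD number may be the less audible of the two.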

The onsets and offsets of the sounds we hear are processed by separate neural channels

http://www.tendencias21.net/Descubren-como-el-cerebro-escucha-el-sonido-del-silencio_a4104.html

Until now, it was thought that the perception of sound onsets and offsets was handled by the same neural channel. A new study has shown that the brain uses two distinct, mutually independent channels of neural connections to process the beginning and the end of sounds. This finding, which clarifies, for example, how we are able to detect word boundaries, will help improve therapies for people with language deficits and also help design more efficient hearing-assistance devices. By Yaiza Martínez.

A team of researchers at the University of Oregon, in the United States, has identified an independent channel of synapses, or neural connections, devoted to hearing within the brain's auditory cortex.

This channel, the scientists say, is specifically responsible for stopping the brain's processing of a sound at the right moment, and is therefore key to listening and to understanding what we hear.

Until now, it was believed that the brain's registration of a sound's appearance and of its subsequent disappearance were both carried out by the same channel, so this new discovery contradicts a long-held assumption.

From the ear to the temporal lobe

Instead, the present finding supports an emerging hypothesis that a separate set of synapses is responsible for processing the end of sound signals, the University of Oregon reports in a press release.

As the scientists explain in an article published in the journal Neuron, neurons in the visual, somatosensory, and auditory cortices can all respond to both the offset and the onset of sensory stimuli.

In the auditory cortex, responses to sound offsets had been thought to arise from post-inhibitory rebound, but that hypothesis had never been tested directly.

Michael Wehr, professor of psychology, member of the university's Institute of Neuroscience, and one of the authors of the study, says the experiments in this new research confirmed the existence of a complete, independent channel running from the ear to the brain that is specialized in processing sound offsets.

This channel and another ultimately converge on a brain region called the auditory cortex, located in the temporal lobe, an area containing the neurons that pick up the features of sounds. The temporal lobe also holds neurons involved in language comprehension, memory, and learning.
 
Verified in rats

For the study, Wehr and two student collaborators (Ben Scholl and Xiang Gao) recorded the activity of neurons and their connecting synapses in the brains of rats exposed to sound bursts lasting milliseconds.

Neural responses to these signals were measured at the onset and at the offset of each sound, and the scientists tested a range of sound frequencies and durations across a series of experiments.

In this way, they found that one set of synapses responded "very strongly to the onset of sounds," while a different group of synapses responded to the sudden disappearance of those sounds.

They also observed no overlap whatsoever between the two sets of neurons activated at the onset and at the offset of sounds.

That is, the end of one sound did not affect the neural response to a new sound, which further reinforces the idea of separate channels for processing sound onsets and offsets.

The researchers also found that offset responses involve different frequency tuning, duration, and amplitude than those produced when a sound's onset is processed.

These differences in how auditory signals are processed at their beginning and end agree with findings from at least three earlier studies on the subject over the past decade.

Possible applications

As Wehr explains, "being able to perceive when a sound stops is very important for speech processing. One of the really difficult problems in speech is finding the boundaries of words. How the brain makes that distinction is still not well understood."

But the present study, Wehr believes, has revealed brain mechanisms essential for identifying the boundaries between words, which allow us to recognize and accurately follow the speech of others.

These findings, which deepen our understanding of how the brain processes sound signals, could lead to new specialized therapies or to improved hearing-assistance devices.

They could also prove useful in designing treatments for children with language and learning deficits. For example, people with dyslexia are known to have trouble defining the boundaries of sounds in speech, so targeting the identified areas could help strengthen their abilities.