How do I reduce muffle on speech?
Reply #1 – 2007-06-22 14:31:48
I'd be tempted to try limiting the bandwidth to a telephone-like frequency range, i.e. a bandpass of 300 to 3400 Hz with a relatively soft cut-off. That may aid intelligibility and eliminate the low-frequency vibration and rumble which might overpower weakly-heard voices. I seem to recall that programs like Cool Edit (now Audition) featured a "Telephone" preset in their FFT filter function. For simplicity and rapid auditioning, you might wish to try a similar approach in the foobar2000 audio player with the Equalizer DSP enabled: lower the response gradually outside the passband until you're happy. You can then enable the same DSP in the Convert function, thereby outputting a processed sound file. Likewise, VLevel or some other dynamic compression or automatic gain control applied after the EQ might make the volume a bit more even between near and far voices.

Another thing that springs to mind is a patent application I found from the renowned digido.com mastering engineer Bob Katz (Robert A. Katz). It doesn't appear to have been granted so far, but I believe it has been put into one of his commercial processors (I think it's K-Stereo). The application concerned stereo enhancement and ambient field "extraction" using the Madsen Effect: the human auditory system perceives the source of the original direct sound at its stereo location, and perceives reflected sounds arriving within the fusion zone (about 40 ms delay) as information about the ambience, caused by reflections from surfaces such as the walls of a room. Outside the fusion zone, echo is perceived instead, sounding like repeated sounds, not one sound.
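If you'd rather prototype the telephone-band idea outside foobar2000, here's a minimal sketch in Python with SciPy. The 300-3400 Hz cutoffs are the figures above; the low filter order is my choice to approximate a "soft" cut-off, and the sample rate and test tones are just illustrative:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def telephone_band(signal, fs, low=300.0, high=3400.0, order=2):
    """Low-order (soft cut-off) Butterworth bandpass approximating the
    300-3400 Hz telephone band; removes rumble below the voice range
    and hiss/energy above it."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    # Zero-phase filtering so speech transients aren't smeared in time
    return sosfiltfilt(sos, signal)

# Example: 50 Hz rumble plus a 1 kHz tone standing in for a voice
fs = 44100
t = np.arange(fs) / fs
rumble = np.sin(2 * np.pi * 50 * t)
voice = 0.5 * np.sin(2 * np.pi * 1000 * t)
filtered = telephone_band(rumble + voice, fs)
```

After filtering, the 50 Hz rumble is attenuated by tens of dB while the 1 kHz component passes essentially untouched, which is the whole point of the exercise.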
The idea was that by creating artificial stereo without coloration or perceptible comb-filtering, one could enhance music, and additionally one could use it for forensic audio restoration where voices are hard to pick up: extracting the ambient reflections into a stereo spatial field apparently allows the auditory system to focus on an individual speaker's voice, partly from the stereo location of the initial direct sound. I'd imagine this is more advantageous with headphones than loudspeakers. I can sort of understand how this may be why binaural hearing aids help in picking out conversations in noisy environments (especially if they provide frequency correction and levelling for each impaired ear), though it seems a bit of a leap to somehow extract or unscramble delayed reflections in this way - but the brain is very clever. The stereo-convolver plugin for foobar2000, which allows cross-channel convolution, now seems an easier way to experiment on short samples than the approaches I've used in the past, and the algorithm is remarkably simple. Of course, out of respect for Bob Katz's intellectual property, this is something to be done for private educational purposes and curiosity, not commercial gain.
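To make the fusion-zone idea concrete, here's a toy cross-channel delay in Python/NumPy: each channel receives a delayed, attenuated copy of the other, with the delay kept well inside the roughly 40 ms fusion zone so it fuses as ambience rather than being heard as an echo. To be clear, this is my own illustration of the perceptual principle, not Katz's patented algorithm - and a single delay tap like this does introduce some comb filtering, which is exactly the coloration his method is claimed to avoid. The delay and gain values are arbitrary choices:

```python
import numpy as np

def crossfeed_ambience(left, right, fs, delay_ms=12.0, gain=0.3):
    """Feed a delayed, attenuated copy of each channel into the opposite
    channel. With delay_ms well under ~40 ms, the delayed copy falls
    inside the fusion zone and is perceived as added spatial ambience,
    not as a discrete echo."""
    d = int(fs * delay_ms / 1000.0)
    out_l = left.copy()
    out_r = right.copy()
    out_l[d:] += gain * right[:-d]   # right arrives in left, d samples late
    out_r[d:] += gain * left[:-d]    # left arrives in right, d samples late
    return out_l, out_r
```

Feeding an impulse into one channel shows the mechanism: the direct sound stays put, and the opposite channel picks up the attenuated copy exactly one fusion-zone delay later. A cross-channel convolver generalises this from one delay tap to an arbitrary impulse response.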