Wednesday, February 01, 2006


Moving over to

I am moving my blog over to Blogger is nice when you are dipping your toes in blogging waters but you soon find that you need more than just a life jacket. You need scuba diving gear if you want to go deep. A nice comparison of Blogger and is here.
This is the direct link for my new RSS feed.

Tuesday, November 08, 2005


How to fix iPod Eq distortion.

The Problem:
The iPod gives a somewhat harsh and metallic sound when the equalizer is turned on. It does not matter which preset you choose (except Flat). Unless you keep the equalizer off, the sound comes out distorted and is not pleasant to listen to. It happens on all iPod models, even on the latest 5G one, which has the best sound among the hard-disk-based models. The 5G fixes a lot of problems with the earlier iPods, like the Noise Defect, and comes with a number of improvements, especially in bass performance, but the sound still clips with the equalizer on.

The Reason:
There is nothing wrong with the iPod. Really.

It has as good a sound as any other player. Though it sounds a little bright, the sound has a lot of detail and a really insignificant amount of distortion. Even the equalizer is well designed and behaves as it is supposed to. But that metallic sound? That clipping? Well, it’s a classic case of “garbage in, garbage out”.

It’s those damn mp3s. Or rather the original CDs themselves.

In a race to sound louder and louder, the CD mastering engineers push the recording level to its limits. This is especially true for mainstream pop/rock music. Somehow the producers think that the CDs have to be really loud to make a better impression.

Now when these hot-mastered CDs (or mp3s made from them) are played with the equalizer on, the total loudness level goes beyond what the iPod can produce. Some of my mp3s had a loudness level of 99.6 dB! Yikes! What can the poor iPod do when presented with this crap? Say I choose the Dance preset, which would typically apply a boost of 6 dB. Add this to 97 dB (a typical figure) and you get a level of 103 dB. Most players have an upper limit of about 95 dB, some going as high as 100 dB, but that’s it. No wonder the sound comes out distorted and harsh.
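To see why a boost on top of an already loud track ends in clipping, here is a rough Python sketch. It is only an illustration: the peak values (0.98 and 0.40 of full scale) and the function names are made up for the example, and a plain sine wave stands in for real music.

```python
import math

def db_to_gain(db):
    """Convert a change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)

def clipped_fraction(peak, boost_db, n=1000):
    """Fraction of samples in one sine cycle that exceed full scale (1.0)
    after the boost. 'peak' is the sine's peak level before the boost."""
    gain = db_to_gain(boost_db)
    samples = [peak * math.sin(2 * math.pi * i / n) for i in range(n)]
    return sum(1 for s in samples if abs(s * gain) > 1.0) / n

# Hypothetical tracks: one mastered hot, one with plenty of headroom.
hot = clipped_fraction(peak=0.98, boost_db=6)
quiet = clipped_fraction(peak=0.40, boost_db=6)
print(f"hot track: {hot:.0%} of samples clip, quiet track: {quiet:.0%}")
```

A 6 dB boost roughly doubles the amplitude, so anything that already sits near full scale has nowhere to go.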

The Fix:
All we need to do is bring the loudness level of the mp3s down so that we get a little headroom to apply the Eq. The generally agreed upon loudness level for this is 89 dB.

Download mp3gain. It’s a free, open source program. Add the folder containing the mp3s. Choose ‘track gain’ and click on ‘track analysis’. It will calculate and display the loudness level for each song and also how much correction needs to be applied. When this analysis is done, just hit ‘track gain’ and it will apply the required correction to each song.
Mp3gain does not re-encode or otherwise degrade the audio in any way. All it does is change the gain (volume) value stored in the file. When a player decodes the file, it reads this value and plays the song at the new level.
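The correction itself is simple arithmetic. Below is a minimal sketch, not mp3gain's actual code: the function names are mine, and the 1.5 dB step size is my understanding of the smallest change that can be made to an mp3 without re-encoding, so treat it as an assumption.

```python
TARGET_DB = 89.0  # target loudness from the text
STEP_DB = 1.5     # assumed size of one gain step in an mp3 file

def gain_steps(measured_db, target_db=TARGET_DB, step_db=STEP_DB):
    """Whole number of gain steps that brings a track closest to the target."""
    return round((target_db - measured_db) / step_db)

def correction_db(measured_db):
    """Correction in dB that would actually be applied."""
    return gain_steps(measured_db) * STEP_DB

for level in (97.0, 99.6, 89.0):
    print(level, "dB ->", correction_db(level), "dB correction")
```

Because the steps are whole, a track usually lands within a fraction of a dB of the target rather than exactly on it.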

This whole correction process is a little slow. It took about one hour per GB on my Pentium 4 machine, 30 hours total for my entire collection. It’s the analysis part that is slow. The correction is instantaneous. So instead of running the analysis, waiting 30 hours, and then applying the correction, a better way is to hit track gain directly. It will analyze and correct in one step.

After all this is done, erase all the songs from your iPod, resync, and enjoy!


  1. You can directly point mp3gain to the iPod_control folder on the iPod but it is not recommended. You don't want that tiny hard disk on your iPod to be spinning continuously for 30 hours. Moreover, if you do it on the PC, other mp3 players can also use this info.
  2. You can stop/cancel the track gain process anytime you want. Mp3gain will pick up from where it left off when you start it next time.
  3. Mp3gain is more accurate at doing normalization than the Sound Check feature found in iTunes/iPod. Mp3gain is based on the ReplayGain standard, which takes into account how humans perceive loudness. It turns out that human ears use average energy over time to judge how loud a certain sound is. So mp3gain divides each file into 50 ms blocks and calculates the RMS energy value of each of these blocks. These values are then used to arrive at the overall RMS energy of the entire song. This is the value which is then used to normalize the song file.

    SoundCheck works in similar way but it seems to take much longer blocks (fewer samples) to calculate the average RMS energy. This makes it less accurate (but a lot faster) than mp3gain.
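The block-by-block measurement described in note 3 can be sketched in a few lines of Python. This is a simplification under stated assumptions: it skips the equal-loudness filtering that ReplayGain applies first, and picking the 95th percentile of the block levels as the overall figure is my reading of the ReplayGain proposal, not something taken from mp3gain's source.

```python
import math

def block_rms_db(samples, sample_rate, block_ms=50):
    """RMS level in dB of each consecutive block of block_ms milliseconds."""
    block_len = int(sample_rate * block_ms / 1000)
    levels = []
    for start in range(0, len(samples) - block_len + 1, block_len):
        block = samples[start:start + block_len]
        mean_square = sum(s * s for s in block) / block_len
        levels.append(10 * math.log10(mean_square + 1e-12))  # avoid log(0)
    return levels

def overall_level_db(levels, percentile=0.95):
    """Pick a high percentile of the block levels as the track's loudness."""
    ordered = sorted(levels)
    index = min(len(ordered) - 1, int(percentile * len(ordered)))
    return ordered[index]

# One second of a 440 Hz test tone at half of full scale.
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * i / rate) for i in range(rate)]
levels = block_rms_db(tone, rate)
print(len(levels), "blocks, overall level", round(overall_level_db(levels), 2), "dB")
```

Longer blocks mean fewer values to compute, which would fit the speed versus accuracy trade-off mentioned for Sound Check.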

Friday, August 12, 2005


Best values for digital camera settings!

Back in the days of film, I used to own a Yashica MF-2 camera. It was as simple as a camera could get. There were no menus to fiddle with, no knobs to turn, and just one button to release the shutter. It didn't even need the batteries if you didn't want flash. And I should mention, it has never given me a technically bad shot - ever. All you had to do was press the button. How could you go wrong with that?

Then came the digital cameras and the question became - how could you ever get it right? My digital camera, an Olympus C-760, has more than 50 different settings! As Scott Adams said in Dilbert - we have come from simple tools like pointy wooden sticks to convoluted things like computers but our brains have not evolved at the same rate. And looking at my digital camera, I can personally vouch for that.

There are two types of settings in a digital camera: ones that stay the same for each shot and ones that vary from shot to shot. Image quality (jpeg compression) is a fixed type of setting. You don't want a lower quality for one shot and higher quality for another, assuming you don't have any kind of strange fetish related to images. White Balance, on the other hand, has to be changed from shot to shot. Actually I don't change it that often but all the pros say that they do. So there must be something to it. Here is what I do for fixed settings:

Sharpening: Keep it 'Off'. If your camera doesn't have an explicit 'Off' setting, keep it to a minimum. I always used to keep it at 0, thinking that this would turn the sharpening off. It was only later I found that the scale was actually from -5 to +5. There is a reason people recommend to RTFM. A few reasons why in-camera sharpening is bad -

  1. Different photos need different types and different amounts of sharpening. Digital cameras usually employ Unsharp Masking (USM) but the only thing that you can vary in that is the amount, not the radius or the threshold. And that's where the trouble lies. A tree shot with lots of leaves needs a lower threshold than a facial close-up. Even if you could vary those other parameters, it would be too tedious to adjust them from shot to shot. So just turn it off and use your judgement later instead of leaving it to the camera.
  2. The sharpening should always be assessed at 100% size. Those tiny 2" LCDs just don't cut it. And you have to fiddle with the sliders a lot before you get the look you want.
  3. USM is not the best technique to sharpen photos. There are lots of other techniques which are more flexible and result in fewer artifacts. This alone mandates that you use a PC to sharpen your images.
  4. Sharpening should be the last step in the workflow, after all the levels/curves/saturation etc. corrections have been done. And even then it depends upon the intended use of the picture. Web-only images need different sharpening than the ones which are to be printed. Usually you shouldn't apply sharpening until it's absolutely needed. Just save your edited file in PSD or XCF format without sharpening and sharpen just before using it.

Contrast: This should be set to its minimum possible value. This results in a wider dynamic range which allows you to capture more detail. Increasing the contrast in-camera is like applying a levels correction to the photo. A pixel which was 240 would now read 250 and the pixel which was 252 would be lost. The same applies to the shadow details too. So keep the contrast to a minimum and make sure that your histogram stretches from end to end. Even if the histogram is bunched up, it's always possible to stretch it later in Photoshop but there is no way to get back the details that got lost during capture itself.
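Here is a small Python sketch of what that loss looks like. It assumes the simplest possible contrast boost, a straight-line stretch around mid-gray; the gain of 1.09 is a number I picked so that 240 maps to 250 as in the example above, and real cameras almost certainly use a fancier curve.

```python
def boost_contrast(value, gain=1.09, pivot=128):
    """Stretch an 8-bit pixel value away from mid-gray, clipping to 0..255.
    gain and pivot are illustrative values, not any camera's real curve."""
    stretched = pivot + (value - pivot) * gain
    return max(0, min(255, round(stretched)))

for original in (240, 244, 245, 252, 255):
    print(original, "->", boost_contrast(original))
```

Once 245, 252 and 255 have all become 255, no amount of editing can tell them apart again, which is exactly the highlight detail that goes missing.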

I always check the histogram right after taking the shot. Generally I end up shooting the same thing 3-4 times before I get it right but that's ok. Good shots are worth this much trouble.

Image Size: Use the biggest size your camera can capture. I always shoot at 2048x1536, the highest my camera can go. It's all the more important if you shoot in jpeg. If you shoot at a lower resolution, the camera does not use fewer pixels to begin with. It cannot do that. It always captures the image at its native (highest) resolution and then resizes it. And the camera's resampling algorithms are no match for Photoshop's bicubic resampling. So just use Photoshop to resize later if you feel you don't need the full resolution but always capture the image at the highest resolution possible.

Another reason to use full resolution is if you intend to print your shots. Digital images are typically printed at 300 ppi. So if you shoot at 1024x768, you can make a print of only about 3.4x2.6 inches. Sure, you can print this up to 4x6" but it won't look as good or as crisp.
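The print size math is just pixels divided by pixels per inch. A tiny Python helper (the function name is mine) makes it easy to check any resolution:

```python
def print_size_inches(width_px, height_px, ppi=300):
    """Largest print, in inches, at the given pixels-per-inch density."""
    return (round(width_px / ppi, 2), round(height_px / ppi, 2))

print(print_size_inches(1024, 768))    # low-resolution setting
print(print_size_inches(2048, 1536))   # full resolution on my camera
```

Even the full 2048x1536 frame only covers a little more than 4x6" at 300 ppi, so there are no pixels to spare.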

Image quality: Again, use the highest quality setting possible. JPEG is already a lossy format and further compression makes it even worse. The lower the quality (higher compression), the less detail in the image. That's why most amateur flower shots have petals made of pure color, without any detail. At a resolution of 3 megapixels, the lowest quality JPEG file out of my camera is about 100 KB and the highest quality file is about 1.5 MB! Those extra bytes count for something. The quality settings are generally named fine, superfine etc. and vary from camera to camera. Make sure you know whether 'High' is really higher than 'Fine', or whether it is the other way around. With storage being so cheap, there is just no excuse to throw away all that information. Get a higher capacity card if you have to. I use a 512 MB card which captures about 270 shots at highest resolution.

Friday, July 22, 2005


The ultimate Winamp setup guide!

When we call someone an "audiophile", we generally mean that the guy spends insane amounts of money on audio gear. This is a wrong definition, and a derogatory one too. To me, an audiophile is someone who is passionate about music, who cares about music enough to invest in high quality gear. But it does not mean that we can't be audiophiles if we don't have all the high-end stuff. All it takes is the right set of tools and a little bit of fiddling around.

A typical audio path looks like this:

CD --> Encoder --> Winamp (Decoder) --> DSP --> Kmixer --> Soundcard --> Headphones

Let's see what we can do to optimize each stage.

Encoder: If most of your music comes from the internet and other sources in MP3 form, there is nothing you can do about its quality. But if you rip the CDs yourself, read on. MP3/OGG/AAC sound great but why lose quality when you can rip to a lossless format? These lossless formats are really lossless. They are like WinZip for audio. You get back the exact same data, with each and every bit intact. The lossless formats do take more space (5 MB for MP3 vs 25 MB for FLAC) but with hard disk space being so cheap, there is no reason to lose quality. The best and most popular format for lossless audio is FLAC. I use Yahoo Music Engine to rip the CDs to FLAC and this plugin to play it in Winamp. Even though a few of the portable audio players support this format, it is not yet widely available on DAPs. So if your player does not support FLAC and a DAP is your primary source of music, you'd be better off with MP3. Just make sure that you use a high enough bitrate (192 kbps or greater) and LAME to encode. Some people claim that AAC has better quality at the same bitrates but I can't vouch for that. These are more like Nikon vs Canon debates.

Decoder: The built-in MP3 decoder in winamp has been licensed from FhG which is pretty decent but not the best sounding one. We need something that can do dithering and noise shaping. I would use foobar but iZotope Ozone does not work with that. Fortunately, there is MAD plugin which does all this and it is free too. I must say that the difference in the decoders is very subtle and for the most part, insignificant. It is more of a purist thing. So just use whatever came with Winamp.

DSP: The next step in the chain is signal processing. This includes things like the equalizer, normalizer, and other sound enhancing plugins. It takes a really good understanding of audio fundamentals to not screw up the sound with these things. You could read tomes on this subject and play around with the settings yourself, but that is like using layers and masks and whatnot in Photoshop to sharpen a picture when Photokit Sharpener can do a much better job at the press of a button. In other words, leave it to the pros. And the best DSP plugin for Winamp is Ozone, by iZotope. The basic plugin is free but it is so good that you'd never miss the pro version. It brings out details in the music that you never even knew about, especially in the mid and high frequency range. The music sounds more lush and the instrument separation improves to a remarkable degree. Listen to a well mastered CD like Gladiator or "The Passion of the Christ" with and without this plug-in to fully appreciate it.

You don't need any other plugin with this, not even the Winamp equalizer. I just leave the settings at default for both headphones and speakers. If you start experiencing listening fatigue on headphones, try switching to the headphone preset. This mode adds a little crossfeed and HRTFs to make you feel as if you are listening to speakers. Or if you don't like too much zing in your music, try the preset with less sparkle. I found that the default mode gives me the purest sound. Just play around with different presets and see what you like.

Kmixer: The signal coming out of the DSP goes into a Windows component called kmixer. All it does is mix different audio streams so that two or more applications can play back audio at the same time. It does something else too. It resamples everything to 48 kHz (the native audio data is at 44.1 kHz) which results in a slight loss of quality. We can bypass this thing by using an output mode called 'Kernel Streaming'. It is more of a hack (experimental feature) to support applications that need very low latency. This plugin does not support buffering (as of now) so the music might skip if you use your PC too heavily while Winamp is running. It should be fine for normal use though. In case you are not happy with KS, use DirectSound for Win 2K and XP and waveOut for Win98/ME.
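Why should going from 44.1 kHz to 48 kHz cost anything? A quick Python sketch shows how awkward the ratio is. This only illustrates the arithmetic; how kmixer actually interpolates is not something I know, and the function names are my own.

```python
from fractions import Fraction

def resample_ratio(source_hz, target_hz):
    """Output samples produced per input sample, as an exact fraction."""
    return Fraction(target_hz, source_hz)

def output_position(n, source_hz=44100, target_hz=48000):
    """Where output sample n falls, measured in source samples."""
    return Fraction(n * source_hz, target_hz)

ratio = resample_ratio(44100, 48000)
# Output samples (within one repeating cycle of 160) that land exactly
# on an original sample and so need no interpolation.
aligned = [n for n in range(160) if output_position(n).denominator == 1]
print("ratio:", ratio, "aligned samples per cycle:", aligned)
```

Only one output sample in every 160 coincides with an original one. All the others have to be computed in between, and that is where the quality of the resampler starts to matter.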

Update(Dec 07, 2005):
The updated version of kernel streaming plugin can be found in this HA thread here.

Soundcard: This is one of the most misunderstood pieces of hardware. Everyone knows that a Geforce4 is better than Geforce2 video card. You get much better frame rates and almost realistic texture. But a better soundcard? What could a better soundcard do? Play music faster?

Explaining what a good soundcard can do is like explaining color to a blind person. You have to listen to a good setup to really appreciate it. Try not to buy a Creative soundcard. These cards have been optimized for gaming with EAX effects and other 3-D trickery. You want a card built specifically for audio. The Chaintech AV-710 and M-Audio Revolution cards are cheap and among the best in this category. If nothing else is available, a Creative Soundblaster is a pretty decent solution. Just make sure that you turn off all the so-called sound enhancing settings in the soundcard driver.

Headphones: Much of what has been said about soundcards applies to headphones too. In fact, having better headphones is more important than having a better soundcard because ear drums are a precious thing. Once damaged, they stay that way. Better to buy a decent pair of headphones and treat your ears nicely.

A word of caution: Do not go overboard in trying to get better sound. Do not give the term audiophile its common misunderstood meaning. Listen to the music for music and become a real audiophile.


Wednesday, June 15, 2005


How to simulate speakers on the headphones.

One of the reasons that we have two ears (or two eyes) is that it allows us to experience this world in three dimensions. We can see just fine with one eye but we need two eyes to gauge the depth of the scene. Similarly, two ears allow us to have spatial hearing, i.e. we can localize the source of the sound. We can figure out whether the sound is coming from left or right, from up or down etc.

Say we are listening to some music being played on a speaker. Typically, the sound from the speaker would reach one of the ears earlier than the other (far) ear. This is because our ears are separated by about 10 inches and sound has a finite velocity. This time difference is called Interaural Time Difference (ITD) and is generally in microseconds. And the ear closer to the speaker would receive the sound waves directly, making the signal level at this ear slightly stronger than the far ear. This difference in level is called Interaural Level Difference (ILD). Together, ITD and ILD allow us to localize the azimuth (angle in the plane) of the sound source. These are also called binaural cues since these involve both the ears. We cannot determine the elevation of the sound source using just these cues though.
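To get a feel for the size of the ITD, here is a back-of-the-envelope Python sketch. It assumes the simplest model I could think of, a straight-line path difference of spacing times the sine of the angle, with the speed of sound taken as 343 m/s. Real heads get in the way of the sound, so treat the numbers as rough.

```python
import math

SPEED_OF_SOUND = 343.0  # meters per second in air, roughly room temperature

def itd_microseconds(ear_spacing_m, azimuth_deg):
    """Interaural time difference for a distant source at the given azimuth
    (0 = straight ahead, 90 = directly to one side), straight-line model."""
    path_difference = ear_spacing_m * math.sin(math.radians(azimuth_deg))
    return path_difference / SPEED_OF_SOUND * 1e6

spacing = 10 * 0.0254  # the 10 inches from the text, in meters
for angle in (0, 30, 90):
    print(angle, "degrees:", round(itd_microseconds(spacing, angle)), "microseconds")
```

So even in the most extreme case, a source directly to one side, the difference stays well under a millisecond.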

The sound that comes out of the speaker and the sound that we hear are not the same. Several things affect the sound before it can be heard by us: the distance of the speakers from the ear, the shape of our head, the structure of our outer and inner ear, the angle at which the sound waves strike the ear, indirect sound from reflections off the walls etc. All these influences on the sound waves are collectively called head related transfer functions (HRTFs). Mathematically, an HRTF is the frequency response of the path that carries the sound to the ear (the Fourier transform of its impulse response). These HRTFs allow us to determine the elevation of the sound source. These are also called monaural cues.

But what happens when both ears receive the same level of sound at the same time and there are no external influences on the sound? Since both ITD and ILD are zero and HRTFs are absent, there is no information to localize the sound. So the sound seems to come from the center of the head. This is what happens with headphones. Our brain loses all its localization cues and constantly tries to figure out where the sound is coming from. This results in fatigue and uneasiness, the extreme case of which happens when we listen with only one headphone (right or left). Speakers, on the other hand, sound much more natural and pleasant since the brain has all the information needed to figure out the spatial location of the sound. And the music itself is recorded in such a way that it sounds most pleasant when played back on speakers.

So if we have to simulate the speakers on the headphones, all we have to do is make ITD and ILD non-zero and introduce some HRTFs and we are set. One of the best Winamp plug-ins to do this is Speakers Simulator by Vladimir Kopjov.

Interaural level difference (ILD):
When listening to speakers, both ears receive the sound from both the speakers. But in the case of headphones, the left ear receives sound only from the left channel and the right ear receives sound only from the right channel. To simulate speakers on headphones, we have to take a bit of the left channel and send it to the right ear and vice versa. This is called 'crossfeed'. Its value varies from 0 to 80% in the above plug-in, the default being 70%. The higher the crossfeed, the stronger the localization but the narrower the scene width.

Interaural time difference (ITD):
The sound coming from, say, the left speaker would reach the near ear earlier than the far ear. To simulate this in headphones, the signal that is cross fed is also delayed by a small amount, the delay being roughly equivalent to what would be realized in practice. In the speaker simulation plug-in, the default setting for delay is 113.38 microseconds. More delay gives more scene width but less focus. If increased too much, this delay can cause some sound artifacts.
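Crossfeed plus delay is easy to sketch in Python. This is my own minimal version, not the plug-in's actual processing: it simply adds a delayed, scaled copy of each channel to the opposite one, using the 70% and 113.38 microsecond defaults mentioned above. Note that it does no level compensation, so a real implementation would have to guard against clipping.

```python
def delay_in_samples(delay_us, sample_rate=44100):
    """Convert a delay in microseconds to a whole number of samples."""
    return round(delay_us * 1e-6 * sample_rate)

def crossfeed(left, right, amount=0.7, delay=5):
    """Mix a delayed, attenuated copy of each channel into the other one."""
    out_left, out_right = [], []
    for i in range(len(left)):
        j = i - delay  # index of the delayed sample, silence before the start
        fed_from_right = right[j] if j >= 0 else 0.0
        fed_from_left = left[j] if j >= 0 else 0.0
        out_left.append(left[i] + amount * fed_from_right)
        out_right.append(right[i] + amount * fed_from_left)
    return out_left, out_right

# A single click in the left channel only.
left = [1.0] + [0.0] * 9
right = [0.0] * 10
out_left, out_right = crossfeed(left, right, delay=delay_in_samples(113.38))
print(out_right)
```

The odd-looking 113.38 microseconds turns out to be exactly 5 samples at 44.1 kHz, and the click shows up in the right channel 5 samples late at 70% of its level.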

Head related transfer functions (HRTFs):
This is one of the most difficult parameters to measure because of the interaction of so many different factors and the different head/ear structures in different people. Lord Rayleigh modeled it assuming our head to be a perfect sphere and using wave propagation equations along a curved surface. The results show that the high frequencies get attenuated much more as they travel than the low frequencies. Though they are mathematically complex, the HRTFs are really easy to simulate. All we have to do is make the high frequencies roll off gradually and we'll get the same effect as the speakers. And unless your headphones cost $500 or more, they already have this desired frequency response (not so much by design but rather from a desire to keep the costs down). So just make sure that the equalizer in Winamp is flat for high frequencies. My Winamp equalizer looks like this.
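If you did want to add a gentle high frequency roll-off yourself, the simplest tool is a one-pole low-pass filter. The Python sketch below is a crude stand-in for an HRTF and nothing more: the 8 kHz cutoff is an arbitrary value I chose for illustration, and the function names are mine.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate=44100):
    """Simple one-pole low-pass filter: highs are attenuated gradually."""
    alpha = 1 - math.exp(-2 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for s in samples:
        state += alpha * (s - state)  # move part of the way toward the input
        out.append(state)
    return out

def peak_after_filter(freq_hz, cutoff_hz=8000, sample_rate=44100):
    """Peak level of a full-scale test tone after filtering (settled part)."""
    n = sample_rate // 10
    tone = [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
    filtered = one_pole_lowpass(tone, cutoff_hz, sample_rate)
    return max(abs(s) for s in filtered[n // 2:])

for freq in (200, 5000, 15000):
    print(freq, "Hz ->", round(peak_after_filter(freq), 2))
```

Low notes pass through almost untouched while the top of the range is turned down gradually, which is the general shape of the roll-off described above.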

The effect of speaker simulation plug-in is very subtle. It doesn't seem to have much effect in beginning, but after a few hours of listening, the difference is clear as day and night. And, of course, it is much more pleasant this way and you can listen for longer periods of time without having any fatigue.


  1. Sound Localization Using Head Related Transfer Functions
  2. Speakers Simulator plug-in
  3. Sound localization on Wikipedia
  4. Comments by Chris on HydrogenAudio forums

After trying out various other plug-ins, I have finally settled on 4Front Headphones. It does not apply as much equalization as Speaker Simulator so the sound is more or less unchanged. It also gives much better stereo imaging and puts the sound directly in front of you. The default setting of 30% gives a very distorted and artificial sound but at 10%, it sounds superb.
