Perfect Match!

Since Apple took it over in 2018 for a minuscule fee of $400 million, Shazam, the magical machine that finds songs for you in a matter of seconds, has become a global phenomenon among fans of both music and the tech world.

Find the one that got away in seconds

Needing nothing but a few seconds to find that song (and now advertisement or television show), it has captivated thousands of users across the world. And with it integrated into Apple’s Siri, we have a walking musicopedia at our fingertips.

It doesn’t pick up humming, however, so I’d advise you to stop trying.

So, how does it actually work?

Shazam has a library of more than 8 million songs, and it has devised a unique fingerprint for each track in that library.

The main idea is like creating a “‘fingerprint’ of each performance” (Fisher, CEO).

1. So for instance, when you hold up your phone to a song you’d like to identify, Shazam turns your captured sound into an acoustic fingerprint.

2. From here, it’s just a simple matter of pattern-matching. Shazam searches its database of 8 million songs for a fingerprint that matches the one from your recorded sample.

3. Once it has matched that specific fingerprint, it feeds all the useful information back to the user. Simple, right?


But how does Shazam make these fingerprints?

The Spectrogram

The only way Shazam can possibly find a song so quickly is to ignore the majority of the song and just focus on its “key” or “intense” moments.

And so, Shazam uses a spectrogram to do exactly that: a graph that plots three dimensions of music, namely time, frequency and intensity. Shazam’s algorithm just picks out the points that represent the peaks of the graph, which are notes that contain “higher energy content” than the notes around them, really.
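To make the peak-picking idea a bit more concrete, here is a minimal sketch in plain Python. It is not Shazam’s actual code: the function names, the test tones, the sample rate and the frame size are all made up for illustration, and it keeps only the single loudest frequency per time slice, whereas a real system would keep many peaks.

```python
import cmath
import math

def spectrogram(samples, frame_size):
    """Cut the audio into frames and measure the intensity of each frequency bin."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        magnitudes = []
        for k in range(frame_size // 2):  # frequency bins below the Nyquist limit
            total = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n, x in enumerate(frame))
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames

def strongest_peaks(frames, sample_rate, frame_size):
    """Return (seconds, frequency in Hz) for the loudest bin in each frame."""
    peaks = []
    for i, magnitudes in enumerate(frames):
        k = max(range(len(magnitudes)), key=magnitudes.__getitem__)
        peaks.append((i * frame_size / sample_rate, k * sample_rate / frame_size))
    return peaks

# Hypothetical test signal: a 1000 Hz tone followed by a 2000 Hz tone.
SAMPLE_RATE = 8000
FRAME_SIZE = 64
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(512)]
tone += [math.sin(2 * math.pi * 2000 * n / SAMPLE_RATE) for n in range(512)]

peaks = strongest_peaks(spectrogram(tone, FRAME_SIZE), SAMPLE_RATE, FRAME_SIZE)
for seconds, hz in peaks:
    print(f"{seconds:.3f}s -> {hz:.0f} Hz")
```

Each printed line is one “intense moment”: a time and the frequency that stood out most at that time. Everything else in the audio is thrown away, which is the whole point.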

Some guiding principles for the attributes to use as fingerprints are that they should be temporally localised [tied to a short moment in the track], translation-invariant [the same wherever in the track the sample is taken], robust [able to survive noise and distortion], and sufficiently entropic [carrying enough information to tell tracks apart].

Avery Wang, 2003, Co-Founder, Scientific American

In Layman’s Terms …

I don’t really want to go too deep with this one, of course. You can certainly imagine, though, that Shazam’s algorithm delves deep into the science, and the majority of it cannot be explained in layman’s terms. And by that I mean I don’t understand it when they use scientific jargon. I’m sure they do it on purpose so that we stand in awe of how much work goes into it.

[Figure: The target zone of a song scanned by Shazam. Song sample example with peak intensity points marked.]

Frequency …

To make it easier to understand, here – I’ll number the process for you.

1. Shazam identifies frequencies of “peak intensity.” For each of these peak points, it keeps track of the frequency and the amount of time since the beginning of the track, and logs them as “the target zone.” Which sounds far scarier than it is.

2. On the graph above, each point represents the intensity of a given frequency at a specific point in time. (In case you missed them, the peaks are the red marks on the graph.)

3. Each peak frequency is then looked up in the database, which returns every song that contains that frequency, whether that is 5 songs or a hundred thousand. Each hit is recorded along with where in the song that frequency occurs:

834.44 Hz = 10 secs into SONG A

834.44 Hz = 12 secs into SONG B
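One way to picture that lookup is as an index from frequency to every place it occurs. The sketch below is my own guess at the shape of such a table, not Shazam’s real database: the catalogue, the extra peaks and the `build_index` helper are hypothetical, and a real system would round frequencies into bins rather than use raw decimals as keys.

```python
from collections import defaultdict

# Hypothetical catalogue: each song is a list of (frequency in Hz, seconds into song).
catalogue = {
    "SONG A": [(834.44, 10), (1200.0, 11), (640.0, 13)],
    "SONG B": [(834.44, 12), (500.0, 14)],
}

def build_index(songs):
    """Map each peak frequency to every (song, time) where it occurs."""
    index = defaultdict(list)
    for song, song_peaks in songs.items():
        for freq, seconds in song_peaks:
            index[freq].append((song, seconds))
    return index

index = build_index(catalogue)

# Look up one peak frequency from the recorded sample.
hits = index.get(834.44, [])
for song, seconds in hits:
    print(f"834.44 Hz = {seconds} secs into {song}")
```

Building the index is done once, ahead of time, so that looking up a frequency from your sample is a single quick step no matter how many songs are in the library.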

Time …

It then checks whether these frequencies correspond in time. If the matching frequencies in a song are spaced the same way in time as they are in your recorded sample, then the stars align and there’s your song.
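Here is a rough sketch of how that time check could work, assuming a simple voting scheme: every frequency hit votes for a song and for how far into that song the sample would have started. The index contents, the sample and the `best_match` function are invented for illustration and are not taken from Shazam.

```python
from collections import Counter

# Hypothetical index: peak frequency -> list of (song, seconds into song).
index = {
    834.44: [("SONG A", 10), ("SONG B", 12)],
    1200.0: [("SONG A", 11), ("SONG B", 20)],
    640.0: [("SONG A", 13)],
}

# Peaks from the recorded sample: (frequency in Hz, seconds into the sample).
sample = [(834.44, 0), (1200.0, 1), (640.0, 3)]

def best_match(sample_peaks, freq_index):
    """Vote for (song, offset) pairs, where offset = song time - sample time."""
    votes = Counter()
    for freq, sample_time in sample_peaks:
        for song, song_time in freq_index.get(freq, []):
            votes[(song, song_time - sample_time)] += 1
    if not votes:
        return None
    (song, offset), count = votes.most_common(1)[0]
    return song, offset, count

result = best_match(sample, index)
print(result)
```

In this made-up example all three peaks agree that the sample starts 10 seconds into SONG A, while SONG B shares the frequencies but at offsets that disagree with each other (12 and 19 seconds), so it loses the vote.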

Perfect match!

That’s the simple explanation anyway, and that’s good enough for me, to be honest.

All this is well and good, but Shazam never discusses its hit-or-miss rates. Of course, there will be some errors in the system: recorded samples with a lot of background noise can throw it off, and sometimes the song you recorded may simply not be in the database of 8 million. Hard to imagine, but there you are.

But the fact that it can single out the right song from others that are supposedly “the same” chord-wise, especially in chart music, truly emphasises that it is a pretty impressive piece of kit.

It is no wonder Apple purchased it with their pocket money.
