Behind the Button: The Fascinating Process of Shazam’s Song Detection

Shazam’s ability to identify songs from just a few seconds of recorded audio has made it a popular app for music fans who want to recognize a tune on the spot. But how does this seemingly magical song identification work? Let’s dive into the technology behind Shazam.

1. Digital Fingerprinting of Audio

Audio Sampling: When you press “Shazam”, the microphone on your device starts recording a snippet of the ambient sound. This recording is typically a few seconds long and is converted into a digital format that can be analyzed by algorithms.
Spectrogram Creation: The digital audio is transformed into a spectrogram using the short-time Fourier transform, which applies the Fourier transform to short, overlapping windows of the signal. A spectrogram shows how the energy at each frequency of a signal varies over time, and is often drawn as an image with time on one axis and frequency on the other. This time-frequency view exposes the structure that lets one recording be distinguished from millions of other songs.
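
To make the spectrogram step concrete, here is a minimal, dependency-free sketch of a short-time Fourier transform in Python. The frame size, hop length, Hann window and 8 kHz sample rate are illustrative assumptions, not Shazam's actual parameters, and the naive per-frame DFT is far slower than the FFT a real system would use.

```python
import cmath
import math

def hann(n):
    """Hann window of length n, used to taper each frame and reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def spectrogram(samples, frame_size=64, hop=32):
    """Return magnitudes[frame][bin] for bins 0..frame_size // 2.

    Each frame is a windowed slice of the signal; consecutive frames overlap
    by frame_size - hop samples. A naive DFT keeps the sketch dependency-free.
    """
    window = hann(frame_size)
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n in range(frame_size)))
            for k in range(frame_size // 2 + 1)
        ]
        frames.append(magnitudes)
    return frames

# A pure 1 kHz tone sampled at 8 kHz: bin width is 8000 / 64 = 125 Hz,
# so the energy should concentrate in bin 1000 / 125 = 8.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = spectrogram(tone)
loudest = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), loudest)  # 15 33 8
```

Each row of `spec` is one moment in time and each column one frequency band: the grid in which the next step searches for peaks.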

2. Feature Extraction

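Identification of Unique Points: Shazam’s algorithm scans the spectrogram for points whose energy is higher than that of their neighbours in both time and frequency. These points mark the most prominent, distinctive moments in the recording and are the ones most likely to survive background noise. They are referred to as “peaks”, and the peaks used as reference points in the next step are called “anchors”.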
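Hashing: Each anchor is paired with nearby peaks that follow it, and each pair is encoded as a hash built from the two peaks’ frequencies and the time difference between them. The hash is stored together with the anchor’s time in the snippet. Because the hash depends only on relative timing, the same pair produces the same hash no matter where in the song the snippet was recorded. This process reduces the song to a compact set of hashes that efficiently represents its unique audio signature, or fingerprint.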
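
The peak-finding and hashing steps above can be sketched in Python as follows. This follows the peak-pairing idea described in Avery Wang's 2003 paper on Shazam's search algorithm, but the neighbourhood `radius`, the `min_magnitude` threshold, `fan_out`, `max_dt` and the 10-bit packing of each hash field are illustrative assumptions rather than Shazam's real settings. The input is a spectrogram given as a list of frames, each a list of magnitudes.

```python
def find_peaks(spec, radius=1, min_magnitude=0.0):
    """Return (frame, bin) points louder than every neighbour within `radius`."""
    n_frames = len(spec)
    n_bins = len(spec[0]) if spec else 0
    peaks = []
    for t in range(n_frames):
        for f in range(n_bins):
            value = spec[t][f]
            if value <= min_magnitude:
                continue  # ignore quiet cells entirely
            neighbours = (
                spec[tt][ff]
                for tt in range(max(0, t - radius), min(n_frames, t + radius + 1))
                for ff in range(max(0, f - radius), min(n_bins, f + radius + 1))
                if (tt, ff) != (t, f)
            )
            if all(value > other for other in neighbours):
                peaks.append((t, f))
    return peaks

def hash_peaks(peaks, fan_out=2, max_dt=64):
    """Pair each anchor peak with up to `fan_out` later peaks and hash each pair.

    Returns (hash, anchor_time) tuples. The hash packs f1, f2 and dt into
    10 bits each (a hypothetical layout), so frequency bins and max_dt must
    stay below 1024.
    """
    peaks = sorted(peaks)
    fingerprints = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt == 0:
                continue  # same frame: the pair carries no timing information
            if dt > max_dt or paired == fan_out:
                break
            h = (f1 << 20) | (f2 << 10) | dt
            fingerprints.append((h, t1))
            paired += 1
    return fingerprints

# Toy magnitude grid: rows are frames (time), columns are frequency bins.
toy = [
    [0, 1, 0, 0, 0],
    [1, 9, 1, 0, 0],
    [0, 1, 0, 0, 7],
    [0, 0, 0, 0, 0],
    [0, 0, 6, 0, 0],
]
peaks = find_peaks(toy, min_magnitude=2)
print(peaks)  # [(1, 1), (2, 4), (4, 2)]
for h, t in hash_peaks(peaks):
    # Decode the packed fields: anchor time, f1, f2, dt
    print(t, h >> 20, (h >> 10) & 0x3FF, h & 0x3FF)
```

Hashing pairs rather than single peaks makes each hash far more specific (two frequencies plus a time gap), which cuts down on chance matches against unrelated songs.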

3. Database Matching

Querying the Database: The generated hash codes are sent to Shazam’s servers, where they are compared against a database of fingerprints precomputed for millions of songs. The database is indexed by hash value, so each hash from the snippet can be looked up directly instead of scanning every song.
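Time Alignment: The algorithm also compares when each matching hash occurs in the snippet with when it occurs in the full track. For the correct song, the difference between those two times is roughly the same for most matching hashes, because the snippet is simply a time-shifted excerpt of the track. This step is crucial for ensuring that the identified song matches the recorded snippet not just in its audio features but also in the sequence and timing of those features.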
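
Below is a minimal sketch of the lookup and time-alignment steps, assuming each song's fingerprints are already stored as (hash, time) pairs. A Python dict stands in for Shazam's server-side index, whose real design is not described here, and the hash values, times and song IDs are made up for illustration.

```python
from collections import Counter, defaultdict

def build_index(catalogue):
    """Invert song_id -> [(hash, time)] into hash -> [(song_id, time)]."""
    index = defaultdict(list)
    for song_id, fingerprints in catalogue.items():
        for h, t in fingerprints:
            index[h].append((song_id, t))
    return index

def lookup(index, query):
    """Return (song_id, db_time, query_time) for every hash the snippet shares."""
    return [
        (song_id, db_t, q_t)
        for h, q_t in query
        for song_id, db_t in index.get(h, [])  # .get avoids inserting empty keys
    ]

def aligned_count(hits, song_id):
    """Largest number of this song's hits sharing one db_time - query_time offset."""
    offsets = Counter(db_t - q_t for s, db_t, q_t in hits if s == song_id)
    return max(offsets.values(), default=0)

# Toy fingerprints: the snippet starts 100 frames into song_a, so its true
# matches all have offset 100. song_b shares the same hashes only by chance.
catalogue = {
    "song_a": [(11, 105), (22, 112), (33, 130), (44, 150)],
    "song_b": [(11, 40), (22, 77), (33, 3)],
}
query = [(11, 5), (22, 12), (33, 30), (99, 31)]

index = build_index(catalogue)
hits = lookup(index, query)
print(len(hits))                      # 6: each song shares three hashes with the snippet
print(aligned_count(hits, "song_a"))  # 3: all three line up at offset 100
print(aligned_count(hits, "song_b"))  # 1: offsets 35, 65 and -27 disagree
```

Both songs share the same number of raw hashes with the snippet; only the offset check separates the real match from the coincidental one.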

4. Song Identification

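Finding the Best Match: If a song in the database has a significant number of matching hashes that also line up in the correct sequence, meaning they share the same time offset, the algorithm identifies that song as the match. Counting only aligned hashes keeps songs that share a few hashes by chance from being reported.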
Retrieving Metadata: Once a match is found, Shazam retrieves the song’s metadata, which includes the song title, artist, album, and more. This information is compiled from various music metadata providers and record labels.
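
The two steps above can be combined in a short sketch: each candidate song is scored by its largest group of hashes sharing one time offset, a minimum-count threshold is applied, and the winner's metadata is looked up. The `min_aligned` threshold and the `METADATA` store are hypothetical stand-ins; Shazam's real threshold and metadata sources are not specified here.

```python
from collections import Counter, defaultdict

# Hypothetical metadata store; a real service would query its own catalogue.
METADATA = {
    "song_a": {"title": "Example Song A", "artist": "Artist One"},
    "song_b": {"title": "Example Song B", "artist": "Artist Two"},
}

def identify(hits, min_aligned=3):
    """Pick the song whose matching hashes agree most often on one time offset.

    hits: (song_id, db_time, query_time) triples from the index lookup.
    Returns that song's metadata, or None when no song reaches min_aligned.
    """
    per_song = defaultdict(Counter)
    for song_id, db_t, q_t in hits:
        per_song[song_id][db_t - q_t] += 1

    best_song, best_score = None, 0
    for song_id, offsets in per_song.items():
        score = max(offsets.values())  # size of this song's largest aligned group
        if score > best_score:
            best_song, best_score = song_id, score

    if best_score < min_aligned:
        return None  # too few aligned hashes to rule out chance agreement
    return METADATA.get(best_song)

hits = [
    ("song_a", 105, 5), ("song_a", 112, 12), ("song_a", 130, 30),
    ("song_b", 40, 5), ("song_b", 77, 12), ("song_b", 3, 30),
]
print(identify(hits))                 # metadata for song_a: three hashes share offset 100
print(identify(hits, min_aligned=4))  # None: no song has four aligned hashes
```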

5. User Interaction

Display Results: The identified song’s information is then displayed to the user within the app. Users can interact with this information by playing a preview of the song, viewing lyrics, watching music videos, or linking to music streaming platforms to listen to the full song.
Social and Sharing Features: Users can also share their discoveries on social media, create playlists, and explore more music by the same artist or in the same genre.

Sequence diagram illustrating the process of Shazam’s music recognition from pressing the “Shazam” button to displaying song information

Shazam’s technology represents a sophisticated blend of signal processing and fast database search. Because it relies only on the strongest spectral peaks and their relative timing, it is efficient and accurate even in challenging listening conditions with background noise or poor sound quality. This robustness is a key reason why Shazam has become a popular tool for music discovery globally.
