Glossary of Terms¶
Python Libraries & Classes¶
Librosa¶
Python library
librosa is a popular open-source Python library for audio and music analysis. It is a powerful toolkit that helps you process, analyse, and understand sound. librosa provides a wide range of Python functions for common tasks in music information retrieval (MIR) and audio processing, such as:
Loading audio files: Reading various audio formats (.wav, .mp3, etc.) into a format Python can easily work with.
Feature extraction: Converting raw audio into meaningful numerical descriptions (features) that classification systems can use, like MFCCs (Mel-frequency cepstral coefficients), spectral centroid, rhythm features, etc.
Time-frequency analysis: Analysing how the frequency content of sound changes over time (e.g., creating spectrograms).
Beat and tempo detection: Identifying the pulse or speed of music.
Pitch tracking: Estimating the fundamental frequency of a sound.
Onset detection: Finding the precise moments where sounds begin (like a snare hit).
In DrumScript, librosa is crucial: it is the underlying library that audio_loader.py, feature_extractor.py, and onset_detector.py use to perform the low-level audio processing and extract the characteristics of your drum sounds and audio recordings. librosa's beat and tempo detection functions are used in the tempo_detector.py and tempogram.py scripts.
StemSplitter¶
Class
Meaning: A class within drumscript.audio_processor responsible for separating source audio into distinct stems (drums, bass, vocals, other). It utilizes the Demucs model to perform high-quality source separation.
Key Method:
split_drums(input_file, output_dir) - Specifically targets and returns the file path of the isolated drum track.
Example:
from drumscript.audio_processor import StemSplitter
splitter = StemSplitter()
drum_path = splitter.split_drums("song.mp3", "./stems")
Demucs¶
External Library / Model
Meaning: A state-of-the-art music source separation model architecture. DrumScript wraps this technology to isolate the drum stem from complex audio mixes, ensuring the classification engine receives clean drum audio even from full songs.
Definitions¶
Discrete Fourier Transform (DFT)¶
Mathematics
Meaning: The Discrete Fourier Transform (or DFT) is a mathematical operation that converts a sequence of individual, distinct data points from their original domain (like time or space) into a frequency-domain representation. The DFT helps reveal the underlying cycles or periodic patterns present within that (discrete) data. In short, the DFT breaks a complex pattern down into its repeating components.
Example:
Imagine you have a discrete sequence of numbers, such as daily temperature readings for a city over a year. The DFT can analyse this data to tell you if there are dominant cycles, such as a strong yearly temperature cycle, a weaker weekly cycle (e.g., warmer weekends), or even daily temperature fluctuations (if the data were more granular).
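The temperature example can be sketched with a naive DFT written from the textbook definition. This is an illustrative sketch, not DrumScript code: the `dft` helper and the synthetic 28-day readings (a constant 15 degrees plus a 7-day cycle) are assumptions made up for the demonstration.

```python
import cmath
import math


def dft(samples):
    """Naive O(N^2) Discrete Fourier Transform of a list of numbers."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


# Hypothetical data: 28 daily readings, 15 degrees on average plus a 7-day cycle.
days = 28
readings = [15 + 2 * math.cos(2 * math.pi * day / 7) for day in range(days)]

spectrum = dft(readings)

# Bin 0 holds the overall level (sum of all readings); bins 1..N/2 hold cycles.
magnitudes = [abs(value) for value in spectrum[: days // 2 + 1]]
dominant_bin = max(range(1, len(magnitudes)), key=lambda k: magnitudes[k])

# A cycle in bin k repeats k times across the whole sequence.
dominant_period_days = days / dominant_bin

print(dominant_bin, dominant_period_days)
```

The strongest non-zero bin is the one whose cycle repeats four times in 28 days, which corresponds to the weekly pattern that was built into the data.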
Fast Fourier Transform (FFT)¶
Mathematics
Meaning: The Fast Fourier Transform (or FFT) is an efficient algorithm used to compute the Discrete Fourier Transform (DFT). In essence, it takes a sequence of numbers (discrete data points, like audio samples) and transforms them into a sequence of numbers that represent the different frequencies present in the original data, along with their magnitudes and phases.
Example:
Given a finite set of digital audio samples, the FFT quickly calculates the exact set of sine and cosine waves (each with a specific frequency and strength) that, when added together, perfectly reconstruct the original sequence of samples. This transformation is crucial for analysing digital signals in the frequency domain.
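To make the "transform, then reconstruct" idea concrete, here is a minimal sketch of a recursive radix-2 Cooley-Tukey FFT and its inverse. It is one common FFT variant written for illustration, not the implementation that librosa or DrumScript uses; the function names and the 8-sample sine input are assumptions.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; the length must be a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(samples)
    even = fft(samples[0::2])  # FFT of the even-indexed samples
    odd = fft(samples[1::2])   # FFT of the odd-indexed samples
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


def ifft(spectrum):
    """Inverse FFT via the conjugate identity: conj(fft(conj(X))) / N."""
    n = len(spectrum)
    conjugated = fft([value.conjugate() for value in spectrum])
    return [value.conjugate() / n for value in conjugated]


# Hypothetical input: 8 samples of a sine wave that completes exactly one cycle.
samples = [math.sin(2 * math.pi * t / 8) for t in range(8)]

spectrum = fft(samples)
magnitudes = [round(abs(value), 6) for value in spectrum]

# Adding the frequency components back together recovers the samples.
rebuilt = [value.real for value in ifft(spectrum)]

print(magnitudes)
print(max(abs(a - b) for a, b in zip(samples, rebuilt)))
```

All of the energy lands in bin 1 and its mirror image, bin 7, and the reconstruction error is at the level of floating-point rounding.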
hop_length¶
Variable
Meaning: How far the listening window (n_fft) slides forward (in number of samples) to take the next frequency snapshot.
Example:
If hop_length=512, the window (or object_event) moves 512 samples to the right for the next analysis, overlapping with the previous window (object_event) whenever hop_length is smaller than n_fft.
Playing around with the hop_length is often crucial for finding the right split of intervals in a given audio sample. The right hop_length also depends on the SAMPLE_RATE defined, because the same number of samples covers less time at a higher sample rate (in the case of DrumScript we have chosen a sample_rate=44100 across the library).
There is a more detailed explanation of hop_length, along with calculated examples, in the README.md.
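The arithmetic behind hop_length can be sketched in a few lines. The helper names below are hypothetical, and the sketch assumes plain, unpadded framing; librosa pads the signal by default, so its frame counts will differ from these.

```python
def frame_starts(num_samples, n_fft, hop_length):
    """Start index of every full analysis window (no padding assumed)."""
    if num_samples < n_fft:
        return []
    return list(range(0, num_samples - n_fft + 1, hop_length))


def hop_duration_ms(hop_length, sample_rate):
    """Time covered by one hop, in milliseconds."""
    return 1000.0 * hop_length / sample_rate


n_fft = 2048
hop_length = 512

# One second of audio at 22050 Hz, used purely as an illustration.
starts = frame_starts(num_samples=22050, n_fft=n_fft, hop_length=hop_length)

# Samples shared by two consecutive windows when hop_length < n_fft.
overlap = n_fft - hop_length

print(starts[:3], len(starts), overlap)

# The same hop covers half the time when the sample rate doubles.
print(round(hop_duration_ms(hop_length, 22050), 2))
print(round(hop_duration_ms(hop_length, 44100), 2))
```

A smaller hop_length gives more frames and finer time resolution at the cost of more computation.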
n_fft¶
Variable
Meaning: n_fft = Number of Fast Fourier Transform points. The size of the listening window (in number of samples) that is used to analyse the frequency content of an audio_segment.
Example:
If n_fft = 2048, the analysis looks at 2048 samples at a time to determine frequencies. If your audio_segment is shorter than 2048 samples, you get a warning.
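The trade-off that n_fft controls can be worked out directly. This is a rough sketch with hypothetical helper names; it relies on the standard facts that a real-valued window of n_fft samples yields n_fft // 2 + 1 frequency bins spaced sample_rate / n_fft apart.

```python
def window_info(n_fft, sample_rate):
    """Return (window length in ms, bin spacing in Hz, number of frequency bins)."""
    window_ms = 1000.0 * n_fft / sample_rate
    bin_spacing_hz = sample_rate / n_fft
    num_bins = n_fft // 2 + 1
    return window_ms, bin_spacing_hz, num_bins


def fits_window(segment_length, n_fft):
    """True when the audio segment has at least n_fft samples."""
    return segment_length >= n_fft


window_ms, bin_spacing_hz, num_bins = window_info(n_fft=2048, sample_rate=22050)
print(round(window_ms, 2), round(bin_spacing_hz, 2), num_bins)

# A 1500-sample segment is too short for a 2048-sample window.
print(fits_window(1500, 2048))
```

A larger n_fft narrows the bin spacing (better frequency detail) but lengthens the window (blurrier timing), which matters for short drum hits.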
sample_rate (sr)¶
Variable
Meaning: How many snapshots of the sound wave are taken per second when audio is digitised. A higher number means more detail.
Example:
> If **`sr = 22050 (Hz)`**, it means **`22,050`** sound measurements are recorded **every second**.
Example:
> If **`sr = 22050 (Hz)`** and the **`input_signal`** of **one of your drum notes** is *`1744 milliseconds`* in length, then the signal contains **`~38455 samples`** (22050 × 1.744).
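The conversion between duration and sample count in the example above is a single multiplication. A minimal sketch, with hypothetical helper names:

```python
def duration_ms_to_samples(duration_ms, sample_rate):
    """Number of samples in a signal of the given length, rounded to the nearest sample."""
    return round(duration_ms * sample_rate / 1000)


def samples_to_duration_ms(num_samples, sample_rate):
    """Length in milliseconds of a signal with the given number of samples."""
    return 1000.0 * num_samples / sample_rate


num_samples = duration_ms_to_samples(1744, 22050)
print(num_samples)
print(round(samples_to_duration_ms(num_samples, 22050), 1))
```

The same 1744 ms drum note would hold twice as many samples at 44100 Hz.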
Spectral Centroid = Brightness or Center of Gravity (sc)¶
Imagine the full range of sound (Frequency Spectrum) is a long seesaw or balance beam.
Left side: Low bass frequencies (the Thud).
Right side: High treble frequencies (The Hiss, Click, or Sizzle).
The Spectral Centroid is the exact point where you would place your finger under the beam to make it balance perfectly.
Low Centroid (< 150 Hz): The balance is heavy on the left. The sound is muffled, dark, and deep. Think of a heartbeat, or a distant explosion.
High Centroid (> 2000 Hz): The balance is heavy on the right. The sound is bright, sharp, or tinny. Think of a whistle, or a cymbal.
Why classification can sometimes fail due to a mis-specified Spectral Centroid (sc)
You set the rule to a certain number, e.g. < 150 Hz, in drumscript.utils.constants. However, almost all modern kick drums consist of two parts:
The Thud (50-100 Hz): The body of the sound.
The Click (2000-5000 Hz): The attack of the beater hitting the skin.
Because the click pulls the centroid upwards, a kick can end up above the threshold, and feature extraction can sometimes struggle to deal with this conflict.
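The balance-point picture corresponds to a magnitude-weighted average of frequencies. The sketch below uses that standard definition with made-up numbers: a two-component "kick" and a threshold constant whose name is hypothetical and is not taken from drumscript.utils.constants.

```python
def spectral_centroid(frequencies_hz, magnitudes):
    """Magnitude-weighted mean frequency: sum(f * m) / sum(m)."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    return sum(f * m for f, m in zip(frequencies_hz, magnitudes)) / total


# Hypothetical threshold for illustration only.
KICK_MAX_CENTROID_HZ = 150.0

# A kick made of the thud alone (75 Hz).
thud_only = spectral_centroid([75.0], [1.0])

# The same thud plus a click at 3000 Hz that is ten times quieter.
thud_and_click = spectral_centroid([75.0, 3000.0], [1.0, 0.1])

print(thud_only, thud_only < KICK_MAX_CENTROID_HZ)
print(round(thud_and_click, 1), thud_and_click < KICK_MAX_CENTROID_HZ)
```

Even a quiet click moves the balance point well above the threshold, because its frequency is so far to the right of the thud. This is why a single centroid rule can misclassify a kick with a strong attack.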