Testing Guidance

DrumScript ships with a pytest test suite organised around a clear separation between fast unit tests and slower integration tests. This page covers the philosophy and contribution conventions for the suite. For a copy-pasteable command reference, see the tests README on GitHub.


The Test Pyramid

back to top

DrumScript follows the classic test pyramid:

Layer

Speed

Volume

What it covers

Unit

milliseconds

many (~75 tests)

Pure functions, helper logic, no I/O

Integration

seconds–minutes

few (~8 tests)

Real Demucs runs, real ffmpeg, real files

End-to-end

minutes

very few

Full pipeline: audio → MIDI/PDF/XML

The dev loop (pytest -m "not slow") runs only the unit layer, which finishes in well under 5 seconds. Integration tests are opt-in, so they don’t slow down day-to-day development but can be triggered before a release with a plain pytest.

Why this matters

Trying to test everything end-to-end is the most common testing mistake. It gives you a suite that takes 20 minutes to run and tells you nothing useful when something fails. Unit tests catch most bugs much faster.


Suite Layout

back to top

tests/
├── conftest.py              ← shared fixtures (auto-discovered)
├── fixtures/audio/          ← real audio files (empty; synthesised in conftest)
├── unit/                    ← fast, no I/O, no subprocess
│   ├── test_audio_loader.py
│   ├── test_helpers.py
│   ├── test_stem_splitter_helpers.py
│   ├── test_tempo_detector.py
│   ├── test_onset_detector.py
│   └── test_classify.py
└── integration/             ← real Demucs / ffmpeg / files (slow)
    └── test_stem_splitter_real.py

Tests are auto-discovered by pytest — any file matching test_*.py under tests/ is picked up automatically. There is no central registry to update when adding new files.


Markers

Two custom markers are defined in pytest.ini:

@pytest.mark.slow : Tests that take more than a second or two. Skipped by default during development.

@pytest.mark.integration : Tests that require external dependencies — Demucs, ffmpeg, real audio files. Always combined with slow.

# Fast loop (default)
pytest -m "not slow"

# Integration tests only
pytest -m integration

# Everything (e.g. before a release)
pytest

Strict markers

pytest.ini enables --strict-markers, which means typos like @pytest.mark.slwo will fail loudly instead of silently applying to nothing. If you add a new marker category, register it in pytest.ini first.


Writing a New Test

1. Pick the right layer

Ask yourself: does my test need to touch the filesystem, run a subprocess, or load audio that takes more than 100ms?

  • No → put it in tests/unit/. No marker needed.

  • Yes → put it in tests/integration/ and decorate with both @pytest.mark.slow and @pytest.mark.integration.

2. Follow the file conventions

  • Filename: test_*.py

  • Class names: Test*

  • Function names: test_*

  • Group related assertions inside a Test* class for readability — pytest treats each method as a separate test.

3. Use Arrange / Act / Assert

Every test should follow this shape:

def test_normalises_to_unit_peak():
    # Arrange — set up inputs
    audio = np.array([0.0, 0.5, -0.25, 0.1], dtype=np.float32)

    # Act — do the thing
    result = normalise_audio(audio)

    # Assert — verify the outcome
    assert np.isclose(np.max(np.abs(result)), 1.0)

One test = one act = one (or a few related) asserts. If you find yourself writing more than one act in a single test, split it into two.

4. Use tmp_path for file output

Pytest gives you a fresh temp directory per test, automatically cleaned up. Always use it instead of writing to the working directory or hardcoded paths.

def test_writes_output_file(tmp_path):
    output = tmp_path / "out.wav"
    # ... do stuff that writes to output ...
    assert output.exists()
    assert output.stat().st_size > 0

5. Reuse fixtures from conftest.py

The shared fixtures already cover most needs:

Fixture

What it gives you

sine_wave

1-second 440 Hz sine, mono float32

silent_audio

5 seconds of silence

click_track_120bpm

Deterministic click track for tempo tests

stereo_constant_audio

1 second of stereo audio with known amplitudes

small_wav_file

Sine wave written to disk in tmp_path

stem_files

Three distinct stems written to disk for mixing tests

Add new fixtures to conftest.py only if more than one file will use them. Single-use fixtures belong in the test file itself.


Patterns You’ll Use Often

Parametrised tests

When the same logic needs to be checked against many inputs, use @pytest.mark.parametrize rather than copying the test:

@pytest.mark.parametrize("input_beats,subdivision,expected", [
    (0.24, 4, 0.25),
    (0.51, 4, 0.50),
    (0.0, 4, 0.0),
    (1.99, 4, 2.0),
])
def test_round_to_nearest_subdivision(input_beats, subdivision, expected):
    assert round_to_nearest_subdivision(input_beats, subdivision) == pytest.approx(expected)

Each tuple becomes a separate test in pytest’s output, so you get clear per-case pass/fail reporting.

Approximate snapshot tests for DSP code

DSP algorithms (tempo detection, onset detection, etc.) produce approximate results. Direct equality assertions are too brittle. Use tolerance bands instead:

def test_tempo_detection_on_120bpm_click(click_track_120bpm):
    audio, sr = click_track_120bpm
    bpm = estimate_tempo(audio, sr)
    # ±10% — accepts 108-132 BPM. Wide enough for librosa's variance,
    # narrow enough to catch real regressions.
    assert 108 <= bpm <= 132

Mocking subprocess calls

The unit tests for stem_splitter don’t actually run Demucs — that would be too slow. Instead, they mock subprocess.run and verify the command being constructed. The real Demucs run lives in tests/integration/test_stem_splitter_real.py.


Coverage Reports

Once pytest-cov is installed (it’s part of the dev group), you can see which lines of source code your tests are exercising:

pytest --cov=drumscript --cov-report=term-missing

term-missing adds a column listing the line numbers that aren’t covered. That’s the column to look at when deciding what to test next.

Aim for value, not 100%

Don’t chase 100% coverage. The classification logic deserves rigorous tests; the __main__ blocks at the bottom of files don’t. High-value coverage is better than total coverage.


Common Pitfalls

  1. Don’t test third-party libraries. Test how your code uses them. An assertion like assert librosa.load("x.wav") is testing librosa, not DrumScript.

  2. Don’t test private implementation details. Test public behaviour. If you rename _read_stem_as_array to _load_stem, your tests for mix_stems should still pass.

  3. Don’t put real audio files larger than ~100KB in the repo. Use small fixtures and synthesise the rest in conftest.py.

  4. Don’t write the test only after the bug. When you fix a bug, write the test that would have caught it. This is the single most valuable kind of test you can add.


Continuous Integration

CI configuration is on the roadmap. The intended setup is:

  • Run pytest -m "not slow" on every push and pull request

  • Run the full suite (pytest) on tagged release commits

  • Publish coverage reports as a build artifact

Until CI is in place, contributors are expected to run pytest -m "not slow" locally before opening a pull request.


See Also


–>