Testing Guidance¶

DrumScript ships with a pytest test suite organised around a clear separation between fast unit tests and slower integration tests. This page covers the philosophy and contribution conventions for the suite. For a copy-pasteable command reference, see the tests README on GitHub.

The Test Pyramid
Suite layout
Writing a new test
Patterns you’ll use often
Regression tests
Coverage reports
Common pitfalls
Continuous integration
See also

The Test Pyramid¶

DrumScript follows the classic test pyramid:

Layer	Speed	Volume (as of v0.1.6)	What it covers
Unit	milliseconds	~131 cases across 11 files	Pure functions, helper logic, no I/O
Integration	seconds–minutes	~8 cases	Real Demucs runs, real ffmpeg, real files
End-to-end	minutes	very few	Full pipeline: audio → MIDI/PDF/XML

The dev loop (pytest -m "not slow") runs only the unit layer, which finishes in well under 10 seconds. Integration tests are excluded from that loop, so they don’t slow down day-to-day development; a plain pytest runs the whole suite, for example before a release.

Why this matters

Trying to test everything end-to-end is a common testing mistake. It gives you a suite that takes 20 minutes to run and, when something fails, tells you little about which component broke. Unit tests catch most bugs much faster and point directly at the failing function.

Suite Layout¶

tests/
├── README.md
├── conftest.py              ← shared fixtures (auto-discovered)
├── fixtures/audio/          ← real audio files (empty; synthesised in conftest)
├── unit/                    ← fast, no I/O, no subprocess
│   ├── __init__.py
│   ├── test_audio_loader.py
│   ├── test_benchmarks_run.py
│   ├── test_classify.py
│   ├── test_cli_args.py
│   ├── test_deprecation_warnings.py
│   ├── test_helpers.py
│   ├── test_idmt_dataset.py
│   ├── test_onset_detector.py
│   ├── test_stem_splitter_helpers.py
│   ├── test_tempo_detector.py
│   └── test_transcribe.py
└── integration/             ← Demucs / ffmpeg / files (slow)
    └── test_stem_splitter_real.py

Tests are auto-discovered by pytest — any file matching test_*.py under tests/ is picked up automatically. There is no central registry to update when adding new files.

Markers¶

Two custom markers are defined in pyproject.toml (under [tool.pytest.ini_options]):

@pytest.mark.slow : Tests that take more than a second or two. Skipped by default during development.

@pytest.mark.integration : Tests that require external dependencies — Demucs, ffmpeg, real audio files. Always combined with slow.

# Fast loop (default)
pytest -m "not slow"

# Integration tests only
pytest -m integration

# Everything (e.g. before a release)
pytest

Strict markers

The pytest configuration enables --strict-markers, which means typos like @pytest.mark.slwo will fail loudly instead of silently applying to nothing. If you add a new marker category, register it in pyproject.toml first.

Writing a New Test¶

1. Pick the right layer¶

Ask yourself: does my test need to touch the filesystem, run a subprocess, or load audio that takes more than 100ms?

No → put it in tests/unit/. No marker needed.
Yes → put it in tests/integration/ and decorate with both @pytest.mark.slow and @pytest.mark.integration.
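
As a rough sketch of the "Yes" branch, an integration test carrying both markers might look like the following. The test name and the ffmpeg version check are illustrative assumptions, not code taken from the DrumScript suite.

```python
import shutil
import subprocess

import pytest


@pytest.mark.slow
@pytest.mark.integration
def test_ffmpeg_reports_version():
    """Hypothetical integration test: shells out to a real ffmpeg binary."""
    if shutil.which("ffmpeg") is None:
        # Skip rather than fail on machines without ffmpeg installed
        pytest.skip("ffmpeg is not installed on this machine")

    result = subprocess.run(
        ["ffmpeg", "-version"], capture_output=True, text=True, check=True
    )

    assert "ffmpeg version" in result.stdout
```

Because both markers are applied, pytest -m "not slow" deselects this test and pytest -m integration selects it.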

2. Follow the file conventions¶

Filename: test_*.py
Class names: Test*
Function names: test_*
Group related assertions inside a Test* class for readability — pytest treats each method as a separate test.
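
A minimal sketch of these conventions, assuming a hypothetical helper clamp_velocity (invented for this example, not part of DrumScript) that keeps MIDI velocities inside the valid 0 to 127 range. In a real suite the class would live in a file such as tests/unit/test_<module>.py.

```python
def clamp_velocity(value):
    """Hypothetical helper: force a value into the MIDI velocity range 0..127."""
    return max(0, min(127, int(value)))


class TestClampVelocity:
    """Related cases grouped in one class; pytest reports each method separately."""

    def test_in_range_value_is_unchanged(self):
        assert clamp_velocity(64) == 64

    def test_value_above_range_is_capped(self):
        assert clamp_velocity(300) == 127

    def test_negative_value_is_raised_to_zero(self):
        assert clamp_velocity(-5) == 0
```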

3. Use Arrange / Act / Assert¶

Every test should follow this shape:

def test_normalises_to_unit_peak():
    # Arrange — set up inputs
    audio = np.array([0.0, 0.5, -0.25, 0.1], dtype=np.float32)

    # Act — do the thing
    result = normalise_audio(audio)

    # Assert — verify the outcome
    assert np.isclose(np.max(np.abs(result)), 1.0)

One test = one act = one (or a few related) asserts. If you find yourself writing more than one act in a single test, split it into two.

4. Use `tmp_path` for file output¶

Pytest gives you a fresh temp directory per test, automatically cleaned up. Always use it instead of writing to the working directory or hardcoded paths.

def test_writes_output_file(tmp_path):
    output = tmp_path / "out.wav"
    # ... do stuff that writes to output ...
    assert output.exists()
    assert output.stat().st_size > 0
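
To make the placeholder above concrete, here is a self-contained sketch that writes a short WAV file into tmp_path using only the standard library. write_sine_wav is a hypothetical helper invented for this example; the real DrumScript writers may look quite different.

```python
import math
import struct
import wave
from pathlib import Path


def write_sine_wav(path, n_frames=800, freq_hz=440.0, sample_rate=8000):
    """Hypothetical helper: write a mono 16-bit sine wave to `path`."""
    frames = b"".join(
        struct.pack("<h", int(16000 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n_frames)
    )
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


def test_writes_readable_wav(tmp_path):
    # Arrange
    output = tmp_path / "out.wav"

    # Act
    write_sine_wav(output)

    # Assert: the file exists and its header matches what was requested
    assert output.exists()
    with wave.open(str(output), "rb") as wav:
        assert wav.getnchannels() == 1
        assert wav.getframerate() == 8000
        assert wav.getnframes() == 800
```

Reading the file back, rather than only checking its size, catches writers that produce a non-empty but malformed file.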

5. Reuse fixtures from `conftest.py`¶

The shared fixtures already cover most needs:

Fixture	What it gives you
`sine_wave`	1-second 440 Hz sine, mono float32
`silent_audio`	5 seconds of silence
`click_track_120bpm`	Deterministic click track for tempo tests
`stereo_constant_audio`	1 second of stereo audio with known amplitudes
`small_wav_file`	Sine wave written to disk in `tmp_path`
`stem_files`	Three distinct stems written to disk for mixing tests

Add new fixtures to conftest.py only if more than one file will use them. Single-use fixtures belong in the test file itself.
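
For orientation, a fixture in the style of click_track_120bpm could be synthesised roughly as below. This is a sketch, not the actual conftest.py: the helper name make_click_track, the fixture name click_track_sketch, the 22050 Hz sample rate, and the click shape are all assumptions. The only detail taken from this page is that the fixture unpacks as an (audio, sr) pair.

```python
import numpy as np
import pytest


def make_click_track(bpm=120.0, duration_s=4.0, sr=22050):
    """Synthesise a mono float32 click track (hypothetical helper)."""
    audio = np.zeros(int(duration_s * sr), dtype=np.float32)
    samples_per_beat = int(round(60.0 / bpm * sr))
    click_len = int(0.01 * sr)  # roughly a 10 ms burst per beat
    t = np.arange(click_len) / sr
    # Decaying 1 kHz burst: no randomness, so every run yields identical samples
    click = (np.sin(2 * np.pi * 1000.0 * t) * np.exp(-400.0 * t)).astype(np.float32)
    for start in range(0, len(audio) - click_len, samples_per_beat):
        audio[start:start + click_len] = click
    return audio, sr


@pytest.fixture
def click_track_sketch():
    """Assumed fixture shape: an (audio, sample_rate) tuple."""
    return make_click_track()
```

Keeping the synthesis in a plain function and wrapping it in a thin fixture means the generator itself can be unit-tested without going through pytest's fixture machinery.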

Patterns You’ll Use Often¶

Parametrised tests¶

When the same logic needs to be checked against many inputs, use @pytest.mark.parametrize rather than copying the test:

@pytest.mark.parametrize("input_beats,subdivision,expected", [
    (0.24, 4, 0.25),
    (0.51, 4, 0.50),
    (0.0, 4, 0.0),
    (1.99, 4, 2.0),
])
def test_round_to_nearest_subdivision(input_beats, subdivision, expected):
    assert round_to_nearest_subdivision(input_beats, subdivision) == pytest.approx(expected)

Each tuple becomes a separate test in pytest’s output, so you get clear per-case pass/fail reporting.
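
For context, a function that satisfies the four cases above could be as small as the sketch below. This is an assumed implementation for illustration only, treating subdivision as the number of grid steps per beat; the real round_to_nearest_subdivision may differ.

```python
def round_to_nearest_subdivision_sketch(beats: float, subdivision: int) -> float:
    """Snap a beat position to the nearest 1/subdivision of a beat (assumed semantics)."""
    if subdivision <= 0:
        raise ValueError("subdivision must be a positive integer")
    # Scale to grid steps, round to the nearest whole step, scale back to beats
    return round(beats * subdivision) / subdivision
```

Python's built-in round() sends exact halves to the nearest even integer, so a position lying precisely between two grid lines is a good candidate for its own parametrised row.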

Approximate snapshot tests for DSP code¶

DSP algorithms (tempo detection, onset detection, etc.) produce approximate results. Direct equality assertions are too brittle. Use tolerance bands instead:

def test_tempo_detection_on_120bpm_click(click_track_120bpm):
    audio, sr = click_track_120bpm
    bpm = estimate_tempo(audio, sr)
    # ±10% — accepts 108-132 BPM. Wide enough for librosa's variance,
    # narrow enough to catch real regressions.
    assert 108 <= bpm <= 132
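
The same band can also be written with pytest.approx and a relative tolerance, which keeps the expected value and the tolerance visible side by side. The helper name assert_tempo_close is hypothetical and only illustrates the idea.

```python
import pytest


def assert_tempo_close(bpm, expected_bpm, rel=0.10):
    """Hypothetical helper: fail unless bpm lies within ±rel of expected_bpm."""
    assert bpm == pytest.approx(expected_bpm, rel=rel), (
        f"tempo {bpm:.1f} BPM is outside ±{rel:.0%} of {expected_bpm} BPM"
    )
```

With rel=0.10 and an expected value of 120, the accepted range is the same 108 to 132 BPM band used above.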

Mocking subprocess calls¶

The unit tests for stem_splitter don’t actually run Demucs — that would be too slow. Instead, they mock subprocess.run and verify the command being constructed. The real Demucs run lives in tests/integration/test_stem_splitter_real.py.
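
The mocking approach can be sketched with the standard library's unittest.mock. The helper run_demucs_sketch and the exact command it builds are assumptions made for this example; they are not DrumScript's real stem_splitter code.

```python
import subprocess
from unittest.mock import patch


def run_demucs_sketch(audio_path, out_dir):
    """Hypothetical helper: build a Demucs command line and execute it."""
    cmd = ["demucs", "--two-stems", "drums", "-o", str(out_dir), str(audio_path)]
    subprocess.run(cmd, check=True)
    return cmd


def test_builds_expected_command():
    # Replace subprocess.run so no real process is started
    with patch("subprocess.run") as mock_run:
        run_demucs_sketch("song.wav", "stems")

    # Verify the command that would have been executed
    mock_run.assert_called_once()
    cmd = mock_run.call_args.args[0]
    assert cmd[0] == "demucs"
    assert cmd[-1] == "song.wav"
    assert mock_run.call_args.kwargs["check"] is True
```

The test runs in milliseconds because Demucs is never launched; it only checks that the arguments are assembled correctly.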

Asserting on warnings¶

When testing deprecated APIs or other code paths that should emit warnings, use pytest.warns() rather than try/except:

def test_deprecated_param_warns():
    with pytest.warns(DeprecationWarning, match="verbose"):
        ds.detect_tempo(audio, full=True)

The `match` argument is a regular expression searched for in the warning message, which is useful for locking in that the warning text actually points users at the replacement.

Regression tests¶

Some test files exist specifically to lock in behaviour that was previously inconsistent, ambiguous, or breaking. They sit alongside the normal unit tests but should not be removed without a deliberate decision:

test_cli_args.py : Locks in --full-song (hyphenated) as the canonical CLI flag after the v0.1.6 rename from --full. Also asserts that the underscore variant --full_song is rejected, and documents that --full continues to work as a backwards-compat prefix.

test_deprecation_warnings.py : Locks in the full → verbose deprecation shim on the Python API (transcribe, extract_stems, detect_tempo). Asserts that full=True still works, emits a DeprecationWarning, and that the warning text mentions both the replacement parameter (verbose) and the removal version (v1.0.0). Delete (or flip to expect TypeError) when full is removed in v1.0.0.
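
The three CLI behaviours described for test_cli_args.py can be illustrated with a small argparse sketch. build_parser is a hypothetical stand-in, and the use of argparse is itself an assumption; the real DrumScript parser is not shown on this page.

```python
import argparse

import pytest


def build_parser():
    """Hypothetical stand-in for the DrumScript CLI parser."""
    parser = argparse.ArgumentParser(prog="drumscript")
    parser.add_argument("--full-song", action="store_true")
    return parser


def test_hyphenated_flag_is_canonical():
    args = build_parser().parse_args(["--full-song"])
    assert args.full_song is True


def test_underscore_variant_is_rejected():
    # argparse reports unknown options by exiting, which raises SystemExit
    with pytest.raises(SystemExit):
        build_parser().parse_args(["--full_song"])


def test_old_flag_still_resolves_as_prefix():
    # With argparse's default allow_abbrev=True, an unambiguous prefix is accepted
    args = build_parser().parse_args(["--full"])
    assert args.full_song is True
```

In this sketch the prefix test would start failing as soon as a second option beginning with --full is added, which is the kind of silent breakage a regression test is meant to surface.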

When you find a class of bug that previously slipped through, the corresponding test belongs here too — not just as a pass/fail check, but with comments explaining why the test exists so it isn’t deleted by accident later.
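
A regression test of this kind might carry its rationale in comments, as in the sketch below. detect_tempo_sketch is a hypothetical stand-in for a function carrying the full → verbose shim; the real DrumScript signatures and warning text may differ.

```python
import warnings

import pytest


def detect_tempo_sketch(audio, verbose=False, full=None):
    """Hypothetical function with a deprecation shim for the old `full` parameter."""
    if full is not None:
        warnings.warn(
            "'full' is deprecated and will be removed in v1.0.0; use 'verbose' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        verbose = full
    return {"verbose": verbose}


def test_full_still_works_but_warns():
    # Regression test: callers written against the old API pass full=True.
    # It must keep working until v1.0.0, and the warning must name both the
    # replacement parameter and the removal version.
    # Delete (or flip to expect TypeError) once `full` is removed.
    with pytest.warns(DeprecationWarning, match="verbose") as record:
        result = detect_tempo_sketch([0.0], full=True)

    assert result["verbose"] is True
    assert "v1.0.0" in str(record[0].message)
```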

Coverage Reports¶

Once pytest-cov is installed (it’s part of the dev group), you can see which lines of source code your tests are exercising:

pytest --cov=drumscript --cov-report=term-missing

term-missing adds a column listing the line numbers that aren’t covered. That’s the column to look at when deciding what to test next.

Aim for value, not 100%

Don’t chase 100% coverage. The classification logic deserves rigorous tests; the __main__ blocks at the bottom of files don’t. High-value coverage is better than total coverage.

Common Pitfalls¶

Don’t test third-party libraries. Test how your code uses them. An assertion like assert librosa.load("x.wav") is testing librosa, not DrumScript.
Don’t test private implementation details. Test public behaviour. If you rename _read_stem_as_array to _load_stem, your tests for mix_stems should still pass.
Don’t put real audio files larger than ~100KB in the repo. Use small fixtures and synthesise the rest in conftest.py.
Don’t fix a bug without adding a test. When you fix a bug, write the test that would have caught it. This is the single most valuable kind of test you can add.
Don’t conflate filter configuration with behaviour. Tests that use pytest.warns() and recwarn capture warnings regardless of how Python’s warning filters are configured. Don’t add warnings.simplefilter("always") inside tests — pytest handles this for you.

Continuous Integration¶

CI runs via GitHub Actions on every push and pull request. The current configuration:

pytest -m "not slow" runs on every push and pull request via .github/workflows/tests.yml
Full suite (pytest) runs on tagged release commits
Publish (.github/workflows/publish.yml) runs on release creation

Contributors should still run pytest -m "not slow" locally before opening a pull request so the feedback loop is fast.
