test(separation): characterize stem-separation quality against known sources by seonghobae · Pull Request #557 · ContextualWisdomLab/bandscope

seonghobae · 2026-07-05T06:45:16Z

What

Adds two characterization tests (tests/test_separation_quality.py) that measure how well the local stem separator recovers a known source from a mixture, using ground-truth signals we control.

Why

The stem separator is a frequency-band FFT heuristic, not neural source separation, yet nothing measured its actual separation quality. The existing test_separation.py covers only role-keyword mapping, pure-tone band routing, and error handling; no test checked that a known source is actually recovered from a mixture.
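For context, a band-split "separator" can be sketched as hard masks over FFT bins. This is a minimal sketch assuming NumPy; `band_split` and the crossover edges are hypothetical and are not the actual bands in audio_separator.py. A harmonic source has energy at integer multiples of its fundamental, so any crossover below its top harmonic sends part of it to another stem.

```python
import numpy as np


def band_split(signal: np.ndarray, sr: int, edges: list[float]) -> list[np.ndarray]:
    """Split a mono signal into contiguous bands using hard FFT-bin masks.

    `edges` are hypothetical crossover frequencies in Hz, giving bands
    [0, e0), [e0, e1), ..., [e_last, Nyquist].
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    bounds = [0.0, *edges, np.inf]
    stems = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        # Each bin falls in exactly one band, so the stems sum back to the input.
        mask = (freqs >= lo) & (freqs < hi)
        stems.append(np.fft.irfft(spectrum * mask, n=len(signal)))
    return stems


sr = 8000
t = np.arange(sr) / sr  # one second of audio
# Harmonic-rich "bass": 110 Hz fundamental plus harmonics with 1/k amplitude.
bass = sum(np.sin(2 * np.pi * 110 * k * t) / k for k in range(1, 9))
stems = band_split(bass, sr, edges=[250.0, 2000.0])

# Share of the lone bass source's energy that lands outside the low band.
energies = [float(np.dot(s, s)) for s in stems]
leak = 1.0 - energies[0] / sum(energies)
print(f"{leak:.1%} of the bass energy lands outside the low band")
```

With these made-up edges the 330 to 880 Hz harmonics land in the middle stem, so about 18% of the energy is counted outside the bass band even though the input contains nothing but bass. The ~11% figure in the test comes from the real bands and signal, not from this sketch.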

The tests (measured facts, not a quality bar)

- test_recovered_bass_is_not_high_fidelity_isolation: the recovered bass stem's SI-SDR against the true bass source stays below a clean-isolation bound (~9 dB measured; a neural model would exceed ~20 dB on a signal this trivial).
- test_bass_source_energy_leaks_across_stems: a lone harmonic bass source leaks ~11% of its energy into other stems, showing that the separator splits by frequency band, not by source.
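Both measurements reduce to short formulas. A minimal sketch, assuming NumPy and mono float arrays; `si_sdr` and `leakage_share` are illustrative helpers, not necessarily what tests/test_separation_quality.py defines. SI-SDR projects the estimate onto the reference (scale α = ⟨ŝ, s⟩ / ‖s‖²), treats αs as the target and ŝ − αs as distortion, and reports 10·log10(‖αs‖² / ‖ŝ − αs‖²), so rescaling a stem does not change its score.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-12) -> float:
    """Scale-invariant signal-to-distortion ratio in dB."""
    # Remove DC so a constant offset is not counted as signal or distortion.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Optimal scale for the reference: projection of the estimate onto it.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    distortion = estimate - target
    return float(10 * np.log10((np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)))


def leakage_share(stems: list[np.ndarray], home: int) -> float:
    """Fraction of a lone source's energy that ends up outside stem `home`."""
    energies = np.array([np.dot(s, s) for s in stems])
    return float(1.0 - energies[home] / energies.sum())


rng = np.random.default_rng(0)
source = rng.standard_normal(16000)
noise = rng.standard_normal(16000)

print(si_sdr(3.0 * source, source))          # scaled copy: very high, scale is ignored
print(si_sdr(source + noise, source))        # equal-energy noise: close to 0 dB
print(si_sdr(source + 0.1 * noise, source))  # noise 20 dB down: close to 20 dB
```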

These pin current behaviour as a regression guard. If a real separation model (e.g. demucs) is introduced, SI-SDR will rise past these bounds and the assertions should be re-baselined.

Verification

uv run pytest tests --cov=src/bandscope_analysis --cov-fail-under=100 — 435 passed, 100% coverage. ruff check + ruff format --check clean.

🤖 Generated with Claude Code

https://claude.ai/code/session_01RjGVapDZ3k7V7zKYk16P4C

…sources

The local stem separator is a frequency-band FFT heuristic, not neural source separation, but nothing measured how well it recovers a known source from a mixture (existing tests only check role keyword mapping, band routing of pure tones, and error handling).

Add two characterization tests over a controlled ground-truth mix (harmonic-rich bass + vocal-band tone):

- recovered bass stem SI-SDR stays below a clean-isolation bar (~9 dB measured; a neural model would exceed ~20 dB on a signal this simple)
- a lone bass source leaks a meaningful energy share (~11%) into other stems, proving it splits by frequency band, not by source

These pin current behaviour and act as a regression guard; the bounds should be re-baselined if a real separation model is introduced.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RjGVapDZ3k7V7zKYk16P4C

On overlapping instruments (bass/keys/voice sharing bands) plus broadband drums, the band-split heuristic scores a NEGATIVE mean SI-SDR: for most stems the output is further from the true source than the mixture itself. Real neural separators score positive here (Demucs ~+9 dB, Open-Unmix ~+5 dB on MUSDB18). This pins the fact that the current feature is not source separation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RjGVapDZ3k7V7zKYk16P4C
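A negative SI-SDR on its own says the distortion outweighs the target component. Whether a stem is actually further from the source than the mixture can be checked directly by scoring the unprocessed mixture as a baseline estimate, i.e. the SI-SDR improvement (SI-SDRi). A minimal sketch, assuming NumPy; the helpers and the two-tone mix are illustrative and not taken from the test file.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-12) -> float:
    """Scale-invariant SDR in dB (projection of estimate onto reference)."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    distortion = estimate - target
    return float(10 * np.log10((np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)))


def si_sdr_improvement(estimate: np.ndarray, mixture: np.ndarray, reference: np.ndarray) -> float:
    """SI-SDRi in dB: positive means the stem beats simply returning the mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


sr = 8000
t = np.arange(sr) / sr
keys = np.sin(2 * np.pi * 220 * t)   # the source we want back
voice = np.sin(2 * np.pi * 330 * t)  # an equal-energy interferer
mixture = keys + voice               # SI-SDR of the mixture vs keys is 0 dB

good_stem = keys + 0.5 * voice       # interferer attenuated
bad_stem = 0.5 * keys + voice        # target attenuated instead

print(si_sdr_improvement(good_stem, mixture, keys))  # about +6 dB
print(si_sdr_improvement(bad_stem, mixture, keys))   # about -6 dB
```

Under this check a stem is worse than the mixture only when its SI-SDRi is negative; with several equal-energy sources the mixture baseline is itself below 0 dB, so the two readings can differ.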

The separator is coarse FFT band-splitting (audio_separator.py: 'coarse canonical frequency and percussion bands'), not source separation: measured SI-SDR is ~-39 dB on a realistic mix vs ~+9 dB for demucs (characterization tests in PR #557). The wording 'rough stem previews' matches the README's own hedged voice ('likely harmony', 'visible confidence') and the existing scope disclaimer without overclaiming DAW-grade separation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RjGVapDZ3k7V7zKYk16P4C

seonghobae and others added 2 commits July 5, 2026 15:44

seonghobae wants to merge 2 commits into develop from test/stem-separation-quality-baseline

seonghobae commented Jul 5, 2026
