test(separation): characterize stem-separation quality against known sources #557
Open
seonghobae wants to merge 2 commits into
…sources

The local stem separator is a frequency-band FFT heuristic, not neural source separation, but nothing measured how well it recovers a known source from a mixture (existing tests only check role keyword mapping, band routing of pure tones, and error handling).

Add two characterization tests over a controlled ground-truth mix (harmonic-rich bass + vocal-band tone):

- recovered bass stem SI-SDR stays below a clean-isolation bar (~9 dB measured; a neural model would exceed ~20 dB on a signal this simple)
- a lone bass source leaks a meaningful energy share (~11%) into other stems, proving it splits by frequency band, not by source

These pin current behaviour and act as a regression guard; the bounds should be re-baselined if a real separation model is introduced.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RjGVapDZ3k7V7zKYk16P4C
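For readers unfamiliar with the metric, SI-SDR (scale-invariant signal-to-distortion ratio) projects the estimate onto the reference and compares the energy of that projection with the energy of the residual. The following is a minimal sketch of the standard definition, not the helper used in tests/test_separation_quality.py; the name `si_sdr`, the `eps` guard, and the test tones are assumptions made for illustration.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB (hypothetical helper, standard definition)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Remove any DC offset so the projection is not biased by the mean.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Optimal scaling of the reference that best explains the estimate.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference   # part of the estimate explained by the source
    residual = estimate - target  # everything else: interference and artifacts

    return 10.0 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    )


# Illustrative signals (assumed values): a 100 Hz source plus a 300 Hz interferer.
sample_rate = 8_000
t = np.arange(sample_rate) / sample_rate
source = np.sin(2 * np.pi * 100.0 * t)
interferer = 0.1 * np.sin(2 * np.pi * 300.0 * t)

score_db = si_sdr(source + interferer, source)
```

Because the metric is scale-invariant, multiplying the estimate by a constant gain does not change the score. A negative value means the residual carries more energy than the recovered source component.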
On overlapping instruments (bass/keys/voice sharing bands) plus broadband drums, the band-split heuristic scores a NEGATIVE mean SI-SDR: for most stems the output is further from the true source than the mixture itself. Real neural separators are positive here (Demucs ~+9 dB, Open-Unmix ~+5 dB on MUSDB18). This pins that the current feature is not source separation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RjGVapDZ3k7V7zKYk16P4C
seonghobae added a commit that referenced this pull request on Jul 5, 2026
The separator is coarse FFT band-splitting (audio_separator.py: 'coarse canonical frequency and percussion bands'), not source separation: measured SI-SDR ~ -39 dB on a realistic mix vs ~+9 dB for demucs (characterization tests in PR #557). 'rough stem previews' matches the README's own hedged voice ('likely harmony', 'visible confidence') and the existing scope disclaimer, without overclaiming DAW-grade separation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RjGVapDZ3k7V7zKYk16P4C
What
Adds two characterization tests (tests/test_separation_quality.py) that measure how well the local stem separator recovers a known source from a mixture, using ground-truth signals we control.

Why
The stem separator is a frequency-band FFT heuristic, not neural source separation, yet nothing measured its actual separation quality. Existing test_separation.py only covers role-keyword mapping, pure-tone band routing, and error handling. There was no test that a known source is actually separated.

The tests (measured facts, not a quality bar)
- test_recovered_bass_is_not_high_fidelity_isolation: recovered bass-stem SI-SDR vs the true bass source stays below a clean-isolation bound (~9 dB measured; a neural model would exceed ~20 dB on a signal this trivial).
- test_bass_source_energy_leaks_across_stems: a lone harmonic bass source leaks ~11% of its energy into other stems, proving it splits by frequency band, not by source.

These pin current behaviour as a regression guard. If a real separation model (e.g. demucs) is introduced, SI-SDR will rise past these bounds and the assertions should be re-baselined.
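To make the leakage claim concrete, here is a toy sketch of why any frequency-band split spreads a single harmonic source across stems. It is not the code in audio_separator.py: the `band_split` and `leaked_energy_share` helpers, the two-band layout, the band edges, and the synthetic bass are all assumptions chosen for illustration, so the number it produces is not the PR's measurement.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed sample rate for the toy signal

# Hypothetical band edges in Hz; the real separator's bands may differ.
BANDS = {"bass": (20.0, 250.0), "other": (250.0, 8_000.0)}


def band_split(signal, sample_rate, bands):
    """Split a signal into stems by masking disjoint FFT frequency bands."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    stems = {}
    for name, (low, high) in bands.items():
        mask = (freqs >= low) & (freqs < high)
        # Keep only this band's bins, then return to the time domain.
        stems[name] = np.fft.irfft(spectrum * mask, n=len(signal))
    return stems


def leaked_energy_share(stems, home):
    """Fraction of total stem energy that landed outside the `home` stem."""
    energies = {name: float(np.sum(stem ** 2)) for name, stem in stems.items()}
    return 1.0 - energies[home] / sum(energies.values())


# A lone harmonic-rich bass note: 80 Hz fundamental plus harmonics at 1/k amplitude.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
bass = sum(np.sin(2 * np.pi * 80.0 * k * t) / k for k in range(1, 9))

stems = band_split(bass, SAMPLE_RATE, BANDS)
share = leaked_energy_share(stems, home="bass")
```

Only the 80, 160, and 240 Hz partials fall inside the assumed bass band. The upper harmonics belong to the same instrument but are routed to the other stem, so `share` is nonzero even though the input contains exactly one source.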
Verification
- uv run pytest tests --cov=src/bandscope_analysis --cov-fail-under=100: 435 passed, 100% coverage.
- ruff check + ruff format --check: clean.

🤖 Generated with Claude Code
https://claude.ai/code/session_01RjGVapDZ3k7V7zKYk16P4C