fix(tts): Remove 440Hz beep, implement ALBERT encoder (#179) #183
Summary
Fixes #179: the TTS sample outputs a beep (a 440 Hz sine wave) instead of actual speech.
Changes:
- Removed _forward_simple(), which was causing the beep
- WeightNormConv1d: Convolution with weight normalization (weight_g/weight_v decomposition)
- InstanceNorm1d: Per-channel instance normalization
- AdaIN: Adaptive Instance Normalization for style conditioning
- ALBERTLayer/ALBERTEncoder: ALBERT with shared layer weights
- KokoroTextEncoder: CNN (3 layers) + BiLSTM architecture
- AdaINResBlock: Residual blocks with AdaIN for style-conditioned decoding
- build_albert_from_weights(): Constructs ALBERT from weight dict
- build_text_encoder_from_weights(): Constructs text encoder from weight dict
- model.py: uses actual neural network layers instead of the placeholder

Current State:
Build Requirements
No C++/CUDA build required. This PR contains Python-only changes.
Linux CMake build should pass in CI without issues.
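Because the changes are Python-only, the math behind two of the layers listed under Changes can be sketched in plain Python for reviewers. This is an illustrative sketch, not the code in this PR: the function names weight_norm, instance_norm_1d, and adain are hypothetical, the per-channel norm and the gamma * x + beta form are assumptions (some AdaIN implementations scale by 1 + gamma instead), and the real layers presumably operate on tensors rather than lists.

```python
import math


def weight_norm(weight_v, weight_g):
    """Rebuild w = g * v / ||v|| per output channel (assumed decomposition).

    weight_v: one flattened direction vector per output channel
    weight_g: one scalar magnitude per output channel
    """
    weights = []
    for v, g in zip(weight_v, weight_g):
        norm = math.sqrt(sum(x * x for x in v))
        weights.append([g * x / norm for x in v])
    return weights


def instance_norm_1d(channel, eps=1e-5):
    """Normalize a single channel over time to zero mean and unit variance."""
    mean = sum(channel) / len(channel)
    var = sum((x - mean) ** 2 for x in channel) / len(channel)
    return [(x - mean) / math.sqrt(var + eps) for x in channel]


def adain(features, gamma, beta, eps=1e-5):
    """Instance-normalize each channel, then apply a style scale and shift.

    features: list of channels, each a list of samples over time
    gamma, beta: one style-derived scale and shift per channel
    """
    out = []
    for channel, g, b in zip(features, gamma, beta):
        normed = instance_norm_1d(channel, eps)
        out.append([g * x + b for x in normed])
    return out


# The direction [3, 4] has norm 5, so a magnitude of 10 rescales it to [6, 8].
w = weight_norm([[3.0, 4.0]], [10.0])

# After AdaIN the channel mean equals beta and its spread is set by gamma.
styled = adain([[1.0, 2.0, 3.0]], gamma=[2.0], beta=[1.0])
```

The point of the weight_g/weight_v split is that the stored magnitude and direction have to be recombined before the convolution is applied; loading only one of them would give wrongly scaled filters.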
Test Plan
🤖 Generated with Claude Code