You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This PR adds Parakeet v3 model support and improves the build/setup documentation. The changes are functional and well-intentioned, but there are several areas that need attention.
// Workaround for v3 preprocessor bug: round sample count to nearest 1000// v3 fails on specific odd-length audio (e.g., 240,135 samples)
AudioSegment working_segment = segment;
constsize_t original_size = segment.pcm.size();
constsize_t round_to = 1000;
constsize_t rounded_size = ((original_size + round_to - 1) / round_to) * round_to;
if (rounded_size != original_size) {
working_segment.pcm.resize(rounded_size, 0.0F); // Pad with zeros
}
Issues:
This unconditionally copies the entire audio segment even when padding isn't needed (line 49)
The root cause of the "v3 preprocessor bug" is unclear - is this a model export issue, OpenVINO compilation issue, or actual model limitation?
Padding with zeros could affect transcription quality for audio that ends precisely at boundaries
The value 1000 appears arbitrary without justification
Recommendations:
Only copy the segment when padding is actually needed: if (rounded_size != original_size) { working_segment.pcm = segment.pcm; working_segment.pcm.resize(...); }
Investigate the root cause - this might be a model export issue that should be fixed upstream
Document why 1000 samples (62.5ms at 16kHz) was chosen
Consider if this should only apply to v3, not all models
2. Magic Number for Dynamic Encoder Fallback (parakeet_openvino.cpp:395-399)
// Fallback for dynamic shapes: Use 1250 frames (10 seconds at 125 frames/sec mel rate)if (impl_->encoder_expected_frames == 0) {
impl_->encoder_expected_frames = 1250;
}
Issues:
The comment says "125 frames/sec mel rate" but this needs verification
No explanation for why 10 seconds is the right chunk size for dynamic models
Should this be different for v2 vs v3?
Recommendation:
Add a constant: constexpr size_t DEFAULT_ENCODER_FRAMES = 1250; // 10s at 125 frames/sec
Document why this matches the preprocessor window size
Windows path is hardcoded, but vcpkg location varies
Recommendation:
Use [path-to-vcpkg] placeholder like the comment above it
Show both Windows and Linux examples
6. Incorrect Comment (README.md:147)
# Note: V3 currently only tested with English. Multilingual support coming soon.
Issue:
The code changes don't actually add language parameter support for v3
This creates false expectations
Recommendation:
Either implement language support or clarify that v3 multilingual is not yet integrated
Performance Considerations
7. Unnecessary Audio Copy
As mentioned in Issue #1, the audio segment is copied unconditionally for v3, even when no padding is needed. For long audio files, this could double memory usage temporarily.
Impact: Minor for typical use cases (<1 minute audio), but could be significant for long recordings.
8. No Benchmark for V3
The PR adds v3 support but doesn't mention if benchmarks were run to compare:
WER (Word Error Rate)
RTFx (Real-Time Factor)
Memory usage
Recommendation:
Run benchmark_librispeech.exe with both v2 and v3
Document results in PR description or BENCHMARK_RESULTS.md
Security Concerns
9. Integer Overflow Checks ✅
The existing code already has good overflow protection (lines 147-150, 209-213 in parakeet_preprocessor.cpp). No new issues introduced.
10. Input Size Validation⚠️
The rounding workaround could theoretically cause issues if:
Investigate and document the "v3 preprocessor bug" root cause
Add overflow check for padding calculation (Issue #10)
Should Fix:
4. Add at least basic unit tests for the new preprocessing logic
5. Run and document benchmarks comparing v2 and v3
6. Validate CLI model input against MODEL_MAP
7. Document when to use v2 vs v3
Nice to Have:
8. Extract magic numbers into named constants
9. Better commit message
10. Test with various audio lengths to validate the workaround
Overall Assessment: The functionality appears correct, but the preprocessing workaround is concerning without understanding the root cause. The documentation improvements are valuable. I'd recommend testing more thoroughly and adding some basic unit tests before merging.
Would you like me to help implement any of these suggestions?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
i decided to try to run v3 and v2 from scratched with a refresh git clone