Performance: Optimize VAD segment energy tracking to zero-allocation running sum#240

Open
ysdede wants to merge 1 commit into master from
bolt/optimize-vad-energy-arrays-10749343366007178863

Conversation

@ysdede
Owner

@ysdede ysdede commented Apr 4, 2026

What changed

Replaced speechEnergies and silenceEnergies arrays in AudioSegmentProcessor with a running speechEnergySum and speechEnergyCount.

Why it was needed (bottleneck evidence)

The processAudioData function sits on a hot path: it runs roughly every 80 ms, once per chunk of 16 kHz audio. It pushed each chunk's energy value into arrays (speechEnergies and silenceEnergies) and later called .reduce() to compute the average. The growing arrays and the O(N) pass over them added avoidable allocation and iteration work, contributing to main-thread GC churn during active speech.

Impact

  • Eliminates one array .push() per audio chunk.
  • Removes an O(N) array iteration (.reduce()) to calculate the average energy when a segment finalizes.
  • Drops an unused state array (silenceEnergies).
  • Results in zero-allocation energy tracking inside the VAD processor loop.
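
As a rough illustration of the pattern described above (not the actual AudioSegmentProcessor code), the sketch below contrasts the array-based average with a running sum and count. The `EnergyState` interface and the three helper functions are hypothetical names introduced for this example; only the field names `speechEnergySum` and `speechEnergyCount` come from the PR description.

```typescript
// Hypothetical stand-in for the energy fields on the processor state.
interface EnergyState {
  speechEnergySum: number;
  speechEnergyCount: number;
}

// Before: one array element per chunk, then an O(N) pass at finalize time.
function averageFromArray(energies: number[]): number {
  if (energies.length === 0) return 0;
  return energies.reduce((acc, e) => acc + e, 0) / energies.length;
}

// After: two numeric fields updated in place, with no array growth.
function accumulate(state: EnergyState, energy: number): void {
  state.speechEnergySum += energy;
  state.speechEnergyCount++;
}

// O(1) average; the empty case returns 0 instead of NaN (an assumption
// made for this sketch, not confirmed by the PR).
function averageFromState(state: EnergyState): number {
  return state.speechEnergyCount > 0
    ? state.speechEnergySum / state.speechEnergyCount
    : 0;
}
```

Because both versions add the same values in the same order starting from zero, they should produce the same floating-point average, which is consistent with the claim that VAD behavior stays identical.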

How to verify

  1. Run npx vitest run src/lib/audio/AudioSegmentProcessor.test.ts to confirm VAD functionality remains identical.
  2. In a performance profile (e.g. Chrome DevTools), observe reduced minor GC events during continuous speech input.

PR created automatically by Jules for task 10749343366007178863 started by @ysdede

Summary by Sourcery

Optimize VAD energy tracking in AudioSegmentProcessor to avoid per-chunk allocations and O(N) averaging in the hot path.

Enhancements:

  • Replace per-chunk speech energy arrays with running sum and count to compute average energy in O(1).
  • Remove unused silence energy tracking to simplify VAD processor state and reduce memory usage.
  • Update project notes to capture the running-sum pattern for high-frequency VAD energy computation.

Summary by CodeRabbit

  • Performance
    • Enhanced audio processing efficiency through optimized energy calculation, reducing memory overhead in streaming audio scenarios.

Replaces `speechEnergies` and `silenceEnergies` arrays in `AudioSegmentProcessor` with a running `speechEnergySum` and `speechEnergyCount`. This eliminates per-chunk array allocations, `.push()` calls, and `.reduce()` operations in the hot loop, reducing CPU overhead and memory allocations.
@google-labs-jules
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@coderabbitai

coderabbitai bot commented Apr 4, 2026

Walkthrough

The pull request optimizes AudioSegmentProcessor energy averaging by replacing per-chunk energy arrays with running totals (sum and count). This eliminates continuous array allocations and reduces garbage collection pressure in the streaming hot path while preserving functional behavior.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation: `.jules/bolt.md` | Adds a dated entry (2026-02-18) documenting the performance optimization approach for energy averaging, explaining the shift from array-based collection to running totals for O(1) computation. |
| Implementation: `src/lib/audio/AudioSegmentProcessor.ts` | Refactors energy tracking from array-based storage (speechEnergies, silenceEnergies) to aggregated running totals (speechEnergySum, speechEnergyCount). Updates average calculation logic and removes silence energy bookkeeping in non-speech states. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Poem

🐰 Arrays fade, totals rise,
Sum and count, a streaming prize—
No more allocations slow,
Garbage collectors? Let them go!
Energy flows through the light,
Running averages burn so bright! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Performance: Optimize VAD segment energy tracking to zero-allocation running sum' accurately and specifically describes the main change: replacing array-based energy tracking with a zero-allocation running sum approach in the AudioSegmentProcessor. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@qodo-code-review

Review Summary by Qodo

Optimize VAD energy tracking to zero-allocation running sum

✨ Enhancement

Walkthroughs

Description
• Replace array-based energy tracking with O(1) running sum
• Eliminate per-chunk array allocations and .reduce() calls
• Remove unused silenceEnergies array tracking
• Reduce GC pressure in VAD hot loop
Diagram
flowchart LR
  A["Array push per chunk"] -->|"Replace with"| B["Running sum += energy"]
  C["reduce() on finalize"] -->|"Replace with"| D["sum / count division"]
  E["speechEnergies array"] -->|"Remove"| F["speechEnergySum number"]
  G["silenceEnergies array"] -->|"Remove"| H["Unused tracking"]
  B --> I["Zero-allocation VAD loop"]
  D --> I

File Changes

1. src/lib/audio/AudioSegmentProcessor.ts Performance optimization +14/-13

Replace energy arrays with running sum tracking

• Replaced speechEnergies and silenceEnergies arrays with speechEnergySum and
 speechEnergyCount properties in ProcessorState interface
• Updated energy accumulation logic to use += operator instead of .push() calls
• Changed average energy calculation from .reduce() to simple division sum / count
• Removed silenceEnergies array initialization and tracking entirely
• Updated startSpeech() and startSilence() methods to initialize running sum values
• Updated reset() method to initialize numeric properties instead of empty arrays

src/lib/audio/AudioSegmentProcessor.ts


2. .jules/bolt.md 📝 Documentation +4/-0

Document VAD energy optimization learning

• Added learning note about O(1) running sum pattern for high-frequency VAD energy tracking
• Documented the problem of array allocations in 80ms hot-path loops
• Recorded action item to use running totals instead of primitive value arrays

.jules/bolt.md


@qodo-code-review

qodo-code-review bot commented Apr 4, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

Great, no issues found!

Qodo reviewed your code and found no material issues that require review

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey - I've left some high-level feedback:

  • Consider explicitly resetting speechEnergySum and speechEnergyCount when a speech segment ends (right after computing avgEnergy) so the state stays self-consistent and future changes to the state machine can’t accidentally reuse stale energy data.
  • Now that silenceEnergies and any silence energy accumulation are gone, it may be worth adding a short code comment near startSilence or the silence-handling branches to document that silence energy is intentionally ignored and not used in any downstream statistics.
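
The first suggestion above could look roughly like the following sketch. The `SpeechEnergyState` interface and the `finalizeSpeechEnergy` helper are hypothetical names for illustration, not the processor's actual API; the idea is simply to compute `avgEnergy` and clear the accumulators in the same step so stale values cannot leak into a later segment.

```typescript
// Hypothetical subset of the processor state relevant to energy tracking.
interface SpeechEnergyState {
  speechEnergySum: number;
  speechEnergyCount: number;
}

// Computes the segment's average energy, then resets the accumulators
// so the state is self-consistent once the segment has ended.
function finalizeSpeechEnergy(state: SpeechEnergyState): number {
  const avgEnergy =
    state.speechEnergyCount > 0
      ? state.speechEnergySum / state.speechEnergyCount
      : 0; // assumption: an empty segment reports 0 rather than NaN

  state.speechEnergySum = 0;
  state.speechEnergyCount = 0;
  return avgEnergy;
}
```

With this shape, calling the helper twice in a row returns the real average the first time and 0 the second time, which makes accidental reuse of old energy data easy to spot in tests.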


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/audio/AudioSegmentProcessor.ts`:
- Around line 252-253: The energy for a chunk seeded by a proactive split
restart is being double-counted: when startSpeech(...) seeds the split-boundary
chunk, the subsequent increments to this.state.speechEnergySum and
this.state.speechEnergyCount (the two spots reported) add the same chunk again
and distort avgEnergy/energyIntegral. Fix by tracking when a chunk was seeded by
startSpeech (e.g., add a boolean flag on this.state like
splitSeededOrSeededByStartSpeech) that startSpeech(...) sets for the seeded
chunk, then guard the increments to this.state.speechEnergySum and
this.state.speechEnergyCount so they only run when that flag is false; after the
first legitimate use clear the flag so future chunks are counted normally.
Ensure you apply the same guard in both increment sites (the two diffs
reported).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c3660bf9-6227-44ce-a433-2d242e612f1c

📥 Commits

Reviewing files that changed from the base of the PR and between 474dbe6 and b53c031.

📒 Files selected for processing (2)
  • .jules/bolt.md
  • src/lib/audio/AudioSegmentProcessor.ts

Comment on lines +252 to +253
this.state.speechEnergySum += energy;
this.state.speechEnergyCount++;

⚠️ Potential issue | 🟡 Minor

Prevent split-boundary chunk from being counted twice.

After a proactive split restart (line 209), startSpeech(...) has already seeded the new segment with this chunk's energy; adding the same chunk again here skews avgEnergy/energyIntegral for split segments.

💡 Suggested fix
 processAudioData(
     chunk: Float32Array,
     currentTime: number,
     energy: number
 ): ProcessedSegment[] {
@@
-    // --- Proactive Segment Splitting ---
+    let speechStartedThisChunk = false;
+
+    // --- Proactive Segment Splitting ---
     if (this.state.inSpeech && this.state.speechStartTime !== null) {
@@
                 // Start new segment immediately
                 this.startSpeech(currentTime, energy);
+                speechStartedThisChunk = true;
             }
         }
@@
             this.startSpeech(realStartTime, energy);
+            speechStartedThisChunk = true;
@@
             if (silenceDuration < this.options.maxSilenceWithinSpeech) {
                 // Not yet enough silence to consider it a break
-                this.state.speechEnergySum += energy;
-                this.state.speechEnergyCount++;
+                if (!speechStartedThisChunk) {
+                    this.state.speechEnergySum += energy;
+                    this.state.speechEnergyCount++;
+                }
             } else if (isConfirmedSilence) {
@@
         } else {
             // Continue in current state
-            if (this.state.inSpeech) {
+            if (this.state.inSpeech && !speechStartedThisChunk) {
                 this.state.speechEnergySum += energy;
                 this.state.speechEnergyCount++;
             } else {
                 // Silence, do nothing with energies
             }
         }

Also applies to: 283-284
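
To make the reported double count concrete, here is a simplified sketch under stated assumptions. It assumes that `startSpeech` seeds the new segment with the current chunk's energy (sum set to `energy`, count set to 1), as the review comment describes; the real method may differ. `SplitEnergyState`, `startSpeechSketch`, and `processChunkSketch` are hypothetical names, and the `guard` parameter exists only to compare the two behaviors side by side.

```typescript
// Hypothetical, reduced processor state for this illustration.
interface SplitEnergyState {
  inSpeech: boolean;
  speechEnergySum: number;
  speechEnergyCount: number;
}

// Assumed behavior of startSpeech: seed the new segment with this chunk.
function startSpeechSketch(state: SplitEnergyState, energy: number): void {
  state.inSpeech = true;
  state.speechEnergySum = energy;
  state.speechEnergyCount = 1;
}

// One chunk of processing. `splitNow` simulates a proactive split;
// `guard` toggles the speechStartedThisChunk check from the suggested fix.
function processChunkSketch(
  state: SplitEnergyState,
  energy: number,
  splitNow: boolean,
  guard: boolean
): void {
  let speechStartedThisChunk = false;

  if (state.inSpeech && splitNow) {
    startSpeechSketch(state, energy);
    speechStartedThisChunk = true;
  }

  // "Continue in current state" branch: accumulate unless the chunk
  // was already counted by the seeding step above.
  if (state.inSpeech && !(guard && speechStartedThisChunk)) {
    state.speechEnergySum += energy;
    state.speechEnergyCount++;
  }
}
```

Without the guard, a split on a chunk with energy 0.4 leaves the new segment with a count of 2 for a single chunk; with the guard, the count stays at 1. Whether this matters in practice depends on how the real `startSpeech` and the split branch interact, so the finding should be verified against the current code.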


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request optimizes the AudioSegmentProcessor by replacing array-based energy accumulation with a running sum and count. This change converts the average energy calculation from an O(n) operation with frequent memory allocations to an O(1) zero-allocation operation, reducing GC pressure in the high-frequency audio processing path. The documentation in .jules/bolt.md has been updated to reflect this optimization. I have no feedback to provide.
