Skip to content

Performance: Optimize speech energy average calculation with running sum#245

Open
ysdede wants to merge 1 commit intomasterfrom
perf/vad-running-sum-6872108495120039020
Open

Performance: Optimize speech energy average calculation with running sum#245
ysdede wants to merge 1 commit intomasterfrom
perf/vad-running-sum-6872108495120039020

Conversation

@ysdede
Copy link
Copy Markdown
Owner

@ysdede ysdede commented Apr 9, 2026

Replaced an O(N) array .reduce() used to calculate average segment energy with an O(1) running sum and count within the VAD's core hot path.


PR created automatically by Jules for task 6872108495120039020 started by @ysdede

Summary by Sourcery

Optimize voice activity detection segment energy averaging by replacing per-segment array reduction with a running sum and count, and update internal docs to reflect the new performance pattern.

Enhancements:

  • Track speech energy using running sum and count to compute average energy in constant time instead of reducing over the accumulated array.
  • Initialize and reset new speech energy tracking fields alongside existing VAD processor state to keep statistics consistent.

Documentation:

  • Update internal performance notes to recommend running sum/count over array reduction for average calculations in high-frequency audio processing paths.

Summary by CodeRabbit

  • Refactor
    • Optimized voice activity detection audio processing to reduce CPU overhead during high-frequency and long-running operations by streamlining how audio segment statistics are computed.

In `AudioSegmentProcessor`, `speechEnergies` was being continuously appended to during long speech segments, and resolving the average upon silence confirmation used an O(N) `.reduce()` iteration. Over very long segments (e.g., 5000+ chunks), this operation took ~50-100ms.

This commit introduces `speechEnergySum` and `speechEnergyCount` to incrementally track the average in O(1) time alongside the array pushes. The array was left intact for safety if undocumented logic depends on the array's contents.

Impact:
Eliminates overhead for computing the average energy of speech segments.

Verification:
- Baseline measurement via a script with simulated 5000-chunk speech segments: ~112ms / ~98ms (reduce) -> ~71ms / ~52ms (running sum)
- Existing tests passed (`bun test src/lib/audio/AudioSegmentProcessor.test.ts` and `npm run test`).
@google-labs-jules
Copy link
Copy Markdown
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@qodo-code-review
Copy link
Copy Markdown

Review Summary by Qodo

Optimize speech energy average calculation with running sum

✨ Enhancement

Grey Divider

Walkthroughs

Description
• Replace O(N) array reduce with O(1) running sum for energy average
• Add speechEnergySum and speechEnergyCount state variables
• Eliminate ~50-100ms overhead on long speech segments (5000+ chunks)
• Update documentation with performance optimization learnings
Diagram
flowchart LR
  A["Audio chunks processed"] -->|"accumulate energy"| B["speechEnergySum += energy<br/>speechEnergyCount++"]
  B -->|"on silence confirmation"| C["avgEnergy = sum / count<br/>O(1) operation"]
  D["Previous approach"] -->|"O(N) reduce"| E["avgEnergy calculation<br/>50-100ms overhead"]
  C -->|"Performance gain"| F["Reduced CPU overhead"]
Loading

Grey Divider

File Changes

1. src/lib/audio/AudioSegmentProcessor.ts Performance optimization +12/-2

Replace array reduce with running sum for energy

• Added speechEnergySum and speechEnergyCount fields to ProcessorState interface
• Updated energy average calculation from .reduce() to running sum division
• Increment running sum and count in three locations: silence tolerance check, continued speech
 state, and speech segment start
• Initialize running sum and count in startSpeech() and reset() methods

src/lib/audio/AudioSegmentProcessor.ts


2. .jules/bolt.md 📝 Documentation +3/-7

Update performance optimization learnings

• Removed outdated learnings about circular buffer access and memory verification
• Added new learning about running sum optimization in VAD audio processing
• Documented performance impact of array iteration in high-frequency audio paths

.jules/bolt.md


Grey Divider

Qodo Logo

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 9, 2026

📝 Walkthrough

Walkthrough

The changes optimize CPU performance in the audio segment processor by replacing array reduction operations for computing averages with maintained running aggregate state (speechEnergySum, speechEnergyCount), reducing computational overhead in high-frequency VAD paths.

Changes

Cohort / File(s) Summary
Documentation Updates
.jules/bolt.md
Removed journal entries on circular buffer optimization and zero-allocation behavior; added entry documenting .reduce() CPU overhead in VAD paths and recommending running sum/count aggregates instead.
Audio Processor Optimization
src/lib/audio/AudioSegmentProcessor.ts
Extended internal ProcessorState to track running aggregates (speechEnergySum, speechEnergyCount); replaced array reduction with aggregate division when computing avgEnergy during speech termination; aggregates updated during speech continuation and temporary silence periods.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A rabbit hops with lighter feet,
No more arrays to pile and beat!
Running sums dance swift and bright,
VAD paths zoom through the night.
Efficiency wins—the voices flow,
With aggregate grace, we reap and sow!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Performance: Optimize speech energy average calculation with running sum' directly and clearly describes the main change: replacing an O(N) reduce operation with an O(1) running sum approach for computing speech energy averages, which is the core objective of this performance optimization PR.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch perf/vad-running-sum-6872108495120039020

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review bot commented Apr 9, 2026

Code Review by Qodo

🐞 Bugs (1)   📘 Rule violations (0)   📎 Requirement gaps (0)   🎨 UX Issues (0)
🐞\ ➹ Performance (1)

Grey Divider


Remediation recommended

1. Unused speechEnergies buffer 🐞
Description
avgEnergy is now computed from speechEnergySum/speechEnergyCount, but the processor still
appends each chunk’s energy to speechEnergies even though that array is no longer read anywhere,
adding avoidable hot-path work and per-segment memory retention. This also creates duplicated state
that can silently diverge if one side is modified later.
Code

src/lib/audio/AudioSegmentProcessor.ts[R252-263]

            if (silenceDuration < this.options.maxSilenceWithinSpeech) {
                // Not yet enough silence to consider it a break
                this.state.speechEnergies.push(energy);
+                this.state.speechEnergySum += energy;
+                this.state.speechEnergyCount++;
            } else if (isConfirmedSilence) {
                // Confirmed silence - end speech segment
                if (this.state.speechStartTime !== null) {
                    const speechDuration = currentTime - this.state.speechStartTime;
-                    const avgEnergy = this.state.speechEnergies.length > 0
-                        ? this.state.speechEnergies.reduce((a, b) => a + b, 0) / this.state.speechEnergies.length
+                    const avgEnergy = this.state.speechEnergyCount > 0
+                        ? this.state.speechEnergySum / this.state.speechEnergyCount
                        : 0;
Evidence
The PR changes avgEnergy to use speechEnergySum/speechEnergyCount, but speechEnergies is still
only written to (pushed/reset) and not used for the average (or other calculations visible in this
module), making it redundant state on the processing hot path.

src/lib/audio/AudioSegmentProcessor.ts[50-68]
src/lib/audio/AudioSegmentProcessor.ts[247-293]
src/lib/audio/AudioSegmentProcessor.ts[334-346]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`avgEnergy` is computed via `speechEnergySum/speechEnergyCount`, but `speechEnergies` is still appended to on every processed chunk and reset on speech start. This array is now redundant state and adds avoidable hot-path work + per-segment memory retention.

## Issue Context
The intent of the PR is to optimize the VAD hot path by removing per-segment O(N) reductions. Keeping an ever-growing `speechEnergies` buffer (that is no longer used for the average) undermines some of that benefit and creates duplicated state that can drift.

## Fix Focus Areas
- src/lib/audio/AudioSegmentProcessor.ts[50-68]
- src/lib/audio/AudioSegmentProcessor.ts[247-293]
- src/lib/audio/AudioSegmentProcessor.ts[334-346]
- src/lib/audio/AudioSegmentProcessor.ts[560-586]

## Suggested approach
- Remove `speechEnergies` from `ProcessorState` if it has no other consumers.
- Remove all `.push()` / assignment sites for `speechEnergies`.
- If you still want to preserve it for debugging, gate it behind an explicit debug option (and keep sum/count as the primary stats path).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • You’ve introduced speechEnergySum and speechEnergyCount but only ever increment them; make sure they’re reset or kept in sync everywhere speechEnergies is cleared or a speech segment ends, otherwise averages will bleed across segments.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- You’ve introduced `speechEnergySum` and `speechEnergyCount` but only ever increment them; make sure they’re reset or kept in sync everywhere `speechEnergies` is cleared or a speech segment ends, otherwise averages will bleed across segments.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/lib/audio/AudioSegmentProcessor.ts (1)

255-256: Centralize speech-energy accumulation to avoid state drift.

push + sum + count is duplicated across branches. A small helper would reduce maintenance risk and keep speechEnergies, speechEnergySum, and speechEnergyCount guaranteed in sync.

♻️ Suggested refactor
+    private appendSpeechEnergy(energy: number): void {
+        this.state.speechEnergies.push(energy);
+        this.state.speechEnergySum += energy;
+        this.state.speechEnergyCount++;
+    }

     private startSpeech(time: number, energy: number): void {
         this.state.inSpeech = true;
         this.state.speechStartTime = time;
         this.state.silenceCounter = 0;
-        this.state.speechEnergies = [energy];
-        this.state.speechEnergySum = energy;
-        this.state.speechEnergyCount = 1;
+        this.state.speechEnergies = [];
+        this.state.speechEnergySum = 0;
+        this.state.speechEnergyCount = 0;
+        this.appendSpeechEnergy(energy);
         this.state.silenceStartTime = null;
         this.state.silenceDuration = 0;
-                this.state.speechEnergies.push(energy);
-                this.state.speechEnergySum += energy;
-                this.state.speechEnergyCount++;
+                this.appendSpeechEnergy(energy);
-                this.state.speechEnergies.push(energy);
-                this.state.speechEnergySum += energy;
-                this.state.speechEnergyCount++;
+                this.appendSpeechEnergy(energy);

Also applies to: 288-289, 342-343

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/audio/AudioSegmentProcessor.ts` around lines 255 - 256, The
duplicated accumulation (pushing into speechEnergies and updating
speechEnergySum and speechEnergyCount) in AudioSegmentProcessor should be
centralized: add a private helper method (e.g., addSpeechEnergy(energy: number))
on the AudioSegmentProcessor class that pushes energy into
this.state.speechEnergies and updates this.state.speechEnergySum and
this.state.speechEnergyCount atomically, then replace the three duplicated code
sites (the branches currently doing this.state.speechEnergies.push(...);
this.state.speechEnergySum += ...; this.state.speechEnergyCount++;) with a
single call to addSpeechEnergy(energy) so the three state fields remain in sync.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/lib/audio/AudioSegmentProcessor.ts`:
- Around line 255-256: The duplicated accumulation (pushing into speechEnergies
and updating speechEnergySum and speechEnergyCount) in AudioSegmentProcessor
should be centralized: add a private helper method (e.g.,
addSpeechEnergy(energy: number)) on the AudioSegmentProcessor class that pushes
energy into this.state.speechEnergies and updates this.state.speechEnergySum and
this.state.speechEnergyCount atomically, then replace the three duplicated code
sites (the branches currently doing this.state.speechEnergies.push(...);
this.state.speechEnergySum += ...; this.state.speechEnergyCount++;) with a
single call to addSpeechEnergy(energy) so the three state fields remain in sync.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b5b1a3c6-e68d-45fe-b922-478d8e563ab8

📥 Commits

Reviewing files that changed from the base of the PR and between 474dbe6 and ef5e8d1.

📒 Files selected for processing (2)
  • .jules/bolt.md
  • src/lib/audio/AudioSegmentProcessor.ts

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request optimizes the AudioSegmentProcessor by replacing array-based average energy calculation with a running sum and count (speechEnergySum, speechEnergyCount). This change aims to improve performance in high-frequency audio processing paths, as documented in the updated .jules/bolt.md file. A review comment suggests that the speechEnergyCount > 0 check at line 261 in AudioSegmentProcessor.ts is redundant, as the count is always initialized to 1 and incremented, ensuring it's never zero when a speech segment is active.

Comment on lines +261 to 263
const avgEnergy = this.state.speechEnergyCount > 0
? this.state.speechEnergySum / this.state.speechEnergyCount
: 0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The check this.state.speechEnergyCount > 0 is redundant here because speechEnergyCount is initialized to 1 in startSpeech (line 343) and incremented thereafter. It will always be at least 1 when a speech segment is active.

const avgEnergy = this.state.speechEnergySum / this.state.speechEnergyCount;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant