⚡️ Speed up method `VideoSourcesManager.retrieve_frames_from_sources` by 26% #798
Open
codeflash-ai[bot] wants to merge 1 commit into
The optimized code achieves a **26% speedup** through several micro-optimizations on the hot path of the `retrieve_frames_from_sources` method.

**Primary Optimizations:**

1. **Inlined inactivity check**: The original code called `_is_source_inactive()` for every source in the loop (277 calls taking 44.7% of total time). The optimized version inlines the check as `source_ord in self._ended_sources or source_ord in self._reconnection_threads`, which removes the per-source function call overhead.
2. **Loop structure optimization**: Replaced the `enumerate(zip(...))` pattern with a simple `range(total_sources)` loop and direct indexing. This avoids creating intermediate tuples and iterator objects, reducing allocation overhead.
3. **Cached `datetime.now` lookup**: When timeout calculations are needed, `datetime.now` is bound to a local function reference outside the loop, so the attribute lookup is not repeated in the hot path.
4. **Pre-cached attribute access**: Moved `self._video_sources.all_sources` and `self._video_sources.allow_reconnection` to local variables, so these attributes are no longer looked up on every iteration.
5. **Minor copy optimization**: In `join_all_reconnection_threads`, replaced `copy(self._threads_to_join)` with `set(self._threads_to_join)`. The snapshot is still a shallow copy, but the direct constructor avoids going through the generic `copy()` call.

**Performance Impact by Test Case:**

- **Large-scale scenarios** show the biggest gains (25.7% to 39.9% faster), where the loop optimizations compound across many sources.
- **Basic operations** see consistent 17-32% improvements across various conditions.
- **Early exit scenarios** benefit significantly (32.7% faster) due to reduced per-iteration overhead.

These optimizations matter most for video processing workloads that call `retrieve_frames_from_sources` frequently in real-time scenarios, where the per-call savings accumulate.
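The loop changes described above can be sketched on a toy class. This is a minimal illustration, not the code from the PR: `SourcesSketch`, `latest_frames`, `collect_before`, and `collect_after` are hypothetical names, and plain sets stand in for whatever containers the real manager uses for `_ended_sources` and `_reconnection_threads`. The point is only that the two loop shapes yield the same result.

```python
from copy import copy
from typing import List, Set, Tuple


class SourcesSketch:
    """Hypothetical stand-in for the manager; not the real VideoSourcesManager."""

    def __init__(self, sources: List[str], frames: List[int],
                 ended: Set[int], reconnecting: Set[int]) -> None:
        self.all_sources = sources
        self.latest_frames = frames  # hypothetical list parallel to all_sources
        self._ended_sources = ended
        self._reconnection_threads = reconnecting

    def _is_source_inactive(self, source_ord: int) -> bool:
        return (source_ord in self._ended_sources
                or source_ord in self._reconnection_threads)

    def collect_before(self) -> List[Tuple[str, int]]:
        # Original shape: enumerate(zip(...)) plus one helper call per source.
        result = []
        for source_ord, (source, frame) in enumerate(
            zip(self.all_sources, self.latest_frames)
        ):
            if self._is_source_inactive(source_ord):
                continue
            result.append((source, frame))
        return result

    def collect_after(self) -> List[Tuple[str, int]]:
        # Optimized shape: attributes cached in locals, range() with direct
        # indexing, and the inactivity check written inline.
        all_sources = self.all_sources
        latest_frames = self.latest_frames
        result = []
        for source_ord in range(len(all_sources)):
            if (source_ord in self._ended_sources
                    or source_ord in self._reconnection_threads):
                continue
            result.append((all_sources[source_ord], latest_frames[source_ord]))
        return result


manager = SourcesSketch(
    sources=["cam-0", "cam-1", "cam-2", "cam-3"],
    frames=[10, 11, 12, 13],
    ended={1},
    reconnecting={3},
)
before = manager.collect_before()
after = manager.collect_after()

# Snapshotting a set: copy() and set() both give an equal, independent shallow copy.
pending = {"thread-a", "thread-b"}
snapshot = set(pending)
```

Both methods skip sources 1 and 3 and return the remaining pairs in order. Whether the second shape is actually faster depends on the interpreter and workload, so a change like this is worth measuring (for example with `timeit`) rather than assuming.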
📄 26% (0.26x) speedup for `VideoSourcesManager.retrieve_frames_from_sources` in `inference/core/interfaces/camera/utils.py`

⏱️ Runtime: 212 microseconds → 168 microseconds (best of 30 runs)
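As a quick check on the headline figure, the reported runtimes can be turned into a percentage. The sketch below assumes the speedup is defined as the original runtime divided by the optimized runtime, minus one; that definition is an assumption here, though it is consistent with the "26% (0.26x)" label. The helper names are hypothetical.

```python
def speedup_percent(original_us: float, optimized_us: float) -> float:
    """Speedup relative to the optimized runtime, in percent (assumed definition)."""
    return (original_us / optimized_us - 1.0) * 100.0


def time_saved_percent(original_us: float, optimized_us: float) -> float:
    """Share of the original runtime that was removed, in percent."""
    return (original_us - optimized_us) / original_us * 100.0


speedup = speedup_percent(212, 168)    # about 26.2
saved = time_saved_percent(212, 168)   # about 20.8
```

The two numbers answer different questions: the first says the optimized method does about 26% more work per unit time, the second that each call takes about 21% less time.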
To edit these changes, run `git checkout codeflash/optimize-VideoSourcesManager.retrieve_frames_from_sources-miqutslf` and push.