multi threaded processing #461

Open
bashbaug wants to merge 12 commits into intel:main from bashbaug:multi-threaded-processing

@bashbaug bashbaug commented Mar 29, 2026

Description of Changes

Adds support for processing device timing events on a separate thread. This can significantly reduce overhead because application threads can continue executing while earlier device timing events are processed. Chrome trace flushing is also moved to a separate thread, which can further improve performance, though more work is needed to eliminate locks before Chrome tracing can take full advantage of multiple threads.
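To illustrate the general idea of moving event processing off the application threads, here is a minimal producer/consumer sketch. It is not the code from this PR: `DeferredEventProcessor` and `process_in_background` are hypothetical names, and the "timing event" is reduced to a callable that adds a fake duration to a running total.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper, not taken from the PR: one background worker that
// drains a queue of deferred "process this timing event" tasks.
class DeferredEventProcessor {
public:
    DeferredEventProcessor() : worker_([this] { run(); }) {}

    ~DeferredEventProcessor() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // run() drains any remaining tasks before returning
    }

    // Called from application threads: enqueue the work and return at once.
    void submit(std::function<void()> task) {
        assert(task && "task must be callable");
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
            if (tasks_.empty()) return;  // only reachable once stopping_ is set
            std::function<void()> task = std::move(tasks_.front());
            tasks_.pop();
            lock.unlock();  // run the task without holding the queue lock
            task();
            lock.lock();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the other members exist first
};

// Simulates an application thread handing off `count` timing records, each
// with a fake duration in nanoseconds, and returns the summed total.
std::uint64_t process_in_background(std::uint64_t count) {
    std::uint64_t total_ns = 0;  // written only by the worker thread
    {
        DeferredEventProcessor processor;
        for (std::uint64_t i = 1; i <= count; ++i) {
            processor.submit([&total_ns, i] { total_ns += i; });
        }
    }  // the destructor joins the worker, so total_ns is safe to read here
    return total_ns;
}
```

In this sketch the application thread pays only for a short lock and a queue push per event, while the aggregation runs on the worker. The single mutex around the queue is also the kind of lock that can limit scaling when many threads submit work at once.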

For applications that do not want to create additional threads, this PR also adds a control to continue processing device timing events in the application threads. This control can be set via cliloader by passing the --no-threads or -nt command line option.
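The effect of such a control can be sketched as a dispatcher that either runs the work on the calling thread or hands it to another thread. This is an assumption-laden illustration only: `ProcessingMode`, `EventDispatcher`, and `ran_on_calling_thread` are hypothetical names, the PR does not describe how `--no-threads` maps to internal state, and `std::async` stands in for whatever threading mechanism the PR actually uses.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical switch standing in for the control described above.
enum class ProcessingMode { Inline, Background };

class EventDispatcher {
public:
    explicit EventDispatcher(ProcessingMode mode) : mode_(mode) {}

    void dispatch(std::function<void()> work) {
        assert(work && "work must be callable");
        if (mode_ == ProcessingMode::Inline) {
            work();  // no extra thread: process in the application thread
            return;
        }
        // std::launch::async runs the work as if on a newly created thread.
        pending_.push_back(std::async(std::launch::async, std::move(work)));
    }

    // Blocks until all background work has finished.
    void wait() {
        for (auto& f : pending_) f.get();
        pending_.clear();
    }

private:
    ProcessingMode mode_;
    std::vector<std::future<void>> pending_;
};

// Reports whether the dispatched work ran on the caller's own thread.
bool ran_on_calling_thread(ProcessingMode mode) {
    EventDispatcher dispatcher(mode);
    std::thread::id worker_id;
    dispatcher.dispatch([&worker_id] { worker_id = std::this_thread::get_id(); });
    dispatcher.wait();  // future::get() makes the write to worker_id visible
    return worker_id == std::this_thread::get_id();
}
```

Inline mode keeps behavior predictable for applications that manage their own threads, at the cost of doing the processing on the application's critical path.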

Finally, adds support for building with the thread sanitizer enabled.

Testing Done

Tested with an OpenVINO benchmark app. Prior to this change, the benchmark app reported ~1450 fps without the OpenCL Intercept Layer and ~975 fps with device performance timing enabled (note: with max enqueue set to 2M). After this change, the benchmark app reported ~1440 fps with device performance timing enabled, meaning that device performance timing was essentially free.
