[issue-858] GPU implementation of 911 vertices #880
Merged
NicolasJPosey merged 71 commits into PoseyDevelopment on Mar 12, 2026
Adds support for two uint64_t argument functions so that loadEpochInputs can be registered and called from the OperationManager class.
…-911vertices-gpu-implementation
Added a data member for storing the total number of events that are read into the InputManager. This allows us to define vector capacities based on the number of events being simulated.
AllVertices now has a non-virtual loadEpochInputs method. It calls two virtual methods, one for loading the epoch inputs and the other for copying the inputs to the GPU. The default behavior for both is to do nothing.
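A minimal sketch of that structure, assuming hypothetical class names, hook names (`loadInputsForEpoch`, `copyInputsToDevice`), and parameter names; only `loadEpochInputs` and its two `uint64_t` arguments come from the commits above:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for AllVertices. Not the project's actual class.
class VerticesSketch {
public:
   virtual ~VerticesSketch() = default;

   // Non-virtual entry point: fixes the order "load on host, then copy to GPU".
   void loadEpochInputs(uint64_t currentStep, uint64_t endStep)
   {
      loadInputsForEpoch(currentStep, endStep);
      copyInputsToDevice();
   }

protected:
   // Hook names are assumptions. Both default to doing nothing.
   virtual void loadInputsForEpoch(uint64_t, uint64_t) {}
   virtual void copyInputsToDevice() {}
};

// Hypothetical 911 subclass that overrides both hooks and counts the calls.
class Vertices911Sketch : public VerticesSketch {
public:
   int loads = 0;
   int copies = 0;

protected:
   void loadInputsForEpoch(uint64_t, uint64_t) override { ++loads; }
   void copyInputsToDevice() override { ++copies; }
};
```

With this shape, a vertex type that has no epoch inputs overrides nothing, and a CPU-only build can leave the copy hook at its default.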
…od instead of a connections method. This method makes more sense as a behavior of vertices because it also needs to run on the GPU.
…-911vertices-gpu-implementation
We need a dynamically sized array, so we use a vector instead of an array. But we want the implementation to be easily mirrored on the GPU, so we interact with the vector as we would with an array.
…ent call. The push back call was not easy to mirror on the GPU. We already have examples of using the EventBuffer and insertEvent call on the GPU, so we changed to that implementation. This also lets us make it clear what size the buffer should be, again helping the mirrored GPU implementation.
This allows us to remove the resize calls, which we don't want to make on the GPU. Also added a DoubleEventBuffer to use in place of RecordableVector<double>.
…-911vertices-gpu-implementation
The correct pattern is to first copy the device pointers to the CPU and then copy the values to the CPU data members. The old code only worked because a uint64_t and a uint64_t pointer happen to have the same number of bytes. If the same pattern is repeated for a type like float, an illegal memory error is thrown.
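The two-step copy can be sketched in host-only C++. Everything here is an assumption made for illustration: the struct, its members, and `copyToHost`, which stands in for a device-to-host `cudaMemcpy` so the sketch compiles without CUDA:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a struct that lives in device memory and
// holds device pointers.
struct DevicePropsSketch {
   uint64_t *eventCounts;
   float *utilization;
};

// Stand-in for a device-to-host copy; here it is just a memcpy.
inline void copyToHost(void *dst, const void *src, std::size_t bytes)
{
   std::memcpy(dst, src, bytes);
}

void copyFromDevice(const DevicePropsSketch *deviceProps, std::size_t count,
                    std::vector<uint64_t> &hostCounts, std::vector<float> &hostUtilization)
{
   // Step 1: copy the struct itself so the host holds the device pointers.
   DevicePropsSketch staged;
   copyToHost(&staged, deviceProps, sizeof(DevicePropsSketch));

   // Step 2: copy the pointed-to values, sized by the element type,
   // into the host data members.
   hostCounts.resize(count);
   hostUtilization.resize(count);
   copyToHost(hostCounts.data(), staged.eventCounts, count * sizeof(uint64_t));
   copyToHost(hostUtilization.data(), staged.utilization, count * sizeof(float));
}
```

Sizing each value copy by its element type, rather than by pointer size, is what keeps the pattern valid for types such as float.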
The GPU noise array only works when numVertices is at least 100 and a multiple of 100. Otherwise, an invalid kernel configuration error is thrown, which masks other possible errors.
Having asserts in kernels can cause them to fail silently. Using print statements and returning is a better way to fail inside kernels.
Clean up of commented out code, unnecessary extra variables, and unused methods.
The vertex creation update that gave each vertex the same sized data member for the GPU meant that we would never get a dropped call, because the queues were too large. The logic was changed where we interact with vertex queues in PSAPs and RESPs to act as if the size equals the number of trunks, which was the original implementation size.
RecordableVectors are cleared after each epoch if they are the dynamic type. The size is not reset. We need subscript operator access for droppedCalls, so the type must be constant, which does not clear the vector after each epoch.
The current implementation for generating noise on a device makes some assumptions that break for the 911 GPU model. To get noise support for 911, we implemented a way for vertices to specify how many noise elements they will need. A method was then implemented in GPUModel that rounds the input up to the nearest multiple of 100.
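The rounding step can be sketched with integer ceiling division. The function and constant names below are hypothetical, and leaving a request of 0 at 0 is an assumption of this sketch rather than something the commit states:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical name; the commit only says noise counts are rounded up to
// the nearest multiple of 100.
constexpr uint64_t kNoiseBlock = 100;

constexpr uint64_t roundUpToNoiseBlock(uint64_t requested)
{
   // Integer ceiling division, then scale back up.
   // A request of 0 stays 0 in this sketch.
   return ((requested + kNoiseBlock - 1) / kNoiseBlock) * kNoiseBlock;
}
```

For example, a graph with 37 vertices would request 37 noise elements and be given 100, while a request of exactly 100 is left unchanged.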
…-911vertices-gpu-implementation
NicolasJPosey (Contributor, Author) commented Feb 15, 2026
Change due to allowing for noise when using fewer than 100 vertices in GPUModel.cpp
NicolasJPosey commented Feb 15, 2026
stiber approved these changes Mar 12, 2026
NicolasJPosey added a commit that referenced this pull request Mar 13, 2026
* [issue-723] 911 edges GPU implementation (#867)
* Use vector of char for consistency with neuro
* Move setEdgeClassID method down into AllNeuroEdges since it's not used in 911 edges
* Initial implementation of 911 edges for GPU
* Make public explicitly
* Fix clang format issue
* Another try at a clang fix
* Manually release array memory from stack on device copy
  This is to prevent a segmentation fault due to a stack overflow from the array declarations when running large graphs
* Expand vertex type map reported in debug log
* Resolve post merge changes
* Update GPU results with updates from PR 877
* [issue-858] GPU implementation of 911 vertices (#880)
* Add loadEpochInputs to OperationManager
  Adds support for two uint64_t argument functions so that loadEpochInputs can be registered and called from the OperationManager class.
* Add vertices device struct to 911 class
* Add total number of events data member to InputManager
  Added a data member for storing the total number of events that are read into the InputManager. This allows us to define vector capacities based on the number of events being simulated.
* Initial CPU GPU architecture documentation
* Refactor of loadEpochInputs to support loading inputs to GPU
  AllVertices now has a non-virtual loadEpochInputs method. It calls two virtual methods, one for loading the epoch inputs and the other for copying the inputs to the GPU. The default behavior for both is to do nothing.
* Refactor getEdgeToClosestResponder method to be a All911Vertices method instead of a connections method
  This method makes more sense as a behavior of vertices because it also needs to run on the GPU.
* Forgot CPU code
* Some GPU implementations but is incomplete
* Remove reserve call since RecordableVector doesn't implement it
* Refactor internal vector use in PSAP and RESP advance logic
  We need a dynamically sized array, so we use a vector instead of an array. But we want the implementation to be easily mirrored on the GPU, so we interact with the vector as we would with an array.
* Convert call metrics to EventBuffers and swap push back for insert event call
  The push back call was not easy to mirror on the GPU. We already have examples of using the EventBuffer and insertEvent call on the GPU, so we changed to that implementation. This also lets us make it clear what size the buffer should be, again helping the mirrored GPU implementation.
* Replace numeric bool with actual bool for readability
* Change vector type from RecordableVector to EventBuffer
  This allows us to remove the resize calls, which we don't want to make on the GPU. Also added a DoubleEventBuffer to use in place of RecordableVector<double>.
* Bug fix for copying spike histories from device
  The correct pattern is to first copy the device pointers to the CPU and then copy the values to the CPU data members. The old code only worked because a uint64_t and a uint64_t pointer happen to have the same number of bytes. If the same pattern is repeated for a type like float, an illegal memory error is thrown.
* Add a guard and debugging message for GPU random noise
  The GPU noise array only works when numVertices is at least 100 and a multiple of 100. Otherwise, an invalid kernel configuration error is thrown, which masks other possible errors.
* Updates to support copying to and from GPU
* Support for copying to and from GPU and make type float for now
* Add GPU 911 vertices to make list
* Implementation runs but results aren't quite right
* Fix case-sensitive copying of call responder types
* Fix bug using wrong size for queue length and utilization histories
* Remove debugging printfs and replace asserts with printfs for errors
  Having asserts in kernels can cause them to fail silently. Using print statements and returning is a better way to fail inside kernels.
* General cleanup
  Clean up of commented out code, unnecessary extra variables, and unused methods.
* Free the array used to determine available servers and units in kernels.
* Readd support for getting dropped calls
  The vertex creation update that gave each vertex the same sized data member for the GPU meant that we would never get a dropped call, because the queues were too large. The logic was changed where we interact with vertex queues in PSAPs and RESPs to act as if the size equals the number of trunks, which was the original implementation size.
* Fix error if a dropped call is found after the first epoch
  RecordableVectors are cleared after each epoch if they are the dynamic type. The size is not reset. We need subscript operator access for droppedCalls, so the type must be constant, which does not clear the vector after each epoch.
* Support for noise in 911 models
  The current implementation for generating noise on a device makes some assumptions that break for the 911 GPU model. To get noise support for 911, we implemented a way for vertices to specify how many noise elements they will need. A method was then implemented in GPUModel that rounds the input up to the nearest multiple of 100.
* Add assert for random number thread count
* Add support for using noise to simulate attempted redials
  Because only caller regions simulate attempted redials, we add a vector to map the caller region vertex IDs to the noise array on the device. This allows us to use the existing noise algorithm with larger graphs since we can only generate noise for up to 10000 vertices.
* Fix isFull error message to show right buffer size
* Fix bug with waiting queue check
  If the number of trunks and servers is equal and the queue is full, capacity minus busy servers is negative. Since dstQueueSize is of type uint64_t, it can't be negative. The comparison then gives a false positive that the queue is not full. The fix is to cast the size to an int so that the right comparison is done.
* Debugging statements for memory analysis
* Add some larger 911 graphs
* Updates to history to support less memory usage on GPU
  The call metrics account for the vast majority of the physical memory used by the GPU. By resizing each to a smaller value, we can fit larger graphs on the GPU by using more epochs with smaller steps per epoch.
* Fix firing rate value
  Firing rate should actually be equal to 1 since we can have at most 1 call per second.
* Fix issue using wrong buffer size
  The buffer size used for a CircularBuffer is 1 more than the capacity passed into the constructor. When we construct the buffer, we pass in the number of trunks but were effectively using 1 less during the simulation.
* Fix getting front index when we want end index for queue length calculation
* Add back in random redial attempt
* More updates to reduce memory usage
  Metrics that used totalNumberOfEvents and totalTimeSteps were using more memory than needed. These were changed to maxEventsPerEpoch and stepsPerEpoch respectively. Also changed copyTo and copyFrom in All911Edges to use heap memory to prevent stack overflows with large graphs.
* Fix bug with vertex queue size
  The buffer inside the CircularBuffer implementation is 1 larger than the capacity set at construction. VertexQueues are CircularBuffers so we add 1 where we use the buffer size.
* Another CircularBuffer size bug fix
  Fixed allocation, copyTo, and copyFrom for VertexQueues. They are CircularBuffers which internally have a buffer that is 1 more than the capacity. The sizes used were updated to be 1 more than the stepsPerEpoch to match the construction capacity.
* Fix firing rate and change epoch parameters to reduce memory
  Memory is mostly dependent on epoch duration so we decrease that parameter and increase the number of epochs parameter by the same factor. This keeps the total time steps constant but reduces memory usage. We can only have 1 call per step so the max firing rate should be 1.
* Add an approximate state wide, month long configuration
* General cleanup and adding of comments
* GPU Optimizations
  Remove some branching and make changes to reduce amount of register usage.
* Dataset updates
* Timing adds, documentation, and updates
* Add regression testing documentation markdown
* Update after changing Abandoned and QueueLength history types
* Add small 911 test to regression script
* Add larger 911 test
  This corresponds to Dataset A in Posey capstone report.
* Remove testing datasets
* Remove temp timing changes
* Correct how 2D arrays are copied from device to host
* Add noise state logging for debugging
* Noise is now generated and used for graphs with less than 100 vertices
* Fix formatting
* Try another clang fix
* Try to fix clang in function node file
* clang fix attempt
* more clang
* clang
* Clean up and port some optimizations to CPU
* clang formatting
* Rename GPU documentation file
* Remove trivial example and rewrite to clarify design of CPU implementation relative to GPU
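The "Fix bug with waiting queue check" entry in the commit message above describes a signed/unsigned comparison problem. A minimal sketch of that class of bug follows; the function names, parameter names, and the exact form of the comparison are assumptions, and only the uint64_t queue size and the cast-based fix come from the commit text:

```cpp
#include <cassert>
#include <cstdint>

// Buggy form: when a uint64_t is compared with an int, the int is converted
// to unsigned. The cast below spells out that implicit conversion, so a
// difference of -1 becomes 2^64 - 1 and the queue always appears to have room.
bool hasRoomBuggy(uint64_t queueSize, int capacity, int busyServers)
{
   return queueSize < static_cast<uint64_t>(capacity - busyServers);
}

// Fixed form: bring both sides into a signed type before comparing, so a
// negative difference stays negative.
bool hasRoomFixed(uint64_t queueSize, int capacity, int busyServers)
{
   return static_cast<int64_t>(queueSize) < static_cast<int64_t>(capacity - busyServers);
}
```

With hypothetical values of 4 queued calls, a capacity of 4, and 5 busy servers, the buggy check reports room in the queue while the fixed check does not.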
Closes #858
Description
This PR introduces a GPU implementation of the NG911 model in connection with Nicolas Posey's master's capstone.
A small 911 test file was added to the GPU list in the RunTest.sh file to allow automated regression testing of this new implementation. Documentation has also been added to help with implementing new mirrored GPU implementations from existing, domain-specific CPU implementations (see CpuGpuArchitecture.md). Lastly, documentation of the existing small 911 test file and a new medium 911 test file has been added to begin documenting important configuration information for existing regression test files (see RegressionTestDocumentation.md).
Checklist (Mandatory for new features)
Testing (Mandatory for all changes)
test-medium-connected.xml: Passed
test-large-long.xml: Passed