Release v1.2 #920

Merged
stiber merged 45 commits into master from release-v1.2
Mar 13, 2026

@stiber (Contributor) commented Mar 13, 2026

Closes #

Description

Checklist (Mandatory for new features)

  • Added Documentation
  • Added Unit Tests

Testing (Mandatory for all changes)

  • GPU Test: test-medium-connected.xml Passed
  • GPU Test: test-large-long.xml Passed

NicolasJPosey and others added 30 commits August 12, 2025 16:46
* Refactor neuro out of Layout.h

Layout.h contained neuro data members that should be moved down into the derived class. This was done and DeviceVectors were implemented for Layout911 vertex locations to replace the VectorMatrix type.

* Move distance matrix allocation into base class

* Update regression result for 911 test
…n be uncommented

- CMakeLists.txt added to .gitignore
- tests.yml: removed CUDA toolkit installation
- disabled GPU version
[ISSUE-890] Add Kyle Ricks to the Contributors.txt file.
Merge branch 'master' into SharedDevelopment

Incorporates documentation and action fixes.
…ontinue ONLY if tests are successfully built.

GitHub Actions now stops if a test fails to build or run.
…-from-running-if-a-test-fails

[Issue-894] Stop GitHub actions from running if a test fails
…e-regression-tests

Issue 848 replace simple regression tests
* Upgrade EventBuffer to be a templated class

This involved moving the contents of EventBuffer.cpp into the header file and updating previous usages to use the uint64_t type, which was the old default. Also cleaned up redundant overrides and added getters and setters to the EventBuffer so that its friendship with neuro classes could be removed.
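A minimal sketch of what a templated event buffer with accessor methods could look like. `EventBufferSketch` and its member names are hypothetical and are not taken from the project's actual `EventBuffer`; the point is only to show why a template has to live in the header and how getters can stand in for friend access.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; not the project's EventBuffer.
// A class template must be fully defined in the header so that each
// translation unit can instantiate it for its own element type.
template <typename T>
class EventBufferSketch {
public:
    // Capacity is fixed at construction so no resizing happens later.
    explicit EventBufferSketch(std::size_t maxEvents) : data_(maxEvents), count_(0) {}

    // Store an event in the next free slot.
    void insertEvent(const T& value) {
        assert(count_ < data_.size() && "buffer is full");
        data_[count_++] = value;
    }

    // Getters replace direct member access that would otherwise need friendship.
    std::size_t getNumElements() const { return count_; }
    std::size_t getCapacity() const { return data_.size(); }

    const T& getElement(std::size_t index) const {
        assert(index < count_);
        return data_[index];
    }

    // Forget the stored events, for example at the start of a new epoch.
    void clear() { count_ = 0; }

private:
    std::vector<T> data_;
    std::size_t count_;
};
```

With this shape, a former default usage would be spelled out as `EventBufferSketch<std::uint64_t>`, and a floating-point variant is just `EventBufferSketch<double>`.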

* Fix clang format issue

* Minor cleanup

* Add documentation for this pointer use

* Remove unnecessary comment
…tructions file since the current one does not seem to be used currently
…ing after review. Rewrote the copilot-instructions to be more concise to minimize context window usage.
…ion to docs/ to describe the function of the two .md files
kblricks and others added 15 commits February 17, 2026 23:26
…s for system prompts, and update documentation to match changes.
… better workflow that matches GitHub's recommendations. Also updates doc file to match this. Included more verbose examples to describe how the file works.
[ISSUE-812] Use an LLM to aid in generating basic unit tests for all classes
… runs an internal debug instead of code analysis
…mpt-file

[ISSUE-914] Rename debug command from /debug to /debug-code
* [issue-723] 911 edges GPU implementation (#867)

* Use vector of char for consistency with neuro

* Move setEdgeClassID method down into AllNeuroEdges since it's not used in 911 edges

* Initial implementation of 911 edges for GPU

* Make public explicitly

* Fix clang format issue

* Another try at a clang fix

* Manually release array memory from stack on device copy

This is to prevent a segmentation fault due to a stack overflow from the array declarations when running large graphs

* Expand vertex type map reported in debug log

* Resolve post merge changes

* Update GPU results with updates from PR 877

* [issue-858] GPU implementation of 911 vertices (#880)

* Add loadEpochInputs to OperationManager

Adds support for functions that take two uint64_t arguments so that loadEpochInputs can be registered and called from the OperationManager class.
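As a rough illustration of registering and invoking callbacks that take two `uint64_t` arguments, here is a small sketch. `OperationRegistrySketch`, its method names, and the string keys are all hypothetical; the real OperationManager interface may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry; names and signatures are illustrative only.
class OperationRegistrySketch {
public:
    using TwoArgOperation = std::function<void(std::uint64_t, std::uint64_t)>;

    // Register a callback under an operation name.
    void registerOperation(const std::string& name, TwoArgOperation op) {
        assert(op && "operation must be callable");
        operations_[name].push_back(std::move(op));
    }

    // Call every callback registered under `name`; returns how many ran.
    std::size_t executeOperation(const std::string& name,
                                 std::uint64_t first,
                                 std::uint64_t second) const {
        auto it = operations_.find(name);
        if (it == operations_.end()) {
            return 0;
        }
        for (const auto& op : it->second) {
            op(first, second);
        }
        return it->second.size();
    }

private:
    std::map<std::string, std::vector<TwoArgOperation>> operations_;
};
```

A caller would register a lambda that forwards to the vertices object and later trigger it with the two step values for the epoch.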

* Add vertices device struct to 911 class

* Add total number of events data member to InputManager

Added a data member for storing the total number of events that are read into the InputManager. This allows us to define vector capacities based on the number of events being simulated.

* Initial CPU GPU architecture documentation

* Refactor of loadEpochInputs to support loading inputs to GPU

AllVertices now has a non-virtual loadEpochInputs method. This calls two virtual methods, one for loading the epoch inputs and the other for copying the inputs to the GPU. The default behavior for both is to do nothing.
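The structure described above resembles the non-virtual interface idiom. The sketch below shows that idiom under assumed names: `VerticesSketch`, `Vertices911Sketch`, the hook names, and the parameter meanings are hypothetical and not taken from the code base.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical base class; hook names are illustrative only.
class VerticesSketch {
public:
    virtual ~VerticesSketch() = default;

    // Non-virtual entry point: fixes the order of the two steps.
    void loadEpochInputs(std::uint64_t currentStep, std::uint64_t endStep) {
        assert(currentStep <= endStep);
        loadEpochInputsImpl(currentStep, endStep);
        copyEpochInputsToDevice();
    }

protected:
    // Both hooks default to doing nothing, so models without inputs
    // or without a GPU path need no overrides.
    virtual void loadEpochInputsImpl(std::uint64_t /*currentStep*/,
                                     std::uint64_t /*endStep*/) {}
    virtual void copyEpochInputsToDevice() {}
};

// A derived class overrides only the hooks it needs.
class Vertices911Sketch : public VerticesSketch {
public:
    std::vector<std::string> calls;  // records the order of hook calls

protected:
    void loadEpochInputsImpl(std::uint64_t, std::uint64_t) override {
        calls.push_back("load");
    }
    void copyEpochInputsToDevice() override { calls.push_back("copy"); }
};
```

Keeping the public method non-virtual means a derived class cannot accidentally skip the device copy or reorder the two steps.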

* Refactor getEdgeToClosestResponder to be an All911Vertices method instead of a connections method

This method makes more sense as a vertex behavior, since it also needs to run on the GPU.

* Forgot CPU code

* Partial GPU implementation (incomplete)

* Remove reserve call since RecordableVector doesn't implement it

* Refactor internal vector use in PSAP and RESP advance logic

We need a dynamically sized array, so we use a vector instead of an array. But we want the implementation to be easily mirrored on the GPU, so we interact with the vector as we would with an array.

* Convert call metrics to EventBuffers and swap push back for insert event call

The push back call was not easy to mirror on the GPU. We already have examples of using the EventBuffer and insertEvent call on the GPU so we change to use this implementation. This also allows us to make it clear what size the buffer should be, again helping the mirrored GPU implementation.

* Replace numeric bool with actual bool for readability

* Change vector type from RecordableVector to EventBuffer

This allows us to remove the resize calls which we don't want to do on the GPU. Also added a DoubleEventBuffer to use in place of RecordableVector<double>.

* Bug fix for copying spike histories from device

The correct pattern is to first copy the device pointers to the CPU and then copy the values to the CPU data members. The old code happened to work because a uint64_t and a uint64_t pointer have the same number of bytes. However, if this pattern is repeated for a type like float, an illegal memory error is thrown.

* Add a guard and debugging message for GPU random noise

The GPU noise array only works when numVertices is at least 100 and a multiple of 100. Otherwise, an invalid kernel configuration error is thrown, which masks other possible errors.

* Updates to support copying to and from GPU

* Support for copying to and from GPU and make type float for now

* Add GPU 911 vertices to make list

* Implementation runs but results aren't quite right

* Fix case-sensitive copying of call responder types

* Fix bug using wrong size for queue length and utilization histories

* Remove debugging printfs and replace asserts with printfs for errors

Having asserts in kernels can cause them to fail silently. Using print statements and returning is a better way to fail inside kernels.

* General cleanup

Clean up of commented out code, unnecessary extra variables, and unused methods.

* Free the array used to determine available servers and units in kernels.

* Readd support for getting dropped calls

The vertex-creation update that gave every vertex the same-sized data member for the GPU meant that we would never get a dropped call, because the queues were too large. The logic where we interact with vertex queues in PSAPs and RESPs was changed to act as if the size equals the number of trunks, which was the original implementation size.

* Fix error if a dropped call is found after the first epoch

RecordableVectors are cleared after each epoch if they are the dynamic type. The size is not reset. We need subscript operator access for droppedCalls, so the type must be constant, which does not clear the vector after each epoch.

* Support for noise in 911 models

The current implementation for generating noise on a device has some assumptions that break for the 911 GPU model. To get noise support for 911, we implemented a way to have vertices specify how many noise elements they will need. A method was then implemented in GPUModel that rounds the input up to the nearest multiple of 100.
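The rounding step can be sketched with integer arithmetic alone. The function name `roundUpToMultiple` is hypothetical; only the multiple of 100 comes from the commit message above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round `count` up to the next multiple of `multiple`.
// Values that are already a multiple are returned unchanged.
// Note: the addition can overflow for counts close to the uint64_t maximum.
std::uint64_t roundUpToMultiple(std::uint64_t count, std::uint64_t multiple = 100) {
    assert(multiple > 0);
    return ((count + multiple - 1) / multiple) * multiple;
}
```

For example, a request for 37 noise elements would be padded to 100, while a request for exactly 200 stays at 200.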

* Add assert for random number thread count

* Add support for using noise to simulate attempted redials

Because only caller regions simulate attempted redials, we add a vector to map the caller region vertex IDs to the noise array on the device. This allows us to use the existing noise algorithm with larger graphs since we can only generate noise for up to 10000 vertices.
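One way to picture such a mapping is a dense index assigned only to caller-region vertices. This is a sketch under assumptions: the function name, the use of -1 as a "no noise slot" marker, and the direction of the lookup (vertex ID to noise index) are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: give each caller-region vertex its own slot in the
// noise array. Vertices that never redial get -1 and consume no slot.
std::vector<int> buildNoiseIndexMap(const std::vector<bool>& isCallerRegion,
                                    std::size_t maxNoiseElements) {
    std::vector<int> noiseIndex(isCallerRegion.size(), -1);
    int nextSlot = 0;
    for (std::size_t vertex = 0; vertex < isCallerRegion.size(); ++vertex) {
        if (isCallerRegion[vertex]) {
            noiseIndex[vertex] = nextSlot++;
        }
    }
    // Only the caller regions count against the noise generator's limit.
    assert(static_cast<std::size_t>(nextSlot) <= maxNoiseElements);
    return noiseIndex;
}
```

Because only caller regions take up slots, a graph can have far more vertices than the noise limit as long as the number of caller regions stays under it.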

* Fix isFull error message to show right buffer size

* Fix bug with waiting queue check

If the number of trunks and servers is equal and the queue is full, capacity minus busy servers is negative. Since dstQueueSize is of type uint64_t, it can't be negative, so the negative value is converted to a very large unsigned one. The comparison then gives a false positive that the queue is not full. The fix is to cast the size to an int so that a signed comparison is done.
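The pitfall can be reproduced in isolation. The sketch below uses hypothetical function names and a simplified form of the check, so the real expression in the code base may differ; it only shows how a negative right-hand side wraps when compared against a `uint64_t`.

```cpp
#include <cassert>
#include <cstdint>

// Buggy form: the int result is converted to uint64_t for the comparison.
// The cast is written out to make the otherwise implicit conversion visible.
bool isQueueFullUnsigned(std::uint64_t queueSize, int capacity, int busyServers) {
    // A negative difference wraps around to a huge unsigned value,
    // so the queue never looks full.
    return queueSize >= static_cast<std::uint64_t>(capacity - busyServers);
}

// Fixed form: cast the size to a signed type so the comparison is signed.
// Assumes queueSize is small enough to fit in int64_t.
bool isQueueFullSigned(std::uint64_t queueSize, int capacity, int busyServers) {
    return static_cast<std::int64_t>(queueSize) >=
           static_cast<std::int64_t>(capacity) - busyServers;
}
```

With a capacity of 4 and 5 busy servers, the unsigned version reports that a queue of size 3 is not full, while the signed version reports that it is.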

* Debugging statements for memory analysis

* Add some larger 911 graphs

* Updates to history to support less memory usage on GPU

The call metrics account for the vast majority of the physical memory used by the GPU. By resizing each to a smaller value, we can fit larger graphs on the GPU by using more epochs with smaller steps per epoch.

* Fix firing rate value

Firing rate should actually be equal to 1 since we can have at most 1 call per second.

* Fix issue using wrong buffer size

The buffer size used for a CircularBuffer is 1 more than the capacity passed into the constructor. When we construct the buffer, we pass in the number of trunks but were effectively using 1 less during the simulation.
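A ring buffer whose internal array is one element larger than its capacity is a common design: the spare slot lets "full" and "empty" be told apart using only two indices. The sketch below illustrates that design under assumed names (`RingSketch`, `put`, `get`); whether the project's CircularBuffer uses the extra slot for exactly this reason is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ring buffer that keeps one slot unused.
template <typename T>
class RingSketch {
public:
    // The internal array holds capacity + 1 elements.
    explicit RingSketch(std::size_t capacity) : buffer_(capacity + 1), front_(0), end_(0) {}

    std::size_t bufferSize() const { return buffer_.size(); }
    bool isEmpty() const { return front_ == end_; }
    bool isFull() const { return (end_ + 1) % buffer_.size() == front_; }

    // Queue length needs both indices: distance from front to end.
    std::size_t size() const {
        return (end_ + buffer_.size() - front_) % buffer_.size();
    }

    bool put(const T& value) {
        if (isFull()) return false;
        buffer_[end_] = value;
        end_ = (end_ + 1) % buffer_.size();
        return true;
    }

    bool get(T& out) {
        if (isEmpty()) return false;
        out = buffer_[front_];
        front_ = (front_ + 1) % buffer_.size();
        return true;
    }

private:
    std::vector<T> buffer_;
    std::size_t front_;
    std::size_t end_;
};
```

Any code that allocates, copies, or wraps indices for such a buffer has to use `bufferSize()` (capacity plus one) rather than the capacity itself, otherwise one usable slot is lost.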

* Fix getting front index when we want end index for queue length calculation

* Add back in random redial attempt

* More updates to reduce memory usage

Metrics that used totalNumberOfEvents and totalTimeSteps were using more memory than needed. These were changed to maxEventsPerEpoch and stepsPerEpoch respectively. Also changed copyTo and copyFrom in All911Edges to use heap memory to prevent stack overflows with large graphs.

* Fix bug with vertex queue size

The buffer inside the CircularBuffer implementation is 1 larger than the capacity set at construction. VertexQueues are CircularBuffers so we add 1 where we use the buffer size.

* Another CircularBuffer size bug fix

Fixed allocation, copyTo, and copyFrom for VertexQueues. They are CircularBuffers which internally have a buffer that is 1 more than the capacity. The sizes used were updated to be 1 more than the stepsPerEpoch to match the construction capacity.

* Fix firing rate and change epoch parameters to reduce memory

Memory is mostly dependent on epoch duration so we decrease that parameter and increase the number of epochs parameter by the same factor. This keeps the total time steps constant but reduces memory usage. We can only have 1 call per step so the max firing rate should be 1.
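The trade described above is simple arithmetic: divide the epoch length by some factor and multiply the epoch count by the same factor. A small sketch, with a hypothetical `EpochConfig` struct and helper names that are not taken from the project's configuration code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair of simulation parameters.
struct EpochConfig {
    std::uint64_t stepsPerEpoch;
    std::uint64_t numEpochs;
};

std::uint64_t totalSteps(const EpochConfig& config) {
    return config.stepsPerEpoch * config.numEpochs;
}

// Shorten each epoch by `factor` and run `factor` times as many epochs.
// Per-epoch buffers shrink while the total simulated time stays the same.
EpochConfig splitEpochs(const EpochConfig& config, std::uint64_t factor) {
    assert(factor > 0);
    assert(config.stepsPerEpoch % factor == 0 && "factor must divide the epoch length");
    return EpochConfig{config.stepsPerEpoch / factor, config.numEpochs * factor};
}
```

For instance, splitting 10 epochs of 3600 steps by a factor of 4 gives 40 epochs of 900 steps, with the same 36000 total steps.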

* Add an approximate statewide, month-long configuration

* General cleanup and adding of comments

* GPU Optimizations

Remove some branching and make changes to reduce amount of register usage.

* Dataset updates

* Timing adds, documentation, and updates

* Add regression testing documentation markdown

* Update after changing Abandoned and QueueLength history types

* Add small 911 test to regression script

* Add larger 911 test

This corresponds to Dataset A in Posey capstone report.

* Remove testing datasets

* Remove temp timing changes

* Correct how 2D arrays are copied from device to host

* Add noise state logging for debugging

* Noise is now generated and used for graphs with fewer than 100 vertices

* Fix formatting

* Try another clang fix

* Try to fix clang in function node file

* clang fix attempt

* more clang

* clang

* Clean up and port some optimizations to CPU

* clang formatting

* Rename GPU documentation file

* Remove trivial example and rewrite to clarify design of CPU implementation relative to GPU
Release 1.2

Release for spring 2026. Includes GPU implementation for 911 simulation, Copilot customizations, etc.
@stiber stiber closed this Mar 13, 2026
@stiber stiber reopened this Mar 13, 2026
@stiber stiber merged commit b96e96c into master Mar 13, 2026
3 of 4 checks passed
@stiber stiber deleted the release-v1.2 branch March 13, 2026 20:53

5 participants