sst-bench

Getting Started

The sst-bench infrastructure contains a set of specifically crafted SST components, subcomponents, links and associated APIs designed to exercise specific aspects of the Structural Simulation Toolkit. The current set of benchmarks includes:

  • msg-perf : Tests the performance of sending incrementally sized messages over SimpleNetwork links in a ring-network pattern.
  • micro-comp : Tests the performance of loading a large number of small components using the various loading methodologies.
  • micro-comp-link : Similar to micro-comp but adds link configuration with a single link per component that utilizes SimpleNetwork. Tests the loading of large numbers of components with the default SST partitioner.
  • chkpnt : Tests the performance of the checkpoint/restart functionality present in SST-14.0.0+. Note that this test is currently only built when SST 14.0.0 is detected and can only be run in sequential execution mode. Threading and MPI are not currently supported for checkpoint/restart in SST 14.0.0.
  • restore : Tests the storing and loading of well-defined simulation component data using the SST 14.0.0 checkpoint/restart functionality.
  • restart : Sanity checks data after a context restore operation using the SST 14.0.0 checkpoint/restart functionality.
  • large-stat : Tests the creation and instantiation of a variable number of unsigned 64-bit statistics for very simple components. Designed to test large blocks of statistics values for simple components.
  • grid : Generates a configurable two-dimensional grid network with configurable component and data transfer parameters. Compile options are provided for testing different container data types for evaluating checkpointing performance.
  • noodle : Generates randomly connected components using a configurable number of ports per component and randomly sends a configurable number of message payloads per cycle.
  • spaghetti : Generates randomly connected components using a configurable number of ports per component and randomly sends messages to adjacent components. Similar to noodle, but utilizes event handlers only; none of the components are clocked.
  • hpe-phold : Port of the PHOLD benchmark from https://github.com/hpc-ai-adv-dev/sst-benchmarks based on Fujimoto's 1990 paper Performance of Time Warp Under Synthetic Workloads.

Prerequisites

Given that this is an SST external component, the primary prerequisite is a current installation of the SST Core. Some microbenchmarks use components from SST Elements, so it is recommended to install that as well. These test cases are labeled 'elements' so they can easily be excluded from testing. The sst-bench build infrastructure assumes that the sst-config tool is installed and can be found in the current PATH environment.

sst-bench relies upon CMake for building the component source. The minimum required CMake version is 3.19.

Building

Building the sst-bench infrastructure from source can be performed using the following steps:

git clone https://github.com/tactcomplabs/sst-bench.git
cd sst-bench
mkdir build
cd build
cmake ../
make && make install

Additional build options include:

  • make uninstall : Forcibly uninstalls the included components/subcomponents from the current version of SST.
  • cmake -DSSTBENCH_ENABLE_TESTING=ON ../ : Enables the included test harness; run it with make test.

Testing

Use the included test harness to confirm that all tests pass before opening new pull requests. The test harness can be enabled when you run the CMake configuration step as follows:

cmake -DSSTBENCH_ENABLE_TESTING=ON ../
make -j
make install
make test

If SST Elements is not installed, the dependent tests can be excluded using:

ctest -LE elements

A special set of long tests that may create extremely large files can be excluded using:

ctest -LE large

Currently, the checkpoint tests may generate a large number of files. To clean up after running tests, use:

cd ..
git clean -f -d

Special Runtime Notes

Benchmark Scale

Be mindful of the simulation input size when scaling tests near the limits of physical memory or compute capacity. Several benchmarks exhibit exponential memory growth.

Detailed Benchmark Descriptions

msg-perf

msg-perf is a component infrastructure designed to test the available link bandwidth relative to the hardware on which the simulation is executing. This component allows users to specify a base payload size (startSize), a maximum payload size (endSize), and a step size (stepSize) that is used to increment the payload size for each new phase of the simulation. For each phase of the simulation, the payload is sent to the target endpoint (specified by a SimpleNetwork infrastructure) a fixed number of times (iters). This ensures that the timing recorded per send is normalized across interconnect or cache warmup timing. Additionally, users can specify a delay between phases so that payloads from successive phases do not interact with one another and potentially contaminate the measurements.
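To make the phase structure concrete, the following sketch enumerates the payload sizes and the number of sends implied by a set of msg-perf parameters. It is plain Python rather than SST code: the helper name `msg_perf_phases` is hypothetical, and treating endSize as an inclusive bound is an assumption, not documented behavior of the component.

```python
def msg_perf_phases(start_size, end_size, step_size, iters):
    """Return a list of (payload_bytes, sends) tuples, one per phase.

    Assumption: end_size is inclusive. The real component may differ.
    """
    if step_size <= 0:
        raise ValueError("step_size must be positive")
    return [(size, iters) for size in range(start_size, end_size + 1, step_size)]


# Default values listed for msg-perf: 8B to 16B in 1B steps, 1 iteration per step.
phases = msg_perf_phases(start_size=8, end_size=16, step_size=1, iters=1)
total_bytes = sum(size * sends for size, sends in phases)
print(f"{len(phases)} phases, {total_bytes} payload bytes sent per endpoint")
```

A calculation like this can help estimate how much traffic a given startSize/endSize/stepSize/iters combination generates before launching a large run.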

Parameters

| Parameter | Description | Values | Default |
| --- | --- | --- | --- |
| verbose | Sets the verbosity level | Integer | 0 |
| clock | Clock frequency for the CPU | UnitAlgebra | 1GHz |
| startSize | Starting size of the payload (bytes) | UnitAlgebra | 8B |
| endSize | Ending size of the payload (bytes) | UnitAlgebra | 16B |
| stepSize | Step size of the payload (bytes) | UnitAlgebra | 1B |
| iters | Number of iterations per step | Integer | 1 |
| clockDelay | Clock ticks between sends | Integer | 100 |

Ports

| Port Name | Description | Library |
| --- | --- | --- |
| none | | |

Statistics

| Stat Name | Description | Values |
| --- | --- | --- |
| BitsSent | Number of bits sent | count |
| ByteSize | Byte size of the payload | size |
| SentClock | Sent clock cycle | cycle |
| RecvClock | Receive clock cycle | cycle |

Subcomponent Slots

| Slot Name | Description | Library |
| --- | --- | --- |
| nic | Network interface | SST::MsgPerf::MsgPerfNIC |

MsgPerfNIC Parameters

| Parameter | Description | Values | Default |
| --- | --- | --- | --- |
| clock | Clock frequency of the NIC | UnitAlgebra | 1GHz |
| port | Port to use if loaded as an anonymous subcomponent | String | network |
| verbose | Verbosity for output (0 = nothing) | Integer | 0 |

MsgPerfNIC Ports

| Port Name | Description | Library |
| --- | --- | --- |
| network | Port to network | simpleNetworkExample.nicEvent |

MsgPerfNIC Subcomponent Slots

| Slot Name | Description | Library |
| --- | --- | --- |
| iface | SimpleNetwork interface to a network | SST::Interfaces::SimpleNetwork |

micro-comp

micro-comp is designed to represent the smallest possible clocked component model. There are no subcomponents, ports or unnecessary variables required for serialization in this component. The goal of the micro-comp component is to provide a baseline to experiment with model loading performance and memory footprint under strictly controlled conditions. Note that micro-comp simulations execute for only a single clock cycle. The only events generated will be the single clock event per component.

Parameters

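| Parameter | Description | Values | Default |
| --- | --- | --- | --- |
| verbose | Sets the verbosity level | Integer | 0 |

Ports

| Port Name | Description | Library |
| --- | --- | --- |
| none | | |

Statistics

| Stat Name | Description | Values |
| --- | --- | --- |
| none | | |

Subcomponent Slots

| Slot Name | Description | Library |
| --- | --- | --- |
| none | | |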

micro-comp-link

micro-comp-link borrows the core component functionality from micro-comp. However, this version adds a subcomponent slot that contains a network interface controller (NIC) based upon the existing sst-elements SimpleNetwork. This allows us to construct arbitrarily complex network topologies with micro-comp-style endpoints. The goal with this component configuration is similar to micro-comp, but it does allow us to 1) experiment with sample topology configurations that are initialized during the init phase and 2) experiment with model loading/partitioning in a strictly controlled environment. The micro-comp-link simulation component only exists for a single clock cycle and is not currently checkpointable.

Parameters

| Parameter | Description | Values | Default |
| --- | --- | --- | --- |
| verbose | Sets the verbosity level | Integer | 0 |
| clock | Sets the clock frequency | UnitAlgebra | 1GHz |

Ports

| Port Name | Description | Library |
| --- | --- | --- |
| none | | |

Statistics

| Stat Name | Description | Values |
| --- | --- | --- |
| none | | |

Subcomponent Slots

| Slot Name | Description | Library |
| --- | --- | --- |
| nic | Network Interface | SST::MicroCompLink::MicroCompLinkNIC |

Subcomponent Parameters

| Parameter | Description | Values | Default |
| --- | --- | --- | --- |
| clock | Sets the NIC clock frequency | UnitAlgebra | 1GHz |
| port | Port to use if loaded anonymously | simpleNetworkExample.nicEvent | network |
| verbose | Sets the verbosity level | Integer | 0 |

Subcomponent Ports

| Port Name | Description | Library |
| --- | --- | --- |
| iface | SimpleNetwork interface to a network | SST::Interfaces::SimpleNetwork |

chkpnt

The chkpnt component is designed to provide a known baseline for testing checkpoint performance on SST 15.0+. The component contains a set of serialized data elements, port configurations and propagating events that exercise the main function of the base SST serialization and checkpoint features. The chkpnt component is initialized with a number of external facing ports, all of which need to be connected to adjacent components. These ports do not rely upon any existing sst-element components. The component executes for a fixed number of clock cycles and initiates event data sends to all connected ports using a user-defined clockDelay. On each sending cycle, the component chooses a random number of 64-bit values between minData and maxData and seeds these integers with random data. The entire payload is then sent across the link. Each payload for each port on each sending cycle will be different, thus exercising a large degree of randomness in serializing outstanding events. The component uses a known seed as input from the user, so the component can be executed with the same set of known values for reproducibility.
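The payload generation described above can be sketched in plain Python. This is an illustration only, not the component's implementation: `make_payloads` is a hypothetical helper, the minData/maxData bounds are assumed to be inclusive, and Python's `random.Random` stands in for the component's seeded RNG, so the generated values will not match what SST produces.

```python
import random


def make_payloads(num_ports, min_data, max_data, rng_seed, send_cycles):
    """Build one random payload per port per sending cycle.

    Assumptions: min_data and max_data are inclusive bounds, and
    random.Random stands in for the component's seeded RNG.
    """
    rng = random.Random(rng_seed)
    payloads = []
    for cycle in range(send_cycles):
        for port in range(num_ports):
            # Pick how many 64-bit values this payload carries, then fill them.
            count = rng.randint(min_data, max_data)
            words = [rng.getrandbits(64) for _ in range(count)]
            payloads.append((cycle, port, words))
    return payloads


first = make_payloads(num_ports=2, min_data=1, max_data=2, rng_seed=1223, send_cycles=3)
second = make_payloads(num_ports=2, min_data=1, max_data=2, rng_seed=1223, send_cycles=3)
print(first == second)  # the same seed reproduces the same payloads
```

Reusing the seed is what makes a run reproducible: two runs with the same rngSeed see the same sequence of payload sizes and contents.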

Parameters

| Parameter | Description | Values | Default |
| --- | --- | --- | --- |
| verbose | Sets the verbosity level | Integer | 0 |
| numPorts | Sets the number of external ports | Integer | 1 |
| minData | Minimum number of unsigned values | Integer | 1 |
| maxData | Maximum number of unsigned values | Integer | 2 |
| clockDelay | Clock delay between sends | Integer | 1 |
| clocks | Clock cycles to execute | Integer | 1000 |
| rngSeed | Mersenne RNG Seed | Integer | 1223 |
| clockFreq | Sets the clock frequency | UnitAlgebra | 1GHz |

Ports

| Port Name | Description | Library |
| --- | --- | --- |
| port%(num_ports)d | Ports which connect to endpoints | chkpnt.ChkpntEvent |

Statistics

| Stat Name | Description | Values |
| --- | --- | --- |
| none | | |

Subcomponent Slots

| Slot Name | Description | Library |
| --- | --- | --- |
| none | | |

restore

The restore component is designed to exercise the checkpoint + restart functionality in the SST core.
The component supports serialization of internal data structures in 4-byte increments as defined by the numBytes parameter. The user executes the simulation with a pre-defined number of total clock cycles (clocks) and incrementally checkpoints the component. The component can then be restarted and the internal values can be verified as being correct. The component utilizes a predefined random number seed such that execution is reproducible across simulations. The goal of this component is to test the restore performance using a static number of internal bytes stored in a checkpoint payload.

Parameters

| Parameter | Description | Values | Default |
| --- | --- | --- | --- |
| verbose | Sets the verbosity level | Integer | 0 |
| numBytes | Sets the number of stored bytes (4-byte increments) | UnitAlgebra | 64KB |
| clocks | Clock cycles to execute | Integer | 1000 |
| rngSeed | Mersenne RNG Seed | Integer | 1223 |
| clockFreq | Sets the clock frequency | UnitAlgebra | 1GHz |

Ports

| Port Name | Description | Library |
| --- | --- | --- |
| none | | |

Statistics

| Stat Name | Description | Values |
| --- | --- | --- |
| none | | |

Subcomponent Slots

| Slot Name | Description | Library |
| --- | --- | --- |
| none | | |

restart

The restart component is very similar to the restore component. The user specifies the number of bytes to store in an internal data structure that is seeded using a known random number seed (baseSeed). However, for each clock cycle, the restart component verifies that the internal data structure contains the correct values element by element. This ensures that the restored data is correct regardless of the checkpoint/restart timing.
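The element-by-element check can be illustrated with a short Python sketch. It is not the component's code: `build_state` and `verify_state` are hypothetical helpers, the state is modeled as a list of 32-bit words (matching the 4-byte increments of numBytes), and Python's `random.Random` is used in place of the component's RNG. The 4096-byte size is just an illustrative value.

```python
import random


def build_state(num_bytes, base_seed):
    """Fill num_bytes of state as 32-bit words drawn from a seeded RNG."""
    if num_bytes % 4 != 0:
        raise ValueError("num_bytes must be a multiple of 4")
    rng = random.Random(base_seed)
    return [rng.getrandbits(32) for _ in range(num_bytes // 4)]


def verify_state(state, base_seed):
    """Regenerate the expected words and compare element by element.

    Returns the index of the first mismatch, or None if all words match.
    """
    rng = random.Random(base_seed)
    for index, value in enumerate(state):
        if value != rng.getrandbits(32):
            return index
    return None


state = build_state(num_bytes=4096, base_seed=1223)
intact = verify_state(state, base_seed=1223)

state[10] ^= 1  # simulate a single corrupted word after a restore
corrupted_at = verify_state(state, base_seed=1223)
print(intact, corrupted_at)
```

Because the expected values are regenerated from the seed rather than stored, the check works no matter when the checkpoint was taken.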

Parameters

| Parameter | Description | Values | Default |
| --- | --- | --- | --- |
| verbose | Sets the verbosity level | Integer | 0 |
| numBytes | Sets the number of stored bytes (4-byte increments) | UnitAlgebra | 64KB |
| clocks | Clock cycles to execute | Integer | 1000 |
| baseSeed | Base Mersenne RNG Seed | Integer | 1223 |
| clockFreq | Sets the clock frequency | UnitAlgebra | 1GHz |

Ports

| Port Name | Description | Library |
| --- | --- | --- |
| none | | |

Statistics

| Stat Name | Description | Values |
| --- | --- | --- |
| none | | |

Subcomponent Slots

| Slot Name | Description | Library |
| --- | --- | --- |
| none | | |

large-stat

The large-stat component is designed to examine the memory overhead of creating very large sets of components. The component is designed to execute for a single clock cycle with no interchanging events.
Upon startup, the component creates a user-defined number of unsigned 64-bit statistics of the form STAT_n, where n is a monotonically increasing integer. Users should execute this component with SST verbosity and/or profiling enabled in order to trace the amount of virtual memory utilized.

Parameters

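| Parameter | Description | Values | Default |
| --- | --- | --- | --- |
| verbose | Sets the verbosity level | Integer | 0 |
| numStats | Sets the number of stats to create | Integer | 1 |

Ports

| Port Name | Description | Library |
| --- | --- | --- |
| none | | |

Statistics

| Stat Name | Description | Values |
| --- | --- | --- |
| STAT_ | Basic stat handler | count |

Subcomponent Slots

| Slot Name | Description | Library |
| --- | --- | --- |
| none | | |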

grid

The grid component is designed to facilitate the construction of basic mesh networks without dependencies on outside components or subcomponents. grid allows you to tune the number of bytes, the number of external ports (topology) and the delay timing of message injection across the links. grid was primarily utilized to test serialization of non-linear topologies but has since been updated to include embedded demonstration material for the interactive debugger infrastructure.

Parameters

| Parameter | Description | Values | Default |
| --- | --- | --- | --- |
| verbose | Sets the verbosity level | Integer | 0 |
| numBytes | Internal state size (4-byte increments) | Integer | 16384 |
| numPorts | Sets the number of external ports | Integer | 8 |
| minData | Minimum number of unsigned values over link | Integer | 10 |
| maxData | Maximum number of unsigned values over link | Integer | 8192 |
| minDelay | Minimum clock delay between sends | Integer | 50 |
| maxDelay | Maximum clock delay between sends | Integer | 100 |
| clocks | Sets the number of clocks to execute | Integer | 1000 |
| clockFreq | Sets the clock frequency | UnitAlgebra | 1GHz |
| rngSeed | Sets the RNG seed | Integer | 1223 |
| demoBug | Induce a bug for debug demo | Bool | 0 |

Ports

| Port Name | Description | Library |
| --- | --- | --- |
| ports%(num_ports)d | Ports which connect to endpoints. | chkpnt.GridNodeEvent |

Statistics

| Stat Name | Description | Values |
| --- | --- | --- |
| none | | |

Subcomponent Slots

| Slot Name | Description | Library |
| --- | --- | --- |
| none | | |

noodle

The noodle component is very different from the other components. This component is designed to find distinct slow code paths and load imbalance issues when executing simulations with various numbers of ranks, threads and combinations thereof. noodle is a relatively simple component, but can be scaled to very large simulations. It is designed to be executed in parallel (threads, ranks, etc.) with a large number of ports per component (numPorts). The ports can be randomly connected such that the default partitioner cannot easily assign components to threads or ranks. On each clock cycle of simulation, the component issues msgsPerClock event messages across portsPerClock ports with a payload size of bytesPerClock. The payloads are randomly generated using the rngSeed. In this way, noodle issues randomly generated messages across a large number of potential endpoints. The goal of noodle is to examine the performance of the SST sync manager as well as other latency-sensitive core operations. Additional asynchrony can be induced by using randClockRange to randomly assign the core clock frequencies of the host components.
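One way to produce the kind of random wiring described above is sketched below in plain Python. This is an assumption about how a test driver might connect ports, not the wiring scheme shipped with sst-bench: `random_links` is a hypothetical helper, the printed port names are illustrative, and the sketch does not filter out self-links or repeated neighbours.

```python
import random


def random_links(num_components, num_ports, rng_seed):
    """Pair every (component, port) endpoint with one other endpoint at random.

    Hypothetical helper: self-links and repeated neighbours are not filtered.
    """
    endpoints = [(c, p) for c in range(num_components) for p in range(num_ports)]
    if len(endpoints) % 2 != 0:
        raise ValueError("need an even number of endpoints to pair them all")
    random.Random(rng_seed).shuffle(endpoints)
    # Consecutive entries of the shuffled list form the two ends of a link.
    return list(zip(endpoints[0::2], endpoints[1::2]))


links = random_links(num_components=4, num_ports=2, rng_seed=31337)
for (comp_a, port_a), (comp_b, port_b) in links:
    print(f"c{comp_a}.port{port_a} <-> c{comp_b}.port{port_b}")
```

Shuffling with a fixed seed keeps the topology reproducible between runs while still giving the partitioner no regular structure to exploit.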

Parameters

| Parameter | Description | Values | Default |
| --- | --- | --- | --- |
| verbose | Sets the verbosity level | Integer | 0 |
| clockFreq | Sets the clock frequency | UnitAlgebra | 1GHz |
| numPorts | Sets the number of ports | Integer | 2 |
| msgsPerClock | Sets the number of messages sent per clock | Integer | 8 |
| bytesPerClock | Sets the number of bytes per clock | Integer | 8 |
| portsPerClock | Sets the number of ports to send over per clock | Integer | 1 |
| clocks | Sets the number of clocks to execute | Integer | 10000 |
| rngSeed | Sets the RNG seed | Integer | 31337 |
| randClockRange | Overrides clockFreq and sets randomly chosen frequency in the target range (in GHz) | String | 1-2 |

Ports

| Port Name | Description | Library |
| --- | --- | --- |
| ports%(num_ports)d | Ports which connect to endpoints. | noodle.NoodleEvent |

Statistics

| Stat Name | Description | Values |
| --- | --- | --- |
| none | | |

Subcomponent Slots

| Slot Name | Description | Library |
| --- | --- | --- |
| none | | |

spaghetti

The spaghetti component is similar in design to noodle in that it is constructed to find corner case behavior in scalable simulations. It is designed to be executed in parallel with a large number of ports configured per component. However, unlike noodle, spaghetti is entirely event driven. There are no clocked components. When the component is initialized, it builds a set of numMsgs for each of numPorts with the size bytesPerMsg. The component then issues the messages into the event queue using a randomized delay timing. In this way, the events are injected in order, but delivered to their endpoints randomly. The spaghetti component will further induce interesting behavior when scaled to large component counts and ports per component.
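The effect of randomized delays on delivery order can be shown with a small standalone Python sketch. It models a single event queue with `heapq` and is not the SST event queue or the component's code: `delivery_order` and its `max_delay` argument are hypothetical, and `random.Random` stands in for the component's RNG.

```python
import heapq
import random


def delivery_order(num_msgs, max_delay, rng_seed):
    """Inject messages 0..num_msgs-1 in order, each with a random delay,
    and return the message ids sorted by delivery time."""
    rng = random.Random(rng_seed)
    queue = []
    for msg_id in range(num_msgs):
        delay = rng.randint(1, max_delay)
        # msg_id breaks ties, so equal delivery times keep injection order.
        heapq.heappush(queue, (delay, msg_id))
    return [heapq.heappop(queue)[1] for _ in range(num_msgs)]


order = delivery_order(num_msgs=10, max_delay=100, rng_seed=31337)
print(order)
```

Every message is still delivered exactly once; only the order changes, which is what stresses event ordering when many components and ports are involved.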

Parameters

| Parameter | Description | Values | Default |
| --- | --- | --- | --- |
| verbose | Sets the verbosity level | Integer | 0 |
| numPorts | Sets the number of ports | Integer | 2 |
| numMsgs | Sets the number of messages to inject per port | Integer | 10 |
| bytesPerMsg | Sets the number of bytes per msg | Integer | 64 |
| rngSeed | Sets the RNG seed | Integer | 31337 |

Ports

| Port Name | Description | Library |
| --- | --- | --- |
| ports%(num_ports)d | Ports which connect to endpoints. | spaghetti.SpaghettiEvent |

Statistics

| Stat Name | Description | Values |
| --- | --- | --- |
| LATENCY_PORT_ | Histogram of latency values | latency |

Subcomponent Slots

| Slot Name | Description | Library |
| --- | --- | --- |
| none | | |

hpe-phold

hpe-phold is a port of the PHOLD benchmark from https://github.com/hpc-ai-adv-dev/sst-benchmarks based on Fujimoto's 1990 paper Performance of Time Warp Under Synthetic Workloads. The goal of hpe-phold is to explicitly induce particular simulation states that do not function within classic parallel discrete event simulation frameworks. We highly recommend reading the referenced publications for a better overview of the hpe-phold infrastructure.

Base Node Parameters

| Parameter | Description | Values | Default |
| --- | --- | --- | --- |
| numRings | Number of rings to connect to | Integer | 1 |
| i | My row index | Integer | -1 |
| j | My column index | Integer | -1 |
| rowCount | Total number of rows | Integer | -1 |
| colCount | Total number of columns | Integer | -1 |
| timeToRun | Time to run the simulation | UnitAlgebra | 10ns |
| eventDensity | Number of events to start with per component | Float | 0.1 |
| smallPayload | Size of small event payloads in bytes | Integer | 8 |
| largePayload | Size of large event payloads in bytes | Integer | 1024 |
| largeEventFraction | Fraction of events that are large (default: 0.1) | Float | 0.1 |
| verbose | Whether or not to write the recvCount to file. | Integer | 0 |
| componentSize | Additional size of components in bytes | Integer | 0 |

Base Node Ports

| Port Name | Description | Library |
| --- | --- | --- |
| ports%d | Ports which connect to endpoints. | |

Exponential Node Parameters

| Parameter | Description | Values | Default |
| --- | --- | --- | --- |
| multiplier | Multiplier for exponential distribution, in ns | Integer | 1 |

Uniform Node Parameters

| Parameter | Description | Values | Default |
| --- | --- | --- | --- |
| min | Minimum value for uniform distribution, in ns, in addition to link delay | Integer | 0 |
| max | Maximum value for uniform distribution, in ns, in addition to link delay | Integer | 0 |

Statistics

| Stat Name | Description | Values |
| --- | --- | --- |
| none | | |

Subcomponent Slots

| Slot Name | Description | Library |
| --- | --- | --- |
| none | | |

Parameter Sweep Automation

A structured methodology to define, manage, and analyze parameter sweep simulations is provided along with sample scripts. These support running simulations locally on a development system or through the Slurm batch management system. Example charts generated using this system are shown below. Refer to the documentation for more information.

[Figure: parameter sweep examples]

Contributing

We welcome outside contributions from corporate, academic and individual developers. However, there are a number of fundamental ground rules that you must adhere to in order to participate. These rules are outlined as follows:

  • By contributing to this code, one must agree to the licensing described in the top-level LICENSE file.
  • All code must adhere to the existing C++ coding style. While we are somewhat flexible in basic style, you will adhere to what is currently in place. This includes camel case C++ methods and inline comments. Uncommented, complicated algorithmic constructs will be rejected.
  • We support compilation and adherence to C++ standard methods. All new methods and variables contained within public, private and protected class methods must be commented using the existing Doxygen-style formatting. All new classes must also include Doxygen blocks in the new header files. Any pull requests that lack these features will be rejected.
  • All changes to functionality and the API infrastructure must be accompanied by complementary tests.
  • All external pull requests must target the devel branch. No external pull requests will be accepted to the master branch.
  • All external pull requests must contain sufficient documentation in the pull request comments in order to be accepted.

License

See the LICENSE file

Authors

  • John Leidel
  • Ken Griesser
  • Shannon Kuntz
