
MPI Replay Configuration Files

Kevin A. Brown edited this page May 7, 2024 · 8 revisions

Workload ID and Allocation (Nodes-Rank Mapping) Configuration

The following two files specify the SWM (application) and synthetic jobs to run and the allocation of nodes to each job.

./model-net-mpi-replay --synch=1 --workload_type=online \
    --workload_conf_file=./conf/milc36+lammps32+ur4.load \
    --alloc_file=./conf/milc36+lammps32+ur4.conf \
    ...

Notes:

  • The entry on the first line is for the application with ID 0, the entry on the second line is for the application with ID 1, and so on.
  • All workload configuration files (workload ID file, allocation file, period file, etc.) use the same job ID ordering. That is, the first line in each of these files refers to the same job.
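Because every file is indexed by line number, a small script can pair up the entries for each job before a run. The sketch below is a hypothetical Python helper (`group_by_job_id` is not part of CODES) that treats line i of each file as job ID i and fails if the files disagree on the number of jobs. It assumes blank lines carry no meaning and skips them.

```python
from typing import Dict, List, Sequence, Tuple


def group_by_job_id(*files: Sequence[str]) -> Dict[int, Tuple[str, ...]]:
    """Pair line i of every file as the entries for job ID i (hypothetical helper)."""
    # Assumption: blank lines are not meaningful, so drop them before indexing.
    cleaned: List[List[str]] = [[ln.strip() for ln in f if ln.strip()] for f in files]
    if len({len(c) for c in cleaned}) > 1:
        raise ValueError(f"files disagree on job count: {[len(c) for c in cleaned]}")
    # enumerate() supplies the job ID; zip() groups the same line across files.
    return dict(enumerate(zip(*cleaned)))
```

For example, passing the lines of the `.load` and `.conf` files shown below returns `{0: (...milc..., ...), 1: (...lammps..., ...), 2: (...synthetic1..., ...)}`.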

The Workload ID

Runtime flag: --workload_conf_file=./conf/milc36+lammps32+ur4.load

Configuration file: Each line represents the configuration for a different job. Line format:

<number_of_nodes> <application_name> <default_qos_level> <injection_rate (only used for synthetic workload)>

[kevin@mbp.m2: ~/…/kronos/experiments/ ]$ cat milc36+lammps32+ur4.load
36 milc 0 0.0000
32 lammps 0 0.00
4 synthetic1 0 726.609003
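A minimal Python sketch for reading this file into one record per job, assuming whitespace-separated fields and that every line carries all four fields, as in the example above. `WorkloadEntry` and `parse_workload_conf` are illustrative names, not part of model-net-mpi-replay.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkloadEntry:
    num_nodes: int
    app_name: str
    qos_level: int
    injection_rate: float  # only meaningful for synthetic workloads


def parse_workload_conf(text: str) -> List[WorkloadEntry]:
    """Parse a workload ID file; list index == job ID."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines (assumption: they carry no meaning)
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
        nodes, name, qos, rate = fields
        entries.append(WorkloadEntry(int(nodes), name, int(qos), float(rate)))
    return entries
```

On the file above this yields three entries whose node counts sum to 72.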

The Allocation

Runtime flag: --alloc_file=./conf/milc36+lammps32+ur4.conf

Configuration file: Each line represents the configuration for a different job. Line format:

<node_id_for_rank0> [<node_id_for_rank1> [<node_id_for_rank2> [...]]]

[kevin@mbp.m2: ~/…/kronos/experiments/ ]$ cat milc36+lammps32+ur4.conf
3 8 70 61 40 30 34 20 32 51 39 29 15 46 47 68 65 43 37 63 38 0 26 62 50 19 44 41 35 71 5 1 12 66 17 4
21 64 2 22 42 60 16 48 55 24 33 52 45 54 67 11 25 56 14 23 57 58 18 31 49 9 53 7 28 69 10 27
59 6 36 13
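Each allocation line lists one node ID per rank, so its length should match `<number_of_nodes>` on the same line of the workload ID file. The hypothetical helper below checks that. It also assumes that no node is given to more than one rank; this holds in the example above, where the 72 entries cover nodes 0 to 71 exactly once, but this page does not state it as a rule.

```python
from typing import List


def check_allocation(node_counts: List[int], alloc_text: str) -> List[List[int]]:
    """Parse an allocation file and check it against per-job node counts.

    node_counts[i] is <number_of_nodes> from line i of the workload ID file.
    Hypothetical check: no node ID may appear twice across all jobs.
    """
    rows = [[int(tok) for tok in line.split()]
            for line in alloc_text.splitlines() if line.strip()]
    if len(rows) != len(node_counts):
        raise ValueError(f"{len(rows)} allocation lines for {len(node_counts)} jobs")
    seen = set()
    for job_id, (expected, nodes) in enumerate(zip(node_counts, rows)):
        if len(nodes) != expected:
            raise ValueError(f"job {job_id}: {len(nodes)} node IDs, expected {expected}")
        for node in nodes:  # position in the row is the rank number
            if node in seen:
                raise ValueError(f"job {job_id}: node {node} is allocated twice")
            seen.add(node)
    return rows
```

For the example files, `check_allocation([36, 32, 4], text)` returns three lists of 36, 32 and 4 node IDs; rank 0 of job 0 is on node 3.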

Synthetic Workload Injection Period File

For a given synthetic workload, the period file specifies when the workload should change its injection load and what the new injection load should be.

Usage

model-net-mpi-replay --synch=1 ... --workload_period_file=/path/to/file.period ...

The period file must contain one line for each line in the workload configuration file: the job listed on line X of the workload configuration file has its period entries on line X of the period file. Injection periods set for non-synthetic workloads are ignored.

File Format

<count_of_entries_on_line> [<entry0_timestamp>:<entry0_injection_load> [<entry1_timestamp>:<entry1_injection_load> [...]]]
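A sketch of a parser for one period-file line, assuming whitespace-separated entries with exactly one colon each. It checks the leading count against the number of entries that follow and parses timestamps as floats to be permissive; `parse_period_line` is a hypothetical name.

```python
from typing import List, Tuple


def parse_period_line(line: str) -> List[Tuple[float, float]]:
    """Parse '<count> [<ts>:<load> ...]' into (timestamp, load) pairs."""
    fields = line.split()
    if not fields:
        raise ValueError("empty period line")
    count, entries = int(fields[0]), fields[1:]
    if count != len(entries):
        raise ValueError(f"count says {count} entries, found {len(entries)}")
    pairs = []
    for entry in entries:
        ts, load = entry.split(":")  # raises ValueError if the colon is missing
        pairs.append((float(ts), float(load)))
    return pairs
```

A line of `0` parses to an empty list, meaning the job keeps its initial load.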

Example 1: Two synthetic jobs, one load change entry per job

# workload conf file - last value on each line indicates the initial injection load of the job
$ cat ./workloads/alloci02.tpr/rand8320.load
4160 synthetic1 1 47.6837
1760 synthetic1 3 119.2093

# period file
$ cat ./workloads/alloci02.tpr/rand8320.period
1 400000:28.0492
1 600000:39.7364
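One way to read Example 1 is as a step function: job 0 starts at 47.6837 (from the workload conf file) and switches to 28.0492 at timestamp 400000. The sketch below encodes that interpretation. Treating a change as taking effect exactly at its timestamp and persisting until the next entry is an assumption, not something this page states, and `injection_load_at` is a hypothetical helper.

```python
import bisect
from typing import List, Tuple


def injection_load_at(t: float, initial_load: float,
                      changes: List[Tuple[float, float]]) -> float:
    """Return the injection load in effect at simulation time t.

    Assumes each (timestamp, load) change applies from its timestamp onward
    and that `changes` is sorted by timestamp.
    """
    times = [ts for ts, _ in changes]
    idx = bisect.bisect_right(times, t)  # number of changes with ts <= t
    return initial_load if idx == 0 else changes[idx - 1][1]
```

Under this reading, job 1 in Example 1 runs at 119.2093 until timestamp 600000 and at 39.7364 afterwards.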

Example 2: One SWM job and one synthetic job, two load change entries for the synthetic job

# workload conf file - the injection load field (0.0000) is unused for the all_reduce256 SWM workload
$ cat ./workloads/alloct02.tpr/rand8320.load
256 all_reduce256 1 0.0000
832 synthetic1 0 47.6837

# period file - no load change can be applied to the SWM workload in the first line. Two changes are made for the synthetic workload
$ cat ./workloads/alloct02.tpr/rand8320_2.period
0
2 150000:119 250000:23.9
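When sweeping many load schedules it can help to generate period files rather than write them by hand. This hypothetical Python helper emits lines in the format described above, one per job in job-ID order. Loads are printed with up to 10 significant digits, so `119` and `23.9` come out exactly as in Example 2.

```python
from typing import Sequence, Tuple


def format_period_line(changes: Sequence[Tuple[int, float]]) -> str:
    """Build one period-file line: entry count, then <timestamp>:<load> pairs."""
    parts = [str(len(changes))]
    parts += [f"{ts}:{load:.10g}" for ts, load in changes]
    return " ".join(parts)


def format_period_file(per_job_changes: Sequence[Sequence[Tuple[int, float]]]) -> str:
    """One line per job; an empty schedule (e.g. for an SWM job) becomes '0'."""
    return "\n".join(format_period_line(c) for c in per_job_changes) + "\n"
```

For instance, `format_period_file([[], [(150000, 119), (250000, 23.9)]])` reproduces the period file of Example 2.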

Example 3: Multiple SWM and synthetic jobs

# workload conf file
$ cat ./workloads/alloci02.tpr/rand8320.load
32 allreduce32 0 0.0000
32 allreduce32 0 0.0000
256 allreduce256 0 0.0000
256 allreduce256 0 0.0000
64 synthetic1 0 794.7286
4160 synthetic1 1 47.6837
1760 synthetic1 2 29.8023
1760 synthetic1 3 119.2093

# period file
$ cat ./workloads/alloci02.tpr/rand8320.period
0
0
0
0
0
1 400000:28.0492
1 200000:119.2093
1 600000:39.7364
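Since injection periods on non-synthetic jobs are ignored, the schedules that matter in Example 3 belong to jobs 4 through 7. The sketch below pairs the two files by line and keeps only the synthetic jobs. It assumes, hypothetically, that synthetic jobs can be recognized by an application name starting with `synthetic`; that is true of every example on this page but is not a rule stated here. `synthetic_schedules` is an illustrative name.

```python
from typing import Dict, List, Tuple


def synthetic_schedules(load_text: str,
                        period_text: str) -> Dict[int, List[Tuple[float, float]]]:
    """Map job ID -> list of (timestamp, load) changes, synthetic jobs only."""
    load_lines = [ln.split() for ln in load_text.splitlines() if ln.strip()]
    period_lines = [ln.split() for ln in period_text.splitlines() if ln.strip()]
    if len(load_lines) != len(period_lines):
        raise ValueError("period file must have one line per workload")
    schedules = {}
    for job_id, (conf, period) in enumerate(zip(load_lines, period_lines)):
        if not conf[1].startswith("synthetic"):  # assumption: name-based test
            continue  # periods on non-synthetic (SWM) jobs are ignored
        changes = []
        for entry in period[1:]:  # period[0] is the entry count
            ts, load = entry.split(":")
            changes.append((float(ts), float(load)))
        schedules[job_id] = changes
    return schedules
```

For Example 3 this returns entries for jobs 4 to 7 only, with job 4 keeping its initial load of 794.7286 because its period line is `0`.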
