Single GPU benchmark scripts by ChSonnabend · Pull Request #15514 · AliceO2Group/AliceO2

ChSonnabend · 2026-06-12T07:28:27Z

This PR adds two scripts to benchmark single-GPU performance:

gen_single_gpu_rtc_benchmark.sh generates the workflow from dpl-workflow.sh by setting environment variables, and uses early stops to avoid processing failures.
analyze_gpu_benchmarks.py then analyzes the resulting log file for processing times, records them, histograms them, and fits a Gaussian to the result to determine the mean processing time per timeslice.
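
To illustrate the kind of number the analysis step produces, here is a minimal sketch that estimates a Gaussian mean and width from a list of processing times. It is not the PR's Python script and does not perform a histogram fit: it computes the sample mean and standard deviation, which are the maximum-likelihood Gaussian parameters. The helper name `gauss_estimate` and the input format (one processing time per line) are assumptions made for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads one processing time per line on stdin and prints
# "n mean sigma". Mean and sigma are sample estimates (1/n normalisation),
# i.e. the maximum-likelihood parameters of a Gaussian, not a histogram fit.
gauss_estimate() {
  awk '{ n++; s += $1; ss += $1 * $1 }
       END {
         if (n == 0) { print "0 nan nan"; exit }
         m = s / n
         v = ss / n - m * m
         if (v < 0) v = 0          # guard against rounding below zero
         printf "%d %.3f %.3f\n", n, m, sqrt(v)
       }'
}

# Example with made-up values (not measured data).
printf '%s\n' 2.0 4.0 4.0 4.0 5.0 5.0 7.0 9.0 | gauss_estimate
```

A fit to a histogram, as described in the PR, can differ from these sample estimates when the distribution has tails or outliers.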

davidrohr

I didn't check anything in detail, but these are the things that immediately came to mind:

davidrohr · 2026-06-12T07:51:22Z

+
+# ROCm library injection is only useful for HIP runs. Keep it off by default for CUDA/NVIDIA containers,
+# because mixed AMD/NVIDIA hosts can otherwise leak ROCm libraries into LD_LIBRARY_PATH.
+if [[ "${GPUTYPE:-}" == "HIP" && "0$BENCH_AUTO_ROCM_LIBS" == "01" ]]; then


With new bash you can just use $BENCH_AUTO_ROCM_LIBS == 1
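
A minimal sketch of the simplification suggested here. Inside `[[ ]]`, bash does not word-split expansions and copes with an empty operand, so the `0$VAR == 01` padding idiom is not needed. The `${...:-}` defaults are an assumption added so the check also works under `set -u`; the function name `rocm_injection_enabled` is hypothetical.

```shell
#!/usr/bin/env bash
set -u

# Hypothetical helper: succeeds only for HIP runs with the opt-in flag set to 1.
# An unset or empty BENCH_AUTO_ROCM_LIBS simply compares unequal to 1.
rocm_injection_enabled() {
  [[ "${GPUTYPE:-}" == "HIP" && "${BENCH_AUTO_ROCM_LIBS:-}" == 1 ]]
}

GPUTYPE=HIP
BENCH_AUTO_ROCM_LIBS=1
if rocm_injection_enabled; then
  echo "ROCm library injection enabled"
fi
```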

davidrohr · 2026-06-12T07:52:22Z

+
+export DPL_REPORT_PROCESSING="${DPL_REPORT_PROCESSING:-1}"
+
+export FST_TMUX_NO_EPN="${FST_TMUX_NO_EPN:-1}"


not needed, since start_tmux.sh is not used

davidrohr · 2026-06-12T07:52:39Z

+# ----------------------------------------------------------------------------------------------------------------------
+# Locate original workflow script. Keep the original untouched.
+
+: "${GEN_TOPO_MYDIR:=$(dirname "$(realpath "$0")")}"


Why don't you simply use $O2_ROOT/dpl-workflow.sh?

davidrohr · 2026-06-12T07:53:17Z

+export WORKFLOW_PARAMETERS="${WORKFLOW_PARAMETERS:-GPU,CTF}"
+export GPUTYPE="${GPUTYPE:-CUDA}"
+export NGPUS=1
+export NUMAGPUIDS=1


NUMAGPUIDS and NUMAID should not be set if NUMA pinning is not used.

davidrohr · 2026-06-12T07:54:47Z

+export EPNSYNCMODE="${EPNSYNCMODE:-0}"
+export SYNCMODE="${SYNCMODE:-1}"
+export SYNCRAWMODE="${SYNCRAWMODE:-0}"
+
+export TIMEFRAME_RATE_LIMIT="${TIMEFRAME_RATE_LIMIT:-5}"
+export GEN_TOPO_NO_TF_RATE_UPSCALING="${GEN_TOPO_NO_TF_RATE_UPSCALING:-1}"
+
+export DISABLE_ROOT_OUTPUT="${DISABLE_ROOT_OUTPUT:-1}"
+
+# Double pipeline requires zsraw input. Therefore default to raw TF input, not CTF.
+export CTFINPUT="${CTFINPUT:-0}"
+export RAWTFINPUT="${RAWTFINPUT:-1}"
+export DIGITINPUT="${DIGITINPUT:-0}"
+export EXTINPUT="${EXTINPUT:-0}"


Why do you redefine all the defaults that come from setenv.sh?
I would set only the settings you actually need.
That should be
SYNCMODE=1
TIMEFRAME_RATE_LIMIT=5
RAWTFINPUT=1
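
As a sketch of what this suggestion amounts to, the benchmark script would export only these three overrides and leave every other default to setenv.sh. Keeping the `${VAR:-default}` form, so that a caller can still override each value, is an assumption carried over from the style of the original diff.

```shell
# Only the settings the benchmark needs; everything else keeps the
# defaults that setenv.sh provides.
export SYNCMODE="${SYNCMODE:-1}"
export TIMEFRAME_RATE_LIMIT="${TIMEFRAME_RATE_LIMIT:-5}"
export RAWTFINPUT="${RAWTFINPUT:-1}"
```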

davidrohr · 2026-06-12T07:55:45Z

+  source "$PWD/local_env.sh"
+fi
+
+export ALICE_O2_FST="${ALICE_O2_FST:-1}"


This is a hack for running on MI100, I would not put it in this script

davidrohr · 2026-06-12T07:56:00Z

+
+export ALICE_O2_FST="${ALICE_O2_FST:-1}"
+
+if [[ -f "$GEN_TOPO_MYDIR/setenv.sh" ]]; then


dpl-workflow.sh will source setenv.sh, why do you source it here?

davidrohr · 2026-06-12T07:56:33Z

+# Let O2/core dumps land in the benchmark run directory, not in the original working directory.
+export CORE_DUMP_DIR="${CORE_DUMP_DIR:-$RUNDIR}"
+export O2_CORE_DUMP_DIR="${O2_CORE_DUMP_DIR:-$RUNDIR}"
+export FAIRMQ_SHM_MONITOR_CONFIG="${FAIRMQ_SHM_MONITOR_CONFIG:-}"


We do not run the SHM MONITOR, why do you need this?

alibuild · 2026-06-12T19:08:09Z

Error while checking build/O2/fullCI_slc9 for 6353bf6 at 2026-06-12 21:08:

## sw/BUILD/O2-full-system-test-latest/log
command /sw/slc9_x86-64/O2/slc9_x86-64-slc9_x86-64-local1/prodtests/full-system-test/dpl-workflow.sh had nonzero exit code 1

Full log here.

davidrohr · 2026-06-13T07:24:25Z

 (has_detector_reco ITS && ! has_detector_gpu ITS) && ! has_detector_from_global_reader ITS && add_W o2-its-reco-workflow "$ITS_CONFIG $ITS_STAGGERED $DISABLE_MC ${DISABLE_DIGIT_CLUSTER_INPUT:-} $DISABLE_ROOT_OUTPUT --pipeline $(get_N its-tracker ITS REST 1 ITSTRK),$(get_N its-clusterer ITS REST 1 ITSCL)" "$ITS_CONFIG_KEY;$ITSMFT_STROBES;$ITSEXTRAERR"
 [[ ${DISABLE_DIGIT_CLUSTER_INPUT:-} =~ "--digits-from-upstream" ]]  && has_detector_gpu ITS && ! has_detector_from_global_reader ITS && add_W o2-its-reco-workflow "--disable-tracking ${DISABLE_DIGIT_CLUSTER_INPUT:-} $ITS_STAGGERED $DISABLE_MC $DISABLE_ROOT_OUTPUT --pipeline $(get_N its-clusterer ITS REST 1 ITSCL)" "$ITS_CONFIG_KEY;$ITSMFT_STROBES;$ITSEXTRAERR"
-(has_detector_reco TPC || has_detector_ctf TPC) && ! has_detector_from_global_reader TPC && add_W o2-gpu-reco-workflow "--gpu-reconstruction \"$GPU_CONFIG_SELF\" --input-type=$GPU_INPUT $DISABLE_MC --output-type $GPU_OUTPUT $([[ $TPC_CORR_OPT == *--disable-ctp-lumi-request* ]] && echo --disable-ctp-lumi-request) $ITS_STAGGERED --pipeline gpu-reconstruction:${N_TPCTRK:-1},gpu-reconstruction-prepare:${N_TPCTRK:-1} $GPU_CONFIG" "GPU_global.deviceType=$GPUTYPE;GPU_proc.debugLevel=0;$GPU_CONFIG_KEY;$TRACKTUNETPCINNER;"
+(has_detector_reco TPC || has_detector_ctf TPC) && ! has_detector_from_global_reader TPC && add_W o2-gpu-reco-workflow "--gpu-reconstruction \"$GPU_CONFIG_SELF\" $MSLOG --input-type=$GPU_INPUT $DISABLE_MC --output-type $GPU_OUTPUT $([[ $TPC_CORR_OPT == *--disable-ctp-lumi-request* ]] && echo --disable-ctp-lumi-request) $ITS_STAGGERED --pipeline gpu-reconstruction:${N_TPCTRK:-1},gpu-reconstruction-prepare:${N_TPCTRK:-1} $GPU_CONFIG" "GPU_global.deviceType=$GPUTYPE;GPU_proc.debugLevel=0;$GPU_CONFIG_KEY;$TRACKTUNETPCINNER;"


Instead of modifying dpl-workflow.sh, you can just set

ARGS_EXTRA_PROCESS_o2_gpu_reco_workflow="--log-timestamp-us"

in your benchmark script.

davidrohr · 2026-06-13T07:25:40Z

+
+export DPL_REPORT_PROCESSING="${DPL_REPORT_PROCESSING:-1}"
+export WORKFLOW_PARAMETERS="${WORKFLOW_PARAMETERS:-GPU,CTF}"
+export GPUTYPE="${GPUTYPE:-CUDA}"


I would rather not default to CUDA here but require the user to set GPUTYPE, since the script is supposed to work equally for CUDA and HIP. That avoids user error when GPUTYPE is not provided.

davidrohr · 2026-06-13T07:28:03Z

+export O2_GPU_DOUBLE_PIPELINE="${O2_GPU_DOUBLE_PIPELINE:-1}"
+export O2_GPU_RTC="${O2_GPU_RTC:-1}"
+export SYNCMODE="${SYNCMODE:-1}"
+export DISABLE_ROOT_OUTPUT="${DISABLE_ROOT_OUTPUT:-1}"


DISABLE_ROOT_OUTPUT is already enabled by default.
So you can remove it here.
(And by the way, the correct value for this setting would be DISABLE_ROOT_OUTPUT="--disable-root-output".)

davidrohr · 2026-06-13T07:30:29Z

+
+export RUN_BENCHMARK="${RUN_BENCHMARK:-0}"
+
+echo "# Alien/JAliEn environment check:"


I still don't understand why we need this alien token magic. If alien-token-info finds the token before running this script, that should be all that is needed?

davidrohr

Looks mostly good now. I have only 2 additional comments.

I also want to run it as a test, compute the throughput manually, and compare that with what the script measures, as validation. Or have you already done that?

davidrohr · 2026-06-16T09:30:11Z

+# Benchmark defaults. All can be overridden by exporting variables before calling this script.
+
+case "${GPUTYPE:-}" in
+  CUDA|HIP)


Why don't you allow OpenCL or CPU, in case someone wants to measure those for comparison? You could just check whether GPUTYPE is set.

I will include OpenCL and CPU in the options, but want an early failure in case the user specifies an incorrect type (e.g. a typo)

OK, but then you have to check for the string "OCL" :)
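
A minimal sketch of the check this exchange converges on: accept CUDA, HIP, OCL and CPU, and fail early on anything else, including an unset value or a typo. The helper name `validate_gputype` and the error messages are hypothetical; the accepted strings are the ones named in the discussion.

```shell
#!/usr/bin/env bash

# Hypothetical helper: accept only the backend names discussed in the review.
# Note that the OpenCL backend is selected with the string "OCL".
validate_gputype() {
  case "${1:-}" in
    CUDA|HIP|OCL|CPU) return 0 ;;
    "") echo "ERROR: GPUTYPE is not set (expected CUDA, HIP, OCL or CPU)" >&2; return 1 ;;
    *)  echo "ERROR: unknown GPUTYPE '$1' (expected CUDA, HIP, OCL or CPU)" >&2; return 1 ;;
  esac
}

# Demonstration: the last value is rejected because the expected string is "OCL".
for t in CUDA OCL OpenCL; do
  if validate_gputype "$t" 2>/dev/null; then
    echo "$t: accepted"
  else
    echo "$t: rejected"
  fi
done
```

In the benchmark script itself the call would presumably be `validate_gputype "${GPUTYPE:-}" || exit 1` near the top, before any work is done.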

davidrohr · 2026-06-16T09:30:48Z

+trap cleanup_rundir EXIT
+
+# Let O2/core dumps land in the benchmark run directory, not in the original working directory.
+export CORE_DUMP_DIR="${CORE_DUMP_DIR:-$RUNDIR}"


Are you sure we need both variables?

Aren't they sensible in case we core dump to debug stuff?

I mean mostly: why do we need 2 variables? But then, searching O2 and O2DPG for CORE_DUMP_DIR, I do not find anything. So by whom is this variable interpreted?

alibuild · 2026-06-16T14:00:58Z

Error while checking build/O2/fullCI_slc9 for 4cb5d7c at 2026-06-16 16:00:

--namespaces      Record namespaces events
        --no-bpf-event    do not record bpf events
        --no-buffering    collect data without buffering
        --num-thread-synthesize <n>
                          number of threads to run for event synthesis
        --off-cpu         Enable off-cpu analysis
        --overwrite       use overwrite mode
        --per-thread      use per-thread mmaps
        --phys-data       Record the sample physical addresses
        --proc-map-timeout <n>
                          per thread proc mmap processing timeout in ms
        --running-time    Record running/enabled time of read (:S) events
        --sample-cpu      Record the sample cpu
        --sample-identifier
                          Record the sample identifier
        --setup-filter <pin|unpin>
                          BPF filter action
        --strict-freq     Fail if the specified frequency can't be used
        --switch-events   Record context switch events
        --switch-max-files <n>
                          Limit number of switch output generated files
        --switch-output[=<signal or size[BKMG] or time[smhd]>]
                          Switch output when receiving SIGUSR2 (signal) or cross a size or time threshold
        --switch-output-event <switch output event>
                          switch output event selector. use 'perf list' to list available events
        --synth <no|all|task|mmap|cgroup>
                          Fine-tune event synthesis: default=all
        --tail-synthesize
                          synthesize non-sample events at the end of output
        --threads[=<spec>]
                          write collected trace data into several data files using parallel threads
        --timestamp-boundary
                          Record timestamp boundary (time of first/last samples)
        --timestamp-filename
                          append timestamp to output filename
        --transaction     sample transaction flags (special events only)
        --user-callchains
                          collect user callchains
        --user-regs[=<any register>]
                          sample selected machine registers on interrupt, use '--user-regs=?' to list register names
        --vmlinux <file>  vmlinux pathname

++ FST_RC=129
++ mkdir -p /sw/BUILD/cce9efd9cc56df0ba2ec5b2d290afc550ed1b075/O2-full-system-test/../artifacts
++ find /sw/BUILD/cce9efd9cc56df0ba2ec5b2d290afc550ed1b075/O2-full-system-test/full-system-test-sim -name '*.log' -exec cp '{}' /sw/BUILD/cce9efd9cc56df0ba2ec5b2d290afc550ed1b075/O2-full-system-test/../artifacts/ ';'
++ [[ -n 1 ]]
++ [[ -f perf.data ]]
++ [[ 129 -ne 0 ]]
++ rm -Rf /sw/BUILD/cce9efd9cc56df0ba2ec5b2d290afc550ed1b075/O2-full-system-test/full-system-test-sim
++ exit 129

Full log here.

davidrohr · 2026-06-17T06:32:16Z

Btw, why is the script named "gen_single_gpu_rtc_benchmark.sh"? What does "gen" stand for?

davidrohr · 2026-06-17T06:37:18Z

And when running it, it is not immediately clear how to provide input data. It asks to set GPUTYPE, but with only GPUTYPE set, it just exits immediately again.
Perhaps, if nothing is set at all, you could print something like the comment at the start of the file:

# Main benchmark mode:
#   RUN_BENCHMARK=1 GPUTYPE=HIP FILEWORKDIR=/path/to/raw_tf_dir ./gen_single_gpu_rtc_benchmark.sh
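
One way to implement this suggestion, sketched under the assumption that GPUTYPE and FILEWORKDIR are the two inputs a benchmark run needs (taken from the example invocation above; the real script may require more). The function names `print_usage` and `check_required_inputs` are hypothetical.

```shell
#!/usr/bin/env bash

# Print the same hint that the header comment of the script gives.
print_usage() {
  cat <<'EOF'
# Main benchmark mode:
#   RUN_BENCHMARK=1 GPUTYPE=HIP FILEWORKDIR=/path/to/raw_tf_dir ./gen_single_gpu_rtc_benchmark.sh
EOF
}

# Hypothetical check: returns 1 and prints the usage text to stderr
# when one of the assumed required inputs is missing.
check_required_inputs() {
  if [[ -z "${GPUTYPE:-}" || -z "${FILEWORKDIR:-}" ]]; then
    echo "ERROR: GPUTYPE and FILEWORKDIR must both be set." >&2
    print_usage >&2
    return 1
  fi
}
```

Calling `check_required_inputs || exit 1` before generating the workflow would make the script explain itself instead of exiting silently.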

ChSonnabend · 2026-06-17T07:07:17Z

Btw, why is the script named "gen_single_gpu_rtc_benchmark.sh"? What does "gen" stand for?

Generate.

ChSonnabend added 4 commits June 8, 2026 14:59

Avoiding numactl execution to avoid crashes of FST in container env w…

dd89cae

…ithout accesibility to numactl

Adding GPU benchmark scripts and python analysis script

95f3190

Merge branch 'AliceO2Group:dev' into devel_fst_numactl

48b88e3

Resetting start_tumx.sh to upstream/dev

478d76b

ChSonnabend requested a review from a team as a code owner June 12, 2026 07:28

davidrohr requested changes Jun 12, 2026

ChSonnabend added 3 commits June 12, 2026 13:11

Updating scripts

f87246f

Update env variables

43e244b

Adding microsecond logging for dpl-workflow.sh

6353bf6

Avoiding unbound variable

d57d8ab

davidrohr reviewed Jun 13, 2026

ChSonnabend added 3 commits June 13, 2026 09:48

Adjusting for comments

18943ef

Merge branch 'AliceO2Group:dev' into devel_fst_numactl

fce6045

Remove external lib-loading to avoid glibc errors

1b73af9

davidrohr requested changes Jun 16, 2026

ChSonnabend added 3 commits June 16, 2026 13:13

Allowing for CPU and OpenCL as GPUTYPE

4cb5d7c

Updating defaults and adding task analysis script

3fe1703

Fixing timing for threaded processes

dabd904

Adding multithreading variable for CPU processes

013c66d

ChSonnabend added 6 commits June 17, 2026 09:16

Minor polishing

5d12c67

Updating verbosity and changing png to pdf

3d4469c

Small fix for first gap usage

19a95a9

Three graphs (processing + gap, processing, gap) with gaussian and sa…

22e3993

…mple metrics. Additionally adding RTC cache dir

Bug-fix

207d153

Handling crashes and exit codes

92ba4d4

Adding summary output

e13b9bf

