[Aurora] GPU pipelining failure/performance degradation #7464

@rithwiktom

With mpich/opt/develop-git.6037a7a on Aurora, the following test crashes at large message sizes (~64 MB) when GPU pipelining is turned on. At smaller message sizes, there is a performance regression.
mpich/opt/4.2.3-intel does not seem to have this issue.

This is the performance up to 4 MB with the two versions:

| Message size (bytes) | GPU pipeline, develop-git.6037a7a (MB/s) | GPU pipeline, 4.2.3-intel (MB/s) |
|---|---|---|
| 1 | 1 | 1.61 |
| 2 | 1.59 | 3.23 |
| 4 | 3.19 | 6.45 |
| 8 | 6.38 | 12.96 |
| 16 | 12.78 | 25.89 |
| 32 | 25.57 | 51.75 |
| 64 | 50.78 | 100.14 |
| 128 | 36.33 | 123.29 |
| 256 | 94.03 | 121.6 |
| 512 | 98.78 | 134.72 |
| 1024 | 106.68 | 143.59 |
| 2048 | 111.51 | 732.25 |
| 4096 | 113.87 | 1472.69 |
| 8192 | 109.69 | 2957.42 |
| 16384 | 110.34 | 5861.81 |
| 32768 | 107.82 | 11590.16 |
| 65536 | 21586.53 | 21417.23 |
| 131072 | 28338.11 | 34983.17 |
| 262144 | 30891.85 | 43363.4 |
| 524288 | 32059.07 | 46077.84 |
| 1048576 | 28726.47 | 46744.3 |
| 2097152 | 29452.7 | 47071.55 |
| 4194304 | 29858.49 | 47247.48 |
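
To make the size of the regression easier to read off, the ratio between the two columns can be computed per message size. The following is only a sketch: `pipeline_ratio` is a hypothetical helper name, and the rows fed to it are copied from the table above.

```shell
#!/bin/bash
# pipeline_ratio (hypothetical helper): reads rows of
#   "size develop_MBps release_MBps"
# on stdin and prints "size ratio", where ratio = release / develop.
# A ratio above 1 means the develop build is slower at that size.
pipeline_ratio() {
    awk 'NF == 3 && $2 > 0 { printf "%s %.2f\n", $1, $3 / $2 }'
}

# A few rows copied from the table above.
pipeline_ratio <<'EOF'
64 50.78 100.14
2048 111.51 732.25
32768 107.82 11590.16
65536 21586.53 21417.23
4194304 29858.49 47247.48
EOF
```

Feeding the full table through the same function shows at which message sizes the gap widens and where it closes again.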

This is the test:

export FI_CXI_RDZV_THRESHOLD=131072
export EnableImplicitScaling=0
export NEOReadDebugKeys=1
export ZE_ENABLE_PCI_ID_DEVICE_ORDER=1
export MPIR_CVAR_GPU_USE_IMMEDIATE_COMMAND_LIST=1

# Enable GPU pipelining
export MPIR_CVAR_CH4_OFI_ENABLE_GPU_PIPELINE=1
export MPIR_CVAR_CH4_OFI_GPU_PIPELINE_THRESHOLD=0
export MPIR_CVAR_CH4_OFI_GPU_PIPELINE_NUM_BUFFERS_PER_CHUNK=4
export MPIR_CVAR_CH4_OFI_GPU_PIPELINE_MAX_NUM_BUFFERS=4
export MPIR_CVAR_CH4_OFI_GPU_PIPELINE_D2H_ENGINE_TYPE=1
export MPIR_CVAR_CH4_OFI_GPU_PIPELINE_H2D_ENGINE_TYPE=1

mpiexec -np 4 -ppn 2 --cpu-bind list:2:15 ~/gpu_wrappers/2-2.sh $PATH_TO_OSU/pt2pt/osu_mbw_mr -m 1:67108864 -i 100 -x 20 -d ze D D
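
Since the crash is only reported around 64 MB, it may help to narrow down the first failing message size by running one size at a time. The sketch below assumes that `-m MIN:MAX` with equal bounds restricts osu_mbw_mr to a single size; `first_failing_size` and `run_one_size` are hypothetical helper names, not part of MPICH or OSU.

```shell
#!/bin/bash
# first_failing_size MIN MAX CMD...
# Runs CMD once per power-of-two multiple of MIN up to MAX, passing the size
# as the last argument, and prints the first size for which CMD exits non-zero.
first_failing_size() {
    local size=$1 max=$2
    shift 2
    while [ "$size" -le "$max" ]; do
        if ! "$@" "$size" >/dev/null 2>&1; then
            echo "$size"
            return 0
        fi
        size=$((size * 2))
    done
    return 1   # no failure observed in the range
}

# Hypothetical wrapper: the command from this report, restricted to one size.
run_one_size() {
    mpiexec -np 4 -ppn 2 --cpu-bind list:2:15 ~/gpu_wrappers/2-2.sh \
        "$PATH_TO_OSU/pt2pt/osu_mbw_mr" -m "$1:$1" -i 100 -x 20 -d ze D D
}

# Usage (4 MB to 64 MB), with the environment variables above exported:
#   first_failing_size 4194304 67108864 run_one_size
```

Output of each run is discarded here to keep the scan readable; drop the redirection to see the actual crash message.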

The wrapper script used here (~/gpu_wrappers/2-2.sh) is:

#!/bin/bash

# Use the composite device hierarchy so tiles are addressed as <device>.<tile>
export ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE

# Pin each local rank to one GPU tile and one NIC
if [ "$PALS_LOCAL_RANKID" -eq 0 ]
then
    AFFINITY_MASK=0.0
    NIC_NUM=cxi0
elif [ "$PALS_LOCAL_RANKID" -eq 1 ]
then
    AFFINITY_MASK=1.0
    NIC_NUM=cxi1
fi

echo "[I am rank $PALS_RANKID] Localrank=$PALS_LOCAL_RANKID : Affinity mask = $AFFINITY_MASK, PREFERRED_NIC =  $NIC_NUM"

export ZE_AFFINITY_MASK=$AFFINITY_MASK
export FI_CXI_DEVICE_NAME=$NIC_NUM

# Invoke the main program, preserving argument boundaries
"$@"
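
The wrapper only defines a mapping for local ranks 0 and 1, which matches `-ppn 2`; any other local rank would run with an empty affinity mask and NIC name. A possible way to make that case fail loudly is sketched below. `map_local_rank` is a hypothetical helper, and the mapping values are the ones from the script above.

```shell
#!/bin/bash
# map_local_rank LOCAL_RANK (hypothetical helper)
# Prints "<affinity mask> <NIC>" for the two local ranks used in this report
# and fails for any other rank instead of leaving both values empty.
map_local_rank() {
    case "$1" in
        0) echo "0.0 cxi0" ;;
        1) echo "1.0 cxi1" ;;
        *) echo "no GPU/NIC mapping for local rank '$1'" >&2; return 1 ;;
    esac
}

# Example use inside the wrapper (PALS_LOCAL_RANKID is set by the launcher):
#   mapping=$(map_local_rank "$PALS_LOCAL_RANKID") || exit 1
#   read -r AFFINITY_MASK NIC_NUM <<< "$mapping"
```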
