[Aurora] GPU pipelining failure/performance degradation #7464

When using `mpich/opt/develop-git.6037a7a` on Aurora, I notice that the following test crashes for large message sizes (~64 MB) when GPU pipelining is turned on. For smaller message sizes, there is a performance regression.
`mpich/opt/4.2.3-intel` does not seem to have either issue.

This is the performance up to 4 MB with the two versions:

Message size (bytes) | GPU pipelining, develop-git.6037a7a (MB/s) | GPU pipelining, 4.2.3-intel (MB/s)
-- | -- | --
1 | 1 | 1.61
2 | 1.59 | 3.23
4 | 3.19 | 6.45
8 | 6.38 | 12.96
16 | 12.78 | 25.89
32 | 25.57 | 51.75
64 | 50.78 | 100.14
128 | 36.33 | 123.29
256 | 94.03 | 121.6
512 | 98.78 | 134.72
1024 | 106.68 | 143.59
2048 | 111.51 | 732.25
4096 | 113.87 | 1472.69
8192 | 109.69 | 2957.42
16384 | 110.34 | 5861.81
32768 | 107.82 | 11590.16
65536 | 21586.53 | 21417.23
131072 | 28338.11 | 34983.17
262144 | 30891.85 | 43363.4
524288 | 32059.07 | 46077.84
1048576 | 28726.47 | 46744.3
2097152 | 29452.7 | 47071.55
4194304 | 29858.49 | 47247.48
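
To make the gap between the two columns easier to read, the per-size ratio can be computed directly from the table. This is only a minimal sketch: it assumes the rows are re-typed as whitespace-separated `size develop 4.2.3-intel` triples, and `bw_ratio` is a hypothetical helper name, not part of MPICH or the OSU benchmarks.

```bash
#!/bin/bash
# Print "size ratio" per row, where ratio = 4.2.3-intel MB/s / develop MB/s.
# Input on stdin: whitespace-separated "size develop_MBps intel_MBps".
# bw_ratio is a hypothetical helper for illustration only.
bw_ratio() {
    awk 'NF == 3 && $2 > 0 { printf "%s %.1f\n", $1, $3 / $2 }'
}

# A few rows copied from the table above.
bw_ratio <<'EOF'
1024    106.68   143.59
2048    111.51   732.25
32768   107.82   11590.16
65536   21586.53 21417.23
4194304 29858.49 47247.48
EOF
```

For these rows the ratio goes from about 1.3 at 1024 bytes to about 107 at 32768 bytes, and is back to about 1.0 at 65536 bytes.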




This is the test:
```
export FI_CXI_RDZV_THRESHOLD=131072
export EnableImplicitScaling=0
export NEOReadDebugKeys=1
export ZE_ENABLE_PCI_ID_DEVICE_ORDER=1
export MPIR_CVAR_GPU_USE_IMMEDIATE_COMMAND_LIST=1

# Enable GPU pipelining
export MPIR_CVAR_CH4_OFI_ENABLE_GPU_PIPELINE=1
export MPIR_CVAR_CH4_OFI_GPU_PIPELINE_THRESHOLD=0
export MPIR_CVAR_CH4_OFI_GPU_PIPELINE_NUM_BUFFERS_PER_CHUNK=4
export MPIR_CVAR_CH4_OFI_GPU_PIPELINE_MAX_NUM_BUFFERS=4
export MPIR_CVAR_CH4_OFI_GPU_PIPELINE_D2H_ENGINE_TYPE=1
export MPIR_CVAR_CH4_OFI_GPU_PIPELINE_H2D_ENGINE_TYPE=1

mpiexec -np 4 -ppn 2  --cpu-bind list:2:15  ~/gpu_wrappers/2-2.sh  $PATH_TO_OSU/pt2pt/osu_mbw_mr -m 1:67108864 -i 100 -x 20 -d ze D D 

```
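
To check whether a given failure or slowdown is tied to GPU pipelining, the same command can be run with `MPIR_CVAR_CH4_OFI_ENABLE_GPU_PIPELINE` set to 0 and then 1. The following is a rough sketch rather than a tested harness: `ab_pipeline` and the `RUNNER` variable are hypothetical names, and the other exports from the test above are assumed to already be set in the calling shell.

```bash
#!/bin/bash
# Run one command twice: GPU pipelining disabled (0), then enabled (1).
# RUNNER (hypothetical) defaults to "echo", so by default this only prints
# the commands; set RUNNER to the empty string to actually execute them.
ab_pipeline() {
    local enable
    for enable in 0 1; do
        ${RUNNER-echo} env MPIR_CVAR_CH4_OFI_ENABLE_GPU_PIPELINE="$enable" "$@"
    done
}

# Dry run with the command line from the test above.
ab_pipeline mpiexec -np 4 -ppn 2 --cpu-bind list:2:15 ~/gpu_wrappers/2-2.sh \
    "$PATH_TO_OSU/pt2pt/osu_mbw_mr" -m 1:67108864 -i 100 -x 20 -d ze D D
```

Running it as `RUNNER= ./ab.sh` (assuming the sketch is saved as `ab.sh`) would launch both variants back to back, so the two bandwidth listings can be compared from a single job.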

The wrapper script used here is

```
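#!/bin/bash

export ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE

# Map each local rank to one GPU (device.subdevice) and its preferred NIC.
if [ "$PALS_LOCAL_RANKID" -eq 0 ]
then
    AFFINITY_MASK=0.0
    NIC_NUM=cxi0
elif [ "$PALS_LOCAL_RANKID" -eq 1 ]
then
    AFFINITY_MASK=1.0
    NIC_NUM=cxi1
else
    # Only two ranks per node are mapped; fail instead of exporting empty values.
    echo "Unsupported local rank: $PALS_LOCAL_RANKID" >&2
    exit 1
fi

echo "[I am rank $PALS_RANKID] Localrank=$PALS_LOCAL_RANKID : Affinity mask = $AFFINITY_MASK, PREFERRED_NIC = $NIC_NUM"

export ZE_AFFINITY_MASK=$AFFINITY_MASK
export FI_CXI_DEVICE_NAME=$NIC_NUM

# Invoke the main program; "$@" keeps each argument as a separate word.
"$@"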
```

