Poor shared memory performance with small message sizes on Frontier #7723

@AcerP-py

I am seeing poor intra-node bandwidth for small messages (roughly anything below 64 KiB) with my MPICH build on Frontier, compared with Cray MPICH. Here is my build configuration:

../mpich/configure \
    --prefix=/a/path \
    --with-device=ch4:ofi \
    --with-hip=${ROCM_PATH} \
    --with-libfabric=/opt/cray/libfabric/1.22.0 \
    --with-pm=none \
    --with-pmi=pmi2 \
    --with-pmi2=/opt/cray/pe/pmi/default \
    --enable-yield=sched_yield \
    --with-namepublisher=file \
    --enable-fortran \
    --disable-allowport \
    --disable-cxx \
    --enable-threads=runtime \
    --enable-thread-cs=global \
    --with-ch4-shmmods=xpmem \
    --with-hwloc

Here is a Cray MPICH run for comparison:

#############################################
# Cray MPICH
#############################################
$ srun -N1 -n2 osu_bw

# OSU MPI Bandwidth Test v7.5
# Datatype: MPI_CHAR.
# Size      Bandwidth (MB/s)
1                       3.87
2                       8.19
4                      16.65
8                      32.67
16                     66.86
32                    125.63
64                    245.75
128                   487.85
256                   536.78
512                   999.86
1024                 1203.84
2048                 2377.42
4096                 4632.42
8192                 7536.08
16384               10942.85
32768               16478.34
65536               20884.19
131072              22952.35
262144              23412.40
524288              24591.60
1048576             25474.22
2097152             26005.67
4194304             26272.59

And here is the same test with MPICH, with the debug summary turned on (MPIR_CVAR_DEBUG_SUMMARY=1):

#############################################
# MPICH v5.0.0
#############################################
$ MPIR_CVAR_DEBUG_SUMMARY=1 srun -N1 -n2 osu_bw
==== GPU Init (HIP) ====
device_count: 8
=========================
Required minimum FI_VERSION: 0, current version: 10016
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: udp;ofi_rxd, score = 0, pref = -2, FI_SOCKADDR_IN [16] <redacted>
provider: udp;ofi_rxd, score = 0, pref = -2, FI_SOCKADDR_IN [16] <redacted>
provider: udp;ofi_rxd, score = 0, pref = -2, FI_SOCKADDR_IN [16] 1<redacted>
provider: udp;ofi_rxd, score = 0, pref = -2, FI_SOCKADDR_IN [16] <redacted>
provider: udp;ofi_rxd, score = 0, pref = -2, FI_SOCKADDR_IN [16] <redacted>
provider: udp;ofi_rxd, score = 0, pref = -2, FI_SOCKADDR_IN [16] 127.0.0.1
provider: udp;ofi_rxd, score = 0, pref = -2, FI_SOCKADDR_IN6 [28] ::1
provider: shm, score = 0, pref = -2, FI_ADDR_STR [16] - fi_shm://242766
provider: shm, score = 5, pref = -2, FI_ADDR_STR [16] - fi_shm://242766
provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: sm2, score = 0, pref = 0, FI_ADDR_STR [15] - fi_sm2://242766
provider: sm2, score = 4, pref = 0, FI_ADDR_STR [15] - fi_sm2://242766
Required minimum FI_VERSION: 10016, current version: 10016
==== Capability set configuration ====
libfabric provider: sm2 - sm2
MPIDI_OFI_ENABLE_DATA: 1
MPIDI_OFI_ENABLE_AV_TABLE: 1
MPIDI_OFI_ENABLE_SCALABLE_ENDPOINTS: 0
MPIDI_OFI_ENABLE_SHARED_CONTEXTS: 0
MPIDI_OFI_ENABLE_MR_VIRT_ADDRESS: 0
MPIDI_OFI_ENABLE_MR_ALLOCATED: 0
MPIDI_OFI_ENABLE_MR_REGISTER_NULL: 1
MPIDI_OFI_ENABLE_MR_PROV_KEY: 0
MPIDI_OFI_ENABLE_TAGGED: 1
MPIDI_OFI_ENABLE_AM: 1
MPIDI_OFI_ENABLE_RMA: 0
MPIDI_OFI_ENABLE_ATOMICS: 0
MPIDI_OFI_FETCH_ATOMIC_IOVECS: 1
MPIDI_OFI_ENABLE_DATA_AUTO_PROGRESS: 0
MPIDI_OFI_ENABLE_CONTROL_AUTO_PROGRESS: 0
MPIDI_OFI_ENABLE_PT2PT_NOPACK: 1
MPIDI_OFI_ENABLE_TRIGGERED: 0
MPIDI_OFI_ENABLE_HMEM: 1
MPIDI_OFI_NUM_AM_BUFFERS: 8
MPIDI_OFI_NUM_OPTIMIZED_MEMORY_REGIONS: 0
MPIDI_OFI_CONTEXT_BITS: 20
MPIDI_OFI_SOURCE_BITS: 0
MPIDI_OFI_TAG_BITS: 31
MPIDI_OFI_VNI_USE_DOMAIN: 1
MAXIMUM SUPPORTED RANKS: 4294967296
MAXIMUM TAG: 2147483648
==== Provider global thresholds ====
max_buffered_send: 0
max_buffered_write: 0
max_msg_size: 9223372036854775807
max_order_raw: -1
max_order_war: -1
max_order_waw: -1
tx_iov_limit: 4
rx_iov_limit: 4
rma_iov_limit: 4
max_mr_key_size: 8
==== Various sizes and limits ====
MPIDI_OFI_AM_MSG_HEADER_SIZE: 24
MPIDI_OFI_MAX_AM_HDR_SIZE: 255
sizeof(MPIDI_OFI_am_request_header_t): 416
sizeof(MPIDI_OFI_per_vci_t): 52480
MPIDI_OFI_AM_HDR_POOL_CELL_SIZE: 1024
MPIDI_OFI_DEFAULT_SHORT_SEND_SIZE: 16384
======================================
==== Various sizes and limits ====
sizeof(MPIDI_per_vci_t): 128
==== collective selection ====
MPIR_CVAR_DEVICE_COLLECTIVES: percoll
MPIR: MPII_coll_generic_json
MPID: MPIDI_coll_generic_json
MPID/shm: MPIDI_POSIX_coll_generic_json
MPID (GPU): MPIDI_coll_generic_json
MPID/shm (GPU): MPIDI_POSIX_coll_generic_json
==== OFI dynamic settings ====
num_vcis: 1
num_nics: 1
======================================
MPICH 5.0.0 - 11719d364f - unreleased development copy
error checking    : enabled
QMPI              : disabled
debugger support  : disabled
thread level      : MPI_THREAD_SINGLE
thread CS         : global
threadcomm        : disabled
==== data structure summary ====
sizeof(MPIR_Comm): 1832
sizeof(MPIR_Request): 592
sizeof(MPIR_Datatype): 280
================================

# OSU MPI Bandwidth Test v7.5
# Datatype: MPI_CHAR.
# Size      Bandwidth (MB/s)
1                       0.65
2                       1.31
4                       2.62
8                       5.25
16                     10.51
32                     20.94
64                     41.56
128                    82.41
256                   164.45
512                   322.51
1024                  633.86
2048                 1094.64
4096                 2034.42
8192                 3822.30
16384                6166.69
32768                9719.44
65536               22026.47
131072              30912.31
262144              33822.81
524288              32617.31
1048576             36209.71
2097152             37716.53
4194304             38421.87
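
To put a rough number on the gap, the two tables above can be compared size by size. The script below is only a sketch: the bandwidth values are copied from the two runs in this report, and the names `parse`, `ratios`, and `crossover` are illustrative, not part of any tool.

```python
# Bandwidth in MB/s from the two osu_bw runs above (size in bytes, bandwidth).
CRAY = """
1 3.87
2 8.19
4 16.65
8 32.67
16 66.86
32 125.63
64 245.75
128 487.85
256 536.78
512 999.86
1024 1203.84
2048 2377.42
4096 4632.42
8192 7536.08
16384 10942.85
32768 16478.34
65536 20884.19
131072 22952.35
262144 23412.40
524288 24591.60
1048576 25474.22
2097152 26005.67
4194304 26272.59
"""

MPICH = """
1 0.65
2 1.31
4 2.62
8 5.25
16 10.51
32 20.94
64 41.56
128 82.41
256 164.45
512 322.51
1024 633.86
2048 1094.64
4096 2034.42
8192 3822.30
16384 6166.69
32768 9719.44
65536 22026.47
131072 30912.31
262144 33822.81
524288 32617.31
1048576 36209.71
2097152 37716.53
4194304 38421.87
"""


def parse(text):
    """Turn 'size bandwidth' lines into a {size: bandwidth} dict."""
    rows = (line.split() for line in text.strip().splitlines())
    return {int(size): float(bw) for size, bw in rows}


cray = parse(CRAY)
mpich = parse(MPICH)

# Ratio > 1 means this MPICH build is faster than Cray MPICH at that size.
ratios = {size: mpich[size] / cray[size] for size in cray}

# Smallest message size at which MPICH matches or beats Cray MPICH.
crossover = min(size for size, r in ratios.items() if r >= 1.0)

for size in sorted(ratios):
    print(f"{size:>8}  MPICH/Cray = {ratios[size]:.2f}")
print(f"crossover at {crossover} bytes")
```

For these two runs, MPICH is at about 0.17x of Cray MPICH at 1 byte, stays below it through 32768 bytes, and is ahead from 65536 bytes onward (about 1.46x at 4 MiB).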

@raffenet if you need more info let me know!
