I am experiencing poor performance for small message sizes on Frontier. Here is my build configuration:
#############################################
# MPICH v5.0.0
#############################################
$ MPIR_CVAR_DEBUG_SUMMARY=1 srun -N1 -n2 osu_bw
==== GPU Init (HIP) ====
device_count: 8
=========================
Required minimum FI_VERSION: 0, current version: 10016
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: cxi, score = 6, pref = -100, FI_FORMAT_UNSPEC [8]
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp;ofi_rxm, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: udp;ofi_rxd, score = 0, pref = -2, FI_SOCKADDR_IN [16] <redacted>
provider: udp;ofi_rxd, score = 0, pref = -2, FI_SOCKADDR_IN [16] <redacted>
provider: udp;ofi_rxd, score = 0, pref = -2, FI_SOCKADDR_IN [16] <redacted>
provider: udp;ofi_rxd, score = 0, pref = -2, FI_SOCKADDR_IN [16] <redacted>
provider: udp;ofi_rxd, score = 0, pref = -2, FI_SOCKADDR_IN [16] <redacted>
provider: udp;ofi_rxd, score = 0, pref = -2, FI_SOCKADDR_IN [16] 127.0.0.1
provider: udp;ofi_rxd, score = 0, pref = -2, FI_SOCKADDR_IN6 [28] ::1
provider: shm, score = 0, pref = -2, FI_ADDR_STR [16] - fi_shm://242766
provider: shm, score = 5, pref = -2, FI_ADDR_STR [16] - fi_shm://242766
provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] <redacted>
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: sm2, score = 0, pref = 0, FI_ADDR_STR [15] - fi_sm2://242766
provider: sm2, score = 4, pref = 0, FI_ADDR_STR [15] - fi_sm2://242766
Required minimum FI_VERSION: 10016, current version: 10016
==== Capability set configuration ====
libfabric provider: sm2 - sm2
MPIDI_OFI_ENABLE_DATA: 1
MPIDI_OFI_ENABLE_AV_TABLE: 1
MPIDI_OFI_ENABLE_SCALABLE_ENDPOINTS: 0
MPIDI_OFI_ENABLE_SHARED_CONTEXTS: 0
MPIDI_OFI_ENABLE_MR_VIRT_ADDRESS: 0
MPIDI_OFI_ENABLE_MR_ALLOCATED: 0
MPIDI_OFI_ENABLE_MR_REGISTER_NULL: 1
MPIDI_OFI_ENABLE_MR_PROV_KEY: 0
MPIDI_OFI_ENABLE_TAGGED: 1
MPIDI_OFI_ENABLE_AM: 1
MPIDI_OFI_ENABLE_RMA: 0
MPIDI_OFI_ENABLE_ATOMICS: 0
MPIDI_OFI_FETCH_ATOMIC_IOVECS: 1
MPIDI_OFI_ENABLE_DATA_AUTO_PROGRESS: 0
MPIDI_OFI_ENABLE_CONTROL_AUTO_PROGRESS: 0
MPIDI_OFI_ENABLE_PT2PT_NOPACK: 1
MPIDI_OFI_ENABLE_TRIGGERED: 0
MPIDI_OFI_ENABLE_HMEM: 1
MPIDI_OFI_NUM_AM_BUFFERS: 8
MPIDI_OFI_NUM_OPTIMIZED_MEMORY_REGIONS: 0
MPIDI_OFI_CONTEXT_BITS: 20
MPIDI_OFI_SOURCE_BITS: 0
MPIDI_OFI_TAG_BITS: 31
MPIDI_OFI_VNI_USE_DOMAIN: 1
MAXIMUM SUPPORTED RANKS: 4294967296
MAXIMUM TAG: 2147483648
==== Provider global thresholds ====
max_buffered_send: 0
max_buffered_write: 0
max_msg_size: 9223372036854775807
max_order_raw: -1
max_order_war: -1
max_order_waw: -1
tx_iov_limit: 4
rx_iov_limit: 4
rma_iov_limit: 4
max_mr_key_size: 8
==== Various sizes and limits ====
MPIDI_OFI_AM_MSG_HEADER_SIZE: 24
MPIDI_OFI_MAX_AM_HDR_SIZE: 255
sizeof(MPIDI_OFI_am_request_header_t): 416
sizeof(MPIDI_OFI_per_vci_t): 52480
MPIDI_OFI_AM_HDR_POOL_CELL_SIZE: 1024
MPIDI_OFI_DEFAULT_SHORT_SEND_SIZE: 16384
======================================
==== Various sizes and limits ====
sizeof(MPIDI_per_vci_t): 128
==== collective selection ====
MPIR_CVAR_DEVICE_COLLECTIVES: percoll
MPIR: MPII_coll_generic_json
MPID: MPIDI_coll_generic_json
MPID/shm: MPIDI_POSIX_coll_generic_json
MPID (GPU): MPIDI_coll_generic_json
MPID/shm (GPU): MPIDI_POSIX_coll_generic_json
==== OFI dynamic settings ====
num_vcis: 1
num_nics: 1
======================================
MPICH 5.0.0 - 11719d364f - unreleased development copy
error checking : enabled
QMPI : disabled
debugger support : disabled
thread level : MPI_THREAD_SINGLE
thread CS : global
threadcomm : disabled
==== data structure summary ====
sizeof(MPIR_Comm): 1832
sizeof(MPIR_Request): 592
sizeof(MPIR_Datatype): 280
================================
# OSU MPI Bandwidth Test v7.5
# Datatype: MPI_CHAR.
# Size Bandwidth (MB/s)
1 0.65
2 1.31
4 2.62
8 5.25
16 10.51
32 20.94
64 41.56
128 82.41
256 164.45
512 322.51
1024 633.86
2048 1094.64
4096 2034.42
8192 3822.30
16384 6166.69
32768 9719.44
65536 22026.47
131072 30912.31
262144 33822.81
524288 32617.31
1048576 36209.71
2097152 37716.53
4194304 38421.87
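The debug summary shows the sm2 provider being selected for this single-node run. As a point of comparison, the provider choice can be pinned explicitly when re-running the benchmark; this is a sketch, assuming libfabric's standard FI_PROVIDER selection variable and MPICH's MPIR_CVAR_OFI_USE_PROVIDER CVAR are honored by this build:

```shell
# Pin the libfabric provider to cxi (Slingshot NIC) and re-run with the
# debug summary on, to compare against the sm2 path selected above.
# FI_PROVIDER is libfabric's generic provider filter; the CVAR name is
# an assumption -- check the CVAR list for your MPICH build.
FI_PROVIDER=cxi MPIR_CVAR_OFI_USE_PROVIDER=cxi \
  MPIR_CVAR_DEBUG_SUMMARY=1 srun -N1 -n2 osu_bw
```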
For comparison, I also ran the same benchmark with Cray MPICH; the MPICH run above has the debug summary turned on.
@raffenet, if you need more info, let me know!