Merged
6 changes: 3 additions & 3 deletions README.md
@@ -40,9 +40,9 @@ The model is trained from a random initialization until convergence, which is de
 1. `ml cuda/12.9.1 gcc/13.3.1 mvapich2/2.3.7`
 1. `export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH`
 - ROCm (elcap):
-1. `ml cce/21.0.0 cray-mpich/9.1.0 rocm/7.1.0 rccl/fast-env-slows-mpi`
+1. `ml cce/21.0.0 cray-mpich/9.1.0 rocm/7.1.1 rccl/fast-env-slows-mpi`
 - If using WCI wheel:
-1. `export LD_PRELOAD=/opt/rocm-7.1.0/llvm/lib/libomp.so` # for libomp.so
+1. `export LD_PRELOAD=/opt/rocm-7.1.1/llvm/lib/libomp.so` # for libomp.so

 1. Install the benchmark in the python venv:
 - CUDA: `pip install --no-binary=mpi4py .[cuda] --prefix=.venvs/scaffoldvenv --extra-index-url https://download.pytorch.org/whl/cu129 2>&1 | tee install.log`
@@ -226,7 +226,7 @@ make && make install
 git clone https://github.com/LLNL/Caliper.git
 cd Caliper
 mkdir pybuild && cd pybuild
-ml rocm/7.1.0
+ml rocm/7.1.1
 ml cuda/12.9.1
 cmake -DWITH_PYTHON_BINDINGS=ON \
 -DWITH_ROCPROFILER=ON \
4 changes: 2 additions & 2 deletions ScaFFold/utils/create_restart_script.py
@@ -98,7 +98,7 @@ def _get_env_setup() -> str:
 # --- Begin Environment Setup ---
 # Load Modules
 if command -v module &> /dev/null; then
-ml cce/21.0.0 cray-mpich/9.1.0 rocm/7.1.0 rccl/fast-env-slows-mpi
+ml cce/21.0.0 cray-mpich/9.1.0 rocm/7.1.1 rccl/fast-env-slows-mpi
 fi

 # Activate Virtual Environment
@@ -111,7 +111,7 @@ def _get_env_setup() -> str:

 # Environment variables
 export SPINDLE_FLUXOPT=off
-export LD_PRELOAD=/opt/rocm-7.1.0/llvm/lib/libomp.so
+export LD_PRELOAD=/opt/rocm-7.1.1/llvm/lib/libomp.so

 export PROFILE_TORCH=ON
 # --- End Environment Setup ---
2 changes: 1 addition & 1 deletion scripts/install-rccl.sh
@@ -6,7 +6,7 @@ if [ -d "aws-ofi-nccl.git" ]; then
 return 1 2>/dev/null || exit 1
 fi

-rocm_version=7.1.0
+rocm_version=7.1.1

 module swap PrgEnv-cray PrgEnv-gnu
 module load rocm/$rocm_version
2 changes: 1 addition & 1 deletion scripts/install-tuolumne-torchpypi.sh
@@ -1,4 +1,4 @@
 . install-rccl.sh
 ml load python/3.11.5 && python3 -m venv .venvs/scaffoldvenv-tuo-pypi && source .venvs/scaffoldvenv-tuo-pypi/bin/activate && pip install --upgrade pip
-ml cce/21.0.0 cray-mpich/9.1.0 rocm/7.1.0 rccl/fast-env-slows-mpi
+ml cce/21.0.0 cray-mpich/9.1.0 rocm/7.1.1 rccl/fast-env-slows-mpi
 pip install -e .[rocm] --prefix=.venvs/scaffoldvenv-tuo-pypi --extra-index-url https://download.pytorch.org/whl/rocm7.1 2>&1 | tee install.log
2 changes: 1 addition & 1 deletion scripts/install-tuolumne.sh
@@ -1,5 +1,5 @@
 ml load python/3.11.5 && python3 -m venv .venvs/scaffoldvenv-tuo && source .venvs/scaffoldvenv-tuo/bin/activate && pip install --upgrade pip
-ml cce/21.0.0 cray-mpich/9.1.0 rocm/7.1.0 rccl/fast-env-slows-mpi
+ml cce/21.0.0 cray-mpich/9.1.0 rocm/7.1.1 rccl/fast-env-slows-mpi
 pip install -e .[rocmwci] --prefix=.venvs/scaffoldvenv-tuo 2>&1 | tee install.log
 # Needed until new wheel exists for torch using mpich 9.1.0
 TORCH_LIB_DIR=".venvs/scaffoldvenv-tuo/lib/python3.11/site-packages/torch/lib"
2 changes: 1 addition & 1 deletion scripts/scaffold-tuolumne-torchpypi.job
@@ -7,7 +7,7 @@
 # flux: -qpdebug
 # flux: -B fractale

-ml cce/21.0.0 cray-mpich/9.1.0 rocm/7.1.0 rccl/fast-env-slows-mpi
+ml cce/21.0.0 cray-mpich/9.1.0 rocm/7.1.1 rccl/fast-env-slows-mpi

 . .venvs/scaffoldvenv-tuo-pypi/bin/activate

4 changes: 2 additions & 2 deletions scripts/scaffold-tuolumne.job
@@ -7,13 +7,13 @@
 # flux: -qpdebug
 # flux: -B fractale

-ml cce/21.0.0 cray-mpich/9.1.0 rocm/7.1.0 rccl/fast-env-slows-mpi
+ml cce/21.0.0 cray-mpich/9.1.0 rocm/7.1.1 rccl/fast-env-slows-mpi

 . .venvs/scaffoldvenv-tuo/bin/activate

 # (1) Avoid libmagma error
 # (2) Removing libmpi may cause segfault on mpi4py import
-export LD_PRELOAD="/opt/rocm-7.1.0/llvm/lib/libomp.so /opt/cray/pe/mpich/9.1.0/ofi/gnu/11.2/lib/libmpi_gnu.so.12"
+export LD_PRELOAD="/opt/rocm-7.1.1/llvm/lib/libomp.so /opt/cray/pe/mpich/9.1.0/ofi/gnu/11.2/lib/libmpi_gnu.so.12"

 torchrun-hpc -N 1 -n 1 $(which scaffold) generate_fractals -c $(pwd)/ScaFFold/configs/benchmark_default.yml
