From 4413887d03caf260626f1d14f994012e63f29369 Mon Sep 17 00:00:00 2001
From: leopekkas
Date: Tue, 5 May 2026 17:58:34 +0300
Subject: [PATCH 01/13] Add max CPU cores to Roihu GPU partitions

---
 .../computing/running/batch-job-partitions.md | 22 +++++++++----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/docs/computing/running/batch-job-partitions.md b/docs/computing/running/batch-job-partitions.md
index 5ab61b3206..4b27d68160 100644
--- a/docs/computing/running/batch-job-partitions.md
+++ b/docs/computing/running/batch-job-partitions.md
@@ -73,13 +73,13 @@ Roihu features the following partitions for submitting jobs to CPU nodes:

 Roihu features the following partitions for submitting jobs to GPU nodes:

-| Partition | Allocation type | Time limit | Min GPUs | Max GPUs | Max nodes | [Node types](../systems-roihu.md#nodes) | Memory per GPU | Requirements |
-|------------------|-----------------|------------|----------|----------|-----------|-----------------------------------------|------------------|--------------------|
-| `gputest` | G | 15 minutes | 1 | 8 | 2 | GPU | 116 GiB + 95 GiB | |
-| `gpuinteractive` | G | 12 hours | 1 | 1 | 1 | GPU ([slice](#roihu-gpu-slices)) | TBA | |
-| `gpumedium` | G | 36 hours | 1 | 4 | 1 | GPU | 116 GiB + 95 GiB | |
-| `gpularge` | G | 36 hours | 4 | 40 | 10 | GPU | 116 GiB + 95 GiB | [scalability test] |
-| `vizinteractive` | G | 12 hours | 1 | 1 | 1 | V | 183 GiB + 44 GiB | |
+| Partition | Allocation type | Time limit | Max CPU cores | Min GPUs | Max GPUs | Max nodes | [Node types](../systems-roihu.md#nodes) | Memory per GPU | Requirements |
+|------------------|-----------------|------------|---------------|----------|----------|-----------|-----------------------------------------|------------------|--------------------|
+| `gputest` | G | 15 minutes | 576 | 1 | 8 | 2 | GPU | 116 GiB + 95 GiB | |
+| `gpuinteractive` | G | 12 hours | 288 | 1 | 1 | 1 | GPU ([slice](#roihu-gpu-slices)) | TBA | |
+| `gpumedium` | G | 36 hours | 288 | 1 | 4 | 1 | GPU | 116 GiB + 95 GiB | |
+| `gpularge` | G | 36 hours | 2880 | 1 | 40 | 10 | GPU | 116 GiB + 95 GiB | [scalability test] |
+| `vizinteractive` | G | 12 hours | 64 | 1 | 1 | 1 | V | 183 GiB + 44 GiB | |

 #### Roihu GPU slices

@@ -93,10 +93,10 @@ and one-eighth of the GPU memory capacity (12 GiB) of a full GH200 superchip.
 In addition to the regular partitions, the following partitions are also available during the Roihu pilot phase:

-| Partition | Allocation type | Time limit | Min nodes | Max nodes | [Node types](../systems-roihu.md#nodes) |
-|------------|-----------------|------------|-----------|-----------|-----------------------------------------|
-| `pilot` | N | 24 hours | 1 | 200 | M |
-| `gpupilot` | N | 48 hours | 1 | 60 | GPU |
+| Partition | Allocation type | Time limit | Max CPU cores | Min GPUs | Max GPUs | Max nodes | [Node types](../systems-roihu.md#nodes) |
+|------------|-----------------|------------|---------------|----------|----------|-----------|-----------------------------------------|
+| `pilot` | N | 24 hours | 76800 | 0 | 0 | 200 | M |
+| `gpupilot` | G | 48 hours | 17280 | 1 | 240 | 60 | GPU |

 ### Local storage on Roihu nodes

From 50c4f462dfeebcf8af7986641dec165ea81d7e07 Mon Sep 17 00:00:00 2001
From: leopekkas
Date: Wed, 6 May 2026 14:46:20 +0300
Subject: [PATCH 02/13] Fix Roihu longrun partition max memory

---
 docs/computing/running/batch-job-partitions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/computing/running/batch-job-partitions.md b/docs/computing/running/batch-job-partitions.md
index 4b27d68160..c3e40d0823 100644
--- a/docs/computing/running/batch-job-partitions.md
+++ b/docs/computing/running/batch-job-partitions.md
@@ -61,7 +61,7 @@ Roihu features the following partitions for submitting jobs to CPU nodes:
 |-------------------|-----------------|------------|---------------|---------------|-----------|-----------------------------------------|------------------|--------------------|
 | `test` | R | 15 minutes | 1 | 768 | 2 | M | 744 GiB per job | |
 | `interactive` | R | 36 hours | 1 | 32 | 1 | M | 64 GiB per job | |
-| `longrun` | R | 10 days | 1 | 192 | 1 | M, L | 744 GiB per job | |
+| `longrun` | R | 10 days | 1 | 192 | 1 | M, L | 1500 GiB per job | |
 | `small` | R | 72 hours | 1 | 384 | 1 | M, L | 1500 GiB per job | |
 | `medium` | N | 36 hours | 384 | 2304 | 6 | M | 744 GiB per node | |
 | `large` | N | 36 hours | 2304 | 23040 | 60 | M | 744 GiB per node | [scalability test] |
From 6640e019a9d893b29c88c5be400547511231afd2 Mon Sep 17 00:00:00 2001
From: leopekkas
Date: Mon, 18 May 2026 10:44:35 +0300
Subject: [PATCH 03/13] Update GPU partition CPUs into a per GPU basis

---
 docs/computing/running/batch-job-partitions.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/computing/running/batch-job-partitions.md b/docs/computing/running/batch-job-partitions.md
index c3e40d0823..ead06580f1 100644
--- a/docs/computing/running/batch-job-partitions.md
+++ b/docs/computing/running/batch-job-partitions.md
@@ -73,13 +73,13 @@ Roihu features the following partitions for submitting jobs to CPU nodes:

 Roihu features the following partitions for submitting jobs to GPU nodes:

-| Partition | Allocation type | Time limit | Max CPU cores | Min GPUs | Max GPUs | Max nodes | [Node types](../systems-roihu.md#nodes) | Memory per GPU | Requirements |
-|------------------|-----------------|------------|---------------|----------|----------|-----------|-----------------------------------------|------------------|--------------------|
-| `gputest` | G | 15 minutes | 576 | 1 | 8 | 2 | GPU | 116 GiB + 95 GiB | |
-| `gpuinteractive` | G | 12 hours | 288 | 1 | 1 | 1 | GPU ([slice](#roihu-gpu-slices)) | TBA | |
-| `gpumedium` | G | 36 hours | 288 | 1 | 4 | 1 | GPU | 116 GiB + 95 GiB | |
-| `gpularge` | G | 36 hours | 2880 | 1 | 40 | 10 | GPU | 116 GiB + 95 GiB | [scalability test] |
-| `vizinteractive` | G | 12 hours | 64 | 1 | 1 | 1 | V | 183 GiB + 44 GiB | |
+| Partition | Allocation type | Time limit | Max CPU cores per GPU | Min GPUs | Max GPUs | Max nodes | [Node types](../systems-roihu.md#nodes) | Memory per GPU | Requirements |
+|------------------|-----------------|------------|-----------------------|----------|----------|-----------|-----------------------------------------|------------------|--------------------|
+| `gputest` | G | 15 minutes | 72 | 1 | 8 | 2 | GPU | 116 GiB + 95 GiB | |
+| `gpuinteractive` | G | 12 hours | 72 | 1 | 1 | 1 | GPU ([slice](#roihu-gpu-slices)) | TBA | |
+| `gpumedium` | G | 36 hours | 72 | 1 | 4 | 1 | GPU | 116 GiB + 95 GiB | |
+| `gpularge` | G | 36 hours | 72 | 1 | 40 | 10 | GPU | 116 GiB + 95 GiB | [scalability test] |
+| `vizinteractive` | G | 12 hours | 72 | 1 | 1 | 1 | V | 183 GiB + 44 GiB | |

 #### Roihu GPU slices

From ce5371d1d8f5c8f92756b7afd75afc4d9600bbe7 Mon Sep 17 00:00:00 2001
From: leopekkas
Date: Wed, 20 May 2026 16:17:08 +0300
Subject: [PATCH 04/13] Change batch job partition formatting

---
 .../computing/running/batch-job-partitions.md | 48 ++++++++++---------
 1 file changed, 26 insertions(+), 22 deletions(-)

diff --git a/docs/computing/running/batch-job-partitions.md b/docs/computing/running/batch-job-partitions.md
index ead06580f1..766177e92d 100644
--- a/docs/computing/running/batch-job-partitions.md
+++ b/docs/computing/running/batch-job-partitions.md
@@ -57,30 +57,34 @@ and resource requirements. These are explained in the table below.
 Roihu features the following partitions for submitting jobs to CPU nodes:

-| Partition | Allocation type | Time limit | Min CPU cores | Max CPU cores | Max nodes | [Node types](../systems-roihu.md#nodes) | Max memory | Requirements |
-|-------------------|-----------------|------------|---------------|---------------|-----------|-----------------------------------------|------------------|--------------------|
-| `test` | R | 15 minutes | 1 | 768 | 2 | M | 744 GiB per job | |
-| `interactive` | R | 36 hours | 1 | 32 | 1 | M | 64 GiB per job | |
-| `longrun` | R | 10 days | 1 | 192 | 1 | M, L | 1500 GiB per job | |
-| `small` | R | 72 hours | 1 | 384 | 1 | M, L | 1500 GiB per job | |
-| `medium` | N | 36 hours | 384 | 2304 | 6 | M | 744 GiB per node | |
-| `large` | N | 36 hours | 2304 | 23040 | 60 | M | 744 GiB per node | [scalability test] |
-| `hugemem` | C | 36 hours | 16 | 128 | 1 | XL | 6037 GiB per job | |
-| `hugemem_longrun` | C | 10 days | 16 | 128 | 1 | XL | 6037 GiB per job | |
-
+| Partition | Allocation type | Time limit | Min nodes | Max nodes | CPUs per node | [Node types](../systems-roihu.md#nodes) | Max memory per node | Requirements |
+|-------------------|-----------------|------------|-----------|-----------|---------------|-----------------------------------------|-----------------------|--------------------|
+| `test` | R | 15 minutes | 1 | 2 | 384 | M | 744 GiB | |
+| `interactive` | R | 36 hours | 1 | 1 | 32 | M | 64 GiB | |
+| `small` | R | 72 hours | 1 | 1 | 384 | M, L | 1500 GiB | |
+| `medium` | N | 36 hours | 1 | 6 | 384 | M | 744 GiB | |
+| `large` | N | 36 hours | 6 | 60 | 384 | M | 744 GiB | [scalability test] |
+| `longrun` | R | 10 days | 1 | 1 | 192 | M, L | 1500 GiB | |
+| `hugemem` | C | 36 hours | 1 | 1 | 128 | XL | 6037 GiB | |
+| `hugemem_longrun` | C | 10 days | 1 | 1 | 128 | XL | 6037 GiB | |

 ### Roihu GPU partitions

 Roihu features the following partitions for submitting jobs to GPU nodes:

-| Partition | Allocation type | Time limit | Max CPU cores per GPU | Min GPUs | Max GPUs | Max nodes | [Node types](../systems-roihu.md#nodes) | Memory per GPU | Requirements |
-|------------------|-----------------|------------|-----------------------|----------|----------|-----------|-----------------------------------------|------------------|--------------------|
-| `gputest` | G | 15 minutes | 72 | 1 | 8 | 2 | GPU | 116 GiB + 95 GiB | |
-| `gpuinteractive` | G | 12 hours | 72 | 1 | 1 | 1 | GPU ([slice](#roihu-gpu-slices)) | TBA | |
-| `gpumedium` | G | 36 hours | 72 | 1 | 4 | 1 | GPU | 116 GiB + 95 GiB | |
-| `gpularge` | G | 36 hours | 72 | 1 | 40 | 10 | GPU | 116 GiB + 95 GiB | [scalability test] |
-| `vizinteractive` | G | 12 hours | 72 | 1 | 1 | 1 | V | 183 GiB + 44 GiB | |
+| Partition | Allocation type | Time limit | Min nodes | Max nodes | GPUs per node | CPUs per node | [Node types](../systems-roihu.md#nodes) | Memory per GPU | Requirements |
+|------------------|-----------------|------------|-----------|-----------|---------------|---------------|-----------------------------------------|------------------|--------------------|
+| `gputest` | G | 15 minutes | 1 | 2 | 4 | 288 | GPU | 116 GiB + 95 GiB | |
+| `gpuinteractive` | G | 12 hours | 1 | 1 | 1 | 72 | GPU ([slice](#roihu-gpu-slices)) | TBA | |
+| `gpumedium` | G | 36 hours | 1 | 1 | 4 | 288 | GPU | 116 GiB + 95 GiB | |
+| `gpularge` | G | 36 hours | 1 | 10 | 4 | 288 | GPU | 116 GiB + 95 GiB | [scalability test] |
+
+### Roihu visualization partition
+
+Additionally, Roihu features the following partition for interactive use and for visualizing data with specialized hardware:
+| Partition | Allocation type | Time limit | Min nodes | Max nodes | GPUs per node | CPUs per node | [Node types](../systems-roihu.md#nodes) | Max CPU memory | Memory per GPU |
+| `vizinteractive` | G | 12 hours | 1 | 1 | 2 | 64 | V | 367 GiB per node | 44 GiB |

 #### Roihu GPU slices

@@ -93,10 +97,10 @@ and one-eighth of the GPU memory capacity (12 GiB) of a full GH200 superchip.
 In addition to the regular partitions, the following partitions are also available during the Roihu pilot phase:

-| Partition | Allocation type | Time limit | Max CPU cores | Min GPUs | Max GPUs | Max nodes | [Node types](../systems-roihu.md#nodes) |
-|------------|-----------------|------------|---------------|----------|----------|-----------|-----------------------------------------|
-| `pilot` | N | 24 hours | 76800 | 0 | 0 | 200 | M |
-| `gpupilot` | G | 48 hours | 17280 | 1 | 240 | 60 | GPU |
+| Partition | Allocation type | Time limit | Min nodes | Max nodes | CPUs per node | GPUs per node | [Node types](../systems-roihu.md#nodes) |
+|------------|-----------------|------------|-----------|-----------|---------------|---------------|-----------------------------------------|
+| `pilot` | N | 24 hours | 1 | 200 | 384 | 0 | M |
+| `gpupilot` | G | 48 hours | 1 | 60 | 288 | 4 | GPU |

 ### Local storage on Roihu nodes

From 8a72704a02ca9ed0aced496b1681355e5119fb30 Mon Sep 17 00:00:00 2001
From: leopekkas
Date: Wed, 20 May 2026 16:38:25 +0300
Subject: [PATCH 05/13] Clarify max memory and max cpus in partitions

---
 .../computing/running/batch-job-partitions.md | 27 ++++++++++---------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/docs/computing/running/batch-job-partitions.md b/docs/computing/running/batch-job-partitions.md
index 766177e92d..b4dfadae23 100644
--- a/docs/computing/running/batch-job-partitions.md
+++ b/docs/computing/running/batch-job-partitions.md
@@ -57,33 +57,34 @@ and resource requirements. These are explained in the table below.
 Roihu features the following partitions for submitting jobs to CPU nodes:

-| Partition | Allocation type | Time limit | Min nodes | Max nodes | CPUs per node | [Node types](../systems-roihu.md#nodes) | Max memory per node | Requirements |
+| Partition | Allocation type | Time limit | Min nodes | Max nodes | Max CPUs | [Node types](../systems-roihu.md#nodes) | Max memory | Requirements |
 |-------------------|-----------------|------------|-----------|-----------|---------------|-----------------------------------------|-----------------------|--------------------|
-| `test` | R | 15 minutes | 1 | 2 | 384 | M | 744 GiB | |
-| `interactive` | R | 36 hours | 1 | 1 | 32 | M | 64 GiB | |
-| `small` | R | 72 hours | 1 | 1 | 384 | M, L | 1500 GiB | |
-| `medium` | N | 36 hours | 1 | 6 | 384 | M | 744 GiB | |
-| `large` | N | 36 hours | 6 | 60 | 384 | M | 744 GiB | [scalability test] |
-| `longrun` | R | 10 days | 1 | 1 | 192 | M, L | 1500 GiB | |
-| `hugemem` | C | 36 hours | 1 | 1 | 128 | XL | 6037 GiB | |
-| `hugemem_longrun` | C | 10 days | 1 | 1 | 128 | XL | 6037 GiB | |
+| `test` | R | 15 minutes | 1 | 2 | 384 per node | M | 744 GiB per node | |
+| `interactive` | R | 36 hours | 1 | 1 | 32 per job | M | 64 GiB per job | |
+| `small` | R | 72 hours | 1 | 1 | 384 per node | M, L | 1500 GiB per node | |
+| `medium` | N | 36 hours | 1 | 6 | 384 per node | M | 744 GiB per node | |
+| `large` | N | 36 hours | 6 | 60 | 384 per node | M | 744 GiB per node | [scalability test] |
+| `longrun` | R | 10 days | 1 | 1 | 192 per job | M, L | 1500 GiB per job | |
+| `hugemem` | C | 36 hours | 1 | 1 | 128 per node | XL | 6037 GiB per node | |
+| `hugemem_longrun` | C | 10 days | 1 | 1 | 128 per node | XL | 6037 GiB per node | |

 ### Roihu GPU partitions

 Roihu features the following partitions for submitting jobs to GPU nodes:

-| Partition | Allocation type | Time limit | Min nodes | Max nodes | GPUs per node | CPUs per node | [Node types](../systems-roihu.md#nodes) | Memory per GPU | Requirements |
+| Partition | Allocation type | Time limit | Min nodes | Max nodes | GPUs per node | CPUs per GPU | [Node types](../systems-roihu.md#nodes) | Memory per GPU | Requirements |
 |------------------|-----------------|------------|-----------|-----------|---------------|---------------|-----------------------------------------|------------------|--------------------|
-| `gputest` | G | 15 minutes | 1 | 2 | 4 | 288 | GPU | 116 GiB + 95 GiB | |
+| `gputest` | G | 15 minutes | 1 | 2 | 4 | 72 | GPU | 116 GiB + 95 GiB | |
 | `gpuinteractive` | G | 12 hours | 1 | 1 | 1 | 72 | GPU ([slice](#roihu-gpu-slices)) | TBA | |
-| `gpumedium` | G | 36 hours | 1 | 1 | 4 | 288 | GPU | 116 GiB + 95 GiB | |
-| `gpularge` | G | 36 hours | 1 | 10 | 4 | 288 | GPU | 116 GiB + 95 GiB | [scalability test] |
+| `gpumedium` | G | 36 hours | 1 | 1 | 4 | 72 | GPU | 116 GiB + 95 GiB | |
+| `gpularge` | G | 36 hours | 1 | 10 | 4 | 72 | GPU | 116 GiB + 95 GiB | [scalability test] |

 ### Roihu visualization partition

 Additionally, Roihu features the following partition for interactive use and for visualizing data with specialized hardware:

 | Partition | Allocation type | Time limit | Min nodes | Max nodes | GPUs per node | CPUs per node | [Node types](../systems-roihu.md#nodes) | Max CPU memory | Memory per GPU |
+|------------------|-----------------|------------|-----------|-----------|---------------|---------------|-----------------------------------------|------------------|------------------|
 | `vizinteractive` | G | 12 hours | 1 | 1 | 2 | 64 | V | 367 GiB per node | 44 GiB |

 #### Roihu GPU slices

From 3c09651dd528eec187ff7829578fe1a680a4e910 Mon Sep 17 00:00:00 2001
From: leopekkas
Date: Wed, 20 May 2026 16:48:38 +0300
Subject: [PATCH 06/13] Use per-job categorization on single node partitions

---
 docs/computing/running/batch-job-partitions.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/computing/running/batch-job-partitions.md b/docs/computing/running/batch-job-partitions.md
index b4dfadae23..032d9090c7 100644
--- a/docs/computing/running/batch-job-partitions.md
+++ b/docs/computing/running/batch-job-partitions.md
@@ -61,12 +61,12 @@ Roihu features the following partitions for submitting jobs to CPU nodes:
 |-------------------|-----------------|------------|-----------|-----------|---------------|-----------------------------------------|-----------------------|--------------------|
 | `test` | R | 15 minutes | 1 | 2 | 384 per node | M | 744 GiB per node | |
 | `interactive` | R | 36 hours | 1 | 1 | 32 per job | M | 64 GiB per job | |
-| `small` | R | 72 hours | 1 | 1 | 384 per node | M, L | 1500 GiB per node | |
+| `small` | R | 72 hours | 1 | 1 | 384 per job | M, L | 1500 GiB per job | |
 | `medium` | N | 36 hours | 1 | 6 | 384 per node | M | 744 GiB per node | |
 | `large` | N | 36 hours | 6 | 60 | 384 per node | M | 744 GiB per node | [scalability test] |
 | `longrun` | R | 10 days | 1 | 1 | 192 per job | M, L | 1500 GiB per job | |
-| `hugemem` | C | 36 hours | 1 | 1 | 128 per node | XL | 6037 GiB per node | |
-| `hugemem_longrun` | C | 10 days | 1 | 1 | 128 per node | XL | 6037 GiB per node | |
+| `hugemem` | C | 36 hours | 1 | 1 | 128 per job | XL | 6037 GiB per job | |
+| `hugemem_longrun` | C | 10 days | 1 | 1 | 128 per job | XL | 6037 GiB per job | |

 ### Roihu GPU partitions

@@ -83,9 +83,9 @@ Roihu features the following partitions for submitting jobs to GPU nodes:

 Additionally, Roihu features the following partition for interactive use and for visualizing data with specialized hardware:

-| Partition | Allocation type | Time limit | Min nodes | Max nodes | GPUs per node | CPUs per node | [Node types](../systems-roihu.md#nodes) | Max CPU memory | Memory per GPU |
+| Partition | Allocation type | Time limit | Min nodes | Max nodes | GPUs per job | CPUs per job | [Node types](../systems-roihu.md#nodes) | Max CPU memory | Memory per GPU |
 |------------------|-----------------|------------|-----------|-----------|---------------|---------------|-----------------------------------------|------------------|------------------|
-| `vizinteractive` | G | 12 hours | 1 | 1 | 2 | 64 | V | 367 GiB per node | 44 GiB |
+| `vizinteractive` | G | 12 hours | 1 | 1 | 2 | 64 | V | 367 GiB per job | 44 GiB |

 #### Roihu GPU slices

From f735c2e01bf736d7ba3e9828fd400601831f3234 Mon Sep 17 00:00:00 2001
From: leopekkas
Date: Wed, 20 May 2026 18:57:51 +0300
Subject: [PATCH 07/13] Simplify slurm partition table

---
 .../computing/running/batch-job-partitions.md | 40 ++++++++++---------
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/docs/computing/running/batch-job-partitions.md b/docs/computing/running/batch-job-partitions.md
index 032d9090c7..080bc80336 100644
--- a/docs/computing/running/batch-job-partitions.md
+++ b/docs/computing/running/batch-job-partitions.md
@@ -57,35 +57,37 @@ and resource requirements. These are explained in the table below.
 Roihu features the following partitions for submitting jobs to CPU nodes:

-| Partition | Allocation type | Time limit | Min nodes | Max nodes | Max CPUs | [Node types](../systems-roihu.md#nodes) | Max memory | Requirements |
-|-------------------|-----------------|------------|-----------|-----------|---------------|-----------------------------------------|-----------------------|--------------------|
-| `test` | R | 15 minutes | 1 | 2 | 384 per node | M | 744 GiB per node | |
-| `interactive` | R | 36 hours | 1 | 1 | 32 per job | M | 64 GiB per job | |
-| `small` | R | 72 hours | 1 | 1 | 384 per job | M, L | 1500 GiB per job | |
-| `medium` | N | 36 hours | 1 | 6 | 384 per node | M | 744 GiB per node | |
-| `large` | N | 36 hours | 6 | 60 | 384 per node | M | 744 GiB per node | [scalability test] |
-| `longrun` | R | 10 days | 1 | 1 | 192 per job | M, L | 1500 GiB per job | |
-| `hugemem` | C | 36 hours | 1 | 1 | 128 per job | XL | 6037 GiB per job | |
-| `hugemem_longrun` | C | 10 days | 1 | 1 | 128 per job | XL | 6037 GiB per job | |
+| Partition | Allocation type | Time limit | Nodes | Max CPUs | [Node types](../systems-roihu.md#nodes) | Max memory | Requirements |
+|-------------------|-----------------|------------|--------|---------------|-----------------------------------------|-----------------------|--------------------|
+| `test` | R | 15 minutes | 1 - 2 | 384 per node | M | 744 GiB per node | |
+| `interactive` | R | 36 hours | 1 | 32 per job | M | 64 GiB per job | |
+| `small` | R | 72 hours | 1 | 384 per job | M, L | 1500 GiB per job | |
+| `medium` | N | 36 hours | 1 - 6 | 384 per node | M | 744 GiB per node | |
+| `large` | N | 36 hours | 6 - 60 | 384 per node | M | 744 GiB per node | [scalability test] |
+| `longrun` | R | 10 days | 1 | 192 per job | M, L | 1500 GiB per job | |
+| `hugemem` | C | 36 hours | 1 | 128 per job | XL | 6037 GiB per job | |
+| `hugemem_longrun` | C | 10 days | 1 | 128 per job | XL | 6037 GiB per job | |

 ### Roihu GPU partitions

 Roihu features the following partitions for submitting jobs to GPU nodes:

-| Partition | Allocation type | Time limit | Min nodes | Max nodes | GPUs per node | CPUs per GPU | [Node types](../systems-roihu.md#nodes) | Memory per GPU | Requirements |
-|------------------|-----------------|------------|-----------|-----------|---------------|---------------|-----------------------------------------|------------------|--------------------|
-| `gputest` | G | 15 minutes | 1 | 2 | 4 | 72 | GPU | 116 GiB + 95 GiB | |
-| `gpuinteractive` | G | 12 hours | 1 | 1 | 1 | 72 | GPU ([slice](#roihu-gpu-slices)) | TBA | |
-| `gpumedium` | G | 36 hours | 1 | 1 | 4 | 72 | GPU | 116 GiB + 95 GiB | |
-| `gpularge` | G | 36 hours | 1 | 10 | 4 | 72 | GPU | 116 GiB + 95 GiB | [scalability test] |
+| Partition | Allocation type | Time limit | Nodes | Max GPUs | [Node types](../systems-roihu.md#nodes) | Requirements |
+|------------------|-----------------|------------|--------|---------------|-----------------------------------------|--------------------|
+| `gputest` | G | 15 minutes | 1 - 2 | 4 per node | GPU | |
+| `gpuinteractive` | G | 12 hours | 1 | 1 per job | GPU ([slice](#roihu-gpu-slices)) | |
+| `gpumedium` | G | 36 hours | 1 | 4 per job | GPU | |
+| `gpularge` | G | 36 hours | 1 - 10 | 4 per node | GPU | [scalability test] |
+
+Each full GPU node in the partition has 4 GH200 GPUs. Each reserved GPU grants access to up to 72 CPU cores, and 116 GiB of HBM3 memory and 95 GiB of LPDDR5 memory.
 ### Roihu visualization partition

 Additionally, Roihu features the following partition for interactive use and for visualizing data with specialized hardware:

-| Partition | Allocation type | Time limit | Min nodes | Max nodes | GPUs per job | CPUs per job | [Node types](../systems-roihu.md#nodes) | Max CPU memory | Memory per GPU |
-|------------------|-----------------|------------|-----------|-----------|---------------|---------------|-----------------------------------------|------------------|------------------|
-| `vizinteractive` | G | 12 hours | 1 | 1 | 2 | 64 | V | 367 GiB per job | 44 GiB |
+| Partition | Allocation type | Time limit | Nodes | Max GPUs | Max CPUs | [Node types](../systems-roihu.md#nodes) | Max CPU memory | Memory per GPU |
+|------------------|-----------------|------------|-------|-----------|---------------|-----------------------------------------|------------------|------------------|
+| `vizinteractive` | G | 12 hours | 1 | 2 per job | 64 per job | V | 367 GiB per job | 44 GiB |

 #### Roihu GPU slices

From 4fb1095d81b4ab5e515b66753663c15b9be43614 Mon Sep 17 00:00:00 2001
From: leopekkas
Date: Wed, 20 May 2026 19:10:45 +0300
Subject: [PATCH 08/13] Update slurm partition table

---
 .../computing/running/batch-job-partitions.md | 38 ++++++++++++-------
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/docs/computing/running/batch-job-partitions.md b/docs/computing/running/batch-job-partitions.md
index 080bc80336..2bde0ac14a 100644
--- a/docs/computing/running/batch-job-partitions.md
+++ b/docs/computing/running/batch-job-partitions.md
@@ -79,15 +79,8 @@ Roihu features the following partitions for submitting jobs to GPU nodes:
 | `gpumedium` | G | 36 hours | 1 | 4 per job | GPU | |
 | `gpularge` | G | 36 hours | 1 - 10 | 4 per node | GPU | [scalability test] |

-Each full GPU node in the partition has 4 GH200 GPUs. Each reserved GPU grants access to up to 72 CPU cores, and 116 GiB of HBM3 memory and 95 GiB of LPDDR5 memory.
-
-### Roihu visualization partition
-
-Additionally, Roihu features the following partition for interactive use and for visualizing data with specialized hardware:
-
-| Partition | Allocation type | Time limit | Nodes | Max GPUs | Max CPUs | [Node types](../systems-roihu.md#nodes) | Max CPU memory | Memory per GPU |
-|------------------|-----------------|------------|-------|-----------|---------------|-----------------------------------------|------------------|------------------|
-| `vizinteractive` | G | 12 hours | 1 | 2 per job | 64 per job | V | 367 GiB per job | 44 GiB |
+Each full GPU node has 4 GH200 GPUs. On full GPU nodes, each reserved GPU grants access to up to 72 CPU cores,
+116 GiB of HBM3 memory and 95 GiB of LPDDR5 memory.

 #### Roihu GPU slices

@@ -95,15 +88,23 @@ Roihu `gpuinteractive` partition features GH200 superchips that are divided into
 a total of 48 smaller slices that have one-seventh of the compute capacity
 and one-eighth of the GPU memory capacity (12 GiB) of a full GH200 superchip.
+### Roihu visualization partition
+
+Additionally, Roihu features the following partition for interactive use and for visualizing data with specialized hardware:
+
+| Partition | Allocation type | Time limit | Nodes | Max GPUs | Max CPUs | [Node types](../systems-roihu.md#nodes) | Max memory | Memory per GPU |
+|------------------|-----------------|------------|-------|-----------|---------------|-----------------------------------------|-----------------|------------------|
+| `vizinteractive` | G | 12 hours | 1 | 2 per job | 64 per job | V | 367 GiB per job | 44 GiB |
+
 ### Roihu pilot partitions

 In addition to the regular partitions, the following partitions are also available during the Roihu pilot phase:

-| Partition | Allocation type | Time limit | Min nodes | Max nodes | CPUs per node | GPUs per node | [Node types](../systems-roihu.md#nodes) |
-|------------|-----------------|------------|-----------|-----------|---------------|---------------|-----------------------------------------|
-| `pilot` | N | 24 hours | 1 | 200 | 384 | 0 | M |
-| `gpupilot` | G | 48 hours | 1 | 60 | 288 | 4 | GPU |
+| Partition | Allocation type | Time limit | Nodes | Max CPUs | Max GPUs | [Node types](../systems-roihu.md#nodes) |
+|------------|-----------------|------------|---------|---------------|---------------|-----------------------------------------|
+| `pilot` | N | 24 hours | 1 - 200 | 384 per node | 0 | M |
+| `gpupilot` | G | 48 hours | 1 - 60 | 288 per node | 4 per node | GPU |

 ### Local storage on Roihu nodes

@@ -112,6 +113,17 @@ Local storage on Roihu M, L and GPU nodes is meant for storing temporary files o
 High-performance local storage is available on Roihu XL and V nodes. Ideal for I/O-intensive jobs.
+The available local storage that a single user can access in their jobs depends
+on the system [partition](#roihu-partitions) they use:
+
+| Allocation type | Quota per user |
+|:-------------------|---------------:|
+| R (shared nodes) | 20 GiB |
+| N (full nodes) | 600 GiB |
+| G (GPU nodes) | 150 GiB |
+| Hugemem (XL) nodes | 1.6 TiB |
+| VIZ nodes | 6.5 TiB |
+
 Read more about: [Local storage on Roihu nodes](../disk.md#temporary-local-disk-areas)

 ## Puhti partitions
From 73ab1604c05325715dff039d0b4a9ca63831aea8 Mon Sep 17 00:00:00 2001
From: leopekkas
Date: Wed, 20 May 2026 19:20:39 +0300
Subject: [PATCH 09/13] Simplify vizinteractive partition table

---
 docs/computing/running/batch-job-partitions.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/docs/computing/running/batch-job-partitions.md b/docs/computing/running/batch-job-partitions.md
index 2bde0ac14a..2f59d79e2c 100644
--- a/docs/computing/running/batch-job-partitions.md
+++ b/docs/computing/running/batch-job-partitions.md
@@ -92,9 +92,11 @@ and one-eighth of the GPU memory capacity (12 GiB) of a full GH200 superchip.
 Additionally, Roihu features the following partition for interactive use and for visualizing data with specialized hardware:

-| Partition | Allocation type | Time limit | Nodes | Max GPUs | Max CPUs | [Node types](../systems-roihu.md#nodes) | Max memory | Memory per GPU |
-|------------------|-----------------|------------|-------|-----------|---------------|-----------------------------------------|-----------------|------------------|
-| `vizinteractive` | G | 12 hours | 1 | 2 per job | 64 per job | V | 367 GiB per job | 44 GiB |
+| Partition | Allocation type | Time limit | Nodes | Max GPUs | [Node types](../systems-roihu.md#nodes) |
+|------------------|-----------------|------------|-------|-----------|-----------------------------------------|
+| `vizinteractive` | G | 12 hours | 1 | 2 per job | V |
+
+Each node in the partition has 2 Nvidia L40 GPUs with 44 GiB of memory. Each reserved GPU grants access to up to 32 CPU cores and 183 GiB of CPU memory.

 ### Roihu pilot partitions

From 25e5561dc974b549a6d7d8b168db3465f5abb724 Mon Sep 17 00:00:00 2001
From: leopekkas
Date: Wed, 20 May 2026 20:22:32 +0300
Subject: [PATCH 10/13] Fix HBM v LDDR memory sizes

---
 docs/computing/running/batch-job-partitions.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/docs/computing/running/batch-job-partitions.md b/docs/computing/running/batch-job-partitions.md
index 2f59d79e2c..b5c9007456 100644
--- a/docs/computing/running/batch-job-partitions.md
+++ b/docs/computing/running/batch-job-partitions.md
@@ -80,7 +80,10 @@ Roihu features the following partitions for submitting jobs to GPU nodes:
 | `gpularge` | G | 36 hours | 1 - 10 | 4 per node | GPU | [scalability test] |

 Each full GPU node has 4 GH200 GPUs. On full GPU nodes, each reserved GPU grants access to up to 72 CPU cores,
-116 GiB of HBM3 memory and 95 GiB of LPDDR5 memory.
+95 GiB of HBM3 memory and 116 GiB of LPDDR5 memory.
+
+The memory amounts listed here are the allocatable amounts available to jobs;
+some GPU memory is reserved for system use.

 #### Roihu GPU slices

From a605591e06ea4ccc580f5c13c26ba09406a87029 Mon Sep 17 00:00:00 2001
From: leopekkas
Date: Wed, 20 May 2026 20:23:05 +0300
Subject: [PATCH 11/13] Clarify memory reserved for system use

---
 docs/computing/running/batch-job-partitions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/computing/running/batch-job-partitions.md b/docs/computing/running/batch-job-partitions.md
index b5c9007456..74ca500f3f 100644
--- a/docs/computing/running/batch-job-partitions.md
+++ b/docs/computing/running/batch-job-partitions.md
@@ -83,7 +83,7 @@ Each full GPU node has 4 GH200 GPUs. On full GPU nodes, each reserved GPU grants
 95 GiB of HBM3 memory and 116 GiB of LPDDR5 memory.

 The memory amounts listed here are the allocatable amounts available to jobs;
-some GPU memory is reserved for system use.
+some memory is reserved for system use.

 #### Roihu GPU slices

From cb0ccc5e09c948b3ec2437be4b1eafb3722c6015 Mon Sep 17 00:00:00 2001
From: leopekkas
Date: Thu, 21 May 2026 11:01:08 +0300
Subject: [PATCH 12/13] Move interactive partitions to separate section

---
 .../computing/running/batch-job-partitions.md | 54 +++++++++++++------
 1 file changed, 37 insertions(+), 17 deletions(-)

diff --git a/docs/computing/running/batch-job-partitions.md b/docs/computing/running/batch-job-partitions.md
index 74ca500f3f..2e2f6539ea 100644
--- a/docs/computing/running/batch-job-partitions.md
+++ b/docs/computing/running/batch-job-partitions.md
@@ -49,18 +49,17 @@ and resource requirements. These are explained in the table below.
| Allocation type | Resource request | |:---------------:|---------------------------------------------------------------------------| | R | Memory and CPU resources can be changed independently | -| N | Full node requests only | -| C | Share of memory resources fixed based on requested number of CPU cores | -| G | Share of CPU and memory resources fixed based on requested number of GPUs | +| N | Full-node requests only | +| C | Memory allocation is fixed based on the requested number of CPU cores | +| G | CPU and memory allocation is fixed based on the requested number of GPUs | ### Roihu CPU partitions -Roihu features the following partitions for submitting jobs to CPU nodes: +Roihu provides the following partitions for submitting jobs to CPU nodes: | Partition | Allocation type | Time limit | Nodes | Max CPUs | [Node types](../systems-roihu.md#nodes) | Max memory | Requirements | |-------------------|-----------------|------------|--------|---------------|-----------------------------------------|-----------------------|--------------------| | `test` | R | 15 minutes | 1 - 2 | 384 per node | M | 744 GiB per node | | -| `interactive` | R | 36 hours | 1 | 32 per job | M | 64 GiB per job | | | `small` | R | 72 hours | 1 | 384 per job | M, L | 1500 GiB per job | | | `medium` | N | 36 hours | 1 - 6 | 384 per node | M | 744 GiB per node | | | `large` | N | 36 hours | 6 - 60 | 384 per node | M | 744 GiB per node | [scalability test] | @@ -70,7 +69,7 @@ Roihu features the following partitions for submitting jobs to CPU nodes: ### Roihu GPU partitions -Roihu features the following partitions for submitting jobs to GPU nodes: +Roihu provides the following partitions for submitting jobs to GPU nodes: | Partition | Allocation type | Time limit | Nodes | Max GPUs | [Node types](../systems-roihu.md#nodes) | Requirements | |------------------|-----------------|------------|--------|---------------|-----------------------------------------|--------------------| @@ -80,30 +79,52 
@@ Roihu features the following partitions for submitting jobs to GPU nodes: | `gpularge` | G | 36 hours | 1 - 10 | 4 per node | GPU | [scalability test] | Each full GPU node has 4 GH200 GPUs. On full GPU nodes, each reserved GPU grants access to up to 72 CPU cores, -95 GiB of HBM3 memory and 116 GiB of LPDDR5 memory. +95 GiB of HBM3 memory, and 116 GiB of LPDDR5 memory. The memory amounts listed here are the allocatable amounts available to jobs; some memory is reserved for system use. +### Roihu interactive partitions + +Roihu has several partitions reserved for interactive use and for data visualization. + +#### Roihu-CPU interactive use + +The `interactive` partition on Roihu allows running +[interactive jobs](./interactive-usage.md) on CPU nodes, through the `sinteractive` command. + +| Partition | Allocation type | Time limit | Nodes | Max CPUs | [Node types](../systems-roihu.md#nodes) | Max memory | +|-------------------|-----------------|------------|--------|---------------|-----------------------------------------|-----------------------| +| `interactive` | R | 36 hours | 1 | 32 per job | M | 64 GiB per job | + +#### Roihu-GPU interactive use + +`sinteractive` selects the correct partition based on your resource request +and automatically provides a GPU slice when run from a Roihu-GPU login node. + +| Partition | Allocation type | Time limit | Nodes | Max GPUs | [Node types](../systems-roihu.md#nodes) | +|-------------------|-----------------|------------|--------|---------------|-----------------------------------------| +| `gpuinteractive` | G | 12 hours | 1 | 1 per job | GPU ([slice](#roihu-gpu-slices)) | + #### Roihu GPU slices -Roihu `gpuinteractive` partition features GH200 superchips that are divided -into a total of 48 smaller slices that have one-seventh of the compute capacity -and one-eighth of the GPU memory capacity (12 GiB) of a full GH200 superchip. +The Roihu `gpuinteractive` partition uses GH200 superchips divided into 48 smaller slices. 
+Each slice has one-seventh of the compute capacity and one-eighth of the GPU memory capacity (12 GiB) of a full GH200 superchip. -### Roihu visualization partition +#### Vizinteractive -Additionally, Roihu features the following partition for interactive use and for visualizing data with specialized hardware: +Roihu also features the following partition for interactive use and data visualization with specialized hardware: | Partition | Allocation type | Time limit | Nodes | Max GPUs | [Node types](../systems-roihu.md#nodes) | |------------------|-----------------|------------|-------|-----------|-----------------------------------------| | `vizinteractive` | G | 12 hours | 1 | 2 per job | V | -Each node in the partition has 2 Nvidia L40 GPUs with 44 GiB of memory. Each reserved GPU grants access to up to 32 CPU cores and 183 GiB of CPU memory. +Each node in the partition has 2 Nvidia L40 GPUs with 44 GiB of memory and a 64-core AMD EPYC 9335 CPU. +Each reserved GPU grants access to up to 32 CPU cores and 183 GiB of CPU memory. ### Roihu pilot partitions -In addition to the regular partitions, the following partitions are also +In addition to the regular partitions, the following partitions are available during the Roihu pilot phase: | Partition | Allocation type | Time limit | Nodes | Max CPUs | Max GPUs | [Node types](../systems-roihu.md#nodes) | @@ -116,10 +137,9 @@ available during the Roihu pilot phase: Local storage on Roihu M, L and GPU nodes is meant for storing temporary files only, not high-performance I/O. -High-performance local storage is available on Roihu XL and V nodes. Ideal for I/O-intensive jobs. +High-performance local storage is available on Roihu XL and V nodes, which is ideal for I/O-intensive jobs. 
-The available local storage that a single user can access in their jobs depends -on the system [partition](#roihu-partitions) they use: +The amount of local storage available to a single user depends on the [partition](#roihu-partitions) used: | Allocation type | Quota per user | |:-------------------|---------------:| From 6c8a462e19c334aa7dbd42033bfe09db562c8739 Mon Sep 17 00:00:00 2001 From: leopekkas Date: Thu, 21 May 2026 11:10:20 +0300 Subject: [PATCH 13/13] Add R/W speeds of local storage --- docs/computing/running/batch-job-partitions.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/computing/running/batch-job-partitions.md b/docs/computing/running/batch-job-partitions.md index 2e2f6539ea..d46052b1b5 100644 --- a/docs/computing/running/batch-job-partitions.md +++ b/docs/computing/running/batch-job-partitions.md @@ -135,19 +135,19 @@ available during the Roihu pilot phase: ### Local storage on Roihu nodes -Local storage on Roihu M, L and GPU nodes is meant for storing temporary files only, not high-performance I/O. +Local storage on Roihu M, L, and GPU nodes is meant for storing temporary files only, not high-performance I/O. High-performance local storage is available on Roihu XL and V nodes, which is ideal for I/O-intensive jobs. 
The amount of local storage available to a single user depends on the [partition](#roihu-partitions) used: -| Allocation type | Quota per user | -|:-------------------|---------------:| -| R (shared nodes) | 20 GiB | -| N (full nodes) | 600 GiB | -| G (GPU nodes) | 150 GiB | -| Hugemem (XL) nodes | 1.6 TiB | -| VIZ nodes | 6.5 TiB | +| Allocation type | Quota per user | Read / Write speeds | +|:-------------------|---------------:|---------------------| +| R (shared nodes) | 20 GiB | 5000 / 1400 MB/s | +| N (full nodes) | 600 GiB | 5000 / 1400 MB/s | +| G (GPU nodes) | 150 GiB | 5000 / 1400 MB/s | +| Hugemem (XL) nodes | 1.6 TiB | 6700 / 4000 MB/s | +| VIZ nodes | 6.5 TiB | 6700 / 4000 MB/s | Read more about: [Local storage on Roihu nodes](../disk.md#temporary-local-disk-areas)
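
As a rough illustration of the quota and speed figures in the table above, the following Python sketch estimates how long it would take to read or write a full per-user quota on each node type. It assumes the listed speeds are sustained throughput in decimal megabytes per second, which the documentation does not state. The dictionary and function names are hypothetical and not part of any Roihu tooling.

```python
# Quotas and read/write speeds copied from the local storage table.
GIB = 1024**3
TIB = 1024**4
MB = 10**6  # assumption: "MB/s" means decimal megabytes per second

# name -> (quota in bytes, read speed in MB/s, write speed in MB/s)
LOCAL_STORAGE = {
    "R (shared nodes)": (20 * GIB, 5000, 1400),
    "N (full nodes)": (600 * GIB, 5000, 1400),
    "G (GPU nodes)": (150 * GIB, 5000, 1400),
    "Hugemem (XL) nodes": (1.6 * TIB, 6700, 4000),
    "VIZ nodes": (6.5 * TIB, 6700, 4000),
}


def seconds_to_transfer(num_bytes, speed_mb_per_s):
    """Time in seconds to move num_bytes at a constant speed."""
    return num_bytes / (speed_mb_per_s * MB)


def fill_times():
    """Return name -> (read seconds, write seconds) for a full quota."""
    return {
        name: (
            seconds_to_transfer(quota, read_speed),
            seconds_to_transfer(quota, write_speed),
        )
        for name, (quota, read_speed, write_speed) in LOCAL_STORAGE.items()
    }


if __name__ == "__main__":
    for name, (read_s, write_s) in fill_times().items():
        print(f"{name:<20} read {read_s:9.1f} s   write {write_s:9.1f} s")
```

These are best-case estimates only: real jobs share the node with other I/O, and small-file or random access patterns are typically much slower than the listed peak speeds.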