Successfully running the following tests verifies the installation, linking, and multi-threading capabilities of OpenBLAS on the new RISE software stack.
Two C programs are provided for testing:
- openblas_sanity.c: Verifies C-header availability, library linking, and basic math correctness.
- openblas_benchmark.c: Tests matrix multiplication performance (GFLOP/s) and OpenMP threading.
Unlike a Python package, OpenBLAS is a compiled C/Fortran library, so no virtual environment is needed. We do, however, need the gcc compiler, plus pkg-config to locate the OpenBLAS header and library files.
Load the required modules:
# Clear existing modules
module purge
# Load the RISE software stack, compiler, and OpenBLAS
module use /storage/icds/sw8/modulefiles_rc2026/linux-rhel8-x86_64/Core
module load gcc
module load openblas/0.3.30
This program performs a simple 2x2 matrix multiplication using the cblas_dgemm function. It verifies that the compiler can find cblas.h, that the linker can find libopenblas.so, and that the underlying math engine calculates correctly.
- Compile the code (pkg-config automatically supplies the correct include and library paths):
gcc openblas_sanity.c -o openblas_sanity $(pkg-config --cflags --libs openblas)
- Run the executable:
./openblas_sanity
Expected Output:
--- OpenBLAS Sanity Check ---
OpenBLAS Config: OpenBLAS 0.3.30 DYNAMIC_ARCH NO_AFFINITY USE_OPENMP USE_LOCKING Haswell MAX_THREADS=512
Detected CPU Core: Haswell
Performing 2x2 DGEMM (Matrix Multiplication)...
Result Matrix:
[ 19.0 22.0]
[ 43.0 50.0]
OpenBLAS is linked and calculating correctly!
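The source of openblas_sanity.c is not reproduced here, so its input matrices are an assumption: the printed result is exactly the product of A = [1 2; 3 4] and B = [5 6; 7 8]. To check those values independently of OpenBLAS, the sketch below is a plain-C reference for the operation cblas_dgemm performs on row-major, non-transposed inputs, C = alpha*A*B + beta*C. The helper name naive_dgemm is hypothetical and is not part of OpenBLAS.

```c
#include <stddef.h>

/* Hypothetical reference helper (not an OpenBLAS function).
 * Computes C = alpha * A * B + beta * C for row-major, non-transposed
 * matrices, i.e. what cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, ...)
 * computes. A is m x k (leading dimension lda), B is k x n (ldb),
 * C is m x n (ldc). */
void naive_dgemm(size_t m, size_t n, size_t k, double alpha,
                 const double *A, size_t lda,
                 const double *B, size_t ldb,
                 double beta, double *C, size_t ldc)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            /* Dot product of row i of A with column j of B. */
            double sum = 0.0;
            for (size_t p = 0; p < k; ++p)
                sum += A[i * lda + p] * B[p * ldb + j];
            /* As in reference BLAS, beta == 0 means the old contents of C are ignored. */
            C[i * ldc + j] = (beta == 0.0) ? alpha * sum
                                           : alpha * sum + beta * C[i * ldc + j];
        }
    }
}
```

Called as naive_dgemm(2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2) with A = {1, 2, 3, 4} and B = {5, 6, 7, 8}, it fills C with {19, 22, 43, 50}, the same values the sanity check prints.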
Because OpenBLAS is designed for high-performance computing, this test allocates large (2000x2000) matrices and measures throughput in GFLOP/s (billions of floating-point operations per second).
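A multiplication of an m x k matrix by a k x n matrix is conventionally counted as 2*m*n*k floating-point operations (one multiply and one add per inner-product term). For 2000x2000 matrices that is 16 billion operations, so a run of 0.180 sec corresponds to 88.89 GFLOP/s, which is consistent with the sample benchmark output. We are assuming openblas_benchmark.c uses this standard count; the helper name dgemm_gflops is hypothetical.

```c
/* Hypothetical helper: GFLOP/s for an (m x k) * (k x n) DGEMM that took
 * `seconds` of wall-clock time, using the conventional count of
 * 2*m*n*k floating-point operations. */
double dgemm_gflops(double m, double n, double k, double seconds)
{
    double flops = 2.0 * m * n * k;   /* multiplies + adds */
    return flops / seconds / 1e9;     /* convert to billions per second */
}
```

For example, dgemm_gflops(2000, 2000, 2000, 0.190) returns about 84.21.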
Because this OpenBLAS module is compiled with USE_OPENMP, it relies on the OMP_NUM_THREADS environment variable to control its multi-threading. We will test it using 4 threads.
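If you want your own test program to report the thread count it was asked to use, one option is to read OMP_NUM_THREADS directly. The sketch below is a hypothetical helper (parse_num_threads and requested_threads are illustrative names, not OpenBLAS or OpenMP APIs); it only reports the requested value, while the OpenMP runtime itself decides the actual thread count and applies its own default when the variable is unset.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parses an OMP_NUM_THREADS-style string and returns
 * the first positive integer, or 0 if the string is NULL, empty, or invalid.
 * Only the first entry of a comma-separated list (e.g. "4,2") is read. */
int parse_num_threads(const char *s)
{
    if (s == NULL || *s == '\0')
        return 0;
    char *end;
    long n = strtol(s, &end, 10);
    /* Reject non-numeric input, trailing junk other than a list separator,
     * and values outside 1..INT_MAX. */
    if (end == s || (*end != '\0' && *end != ',') || n <= 0 || n > INT_MAX)
        return 0;
    return (int)n;
}

/* Thread count requested through the environment (0 if not set). */
int requested_threads(void)
{
    return parse_num_threads(getenv("OMP_NUM_THREADS"));
}
```

After export OMP_NUM_THREADS=4, requested_threads() returns 4.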
- Compile the code:
gcc openblas_benchmark.c -o openblas_benchmark $(pkg-config --cflags --libs openblas)
- Set the thread count and run:
export OMP_NUM_THREADS=4
./openblas_benchmark
Expected Output (your exact Time and GFLOP/s values will vary depending on the CPU architecture of your compute node):
--- OpenBLAS Performance & Threading Test ---
Allocating 2000x2000 matrices...
Threads: 4 | Time: 0.180 sec | Performance: 88.89 GFLOP/s
Threads: 4 | Time: 0.182 sec | Performance: 87.91 GFLOP/s
Threads: 4 | Time: 0.185 sec | Performance: 86.49 GFLOP/s
Threads: 4 | Time: 0.190 sec | Performance: 84.21 GFLOP/s
Benchmark complete!
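The source of openblas_benchmark.c is not shown, so how it times each run is an assumption. A common approach on Linux is to read a monotonic wall-clock timer before and after the multiplication; clock() is a poor fit because it sums CPU time across all threads. The sketch below illustrates that pattern with hypothetical names (now_seconds, naive_matmul, time_matmul) and a single-threaded naive triple loop standing in for cblas_dgemm, so it builds without OpenBLAS and will be far slower than the library.

```c
#define _POSIX_C_SOURCE 199309L /* expose clock_gettime and CLOCK_MONOTONIC */
#include <stdlib.h>
#include <time.h>

/* Wall-clock seconds from a monotonic clock (unaffected by system time changes). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical stand-in for cblas_dgemm: C = A * B, row-major, n x n. */
static void naive_matmul(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int p = 0; p < n; ++p)
                sum += A[i * n + p] * B[p * n + j];
            C[i * n + j] = sum;
        }
}

/* Times one n x n multiplication, stores the elapsed seconds in *elapsed,
 * and returns GFLOP/s (2*n^3 operations). Returns -1.0 if allocation
 * fails or the result is wrong. */
double time_matmul(int n, double *elapsed)
{
    size_t count = (size_t)n * (size_t)n;
    double *A = malloc(count * sizeof *A);
    double *B = malloc(count * sizeof *B);
    double *C = malloc(count * sizeof *C);
    if (!A || !B || !C) {
        free(A); free(B); free(C);
        return -1.0;
    }
    for (size_t i = 0; i < count; ++i) {
        A[i] = 1.0;
        B[i] = 2.0;
    }
    double t0 = now_seconds();
    naive_matmul(n, A, B, C);
    double t1 = now_seconds();
    *elapsed = t1 - t0;
    /* Every entry of C is a sum of n terms equal to 1*2, i.e. 2n. */
    int ok = (C[0] == 2.0 * n) && (C[count - 1] == 2.0 * n);
    free(A); free(B); free(C);
    if (!ok)
        return -1.0;
    return 2.0 * (double)n * n * n / *elapsed / 1e9;
}
```

Timing only the multiplication, and not the allocation and initialization, mirrors what a DGEMM benchmark is meant to measure.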
Once testing is complete, you can remove the compiled executables:
rm openblas_sanity openblas_benchmark