The purification algorithms implemented in this library have been published in JCTC.
In this project, we bundle essential components of the quantum chemical workflow that are not part of the electronic structure method itself, but rather the "machinery" surrounding it. Our focus is to adapt existing algorithms (and possibly new ones in the future) to exploit the computational power of Graphics Processing Units (GPUs). In an effort to "democratize" scientific computation on GPUs, we focus on implementations that perform well on so-called "gaming" (consumer) GPUs rather than on general-purpose data-center GPUs. Because these cards offer far lower FP64 (double precision) throughput than FP32 (single precision) throughput, we limit the number of FP64 operations in favor of FP32 operations. Currently, we provide C++ implementations of density matrix purification (DMP) techniques and of what we call "Mixers", which are techniques to accelerate SCF convergence.
They can be accessed through a C-style interface, which also allows linking from Fortran and other programming languages. We are working on a dedicated Fortran interface.
To achieve GPU offloading, we use the LAHVA library.
We test the implementation on combinations of the following operating systems, compilers, and BLAS/LAPACK implementations, with and without GPU support:
| Operating System | Compiler | CPU BLAS/LAPACK | CUDA |
|---|---|---|---|
| Ubuntu 20.04 | Intel oneAPI 2023.2.0 | Intel oneMKL 2023.2.0 | 11.8 |
|  | gcc-9 | OpenBLAS |  |
Supported build systems are Meson (v1.4.0) and CMake (> 3.18), with Ninja or Make as build generators.
We also provide Apptainer recipes for building and deployment. You can find them in the apptainer_recipe subfolder.
Currently, we support both the Meson and the CMake build systems.
First of all, LAHVA can be compiled with or without GPU support (default: with GPU support, NVIDIA only).
This behavior is set with -Dgpu=true (or -Dgpu=false) on the command line or in the meson_options.txt file.
If you plan to use an NVIDIA GPU, you need the compute capability of your GPU, or of the range of GPUs the software should be deployed to. One resource for finding this value is TechPowerUp: search for your hardware and find the compute capability (CC) under "Graphics Features", then "CUDA". The CC value is given with a dot between the two digits (e.g. 8.6). When changing the value of gpu_arch in meson_options.txt, remove the dot (8.6 becomes 86) and list the CC values of all cards the program will be used with in the array.
Next, make sure that Meson can find your CUDA installation. The easiest way to do this is to set the CUDA_ROOT environment variable, which Meson uses. It should point to the root of your CUDA installation, e.g. /mnt/group-lib/nvidia-hpc-sdk/Linux_x86_64/24.7/cuda/11.8/. If CUDA is installed in a non-standard location, as is common on shared HPC systems, it may also be necessary to set the paths to libcudart (the CUDA runtime library) and to other libraries such as libcublas or libcusolver. You can do this by setting or extending the LIBRARY_PATH environment variable, for example: `export LIBRARY_PATH=/mnt/group-lib/nvidia-hpc-sdk/Linux_x86_64/24.7/cuda/11.8/lib64:$LIBRARY_PATH`. Finally, if several nvcc versions are installed, it may help to also set the path to nvcc explicitly.
Now that the compile environment is set up for GPU compilation, we need to set up the Meson build. The LAPACK vendor can optionally be selected (options: mkl, openblas; default: auto):

`meson setup _build -Dgpu=true` (optionally with `-Dlapack=mkl` or `-Dlapack=openblas`)

After the setup, we can compile LAHVA like so:

`meson compile -C _build`

Lastly, we can test the library using the provided unit tests:

`meson test -C _build`

We currently do not provide pre-built libraries, as the project should be compiled on the target system because of its dependencies on CUDA, BLAS, etc.
For an example of how GAMBITs can be used, please have a look at our integration into the tblite codebase. It is not yet featured in the original tblite repository, but it is available in a public fork by one of the authors.
We provide a header file, purification.h, which serves as the interface to our implementation.
First, the user sets up the method using the PurifiyFockSetup function, where the actual DMP method is chosen via an enumerator. The numerical precision and the run mode (where the computations are executed, i.e. CPU or GPU) are also set via enumerators.
The setup call generates a pointer to the purification object, and this pointer is passed to every subsequent call to the library.
Currently, we still need a transformation matrix to transform the Fock and density matrices from the AO to the MO basis and vice versa.
However, we are working on using the Newton-Schulz procedure to compute the inverse of the overlap matrix.
The transformation matrix is provided to the library as a pointer, using SeTransformationMatrix.
After these setup steps, we are ready to generate a density matrix based on a given Fock matrix.
For that, we call GetDensityAO, providing the number of electrons, the Fock matrix, and a pointer to store the density matrix in. In addition, the call takes a pointer to an int, which informs the calling program whether the purification succeeded or failed.
We also provide a call, GetEnergyWDensityAO, that computes the energy-weighted density matrix, which is needed for the nuclear gradients.
To reuse the DMP object throughout a geometry optimization or MD simulation, we added the Reset call, which resets certain internal variables. The most important one is the switch for the incremental AO-MO transform: after Reset, a new full AO-MO transform is performed based on the transformation matrix for the new geometry. Therefore, after a Reset call, the new transformation matrix must be set again.
Finally, the calling program should free the C++ object behind the pointer using DeletePointer.
Pit Steinbach - Lead, Purification Schemes

Mark Heezen - Mixer

Christoph Bannwarth - Funding and Supervision
The status should be considered experimental: although we are committed to keeping the API stable, changes may occur.