-
Go here and enter your email address. You will receive a download link by email.
-
Once you have successfully downloaded the Anaconda3-2024.10-1-Linux-x86_64.sh file, install it with
$ bash Anaconda3-2024.10-1-Linux-x86_64.sh
and follow the prompts.
-
Upgrade conda
$ conda upgrade conda
-
Install prerequisites and CGCNN
$ conda create -n cgcnn python=3 scikit-learn pytorch torchvision pymatgen -c pytorch -c conda-forge
-
Activate the cgcnn conda environment
$ conda activate cgcnn
$ conda install -c conda-forge mp-api
-
Clone the cgcnn repo from GitHub:
$ git clone https://github.com/txie-93/cgcnn.git
-
Go to the cloned repository
$ cd cgcnn
-
Inside the data directory, create a new directory with any name, in this case formation_energy
$ cd data
$ mkdir formation_energy
$ cd formation_energy
and place the following files inside it:
1. CIF files named in the format ID.cif
2. id_prop.csv, which contains the material ID in the first column and the property we want to predict in the second column.
3. atom_init.json, a JSON file that stores the initialization vector for each element. The data/sample-regression/atom_init.json file should be good for most applications.
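Before training, it can help to confirm that the directory really contains the three kinds of files listed above. The following is a minimal sketch, not part of cgcnn: the function name check_dataset is hypothetical, and it assumes that id_prop.csv has no header row, that targets are numeric, and that atom_init.json holds a single JSON object.

```python
import csv
import json
from pathlib import Path


def check_dataset(root):
    """Return a list of human-readable problems found in a dataset directory."""
    root = Path(root)
    problems = []

    # atom_init.json must exist and parse (assumed here to be a JSON object).
    atom_init = root / "atom_init.json"
    if not atom_init.is_file():
        problems.append("missing atom_init.json")
    else:
        try:
            with atom_init.open() as fh:
                if not isinstance(json.load(fh), dict):
                    problems.append("atom_init.json is not a JSON object")
        except json.JSONDecodeError:
            problems.append("atom_init.json is not valid JSON")

    # id_prop.csv: first column is the material ID, second the target property.
    id_prop = root / "id_prop.csv"
    if not id_prop.is_file():
        problems.append("missing id_prop.csv")
        return problems

    with id_prop.open(newline="") as fh:
        for line_no, row in enumerate(csv.reader(fh), start=1):
            if len(row) < 2:
                problems.append(f"id_prop.csv line {line_no}: expected 2 columns")
                continue
            material_id, target = row[0].strip(), row[1].strip()
            try:
                float(target)
            except ValueError:
                problems.append(
                    f"id_prop.csv line {line_no}: non-numeric target {target!r}"
                )
            # Every ID listed in id_prop.csv needs a matching ID.cif file.
            if not (root / f"{material_id}.cif").is_file():
                problems.append(f"missing {material_id}.cif")
    return problems


if __name__ == "__main__":
    for problem in check_dataset("data/formation_energy"):
        print(problem)
```

An empty list means the layout matches the description above. It does not check that the CIF files themselves are valid.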
-
The CIF and id_prop.csv files can be obtained by editing the generateCIF.py file (which can be downloaded from here) to suit your needs and then running
$ python3 generateCIF.py
-
The atom_init.json file can be obtained with
$ cp ../sample-regression/atom_init.json .
-
Create a Util directory at the root of the cgcnn directory to store utility scripts and data.
$ mkdir Util
-
From the data/formation_energy directory, move the generateCIF.py and full_dataset.csv files to the newly created Util directory.
$ mv generateCIF.py ../../Util/
$ mv full_dataset.csv ../../Util/
-
To train the model;
$ python main.py --train-ratio 0.6 --val-ratio 0.2 --test-ratio 0.2 path/to/data/formation_energy 2>&1 | tee training.log
-
To resume training after an interruption:
$ python3 main.py --train-ratio 0.6 --val-ratio 0.2 --test-ratio 0.2 data/formation_energy/ --resume checkpoint.pth.tar 2>&1 | tee training.log
-
After training, you will get three files in the cgcnn directory:
model_best.pth.tar: stores the CGCNN model with the best validation accuracy.
checkpoint.pth.tar: stores the CGCNN model at the last epoch.
test_results.csv: stores the ID, target value, and predicted value for each crystal in the test set.
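A quick way to judge the trained model is to compute the mean absolute error over test_results.csv. The sketch below assumes the file has three columns in the order given above (ID, target, prediction) and skips any row whose values are not numeric. The function name is hypothetical and not part of cgcnn.

```python
import csv
from pathlib import Path


def mean_absolute_error(results_path):
    """Compute the MAE over rows of (ID, target, prediction)."""
    errors = []
    with Path(results_path).open(newline="") as fh:
        for row in csv.reader(fh):
            if len(row) < 3:
                continue  # skip blank or malformed lines
            try:
                target, predicted = float(row[1]), float(row[2])
            except ValueError:
                continue  # skip a header row, if one is present
            errors.append(abs(target - predicted))
    if not errors:
        raise ValueError(f"no numeric rows found in {results_path}")
    return sum(errors) / len(errors)


if __name__ == "__main__":
    print(mean_absolute_error("test_results.csv"))
```

The result is in the same units as the property stored in id_prop.csv.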
-
In the data directory, create a new directory (in this case formation_energy_prediction) and place the CIF files, the atom_init.json file, and the id_prop.csv file inside it.
-
Run the following command at the root of the cgcnn directory:
$ python3 predict.py post-processing/model_best.pth.tar data/formation_energy_prediction/ 2>&1 | tee formation_energy_prediction.log
-
All the output from training and prediction is stored in the post-processing directory.
-
After predicting, you will get one file in the cgcnn directory:
test_results.csv: stores the ID, target value, and predicted value for each crystal in the test set. Here the target value is just whatever number you set in id_prop.csv when defining the dataset; it is not important.
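Because the target column is ignored during prediction, the id_prop.csv for a prediction directory can be generated from the CIF file names. The following is a minimal sketch under that assumption. The function name and the placeholder value 0.0 are illustrative choices, not something cgcnn provides.

```python
import csv
from pathlib import Path


def write_placeholder_id_prop(dataset_dir, placeholder=0.0):
    """Write id_prop.csv listing every ID.cif in dataset_dir with a dummy target."""
    dataset_dir = Path(dataset_dir)
    # The ID is the file name without the .cif extension.
    ids = sorted(path.stem for path in dataset_dir.glob("*.cif"))
    with (dataset_dir / "id_prop.csv").open("w", newline="") as fh:
        writer = csv.writer(fh, lineterminator="\n")
        for material_id in ids:
            writer.writerow([material_id, placeholder])
    return ids


if __name__ == "__main__":
    written = write_placeholder_id_prop("data/formation_energy_prediction")
    print(f"wrote {len(written)} rows")
```

Any existing id_prop.csv in that directory is overwritten, so this is meant for prediction-only directories, not for the training set.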
For more info, see the GitHub repo by Tian Xie here and the article on arXiv.