Fine-tuning the pretrained model chemical_system_energy_above_hull on my own property data #231

@cypherumang65

Describe the bug
I am trying to fine-tune the pretrained model chemical_system_energy_above_hull on my own property data.

For the YAML config, I am using the dft_mag_density one, since it was mentioned that it can be reused for any float-valued property.

Script I am running
source /scratch/e1554701/mattergen/.venv/bin/activate

export PROPERTY=ionic_conductivity_S_cm
export HYDRA_FULL_ERROR=1

mattergen-finetune adapter.pretrained_name=chemical_system_energy_above_hull data_module=alex_mp_20 +lightning_module/diffusion_module/model/property_embeddings@adapter.adapter.property_embeddings_adapt.$PROPERTY=$PROPERTY ~trainer.logger data_module.properties=["$PROPERTY"]

Last lines of output

Sanity Checking: 0it [00:00, ?it/s]
Sanity Checking: 0%| | 0/1 [00:00<?, ?it/s]
Sanity Checking DataLoader 0: 0%| | 0/1 [00:00<?, ?it/s]
q0.dtype: torch.float64
q0.dtype: torch.float64

Error
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/lightning_fabric/__init__.py:36: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
__import__("pkg_resources").declare_namespace(__name__)
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'finetune': Defaults list is missing _self_. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information
warnings.warn(msg, UserWarning)
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
ret = run_job(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:67: UserWarning: Starting from v1.9.0, tensorboardX has been removed as a dependency of the pytorch_lightning package, due to potential conflicts with other packages in the ML ecosystem. For this reason, logger=True will use CSVLogger as the default logger, unless the tensorboard or tensorboardX packages are found. Please pip install lightning[extra] or one of them to enable TensorBoard support by default
warning_cache.warn(
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/lightning_fabric/__init__.py:36: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
__import__("pkg_resources").declare_namespace(__name__)
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'finetune': Defaults list is missing _self_. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information
warnings.warn(msg, UserWarning)
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
ret = run_job(
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2

distributed_backend=nccl
All distributed processes registered. Starting with 2 processes

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [GPU-1004b653-b1d3-9674-3428-e0f2c6ffb7e6,GPU-1811d176-d6fe-7645-784b-f42b78cc1625]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [GPU-1004b653-b1d3-9674-3428-e0f2c6ffb7e6,GPU-1811d176-d6fe-7645-784b-f42b78cc1625]
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:28: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate.
warnings.warn("The verbose parameter is deprecated. Please use get_last_lr() "
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:28: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate.
warnings.warn("The verbose parameter is deprecated. Please use get_last_lr() "

Fitting property scalers: 0%| | 0/3 [00:00<?, ?it/s]
Fitting property scalers: 0%| | 0/3 [00:00<?, ?it/s]
Fitting property scalers: 67%|██████▋ | 2/3 [00:00<00:00, 18.54it/s]
Fitting property scalers: 67%|██████▋ | 2/3 [00:00<00:00, 18.77it/s]
Fitting property scalers: 100%|██████████| 3/3 [00:00<00:00, 27.15it/s]
Fitting property scalers: 100%|██████████| 3/3 [00:00<00:00, 27.49it/s]

| Name | Type | Params

0 | diffusion_module | DiffusionModule | 48.8 M

48.8 M Trainable params
22 Non-trainable params
48.8 M Total params
195.042 Total estimated model params size (MB)
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:432: PossibleUserWarning: The dataloader, val_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument (try 112 which is the number of cpus on this machine) in the DataLoader init to improve performance.
rank_zero_warn(
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/__init__.py:696: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:451.)
_C._set_default_tensor_type(t)
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/__init__.py:696: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:451.)
_C._set_default_tensor_type(t)
Error executing job with overrides: ['adapter.pretrained_name=chemical_system_energy_above_hull', 'data_module=alex_mp_20', '+lightning_module/diffusion_module/model/property_embeddings@adapter.adapter.property_embeddings_adapt.ionic_conductivity_S_cm=ionic_conductivity_S_cm', '~trainer.logger', 'data_module.properties=[ionic_conductivity_S_cm]']
Error executing job with overrides: ['adapter.pretrained_name=chemical_system_energy_above_hull', 'data_module=alex_mp_20', '+lightning_module/diffusion_module/model/property_embeddings@adapter.adapter.property_embeddings_adapt.ionic_conductivity_S_cm=ionic_conductivity_S_cm', '~trainer.logger', 'data_module.properties=[ionic_conductivity_S_cm]']
Traceback (most recent call last):
File "/scratch/e1554701/mattergen/.venv/bin/mattergen-finetune", line 10, in <module>
sys.exit(mattergen_finetune())
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/main.py", line 90, in decorated_main
_run_hydra(
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 222, in run_and_report
raise ex
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 219, in run_and_report
return func()
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
lambda: hydra.run(
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "/scratch/e1554701/mattergen/mattergen/scripts/finetune.py", line 141, in mattergen_finetune
trainer.fit(
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 529, in fit
call._call_and_handle_interrupt(
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 41, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 91, in launch
return function(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 568, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 973, in _run
results = self._run_stage()
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1014, in _run_stage
self._run_sanity_check()
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1043, in _run_sanity_check
val_loop.run()
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 177, in _decorator
return loop_run(self, *args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 115, in run
self._evaluation_step(batch, batch_idx, dataloader_idx)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 375, in _evaluation_step
output = call._call_strategy_hook(trainer, hook_name, *step_kwargs.values())
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 291, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/strategies/ddp.py", line 336, in validation_step
return self.model(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward
else self._run_ddp_forward(*inputs, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
return self.module(*inputs, **kwargs) # type: ignore[index]
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/overrides/base.py", line 102, in forward
return self._forward_module.validation_step(*inputs, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/double.py", line 48, in validation_step
return self.module.validation_step(
File "/scratch/e1554701/mattergen/mattergen/diffusion/lightning_module.py", line 142, in validation_step
return self._calc_loss(val_batch, False)
File "/scratch/e1554701/mattergen/mattergen/diffusion/lightning_module.py", line 149, in _calc_loss
loss, metrics = self.diffusion_module.calc_loss(batch)
File "/scratch/e1554701/mattergen/mattergen/diffusion/diffusion_module.py", line 79, in calc_loss
noisy_batch, t = self._corrupt_batch(batch)
File "/scratch/e1554701/mattergen/mattergen/diffusion/diffusion_module.py", line 115, in _corrupt_batch
noisy_batch = self.corruption.sample_marginal(batch, t)
File "/scratch/e1554701/mattergen/mattergen/diffusion/corruption/multi_corruption.py", line 112, in sample_marginal
noisy_data = self._apply_corruption_fn(
File "/scratch/e1554701/mattergen/mattergen/diffusion/corruption/multi_corruption.py", line 149, in _apply_corruption_fn
return apply(
File "/scratch/e1554701/mattergen/mattergen/diffusion/corruption/multi_corruption.py", line 164, in apply
return {
File "/scratch/e1554701/mattergen/mattergen/diffusion/corruption/multi_corruption.py", line 165, in <dictcomp>
field_name: fn(
File "/scratch/e1554701/mattergen/mattergen/diffusion/corruption/d3pm_corruption.py", line 105, in sample_marginal
logits = self.marginal_prob(x=x, t=t, batch_idx=batch_idx, batch=batch)[0]
File "/scratch/e1554701/mattergen/mattergen/diffusion/corruption/d3pm_corruption.py", line 59, in marginal_prob
_, logits = d3pm.q_sample(
File "/scratch/e1554701/mattergen/mattergen/diffusion/d3pm/d3pm.py", line 703, in q_sample
logits = diffusion.get_qt_given_q0(q0=x_start, t=t, return_logits=True)
File "/scratch/e1554701/mattergen/mattergen/diffusion/d3pm/d3pm.py", line 528, in get_qt_given_q0
assert q0.dtype == torch.float32
AssertionError
Traceback (most recent call last):
File "/scratch/e1554701/mattergen/.venv/bin/mattergen-finetune", line 10, in <module>
sys.exit(mattergen_finetune())
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/main.py", line 90, in decorated_main
_run_hydra(
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 222, in run_and_report
raise ex
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 219, in run_and_report
return func()
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
lambda: hydra.run(
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "/scratch/e1554701/mattergen/mattergen/scripts/finetune.py", line 141, in mattergen_finetune
trainer.fit(
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 529, in fit
call._call_and_handle_interrupt(
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 41, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 91, in launch
return function(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 568, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 973, in _run
results = self._run_stage()
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1014, in _run_stage
self._run_sanity_check()
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1043, in _run_sanity_check
val_loop.run()
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 177, in _decorator
return loop_run(self, *args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 115, in run
self._evaluation_step(batch, batch_idx, dataloader_idx)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 375, in _evaluation_step
output = call._call_strategy_hook(trainer, hook_name, *step_kwargs.values())
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 291, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/strategies/ddp.py", line 336, in validation_step
return self.model(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward
else self._run_ddp_forward(*inputs, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
return self.module(*inputs, **kwargs) # type: ignore[index]
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/overrides/base.py", line 102, in forward
return self._forward_module.validation_step(*inputs, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/double.py", line 48, in validation_step
return self.module.validation_step(
File "/scratch/e1554701/mattergen/mattergen/diffusion/lightning_module.py", line 142, in validation_step
return self._calc_loss(val_batch, False)
File "/scratch/e1554701/mattergen/mattergen/diffusion/lightning_module.py", line 149, in _calc_loss
loss, metrics = self.diffusion_module.calc_loss(batch)
File "/scratch/e1554701/mattergen/mattergen/diffusion/diffusion_module.py", line 79, in calc_loss
noisy_batch, t = self._corrupt_batch(batch)
File "/scratch/e1554701/mattergen/mattergen/diffusion/diffusion_module.py", line 115, in _corrupt_batch
noisy_batch = self.corruption.sample_marginal(batch, t)
File "/scratch/e1554701/mattergen/mattergen/diffusion/corruption/multi_corruption.py", line 112, in sample_marginal
noisy_data = self._apply_corruption_fn(
File "/scratch/e1554701/mattergen/mattergen/diffusion/corruption/multi_corruption.py", line 149, in _apply_corruption_fn
return apply(
File "/scratch/e1554701/mattergen/mattergen/diffusion/corruption/multi_corruption.py", line 164, in apply
return {
File "/scratch/e1554701/mattergen/mattergen/diffusion/corruption/multi_corruption.py", line 165, in <dictcomp>
field_name: fn(
File "/scratch/e1554701/mattergen/mattergen/diffusion/corruption/d3pm_corruption.py", line 105, in sample_marginal
logits = self.marginal_prob(x=x, t=t, batch_idx=batch_idx, batch=batch)[0]
File "/scratch/e1554701/mattergen/mattergen/diffusion/corruption/d3pm_corruption.py", line 59, in marginal_prob
_, logits = d3pm.q_sample(
File "/scratch/e1554701/mattergen/mattergen/diffusion/d3pm/d3pm.py", line 703, in q_sample
logits = diffusion.get_qt_given_q0(q0=x_start, t=t, return_logits=True)
File "/scratch/e1554701/mattergen/mattergen/diffusion/d3pm/d3pm.py", line 528, in get_qt_given_q0
assert q0.dtype == torch.float32
AssertionError

**Desktop**
Linux HPC