Description
Describe the bug
I am trying to fine-tune the pretrained model chemical_system_energy_above_hull on my own property data.
For the YAML config, I reused the dft_mag_density one, since it was mentioned that a float-valued property config can be adapted for this kind of use.
Script I am running
source /scratch/e1554701/mattergen/.venv/bin/activate
export PROPERTY=ionic_conductivity_S_cm
export HYDRA_FULL_ERROR=1
mattergen-finetune adapter.pretrained_name=chemical_system_energy_above_hull data_module=alex_mp_20 +lightning_module/diffusion_module/model/property_embeddings@adapter.adapter.property_embeddings_adapt.$PROPERTY=$PROPERTY ~trainer.logger data_module.properties=["$PROPERTY"]
Last lines of output
Sanity Checking: 0it [00:00, ?it/s]
Sanity Checking: 0%| | 0/1 [00:00<?, ?it/s]
Sanity Checking DataLoader 0: 0%| | 0/1 [00:00<?, ?it/s]q0.dtype: torch.float64
q0.dtype: torch.float64
Error
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/lightning_fabric/__init__.py:36: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
__import__("pkg_resources").declare_namespace(__name__)
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'finetune': Defaults list is missing _self_. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information
warnings.warn(msg, UserWarning)
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
ret = run_job(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:67: UserWarning: Starting from v1.9.0, tensorboardX has been removed as a dependency of the pytorch_lightning package, due to potential conflicts with other packages in the ML ecosystem. For this reason, logger=True will use CSVLogger as the default logger, unless the tensorboard or tensorboardX packages are found. Please pip install lightning[extra] or one of them to enable TensorBoard support by default
warning_cache.warn(
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/lightning_fabric/__init__.py:36: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
__import__("pkg_resources").declare_namespace(__name__)
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'finetune': Defaults list is missing _self_. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information
warnings.warn(msg, UserWarning)
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
ret = run_job(
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
distributed_backend=nccl
All distributed processes registered. Starting with 2 processes
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [GPU-1004b653-b1d3-9674-3428-e0f2c6ffb7e6,GPU-1811d176-d6fe-7645-784b-f42b78cc1625]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [GPU-1004b653-b1d3-9674-3428-e0f2c6ffb7e6,GPU-1811d176-d6fe-7645-784b-f42b78cc1625]
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:28: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate.
warnings.warn("The verbose parameter is deprecated. Please use get_last_lr() "
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:28: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate.
warnings.warn("The verbose parameter is deprecated. Please use get_last_lr() "
Fitting property scalers: 0%| | 0/3 [00:00<?, ?it/s]
Fitting property scalers: 0%| | 0/3 [00:00<?, ?it/s]
Fitting property scalers: 67%|██████▋ | 2/3 [00:00<00:00, 18.54it/s]
Fitting property scalers: 67%|██████▋ | 2/3 [00:00<00:00, 18.77it/s]
Fitting property scalers: 100%|██████████| 3/3 [00:00<00:00, 27.15it/s]
Fitting property scalers: 100%|██████████| 3/3 [00:00<00:00, 27.49it/s]
| Name | Type | Params
0 | diffusion_module | DiffusionModule | 48.8 M
48.8 M Trainable params
22 Non-trainable params
48.8 M Total params
195.042 Total estimated model params size (MB)
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:432: PossibleUserWarning: The dataloader, val_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 112 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/__init__.py:696: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:451.)
_C._set_default_tensor_type(t)
/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/__init__.py:696: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:451.)
_C._set_default_tensor_type(t)
Error executing job with overrides: ['adapter.pretrained_name=chemical_system_energy_above_hull', 'data_module=alex_mp_20', '+lightning_module/diffusion_module/model/property_embeddings@adapter.adapter.property_embeddings_adapt.ionic_conductivity_S_cm=ionic_conductivity_S_cm', '~trainer.logger', 'data_module.properties=[ionic_conductivity_S_cm]']
Traceback (most recent call last):
File "/scratch/e1554701/mattergen/.venv/bin/mattergen-finetune", line 10, in <module>
sys.exit(mattergen_finetune())
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/main.py", line 90, in decorated_main
_run_hydra(
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 222, in run_and_report
raise ex
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 219, in run_and_report
return func()
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
lambda: hydra.run(
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "/scratch/e1554701/mattergen/mattergen/scripts/finetune.py", line 141, in mattergen_finetune
trainer.fit(
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 529, in fit
call._call_and_handle_interrupt(
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 41, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 91, in launch
return function(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 568, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 973, in _run
results = self._run_stage()
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1014, in _run_stage
self._run_sanity_check()
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1043, in _run_sanity_check
val_loop.run()
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 177, in _decorator
return loop_run(self, *args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 115, in run
self._evaluation_step(batch, batch_idx, dataloader_idx)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 375, in _evaluation_step
output = call._call_strategy_hook(trainer, hook_name, *step_kwargs.values())
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 291, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/strategies/ddp.py", line 336, in validation_step
return self.model(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward
else self._run_ddp_forward(*inputs, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
return self.module(*inputs, **kwargs) # type: ignore[index]
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/overrides/base.py", line 102, in forward
return self._forward_module.validation_step(*inputs, **kwargs)
File "/scratch/e1554701/mattergen/.venv/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/double.py", line 48, in validation_step
return self.module.validation_step(
File "/scratch/e1554701/mattergen/mattergen/diffusion/lightning_module.py", line 142, in validation_step
return self._calc_loss(val_batch, False)
File "/scratch/e1554701/mattergen/mattergen/diffusion/lightning_module.py", line 149, in _calc_loss
loss, metrics = self.diffusion_module.calc_loss(batch)
File "/scratch/e1554701/mattergen/mattergen/diffusion/diffusion_module.py", line 79, in calc_loss
noisy_batch, t = self._corrupt_batch(batch)
File "/scratch/e1554701/mattergen/mattergen/diffusion/diffusion_module.py", line 115, in _corrupt_batch
noisy_batch = self.corruption.sample_marginal(batch, t)
File "/scratch/e1554701/mattergen/mattergen/diffusion/corruption/multi_corruption.py", line 112, in sample_marginal
noisy_data = self._apply_corruption_fn(
File "/scratch/e1554701/mattergen/mattergen/diffusion/corruption/multi_corruption.py", line 149, in _apply_corruption_fn
return apply(
File "/scratch/e1554701/mattergen/mattergen/diffusion/corruption/multi_corruption.py", line 164, in apply
return {
File "/scratch/e1554701/mattergen/mattergen/diffusion/corruption/multi_corruption.py", line 165, in <dictcomp>
field_name: fn(
File "/scratch/e1554701/mattergen/mattergen/diffusion/corruption/d3pm_corruption.py", line 105, in sample_marginal
logits = self.marginal_prob(x=x, t=t, batch_idx=batch_idx, batch=batch)[0]
File "/scratch/e1554701/mattergen/mattergen/diffusion/corruption/d3pm_corruption.py", line 59, in marginal_prob
_, logits = d3pm.q_sample(
File "/scratch/e1554701/mattergen/mattergen/diffusion/d3pm/d3pm.py", line 703, in q_sample
logits = diffusion.get_qt_given_q0(q0=x_start, t=t, return_logits=True)
File "/scratch/e1554701/mattergen/mattergen/diffusion/d3pm/d3pm.py", line 528, in get_qt_given_q0
assert q0.dtype == torch.float32
AssertionError
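The traceback passes through pytorch_lightning/plugins/precision/double.py, and the debug print shows q0.dtype: torch.float64, so the assertion at d3pm.py line 528 is being hit by a double-precision tensor. Below is a minimal sketch of that mismatch in plain PyTorch. It assumes nothing about MatterGen's internals beyond the assert shown in the traceback; `get_qt_given_q0_stub` and `as_float32` are hypothetical names used only for illustration, not MatterGen functions.

```python
import torch


def get_qt_given_q0_stub(q0: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in that keeps only the dtype check from the traceback."""
    assert q0.dtype == torch.float32
    return q0


def as_float32(q0: torch.Tensor) -> torch.Tensor:
    """Cast a tensor to float32 (returns the input unchanged if it already is)."""
    return q0.to(torch.float32)


# One-hot atom types stored as float64, mirroring the printed
# `q0.dtype: torch.float64` (the values here are made up).
atom_types = torch.tensor([0, 2, 1])
q0_double = torch.nn.functional.one_hot(atom_types, num_classes=3).to(torch.float64)

# The float64 tensor trips the same kind of assertion as in the traceback.
try:
    get_qt_given_q0_stub(q0_double)
    failed = False
except AssertionError:
    failed = True

# After an explicit cast, the check passes and the values are unchanged.
q0_single = as_float32(q0_double)
result = get_qt_given_q0_stub(q0_single)
print(failed, result.dtype)
```

This only reproduces the symptom. Whether the tensor should be cast before it reaches d3pm, or whether the run should avoid 64-bit precision in the trainer altogether, is something I have not confirmed.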
**Desktop**
Linux HPC