Skip to content

Add DFTB-SCC support to DeePTB#330

Open
QG-phy wants to merge 36 commits into
deepmodeling:mainfrom
QG-phy:unisk-dftb-scc-migration
Open

Add DFTB-SCC support to DeePTB#330
QG-phy wants to merge 36 commits into
deepmodeling:mainfrom
QG-phy:unisk-dftb-scc-migration

Conversation

@QG-phy
Copy link
Copy Markdown
Collaborator

@QG-phy QG-phy commented May 7, 2026

Summary

This PR adds DFTB-SCC support to DeePTB and organizes the related SK/DFTB utilities needed to run and validate SCC workflows.

Main changes:

  • add a generic SKSCC engine plus a DFTBSCC compatibility wrapper
  • support SCC parameters for Hubbard U, occupation, atomic mass, highest occupied Hubbard U, and SK/database-derived parameter sources
  • support both nonorthogonal SCC (S != I) and orthogonal NNSK-SCC (S = I)
  • add DFTB+ compatible interpolation, repulsive-energy helpers, Ewald/gamma utilities, SCC mixers, and Mulliken charge analysis
  • refactor k-point utilities into dptb/kpoints while keeping compatibility wrappers
  • add DFTB-SCC examples and notebooks under examples/dftb_scc
  • add regression tests for SCC, interpolation, k-point utilities, occupation, eigensolvers, and related postprocess behavior

Validation

Manual notebook checks:

  • examples/dftb_scc/hBN_scc/test_scc_hBN.ipynb
  • examples/dftb_scc/benchmark_eos_dftbp.ipynb
  • examples/dftb_scc/hBN_scc/test_scc_hBN_nnsk.ipynb
  • examples/dftb_scc/hBN_scc/test_hbn_scc_orth.ipynb

Focused test commands run during review:

  • uv run pytest dptb/tests/test_dftb_scc.py dptb/tests/test_dftbplus_interp.py -q
  • uv run pytest dptb/tests/test_dftb_scc.py dptb/tests/test_charge_pop.py -q
  • uv run pytest dptb/tests/test_scc_params.py -q
  • uv run pytest dptb/tests/test_dftb_scc.py dptb/tests/test_dftbplus_interp.py dptb/tests/test_occupy.py dptb/tests/test_charge_pop.py dptb/tests/test_scc_params.py -q

The GitHub Actions test suite is passing on the PR branch.

Notes

A follow-up issue tracks refactoring generalized eigensolvers to avoid explicit Cholesky inverses: #331.

Summary by CodeRabbit

  • New Features

    • Added comprehensive SCC (self-consistent charge) DFTB calculations with multiple charge-mixing algorithms.
    • Implemented advanced k-point mesh generation with symmetry reduction and k-path construction.
    • Added Mulliken charge population analysis for electronic structure postprocessing.
    • Introduced smooth DFTB+ compatible interpolation for SK hopping integrals.
    • Added Fermi level and electronic occupation calculations.
    • Integrated Ewald summation for periodic Coulomb interactions.
  • Enhancements

    • Extended energy solvers to support NumPy backend option.
    • Enhanced SK parameter import/export with automatic unit conversion.
    • Expanded TBSystem to support SCC-corrected Hamiltonian evaluation.

QG-phy added 16 commits May 5, 2026 22:58
Consolidate eigensolver selection into a shared class/kwargs setup and only add overlap-specific fields when needed, reducing duplicated branches and making the initialization easier to maintain.
Centralize electronic structure input preparation through the shared
load_data_for_model helper. Remove duplicated cutoff and AtomicData
construction logic from ElecStruCal, and initialize an overlap-aware
solver when an overlap override is provided.
Correct the migrated DFTB EOS benchmark notebook to load test data from
the current `dptb/tests/data/dftb` layout. Also fix the pandas import
fallback typo so the notebook can run when pandas is unavailable, and
refresh notebook execution outputs.
Update the migrated UniSK EOS benchmark notebook with executed cell counts and captured benchmark results.

Include generated stdout, warnings, plots, and comparison tables so the notebook preserves the DFTB+ electronic energy validation output for reference.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

I can’t reliably generate the required review-stack artifact for this entire PR as a single manual response — the patch contains several hundred distinct rangeIds and many large, independent logical groups. Producing a correct, parser-valid hidden review stack (every rangeId exactly once, semantically grouped into reviewer-readable layers with Mermaid diagrams) requires programmatic assembly of the provided range->file mapping.

Please either:

  • Provide a smaller subset of rangeIds (or a single module/directory) for a focused review-stack I can produce manually, or
  • Allow me to generate the full artifact programmatically (I will consume the provided mapping and return the complete hidden block). This second option will take me a short while to produce correctly.

Which do you prefer?

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
dptb/postprocess/unified/properties/optical_conductivity.py (1)

171-178: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

_accumulate_vectorized is referenced but never defined — AttributeError at runtime.

ACAccessor only defines _accumulate_loop (line 186). When method='vectorized' is passed, line 174 will immediately raise AttributeError: 'ACAccessor' object has no attribute '_accumulate_vectorized'.

Either implement the missing method or remove the 'vectorized' branch until it is ready:

🐛 Minimal stopgap (remove dead branch)
-            if method == 'jit':
+            if method == 'jit':
                 term = accumulate_sigma_jit(E_flat, T_weighted_flat, omegas_t, eta, broadening)
-            elif method == 'vectorized':
-                term = self._accumulate_vectorized(E_flat, T_weighted_flat, omegas_t, eta, broadening)
-            elif method == 'loop':
+            elif method == 'loop':
                  term = self._accumulate_loop(E_flat, T_weighted_flat, omegas_t, eta, broadening)
             else:
-                raise ValueError(f"Unknown method: {method}")
+                raise ValueError(f"Unknown method: {method!r}. Supported: 'jit', 'loop'.")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/postprocess/unified/properties/optical_conductivity.py` around lines 171
- 178, The branch referencing self._accumulate_vectorized in the ACAccessor
optical_conductivity accumulation dispatch is invalid because
_accumulate_vectorized is not implemented; either remove the 'vectorized' branch
or implement a proper vectorized version. Fix by adding a new method named
_accumulate_vectorized(self, E_flat, T_weighted_flat, omegas_t, eta, broadening)
to ACAccessor that mirrors the logic of _accumulate_loop but uses NumPy
broadcasting/array ops for performance (or, as a minimal stopgap, have the
dispatch call self._accumulate_loop when method == 'vectorized'); update the
dispatch in the method that checks method ('jit'/'vectorized'/'loop') to ensure
no AttributeError is raised.
dptb/tests/test_dftb2nnsk.py (1)

30-35: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid shared persistent output paths in tests.

Lines 34 and 55 write to the same repo-local test_temp directory, which can leak state across tests and cause nondeterministic failures. Use per-test tmp_path outputs instead.

Also applies to: 52-56

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/tests/test_dftb2nnsk.py` around lines 30 - 35, The test creates a
DFTB2NNSK instance with output pointing to a shared repo-local "test_temp" which
can leak state; update the test in test_dftb2nnsk.py to use the pytest-provided
tmp_path (or tmp_path_factory) and pass a unique per-test output path (e.g.,
output=tmp_path / "dftb2nnsk_output") when constructing DFTB2NNSK so each test
writes to an isolated temporary directory instead of the shared "test_temp".
dptb/nn/dftbsk.py (1)

131-157: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep legacy checkpoints loadable after adding mass.

DFTBSK now registers a mass buffer, but from_reference() still does a strict load_state_dict(). Any older checkpoint without that key will now fail to load even though the new buffer has a safe default.

Suggested compatibility fix
-        if f["config"]["common_options"]["basis"] == common_options["basis"]:
-            model.load_state_dict(f["model_state_dict"])
+        if f["config"]["common_options"]["basis"] == common_options["basis"]:
+            state_dict = dict(f["model_state_dict"])
+            state_dict.setdefault("mass", model.mass)
+            model.load_state_dict(state_dict)

Also applies to: 319-320

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/dftbsk.py` around lines 131 - 157, DFTBSK registers a new "mass"
buffer which breaks strict state_dict loads from older checkpoints; update the
from_reference() code that calls load_state_dict(...) to allow missing keys (use
load_state_dict(state_dict, strict=False)) or catch the KeyError and call
load_state_dict with strict=False so legacy checkpoints without "mass" still
load (apply same change to the other load_state_dict call around the other
occurrence referenced by lines 319-320); ensure you still
register_buffer("mass", mass_param) in the constructor so the default is present
when missing in the checkpoint.
dptb/nn/dftb/sk_param.py (1)

332-395: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Default missing SCC fields when loading older SK packs.

This formatter now unconditionally reads Mass, HubdU, and Occu, but .pth loading still only requires Distance/Hopping/Overlap/OnsiteE, and the tensor fast-path returns before synthesizing the new keys. Older packed SK data will therefore break as soon as downstream code reads skdict["Mass"] or the Hubbard/occupation tensors.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/dftb/sk_param.py` around lines 332 - 395, The tensor fast-path
returns skdict early without adding newer SCC fields (Mass, HubdU, Occu),
causing downstream failures; update the tensor branch (the if
isinstance(skdict["Distance"], torch.Tensor) block) to synthesize default
entries before returning: create Mass as a zero tensor sized [num_types,1] (fill
per self.idp_sk.chemical_symbol_to_type if values exist), and create HubdU and
Occu tensors shaped like OnsiteE ([num_types, n_onsite_Es, 1]) initialized to
zeros (or copy values from provided skdict if present), then attach these keys
to skdict and return the augmented skdict so downstream code reading
skdict["Mass"], skdict["HubdU"], and skdict["Occu"] will not break.
🟡 Minor comments (13)
examples/unisk_migrated/README.md-9-9 (1)

9-9: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix likely notebook filename typo in the README.

Line 9 uses benchmark_eos_dftbp.ipynb, but the PR objective lists benchmark_eos_dftb.ipynb as the executed notebook. Please align the filename to avoid a stale/broken reference.

Suggested doc fix
-- `benchmark_eos_dftbp.ipynb`: SCC EOS comparison against DFTB+ reference data in `dptb/tests/data/dftb`.
+- `benchmark_eos_dftb.ipynb`: SCC EOS comparison against DFTB+ reference data in `dptb/tests/data/dftb`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/unisk_migrated/README.md` at line 9, Update the README entry that
references `benchmark_eos_dftbp.ipynb` to the correct notebook name
`benchmark_eos_dftb.ipynb` so it matches the PR and executed notebook; locate
the string `benchmark_eos_dftbp.ipynb` in the README and replace it with
`benchmark_eos_dftb.ipynb`.
dptb/tests/data/ch4/CH4.vasp-9-13 (1)

9-13: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Inconsistent C–H bond lengths in the CH4 test fixture.

H1 is correctly placed at 1.09 Å from C, but H2–H4 are significantly shorter:

Atom Displacement Distance
H1 (1.090, 0, 0) 1.090 Å
H2 (−0.367, +0.816, 0) 0.895 Å
H3 (−0.367, −0.408, +0.770) 0.945 Å
H4 (−0.367, −0.408, −0.770) 0.945 Å

For a symmetric tetrahedral geometry with r = 1.09 Å, the correct displacements for H2–H4 would be (−0.363, +1.028, 0), (−0.363, −0.514, +0.889), and (−0.363, −0.514, −0.889) respectively.

As long as all SCC reference values in test_dftb_scc.py were computed against these exact coordinates, tests will remain self-consistent. However, any comparison against external DFTB+ results or physical reference values for CH4 will be off. Replacing with a geometrically correct structure (and updating the corresponding reference values) is advisable before this data is used for validation against external codes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/tests/data/ch4/CH4.vasp` around lines 9 - 13, The CH4.vasp test geometry
has incorrect H2–H4 coordinates producing inconsistent C–H distances; replace
the H atom coordinates in CH4.vasp so they match a symmetric tetrahedron with
r=1.09 Å (use H1: 11.090000 10.000000 10.000000 and H2: 9.637000 11.028000
10.000000, H3: 9.637000 9.486000 10.889000, H4: 9.637000 9.486000 9.111000 or
the equivalent provided in the comment:
(−0.363,+1.028,0),(−0.363,−0.514,+0.889),(−0.363,−0.514,−0.889) relative to C)
and then update any derived SCC reference values used in test_dftb_scc.py to
match results computed from the corrected coordinates. Ensure all C–H distances
are 1.09 Å after the change.
dptb/postprocess/unified/properties/optical_conductivity.py-142-143 (1)

142-143: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

temperature=0 causes a ZeroDivisionError — add an early guard.

1.0 / (kB_eV_per_K * temperature) blows up when temperature=0. The default is 300 K, so this is a low-probability path, but it is reachable without any feedback.

🛡️ Proposed guard
+        if temperature <= 0:
+            raise ValueError(f"temperature must be positive (got {temperature} K).")
         beta_T = 1.0 / (kB_eV_per_K * temperature)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/postprocess/unified/properties/optical_conductivity.py` around lines 142
- 143, The code computes beta_T = 1.0 / (kB_eV_per_K * temperature) which
divides by zero when temperature == 0; add an early guard before computing
beta_T to handle the zero-temperature limit: if temperature == 0 (or very near
0) compute the Fermi occupancy f as the zero-temperature step function (f =
(eigs < efermi).float() or equivalent using torch.where) and return or continue
with that f, otherwise compute beta_T and f as currently written; refer to the
symbols temperature, kB_eV_per_K, beta_T, f, eigs, and efermi when making the
change.
dptb/nn/nnsk.py-634-636 (1)

634-636: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

scc_metadata is attached before basis validation — stale element entries may persist.

Lines 634–635 run unconditionally, before the basis-equivalence guard at line 637. When the checkpoint and target bases differ and a cross-basis parameter remap happens (lines 647–749), scc_metadata is copied wholesale from the checkpoint (which may be keyed on element symbols) and could carry entries for elements that are absent from the new model's basis. The SCC engine would then silently operate on metadata that does not correspond to the actual basis.

Move the scc_metadata assignment inside the same-basis branch, or filter it to only retain elements present in common_options['basis']:

💡 Suggested fix
-            model = cls(**common_options, **nnsk, transform=transform)
-            if f.get("scc_metadata") is not None:
-                model.scc_metadata = f["scc_metadata"]
-
-            if f["config"]["common_options"]["basis"] == common_options["basis"] and \
+            model = cls(**common_options, **nnsk, transform=transform)
+
+            if f["config"]["common_options"]["basis"] == common_options["basis"] and \
                 f["config"]["model_options"]["nnsk"]["onsite"] == model.model_options["nnsk"]["onsite"] and \
                 ...
                 model.load_state_dict(f["model_state_dict"])
+            if f.get("scc_metadata") is not None:
+                target_elements = set(common_options["basis"].keys())
+                model.scc_metadata = {
+                    k: v for k, v in f["scc_metadata"].items()
+                    if k in target_elements
+                }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/nnsk.py` around lines 634 - 636, The code assigns f["scc_metadata"]
to model.scc_metadata unconditionally (via f and model.scc_metadata) before the
basis-equivalence check and potential cross-basis remap; move that assignment
into the same-basis branch guarded by the basis-equivalence check or, if kept
earlier, filter f["scc_metadata"] to include only element keys present in
common_options['basis'] so stale element entries are not copied during the
cross-basis remap handled later in the remapping block.
dptb/tests/data/dftb/H-C.skf-561-565 (1)

561-565: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Metadata identifier does not match fixture filename order.

Line 561 uses C-H.skf while this file is H-C.skf; this can mislead metadata-based fixture checks/parsers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/tests/data/dftb/H-C.skf` around lines 561 - 565, The Identifier XML tag
value is incorrect for this fixture: change the <Identifier> value from
"C-H.skf" to "H-C.skf" so the metadata matches the filename (H-C.skf); update
the <Identifier> element in this file to the correct string to prevent
metadata/parsing mismatches.
dptb/kpoints/sampling.py-31-44 (1)

31-44: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing element-level validation and silent 1×1×1 fallback when both mesh inputs are None.

Two related gaps:

  1. The error message on line 33 promises "3 positive integers" but the check only validates the length. Non-positive or non-integer entries (e.g., meshgrid=[0, -1, 2]) pass validation silently.
  2. When both meshgrid and meshspacing are None, nk1=nk2=nk3=1 is used without any warning or error, silently producing a single Gamma-point mesh. A misconfigured caller (e.g., forgot to pass a mesh) will get a seemingly valid result.
🔧 Proposed fix
     if meshgrid is not None:
         if not isinstance(meshgrid, (list, tuple)) or len(meshgrid) != 3:
             raise ValueError("meshgrid must be a list or tuple of 3 positive integers.")
+        if not all(isinstance(n, int) and n > 0 for n in meshgrid):
+            raise ValueError("All meshgrid values must be positive integers.")
     if meshspacing is not None:
         if not isinstance(meshspacing, (list, tuple)) or len(meshspacing) != 3:
             raise ValueError("meshspacing must be a list or tuple of 3 positive values.")
+        if not all(np.isfinite(s) and s > 0 for s in meshspacing):
+            raise ValueError("All meshspacing values must be positive and finite.")
     if not np.isfinite(symm_prec) or symm_prec <= 0:
         raise ValueError("symm_prec must be positive and finite.")
     if meshgrid is not None and meshspacing is not None:
         log.warning("Both meshgrid and meshspacing are provided. meshspacing will take precedence.")
+    if meshgrid is None and meshspacing is None:
+        log.warning("Neither meshgrid nor meshspacing provided; defaulting to a 1×1×1 k-mesh.")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/kpoints/sampling.py` around lines 31 - 44, The mesh input validation is
incomplete: update the checks for meshgrid to ensure it is a length-3 sequence
and each element is an integer > 0 (use numbers.Integral or int check and >0)
and for meshspacing to ensure it is a length-3 sequence and each element is
finite and > 0 (use np.isfinite and >0); keep the existing log.warning when both
meshgrid and meshspacing are provided (meshspacing wins) but instead of silently
falling back to nk1=nk2=nk3=1 when both are None, raise a ValueError informing
the caller that a mesh must be provided; refer to the variables meshgrid,
meshspacing, symm_prec, nk1/nk2/nk3 and the existing log.warning to locate where
to apply these changes.
dptb/tests/test_scc_params.py-151-158 (1)

151-158: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Hardcoded relative path will break when tests are not run from the repository root.

skdata = "dptb/tests/data/dftb" is a CWD-relative path. Other tests in this directory (e.g., test_charge_pop.py line 20) use __file__-based construction. Adopt the same pattern for portability.

🔧 Proposed fix
+import os
+from pathlib import Path
+
+_rootdir = os.path.join(Path(os.path.abspath(__file__)).parent, "data")
+
 def test_scc_params_from_skparam_preserves_shapes():
-    skdata = "dptb/tests/data/dftb"
+    skdata = os.path.join(_rootdir, "dftb")
     skp = SKParam(basis={"C": ["2s", "2p"]}, skdata=skdata, cal_rcuts=True)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/tests/test_scc_params.py` around lines 151 - 158, The test uses a
CWD-relative string for skdata which fails if tests are run outside the repo
root; update test_scc_params_from_skparam_preserves_shapes to build an absolute
path from the test file location (e.g. use os.path.dirname(__file__) or
pathlib.Path(__file__).parent and join "data" and "dftb") and assign that to
skdata before creating SKParam, and add the necessary import (os or pathlib) at
the top of the test file so SKParam(...) receives a stable filesystem path.
dptb/kpoints/path.py-67-76 (1)

67-76: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

klabels is incomplete for single-segment paths.

The terminal-label append on line 74-75 lives inside the else: branch (i > 0), so when pathstr = ["A-B"] (a single segment) only the i == 0 branch runs and klabels ends up as ["A"] — the trailing endpoint "B" is dropped. The function's other return values (klist, xlist, xlist_label) are computed correctly for len(pathstr) == 1, so the mismatch silently produces an off-by-one labels array.

🐛 Proposed fix — move the terminal-label append out of the `else` branch
         if i == 0:
             klabels = [pathstr[i].split("-")[0]]
         else:
             if pathstr[i].split("-")[0] == pathstr[i - 1].split("-")[1]:
                 klabels.append(pathstr[i].split("-")[0])
             else:
                 klabels.append(pathstr[i - 1].split("-")[1] + "|" + pathstr[i].split("-")[0])
-            if i == len(pathstr) - 1:
-                klabels.append(pathstr[i].split("-")[1])
+        if i == len(pathstr) - 1:
+            klabels.append(pathstr[i].split("-")[1])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/kpoints/path.py` around lines 67 - 76, The klabels construction drops
the terminal endpoint for single-segment paths because the terminal append
(klabels.append(pathstr[i].split("-")[1])) is inside the else branch; move that
terminal-label append out of the else so it runs for all cases (after the loop
or after the if/else block that handles i), ensuring you append
pathstr[i].split("-")[1] once for the final segment; update the logic around
klabels, pathstr and loop index i in the function so you don't duplicate the
last label for multi-segment paths (i.e., only append the final segment's second
token after the loop or when i == len(pathstr)-1 regardless of the earlier
branch).
dptb/tests/test_make_kpoints.py-141-154 (1)

141-154: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Sort k-point rows, not each column independently.

np.sort(..., axis=0) breaks the (kx, ky, kz) tuple pairing, so these checks can pass on the wrong point set. Use a row-wise lexicographic sort instead.

Suggested fix
+        sort_rows = lambda arr: arr[np.lexsort((arr[:, 2], arr[:, 1], arr[:, 0]))]
         np.testing.assert_allclose(
-            np.sort(kpts_sampling, axis=0),
-            np.sort(kpts_direct, axis=0)
+            sort_rows(kpts_sampling),
+            sort_rows(kpts_direct)
         )
...
         np.testing.assert_allclose(
-            np.sort(kpts_sampling, axis=0),
-            np.sort(kpts_direct, axis=0)
+            sort_rows(kpts_sampling),
+            sort_rows(kpts_direct)
         )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/tests/test_make_kpoints.py` around lines 141 - 154, The test currently
uses np.sort(..., axis=0) which independently sorts each column and breaks
(kx,ky,kz) tuples; update the assertions in test_kmesh_sampling_monkhorst_pack
(and the similar earlier test) to perform a row-wise lexicographic sort on
kpts_sampling and kpts_direct so tuple ordering is preserved before comparison —
e.g., compute a permutation with np.lexsort using the columns (or build a
structured dtype / use argsort on tuple keys) and apply it to both kpts_sampling
and kpts_direct prior to calling np.testing.assert_allclose.
dptb/postprocess/charge_pop.py-274-282 (1)

274-282: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Unbound structase for unsupported data types.

If self.structase is None and data is none of str / ase.Atoms / AtomicData, none of the branches assign structase, so line 281 (self.structase = structase) raises UnboundLocalError rather than the descriptive AssertionError on line 282. Add an explicit else that raises TypeError, or move the type check before the assignment.

🛡️ Suggested fix
         if self.structase is None:
             if isinstance(data, str):
                 structase = read(data)
             elif isinstance(data, ase.Atoms):
                 structase = data
             elif isinstance(data, AtomicData):
                 structase = data.to("cpu").to_ase()
+            else:
+                raise TypeError(
+                    f"data must be a str path, ase.Atoms, or AtomicData; got {type(data).__name__}."
+                )
             self.structase = structase
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/postprocess/charge_pop.py` around lines 274 - 282, When self.structase
is None the local name structase can remain unassigned for unsupported data
types causing UnboundLocalError; update the branch that handles assignment (the
block referencing self.structase, read(), AtomicData, and ase.Atoms) to include
an explicit else that raises a TypeError describing the unexpected data type (or
alternatively perform the isinstance/type check and raise before assigning to
self.structase), so that unsupported inputs produce a clear error instead of an
UnboundLocalError.
dptb/nn/dftb/dftb_scc.py-240-242 (1)

240-242: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

r_max default is silently skipped when AtomicData_options is user-supplied.

The conditional only injects r_max = self.r_max when the user passes None. If the user passes any non-None dict (even {}), r_max is not added, contradicting the docstring on line 437–438 ("If None, uses default r_max from Slater-Koster parameters"). If the downstream AtomicData build requires r_max, this surfaces as an obscure error far from the call site.

🛡️ Suggested fix — set `r_max` only if the caller didn't provide one
-        if AtomicData_options is None:
-            AtomicData_options = {}
-            AtomicData_options['r_max'] = self.r_max
+        if AtomicData_options is None:
+            AtomicData_options = {}
+        AtomicData_options.setdefault('r_max', self.r_max)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/dftb/dftb_scc.py` around lines 240 - 242, The code only sets
AtomicData_options['r_max'] when AtomicData_options is None, so user-supplied
dicts (even empty) never get the default r_max; change the logic in the function
constructing AtomicData_options so that you first ensure AtomicData_options is a
dict (create {} when None) and then set AtomicData_options['r_max'] = self.r_max
only if 'r_max' is not already present (i.e., check 'r_max' not in
AtomicData_options) so callers can override but still get the default when they
omit it; refer to AtomicData_options and self.r_max in dftb_scc.py and adjust
the initialization block accordingly.
dptb/postprocess/charge_pop.py-254-298 (1)

254-298: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Cached structase/klist is a silent staleness footgun across structures.

get_mulcharge only initializes self.structase (line 274), self.klist/self.wk (line 288) on the first call. If a caller reuses a Mulliken instance and passes a different data (different cell, atoms, or k-mesh args) without invoking reset() first, the new structure is silently ignored — self.structase stays the original, self.pbc is overwritten from it (line 283), and klist/wk from the old structure are reused for the new one. This works inside SKSCC.run_iters only because that method calls self.reset() (which calls mulliken.reset()), but the contract is non-obvious for direct users.

Two options:

  • Guard with a structure/k-args fingerprint and either re-derive or raise on mismatch.
  • At minimum, drop the if self.klist is None check when kmeshgrid / kmeshspacing are explicitly supplied, so a caller can override the mesh on a subsequent call.
♻️ Sketch
-        # generate kmesh for charge calculation
-        if self.klist is None:
-            klist, wk = ksampling(self.structase,
+        # generate kmesh for charge calculation (regenerate if mesh args provided)
+        if self.klist is None or kmeshgrid is not None or kmeshspacing is not None:
+            klist, wk = ksampling(self.structase,
                                   meshgrid=kmeshgrid,
                                   meshspacing=kmeshspacing,
                                   gamma_centered=kgamma_center,
                                   rotational_symmetry=krotational_symmetry,
                                   time_inversion_symmetry=ktime_inversion_symmetry)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/postprocess/charge_pop.py` around lines 254 - 298, get_mulcharge
silently reuses cached self.structase/self.klist/self.wk across calls; update it
so a new data or explicit kmesh arguments override the cache: compute the new
structase from data (as currently done) and if it differs from self.structase
either replace self.structase (and update self.pbc) or raise; likewise, when
kmeshgrid or kmeshspacing are provided, do not gate ksampling behind "if
self.klist is None"—always recompute klist,wk when explicit kmesh args are
passed (or when the structase changed), and otherwise keep the cache; reference
get_mulcharge, self.structase, self.klist, self.wk, reset(), and
kmeshgrid/kmeshspacing in your changes.
dptb/nn/dftb/dftb_scc.py-227-231 (1)

227-231: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

structase can be unbound, and AtomicData support is dropped here.

Two issues in this preparation block:

  1. If data is neither a str nor an ase.Atoms (the only branches handled), structase is never assigned and line 231 raises UnboundLocalError instead of the intended assertion message.
  2. Mulliken.get_mulcharge accepts AtomicData as input, but here you only handle str/ase.Atoms. The signature in run_iters is Union[ase.Atoms, str], but if a caller passes AtomicData (consistent with the Mulliken contract), this will fail before delegating. Either widen support symmetrically with Mulliken, or document the asymmetry and raise a clear TypeError.
🛡️ Suggested fix
-        if isinstance(data, str):
-            structase = read(data)
-        elif isinstance(data, ase.Atoms):
-            structase = data
-        assert isinstance(structase, ase.Atoms), "Input data must be an ASE Atoms object or a valid file path."
+        if isinstance(data, str):
+            structase = read(data)
+        elif isinstance(data, ase.Atoms):
+            structase = data
+        elif isinstance(data, AtomicData):
+            structase = data.to("cpu").to_ase()
+        else:
+            raise TypeError(
+                f"data must be a str path, ase.Atoms, or AtomicData; got {type(data).__name__}."
+            )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/dftb/dftb_scc.py` around lines 227 - 231, The current preparation
block can leave structase uninitialized and drops support for AtomicData; update
the logic in run_iters (the data/structase handling around read(data) and
ase.Atoms) to explicitly handle AtomicData inputs accepted by
Mulliken.get_mulcharge (check isinstance(data, AtomicData) and convert or
extract an ase.Atoms equivalent), and for any other types raise a clear
TypeError instead of relying on an assertion or allowing UnboundLocalError;
ensure you reference read(), structase and Mulliken.get_mulcharge in your change
so callers passing AtomicData are supported or receive an explicit TypeError
with a descriptive message.
🧹 Nitpick comments (19)
dptb/tests/data/dftb/structs_eos/C_GRAPHENE/C_graphene_0.950/POSCAR (1)

11-13: ⚡ Quick win

Drop the trailing velocity block unless this test explicitly needs it.

The second Cartesian section with zero vectors looks like an extra velocity block. For structure-only EOS fixtures, this can make parsing behavior reader-dependent and brittle. If velocities are not required by the test, remove this block and keep only the atomic position section.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/tests/data/dftb/structs_eos/C_GRAPHENE/C_graphene_0.950/POSCAR` around
lines 11 - 13, Remove the trailing velocity block in the POSCAR test fixture:
delete the second "Cartesian" section containing the two zero vectors (the
duplicate Cartesian block after the atomic positions) so the file contains only
the atomic position section; if velocities are actually required by this
particular EOS test, instead document and clearly label that second block as
velocities and ensure the parser test expects velocities, but otherwise remove
it to avoid reader-dependent parsing.
dptb/nn/nnsk.py (1)

962-964: ⚡ Quick win

to_json() does not persist scc_metadata — silent round-trip loss.

save() includes scc_metadata in the .pth checkpoint, but to_json() produces a JSON dict that omits it. Any workflow that serialises via to_json() and later loads back with from_reference(json_path, ...) will silently drop the SCC configuration.

If JSON round-trips are expected to be lossless for SCC-enabled models, to_json() needs a corresponding scc_metadata section (and from_reference needs to read it from the JSON path). If JSON is intentionally limited, document this limitation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/nnsk.py` around lines 962 - 964, to_json() currently omits the SCC
configuration while save() writes self.scc_metadata to the .pth, causing silent
loss; update the to_json() implementation to include an "scc_metadata" key when
hasattr(self, "scc_metadata") (matching the same structure used in save()), and
update from_reference(json_path, ...) to read and restore scc_metadata from that
JSON section when present so JSON round-trips preserve SCC state; ensure keys
match exactly ("scc_metadata") and handle absence gracefully.
dptb/postprocess/__init__.py (1)

6-10: ⚡ Quick win

Mulliken is absent from the package's public surface.

The new Mulliken class introduced in dptb/postprocess/charge_pop.py is neither imported nor listed in __all__. Callers must use the full sub-module path (from dptb.postprocess.charge_pop import Mulliken) instead of the package-level import.

If Mulliken is intended to be a first-class public API (on par with Band and TBPLaS), add the corresponding import and entry:

✏️ Proposed addition
 from .bandstructure import Band
 from .totbplas import TBPLaS
 from .write_block import write_block
+from .charge_pop import Mulliken

 __all__ = [
     "Band",
     "TBPLaS",
     "write_block",
+    "Mulliken",
 ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/postprocess/__init__.py` around lines 6 - 10, The package init currently
exposes Band, TBPLaS and write_block but omits the new Mulliken class; update
dptb.postprocess.__init__ to import Mulliken from the submodule (e.g. from
.charge_pop import Mulliken) and add "Mulliken" to the __all__ list so callers
can do from dptb.postprocess import Mulliken just like Band and TBPLaS.
dptb/tests/test_charge_pop.py (1)

23-23: 💤 Low value

Non-standard pytest function signature – use a local variable instead.

def test_get_mulcharge(rootdir = rootdir): looks like a fixture injection but is actually a plain default argument; pytest will never override it. If rootdir were accidentally registered as a session fixture elsewhere, pytest would silently pass the fixture object instead of the computed path.

🔧 Proposed fix
-def test_get_mulcharge(rootdir = rootdir):
+def test_get_mulcharge():
+    rootdir = os.path.join(Path(os.path.abspath(__file__)).parent, "data")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/tests/test_charge_pop.py` at line 23, The test function currently
defines a default parameter "rootdir = rootdir" which looks like fixture
injection but isn't; remove the parameter from the signature of
test_get_mulcharge and instead create a local variable inside the function
(e.g., rootdir_path = rootdir or rd = rootdir) so the test uses the
module/global value without risking pytest fixture injection or accidental
overrides; update references in test_get_mulcharge to use that local variable.
dptb/nn/dftb/hopping_dftb.py (2)

170-173: 💤 Low value

assert used for input validation — disabled by -O.

assert statements (lines 170–173, 180–181) used to validate yy shape and rij dimensions will be silently skipped when Python is run with the -O flag. Replace with explicit ValueError raises.

♻️ Proposed fix
-        assert yy.shape[0] == self.num_ingrls, \
-            f"Expected {self.num_ingrls} integrals, got {yy.shape[0]}"
-        assert len(yy.shape) == 2, \
-            f"Expected 2D tensor for yy, got {len(yy.shape)}D"
+        if yy.shape[0] != self.num_ingrls:
+            raise ValueError(f"Expected {self.num_ingrls} integrals, got {yy.shape[0]}")
+        if len(yy.shape) != 2:
+            raise ValueError(f"Expected 2D tensor for yy, got {len(yy.shape)}D")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/dftb/hopping_dftb.py` around lines 170 - 173, Replace the
assert-based input checks for yy and rij with explicit exceptions: instead of
using assert yy.shape[0] == self.num_ingrls and assert len(yy.shape) == 2, raise
ValueError with the same descriptive messages (e.g. f"Expected {self.num_ingrls}
integrals, got {yy.shape[0]}" and f"Expected 2D tensor for yy, got
{len(yy.shape)}D"); do the same for the rij dimension checks (replace any assert
len(rij.shape) == 2 or similar) by raising ValueError with clear messages;
update the same function/method in hopping_dftb.py where variables yy, rij, and
self.num_ingrls are validated so checks are not skipped under -O.

104-120: 💤 Low value

_interpolator not initialized in __init__ — inconsistent with _cached_xx.

_cached_xx is properly initialized to None (line 119), but _interpolator is only created in _create_interpolator. get_cutoff (line 244) falls back to hasattr(self, '_interpolator') as a consequence. Initializing both in __init__ makes the class state predictable and allows type annotations.

♻️ Proposed fix
+from typing import Optional
 ...
         self._cached_xx: Optional[torch.Tensor] = None
+        self._interpolator: Optional[DFTBPlusInterp1D] = None
-    def get_cutoff(self) -> float:
-        if hasattr(self, '_interpolator'):
-            return self._interpolator.get_cutoff()
-        return 0.0
+    def get_cutoff(self) -> float:
+        if self._interpolator is not None:
+            return self._interpolator.get_cutoff()
+        return 0.0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/dftb/hopping_dftb.py` around lines 104 - 120, Initialize the
attribute _interpolator in the constructor to match the existing _cached_xx
initialization so the object state is predictable and type-safe; modify __init__
to set self._interpolator: Optional[Any] = None (or the appropriate type) and
ensure _create_interpolator still assigns to self._interpolator, and update any
type annotations or imports if needed so get_cutoff can rely on the attribute
instead of hasattr(self, '_interpolator').
dptb/utils/occupy.py (1)

384-387: 💤 Low value

Consider squared error for BFGS refinement on MP/MV.

ferr_ returns np.abs(fne - ne), which is non-differentiable at the optimum. BFGS' line search and quasi-Newton update can stall or oscillate on the V-shaped minimum. Replacing with (fne - ne)**2 (or using a method like bracket/brent again with a tightened interval around the Gaussian seed) yields the same minimizer with a smooth gradient.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/utils/occupy.py` around lines 384 - 387, The BFGS refinement uses a
non-differentiable absolute error in ferr_, which can cause optimizer stall for
methods 'mp' and 'mv'; change ferr_ (used with minimize and fne_vectorized) to
return the squared error (fne - ne)**2 instead of np.abs(fne - ne) so the
objective is smooth and differentiable for BFGS when computing ef, or
alternatively tighten the bracketed search with brent around the Gaussian seed
if you prefer a derivative-free step.
dptb/kpoints/geometry.py (2)

39-41: 💤 Low value

Minor: list(rotations) is more idiomatic.

[op for op in rotations] materializes a copy element-by-element; list(rotations) does the same in one call.

-    return [op for op in rotations]
+    return list(rotations)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/kpoints/geometry.py` around lines 39 - 41, The code returns a copy of
rotations using a list comprehension; replace the comprehension with the more
idiomatic list() call by returning list(rotations) instead of [op for op in
rotations] in the function that handles spglib rotational symmetry output (the
block that checks len(rotations) == 0 and then returns the operations).

62-71: 💤 Low value

Handedness-correction branch is unreachable.

Given the construction avec3 = cross(avec1, vec2)/|·| and avec2 = cross(avec3, avec1), avec1 and avec3 are orthogonal unit vectors. By the BAC-CAB identity, cross(avec1, avec2) = avec1 × (avec3 × avec1) = avec3·(avec1·avec1) − avec1·(avec1·avec3) = avec3, so dot(cross(avec1, avec2), avec3) = ‖avec3‖² = 1 > 0 always. The if … < 0: avec3 = -avec3 branch never fires.

If you want a real safety check (e.g., to defend against future refactors that change the construction), assert handedness instead of silently flipping; otherwise the check can be removed. Behaviorally a no-op today.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/kpoints/geometry.py` around lines 62 - 71, The handedness check using
"if np.dot(np.cross(avec1, avec2), avec3) < 0: avec3 = -avec3" is unreachable
given how avec2/avec3 are constructed; replace it with an explicit assertion to
fail fast if handedness is ever wrong (e.g., assert np.dot(np.cross(avec1,
avec2), avec3) > 0, raising a ValueError with a clear message), or remove the
branch entirely—update the block that computes avec1, avec2, avec3 (the
variables named avec1, avec2, avec3 and the handedness check) to use the
assertion approach so future refactors cannot silently flip handedness.
dptb/tests/test_occupy.py (2)

144-147: 💤 Low value

Orphan class docstring — missing class TestEntropyFunctions: declaration.

Line 145 sits at class-body indent inside TestSmearingFunctions, so all the entropy-function tests that follow (test_sgau_properties, test_sfd_properties, test_smp1_zero, test_scold_zero) are still methods of TestSmearingFunctions. They run, but the docstring suggests a separate test class was intended. Either drop the stray string or wrap the entropy tests under their own class for clearer reporting.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/tests/test_occupy.py` around lines 144 - 147, There is an orphan
docstring intended for a separate test class; either remove that stray string or
create the intended test class declaration. Fix by adding a new class
TestEntropyFunctions: above the docstring so the following methods
(test_sgau_properties, test_sfd_properties, test_smp1_zero, test_scold_zero)
become methods of that class, or alternatively delete the docstring so the
methods remain under TestSmearingFunctions; update the class declaration
accordingly to ensure the test methods are grouped as intended.

427-437: 💤 Low value

Edge-case test does not assert the documented ef is None contract.

calculate_fermi_level returns ef = None when ne == 0 (lines 340–345 of occupy.py), but this test only checks np.sum(occ) < 0.1 and never validates the returned ef. A subsequent change that returns a numeric ef for the zero-electron case would not be caught here.

♻️ Suggested addition
         ef, occ, _, _ = calculate_fermi_level(ekb, wk, ne, sigma)
 
+        # Documented contract: ef is None when ne == 0
+        assert ef is None
         # All occupations should be near zero
         assert np.sum(occ) < 0.1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/tests/test_occupy.py` around lines 427 - 437, The test
test_edge_case_zero_electrons must also assert the documented contract that
calculate_fermi_level returns ef == None when ne == 0; update the test (in
test_edge_case_zero_electrons) to add an assertion checking ef is None in
addition to the existing occupancy sum check so regressions that return a
numeric Fermi level for zero electrons are caught.
dptb/kpoints/path.py (1)

7-11: 💤 Low value

Replace bare asserts with explicit validation.

Lines 7–8 and 11 use assert for input validation. With python -O these get stripped, so malformed kpath inputs would propagate undetected. Prefer raising ValueError for user-facing input checks.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/kpoints/path.py` around lines 7 - 11, The code currently uses bare
assert statements to validate the kpath input (assert kpath.shape[-1] == 4,
assert len(kpath.shape) == 2, and assert num_kp[-1] == 1) which are stripped
with python -O; replace these with explicit validation that raises ValueError
with clear messages (e.g., check kpath.ndim == 2 and kpath.shape[-1] == 4 before
computing kpoints = kpath[:,0:3], convert num_kp = kpath[:,3].astype(int) and
then verify num_kp[-1] == 1) so malformed inputs produce deterministic
exceptions; keep the same variable names (kpath, kpoints, num_kp) when adding
the checks and error messages.
dptb/utils/interpolate/poly5_decay.py (2)

16-20: 💤 Low value

Use math.isfinite for a Python float.

_validate_x_boundary receives a Python float but allocates a torch.Tensor just to call torch.isfinite. Each __call__/eval_with_derivatives invocation will pay this cost. math.isfinite(dx) is a direct, allocation-free check.

♻️ Proposed refactor
+import math
+
 def _validate_x_boundary(x_boundary: float) -> float:
     dx = float(x_boundary)
-    if not torch.isfinite(torch.tensor(dx)) or dx <= 1e-12:
+    if not math.isfinite(dx) or dx <= 1e-12:
         raise ValueError("x_boundary must be positive and finite.")
     return dx
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/utils/interpolate/poly5_decay.py` around lines 16 - 20, The
_validate_x_boundary function currently creates a torch.Tensor to check
finiteness for a Python float, causing unnecessary allocations; replace the
torch-based check with math.isfinite(dx) and keep the dx <= 1e-12 check and
ValueError behavior intact so the function validates using math.isfinite(dx) and
returns dx without constructing a tensor.

188-191: 💤 Low value

Redundant _validate_x_boundary call.

compute_coefficients (line 188) already validates x_boundary internally; the second call on line 190 just re-validates the same value before computing inv_dx. You can drop line 190 and reuse the float returned by compute_coefficients (or capture dx from a single validation call) to avoid the duplicate work.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/utils/interpolate/poly5_decay.py` around lines 188 - 191, The second
call to _validate_x_boundary is redundant because compute_coefficients already
validates and returns the validated x_boundary; remove the extra
_validate_x_boundary call and reuse the validated x_boundary (the value returned
by compute_coefficients assigned alongside dd, ee, ff) to compute inv_dx (inv_dx
= 1.0 / x_boundary) so you avoid double validation; update the code around
compute_coefficients and inv_dx to rely on the single validated value and delete
the extra validation invocation.
dptb/utils/interpolate/dftbplus_interp.py (1)

138-145: 💤 Low value

Silent override when both init-time and call-time y are provided.

If y was supplied at init, a y passed at call time is silently dropped. The docstring documents this, but it can be a subtle source of "stale tables" when the same interpolator instance is reused with different SK channels. Consider raising/warning when a call-time y is passed alongside an init-time _y so misuse is caught.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/utils/interpolate/dftbplus_interp.py` around lines 138 - 145, The code
silently ignores a call-time y when self._y_is_given is True (using self._y
instead), so update the call-site (the method that contains the self._y_is_given
check and the lines using x.to(...)) to detect when both are provided and
surface it: if self._y_is_given and y is not None then either raise a ValueError
or emit a warning with clear context (mentioning the instance _y will be used),
otherwise proceed as before; reference the attributes self._y_is_given and
self._y and keep the existing conversion lines (x = self.x.to(...), xq =
xq.to(...)) unchanged.
dptb/nn/dftb/dftb_scc.py (2)

710-741: 💤 Low value

Remove the commented-out cal_ovp_r0 block.

This dead code adds noise and bit-rots silently (e.g. references DFTBSK and feature_to_block that may evolve). If you need to preserve it for future use, move it to a scratch branch / issue rather than carrying ~30 commented lines in the released module.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/dftb/dftb_scc.py` around lines 710 - 741, Remove the dead
commented-out cal_ovp_r0 block (the entire commented method referencing
cal_ovp_r0, DFTBSK, feature_to_block, and AtomicDataDict) from dftb_scc.py to
eliminate noise and avoid bit-rot; if you want to preserve it for later
reference, move the contents to a separate scratch branch, a dedicated gist, or
an issue tracker entry rather than keeping ~30 lines commented inside the
module.

615-619: 💤 Low value

Same-atom off-diagonal branch is mathematically redundant (cosmetic).

For two orbitals on the same atom, shift_i == shift_j, so 0.5 * (shift_i + shift_j) == shift_i. The same_atom & ~diag_mask where on line 616 produces the same numerical values as the default coeff from line 613 and can be dropped without changing results. Keeping it is fine if you want the docstring's case structure to be visually explicit; otherwise removing it shaves one elementwise pass.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/dftb/dftb_scc.py` around lines 615 - 619, The same-atom off-diagonal
torch.where is redundant: because shift_i == shift_j for orbitals on the same
atom, 0.5*(shift_i + shift_j) already equals shift_i, so drop the branch that
assigns coeff = torch.where(same_atom & ~diag_mask, shift_i.expand(norb, norb),
coeff); leave the default coeff computation and keep only the diag_mask
assignment (coeff = torch.where(diag_mask, 0.5 * shift_i.expand(norb, norb),
coeff)) to remove one unnecessary elementwise pass while preserving behavior for
coeff, same_atom, diag_mask, shift_i and norb.
dptb/kpoints/mesh.py (2)

59-68: 💤 Low value

Verify the int(norm / kspac) + 1 mesh-sizing formula.

Both build_kmeshgrid (line 67–68) and kgrid_spacing (line 149) round mesh sizes via floor(norm/kspac) + 1. For an exact integer ratio (e.g. norm/kspac == 2.0) this returns 3, not 2; for 2.999 it also returns 3. Common conventions are max(1, ceil(norm/kspac)) (VASP-style) or max(1, int(round(norm/kspac))) (ASE kpts2sizes). If this asymmetry is intentional (e.g. to guarantee strict upper-bound spacing), it would be worth a comment; otherwise consider np.ceil to better match user intuition for kspacing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/kpoints/mesh.py` around lines 59 - 68, The code in build_kmeshgrid and
kgrid_spacing uses floor(norm/kspac)+1 which yields an off-by-one for exact
integer ratios (e.g., 2.0 -> 3); update both functions (build_kmeshgrid and
kgrid_spacing) to compute mesh sizes using max(1, int(np.ceil(norm / kspac)))
(or equivalent ceiling logic) to match VASP-style expectations, and add a brief
comment in each function documenting the choice to use ceil for intuitive sizing
(or explicitly state if you intended strict upper-bound behavior).

142-159: ⚡ Quick win

Consider reusing build_kmeshgrid / monkhorst_pack_sampling in kgrid_spacing.

kgrid_spacing re-implements reciprocal-lattice computation and mesh sizing inline (line 148–149) instead of delegating to calculate_reciprocal_vectors + build_kmeshgrid, which already encapsulate the same logic. Beyond duplication, this is also a divergence risk: if the mesh-sizing convention in build_kmeshgrid is later refined (e.g. to ceil or round rather than floor + 1), kgrid_spacing will silently produce different grids for the same kspacing. Routing through the shared helpers (or directly through monkhorst_pack_sampling(cell=structase.cell, kspac=kspacing, gamma_centered=(sampling=='Gamma'))) keeps the two entry points consistent.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/kpoints/mesh.py` around lines 142 - 159, kgrid_spacing duplicates
reciprocal-lattice and mesh-sizing logic; replace the inline rev_latt and
meshgrid computation with the shared helpers to avoid divergence by calling
calculate_reciprocal_vectors(structase.cell) and build_kmeshgrid(...) or simply
delegate to monkhorst_pack_sampling(cell=structase.cell, kspac=kspacing,
gamma_centered=(sampling=="Gamma")) and then feed its mesh into monkhorst_pack
or gamma_center as appropriate (or use the sampling-return from
monkhorst_pack_sampling directly), keeping validation for structase and kspacing
and preserving the existing error for unsupported sampling.

Comment thread dptb/kpoints/reduction.py
Comment thread dptb/nn/dftb/interp.py Outdated
Comment thread dptb/nn/dftb/interp.py Outdated
Comment thread dptb/nn/dftb/scc_mixer.py
Comment thread dptb/nn/dftb/scc_params.py Outdated
Comment on lines +7 to +11
1
Cartesian
-0.0000000006442579 0.0000000005673469 -0.0000000000872944
Cartesian
0.0000000000000000 0.0000000000000000 0.0000000000000000
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix malformed POSCAR atom block (count/header mismatch).

Line 7 declares 1 atom, but Line 9 and Line 11 provide two coordinate rows, and Line 10 repeats Cartesian unexpectedly. This makes the fixture invalid/ambiguous for POSCAR readers.

Suggested fix (choose one consistent form)
 C  
-  1
+  1
 Cartesian
  -0.0000000006442579  0.0000000005673469 -0.0000000000872944
-Cartesian
-  0.0000000000000000  0.0000000000000000  0.0000000000000000

Or if this structure is intended to have 2 atoms, set count to 2 and keep a single Cartesian header with two coordinate lines.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/tests/data/dftb/structs_eos/C_BCC/C_bcc_0.950/POSCAR` around lines 7 -
11, The POSCAR atom block is malformed: the atom count value "1" conflicts with
two coordinate rows and there are two "Cartesian" headers; fix by choosing one
consistent form — either change the atom count from "1" to "2" and remove the
duplicate "Cartesian" header so a single "Cartesian" line precedes the two
coordinate rows (-0.0000000006442579  0.0000000005673469 -0.0000000000872944 and
0.0000000000000000  0.0000000000000000  0.0000000000000000), or keep the count
as "1" and remove the extra coordinate row and the duplicate "Cartesian" header
so exactly one coordinate follows the single "Cartesian" header.

Comment thread dptb/tests/test_energy_eigh_numpy.py Outdated
Comment thread dptb/tests/test_occupy.py
Comment thread dptb/utils/ewald_sum.py
Comment on lines +199 to +217
def bT(tensor: Tensor) -> Tensor:
"""Dimensionally agnostic "transpose".

Reverses the dimensions of a tensor like so [m, n, o] -> [o, n, m]. This is
designed to preserve the original functionality of the `torch.T` operator
in an effort to maintain dimensional/batch agnosticism. Recent versions of
PyTorch will only permit the transpose operator to be used on 2D matrices
which makes dimensionally agnostic treatment of tensors difficult in some
situations.

Arguments:
tensor: the tensor whose dimensions are to be flipped.

Returns:
flipped_tensor: the tensor with its dimensions reversed.

"""
return tensor.permute(*torch.arange(tensor.ndim - 1, -1, -1))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the file
find . -name "ewald_sum.py" -type f

Repository: deepmodeling/DeePTB

Length of output: 1901


🏁 Script executed:

# Let's also check if this is actually used anywhere
rg "bT\(" --type py

Repository: deepmodeling/DeePTB

Length of output: 1901


🏁 Script executed:

# Now let's verify PyTorch behavior with a simple test
python <<'EOF'
import torch

# Test 1: Does torch.arange() return a tensor?
arange_result = torch.arange(5)
print(f"torch.arange(5) type: {type(arange_result)}")
print(f"torch.arange(5) value: {arange_result}")

# Test 2: Can we unpack a 1D tensor into permute()?
test_tensor = torch.randn(2, 3, 4)
try:
    # This is what the current code does
    indices = torch.arange(test_tensor.ndim - 1, -1, -1)
    print(f"\nIndices tensor: {indices}")
    print(f"Unpacking indices: {tuple(indices)}")
    
    # Try to use it with permute
    result = test_tensor.permute(*indices)
    print(f"permute(*torch.arange()) succeeded!")
    print(f"Original shape: {test_tensor.shape}, Result shape: {result.shape}")
except Exception as e:
    print(f"permute(*torch.arange()) failed with: {type(e).__name__}: {e}")

# Test 3: Using range() instead
try:
    indices_range = range(test_tensor.ndim - 1, -1, -1)
    result2 = test_tensor.permute(*indices_range)
    print(f"\npermute(*range()) succeeded!")
    print(f"Result shape: {result2.shape}")
except Exception as e:
    print(f"permute(*range()) failed with: {type(e).__name__}: {e}")

# Test 4: What happens when we convert tensor to int?
try:
    indices_tensor = torch.arange(test_tensor.ndim - 1, -1, -1)
    indices_as_ints = tuple(int(i) for i in indices_tensor)
    result3 = test_tensor.permute(*indices_as_ints)
    print(f"\npermute(*[int(i) for i in torch.arange()]) succeeded!")
    print(f"Result shape: {result3.shape}")
except Exception as e:
    print(f"permute(*[int(i) for i in torch.arange()]) failed with: {type(e).__name__}: {e}")

EOF

Repository: deepmodeling/DeePTB

Length of output: 1901


🌐 Web query:

PyTorch Tensor.permute() method parameters type requirements

💡 Result:

The PyTorch Tensor.permute method takes a single parameter named dims, which specifies the desired ordering of the tensor's dimensions. The accepted types for dims are: torch.Size, int..., tuple of int, or list of int [1][2]. This is consistent across PyTorch documentation versions, including 1.10, 2.11, and 2.12 [3][1][4]. The dims must be a permutation containing each dimension index exactly once (unique integers from 0 to tensor.dim-1), though this is a semantic requirement rather than a strict type constraint [5][6][7]. Examples: x.permute(2, 0, 1) for a 3D tensor, or x.permute((2, 0, 1)) using a tuple [1].

Citations:


🌐 Web query:

PyTorch torch.arange unpacking tensor scalar permute compatibility

💡 Result:

torch.arange always returns a 1-D tensor, never a scalar or 0-D tensor [1][2]. To unpack a single-element tensor like torch.tensor(5) or t where t = torch.arange(1,2), use.item to get a Python scalar: t.item [3][4][5]. Python unpacking (*tensor) works only if the tensor has.ndim == 1 and len == number of variables; for scalars/0-D, it fails as they are not iterable [5]. tensor.permute(*dims) requires exactly tensor.ndim arguments (unique 0 to ndim-1); for 0-D tensors (scalars), permute raises "number of dims don't match" since ndim=0 but dims provided [6][7][8]. For 1-D tensors from arange, permute requires 1 dim like.permute(0) (no-op). Compatibility issue: cannot permute 0-D tensors (use.item first); unpacking scalars requires.item, not *tensor or permute [3][7].

Citations:


🌐 Web query:

PyTorch unpacking tensor into permute does it convert to int

💡 Result:

In PyTorch, permute does not “convert” a tensor to int in the sense of interpreting tensor values as integer axes. The arguments to permute are dimension indices (integers) that specify how to reorder the tensor’s axes. - torch.Tensor.permute takes dimension indices as inputs: it is defined as permute(*dims) / permute(dims) where dims is a tuple of Python ints (i.e., plain integer indices like 0, 1, 2, …), and it returns a view with dimensions reordered accordingly [1][2]. - Therefore, writing permute with Python unpacking (e.g., permute(*something)) only works if that something is an iterable of Python integers, not a tensor of arbitrary dtype. The permute indices must be integers specifying axis order, not tensor values to be cast into indices [1][2]. If your actual goal is “swap axes 0 and 1”, you must pass the correct axis order. For a 4D tensor with shape (3, 3, 512, 256), swapping the last two axes (and keeping the first two) is permute(2, 3, 0, 1), not permute(2, 3, 1, 0). The axis indices in permute are “in the order you want the new tensor to have” [3]. Practical example: x = torch.rand(3, 3, 512, 256) # new shape (512, 256, 3, 3) y = x.permute(2, 3, 0, 1) So the short answer to your question (“does it convert to int?”): permute expects integer axis indices; it does not reinterpret tensor data as axis indices via implicit casting. You should explicitly provide the axis order as Python ints (or an iterable/tuple of ints) [1][2].

Citations:


Replace torch.arange() with Python's range() in the permute call.

Unpacking torch.arange() produces 0-D tensor scalars, but Tensor.permute() expects Python integers. This will fail at runtime. Use Python's built-in range() instead, which yields integers directly.

Suggested fix
-    return tensor.permute(*torch.arange(tensor.ndim - 1, -1, -1))
+    return tensor.permute(*range(tensor.ndim - 1, -1, -1))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def bT(tensor: Tensor) -> Tensor:
"""Dimensionally agnostic "transpose".
Reverses the dimensions of a tensor like so [m, n, o] -> [o, n, m]. This is
designed to preserve the original functionality of the `torch.T` operator
in an effort to maintain dimensional/batch agnosticism. Recent versions of
PyTorch will only permit the transpose operator to be used on 2D matrices
which makes dimensionally agnostic treatment of tensors difficult in some
situations.
Arguments:
tensor: the tensor whose dimensions are to be flipped.
Returns:
flipped_tensor: the tensor with its dimensions reversed.
"""
return tensor.permute(*torch.arange(tensor.ndim - 1, -1, -1))
def bT(tensor: Tensor) -> Tensor:
"""Dimensionally agnostic "transpose".
Reverses the dimensions of a tensor like so [m, n, o] -> [o, n, m]. This is
designed to preserve the original functionality of the `torch.T` operator
in an effort to maintain dimensional/batch agnosticism. Recent versions of
PyTorch will only permit the transpose operator to be used on 2D matrices
which makes dimensionally agnostic treatment of tensors difficult in some
situations.
Arguments:
tensor: the tensor whose dimensions are to be flipped.
Returns:
flipped_tensor: the tensor with its dimensions reversed.
"""
return tensor.permute(*range(tensor.ndim - 1, -1, -1))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/utils/ewald_sum.py` around lines 199 - 217, The bT function uses
tensor.permute(*torch.arange(...)) which unpacks 0-D torch tensors and causes a
runtime type error; change the permute call to use Python integers by replacing
torch.arange(tensor.ndim - 1, -1, -1) with the built-in range(...) so permute
receives plain ints (update the permute invocation in bT accordingly).

Comment thread dptb/utils/interpolate/poly_interp.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (1)
dptb/postprocess/elec_struc_cal.py (1)

631-681: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

SOC eband/eband_free still inconsistent with rescaled occ.

For spindeg=1, ne = 2 * total_electrons is passed to calculate_fermi_level, so the returned occ (and the corresponding eband/eband_free) embed the internal g=2 factor. The rescale at line 669 then divides occ back to the physical electron count, but eband and eband_free are returned untouched. SOC callers therefore receive band energies in a different convention than the final occupations. This was raised in a previous review and is still unresolved here.

Recompute eband/eband_free after the rescale (or always after NR polish), e.g. by another calculate_fermi_level call with ne=total_electrons for the spindeg=1 path, or by scaling them with the same factor used to rescale occ.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/postprocess/elec_struc_cal.py` around lines 631 - 681, The returned
eband/eband_free remain in the pre-rescale spin-degenerate convention while occ
is rescaled, causing a convention mismatch; after calling cls.rescale_occ(...)
(and after cls.newton_raphson_polish_fermi(...) when used) ensure eband and
eband_free are adjusted to the same convention as the final occ — either by
recomputing them via cls.calculate_fermi_level(...) with ne=total_electrons for
the spindeg=1 path or by applying the identical multiplicative scaling factor
used by cls.rescale_occ to eband and eband_free so they match the rescaled occ
before returning Ef, occ, diff_ne, eband, eband_free.
🧹 Nitpick comments (11)
dptb/nn/dftbsk.py (2)

91-100: 💤 Low value

Duplicate smooth_ski handling in __init__ and from_reference.

The smooth_ski override is applied twice — once in from_reference (lines 304-306) and once again here in __init__. Then from_reference calls cls(...) with smooth_ski=False to avoid retriggering. This is fragile; if a future maintainer changes one path the other can silently diverge. Consider centralizing this logic in __init__ and just passing smooth_ski=smooth_ski from from_reference.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/dftbsk.py` around lines 91 - 100, The smooth_ski override logic is
duplicated between from_reference and __init__ (the interp_method override to
'smooth_intp'); centralize it in __init__ by keeping the interp_method override
and validation there (using interp_method, VALID_INTERP_METHODS, and
self.interp_method) and remove the duplicate override in from_reference so it
simply forwards the smooth_ski argument (pass smooth_ski=smooth_ski into
cls(...)) instead of forcing False; ensure from_reference no longer modifies
interp_method and let __init__ enforce the rule.

281-282: 💤 Low value

assert for checkpoint extension is fragile.

If checkpoint is None or non-string, checkpoint.split(...) crashes with an opaque AttributeError before the assert. Also, asserts can be stripped under python -O. Prefer an explicit type/value check.

-        assert checkpoint.split(".")[-1] == "pth", "The checkpoint should be a pth file."
+        if not isinstance(checkpoint, str) or not checkpoint.endswith(".pth"):
+            raise ValueError(f"checkpoint must be a path to a .pth file, got {checkpoint!r}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/dftbsk.py` around lines 281 - 282, The assert checking the checkpoint
extension is fragile and can crash or be stripped; replace it with an explicit
validation before calling torch.load: verify that checkpoint is a non-empty str
and endswith(".pth"), and if not raise a ValueError with a clear message; then
proceed to call torch.load(checkpoint, map_location=device, weights_only=False).
Ensure you handle checkpoint is None or wrong type (not string) and avoid using
assert so the check remains in optimized runs.
dptb/postprocess/elec_struc_cal.py (2)

292-294: ⚡ Quick win

Inconsistent return arity from get_fermi_level based on eig_method.

get_fermi_level returns (data, E_fermi) for eig_method='eigvalsh' and (data, E_fermi, eband, occ) for 'eigh'. Callers must branch on self.eig_method to unpack correctly, which is error-prone and easy to break. Prefer always returning the same tuple (with None for fields not computed) or returning a small dataclass / dict.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/postprocess/elec_struc_cal.py` around lines 292 - 294, get_fermi_level
currently returns different tuple shapes depending on self.eig_method which
forces callers to branch; modify get_fermi_level so it always returns the same
structure (e.g., a 4-tuple) by returning data, E_fermi, eband, occ and set eband
and occ to None when not computed (i.e., when self.eig_method != 'eigh'), or
alternatively return a small dict/dataclass with keys/attributes data, E_fermi,
eband, occ; update the return statements where get_fermi_level returns (data,
E_fermi) to instead include placeholders for eband and occ so callers can unpack
consistently.

263-264: ⚡ Quick win

Use not in for membership test (Ruff E713).

Minor readability fix flagged by static analysis.

-        if not AtomicDataDict.ENERGY_EIGENVALUE_KEY in data:
+        if AtomicDataDict.ENERGY_EIGENVALUE_KEY not in data:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/postprocess/elec_struc_cal.py` around lines 263 - 264, The membership
test uses the banned pattern "not A in B"; change the condition in the
elec_struc_cal.py snippet to use the canonical form by replacing "if not
AtomicDataDict.ENERGY_EIGENVALUE_KEY in data:" with "if
AtomicDataDict.ENERGY_EIGENVALUE_KEY not in data:" so the block that calls
self.get_eigs(data=..., klist=klist, pbc=pbc,
AtomicData_options=AtomicData_options, eig_solver=eig_solver) executes correctly
and satisfies the Ruff E713 rule.
dptb/tests/test_occupy.py (1)

14-14: 💤 Low value

Unused import itertools.product as itprod.

Not referenced anywhere in the test module. Safe to drop.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/tests/test_occupy.py` at line 14, The import alias itprod from
itertools.product in dptb/tests/test_occupy.py is unused; remove the line "from
itertools import product as itprod" to clean up the module and avoid linter
warnings, ensuring no references to itprod remain in the file.
dptb/nn/deeptb.py (1)

554-564: 💤 Low value

Backward-compat injection of mass buffer looks correct.

The fallback to inject dftbsk.mass/mass from the freshly constructed model when missing in the loaded state_dict is consistent with the same logic in DFTBSK.from_reference. Note that state_dict = ckpt["model_state_dict"] is a reference — the new key gets written into the loaded dict, but since del ckpt follows immediately the side effect is harmless. Consider state_dict = dict(ckpt["model_state_dict"]) if you want to avoid mutating the loaded checkpoint dict in case future code paths reuse it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/deeptb.py` around lines 554 - 564, The current code assigns
state_dict = ckpt["model_state_dict"] which mutates the checkpoint dict when we
inject the missing "dftbsk.mass"/"mass"; change this to make a shallow copy
(e.g., state_dict = dict(ckpt["model_state_dict"])) before adding the mass
fallback so ckpt remains unchanged; update both branches where state_dict is set
(the branch that calls model.load_state_dict and the branch that calls
model.dftbsk.load_state_dict) to use the copied dict.
dptb/tests/test_dftb_scc.py (2)

534-857: 💤 Low value

Strip unnecessary f prefix on placeholder-free strings (Ruff F541).

Lines 534, 548, 691, 781, and 857 contain assertion messages prefixed with f but with no {} placeholders, e.g. f"Reused instance Gamma differs". Ruff flags these. Drop the f prefix.

-    assert torch.allclose(Gamma_run1, Gamma_run2, atol=1e-10), \
-        f"Reused instance Gamma differs"
+    assert torch.allclose(Gamma_run1, Gamma_run2, atol=1e-10), \
+        "Reused instance Gamma differs"

(Apply the same change at lines 548, 691, 781, 857.)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/tests/test_dftb_scc.py` around lines 534 - 857, Several assertion
messages use an unnecessary f-string prefix with no placeholders (Ruff F541);
remove the leading f for those literal messages. Locate the assertions that
compare Gamma_run1/Gamma_run3 and other run comparisons (e.g., the assertions
using Gamma_run1, Gamma_run3, mul_charge_run1, mul_charge_run3, and the
reused-instance comparisons like "Reused instance Gamma differs" / "Fresh
instance Gamma differs") and change the message strings from f"..." to plain
"..." (same for the other listed messages at the indicated assertions).

668-700: 💤 Low value

float32-vs-float64 consistency tolerances may be tight for some platforms.

atol=1e-4 for total energy comparisons assumes the SCC mixing converged tightly enough that float32 and float64 paths agree at the 5th decimal, but float32 has only ~7 significant digits and the energies here are ~100 eV. On platforms where the float32 path takes a slightly different SCC trajectory (mixer ordering, BLAS routing), this can be flaky. Consider relaxing to atol=1e-3 or using a relative tolerance, or pin the SCC tol/max_iter to make the comparison deterministic.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/tests/test_dftb_scc.py` around lines 668 - 700, The float32 vs float64
equality tolerances in the test using DFTBSCC (asserts on elec_totE,
elec_H0_bandE, scc_shift, and mulliken.mul_charge) are too tight and can be
flaky across platforms; relax the absolute tolerances (e.g., change elec_totE
and elec_H0_bandE from atol=1e-4 to atol=1e-3 or use torch.allclose(...,
rtol=1e-5, atol=1e-3)), keep or slightly loosen scc_shift and
mulliken.mul_charge tolerances accordingly, or alternatively ensure
deterministic SCC convergence by pinning the SCC mixer settings (e.g., set the
SCC tol/max_iter passed via run_params) before comparing to guarantee both
float32 and float64 follow the same trajectory.
dptb/nn/dftb/interp.py (1)

161-172: 💤 Low value

Remove extraneous f prefix from strings without placeholders.

Static analysis (Ruff F541) flags f-strings at lines 162 and 171 that contain no placeholders. These should be regular strings.

Suggested fix
         except KeyError as e:
-            raise KeyError(f" Element not found in sigma_rep.") from e
+            raise KeyError("Element not found in sigma_rep.") from e
         except Exception as e:
-            raise KeyError(f" Element not found in symbol_to_Z.") from e
+            raise KeyError("Element not found in symbol_to_Z.") from e
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/dftb/interp.py` around lines 161 - 172, The raised KeyError messages
use f-strings with no placeholders; change them to plain string literals by
removing the extraneous 'f' prefix in the two raise statements: the one inside
the except KeyError block that raises " Element not found in sigma_rep." and the
one in _get_Z_pair that raises " Element not found in symbol_to_Z." so both
messages are normal strings rather than f-strings.
dptb/nn/dftb/scc_params.py (1)

69-71: 💤 Low value

Avoid mutable default torch.device("cpu") in function signature.

Static analysis (Ruff B008) flags this: calling torch.device() in argument defaults is evaluated once at definition time, not per-call. While torch.device objects are immutable and this won't cause mutation bugs, it's still considered bad practice as it can cause subtle issues and is inconsistent with Python conventions.

Suggested fix
     `@classmethod`
     def from_database(
         cls,
         basis: Dict[str, Union[str, list]],
         idp_sk: OrbitalMapper,
         dtype: Union[str, torch.dtype] = torch.float64,
-        device: Union[str, torch.device] = torch.device("cpu"),
+        device: Union[str, torch.device, None] = None,
         r_max: Optional[Dict[str, float]] = None,
     ) -> "SCCParams":
+        if device is None:
+            device = torch.device("cpu")
         return cls.from_options(

Apply similar change to from_options at line 89.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/dftb/scc_params.py` around lines 69 - 71, Change the
function/constructor signature that currently uses device: Union[str,
torch.device] = torch.device("cpu") to use a None default (e.g. device:
Optional[Union[str, torch.device]] = None) and inside the function (the method
where dtype, device, r_max are parameters) set device = torch.device("cpu") when
device is None; this avoids calling torch.device() at definition time. Apply the
same change to the from_options function as well so both the signature and
runtime initialization use a None sentinel and lazily construct
torch.device("cpu") inside the body.
dptb/nn/dftb/dftb_scc.py (1)

117-132: 💤 Low value

Type hints use implicit Optional.

PEP 484 recommends explicit Optional[T] instead of T = None. While this works, it's clearer to be explicit.

Suggested fix
     def run_iters(self,
                   data: Union[ase.Atoms, str],
-                  nel_atom: dict = None,
+                  nel_atom: Optional[dict] = None,
                   kmeshgrid: Optional[List[int]] = None,
                   kmeshspacing: Optional[List[float]] = None,
                   kgamma_center: bool = True,
                   krotational_symmetry: bool = True,
                   ktime_inversion_symmetry: bool = True,
                   Temp: float = 300,
-                  AtomicData_options: dict = None,
+                  AtomicData_options: Optional[dict] = None,
                   smearing_method: str = 'Fermi-Dirac',
                   mixer: Union[str, SCCMixer] = 'simple',
                   tol: float = 1e-6,
                   mix_rate: float = 0.25,
                   max_iter: int = 50,
-                  mixer_options: Dict[str, Any] = None) -> AtomicDataDict:
+                  mixer_options: Optional[Dict[str, Any]] = None) -> AtomicDataDict:

Apply similar changes to get_total_energy method parameters.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/dftb/dftb_scc.py` around lines 117 - 132, Change parameters that
default to None to use explicit Optional[...] annotations: update run_iters
signature so nel_atom: Optional[Dict[Any, Any]] = None (or a more specific Dict
key/value types if known), AtomicData_options: Optional[Dict[str, Any]] = None,
and mixer_options: Optional[Dict[str, Any]] = None (keep kmeshgrid and
kmeshspacing as Optional already). Apply the same pattern to get_total_energy
parameters that currently use implicit None defaults so all nullable params use
Optional[T] types; leave non-nullable params (e.g., Temp, smearing_method, mixer
default value) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dptb/nn/dftbsk.py`:
- Around line 102-107: The dftbsk model options only persist "skdata" and
"interp_method", so custom smooth_intp parameters get lost on save/load; update
the code that builds self.model_options["dftbsk"] to also include "dist_fudge"
and "n_interp_points" when provided, and update from_reference to mirror this by
reading these keys from saved_dftbsk_opts (falling back to DFTBPLUS_DIST_FUDGE
and DFTBPLUS_N_INTER only if absent) so checkpoint round-trips preserve custom
smooth_intp settings; refer to model_options, "dftbsk", interp_method,
dist_fudge, n_interp_points, and from_reference when making the change.
- Around line 276-317: from_reference currently only reads interp_method (and
skdata) from the checkpoint but ignores dist_fudge and n_interp_points despite
the PR claim; update the loader in from_reference to pull dist_fudge and
n_interp_points from saved_dftb_opts (falling back to provided arguments if
absent) so that the local variables passed into cls(...) reflect checkpoint
values, and ensure you check config_model_opts.get("dftb", {}) / .get("dftbsk",
{}) (saved_dftb_opts) for keys "dist_fudge" and "n_interp_points" the same way
interp_method is handled; adjust logging to indicate when these values are taken
from the checkpoint.

In `@dptb/tests/test_occupy.py`:
- Around line 144-178: Add a new test class header to group the entropy tests so
they are not orphaned: create "class TestEntropyFunctions:" and move/indent the
functions test_sgau_properties, test_sfd_properties, test_smp1_zero, and
test_scold_zero under it, and remove the top-level docstring """Test entropy
contribution functions.""" (or keep it as the class docstring directly under the
new class); ensure these tests are no longer methods of TestSmearingFunctions so
test discovery shows the correct grouping.

In `@dptb/tests/test_scc_params.py`:
- Line 211: Replace the hardcoded CWD-relative string assigned to skdata with a
path computed relative to the test file: use Path(__file__).resolve().parent /
"data" / "dftb" (and add "from pathlib import Path" at the top if not already
imported) so skdata is robust regardless of pytest’s working directory; update
the assignment to use that Path expression in test_scc_params.py.

In `@dptb/utils/interpolate/poly_interp.py`:
- Line 168: The unpacking "n_channels, nn = y_local.shape" raises a Ruff RUF059
because n_channels is unused; change the unpack to ignore the first value (e.g.,
use "_ , nn = y_local.shape" or assign only nn from y_local.shape[1]) so only
the used variable nn remains; update the occurrence where y_local.shape is
unpacked in the poly_interp function (reference y_local and nn) to remove the
unused n_channels identifier.

---

Duplicate comments:
In `@dptb/postprocess/elec_struc_cal.py`:
- Around line 631-681: The returned eband/eband_free remain in the pre-rescale
spin-degenerate convention while occ is rescaled, causing a convention mismatch;
after calling cls.rescale_occ(...) (and after
cls.newton_raphson_polish_fermi(...) when used) ensure eband and eband_free are
adjusted to the same convention as the final occ — either by recomputing them
via cls.calculate_fermi_level(...) with ne=total_electrons for the spindeg=1
path or by applying the identical multiplicative scaling factor used by
cls.rescale_occ to eband and eband_free so they match the rescaled occ before
returning Ef, occ, diff_ne, eband, eband_free.

---

Nitpick comments:
In `@dptb/nn/deeptb.py`:
- Around line 554-564: The current code assigns state_dict =
ckpt["model_state_dict"] which mutates the checkpoint dict when we inject the
missing "dftbsk.mass"/"mass"; change this to make a shallow copy (e.g.,
state_dict = dict(ckpt["model_state_dict"])) before adding the mass fallback so
ckpt remains unchanged; update both branches where state_dict is set (the branch
that calls model.load_state_dict and the branch that calls
model.dftbsk.load_state_dict) to use the copied dict.

In `@dptb/nn/dftb/dftb_scc.py`:
- Around line 117-132: Change parameters that default to None to use explicit
Optional[...] annotations: update run_iters signature so nel_atom:
Optional[Dict[Any, Any]] = None (or a more specific Dict key/value types if
known), AtomicData_options: Optional[Dict[str, Any]] = None, and mixer_options:
Optional[Dict[str, Any]] = None (keep kmeshgrid and kmeshspacing as Optional
already). Apply the same pattern to get_total_energy parameters that currently
use implicit None defaults so all nullable params use Optional[T] types; leave
non-nullable params (e.g., Temp, smearing_method, mixer default value)
unchanged.

In `@dptb/nn/dftb/interp.py`:
- Around line 161-172: The raised KeyError messages use f-strings with no
placeholders; change them to plain string literals by removing the extraneous
'f' prefix in the two raise statements: the one inside the except KeyError block
that raises " Element not found in sigma_rep." and the one in _get_Z_pair that
raises " Element not found in symbol_to_Z." so both messages are normal strings
rather than f-strings.

In `@dptb/nn/dftb/scc_params.py`:
- Around line 69-71: Change the function/constructor signature that currently
uses device: Union[str, torch.device] = torch.device("cpu") to use a None
default (e.g. device: Optional[Union[str, torch.device]] = None) and inside the
function (the method where dtype, device, r_max are parameters) set device =
torch.device("cpu") when device is None; this avoids calling torch.device() at
definition time. Apply the same change to the from_options function as well so
both the signature and runtime initialization use a None sentinel and lazily
construct torch.device("cpu") inside the body.

In `@dptb/nn/dftbsk.py`:
- Around line 91-100: The smooth_ski override logic is duplicated between
from_reference and __init__ (the interp_method override to 'smooth_intp');
centralize it in __init__ by keeping the interp_method override and validation
there (using interp_method, VALID_INTERP_METHODS, and self.interp_method) and
remove the duplicate override in from_reference so it simply forwards the
smooth_ski argument (pass smooth_ski=smooth_ski into cls(...)) instead of
forcing False; ensure from_reference no longer modifies interp_method and let
__init__ enforce the rule.
- Around line 281-282: The assert checking the checkpoint extension is fragile
and can crash or be stripped; replace it with an explicit validation before
calling torch.load: verify that checkpoint is a non-empty str and
endswith(".pth"), and if not raise a ValueError with a clear message; then
proceed to call torch.load(checkpoint, map_location=device, weights_only=False).
Ensure you handle checkpoint is None or wrong type (not string) and avoid using
assert so the check remains in optimized runs.

In `@dptb/postprocess/elec_struc_cal.py`:
- Around line 292-294: get_fermi_level currently returns different tuple shapes
depending on self.eig_method which forces callers to branch; modify
get_fermi_level so it always returns the same structure (e.g., a 4-tuple) by
returning data, E_fermi, eband, occ and set eband and occ to None when not
computed (i.e., when self.eig_method != 'eigh'), or alternatively return a small
dict/dataclass with keys/attributes data, E_fermi, eband, occ; update the return
statements where get_fermi_level returns (data, E_fermi) to instead include
placeholders for eband and occ so callers can unpack consistently.
- Around line 263-264: The membership test uses the banned pattern "not A in B";
change the condition in the elec_struc_cal.py snippet to use the canonical form
by replacing "if not AtomicDataDict.ENERGY_EIGENVALUE_KEY in data:" with "if
AtomicDataDict.ENERGY_EIGENVALUE_KEY not in data:" so the block that calls
self.get_eigs(data=..., klist=klist, pbc=pbc,
AtomicData_options=AtomicData_options, eig_solver=eig_solver) executes correctly
and satisfies the Ruff E713 rule.

In `@dptb/tests/test_dftb_scc.py`:
- Around line 534-857: Several assertion messages use an unnecessary f-string
prefix with no placeholders (Ruff F541); remove the leading f for those literal
messages. Locate the assertions that compare Gamma_run1/Gamma_run3 and other run
comparisons (e.g., the assertions using Gamma_run1, Gamma_run3, mul_charge_run1,
mul_charge_run3, and the reused-instance comparisons like "Reused instance Gamma
differs" / "Fresh instance Gamma differs") and change the message strings from
f"..." to plain "..." (same for the other listed messages at the indicated
assertions).
- Around line 668-700: The float32 vs float64 equality tolerances in the test
using DFTBSCC (asserts on elec_totE, elec_H0_bandE, scc_shift, and
mulliken.mul_charge) are too tight and can be flaky across platforms; relax the
absolute tolerances (e.g., change elec_totE and elec_H0_bandE from atol=1e-4 to
atol=1e-3 or use torch.allclose(..., rtol=1e-5, atol=1e-3)), keep or slightly
loosen scc_shift and mulliken.mul_charge tolerances accordingly, or
alternatively ensure deterministic SCC convergence by pinning the SCC mixer
settings (e.g., set the SCC tol/max_iter passed via run_params) before comparing
to guarantee both float32 and float64 follow the same trajectory.

In `@dptb/tests/test_occupy.py`:
- Line 14: The import alias itprod from itertools.product in
dptb/tests/test_occupy.py is unused; remove the line "from itertools import
product as itprod" to clean up the module and avoid linter warnings, ensuring no
references to itprod remain in the file.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7285d5ec-64bd-45ec-b33d-b103ff03acdf

📥 Commits

Reviewing files that changed from the base of the PR and between e0b34c4 and e9918e6.

📒 Files selected for processing (15)
  • dptb/kpoints/reduction.py
  • dptb/nn/deeptb.py
  • dptb/nn/dftb/dftb_scc.py
  • dptb/nn/dftb/interp.py
  • dptb/nn/dftb/scc_mixer.py
  • dptb/nn/dftb/scc_params.py
  • dptb/nn/dftb/sk_param.py
  • dptb/nn/dftbsk.py
  • dptb/nn/energy.py
  • dptb/postprocess/elec_struc_cal.py
  • dptb/tests/test_dftb_scc.py
  • dptb/tests/test_energy_eigh_numpy.py
  • dptb/tests/test_occupy.py
  • dptb/tests/test_scc_params.py
  • dptb/utils/interpolate/poly_interp.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • dptb/nn/energy.py
  • dptb/nn/dftb/scc_mixer.py

Comment thread dptb/nn/dftbsk.py
Comment on lines 102 to 107
self.model_options = {
"dftbsk":{
"skdata": skdata
"skdata": skdata,
"interp_method": interp_method,
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Persist dist_fudge and n_interp_points in model_options for proper checkpoint round-trip.

model_options["dftbsk"] only stores skdata and interp_method. When interp_method='smooth_intp' is used with a custom dist_fudge or n_interp_points, those values are not saved with the checkpoint and won't be reloaded by from_reference — the code only reads interp_method from the saved dftbsk_options, and falls back to module-level defaults (DFTBPLUS_DIST_FUDGE / DFTBPLUS_N_INTER). Save/load with custom smooth_intp parameters will silently revert to defaults.

🛠️ Proposed fix
         self.model_options = {
             "dftbsk":{
                 "skdata": skdata,
                 "interp_method": interp_method,
                 }
         }
+        if interp_method == 'smooth_intp':
+            self.model_options["dftbsk"]["dist_fudge"] = _dist_fudge
+            self.model_options["dftbsk"]["n_interp_points"] = _n_points

And mirror the read in from_reference so dist_fudge/n_interp_points are pulled from saved_dftbsk_opts when not explicitly provided.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/dftbsk.py` around lines 102 - 107, The dftbsk model options only
persist "skdata" and "interp_method", so custom smooth_intp parameters get lost
on save/load; update the code that builds self.model_options["dftbsk"] to also
include "dist_fudge" and "n_interp_points" when provided, and update
from_reference to mirror this by reading these keys from saved_dftbsk_opts
(falling back to DFTBPLUS_DIST_FUDGE and DFTBPLUS_N_INTER only if absent) so
checkpoint round-trips preserve custom smooth_intp settings; refer to
model_options, "dftbsk", interp_method, dist_fudge, n_interp_points, and
from_reference when making the change.

Comment thread dptb/nn/dftbsk.py
Comment on lines +276 to +317
dftbsk_options = {
"skdata": skdata,
"interp_method": interp_method,
}

assert checkpoint.split(".")[-1] == "pth", "The checkpoint should be a pth file."
assert checkpoint.split(".")[-1] == "pth", "The checkpoint should be a pth file."
f = torch.load(checkpoint, map_location=device, weights_only=False)

for k,v in common_options.items():
for k, v in common_options.items():
if v is None:
common_options[k] = f["config"]["common_options"][k]
log.info(f"{k} is not provided in the input json, set to the value {common_options[k]} in model ckpt.")
for k,v in dftb.items():

# Handle dftbsk model options (support both 'dftb' and 'dftbsk' keys for backward compatibility)
config_model_opts = f["config"].get("model_options", {})
saved_dftbsk_opts = config_model_opts.get("dftbsk", config_model_opts.get("dftb", {}))

for k, v in dftbsk_options.items():
if v is None:
dftb[k] = f["config"]["model_options"]["dftb"][k]
log.info(f"{k} is not provided in the input json, set to the value {dftb[k]} in model ckpt.")
if k in saved_dftbsk_opts:
dftbsk_options[k] = saved_dftbsk_opts[k]
log.info(f"{k} is not provided in the input json, set to the value {dftbsk_options[k]} in model ckpt.")
elif k == "interp_method":
# Default to 'linear' for backward compatibility with old checkpoints
dftbsk_options[k] = 'linear'
log.info(f"{k} is not in model ckpt, defaulting to 'linear'.")

# Handle smooth_ski flag
if smooth_ski:
log.debug("smooth_ski is True, override interp_method to 'smooth_intp'.")
dftbsk_options["interp_method"] = 'smooth_intp'

num_xgrid = f["model_state_dict"]["distance_param"].shape[0]
model = cls(**common_options, **dftb, num_xgrid=num_xgrid, transform=transform)

model = cls(
**common_options,
**dftbsk_options,
num_xgrid=num_xgrid,
transform=transform,
smooth_ski=False, # Already handled above
dist_fudge=dist_fudge,
n_interp_points=n_interp_points,
)
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot May 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

from_reference does not actually load dist_fudge / n_interp_points from the checkpoint, contradicting the PR summary.

The PR summary states from_reference loads interp_method, smooth_ski, dist_fudge, and n_interp_points from the checkpoint config, but dftbsk_options only contains skdata and interp_method. dist_fudge and n_interp_points are taken solely from caller arguments and never read from saved_dftbsk_opts. Combined with the issue that model_options doesn't persist them either, smooth-interpolation checkpoints round-trip incorrectly when non-default values were used. Either fix the loader or update the documentation/summary to reflect the true behavior.

🛠️ Proposed change
         dftbsk_options = {
             "skdata": skdata,
             "interp_method": interp_method,
+            "dist_fudge": dist_fudge,
+            "n_interp_points": n_interp_points,
         }
@@
         model = cls(
             **common_options,
             **dftbsk_options,
             num_xgrid=num_xgrid,
             transform=transform,
             smooth_ski=False,  # Already handled above
-            dist_fudge=dist_fudge,
-            n_interp_points=n_interp_points,
         )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/dftbsk.py` around lines 276 - 317, from_reference currently only
reads interp_method (and skdata) from the checkpoint but ignores dist_fudge and
n_interp_points despite the PR claim; update the loader in from_reference to
pull dist_fudge and n_interp_points from saved_dftb_opts (falling back to
provided arguments if absent) so that the local variables passed into cls(...)
reflect checkpoint values, and ensure you check config_model_opts.get("dftb",
{}) / .get("dftbsk", {}) (saved_dftb_opts) for keys "dist_fudge" and
"n_interp_points" the same way interp_method is handled; adjust logging to
indicate when these values are taken from the checkpoint.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferring this because dist_fudge / n_interp_points are not currently part of the validated config schema. Proper support should be added together with argcheck/schema, config loading, docs, and checkpoint round-trip tests in a follow-up PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@QG-phy, understood — deferring makes sense given that dist_fudge and n_interp_points lack schema validation and round-trip test coverage. Would you like me to open a GitHub issue to track this follow-up work (schema support, config loading, docs, and checkpoint round-trip tests for these parameters)?

Comment thread dptb/tests/test_occupy.py
Comment thread dptb/tests/test_scc_params.py Outdated
Comment thread dptb/utils/interpolate/poly_interp.py Outdated
QG-phy added 2 commits May 7, 2026 16:22
Return generalized SCC eigenvectors in (norb, nstate) order to match
postprocess expectations and avoid incorrect Mulliken population handling.

Also ensure DFTB2NNSK creates its output directory before use and relax
DFTB SK tensor comparison precision to reduce test fragility.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
dptb/nn/dftb/dftb_scc.py (1)

697-705: ⚡ Quick win

Replace torch.linalg.inv on a triangular factor with torch.linalg.solve_triangular.

torch.linalg.inv(chklowt) performs a full O(n³) matrix inversion on a lower-triangular factor. torch.linalg.solve_triangular supports the required float/double/cfloat/cdouble dtypes and batched matrices natively, solving L·X = B in O(n²) per k-point and avoiding the explicit inverse entirely — which is also more numerically stable for near-singular overlap matrices.

♻️ Proposed refactor
         if overlap:
             chklowt = torch.linalg.cholesky(overlap_mat)
-            chklowtinv = torch.linalg.inv(chklowt)
-            H_mat = (
-                chklowtinv @ H_mat @ torch.transpose(chklowtinv,dim0=1,dim1=2).conj()
-                )
+            # Solve L·X = H  →  X = L⁻¹·H
+            LiH = torch.linalg.solve_triangular(chklowt, H_mat, upper=False)
+            # Solve L·Y = LiH.conj().T  →  H' = L⁻¹·H·L⁻ᴴ
+            H_mat = torch.linalg.solve_triangular(
+                chklowt, LiH.transpose(1, 2).conj(), upper=False
+            ).transpose(1, 2).conj()
         eigval, eigvec = torch.linalg.eigh(H_mat)
         if overlap:
-            eigvec = torch.transpose(chklowtinv,dim0=1,dim1=2).conj() @ eigvec
+            # Back-transform:  solve L·Z = eigvec  →  |ψ⟩ = L⁻ᴴ·|ψ'⟩
+            eigvec = torch.linalg.solve_triangular(
+                chklowt, eigvec, upper=False
+            )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/dftb/dftb_scc.py` around lines 697 - 705, The code currently computes
chklowtinv = torch.linalg.inv(chklowt) and uses it to transform H_mat and
eigvec; replace the explicit inverse by using two batched triangular solves with
torch.linalg.solve_triangular: for H_mat do tmp =
torch.linalg.solve_triangular(chklowt, H_mat, left=True, upper=False) and then
H_mat = torch.linalg.solve_triangular(chklowt, tmp.transpose(-2,-1).conj(),
left=True, upper=False).transpose(-2,-1).conj() (this yields chklowtinv @ H_mat
@ chklowtinv.T.conj() without forming the inverse), and for eigvec replace
eigvec = torch.transpose(chklowtinv,dim0=1,dim1=2).conj() @ eigvec with eigvec =
torch.linalg.solve_triangular(chklowt, eigvec, left=True,
upper=False,).transpose(-2,-1).conj() (or equivalently solve with the Hermitian
transpose of chklowt as a triangular solve); update uses of chklowtinv
accordingly and remove torch.linalg.inv(chklowt).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dptb/nn/dftb/dftb_scc.py`:
- Around line 275-278: The loop allows max_iter=0 which breaks out before
initializing self.elec_H0_bandE and self.scc_shift_energy, causing a TypeError
later; add an entry validation for max_iter (e.g., in the SCC routine before the
while loop using iteration and max_iter) that raises a ValueError or normalizes
max_iter to at least 1 when max_iter <= 0, so the loop runs correctly and
self.elec_H0_bandE and self.scc_shift_energy are always initialized before they
are used.
- Around line 227-231: The code can leave structase unassigned when data is
neither str nor ase.Atoms (e.g., an AtomicData object), causing
UnboundLocalError; update the branch to handle convertible types: if
isinstance(data, str) use read(data), elif isinstance(data, ase.Atoms) set
structase = data, elif hasattr(data, "to_ase") call structase = data.to_ase(),
elif hasattr(data, "atoms") set structase = data.atoms, otherwise raise a clear
TypeError (or ValueError) stating accepted input types; modify the block around
structase/data in dftb_scc.py to include these checks so structase is always
defined before use and mirrors how get_mulcharge accepts AtomicData.

In `@dptb/postprocess/charge_pop.py`:
- Around line 274-282: The code path where self.structase is None can leave the
local name structase unbound for unsupported data types; change the branch in
the block that handles data to either initialize structase = None at the top or,
better, explicitly handle the unsupported types by raising a TypeError with a
clear message: inside the if self.structase is None block (the code referencing
structase, read(), and data.to("cpu").to_ase()), add an else: raise
TypeError(f"Unsupported data type for structase: {type(data)!r}") so structase
is always assigned or a clear error is raised before the final
isinstance(self.structase, ase.Atoms) assertion. Ensure you still assign
self.structase = structase when appropriate.

---

Nitpick comments:
In `@dptb/nn/dftb/dftb_scc.py`:
- Around line 697-705: The code currently computes chklowtinv =
torch.linalg.inv(chklowt) and uses it to transform H_mat and eigvec; replace the
explicit inverse by using two batched triangular solves with
torch.linalg.solve_triangular: for H_mat do tmp =
torch.linalg.solve_triangular(chklowt, H_mat, left=True, upper=False) and then
H_mat = torch.linalg.solve_triangular(chklowt, tmp.transpose(-2,-1).conj(),
left=True, upper=False).transpose(-2,-1).conj() (this yields chklowtinv @ H_mat
@ chklowtinv.T.conj() without forming the inverse), and for eigvec replace
eigvec = torch.transpose(chklowtinv,dim0=1,dim1=2).conj() @ eigvec with eigvec =
torch.linalg.solve_triangular(chklowt, eigvec, left=True,
upper=False,).transpose(-2,-1).conj() (or equivalently solve with the Hermitian
transpose of chklowt as a triangular solve); update uses of chklowtinv
accordingly and remove torch.linalg.inv(chklowt).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b143ddd7-5172-44b2-9424-f88d375894d2

📥 Commits

Reviewing files that changed from the base of the PR and between e9918e6 and 9c4f9b9.

📒 Files selected for processing (5)
  • dptb/nn/dftb/dftb_scc.py
  • dptb/nn/dftb2nnsk.py
  • dptb/postprocess/charge_pop.py
  • dptb/tests/test_dftbsk.py
  • dptb/tests/test_optical_conductivity.py
✅ Files skipped from review due to trivial changes (1)
  • dptb/tests/test_dftbsk.py

Comment thread dptb/nn/dftb/dftb_scc.py
Comment thread dptb/nn/dftb/dftb_scc.py
Comment thread dptb/postprocess/charge_pop.py
@QG-phy QG-phy marked this pull request as ready for review May 7, 2026 12:26
@QG-phy QG-phy changed the title Migrate UniSK DFTB-SCC support Add DeePTB the DFTB-SCC support May 7, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (7)
dptb/nn/dftb/dftb_scc.py (2)

656-722: 💤 Low value

eigh_solver silently accepts conflicting args.

When overlap=False, a non-None overlap_mat is silently ignored. That's a foot-gun for callers that pass overlap matrices conditionally and forget to flip the flag. Optional: raise on mismatch, or at minimum log a warning.

♻️ Optional
         if overlap and overlap_mat is None:
             raise ValueError("Overlap matrix must be provided when overlap is True.")
+        if not overlap and overlap_mat is not None:
+            log.warning("eigh_solver: overlap_mat provided but overlap=False; ignoring overlap_mat.")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/dftb/dftb_scc.py` around lines 656 - 722, The eigh_solver method
currently ignores a provided overlap_mat when overlap=False; update eigh_solver
to validate the argument pair by raising a ValueError (or logging a clear
warning) when overlap_mat is not None but overlap is False so callers don’t
silently pass conflicting args; specifically modify the start of
dptb.nn.dftb.dftb_scc.eigh_solver to check if overlap_mat is not None and
overlap is False and then raise an error (or call a module logger.warn) with a
message like "overlap_mat provided but overlap=False" so the conflict is
explicit.

14-14: ⚡ Quick win

Remove commented-out import and dead cal_ovp_r0 body.

Line 14 is a stale commented import, and lines 724–755 are a large commented-out method skeleton. Either drop them or land a working implementation behind an explicit feature flag. Carrying them as comments hides intent, breaks "find usages" tooling, and grows technical debt.

♻️ Suggested cleanup
-# from dptb.utils.constants import Bohr2Ang, Harte2eV
 from dptb.nn.dftb.interp import calculate_atomic_rep
-    # `@staticmethod`
-    # def cal_ovp_r0(data: AtomicDataDict,
-    #                model: DFTBSK) -> AtomicDataDict:
-    #     ...
-    #             else:
-    #                 ovp_R0_all[slice_i, slice_j] = ovp_R_dict[block_idx]
+    # (cal_ovp_r0 removed — track in a follow-up issue if the planned overlap-at-R=0 path is needed.)

Also applies to: 724-755

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/dftb/dftb_scc.py` at line 14, Remove the stale commented import and
the dead method skeleton: delete the commented-out "from dptb.utils.constants
import Bohr2Ang, Harte2eV" line and either remove the commented-out cal_ovp_r0
function body (lines 724–755) entirely or replace it with a real implementation
behind an explicit feature flag (e.g. a module-level ENABLE_CAL_OVP_R0 constant)
and a clear TODO + docstring; reference the cal_ovp_r0 function name to locate
the skeleton and ensure no leftover commented code remains so tooling and "find
usages" work correctly.
dptb/nn/dftb/scc_params.py (1)

383-388: 💤 Low value

Dead elif branch in _infer_r_max_from_model.

When len(parts) == 1 and parts[0] == symbol, the prior if symbol in parts: already matches (since symbol == parts[0]symbol in parts). The elif is unreachable.

♻️ Suggested simplification
             for key, value in rs.items():
                 parts = key.split("-")
                 if symbol in parts:
                     values.append(float(value))
-                elif len(parts) == 1 and parts[0] == symbol:
-                    values.append(float(value))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/dftb/scc_params.py` around lines 383 - 388, The elif branch in
_infer_r_max_from_model is dead because when len(parts) == 1 and parts[0] ==
symbol the prior if symbol in parts already matches; remove the redundant elif
and simplify the logic by keeping a single condition (e.g., if symbol in parts)
that appends float(value) to values for matching keys (refer to variables rs,
key, parts, symbol, values in the loop) so the unreachable branch is eliminated.
dptb/tests/test_dftb_scc.py (2)

309-319: 💤 Low value

Benchmark assertion is meaningful but lacks per-point reporting.

max_diff only surfaces the worst point. When this fails (e.g. after an interpolation tweak), debugging is faster if the failure prints the per-scale deltas alongside max_diff. Optional improvement.

♻️ Optional
     max_diff = np.max(np.abs(deeptb_elec - dftbp_elec))
-    assert max_diff < ELECTRONIC_ENERGY_ATOL, (
-        f"{struct_type} electronic energies deviate more than {ELECTRONIC_ENERGY_ATOL} eV; "
-        f"max_abs_diff={max_diff}"
-    )
+    assert max_diff < ELECTRONIC_ENERGY_ATOL, (
+        f"{struct_type} electronic energies deviate more than {ELECTRONIC_ENERGY_ATOL} eV; "
+        f"max_abs_diff={max_diff}; per-scale diffs={list(zip(SCALES.tolist(), (deeptb_elec - dftbp_elec).tolist()))}"
+    )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/tests/test_dftb_scc.py` around lines 309 - 319, The test
test_dftbscc_matches_dftbp_benchmarks currently only reports max_diff; enhance
the failure message to include per-point deltas to aid debugging by computing
diffs = np.abs(deeptb_elec - dftbp_elec) and include either the full diffs array
or the top-N largest entries (indices and values) in the assertion message
alongside max_diff and ELECTRONIC_ENERGY_ATOL so you can see which scale points
fail when the assert (max_diff < ELECTRONIC_ENERGY_ATOL) triggers; update the
assert's formatted f-string to embed that extra information.

358-374: ⚡ Quick win

CH4 expGamma regression compares without sorting.

Earlier in test_dftbscc_hBN (lines 220–222), expGamma is sorted before comparison precisely because order may vary across versions. This CH4 test compares dftbscc.expGamma against expGamma_ref directly. If the underlying ordering ever changes (e.g. due to neighbour-list reordering), this will flake while the hBN one will not. Apply the same sort-then-compare pattern for consistency.

♻️ Suggested fix
-    assert torch.allclose(dftbscc.expGamma, expGamma_ref, atol=3e-4)
+    expGamma_sorted, _ = torch.sort(dftbscc.expGamma)
+    expGamma_ref_sorted, _ = torch.sort(expGamma_ref)
+    assert torch.allclose(expGamma_sorted, expGamma_ref_sorted, atol=3e-4)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/tests/test_dftb_scc.py` around lines 358 - 374, The expGamma comparison
in the CH4 test is brittle because it compares dftbscc.expGamma to expGamma_ref
without sorting; update the assertion to sort both values before comparing
(mirror the pattern used in test_dftbscc_hBN) by applying a deterministic sort
(e.g., torch.sort or .numpy() + np.sort) to dftbscc.expGamma and expGamma_ref
and then use torch.allclose/np.allclose with the same atol to perform the
equality check; locate the comparison around dftbscc.expGamma and expGamma_ref
in the test_dftb_scc.py test and replace the direct compare with the
sort-then-compare variant.
dptb/postprocess/charge_pop.py (1)

343-349: 💤 Low value

Conservation-warning thresholds use a magic constant.

5e-2 for both Mulliken-vs-valence and Σ delta_charge checks is hard-coded twice. If these thresholds are tuned later, they must be kept in sync. Consider promoting to a module-level constant (e.g. CHARGE_CONSERVATION_TOL).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/postprocess/charge_pop.py` around lines 343 - 349, Replace the
duplicated magic literal 5e-2 with a single module-level constant (e.g.
CHARGE_CONSERVATION_TOL) and use that constant in both charge checks; update the
checks that reference mul_charge vs per_atom_charge and delta_charge (the two
log.warning calls) to compare against CHARGE_CONSERVATION_TOL (keep the same
comparison operators and messages), and add the new constant near the top of
dptb/postprocess/charge_pop.py so future tuning is centralized.
dptb/nn/dftb/interp.py (1)

187-191: ⚡ Quick win

Remove torch.tensor(...) wrapper to avoid UserWarning and preserve gradient flow.

_get_sigma_pair returns a tensor (line 164), so Bohr2Ang / sigma_pair is already a tensor on the correct dtype/device. Wrapping it in torch.tensor(...) triggers a UserWarning ("To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() ...") and detaches the autograd graph—this would silently break differentiability if sigma_pair ever becomes learnable.

♻️ Suggested fix
     def gamma_for_pair(self, sym_i: str, sym_j: str) -> torch.Tensor:
         # gamma = Bohr2Ang / sqrt(sigma_i^2 + sigma_j^2), sigma unit: angstrom
         sigma_pair = self._get_sigma_pair(sym_i, sym_j)  # angstrom
-        return torch.tensor(Bohr2Ang / sigma_pair, dtype=self.dtype, device=self.device)   # 1/Bohr
+        return (Bohr2Ang / sigma_pair).to(dtype=self.dtype, device=self.device)   # 1/Bohr
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/nn/dftb/interp.py` around lines 187 - 191, The gamma_for_pair function
wraps an existing tensor result in torch.tensor(...), which triggers a
UserWarning and breaks autograd; instead return the computed tensor expression
directly so gradients and device/dtype are preserved: remove the
torch.tensor(...) wrapper around Bohr2Ang / sigma_pair in gamma_for_pair and
ensure the returned value uses the same dtype/device as sigma_pair (leave
_get_sigma_pair usage unchanged).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dptb/nn/dftb/interp.py`:
- Around line 157-172: The KeyError messages in _get_sigma_pair and _get_Z_pair
drop the offending element names and have a stray leading space; update the
exception raises in those functions to include the actual symbol(s) (sym_i and
sym_j) in the formatted error message (e.g. "Element not found in sigma_rep:
{sym_i}" or include both symbols where appropriate) and remove the leading space
so the logged symbol is clear and precise.

---

Nitpick comments:
In `@dptb/nn/dftb/dftb_scc.py`:
- Around line 656-722: The eigh_solver method currently ignores a provided
overlap_mat when overlap=False; update eigh_solver to validate the argument pair
by raising a ValueError (or logging a clear warning) when overlap_mat is not
None but overlap is False so callers don’t silently pass conflicting args;
specifically modify the start of dptb.nn.dftb.dftb_scc.eigh_solver to check if
overlap_mat is not None and overlap is False and then raise an error (or call a
module logger.warn) with a message like "overlap_mat provided but overlap=False"
so the conflict is explicit.
- Line 14: Remove the stale commented import and the dead method skeleton:
delete the commented-out "from dptb.utils.constants import Bohr2Ang, Harte2eV"
line and either remove the commented-out cal_ovp_r0 function body (lines
724–755) entirely or replace it with a real implementation behind an explicit
feature flag (e.g. a module-level ENABLE_CAL_OVP_R0 constant) and a clear TODO +
docstring; reference the cal_ovp_r0 function name to locate the skeleton and
ensure no leftover commented code remains so tooling and "find usages" work
correctly.

In `@dptb/nn/dftb/interp.py`:
- Around line 187-191: The gamma_for_pair function wraps an existing tensor
result in torch.tensor(...), which triggers a UserWarning and breaks autograd;
instead return the computed tensor expression directly so gradients and
device/dtype are preserved: remove the torch.tensor(...) wrapper around Bohr2Ang
/ sigma_pair in gamma_for_pair and ensure the returned value uses the same
dtype/device as sigma_pair (leave _get_sigma_pair usage unchanged).

In `@dptb/nn/dftb/scc_params.py`:
- Around line 383-388: The elif branch in _infer_r_max_from_model is dead
because when len(parts) == 1 and parts[0] == symbol the prior if symbol in parts
already matches; remove the redundant elif and simplify the logic by keeping a
single condition (e.g., if symbol in parts) that appends float(value) to values
for matching keys (refer to variables rs, key, parts, symbol, values in the
loop) so the unreachable branch is eliminated.

In `@dptb/postprocess/charge_pop.py`:
- Around line 343-349: Replace the duplicated magic literal 5e-2 with a single
module-level constant (e.g. CHARGE_CONSERVATION_TOL) and use that constant in
both charge checks; update the checks that reference mul_charge vs
per_atom_charge and delta_charge (the two log.warning calls) to compare against
CHARGE_CONSERVATION_TOL (keep the same comparison operators and messages), and
add the new constant near the top of dptb/postprocess/charge_pop.py so future
tuning is centralized.

In `@dptb/tests/test_dftb_scc.py`:
- Around line 309-319: The test test_dftbscc_matches_dftbp_benchmarks currently
only reports max_diff; enhance the failure message to include per-point deltas
to aid debugging by computing diffs = np.abs(deeptb_elec - dftbp_elec) and
include either the full diffs array or the top-N largest entries (indices and
values) in the assertion message alongside max_diff and ELECTRONIC_ENERGY_ATOL
so you can see which scale points fail when the assert (max_diff <
ELECTRONIC_ENERGY_ATOL) triggers; update the assert's formatted f-string to
embed that extra information.
- Around line 358-374: The expGamma comparison in the CH4 test is brittle
because it compares dftbscc.expGamma to expGamma_ref without sorting; update the
assertion to sort both values before comparing (mirror the pattern used in
test_dftbscc_hBN) by applying a deterministic sort (e.g., torch.sort or .numpy()
+ np.sort) to dftbscc.expGamma and expGamma_ref and then use
torch.allclose/np.allclose with the same atol to perform the equality check;
locate the comparison around dftbscc.expGamma and expGamma_ref in the
test_dftb_scc.py test and replace the direct compare with the sort-then-compare
variant.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 841a183e-072b-4eab-a2ed-29fca2207b98

📥 Commits

Reviewing files that changed from the base of the PR and between 9c4f9b9 and 055b4cd.

📒 Files selected for processing (58)
  • dptb/nn/dftb/dftb_scc.py
  • dptb/nn/dftb/gamma.py
  • dptb/nn/dftb/interp.py
  • dptb/nn/dftb/scc_params.py
  • dptb/postprocess/charge_pop.py
  • dptb/tests/test_charge_pop.py
  • dptb/tests/test_dftb_scc.py
  • dptb/tests/test_dftbplus_interp.py
  • dptb/tests/test_scc_params.py
  • dptb/utils/interpolate/dftbplus_interp.py
  • examples/dftb_scc/README.md
  • examples/dftb_scc/benchmark_eos_dftbp.ipynb
  • examples/dftb_scc/hBN_dftb/band.json
  • examples/dftb_scc/hBN_dftb/band_plot.ipynb
  • examples/dftb_scc/hBN_dftb/data/kpath.0/eigenvalues.npy
  • examples/dftb_scc/hBN_dftb/data/kpath.0/info.json
  • examples/dftb_scc/hBN_dftb/data/kpath.0/kpoints.npy
  • examples/dftb_scc/hBN_dftb/data/kpath.0/xdat.traj
  • examples/dftb_scc/hBN_dftb/data/struct.vasp
  • examples/dftb_scc/hBN_dftb/hotapi.ipynb
  • examples/dftb_scc/hBN_dftb/run/band.json
  • examples/dftb_scc/hBN_dftb/sk20251/B-B.skf
  • examples/dftb_scc/hBN_dftb/sk20251/B-N.skf
  • examples/dftb_scc/hBN_dftb/sk20251/N-B.skf
  • examples/dftb_scc/hBN_dftb/sk20251/N-N.skf
  • examples/dftb_scc/hBN_dftb/slakos/B-B.skf
  • examples/dftb_scc/hBN_dftb/slakos/B-N.skf
  • examples/dftb_scc/hBN_dftb/slakos/C-C.skf
  • examples/dftb_scc/hBN_dftb/slakos/C-H.skf
  • examples/dftb_scc/hBN_dftb/slakos/H-C.skf
  • examples/dftb_scc/hBN_dftb/slakos/H-H.skf
  • examples/dftb_scc/hBN_dftb/slakos/N-B.skf
  • examples/dftb_scc/hBN_dftb/slakos/N-N.skf
  • examples/dftb_scc/hBN_dftb/slakos/Si-Si.skf
  • examples/dftb_scc/hBN_dftb/test_gamma.ipynb
  • examples/dftb_scc/hBN_dftb/test_pot.ipynb
  • examples/dftb_scc/hBN_dftb/test_total_energy.ipynb
  • examples/dftb_scc/hBN_scc/band.json
  • examples/dftb_scc/hBN_scc/data/POSCAR
  • examples/dftb_scc/hBN_scc/data/kpath.0/eigenvalues.npy
  • examples/dftb_scc/hBN_scc/data/kpath.0/info.json
  • examples/dftb_scc/hBN_scc/data/kpath.0/kpoints.npy
  • examples/dftb_scc/hBN_scc/data/kpath.0/xdat.traj
  • examples/dftb_scc/hBN_scc/data/struct.vasp
  • examples/dftb_scc/hBN_scc/run/band.json
  • examples/dftb_scc/hBN_scc/slakos/B-B.skf
  • examples/dftb_scc/hBN_scc/slakos/B-N.skf
  • examples/dftb_scc/hBN_scc/slakos/C-C.skf
  • examples/dftb_scc/hBN_scc/slakos/C-H.skf
  • examples/dftb_scc/hBN_scc/slakos/H-C.skf
  • examples/dftb_scc/hBN_scc/slakos/H-H.skf
  • examples/dftb_scc/hBN_scc/slakos/N-B.skf
  • examples/dftb_scc/hBN_scc/slakos/N-N.skf
  • examples/dftb_scc/hBN_scc/slakos/Si-Si.skf
  • examples/dftb_scc/hBN_scc/test_ewald.ipynb
  • examples/dftb_scc/hBN_scc/test_hbn_scc_orth.ipynb
  • examples/dftb_scc/hBN_scc/test_scc_hBN.ipynb
  • examples/dftb_scc/hBN_scc/test_scc_hBN_nnsk.ipynb
✅ Files skipped from review due to trivial changes (1)
  • dptb/utils/interpolate/dftbplus_interp.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • dptb/nn/dftb/gamma.py
  • dptb/tests/test_scc_params.py

Comment thread dptb/nn/dftb/interp.py
@QG-phy QG-phy changed the title Add DeePTB the DFTB-SCC support Add DFTB-SCC support to DeePTB May 7, 2026
@AsymmetryChou AsymmetryChou self-requested a review May 7, 2026 14:32
@AsymmetryChou AsymmetryChou self-assigned this May 7, 2026
QG-phy added 6 commits May 8, 2026 00:40
Add tests ensuring DFTBSK preserves r_max in model options when built
directly or from scratch through build_model. Cover scalar, species, and
pair-style r_max inputs to validate configuration handling.

Also clean up DFTBSK load signature indentation and refresh notebook
outputs for the updated DFTBSK SCC/no-SCC example.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dptb/postprocess/unified/properties/dos.py (1)

169-185: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the active overlap path, not the model capability, when postprocessing PDOS.

After switching to self._system.get_eigenstates(...), Line 175 is no longer a reliable proxy for how the returned eigenvectors are represented. In forced-orthogonal SCC on an overlap-capable model, this branch still transposes vecs even though OVERLAP_KEY is absent, which will skew the PDOS weights.

Suggested fix
-            sk = None
-            if self._system.calculator.overlap:
-                # s_out_field is typically 'overlap' or 'edge_overlap' transformed?
-                # data should contain Sk if eigh_solver.s2k was called
-                # Check calculator implementation: it calls s2k.
-                # Eigh stores it in s_out_field ('overlap')
-                sk = data.get(AtomicDataDict.OVERLAP_KEY)
-                # FIX: dptb.nn.energy.Eigh transposes eigenvectors when overlap is present [State, Basis]
-                # We need standardized [Basis, State]
-                if vecs is not None:
-                    vecs = vecs.transpose(-2, -1)
+            sk = data.get(AtomicDataDict.OVERLAP_KEY)
+            # Only apply the overlap-specific eigenvector convention when the
+            # active solve actually produced an overlap matrix.
+            if sk is not None and vecs is not None:
+                vecs = vecs.transpose(-2, -1)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/postprocess/unified/properties/dos.py` around lines 169 - 185, The code
currently decides to transpose eigenvectors based on the model capability
(self._system.calculator.overlap) which is unreliable after switching to
self._system.get_eigenstates; instead, inspect the returned data for an actual
overlap (AtomicDataDict.OVERLAP_KEY / sk) and only apply the vecs.transpose(-2,
-1) when sk is present (i.e., when data.get(AtomicDataDict.OVERLAP_KEY) is not
None) so that get_eigenstates, vecs, and PDOS weighting remain consistent;
update the branch around get_eigenstates, sk, and vecs to use the active overlap
path rather than self._system.calculator.overlap.
🧹 Nitpick comments (1)
dptb/postprocess/charge_pop.py (1)

137-167: 💤 Low value

Optional: align dtype enforcement between Numba and NumPy paths.

direct_diag_rhos/bincount_sum only coerce inputs to contiguous complex128/float64/int64 on the Numba branch. The NumPy fallbacks accept whatever dtype the caller passes, which can silently produce different precisions if call sites ever drift from the current to(torch.complex128).numpy() convention used in cal_mul_charge. Mirroring np.ascontiguousarray(..., dtype=...) in both branches would make the dispatcher dtype-stable regardless of caller hygiene.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/postprocess/charge_pop.py` around lines 137 - 167, Both direct_diag_rhos
and bincount_sum only enforce dtype/contiguity on the NUMBA_AVAILABLE branch,
causing the NumPy fallbacks to accept arbitrary dtypes; make the dispatcher
dtype-stable by coercing inputs in both branches. In direct_diag_rhos, always
call np.ascontiguousarray(V, dtype=np.complex128), np.ascontiguousarray(SV,
dtype=np.complex128), and np.ascontiguousarray(occ, dtype=np.float64) before
dispatching to either _direct_diag_rhos_numba or _direct_diag_rhos_numpy;
similarly in bincount_sum always coerce diag_vals to np.float64 and orb2atom to
np.int64 (with np.ascontiguousarray) before calling _bincount_sum_numba or
_bincount_sum_numpy so both NumPy and Numba paths get identical dtypes and
memory layout.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dptb/postprocess/unified/system.py`:
- Around line 344-349: The SCC branch in get_eigenstates drops the incoming
solver argument (it calls self.get_hk(...) then self._scc.eigh_solver(...)
without forwarding solver), causing inconsistent behavior; update
get_eigenstates to either forward the solver through to self._scc.eigh_solver
(add a solver parameter to the call and pass it to the SCC eigensolver) or
validate/raise immediately when a non-default solver is requested (fail-fast).
Specifically, ensure the same guard/logic used in the non-SCC (bare) path is
applied for the SCC path by referencing get_eigenstates, get_hk, and
_scc.eigh_solver and either thread the solver argument through or raise an error
when solver != default.
- Around line 168-172: The SCC-related cached state reset currently clears
self.has_bands, self.has_dos and self._scc_state but leaves stale electronic
caches; update that reset block (the same place where
has_bands/has_dos/_scc_state are cleared in the TBSystem) to also clear
self._efermi and self._total_electrons (set them to None or their initial
sentinel) so that after a geometry/structure swap any subsequent band/DOS or SCC
work cannot reuse stale Fermi level or electron-count values.

In `@dptb/tests/test_build_model.py`:
- Around line 145-174: The test test_build_dftbsk_from_scratch_with_r_max is
overwriting the parametrized r_max by reassigning r_max = {"B": ..., "N": ...}
on Line 165; remove that hardcoded reassignment so the function uses the r_max
coming from the pytest.mark.parametrize decorator, and ensure
model_options["dftbsk"]["r_max"] is set from the parameter before calling
build_model (refer to r_max, model_options, and build_model in the test).

In `@dptb/tests/test_dftb_scc.py`:
- Line 576: Assertion messages like f"Reused instance Gamma differs" are using
an unnecessary f-string prefix (triggering Ruff F541); remove the leading f from
these literal messages so they become regular strings (e.g., "Reused instance
Gamma differs"). Update all occurrences mentioned (the messages at "Reused
instance Gamma differs", and the other literal messages reported at the comment:
lines with the same pattern around the file) so each assertion message uses a
plain string instead of an f-string.

In `@dptb/tests/test_occupy.py`:
- Around line 662-670: The comment/docstring contains Unicode multiplication
signs ("×") that trigger confusable-character lint warnings; replace them with
ASCII "x" in the relevant docstring and inline comment around the C dimer test
(the descriptive line that reads "2 C atoms × 4 valence electrons" near the
eigs/eigs_3d/wk/ne block) so the text becomes "2 C atoms x 4 valence electrons"
and re-run linters.

---

Outside diff comments:
In `@dptb/postprocess/unified/properties/dos.py`:
- Around line 169-185: The code currently decides to transpose eigenvectors
based on the model capability (self._system.calculator.overlap) which is
unreliable after switching to self._system.get_eigenstates; instead, inspect the
returned data for an actual overlap (AtomicDataDict.OVERLAP_KEY / sk) and only
apply the vecs.transpose(-2, -1) when sk is present (i.e., when
data.get(AtomicDataDict.OVERLAP_KEY) is not None) so that get_eigenstates, vecs,
and PDOS weighting remain consistent; update the branch around get_eigenstates,
sk, and vecs to use the active overlap path rather than
self._system.calculator.overlap.

---

Nitpick comments:
In `@dptb/postprocess/charge_pop.py`:
- Around line 137-167: Both direct_diag_rhos and bincount_sum only enforce
dtype/contiguity on the NUMBA_AVAILABLE branch, causing the NumPy fallbacks to
accept arbitrary dtypes; make the dispatcher dtype-stable by coercing inputs in
both branches. In direct_diag_rhos, always call np.ascontiguousarray(V,
dtype=np.complex128), np.ascontiguousarray(SV, dtype=np.complex128), and
np.ascontiguousarray(occ, dtype=np.float64) before dispatching to either
_direct_diag_rhos_numba or _direct_diag_rhos_numpy; similarly in bincount_sum
always coerce diag_vals to np.float64 and orb2atom to np.int64 (with
np.ascontiguousarray) before calling _bincount_sum_numba or _bincount_sum_numpy
so both NumPy and Numba paths get identical dtypes and memory layout.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b60e4ec2-0849-45ac-b40a-539175de4337

📥 Commits

Reviewing files that changed from the base of the PR and between 055b4cd and 0650445.

📒 Files selected for processing (18)
  • dptb/nn/dftb/dftb_scc.py
  • dptb/nn/dftb/interp.py
  • dptb/nn/dftb/scc_params.py
  • dptb/nn/dftbsk.py
  • dptb/postprocess/charge_pop.py
  • dptb/postprocess/topybinding.py
  • dptb/postprocess/unified/properties/band.py
  • dptb/postprocess/unified/properties/dos.py
  • dptb/postprocess/unified/system.py
  • dptb/tests/test_build_model.py
  • dptb/tests/test_dftb_scc.py
  • dptb/tests/test_dftbsk.py
  • dptb/tests/test_occupy.py
  • dptb/tests/test_tbsystem_scc.py
  • dptb/utils/argcheck.py
  • dptb/utils/interpolate/poly_interp.py
  • examples/dftb_scc/hBN_scc/test_hbn_scc_tbsystem.ipynb
  • examples/dftb_scc/hBN_scc/test_scc_hBN.ipynb
🚧 Files skipped from review as they are similar to previous changes (3)
  • dptb/tests/test_dftbsk.py
  • dptb/nn/dftb/interp.py
  • dptb/utils/interpolate/poly_interp.py

Comment on lines 168 to 172
# Reset state flags
self.has_bands=False
self.has_dos=False
self._scc_state = None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear cached electronic state when the structure changes.

Line 171 resets the SCC cache, but _efermi and _total_electrons survive a geometry swap. After an SCC run, reusing the same TBSystem with new atoms can therefore carry a stale Fermi level/electron count into later band/DOS work.

Suggested fix
         # Reset state flags
         self.has_bands=False
         self.has_dos=False
         self._scc_state = None
+        self._efermi = None
+        self._total_electrons = None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/postprocess/unified/system.py` around lines 168 - 172, The SCC-related
cached state reset currently clears self.has_bands, self.has_dos and
self._scc_state but leaves stale electronic caches; update that reset block (the
same place where has_bands/has_dos/_scc_state are cleared in the TBSystem) to
also clear self._efermi and self._total_electrons (set them to None or their
initial sentinel) so that after a geometry/structure swap any subsequent
band/DOS or SCC work cannot reuse stale Fermi level or electron-count values.

Comment thread dptb/postprocess/unified/system.py Outdated
Comment on lines +344 to +349
hk, sk = self.get_hk(atomic_data=atomic_data, k_points=k_points, use_scc=True)
eigvecs, eigvals = self._scc.eigh_solver(
H_mat=hk,
overlap=self._scc.overlap,
overlap_mat=sk,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't silently ignore solver once SCC is enabled.

The bare path forwards solver to the calculator, but the SCC path drops it entirely. That means the same call changes behavior when SCC is turned on, which is hard to diagnose. Either thread solver through to the SCC eigensolver or fail fast when a custom solver is requested.

Suggested fix
         hk, sk = self.get_hk(atomic_data=atomic_data, k_points=k_points, use_scc=True)
+        if solver is not None:
+            raise NotImplementedError("Custom solver selection is not yet supported for SCC-enabled eigenvalue solves.")
         eigvecs, eigvals = self._scc.eigh_solver(
             H_mat=hk,
             overlap=self._scc.overlap,
             overlap_mat=sk,
         )

Apply the same guard in get_eigenstates().

Also applies to: 368-373

🧰 Tools
🪛 Ruff (0.15.12)

[warning] 345-345: Unpacked variable eigvecs is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/postprocess/unified/system.py` around lines 344 - 349, The SCC branch in
get_eigenstates drops the incoming solver argument (it calls self.get_hk(...)
then self._scc.eigh_solver(...) without forwarding solver), causing inconsistent
behavior; update get_eigenstates to either forward the solver through to
self._scc.eigh_solver (add a solver parameter to the call and pass it to the SCC
eigensolver) or validate/raise immediately when a non-default solver is
requested (fail-fast). Specifically, ensure the same guard/logic used in the
non-SCC (bare) path is applied for the SCC path by referencing get_eigenstates,
get_hk, and _scc.eigh_solver and either thread the solver argument through or
raise an error when solver != default.

Comment on lines +145 to +174
@pytest.mark.parametrize(
"r_max",
[
5.0,
{"B": 6.349479778742587, "N": 5.366822193937187},
{"B-B": 6.349479778742587, "B-N": 5.858150986339887, "N-B": 5.858150986339887, "N-N": 5.366822193937187},
],
)
def test_build_dftbsk_from_scratch_with_r_max(r_max):
skdatapath = f"{rootdir}/../../../../examples/hBN_dftb/slakos"
common_options = {
"basis": {
"B": ["2s", "2p"],
"N": ["2s", "2p"]
},
"device": "cpu",
"dtype": "float32",
"overlap": True,
"seed": 3982377700
}
r_max = {"B": 6.349479778742587, "N": 5.366822193937187}
model_options = {
"dftbsk": {
"skdata": skdatapath,
"r_max": r_max,
}
}
model = build_model(None, model_options, common_options)
assert model.name == 'dftbsk'
assert model.model_options["dftbsk"]["r_max"] == r_max
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The parametrized r_max cases never actually run.

Line 165 overwrites the pytest.mark.parametrize value, so this test only exercises the species-dict case. The scalar and pair-style inputs added in the decorator are currently untested.

Suggested fix
 def test_build_dftbsk_from_scratch_with_r_max(r_max):
     skdatapath = f"{rootdir}/../../../../examples/hBN_dftb/slakos"
     common_options = {
         "basis": {
             "B": ["2s", "2p"],
             "N": ["2s", "2p"]
         },
         "device": "cpu",
         "dtype": "float32",
         "overlap": True,
         "seed": 3982377700
     }
-    r_max = {"B": 6.349479778742587, "N": 5.366822193937187}
     model_options = {
     "dftbsk": {
             "skdata": skdatapath,
             "r_max": r_max,
         }
     }
🧰 Tools
🪛 OpenGrep (1.20.0)

[ERROR] 149-149: Possible credit card number (PAN) detected in source code. Credit card numbers should never be hardcoded or stored in source files. Use a secrets manager or tokenization service instead.

(coderabbit.pii.credit-card-number)


[ERROR] 150-150: Possible credit card number (PAN) detected in source code. Credit card numbers should never be hardcoded or stored in source files. Use a secrets manager or tokenization service instead.

(coderabbit.pii.credit-card-number)


[ERROR] 165-165: Possible credit card number (PAN) detected in source code. Credit card numbers should never be hardcoded or stored in source files. Use a secrets manager or tokenization service instead.

(coderabbit.pii.credit-card-number)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/tests/test_build_model.py` around lines 145 - 174, The test
test_build_dftbsk_from_scratch_with_r_max is overwriting the parametrized r_max
by reassigning r_max = {"B": ..., "N": ...} on Line 165; remove that hardcoded
reassignment so the function uses the r_max coming from the
pytest.mark.parametrize decorator, and ensure model_options["dftbsk"]["r_max"]
is set from the parameter before calling build_model (refer to r_max,
model_options, and build_model in the test).

assert torch.allclose(scc_shift_run1, scc_shift_run2, atol=1e-10), \
f"Reused instance scc_shift differs: {scc_shift_run1} vs {scc_shift_run2}"
assert torch.allclose(Gamma_run1, Gamma_run2, atol=1e-10), \
f"Reused instance Gamma differs"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove the unused f prefixes in these assertion messages.

These strings do not interpolate anything, so Ruff reports F541 on the new test module.

Also applies to: 590-590, 733-733, 823-823, 899-899

🧰 Tools
🪛 Ruff (0.15.12)

[error] 576-576: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/tests/test_dftb_scc.py` at line 576, Assertion messages like f"Reused
instance Gamma differs" are using an unnecessary f-string prefix (triggering
Ruff F541); remove the leading f from these literal messages so they become
regular strings (e.g., "Reused instance Gamma differs"). Update all occurrences
mentioned (the messages at "Reused instance Gamma differs", and the other
literal messages reported at the comment: lines with the same pattern around the
file) so each assertion message uses a plain string instead of an f-string.

Comment thread dptb/tests/test_occupy.py
Comment on lines +662 to +670
For 8 electrons (2 C atoms × 4 valence electrons), we need 4 filled bands.
Fermi level should be around -8.5 eV (between bands 4 and 5).
"""
# C dimer eigenvalues (1 k-point, 8 bands)
eigs = np.array([[-18.606747, -10.180909, -9.288945, -9.288941,
-7.631103, 2.259443, 2.259443, 76.89657]])
eigs_3d = eigs.reshape(1, 1, 8)
wk = np.array([1.0])
ne = 8 # 2 C atoms × 4 valence electrons
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Replace confusable × characters in docs/comments to avoid lint noise.

Lines 662 and 670 use Unicode multiplication sign (×), which triggers Ruff confusable-character warnings. Prefer ASCII x for consistency and cleaner lint output.

✏️ Minimal doc/comment cleanup
-        For 8 electrons (2 C atoms × 4 valence electrons), we need 4 filled bands.
+        For 8 electrons (2 C atoms x 4 valence electrons), we need 4 filled bands.
@@
-        ne = 8  # 2 C atoms × 4 valence electrons
+        ne = 8  # 2 C atoms x 4 valence electrons
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
For 8 electrons (2 C atoms × 4 valence electrons), we need 4 filled bands.
Fermi level should be around -8.5 eV (between bands 4 and 5).
"""
# C dimer eigenvalues (1 k-point, 8 bands)
eigs = np.array([[-18.606747, -10.180909, -9.288945, -9.288941,
-7.631103, 2.259443, 2.259443, 76.89657]])
eigs_3d = eigs.reshape(1, 1, 8)
wk = np.array([1.0])
ne = 8 # 2 C atoms × 4 valence electrons
For 8 electrons (2 C atoms x 4 valence electrons), we need 4 filled bands.
Fermi level should be around -8.5 eV (between bands 4 and 5).
"""
# C dimer eigenvalues (1 k-point, 8 bands)
eigs = np.array([[-18.606747, -10.180909, -9.288945, -9.288941,
-7.631103, 2.259443, 2.259443, 76.89657]])
eigs_3d = eigs.reshape(1, 1, 8)
wk = np.array([1.0])
ne = 8 # 2 C atoms x 4 valence electrons
🧰 Tools
🪛 Ruff (0.15.12)

[warning] 662-662: Docstring contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF002)


[warning] 670-670: Comment contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF003)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dptb/tests/test_occupy.py` around lines 662 - 670, The comment/docstring
contains Unicode multiplication signs ("×") that trigger confusable-character
lint warnings; replace them with ASCII "x" in the relevant docstring and inline
comment around the C dimer test (the descriptive line that reads "2 C atoms × 4
valence electrons" near the eigs/eigs_3d/wk/ne block) so the text becomes "2 C
atoms x 4 valence electrons" and re-run linters.

QG-phy added 5 commits May 9, 2026 23:58
Update TBSystem SCC resolution so enabling SCC makes default HK access use
the SCC path even before state is available. This preserves the expected
RuntimeError when SCC has not been run, while still allowing explicit
bare Hamiltonian access with use_scc=False.

Add regression coverage for the enabled-but-not-run SCC workflow.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 3, 2026

This advisory review plan was generated from changed file names using trusted base-branch code.
此审查计划由受信任的 base 分支代码根据变更文件名生成,仅作为维护者辅助。

DeePTB PR Review Plan / DeePTB PR 审查计划

Risk / 风险等级: High (高) · Changed files / 变更文件: 141

Why / 风险来源

  • Config schema changed. / 配置 schema 有变更。
  • Postprocess and export changed. / 后处理与导出 有变更。
  • Tests changed. / 测试 有变更。
  • Examples and configs changed. / 示例和配置有变更。
  • Some files are unclassified by current risk_map.md. / 部分文件尚未被当前风险地图覆盖。

Recommended Review / 建议审查重点

  • Run the Maintainer Review Prompt. / 运行维护者审查 prompt。
  • Run the Test Gap Review Prompt. / 运行测试缺口审查 prompt。
  • Focus human review on config/API/data/checkpoint compatibility. / 人工重点看配置、API、数据和 checkpoint 兼容性。
Detailed risk areas
  • Config schema (High): dptb/utils/argcheck.py
  • Postprocess and export (Medium): dptb/postprocess/init.py, dptb/postprocess/charge_pop.py, dptb/postprocess/elec_struc_cal.py, dptb/postprocess/topybinding.py, dptb/postprocess/totbplas.py, ... (11 files)
  • Tests (Medium): dptb/tests/data/ch4/CH4.vasp, dptb/tests/data/dftb/B-B.skf, dptb/tests/data/dftb/B-N.skf, dptb/tests/data/dftb/C-C.skf, dptb/tests/data/dftb/C-H.skf, ... (50 files)
  • Examples and configs (Low): examples/dftb_scc/README.md, examples/dftb_scc/benchmark_eos_dftbp.ipynb, examples/dftb_scc/hBN_dftb/band.json, examples/dftb_scc/hBN_dftb/band_plot.ipynb, examples/dftb_scc/hBN_dftb/data/kpath.0/eigenvalues.npy, ... (49 files)
  • Unclassified by current risk_map.md: dptb/kpoints/init.py, dptb/kpoints/geometry.py, dptb/kpoints/mesh.py, dptb/kpoints/path.py, dptb/kpoints/reduction.py, dptb/kpoints/sampling.py, dptb/nn/deeptb.py, dptb/nn/dftb/dftb_scc.py, dptb/nn/dftb/gamma.py, dptb/nn/dftb/hopping_dftb.py, dptb/nn/dftb/interp.py, dptb/nn/dftb/scc_mixer.py, dptb/nn/dftb/scc_params.py, dptb/nn/dftb/sk_param.py, dptb/nn/dftb2nnsk.py, dptb/nn/dftbsk.py, dptb/nn/energy.py, dptb/nn/nnsk.py, dptb/nn/sktb/HubbardUDB.py, dptb/nn/sktb/electronic_configDB.py, ... (30 files)
Human review focus
  • Config schema: defaults, backward compatibility, docs/examples alignment
  • Postprocess and export: exported formats, unit conventions, and compatibility
  • Tests: marker choice, regression coverage, skipped environments, and assertions
  • Examples and configs: valid JSON, command consistency, and config/schema compatibility
  • Confirm compatibility for public API, CLI/config schema, checkpoints, and datasets.
Local commands and hold conditions

Suggested local commands:

  • python scripts/ci/check_repository_hygiene.py
  • uv run pytest ./dptb/tests -m regression
  • uv run pytest ./dptb/tests

Hold conditions:

  • CI failed and the failure is not explained.
  • The PR scope is unclear or mixes unrelated changes.
  • AI review found a plausible correctness issue that was not fixed or explicitly waived.
  • High-risk behavior changed without a regression test or documented waiver.
  • Config, checkpoint, dataset, tensor key/order, or numerical compatibility is unclear.

Advisory only. / 仅作为审查辅助。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants