Skip to content

Arkane matching method name#814

Merged
calvinp0 merged 7 commits intomainfrom
arkane_match
Mar 29, 2026
Merged

Arkane matching method name#814
calvinp0 merged 7 commits intomainfrom
arkane_match

Conversation

@calvinp0
Copy link
Copy Markdown
Member

@calvinp0 calvinp0 commented Jan 6, 2026

When ARC hands off to Arkane for thermochemistry, it needs to look up atom energy corrections (AEC), bond additivity corrections (BAC), and frequency scale factors from the RMG-database quantum_corrections/data.py. That file uses RMG's own naming convention — all lowercase, no hyphens, and sometimes year-suffixed variants like b97d32023 alongside b97d3.

Previously, ARC did exact string matching against these entries. If a user wrote CCSD(T)-F12/cc-pVTZ-F12 and the database had ccsd(t)f12/ccpvtzf12, it wouldn't match. Users had to know RMG's internal naming to get corrections applied.

Changes

  1. Level.year attribute
    Adds an optional year (4-digit int) to the Level class. Validated on construction, included in str, simple(), as_dict(), and build() for serialization/restart support.
  2. Fuzzy matching engine
  • _normalize_method: lowercase, strip hyphens → DLPNO-CCSD(T)-F12 becomes dlpnoccsd(t)f12
  • _normalize_basis: lowercase, strip hyphens and spaces → cc-pVTZ-F12 becomes ccpvtzf12
  • _split_method_year: extracts trailing 4-digit year → b97d32023 → (b97d3, 2023)
  • _parse_lot_params: regex-extracts method/basis/software from LevelOfTheory(...) strings
  • _find_best_level_key_for_sp_level: core matcher — iterates all LevelOfTheory(...) keys in a data.py section, normalizes both sides, and selects by year priority:
    • Explicit year → exact match only
    • No year specified + no-year entry exists → prefer it
    • No year specified + only year-suffixed entries → pick latest
  • _find_best_across_files: searches multiple data.py locations in priority order
  • _warn_no_match / _all_available_years: actionable warnings listing what years are available
  • _extract_section: now supports section_end=None (read to EOF), fixing the freq_dict section which is last in the file
  1. arkane_level_of_theory changes in input.yml

The year is specified on arkane_level_of_theory, the Level object dedicated to Arkane lookups:

  arkane_level_of_theory:
    method: b97d3
    basis: def2tzvp
    year: 2023

Most users won't need year at all — the fuzzy matching handles the common case automatically.

  1. Doc updates
    Documents the year key on arkane_level_of_theory, warns against putting year suffixes in level_of_theory, explains the fallback behavior.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for optional 4-digit year suffixes to distinguish method variants in the Arkane database (e.g., b97d3 vs b97d32023). The implementation allows users to omit the year (ARC matches the latest available), specify an explicit year, or receive warnings when a requested year is unavailable.

Key changes:

  • Added year parameter to the Level class with validation requiring 4-digit integers
  • Implemented flexible method/basis matching logic with year-aware searching in Arkane database
  • Enhanced error reporting to list available years when a requested year is not found

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
docs/source/advanced.rst Documents the new year parameter with usage examples
arc/level.py Adds year attribute with 4-digit validation and updates string representations
arc/level_test.py Adds test for year validation (rejects 2-digit, accepts 4-digit)
arc/statmech/arkane.py Core implementation: adds normalization functions, year-aware matching logic, and improved error messages; also adds spinMultiplicity to non-SMILES species template and thread limit environment variables
arc/statmech/arkane_test.py Adds test for year-not-found warning behavior and year in _level_to_str

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@calvinp0 calvinp0 force-pushed the arkane_match branch 2 times, most recently from 7955fd4 to 67e2cfe Compare January 7, 2026 13:55
@calvinp0 calvinp0 force-pushed the arkane_match branch 2 times, most recently from 1926671 to 773ab92 Compare January 22, 2026 11:27
@calvinp0 calvinp0 requested review from Lilachn91 and Copilot January 22, 2026 17:47
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 11 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +157 to +158
and are not valid QC methods. Do not include year suffixes in ``level_of_theory``; instead, set
``arkane_level_of_theory`` with a ``year`` value if you need a specific correction year.
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation incorrectly advises users to set year in arkane_level_of_theory, but based on the PR description and implementation, the year parameter should be set directly on sp_level (or other level arguments like freq_level, opt_level). The arkane_level_of_theory is typically derived from sp_level unless explicitly overridden. Users should be advised to set the year parameter on sp_level or other level_of_theory parameters, not on arkane_level_of_theory specifically.

Suggested change
and are not valid QC methods. Do not include year suffixes in ``level_of_theory``; instead, set
``arkane_level_of_theory`` with a ``year`` value if you need a specific correction year.
and are not valid QC methods. Do not include year suffixes in ``level_of_theory``; instead, specify a
``year`` key on ``sp_level`` (and/or other job-specific level dictionaries such as ``opt_level``,
``freq_level``, or ``scan_level``) if you need a specific correction year.

Copilot uses AI. Check for mistakes.
Comment on lines +89 to +91
set ``arkane_level_of_theory`` with a ``year`` value instead.
If ``year`` is omitted, ARC will prefer the no-year Arkane entry for that method/basis; if none exists,
ARC will fall back to the latest available year in the Arkane database.
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation incorrectly advises users to set year in arkane_level_of_theory. Based on the PR implementation and examples shown earlier in the documentation (lines 98-112), the year parameter should be set directly on sp_level or other level parameters (opt_level, freq_level, etc.), not specifically on arkane_level_of_theory. The arkane_level_of_theory is typically auto-derived from sp_level unless explicitly overridden.

Suggested change
set ``arkane_level_of_theory`` with a ``year`` value instead.
If ``year`` is omitted, ARC will prefer the no-year Arkane entry for that method/basis; if none exists,
ARC will fall back to the latest available year in the Arkane database.
set a ``year`` field on the relevant level parameter (e.g., ``sp_level``, ``opt_level``, ``freq_level``,
etc.) rather than on ``level_of_theory`` or ``arkane_level_of_theory``. If ``year`` is omitted, ARC will
prefer the no-year Arkane entry for that method/basis; if none exists, ARC will fall back to the latest
available year in the Arkane database.

Copilot uses AI. Check for mistakes.
f"available years: {_format_years(years)}. "
f"Specify a year to select a matching entry."
)
return _level_to_str(sp_level)
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent behavior when Arkane AEC entry is not found: For composite methods (line 978), the function returns _level_to_str(sp_level) even when no matching entry exists in the database. However, for non-composite methods with freq_scale_factor (line 1005), it returns None. This inconsistency could lead to Arkane attempting to use a level of theory that lacks proper corrections. Both cases should handle missing entries consistently, likely by returning None after logging the warning.

Suggested change
return _level_to_str(sp_level)
# No matching AEC level in Arkane DB for this composite method
return None

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@calvinp0, can you take a look at this co-pilot comment?

# Iodine SOC calculated as a weighted average of the electronic spin splittings of the lowest energy state.
# The splittings are obtained from Huber, K.P.; Herzberg, G., Molecular Spectra and Molecular Structure. IV.
# Constants of Diatomic Molecules, Van Nostrand Reinhold Co., 1979
# Spin orbit correction for F, Si, Cl, Br and B taken form https://cccbdb.nist.gov/elecspin.asp
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling error: 'form' should be 'from'

Suggested change
# Spin orbit correction for F, Si, Cl, Br and B taken form https://cccbdb.nist.gov/elecspin.asp
# Spin orbit correction for F, Si, Cl, Br and B taken from https://cccbdb.nist.gov/elecspin.asp

Copilot uses AI. Check for mistakes.
Comment on lines +779 to +782
def _available_years_for_level(level: "Level",
file_path: str,
section_start: str,
section_end: str) -> list[Optional[int]]:
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type hint inconsistency: This function uses lowercase list[Optional[int]] for the return type, but the file imports and uses List from typing module elsewhere (see line 9 and line 683). For consistency with the rest of the codebase, use List[Optional[int]] instead of list[Optional[int]].

Copilot uses AI. Check for mistakes.
return sorted(years, key=lambda y: (-1 if y is None else y))


def _format_years(years: list[Optional[int]]) -> str:
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type hint inconsistency: This function parameter uses lowercase list[Optional[int]], but the file imports and uses List from typing module elsewhere (see line 9 and line 683). For consistency with the rest of the codebase, use List[Optional[int]] instead of list[Optional[int]].

Suggested change
def _format_years(years: list[Optional[int]]) -> str:
def _format_years(years: List[Optional[int]]) -> str:

Copilot uses AI. Check for mistakes.

target_method_norm = _normalize_method(level.method)
target_base, method_year = _split_method_year(target_method_norm)
target_year = getattr(level, 'year', None) if getattr(level, 'year', None) is not None else method_year
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code allows year suffixes in method names (e.g., 'b97d32023') and extracts them for matching, but the documentation explicitly advises against including year suffixes in method names. Consider adding validation to warn or raise an error if a user provides both a year suffix in the method name AND an explicit year parameter with conflicting values, to prevent user confusion and potential bugs.

Suggested change
target_year = getattr(level, 'year', None) if getattr(level, 'year', None) is not None else method_year
explicit_year = getattr(level, 'year', None)
if explicit_year is not None and method_year is not None and explicit_year != method_year:
raise InputError(
f"Conflicting year specifications for level '{level}': "
f"explicit year={explicit_year}, method suffix year={method_year}. "
"Please remove the year suffix from the method name or update the 'year' attribute to match."
)
target_year = explicit_year if explicit_year is not None else method_year

Copilot uses AI. Check for mistakes.
Comment on lines +936 to +937
If multiple entries only differ by year, the one with the *latest* year
is chosen (year=0 if no year in that entry).
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring description at lines 936-937 is misleading. It states that "the one with the latest year is chosen" but this only applies when the user does NOT specify a year AND no no-year entry exists. When a user specifies a year explicitly via the year parameter, only that exact year will be matched. The docstring should clarify that the latest-year fallback only occurs when no year is specified and no no-year entry exists.

Suggested change
If multiple entries only differ by year, the one with the *latest* year
is chosen (year=0 if no year in that entry).
When a year is explicitly specified in the Level, only entries with that exact
year are matched. If no year is specified and an entry without a year exists,
that entry is used. Only when no year is specified and no no-year entry exists,
if multiple entries differ only by year, the one with the *latest* year is
chosen (treating entries with no year as year=0).

Copilot uses AI. Check for mistakes.
Comment on lines +947 to +949
qm_corr_files = _get_qm_corrections_files()
if not qm_corr_files:
return None
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When no quantum corrections files are found (qm_corr_files is empty), the function silently returns None without any warning or error message. Consider adding a logger.warning() to inform users that no quantum corrections database files were found, which would help with debugging configuration issues.

Copilot uses AI. Check for mistakes.
@calvinp0 calvinp0 force-pushed the arkane_match branch 2 times, most recently from aeabe6c to 8dee498 Compare January 25, 2026 11:41
@alongd
Copy link
Copy Markdown
Member

alongd commented Feb 2, 2026

@calvinp0, can you please see which of the copilot comments are relevant?

Copy link
Copy Markdown
Member

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I added a couple of comments, both on the same topic

arc/main.py Outdated
e.g., 'ZINDO/2', 'DLPNO-MP2-F12/D'
For these cases, use the dictionary-type job-specific level of theory arguments
instead (e.g., ``opt_level``).
level_of_theory_year (int or str, optional): An optional 4-digit year suffix to apply to the single-point
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should not be an arg of ARC. Instead, ARC has individual Level objects for sp_level, opt_liver, etc., and each one has a year attr as you implemented in the prev commit. I recommend removing this from main

for Arkane database matching only and are not valid QC methods. If you need a specific correction year,
specify either:

- ``level_of_theory_year: 2023`` (applies to the SP side of ``level_of_theory``), or
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we instead only give the year in the Level argument, and then use it just for Arkane and not for the ESS calcs?

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.81%. Comparing base (7a890a8) to head (0db4191).
⚠️ Report is 13 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #814      +/-   ##
==========================================
+ Coverage   58.65%   58.81%   +0.15%     
==========================================
  Files          97       97              
  Lines       29222    29355     +133     
  Branches     7755     7791      +36     
==========================================
+ Hits        17140    17264     +124     
- Misses       9882     9885       +3     
- Partials     2200     2206       +6     
Flag Coverage Δ
functionaltests 58.81% <ø> (+0.15%) ⬆️
unittests 58.81% <ø> (+0.15%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Member

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @calvinp0!
Please see some minor comments. There's also one comment from co-pilot that's worth looking at, I responded to it

)
if year is not None:
self.year = int(year)
if self.year < 1000 or self.year > 9999:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why limit ARC just up to 9999 AD? :)


self.level_of_theory = '' # Reset the level_of_theory argument to avoid conflicts upon restarting ARC.

def _warn_year_on_non_arkane_levels(self):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you implement this in level.py, and call it here? just so we store level-related-things at one place

"""
if basis is None:
return None
return basis.replace('-', '').replace(' ', '').lower()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect to have white spaces in the basis set?
Regardless, can this function be combined with _normalize_method()?

year: 2023

THe following are examples for **equivalent** definitions::
If ``year`` is omitted, ARC will prefer the no-year Arkane entry for that method/basis. If none exists,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear what "If none exists" means. If we don't have an entry without a year and neither an entry with a year? Then what do we fall back to?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix the wording. Although do we not fall back to the quantum corrections file in ARC?


Note: Year suffixes in the method (e.g., ``wb97xd32023``) are meant for Arkane database matching
and are not valid QC methods. Do not include year suffixes in ``level_of_theory``; instead, specify a
``year`` key on ``arkane_level_of_theory`` if you need a specific correction year.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this clarification to the users. To make it clearer, I suggest instead of if you need a specific correction year. to write explicitly if you need a specific atom or bond energy correction year. [See here for all of Arkane's corrections](https://github.com/ReactionMechanismGenerator/RMG-database/blob/main/input/quantum_corrections/data.py)

@calvinp0
Copy link
Copy Markdown
Member Author

Thanks, @calvinp0! Please see some minor comments. There's also one comment from co-pilot that's worth looking at, I responded to it

I am not sure which comment you mean. I see Outdated ones and a Fixed one

- Introduced an optional year attribute for differentiating methods.
- Implemented validation for the year attribute to ensure it is a 4-digit integer.
- Added unit tests for year validation and logging warnings when year is set.
alongd
alongd previously approved these changes Mar 29, 2026
- Added support for year suffix in LevelOfTheory methods.
- Implemented functions to extract and normalize method names with years.
- Updated get_arkane_model_chemistry to select the latest year when no year is specified.
- Added unit tests for year handling, including cases for missing years and conflicting year specifications.
@calvinp0 calvinp0 requested a review from alongd March 29, 2026 11:16
Copy link
Copy Markdown
Member

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@calvinp0 calvinp0 merged commit 73d661c into main Mar 29, 2026
8 checks passed
@calvinp0 calvinp0 deleted the arkane_match branch March 29, 2026 13:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants