Skip to content

Implements CSV rotation for job log files#842

Merged
calvinp0 merged 2 commits intomainfrom
csv_rotation
Mar 28, 2026
Merged

Implements CSV rotation for job log files#842
calvinp0 merged 2 commits intomainfrom
csv_rotation

Conversation

@calvinp0
Copy link
Copy Markdown
Member

This was originally an issue the team experienced once in a blue moon where we got OSError because the csvs had become too large.

So this feature adds a mechanism to automatically rotate job log CSV files (initiated_jobs.csv and completed_jobs.csv) once they exceed a predefined maximum number of lines (default 10,000).

This prevents these files from growing indefinitely, which can improve performance during file operations and better manage disk space. When rotated, the existing file is renamed with a timestamp suffix, archiving its content, and a new file is then implicitly created for ongoing logs.

Includes dedicated unit tests to ensure the rotation logic functions as expected under various conditions. Also incorporates some minor test data refactoring and housekeeping.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds log-CSV rotation to ARC’s job logging to prevent initiated_jobs.csv / completed_jobs.csv from growing unbounded (and occasionally triggering OSError), and updates related test fixtures.

Changes:

  • Add JobAdapter._rotate_csv_if_needed() and invoke it before reading/writing job log CSVs.
  • Add unit tests covering basic rotation/no-rotation behavior.
  • Refactor the ServerTimeLimit test fixture setup and add a PBS walltime err.txt fixture; remove unused old fixture files.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
arc/job/adapter.py Introduces CSV rotation helper and hooks it into initiated/completed job CSV handling.
arc/job/adapter_test.py Adds rotation unit tests; adjusts ServerTimeLimit fixture setup to copy a PBS err.txt.
arc/testing/server/pbs/timelimit/err.txt New PBS walltime error fixture used by tests.
arc/testing/test_JobAdapter_ServerTimeLimit/calcs/Species/spc1/spc1/submit.sub Removed unused/obsolete fixture.
arc/testing/test_JobAdapter_ServerTimeLimit/calcs/Species/spc1/spc1/input.gjf Removed unused/obsolete fixture.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +537 to +538


Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rotation tests don’t cover the same-day collision case implied by the current date-only archive suffix. Adding a test that performs two rotations with the same mocked date/time (and asserting two distinct archives) would prevent regressions once the archive naming is made unique.

Suggested change
def test_same_day_multiple_rotations(self):
"""Test that two rotations on the same day produce distinct archive files."""
with tempfile.TemporaryDirectory() as tmp:
csv_path = os.path.join(tmp, 'jobs.csv')
# Both rotations use the same mocked datetime
fixed_dt = datetime.datetime(2026, 3, 1, 12, 0, 0)
# First rotation on the fixed "day/time"
self._make_csv(csv_path, 50)
with patch('arc.job.adapter.datetime') as mock_dt:
mock_dt.datetime.now.return_value = fixed_dt
mock_dt.timedelta = datetime.timedelta
JobAdapter._rotate_csv_if_needed(csv_path, max_lines=50)
self.assertFalse(os.path.isfile(csv_path))
# Second rotation on the same fixed "day/time"
self._make_csv(csv_path, 50)
with patch('arc.job.adapter.datetime') as mock_dt:
mock_dt.datetime.now.return_value = fixed_dt
mock_dt.timedelta = datetime.timedelta
JobAdapter._rotate_csv_if_needed(csv_path, max_lines=50)
self.assertFalse(os.path.isfile(csv_path))
archives = glob.glob(os.path.join(tmp, 'jobs.old.*.csv'))
# Expect two distinct archives even though the datetime is the same
self.assertEqual(len(archives), 2)
self.assertEqual(len({os.path.basename(p) for p in archives}), 2)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, highly improbably that this scenario will occur

Comment on lines +503 to +512
def test_rotation_at_threshold(self):
"""Test that the file is rotated when it reaches the threshold."""
with tempfile.TemporaryDirectory() as tmp:
csv_path = os.path.join(tmp, 'jobs.csv')
self._make_csv(csv_path, 10000)
JobAdapter._rotate_csv_if_needed(csv_path, max_lines=10000)
self.assertFalse(os.path.isfile(csv_path))
archives = glob.glob(os.path.join(tmp, 'jobs.old.*.csv'))
self.assertEqual(len(archives), 1)

Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_rotation_at_threshold writes/reads a 10,000-line file, which can slow the unit test suite. Since max_lines is a parameter, consider using a much smaller threshold (e.g., 20–100) to validate the boundary condition without heavy I/O, and keep a separate assertion that the default value is wired correctly if needed.

Copilot uses AI. Check for mistakes.
Comment on lines +668 to +674
Rotate a CSV file if it exceeds ``max_lines`` lines (including the header).
The archived file is renamed with a timestamp suffix in the same directory.
A fresh file (with headers) will be created by the caller as needed.

Args:
csv_path (str): The path to the CSV file.
max_lines (int): The maximum number of lines before rotation.
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring says rotation happens when the file "exceeds" max_lines, but the implementation rotates when line_count >= max_lines (i.e., at the threshold). Please align the wording and behavior (either change the condition to > or update the docstring/parameter description to say "reaches or exceeds").

Suggested change
Rotate a CSV file if it exceeds ``max_lines`` lines (including the header).
The archived file is renamed with a timestamp suffix in the same directory.
A fresh file (with headers) will be created by the caller as needed.
Args:
csv_path (str): The path to the CSV file.
max_lines (int): The maximum number of lines before rotation.
Rotate a CSV file if it reaches or exceeds ``max_lines`` lines (including the header).
The archived file is renamed with a timestamp suffix in the same directory.
A fresh file (with headers) will be created by the caller as needed.
Args:
csv_path (str): The path to the CSV file.
max_lines (int): The maximum number of lines at or above which rotation occurs.

Copilot uses AI. Check for mistakes.
Comment on lines 695 to 699
# 1. Set job_number.
local_arc_path_ = local_arc_path if os.path.isdir(local_arc_path) else ARC_PATH
csv_path = os.path.join(local_arc_path_, 'initiated_jobs.csv')
self._rotate_csv_if_needed(csv_path)
if os.path.isfile(csv_path):
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_set_job_number() now reads initiated_jobs.csv an extra full pass just to count lines (rotation), and then reads it again to compute job_num. Since job_num is already derived by iterating all rows, consider reusing that count to decide rotation (or combining the logic) to avoid the extra O(n) file scan on every job submission.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.65%. Comparing base (1a72830) to head (7a890a8).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #842      +/-   ##
==========================================
+ Coverage   58.58%   58.65%   +0.06%     
==========================================
  Files          97       97              
  Lines       29203    29222      +19     
  Branches     7752     7755       +3     
==========================================
+ Hits        17110    17140      +30     
+ Misses       9889     9882       -7     
+ Partials     2204     2200       -4     
Flag Coverage Δ
functionaltests 58.65% <ø> (+0.06%) ⬆️
unittests 58.65% <ø> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Adds logic to rotate `initiated_jobs.csv` and `completed_jobs.csv` once they reach a threshold (10,000 lines). This prevents these log files from growing indefinitely, ensuring better performance and disk space management. The archived files are renamed with a date-based suffix.

Also includes minor cleanup in `adapter_test.py` to improve directory handling and fix path-related issues in existing tests.

Optimize CSV rotation to avoid redundant file reads

Calculates the line count while parsing the job log to determine the job number, then passes this count to the rotation logic. This prevents reading the entire file twice. Tests were also updated to use smaller line thresholds for better performance.
Did not like how we held the `err.txt` example in a generated test folder whilst also holding onto the irrelevant `gjf` and `sub` file. Here it is moved to a better spot (could be changed again in the future...) and this way when we run the test, it copies the error file to the designated folder for testing and then afterwards we can aggressively remove the generated test folder.
@calvinp0 calvinp0 merged commit 6112a7d into main Mar 28, 2026
6 of 7 checks passed
@calvinp0 calvinp0 deleted the csv_rotation branch March 28, 2026 21:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants