Problem
A single AutoRecLab run can produce output directories of ~50 GB or more.
This is far too large for most machines and makes it impractical to store or
compare multiple experiments.
Expected behaviour
A single run should only consume storage proportional to the actual model
outputs (generated code, results, statistics). Storage should not silently
balloon due to implementation artefacts.
Likely causes to investigate
- Checkpoints: The checkpoint/ subfolder saves the full workspace state
for every tree node. If the agent downloads or generates large datasets, every
node checkpoint duplicates that data.
- Workspace accumulation: The workspace/ directory accumulates all files
produced by the generated code (downloaded datasets, trained model weights,
intermediate files) and is never pruned between nodes.
- keep_only_relevant_files = false (current default): All intermediate
files are retained. Switching to true already deletes some artefacts, but
apparently not enough.
- No size cap / warning: There is currently no mechanism to alert the user
when the output directory exceeds a configurable threshold.
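
To help narrow down which of the causes above dominates, and as a starting point for the missing size warning, here is a minimal sketch that measures an output directory per subfolder and warns above a threshold. The function names (`directory_size_bytes`, `size_by_subfolder`, `warn_if_over_limit`) and the `max_bytes` parameter are hypothetical and not part of AutoRecLab; the only assumption about the layout is that the run directory contains subfolders such as checkpoint/ and workspace/.

```python
import os
import warnings


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; ignore it.
                continue
    return total


def size_by_subfolder(root):
    """Return {name: bytes} for each immediate child of root."""
    report = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            report[entry.name] = directory_size_bytes(entry.path)
        elif entry.is_file(follow_symlinks=False):
            report[entry.name] = entry.stat(follow_symlinks=False).st_size
    return report


def warn_if_over_limit(root, max_bytes):
    """Emit a warning and return True if root exceeds max_bytes (hypothetical threshold)."""
    total = directory_size_bytes(root)
    if total > max_bytes:
        warnings.warn(
            f"Output directory {root!r} uses {total / 1e9:.2f} GB, "
            f"above the limit of {max_bytes / 1e9:.2f} GB."
        )
        return True
    return False
```

Running `size_by_subfolder` on a finished run directory should show whether checkpoint/ or workspace/ accounts for most of the size. Note that files hard-linked into several checkpoints are counted once per link, so the total can exceed the real disk usage.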