Aggregator overhaul and handling Not Provided keys (Issues 484, 515, 524)#530

Merged
liefeld merged 10 commits into main from issue-484 on Mar 24, 2026

@jluebeck jluebeck commented Mar 14, 2026

Summary

  • Updates caper to use the new AmpliconSuiteAggregator v6.0.0 Python API (Aggregator(input_paths=..., project_name=..., work_dir=...)). The new API is highly similar to the previous one but not identical.
  • Adds AA_cycles_file and AA_graph_file as recognized schema fields; makes AA_summary_file optional
  • Records aggregator_version on each project in MongoDB
  • Fixes the sample page crash when AA_amplicon_number is 'Not Provided' or another sentinel string (issue #515, "Views.py img_file = fs_handle.get(ObjectId(feature_id)).read() does not handle 'Not Provided' gracefully"): sentinels are now
    normalized to None, and invalid features are filtered from the table and plot
  • Fixes the stacked bar chart and project table showing NA/None as a classification, and fixes the handling of legacy classification names (issue #524, "Summary barchart does not handle mixed legacy/current classification names")
  • Fixes S3 download key path when S3_DOWNLOADS_BUCKET_PATH lacks a trailing /
  • Sample download archives now include only result_data.tsv (not the redundant .json version too)
  • cnvkit.tar.gz is no longer sent to MongoDB and is removed from the sample download. Nothing uses it, and CNV_CALLS.bed is enough
  • The new Aggregator keeps AA_results as a directory, not a .tar.gz. This speeds up aggregation, and also decompression for re-aggregation. It also means we do not need to keep both compressed and uncompressed copies of specific AA files (e.g. plots), which shrinks archives significantly; the .PDF plots do not benefit from compression anyway.
  • A .tar.gz version of the AA outputs is created immediately before files are sent to GridFS. It does not propagate into the complete archive, but it is included in the sample download.
  • The new aggregation is much more robust and produces an identical structure across archives, regardless of how the input files are configured.
  • Aggregator 6.0 implements a deep renaming strategy. If a --name_map file is given, then matching samples will be renamed at the file level, and within files.
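
The sentinel handling described for issue #515 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the SENTINELS set and the helper names normalize_sentinel and valid_features are hypothetical, and the strings caper actually treats as missing may differ.

```python
from typing import Any, Dict, List, Optional

# Hypothetical sentinel set (an assumption for illustration only).
SENTINELS = {"", "na", "n/a", "none", "nan", "not provided"}


def normalize_sentinel(value: Any) -> Optional[Any]:
    """Return None for missing-value sentinels, otherwise the value unchanged."""
    if value is None:
        return None
    if isinstance(value, str) and value.strip().lower() in SENTINELS:
        return None
    return value


def valid_features(features: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only features whose AA_amplicon_number parses as an integer."""
    kept = []
    for feature in features:
        number = normalize_sentinel(feature.get("AA_amplicon_number"))
        try:
            int(number)
        except (TypeError, ValueError):
            continue  # missing or malformed amplicon number: drop the feature
        kept.append(feature)
    return kept
```

Filtering before any ObjectId or file lookup means a value such as "Not Provided" never reaches code that expects a real identifier.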

Testing

Setup:
I have not made a pip release for AmpliconSuiteAggregator 6.0.0. You could make one and update requirements.txt; in lieu of a release, you can configure a local Aggregator checkout as follows:

  1. Clone https://github.com/AmpliconSuite/AmpliconSuiteAggregator. Set the appropriate variable in config.sh to the src/ directory, e.g. export AGGREGATOR_DEV_PATH='/home/jens/Dropbox/BafnaLab/ecDNA/AmpliconSuiteAggregator/src'
  2. source config.sh if needed
  3. Start the server normally.
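
For readers unfamiliar with this kind of override, one common way a server can pick up a local checkout from an environment variable is sketched below. This is an assumption about the general pattern, not a description of how caper consumes AGGREGATOR_DEV_PATH; the helper name use_dev_aggregator is hypothetical.

```python
import os
import sys


def use_dev_aggregator(env_var: str = "AGGREGATOR_DEV_PATH") -> bool:
    """Prepend a local source checkout to sys.path if env_var names a directory.

    Returns True when the path is usable, False when the caller should fall
    back to the pip-installed package.
    """
    dev_path = os.environ.get(env_var, "").strip()
    if not dev_path or not os.path.isdir(dev_path):
        return False
    if dev_path not in sys.path:
        # Position 0 makes the checkout shadow any installed copy.
        sys.path.insert(0, dev_path)
    return True
```

A call like this would have to run before the first import of the Aggregator module for the override to take effect.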

Things to test:

  • Create a new project by uploading a .tar.gz AmpliconSuite results archive. Confirm it processes without error and aggregator_version is visible in the DB.
  • Edit/re-aggregate an existing project via the project edit page — confirm the new Aggregator call signature works and the project updates correctly.
  • Project page: check the plots and confirm that the project table does not contain spurious "NA" values.
  • Project download: confirm the S3 key is correctly formed (no missing / between bucket path and project ID) and download works.
  • Sample download: download a sample archive and confirm it contains result_data.tsv but not result_data.json. CNV_CALLS.bed and AA results.tar.gz should be present.
  • Legacy projects: confirm that sample and project downloads still work as expected (they will not include the newly excluded files; that is fine).
  • Test name renaming and check whether it breaks metadata mapping.
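
When checking the S3 key in the project download test, it may help to compare against a join that is insensitive to trailing slashes. The sketch below is illustrative only: join_s3_key is a hypothetical helper, not the function changed in this PR, and the example values are made up.

```python
def join_s3_key(*parts: str) -> str:
    """Join key segments with exactly one "/" between them.

    Empty segments are skipped, and leading/trailing slashes on each
    segment are stripped, so "downloads" and "downloads/" behave the same.
    """
    return "/".join(p.strip("/") for p in parts if p and p.strip("/"))


# Both spellings of the bucket path give the same key.
with_slash = join_s3_key("downloads/", "abc123")
without_slash = join_s3_key("downloads", "abc123")
```

Stripping and re-joining avoids both failure modes: a missing separator when the configured path has no trailing /, and a doubled separator when it does.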

@jluebeck jluebeck requested a review from liefeld March 14, 2026 02:25
@jluebeck jluebeck changed the title Aggregator overhaul and handling Not Provided keys (Issue 484 & issue 524) Aggregator overhaul and handling Not Provided keys (Issues 484, 515, 524) Mar 19, 2026
@liefeld liefeld merged commit 4faa948 into main Mar 24, 2026
@liefeld liefeld deleted the issue-484 branch March 24, 2026 19:43