39 commits
6b4462b
feat: add discovery mode for barcode identification without whitelist
eos-jin Mar 27, 2026
f18a3e6
Add parameter validation for discovery_mode and clone_barcodes_reference
eos-jin Mar 27, 2026
1857a5b
Add test suite and synthetic test data for discovery mode
eos-jin Mar 27, 2026
b0c9918
Fix test data structure: include both 5' and 3' adapters
eos-jin Mar 27, 2026
e14eb6e
Fix workflow structure and improve test suite
eos-jin Mar 27, 2026
eeac3e6
Remove tenx_whitelist parameter - simplify discovery mode
eos-jin Mar 27, 2026
d51e658
Add flexiplex-filter to extract_sc_env for discovery mode
Mar 30, 2026
e02c08c
Merge feature/discovery-mode into main (fork)
Mar 30, 2026
45552e4
Remove defaults channel from conda envs (WEHI HPC Anaconda policy)
Mar 30, 2026
2435207
Sync defaults channel removal from main
Mar 30, 2026
d679c14
Add filter_discovered_barcodes parameter for low-clone-count datasets
Mar 30, 2026
ee1881f
Update README: document filter_discovered_barcodes parameter
Mar 30, 2026
ce5aca0
Fix: keep all discovered barcodes by default (--no-inflection)
eos-jin Apr 1, 2026
910fb9b
Add report generators: single-run and comparison HTML dashboards
eos-jin Apr 1, 2026
e8e20c6
UX: add dropdown sample selector to comparison report
eos-jin Apr 1, 2026
8cab366
Auto-generate HTML report as final Nextflow step
eos-jin Apr 1, 2026
d02b6e3
Fix: charts blank on load - defer auto-select to window.load event
eos-jin Apr 1, 2026
c7dc04a
Fix: remove duplicate 'Sample: xxx' heading below dropdown
eos-jin Apr 1, 2026
ac21572
Merge feature/discovery-mode into main
eos-jin Apr 1, 2026
90e5534
README: rewrite for clarity, fix outdated filtering description
eos-jin Apr 1, 2026
af19581
Rename auto-generated report to nextclone_qc_report.html
eos-jin Apr 1, 2026
9118e95
Fix: remove sc_merge_discovered_barcodes_nofilter - merge both modes …
eos-jin Apr 9, 2026
486b1f8
feat: Enhanced report v2 with overlap table, heterogeneity metrics, d…
eos-jin Apr 9, 2026
54e19fe
docs: Add detailed CLI usage examples for single report generation
eos-jin Apr 9, 2026
8e2630c
docs: Update main README with v2 report features + CLI usage
eos-jin Apr 9, 2026
c260031
fix: Address feedback - remove avg metrics, fix run mode, density chart
eos-jin Apr 9, 2026
396d218
fix: Sort cross-sample charts alphabetically
eos-jin Apr 9, 2026
f015e9e
feat: Add all_barcodes.txt, run_log.txt, and fix filtering issue
eos-jin Apr 9, 2026
c5e33a4
chore: Remove backup file generate_report.py.bak
eos-jin Apr 9, 2026
1dcf743
fix: Don't call flexiplex-filter when filtering disabled (root cause …
eos-jin Apr 10, 2026
c7c8e8c
fix: Gini/Shannon to 2 decimals, add barcode header, enhance run_log
eos-jin Apr 10, 2026
5e61028
fix: Enable mamba for faster/more reliable conda env management
eos-jin Apr 10, 2026
e1bb4dd
docs: Add Output Management section to README
eos-jin Apr 10, 2026
19c3acb
fix: Add validation for combined_barcodes_counts.txt + debug output
eos-jin Apr 10, 2026
6b37f6e
fix: Use cp instead of cat for filtered_barcodes.txt when filtering d…
eos-jin Apr 10, 2026
2f334e2
feat: Add comprehensive debugging to sc_merge_discovered_barcodes
eos-jin Apr 10, 2026
f4cb150
fix: Escape all bash $ variables in Nextflow template string
eos-jin Apr 10, 2026
165b480
fix: Two critical bugs in discovery mode pipeline
eos-jin Apr 10, 2026
f1e6753
chore: Remove dead sc_filter_discovered_barcodes process
eos-jin Apr 10, 2026
200 changes: 200 additions & 0 deletions README.md
@@ -16,6 +16,206 @@ It is heavily optimised for usage in high-performance computing (HPC) platforms.

For instructions on how to use *NextClone*, please visit the [user guide](https://phipsonlab.github.io/NextClone/).

## Modes

### Whitelist mode (default)

Provide a list of known barcode sequences. Flexiplex maps all reads against the whitelist.

```bash
nextflow run main.nf --clone_barcodes_reference /path/to/barcodes.txt
```

### Discovery mode

NextClone supports **discovery mode**, which identifies barcodes directly from the data without a pre-defined whitelist. This is useful when:

- The exact barcode sequences are unknown
- You are working with a new or custom clonal barcoding system
- You want to validate or supplement a known barcode list

Discovery mode uses a two-pass approach powered by [Flexiplex](https://github.com/DavidsonGroup/flexiplex):

1. **Pass 1 (Discovery):** Run Flexiplex without a known barcode list (i.e. omitting the `-k` option) and with exact flanking-sequence matching (`-f 0`) to identify candidate barcodes.
2. **Pass 2 (Mapping):** Run Flexiplex again, supplying the discovered barcodes as the known list (`-k`) and using the configured edit-distance thresholds (`barcode_edit_distance`, `adapter_edit_distance`).
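
The idea behind the two passes can be illustrated with a toy, pure-Python sketch. It is not how Flexiplex works internally (Flexiplex searches for the flanks with an edit-distance alignment rather than an exact `str.find`), the helper names `discover`, `assign` and `levenshtein` are hypothetical, and Pass 2 here assumes the read's barcode region has already been extracted.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = curr
    return prev[-1]


def discover(reads, left_flank, right_flank, barcode_length):
    """Pass 1 (toy): count sequences sitting between two exactly matched flanks."""
    counts = Counter()
    for read in reads:
        start = read.find(left_flank)
        if start == -1:
            continue  # 5' flank not found exactly
        bc_start = start + len(left_flank)
        barcode = read[bc_start:bc_start + barcode_length]
        # Require a full-length barcode followed immediately by the 3' flank
        if (len(barcode) == barcode_length
                and read.startswith(right_flank, bc_start + barcode_length)):
            counts[barcode] += 1
    return counts


def assign(observed, known_barcodes, max_edit):
    """Pass 2 (toy): map an extracted barcode to its unique nearest known barcode."""
    scored = sorted((levenshtein(observed, bc), bc) for bc in known_barcodes)
    if not scored or scored[0][0] > max_edit:
        return None  # nothing close enough
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two known barcodes are equally close
    return scored[0][1]


reads = [
    "NNGGATCCAAAACTGCAGNN",
    "GGATCCAAAACTGCAG",
    "GGATCCCCCCCTGCAG",
    "GGATCCAAATCTGCAG",
    "GGATCCAAAAC",  # 3' flank missing: skipped in Pass 1
]
discovered = discover(reads, "GGATCC", "CTGCAG", barcode_length=4)
print(dict(discovered))  # {'AAAA': 2, 'CCCC': 1, 'AAAT': 1}
print(assign("AAAC", ["AAAA", "CCCC"], max_edit=2))  # AAAA
```

Note that the single-read `AAAT` (likely a sequencing-error variant of `AAAA`) is still reported as a candidate in Pass 1; whether such low-count candidates are kept is exactly what the filtering option below controls.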

```bash
nextflow run main.nf --discovery_mode true
```

#### Barcode filtering in discovery mode

By default (`filter_discovered_barcodes = false`), **all barcodes discovered in Pass 1 are passed to Pass 2**, including singletons. This is recommended for lineage tracing experiments where rare clones are biologically meaningful.

Setting `filter_discovered_barcodes = true` applies `flexiplex-filter` knee-plot inflection filtering, which removes low-count barcodes. Use this only for noisy datasets — **it will discard singleton and low-count clones**:

```bash
nextflow run main.nf --discovery_mode true --filter_discovered_barcodes true
```
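
To see why knee filtering drops rare clones, here is a minimal sketch of one common knee heuristic: on a log-log barcode rank plot, pick the point lying farthest above the straight line joining the first and last points, and keep everything at or above its count. This is a stand-in technique chosen for illustration, not necessarily the algorithm `flexiplex-filter` implements; `knee_filter` is a hypothetical name and all counts are assumed to be positive.

```python
import math


def knee_filter(barcode_counts):
    """Keep barcodes whose count is at or above the knee of the log-log rank plot.

    Knee = the ranked point lying farthest above the straight line (chord)
    joining the first and last points. Counts must be positive.
    """
    ranked = sorted(barcode_counts.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 3:
        return dict(ranked)  # too few points to define a knee
    xs = [math.log10(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log10(count) for _, count in ranked]
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    # Cross product: proportional to the signed distance above the chord
    heights = [dx * (y - ys[0]) - dy * (x - xs[0]) for x, y in zip(xs, ys)]
    knee = max(range(len(ranked)), key=heights.__getitem__)
    threshold = ranked[knee][1]
    return {bc: n for bc, n in ranked if n >= threshold}


counts = {"AAAA": 1000, "CCCC": 900, "GGGG": 800, "TTTT": 700,
          "ACGT": 5, "TGCA": 3, "GATC": 2, "CTAG": 1, "AGCT": 1, "TCGA": 1}
print(sorted(knee_filter(counts)))  # ['AAAA', 'CCCC', 'GGGG', 'TTTT']
```

With four dominant barcodes, the knee falls at the fourth rank, so every barcode seen five times or fewer (including all singletons) is discarded.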

## Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `mode` | `"scRNAseq"` | Workflow mode: `"scRNAseq"` or `"DNAseq"` |
| `clone_barcodes_reference` | — | Path to known barcode whitelist (required when `discovery_mode = false`) |
| `discovery_mode` | `false` | Enable two-pass barcode discovery mode |
| `filter_discovered_barcodes` | `false` | Apply knee-plot filtering to discovered barcodes (see above) |
| `barcode_edit_distance` | `2` | Maximum edit distance for barcode matching |
| `adapter_edit_distance` | `6` | Maximum edit distance for flanking adapter matching |
| `adapter_5prime` | — | 5′ flanking adapter sequence |
| `adapter_3prime` | — | 3′ flanking adapter sequence |
| `barcode_length` | `20` | Expected barcode length (bp) |
| `n_chunks` | `2` | Number of read chunks for parallel processing |
| `publish_dir` | `output/` | Output directory |
| `report_title` | — | Custom title for the HTML report (defaults to date-stamped title) |
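
The dependency between `mode`, `discovery_mode` and `clone_barcodes_reference` in the table above can be written as a small check. This Python sketch is purely illustrative: the pipeline itself validates its parameters in Nextflow, and `validate_params` and its error messages are hypothetical.

```python
def validate_params(params):
    """Return a list of problems with a NextClone-style parameter dict (empty if valid)."""
    errors = []
    mode = params.get("mode", "scRNAseq")
    if mode not in ("scRNAseq", "DNAseq"):
        errors.append(f"mode must be 'scRNAseq' or 'DNAseq', got {mode!r}")
    # A whitelist is only optional when barcodes are discovered from the data
    if not params.get("discovery_mode", False) and not params.get("clone_barcodes_reference"):
        errors.append("clone_barcodes_reference is required when discovery_mode = false")
    return errors


print(validate_params({"discovery_mode": True}))  # []
print(validate_params({}))  # ['clone_barcodes_reference is required when discovery_mode = false']
```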

## Output Files

NextClone generates the following files in your `publish_dir`:

| File | Description |
|------|-------------|
| `all_barcodes.txt` | **All discovered barcodes** with counts (no filtering). Header: `#barcode\tcount` |
| `filtered_barcodes.txt` | Barcodes after filtering. Same as `all_barcodes.txt` if `filter_discovered_barcodes=false` |
| `clone_barcodes.csv` | Final clone assignments to cells (for downstream analysis) |
| `nextclone_qc_report.html` | Interactive QC dashboard |
| `run_log.txt` | Run parameters and command line (for reproducibility) |

**Note:** `all_barcodes.txt` contains ALL barcodes discovered in Pass 1, including singletons. This is useful for debugging and QC.
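
For quick QC outside the HTML report, `all_barcodes.txt` can be summarised with a few lines of stdlib Python. This sketch assumes the file has exactly the two tab-separated columns documented above (barcode, count) behind a `#`-prefixed header; the function names are hypothetical.

```python
import csv


def parse_barcode_counts(lines):
    """Parse tab-separated 'barcode<TAB>count' lines, skipping '#' header lines."""
    counts = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # blank line or the '#barcode<TAB>count' header
        counts[row[0]] = int(row[1])
    return counts


def load_barcode_counts(path):
    """Read a barcode count file from disk."""
    with open(path, newline="") as handle:
        return parse_barcode_counts(handle)


def summarise(counts):
    """Number of barcodes, how many are singletons, and total reads."""
    return {
        "barcodes": len(counts),
        "singletons": sum(1 for n in counts.values() if n == 1),
        "reads": sum(counts.values()),
    }


example = ["#barcode\tcount", "AAAA\t5", "CCCC\t1", "GGGG\t1"]
print(summarise(parse_barcode_counts(example)))
# {'barcodes': 3, 'singletons': 2, 'reads': 7}
```

On a real run, call `summarise(load_barcode_counts("<publish_dir>/all_barcodes.txt"))` with your own output directory.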

## HTML Reports

### Standard report (auto-generated)

NextClone automatically generates an interactive HTML dashboard at the end of every run, saved to your `publish_dir` as `nextclone_qc_report.html`.

**New in v2 (2026-04-09):**
- **Clone overlap table** — shared clones across samples at different thresholds (≥5, 10, 15, 20, 50, 100 cells)
- **Heterogeneity metrics** — Gini coefficient and Shannon index for each sample
- **Clone size density plot** — KDE-style curve showing clone size distribution
- **Reversed top 20 clones** — largest clones now at top (easier to read)
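
Both heterogeneity metrics have standard definitions over a sample's clone sizes (cells per clone). The sketch below uses the sorted-rank form of the Gini coefficient and the natural-log Shannon index; the log base used by `generate_report.py` is an assumption here, and the 2-decimal rounding mirrors how the report displays these values.

```python
import math


def gini(sizes):
    """Gini coefficient of clone sizes: 0 = all clones equal, near 1 = one clone dominates."""
    xs = sorted(sizes)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Sorted-rank formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


def shannon(sizes):
    """Shannon index H = -sum(p * ln p) over clone proportions p (natural log assumed)."""
    total = sum(sizes)
    return -sum((s / total) * math.log(s / total) for s in sizes if s > 0)


print(round(gini([5, 5, 5, 5]), 2), round(shannon([5, 5, 5, 5]), 2))  # 0.0 1.39
print(round(gini([1, 1, 1, 97]), 2))  # 0.72
```

A higher Gini and a lower Shannon index both indicate that a few clones account for most of the cells.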

**All charts included:**
- Sample overview table (reads, cells, clones, Gini, Shannon)
- Clone overlap across samples (new!)
- Heterogeneity metrics summary (new!)
- Ranked clone abundance (log scale, top 3 annotated)
- Clone size density curve (new!)
- Top 20 clones (horizontal bar, reversed, with % labels)
- Edit distance QC (FlankEditDist & BarcodeEditDist)
- Cross-sample clonality comparison
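
One plausible way to build a clone overlap table is sketched below. It assumes a clone counts as shared between two samples at threshold *t* when it has at least *t* cells in both; the report's exact definition may differ (for example, requiring presence in all samples). `overlap_table` is a hypothetical helper, and samples are paired in alphabetical order.

```python
from itertools import combinations

THRESHOLDS = (5, 10, 15, 20, 50, 100)


def overlap_table(clone_sizes, thresholds=THRESHOLDS):
    """clone_sizes: {sample: {barcode: n_cells}}.

    Returns {(sample_a, sample_b): {threshold: n_shared_clones}}.
    """
    table = {}
    for a, b in combinations(sorted(clone_sizes), 2):
        row = {}
        for t in thresholds:
            big_a = {bc for bc, n in clone_sizes[a].items() if n >= t}
            big_b = {bc for bc, n in clone_sizes[b].items() if n >= t}
            row[t] = len(big_a & big_b)  # clones with >= t cells in both samples
        table[(a, b)] = row
    return table


sizes = {
    "S1": {"AAAA": 120, "CCCC": 12, "GGGG": 3},
    "S2": {"AAAA": 60, "CCCC": 4, "TTTT": 200},
}
print(overlap_table(sizes))
# {('S1', 'S2'): {5: 1, 10: 1, 15: 1, 20: 1, 50: 1, 100: 0}}
```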

To set a custom title:
```bash
nextflow run main.nf --report_title "My Experiment — ZR751 2026"
```

### Manual report generation (CLI)

You can also generate reports manually from any `clone_barcodes.csv` file:

```bash
# Basic usage
cd /path/to/nextclone/output
python3 /path/to/NextClone/reports/generate_report.py clone_barcodes.csv

# Custom output and title (run from the same output directory)
python3 /path/to/NextClone/reports/generate_report.py clone_barcodes.csv \
  --output my_report.html \
  --title "ZR751 Clonal Analysis — 2026-04-09"
```
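
To compute your own summaries from `clone_barcodes.csv`, a natural first step is counting distinct cells per clone in each sample. The column names used below (`sample`, `cell`, `clone_barcode`) are placeholders rather than the file's documented schema; check the header of your file and pass the real names. `clone_sizes_from_csv` is a hypothetical helper.

```python
import csv
from collections import defaultdict


def clone_sizes_from_csv(lines, sample_col="sample", cell_col="cell",
                         clone_col="clone_barcode"):
    """Count distinct cells per clone per sample: {sample: {clone: n_cells}}."""
    cells = defaultdict(lambda: defaultdict(set))
    for row in csv.DictReader(lines):
        # A set de-duplicates cells that appear on several rows (e.g. several reads)
        cells[row[sample_col]][row[clone_col]].add(row[cell_col])
    return {s: {bc: len(c) for bc, c in clones.items()} for s, clones in cells.items()}


rows = ["sample,cell,clone_barcode",
        "S1,c1,AAAA", "S1,c1,AAAA", "S1,c2,AAAA", "S1,c3,CCCC", "S2,c1,AAAA"]
print(clone_sizes_from_csv(rows))
# {'S1': {'AAAA': 2, 'CCCC': 1}, 'S2': {'AAAA': 1}}
```

On a real file, open it with `open("clone_barcodes.csv", newline="")` and pass the handle instead of `rows`.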

**Command-line options:**
```
python3 generate_report.py <input_csv> [OPTIONS]

Positional:
  input_csv       Path to clone_barcodes.csv from NextClone output

Options:
  --output FILE   Output HTML file (default: report.html)
  --title TEXT    Report title (default: "NextClone Report")
  --help          Show help message
```

For full documentation, see [`reports/README.md`](reports/README.md).

## Output Management

### Recommended Usage

**Always use timestamped output directories** to prevent overwriting previous runs:

```bash
# DNA-seq mode
nextflow run main.nf \
  --mode DNAseq \
  --dnaseq_fastq_files /path/to/fastq \
  --discovery_mode true \
  --filter_discovered_barcodes false \
  --publish_dir "results_DNAseq_$(date +%Y-%m-%d_%H-%M-%S)"

# scRNA-seq mode
nextflow run main.nf \
  --mode scRNAseq \
  --scrnaseq_bam_files /path/to/bams \
  --discovery_mode true \
  --filter_discovered_barcodes false \
  --publish_dir "results_scRNAseq_$(date +%Y-%m-%d_%H-%M-%S)"
```

**Example output:**
```
results_DNAseq_2026-04-10_11-45-22/
├── all_barcodes.txt # All discovered barcodes
├── filtered_barcodes.txt # Filtered barcodes (same as above if filter=false)
├── clone_barcodes.csv # Final clone assignments
├── nextclone_qc_report.html # Interactive QC dashboard
└── run_log.txt # Run parameters + software versions
```

### When to Clear Work Directory

**Clear `work/` directory only when:**
- Updating NextClone code (to avoid cached old results)
- Conda environments are corrupted
- Debugging unexpected behavior

```bash
# Clear work directory
rm -rf work/

# Clear conda cache (if needed)
rm -rf /path/to/nextflow_local/conda_cache/
```

**For routine runs:** Keep `work/` to save compute time: Nextflow can reuse cached task results instead of recomputing them (for example, when a run is relaunched with `-resume`).

## Comparison Report (manual)

To compare two runs side by side (e.g. whitelist mode vs discovery mode), run the comparison script after both runs have completed:

```bash
python3 reports/generate_comparison_report.py \
/path/to/run_a/clone_barcodes.csv \
/path/to/run_b/clone_barcodes.csv \
--label-a "Reference" \
--label-b "Discovery" \
--output comparison_report.html \
--title "Reference vs Discovery — My Experiment"
```

The comparison report shows:
- Δ reads, cells, and clones between the two runs
- Per-sample ranked abundance overlay (both modes, log-scale)
- Clone size distribution side by side
- Top clone overlap (concordance between modes)
- Clonality metrics comparison (top1%, top3%, top10%)
- Cell recovery validation across samples

> **No pip installs required.** Both report scripts use Python stdlib only, with Chart.js loaded via CDN.

<!-- ## Citation -->

<!-- If you use NextClone in your study, please kindly cite our preprint on bioRxiv. -->
1 change: 0 additions & 1 deletion conda_env/extract_dnaseq_env.yaml
@@ -2,7 +2,6 @@ name: extract_dnaseq_env
 channels:
   - conda-forge
   - bioconda
-  - defaults
 dependencies:
   - python=3.8
   - Biopython
5 changes: 3 additions & 2 deletions conda_env/extract_sc_env.yaml
@@ -2,11 +2,12 @@ name: extract_sc_env
 channels:
   - conda-forge
   - bioconda
-  - defaults
 dependencies:
   - python=3.8
   - pysam
   - pandas
   - numpy
   - Biopython
-
+  - pip
+  - pip:
+    - flexiplex-filter