STRiVE is a sketch-based structural variant (SV) discovery tool for assembly-to-assembly comparison. Instead of relying on whole-genome alignment (WGA), STRiVE uses sparse genomic markers to identify insertions, deletions, and inversions efficiently.
The method is described in the accompanying paper, *Characterization of structural variation through assembly-to-assembly comparison*.
STRiVE is designed for fast SV characterization from marker mappings between a reference assembly and a query assembly. The implementation is written in C++ and follows a streaming design, keeping memory use low while detecting:
- Insertions from sustained positive positional shifts
- Deletions from runs of unmapped markers followed by negative positional offsets
- Inversions from local reversals in marker ordering using a sliding-window intersection strategy
At a high level, the program:
- Loads a reference marker map.
- Scans query marker mappings to detect insertions and deletions.
- Extracts inversion-supporting mappings.
- Builds and merges inversion intervals.
- Writes predictions in BED-like tabular output.
- `main.cpp`: core implementation of STRiVE
- `Config.h`: configuration parameters and default thresholds
- `makefile`: build rules for the `strive` executable
Note: The uploaded snapshot references `ArgParser.h` from `main.cpp`. Make sure that file is included in the submission repository as well, since it defines the command-line interface and argument parsing.
Compile with `make`.
The provided makefile uses:
- `g++`
- optimization level `-O3`
- C++20 standard
- warning flags `-Wall -Wextra -Wpedantic`
STRiVE expects two primary inputs.
A tab-delimited text file where each line contains:
<marker_id>\t<reference_position>
Example:
1039482 154320
2048129 154701
This file is loaded into an in-memory map from marker identifier to reference coordinate.
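As an illustration of that loading step, here is a minimal sketch. The names `MarkerMap` and `loadMarkerMap` are hypothetical and not taken from `main.cpp`, and the sketch assumes marker identifiers can be kept as strings and that malformed lines should be skipped silently.

```cpp
#include <cstdint>
#include <exception>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical type: marker identifier -> reference coordinate.
using MarkerMap = std::unordered_map<std::string, std::int64_t>;

// Reads "<marker_id>\t<reference_position>" lines from a stream.
// Lines without a tab, with an empty id, or with a non-numeric
// position are skipped (an assumption, not confirmed behavior).
MarkerMap loadMarkerMap(std::istream& in) {
    MarkerMap markers;
    std::string line;
    while (std::getline(in, line)) {
        const auto tab = line.find('\t');
        if (tab == std::string::npos || tab == 0) continue;
        try {
            markers.emplace(line.substr(0, tab),
                            std::stoll(line.substr(tab + 1)));
        } catch (const std::exception&) {
            // Position field was not a valid integer: ignore the line.
        }
    }
    return markers;
}
```

Passing a `std::ifstream` opened on the marker file, or a `std::istringstream` in a unit test, both work because the function only depends on `std::istream`.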
A SAM-like, tab-delimited alignment file for the same markers mapped against the query assembly.
The current implementation expects these fields:
- field `0`: marker / minimizer name
- field `1`: flag
- field `2`: reference sequence name (rname)
- field `3`: mapped position
- field `4`: mapping quality (MAPQ)
- field `5`: CIGAR
- fields `11+`: optional tags
The code also inspects the optional NM:i: tag and uses it to filter mappings by mismatch count.
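The field layout above could be parsed roughly as follows. This is a sketch only: `Mapping`, `parseMapping`, and `passesFilters` are hypothetical names, and treating a missing `NM:i:` tag as zero mismatches is an assumption rather than documented STRiVE behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record holding the fields listed above.
struct Mapping {
    std::string name;                // field 0
    unsigned flag = 0;               // field 1
    std::string rname;               // field 2
    std::int64_t pos = 0;            // field 3
    int mapq = 0;                    // field 4
    std::string cigar;               // field 5
    std::optional<int> mismatches;   // NM:i: value, if the tag is present
};

// Splits one tab-delimited line; returns nullopt for short or malformed lines.
std::optional<Mapping> parseMapping(const std::string& line) {
    std::vector<std::string> f;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, '\t')) f.push_back(field);
    if (f.size() < 6) return std::nullopt;

    Mapping m;
    try {
        m.name = f[0];
        m.flag = static_cast<unsigned>(std::stoul(f[1]));
        m.rname = f[2];
        m.pos = std::stoll(f[3]);
        m.mapq = std::stoi(f[4]);
        m.cigar = f[5];
        // Optional tags start at field 11.
        for (std::size_t i = 11; i < f.size(); ++i) {
            if (f[i].rfind("NM:i:", 0) == 0) {
                m.mismatches = std::stoi(f[i].substr(5));
            }
        }
    } catch (const std::exception&) {
        return std::nullopt;
    }
    return m;
}

// Assumption: a mapping without an NM:i: tag counts as zero mismatches.
bool passesFilters(const Mapping& m, int minMapq, int maxMismatches) {
    return m.mapq >= minMapq && m.mismatches.value_or(0) <= maxMismatches;
}
```

With the defaults from `Config.h` (`MIN_MAPQ` of 50 and `MAX_MISMATCHES` of 0), only exact matches with high mapping quality would pass this filter.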
STRiVE writes three output files into the output directory:
- `ins.bed`: predicted insertions
- `del.bed`: predicted deletions
- `inv.bed`: predicted inversions
Each output line is written in a BED-like four-column format:
<chromosome>\t<start>\t<end>\t<size>
The chromosome name is inferred from the query mapping file when possible.
The following defaults are defined in Config.h:
| Parameter | Default | Meaning |
|---|---|---|
| `WINDOW_SIZE` | -1 | positional tolerance / window threshold used during indel detection |
| `MAX_SV_SIZE` | 1000000 | maximum SV size considered |
| `MIN_SV_SIZE` | 200 | minimum SV size considered |
| `MIN_MAPQ` | 50 | minimum mapping quality threshold |
| `MAX_MISMATCHES` | 0 | maximum allowed mismatches from `NM:i:` |
| `INV_MIN_STREAK` | 3 | minimum number of consecutive inversion-supporting mappings |
| `INV_LOOKAHEAD` | 10000 | sliding-window size for inversion detection |
| `INV_MIN_SPAN` | 200 | minimum inversion span |
| `INV_MERGE_GAP` | 0 | gap tolerance when merging inversion intervals |
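The defaults can be pictured as a plain struct. This is not the contents of `Config.h`: the struct name, field names, and the `isUsable` check are hypothetical. In particular, reading the `WINDOW_SIZE` default of -1 as an "unset" sentinel is an assumption, based only on `WINDOW_SIZE` being listed among the values the executable needs.

```cpp
#include <cstdint>

// Hypothetical mirror of the defaults listed in the table.
struct StriveConfig {
    std::int64_t windowSize   = -1;       // WINDOW_SIZE (assumed "unset" sentinel)
    std::int64_t maxSvSize    = 1000000;  // MAX_SV_SIZE
    std::int64_t minSvSize    = 200;      // MIN_SV_SIZE
    int          minMapq      = 50;       // MIN_MAPQ
    int          maxMismatches = 0;       // MAX_MISMATCHES
    int          invMinStreak = 3;        // INV_MIN_STREAK
    std::int64_t invLookahead = 10000;    // INV_LOOKAHEAD
    std::int64_t invMinSpan   = 200;      // INV_MIN_SPAN
    std::int64_t invMergeGap  = 0;        // INV_MERGE_GAP
};

// Assumed sanity check: a window size must have been supplied and the
// SV size bounds must be ordered.
bool isUsable(const StriveConfig& c) {
    return c.windowSize >= 0 && c.minSvSize <= c.maxSvSize;
}
```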
The exact command-line flag names are defined in ArgParser.h, which is referenced by main.cpp but was not part of the uploaded snapshot used to draft this README. In practice, the executable needs values for:
- reference marker map path
- query mapping path
- output directory
- `WINDOW_SIZE`
and may optionally expose the remaining configuration fields shown above.
A typical invocation will look conceptually like this:

    ./strive \
      --reference <reference_markers.tsv> \
      --query <query_mappings.sam> \
      --out-dir <output_directory>/ \
      --window-size <W> \
      --min-mapq 50 \
      --min-sv-size 200 \
      --max-sv-size 1000000 \
      --max-mismatches 0 \
      --inv-min-streak 3 \
      --inv-lookahead 10000 \
      --inv-min-span 200 \
      --inv-merge-gap 0

If your `ArgParser.h` uses different flag names, replace them accordingly.
STRiVE scans mapped markers in order and maintains a running positional offset. Insertions are detected when corrected query positions consistently exceed expected reference positions by more than the minimum SV size. Deletions are detected when a run of unmapped markers is followed by a sufficiently large negative positional shift.
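The running-offset idea can be sketched as below. This is a deliberately simplified illustration, not the logic in `main.cpp`: all names are hypothetical, a call is triggered by a single marker rather than by a sustained run, the `WINDOW_SIZE` tolerance is omitted, and intervals are reported between the reference positions of the flanking mapped markers, which is an assumption.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical input: one marker in reference order. queryPos is empty
// when the marker did not map to the query assembly.
struct Marker {
    std::int64_t refPos;
    std::optional<std::int64_t> queryPos;
};

// Hypothetical output: 'I' for insertion, 'D' for deletion.
struct Call {
    char type;
    std::int64_t start;
    std::int64_t end;
    std::int64_t size;
};

std::vector<Call> scanIndels(const std::vector<Marker>& markers,
                             std::int64_t minSv, std::int64_t maxSv) {
    std::vector<Call> calls;
    std::int64_t offset = 0;    // shift already explained by earlier calls
    std::int64_t lastRef = 0;   // reference position of the last mapped marker
    bool sawUnmapped = false;   // unmapped run since the last mapped marker?

    for (const Marker& m : markers) {
        if (!m.queryPos) {
            sawUnmapped = true;
            continue;
        }
        // Residual shift after removing the offset from earlier events.
        const std::int64_t shift = (*m.queryPos - offset) - m.refPos;
        const std::int64_t size = shift < 0 ? -shift : shift;
        if (size >= minSv && size <= maxSv) {
            if (shift > 0) {
                calls.push_back({'I', lastRef, m.refPos, size});
                offset += shift;
            } else if (sawUnmapped) {
                // Negative shift preceded by unmapped markers: deletion.
                calls.push_back({'D', lastRef, m.refPos, size});
                offset += shift;
            }
        }
        lastRef = m.refPos;
        sawUnmapped = false;
    }
    return calls;
}
```

Updating `offset` after each call is what lets later markers be compared against their expected positions again, so one event is not reported repeatedly for every downstream marker.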
For inversions, STRiVE keeps only high-confidence inversion-supporting mappings and looks for local reversals in ordering. It represents marker mappings geometrically and detects inversions by identifying line-segment intersections within a bounded sliding window. Overlapping candidate intervals are then merged.
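One way to picture the geometric step: draw each mapping as a segment from its reference position on one axis to its query position on a parallel axis. Two such segments intersect exactly when the markers appear in opposite order on the two axes. The sketch below uses that representation, which is an assumption about the geometry, together with a bounded lookahead and a gap-tolerant merge. All names are hypothetical, and the streak and span filters are left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Seg { std::int64_t ref; std::int64_t query; };      // one mapping
struct Interval { std::int64_t start; std::int64_t end; }; // candidate region

// Segments between two parallel axes cross iff the order is reversed.
bool crosses(const Seg& a, const Seg& b) {
    return (a.ref < b.ref && a.query > b.query) ||
           (a.ref > b.ref && a.query < b.query);
}

// segs must be sorted by ref. Only pairs within `lookahead` reference
// bases of each other are compared, which bounds the work per marker.
std::vector<Interval> candidateIntervals(const std::vector<Seg>& segs,
                                         std::int64_t lookahead) {
    std::vector<Interval> out;
    for (std::size_t i = 0; i < segs.size(); ++i) {
        for (std::size_t j = i + 1;
             j < segs.size() && segs[j].ref - segs[i].ref <= lookahead; ++j) {
            if (crosses(segs[i], segs[j])) {
                out.push_back({segs[i].ref, segs[j].ref});
            }
        }
    }
    return out;
}

// Merges intervals that overlap or lie within `gap` bases of each other.
std::vector<Interval> mergeIntervals(std::vector<Interval> v, std::int64_t gap) {
    std::sort(v.begin(), v.end(),
              [](const Interval& a, const Interval& b) { return a.start < b.start; });
    std::vector<Interval> merged;
    for (const Interval& iv : v) {
        if (!merged.empty() && iv.start <= merged.back().end + gap) {
            merged.back().end = std::max(merged.back().end, iv.end);
        } else {
            merged.push_back(iv);
        }
    }
    return merged;
}
```

In this sketch, `lookahead` plays the role of `INV_LOOKAHEAD` and `gap` the role of `INV_MERGE_GAP`; the real implementation may apply them differently.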
- Supplementary alignments are ignored.
- Unmapped markers are used as part of deletion discovery.
- Inversion discovery requires mappings flagged as inverted.
- Mismatch filtering is based on the `NM:i:` tag.
- Output paths are generated as `<out-dir>/ins.bed`, `<out-dir>/del.bed`, and `<out-dir>/inv.bed`.
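The flag-related notes above can be summarized with the standard SAM flag bits (0x4 for unmapped, 0x10 for reverse strand, 0x800 for supplementary). The sketch below is illustrative: the `Role` enum and `classify` function are hypothetical, and equating "flagged as inverted" with the reverse-strand bit is an assumption.

```cpp
// Standard SAM flag bits.
constexpr unsigned kUnmapped      = 0x4;
constexpr unsigned kReverseStrand = 0x10;
constexpr unsigned kSupplementary = 0x800;

// Hypothetical classification of how a record would be used.
enum class Role { Skip, DeletionEvidence, InversionEvidence, Collinear };

Role classify(unsigned flag) {
    if (flag & kSupplementary) return Role::Skip;              // ignored
    if (flag & kUnmapped)      return Role::DeletionEvidence;  // unmapped run
    if (flag & kReverseStrand) return Role::InversionEvidence; // assumed "inverted"
    return Role::Collinear;
}
```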
If you use STRiVE in your work, please cite the accompanying paper.