BilkentCompGen/strive

STRiVE

STRiVE is a sketch-based structural variant (SV) discovery tool for assembly-to-assembly comparison. Instead of relying on whole-genome alignment (WGA), STRiVE uses sparse genomic markers to identify insertions, deletions, and inversions efficiently.

The method is described in the accompanying paper, Characterization of structural variation through assembly-to-assembly comparison.

Overview

STRiVE is designed for fast SV characterization from marker mappings between a reference assembly and a query assembly. The implementation is written in C++ and follows a streaming design, keeping memory use low while detecting:

  • Insertions from sustained positive positional shifts
  • Deletions from runs of unmapped markers followed by negative positional offsets
  • Inversions from local reversals in marker ordering using a sliding-window intersection strategy

At a high level, the program:

  1. Loads a reference marker map.
  2. Scans query marker mappings to detect insertions and deletions.
  3. Extracts inversion-supporting mappings.
  4. Builds and merges inversion intervals.
  5. Writes predictions in BED-like tabular output.

Repository contents

  • main.cpp — core implementation of STRiVE
  • Config.h — configuration parameters and default thresholds
  • makefile — build rules for the strive executable

Note: main.cpp includes ArgParser.h, which defines the command-line interface and argument parsing but is not among the files listed above. Make sure that file is included in the submission repository as well.

Build

Compile with:

make

The provided makefile uses:

  • g++
  • optimization level -O3
  • C++20 standard
  • warning flags -Wall -Wextra -Wpedantic

Input format

STRiVE expects two primary inputs.

1. Reference marker map

A tab-delimited text file where each line contains:

<marker_id>\t<reference_position>

Example:

1039482	154320
2048129	154701

This file is loaded into an in-memory map from marker identifier to reference coordinate.

2. Query marker mappings

A SAM-like, tab-delimited alignment file for the same markers mapped against the query assembly.

The current implementation expects these fields (0-based column indices, following the SAM column order):

  • field 0: marker / minimizer name
  • field 1: flag
  • field 2: reference sequence name (rname)
  • field 3: mapped position
  • field 4: mapping quality (MAPQ)
  • field 5: CIGAR
  • fields 11+: optional tags

The code also inspects the optional NM:i: tag and uses it to filter mappings by mismatch count.
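
A minimal sketch of how such a line could be parsed and filtered is shown below. The `Mapping` struct and the functions `split_tabs`, `parse_mapping`, and `passes_filters` are hypothetical names introduced for illustration. Treating a missing NM:i: tag as passing the mismatch filter is also an assumption, not confirmed behavior of STRiVE.

```cpp
#include <cstddef>
#include <exception>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record holding the fields listed above.
struct Mapping {
    std::string name;
    int flag = 0;
    std::string rname;
    long long pos = 0;
    int mapq = 0;
    std::string cigar;
    std::optional<int> nm;  // value of the NM:i: tag, if present
};

// Split a line on tab characters, keeping empty fields.
std::vector<std::string> split_tabs(const std::string& line) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    while (true) {
        const std::size_t tab = line.find('\t', start);
        if (tab == std::string::npos) {
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, tab - start));
        start = tab + 1;
    }
    return fields;
}

// Parse one SAM-like line; returns nullopt for headers and malformed lines.
std::optional<Mapping> parse_mapping(const std::string& line) {
    if (line.empty() || line[0] == '@') return std::nullopt;  // SAM header line
    const auto f = split_tabs(line);
    if (f.size() < 6) return std::nullopt;
    Mapping m;
    try {
        m.name = f[0];
        m.flag = std::stoi(f[1]);
        m.rname = f[2];
        m.pos = std::stoll(f[3]);
        m.mapq = std::stoi(f[4]);
        m.cigar = f[5];
        for (std::size_t i = 11; i < f.size(); ++i) {  // optional tags
            if (f[i].rfind("NM:i:", 0) == 0) m.nm = std::stoi(f[i].substr(5));
        }
    } catch (const std::exception&) {
        return std::nullopt;  // non-numeric flag, position, MAPQ, or NM value
    }
    return m;
}

// Apply the MAPQ and mismatch thresholds (a missing NM tag passes; assumed).
bool passes_filters(const Mapping& m, int min_mapq, int max_mismatches) {
    if (m.mapq < min_mapq) return false;
    return !m.nm || *m.nm <= max_mismatches;
}
```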

Output

STRiVE writes three output files into the output directory:

  • ins.bed — predicted insertions
  • del.bed — predicted deletions
  • inv.bed — predicted inversions

Each output line is written in a BED-like four-column format:

<chromosome>\t<start>\t<end>\t<size>

The chromosome name is inferred from the query mapping file when possible.

Parameters

The following defaults are defined in Config.h:

Parameter        Default   Meaning
WINDOW_SIZE      -1        positional tolerance / window threshold used during indel detection
MAX_SV_SIZE      1000000   maximum SV size considered
MIN_SV_SIZE      200       minimum SV size considered
MIN_MAPQ         50        minimum mapping quality threshold
MAX_MISMATCHES   0         maximum allowed mismatches from NM:i:
INV_MIN_STREAK   3         minimum number of consecutive inversion-supporting mappings
INV_LOOKAHEAD    10000     sliding-window size for inversion detection
INV_MIN_SPAN     200       minimum inversion span
INV_MERGE_GAP    0         gap tolerance when merging inversion intervals
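
One way to picture these defaults in code is as a plain struct with in-class initializers, as sketched below. The struct name `StriveDefaults`, its member names and types, and the helper `sv_size_in_range` are illustrative assumptions and are not copied from Config.h. Treating both size bounds as inclusive is likewise an assumption.

```cpp
// Hypothetical grouping of the defaults listed above; the actual Config.h
// may use different names, types, or macros.
struct StriveDefaults {
    long long window_size = -1;       // WINDOW_SIZE
    long long max_sv_size = 1000000;  // MAX_SV_SIZE
    long long min_sv_size = 200;      // MIN_SV_SIZE
    int min_mapq = 50;                // MIN_MAPQ
    int max_mismatches = 0;           // MAX_MISMATCHES
    int inv_min_streak = 3;           // INV_MIN_STREAK
    long long inv_lookahead = 10000;  // INV_LOOKAHEAD
    long long inv_min_span = 200;     // INV_MIN_SPAN
    long long inv_merge_gap = 0;      // INV_MERGE_GAP
};

// Size filter with both bounds treated as inclusive (an assumption).
bool sv_size_in_range(const StriveDefaults& cfg, long long size) {
    return size >= cfg.min_sv_size && size <= cfg.max_sv_size;
}
```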

Running STRiVE

The exact command-line flag names are defined in ArgParser.h, which is referenced by main.cpp but was not part of the uploaded snapshot used to draft this README. In practice, the executable needs values for:

  • reference marker map path
  • query mapping path
  • output directory
  • WINDOW_SIZE

and may optionally expose the remaining configuration fields shown above.

A typical invocation will look conceptually like this:

./strive \
  --reference <reference_markers.tsv> \
  --query <query_mappings.sam> \
  --out-dir <output_directory>/ \
  --window-size <W> \
  --min-mapq 50 \
  --min-sv-size 200 \
  --max-sv-size 1000000 \
  --max-mismatches 0 \
  --inv-min-streak 3 \
  --inv-lookahead 10000 \
  --inv-min-span 200 \
  --inv-merge-gap 0

If your ArgParser.h uses different flag names, replace them accordingly.

Method summary

Insertions and deletions

STRiVE scans mapped markers in order and maintains a running positional offset. Insertions are detected when corrected query positions consistently exceed expected reference positions by more than the minimum SV size. Deletions are detected when a run of unmapped markers is followed by a sufficiently large negative positional shift.
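
The insertion half of this idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in main.cpp: the types `MarkerPair` and `Insertion`, the function `detect_insertions`, and the `min_support` parameter (how many consecutive markers must agree, at least 1) are all hypothetical. The WINDOW_SIZE tolerance and deletion handling are left out.

```cpp
#include <cstddef>
#include <vector>

struct MarkerPair { long long ref_pos; long long query_pos; };             // hypothetical
struct Insertion  { long long ref_start; long long ref_end; long long size; };

// Scan markers in reference order, tracking the query shift already explained
// by earlier calls. A run of `min_support` consecutive markers whose corrected
// query position exceeds the reference position by at least `min_sv_size`
// (and at most `max_sv_size`) yields one insertion call.
std::vector<Insertion> detect_insertions(const std::vector<MarkerPair>& markers,
                                         long long min_sv_size,
                                         long long max_sv_size,
                                         std::size_t min_support) {
    std::vector<Insertion> calls;
    long long offset = 0;          // running positional offset
    std::size_t streak = 0;        // consecutive supporting markers
    std::size_t streak_start = 0;  // index of the first marker in the streak
    long long first_delta = 0;     // shift observed at the start of the streak
    for (std::size_t i = 0; i < markers.size(); ++i) {
        const long long corrected = markers[i].query_pos - offset;
        const long long delta = corrected - markers[i].ref_pos;
        if (delta >= min_sv_size && delta <= max_sv_size) {
            if (streak == 0) { streak_start = i; first_delta = delta; }
            ++streak;
            if (streak == min_support) {
                // Place the event between the last unshifted marker and the
                // first shifted one.
                const long long left = streak_start > 0
                    ? markers[streak_start - 1].ref_pos
                    : markers[streak_start].ref_pos;
                calls.push_back({left, markers[streak_start].ref_pos, first_delta});
                offset += first_delta;  // later markers are corrected by this shift
                streak = 0;
            }
        } else {
            streak = 0;  // shift not sustained
        }
    }
    return calls;
}
```

Updating the offset after each call is what lets markers downstream of an insertion line up with the reference again, so a single event is not reported repeatedly.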

Inversions

For inversions, STRiVE keeps only high-confidence inversion-supporting mappings and looks for local reversals in ordering. It represents marker mappings geometrically and detects inversions by identifying line-segment intersections within a bounded sliding window. Overlapping candidate intervals are then merged.
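
Two building blocks of this description can be sketched in isolation. The first assumes one possible geometric reading: each mapping is a segment from its reference position on one horizontal line to its query position on a parallel line, so two segments cross exactly when the marker order flips. The second is a standard sort-and-sweep interval merge with a gap tolerance in the spirit of INV_MERGE_GAP. The names `Interval`, `segments_cross`, and `merge_intervals` are hypothetical, and merging intervals that merely touch is an assumption.

```cpp
#include <algorithm>
#include <vector>

struct Interval { long long start; long long end; };  // hypothetical

// Segments (r1, 0)-(q1, 1) and (r2, 0)-(q2, 1) intersect strictly between
// the two lines exactly when the reference order and query order disagree.
bool segments_cross(long long r1, long long q1, long long r2, long long q2) {
    return (r1 < r2 && q1 > q2) || (r1 > r2 && q1 < q2);
}

// Merge intervals that overlap or lie within `merge_gap` of each other.
std::vector<Interval> merge_intervals(std::vector<Interval> intervals,
                                      long long merge_gap) {
    std::sort(intervals.begin(), intervals.end(),
              [](const Interval& a, const Interval& b) { return a.start < b.start; });
    std::vector<Interval> merged;
    for (const Interval& cur : intervals) {
        if (!merged.empty() && cur.start <= merged.back().end + merge_gap) {
            merged.back().end = std::max(merged.back().end, cur.end);  // extend
        } else {
            merged.push_back(cur);  // start a new merged interval
        }
    }
    return merged;
}
```

Restricting the crossing test to pairs of mappings within a bounded window (as INV_LOOKAHEAD suggests) avoids comparing every pair of markers on a chromosome.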

Practical notes

  • Supplementary alignments are ignored.
  • Unmapped markers are used as part of deletion discovery.
  • Inversion discovery requires mappings flagged as inverted.
  • Mismatch filtering is based on the NM:i: tag.
  • Output paths are generated as <out-dir>/ins.bed, <out-dir>/del.bed, and <out-dir>/inv.bed.
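
The flag-related notes above can be made concrete with the standard SAM flag bits: 0x4 (segment unmapped), 0x10 (reverse strand), and 0x800 (supplementary alignment). The sketch below assumes that "flagged as inverted" corresponds to the reverse-strand bit. That mapping, the enum `MappingKind`, and the function `classify_flag` are assumptions for illustration.

```cpp
// Standard SAM flag bits.
constexpr int kFlagUnmapped      = 0x4;
constexpr int kFlagReverse       = 0x10;
constexpr int kFlagSupplementary = 0x800;

enum class MappingKind { Skipped, Unmapped, Forward, Reverse };  // hypothetical

// Classify a mapping by its SAM flag:
//   supplementary -> ignored entirely
//   unmapped      -> kept as deletion evidence
//   reverse       -> candidate inversion support (assumed meaning of "inverted")
//   otherwise     -> ordinary forward mapping
MappingKind classify_flag(int flag) {
    if (flag & kFlagSupplementary) return MappingKind::Skipped;
    if (flag & kFlagUnmapped) return MappingKind::Unmapped;
    return (flag & kFlagReverse) ? MappingKind::Reverse : MappingKind::Forward;
}
```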

Citation

If you use STRiVE in your work, please cite the accompanying paper.
