SuperIntervals

The R Bioconductor package superintervalsr provides fast interval intersection queries, using a novel superset-lookup approach with batch operations in mind.

Why SuperIntervals?

Faster interval intersection queries
Low memory overhead
Bioconductor friendly: GRanges → IntervalMap → query → back to GRanges

Installation

remotes::install_github("kcleal/superintervalsr")

Core Workflow

1. Create IntervalMap from Your Data

# From vectors (recommended - no build() needed)
imap <- IntervalMap.from_vectors(starts, ends, values)

# From GRanges (Bioconductor integration)
library(GenomicRanges)
gr <- GRanges("chr1", IRanges(starts, ends), gene_id = values)
imap <- IntervalMap(gr, value_column = "gene_id", seqname = "chr1")

# From data.frame
df <- data.frame(start = starts, end = ends, gene = values)
imap <- IntervalMap(df, value_column = "gene")

2. Batch Query Operations (The Fast Way)

# Multiple query regions
query_starts <- c(2000, 7000, 12000, 50000)
query_ends <- c(4000, 9000, 14000, 55000)

# Batch counting - orders of magnitude faster
counts <- count.batch(imap, query_starts, query_ends)
#> [1] 3 2 1 4

# Batch index search - get all overlapping interval indices  
indices_list <- search_idxs.batch(imap, query_starts, query_ends)
#> [[1]]
#> [1] 12 15 23
#> [[2]] 
#> [1] 45 67
#> ... etc

# Use indices to get back to original data
for (i in seq_along(indices_list)) {
  if (length(indices_list[[i]]) > 0) {
    overlapping_features <- values[indices_list[[i]]]
    cat("Query", i, "overlaps:", paste(overlapping_features, collapse = ", "), "\n")
  }
}

3. Single Query Operations (When You Need Just One)

# For occasional single queries
single_count <- count(imap, 4000, 6000)
single_indices <- search_idxs(imap, 4000, 6000)
single_values <- search_values(imap, 4000, 6000)

# Get complete overlap information
overlaps <- search_items(imap, 4000, 6000)
# Returns: list(start = c(...), end = c(...), value = list(...))

Real-World Genomics Example

library(superintervalsr)
library(GenomicRanges)

# Load your gene annotations
genes <- GRanges("chr1", IRanges(c(1000, 5000, 10000), c(4000, 8000, 15000)),
                 gene_id = c("ENSG001", "ENSG002", "ENSG003"),
                 gene_name = c("GENE_A", "GENE_B", "GENE_C"))

# Convert to IntervalMap for fast queries
gene_imap <- IntervalMap(genes, value_column = "gene_id", seqname = "chr1")

# You have thousands of ChIP-seq peaks to annotate
n_peaks <- 5000
peak_starts <- sort(sample(1:20000, n_peaks))
peak_ends <- peak_starts + sample(200:1000, n_peaks, replace = TRUE)

# Batch annotate all peaks (lightning fast!)
peak_gene_counts <- count.batch(gene_imap, peak_starts, peak_ends)
peak_gene_indices <- search_idxs.batch(gene_imap, peak_starts, peak_ends)

# Analyze results
peaks_with_genes <- sum(peak_gene_counts > 0)
cat("Peaks overlapping genes:", peaks_with_genes, "out of", n_peaks, "\n")

# Get detailed annotations for peaks with genes
for (i in which(peak_gene_counts > 0)[1:5]) {  # Show first 5
  overlapping_genes <- genes[peak_gene_indices[[i]]]
  cat("Peak", i, "overlaps:", 
      paste(mcols(overlapping_genes)$gene_name, collapse = ", "), "\n")
}

Performance Comparison

Rscript ./benchmark.R

>>> 4. SUMMARY
----------
Performance vs IRanges (NCLS algorithm):
- Overlap counting: 48.9x faster
- Overlap finding:  10.6x faster

Complete API Reference

Core Data Structure

# Create IntervalMap
IntervalMap()                                     # Empty map
IntervalMap.from_vectors(starts, ends, values)    # From vectors (recommended)
IntervalMap(granges, value_column, seqname)       # From GRanges
IntervalMap(dataframe, start_column, end_column)  # From data.frame

Batch Operations (Use These!)

count.batch(imap, starts, ends)        # Count overlaps for multiple queries
search_idxs.batch(imap, starts, ends)  # Get indices for multiple queries

Single Query Operations

has_overlaps(imap, start, end)    # TRUE/FALSE for any overlap
count(imap, start, end)           # Count overlapping intervals
search_idxs(imap, start, end)     # Get indices of overlapping intervals
search_values(imap, start, end)   # Get values of overlapping intervals
search_keys(imap, start, end)     # Get coordinates of overlapping intervals
search_items(imap, start, end)    # Get complete overlap information (coords + values)
coverage(imap, start, end)        # Coverage statistics

Bioconductor Integration

as_granges(search_items_result, seqname, valuename)  # Convert back to GRanges

Utilities

length(imap)           # Number of intervals
at(imap, index)        # Get interval at specific index
clear(imap)            # Remove all intervals
reserve(imap, n)       # Pre-allocate space for n intervals

Advanced: Manual Interval Addition

# For dynamic construction (less common)
imap <- IntervalMap()
add(imap, 1000, 5000, "interval1")
add(imap, 3000, 8000, "interval2")
build(imap)  # Required before queries when using add()

# Batch operations work the same way
counts <- count.batch(imap, query_starts, query_ends)

Notes

End-inclusive intervals: Both start and end positions are included
Automatic sorting: Intervals are maintained in sorted order
SIMD optimization: Automatic vectorization on modern CPUs
Memory efficient: One size_t per interval overhead
Thread safe: Read operations can be parallelized
Cache friendly: Optimized memory layout for fast iteration

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
R		R
inst		inst
man		man
src		src
tests		tests
vignettes		vignettes
.Rbuildignore		.Rbuildignore
.gitattributes		.gitattributes
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
NAMESPACE		NAMESPACE
README.md		README.md
benchmark.R		benchmark.R
build.sh		build.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SuperIntervals

Why SuperIntervals?

Installation

Core Workflow

1. Create IntervalMap from Your Data

2. Batch Query Operations (The Fast Way)

3. Single Query Operations (When You Need Just One)

Real-World Genomics Example

Performance Comparison

Complete API Reference

Core Data Structure

Batch Operations (Use These!)

Single Query Operations

Bioconductor Integration

Utilities

Advanced: Manual Interval Addition

Notes

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SuperIntervals

Why SuperIntervals?

Installation

Core Workflow

1. Create IntervalMap from Your Data

2. Batch Query Operations (The Fast Way)

3. Single Query Operations (When You Need Just One)

Real-World Genomics Example

Performance Comparison

Complete API Reference

Core Data Structure

Batch Operations (Use These!)

Single Query Operations

Bioconductor Integration

Utilities

Advanced: Manual Interval Addition

Notes

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages