Strategies for highly heterozygous haplotype-resolved eukaryotic pangenomes with extreme structural variation

Hi Cactus developers,

We are building a pangenome for Mytilus galloprovincialis using multiple haplotype-resolved assemblies (currently 12 haplotypes).

Our biological goal goes beyond genome alignment. We are especially interested in:
- localization of presence–absence variable (PAV) genes,
- structural interpretation of accessory genes,
- graph-aware gene projection across haplotypes.

However, we are encountering substantial graph complexity and fragmentation.

Dataset characteristics:
- haplotype-resolved assemblies
- very high heterozygosity
- extensive structural variation
- high repeat content
- large gene presence/absence variation

Symptoms:
- extremely dense graph structures
- many crossing paths / graph “hairball” regions
- very large node/edge counts
- difficult interpretation of local gene neighborhoods
- very slow vg giraffe mapping
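
To make the node/edge symptom concrete, counts like these can be taken directly from a GFA export of the graph. The following is only a minimal sketch, assuming a GFA 1 text file with `S` (segment) and `L` (link) records; the function name `gfa_stats`, the `min_degree` threshold, and the toy graph are illustrative and not part of Cactus or vg.

```python
from collections import Counter

def gfa_stats(lines, min_degree=4):
    """Summarize a GFA 1 graph: segment count, link count, total bases,
    and how many segments touch at least `min_degree` link ends.
    `min_degree` is an arbitrary illustrative cutoff for "branchy" nodes."""
    segments = 0
    links = 0
    total_bp = 0
    degree = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            segments += 1
            # Column 3 holds the sequence, or "*" when it is not stored.
            if len(fields) > 2 and fields[2] != "*":
                total_bp += len(fields[2])
        elif fields[0] == "L" and len(fields) >= 5:
            links += 1
            # Columns 2 and 4 are the "from" and "to" segment names.
            degree[fields[1]] += 1
            degree[fields[3]] += 1
    return {
        "segments": segments,
        "links": links,
        "total_bp": total_bp,
        "max_degree": max(degree.values(), default=0),
        "high_degree_segments": sum(1 for d in degree.values() if d >= min_degree),
    }

# Toy graph: a single SNP bubble (segments 2 and 3) between two flanks.
toy_gfa = [
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tA",
    "S\t3\tG",
    "S\t4\tTTGA",
    "L\t1\t+\t2\t+\t0M",
    "L\t1\t+\t3\t+\t0M",
    "L\t2\t+\t4\t+\t0M",
    "L\t3\t+\t4\t+\t0M",
]
stats = gfa_stats(toy_gfa)
print(stats)
```

For a real graph the same function could be fed an open file handle, and the summary compared between chromosomes or between builds with different filtering.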

We suspect this may reflect a combination of:
- biological variation,
- fragmented alignments,
- excessive local graph branching,
- and possibly the limits of whole-genome Cactus alignment for extremely polymorphic eukaryotic genomes.

We would like to ask:

1. Are there recommended strategies for highly heterozygous haplotype-aware eukaryotic pangenomes?

2. Would you recommend:
   - stronger graph filtering?
   - chromosome-by-chromosome construction?
   - pre-collapsing haplotypes?
   - different minigraph/cactus settings?
   - alternative graph normalization approaches?
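
To illustrate what we mean by chromosome-by-chromosome construction: each contig would first be assigned to one reference chromosome, and a separate graph built per chromosome. The following is only a rough sketch, assuming PAF alignments of every haplotype's contigs against one chosen reference assembly; the function names and the contig and chromosome names are hypothetical, and this is not meant to describe how Cactus itself partitions input.

```python
from collections import defaultdict

def assign_contigs(paf_lines):
    """Assign each query contig to the target sequence with the most
    matching bases, summed over all of its PAF alignment records."""
    matched = defaultdict(lambda: defaultdict(int))
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or truncated records
        # PAF columns: 1 = query name, 6 = target name, 10 = matching bases.
        query, target, n_match = fields[0], fields[5], int(fields[9])
        matched[query][target] += n_match
    return {
        query: max(targets.items(), key=lambda kv: kv[1])[0]
        for query, targets in matched.items()
    }

def group_by_chromosome(assignment):
    """Invert the contig -> chromosome map into chromosome -> sorted contigs."""
    bins = defaultdict(list)
    for contig, chrom in assignment.items():
        bins[chrom].append(contig)
    return {chrom: sorted(contigs) for chrom, contigs in bins.items()}

# Toy PAF records with hypothetical contig and chromosome names.
toy_paf = [
    "hap1_ctg1\t1000\t0\t900\t+\tchr1\t5000\t100\t1000\t850\t900\t60",
    "hap1_ctg1\t1000\t900\t1000\t+\tchr2\t4000\t0\t100\t90\t100\t10",
    "hap2_ctg7\t2000\t0\t1800\t-\tchr2\t4000\t500\t2300\t1700\t1800\t60",
    "hap2_ctg9\t500\t0\t400\t+\tchr1\t5000\t2000\t2400\t380\t400\t60",
]
assignment = assign_contigs(toy_paf)
bins = group_by_chromosome(assignment)
print(bins)
```

Contigs dominated by repeats or accessory sequence may align weakly everywhere, so a real partition would probably also need a minimum-coverage rule and an "unassigned" bin.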

We would greatly appreciate any guidance or best practices for this type of dataset.

Thanks a lot!
