Hi Cactus developers,
We are building a pangenome for Mytilus galloprovincialis using multiple haplotype-resolved assemblies (currently 12 haplotypes).
Our biological goal goes beyond genome alignment. We are especially interested in:
- localization of presence–absence variable (PAV) genes,
- structural interpretation of accessory genes,
- graph-aware gene projection across haplotypes.
However, we are encountering substantial graph complexity and fragmentation.
Current setup:
- haplotype-resolved assemblies
- very high heterozygosity
- extensive structural variation
- high repeat content
- large gene presence/absence variation
Symptoms:
- extremely dense graph structures
- many crossing paths / graph “hairball” regions
- very large node/edge counts
- difficult interpretation of local gene neighborhoods
- very slow vg giraffe mapping
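To make the node/edge counts and the degree of branching concrete, they could be summarized from a GFA export of the graph. The sketch below is illustrative only: the function name `gfa_stats` and the `high_degree` threshold are assumptions for this example, not part of Cactus or vg, and it only reads the standard GFA 1.x `S` (segment) and `L` (link) records.

```python
from collections import Counter


def gfa_stats(lines, high_degree=4):
    """Summarize a GFA 1.x graph from an iterable of text lines.

    Counts segments (S records), links (L records), and the number of
    segments touched by at least `high_degree` link endpoints, which is
    a rough proxy for densely branching ("hairball") regions.
    """
    segments = set()
    degree = Counter()
    n_links = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 2:
            # S <name> <sequence> ...
            segments.add(fields[1])
        elif fields[0] == "L" and len(fields) >= 5:
            # L <from> <from_orient> <to> <to_orient> <overlap>
            n_links += 1
            degree[fields[1]] += 1
            degree[fields[3]] += 1
    branching = sum(1 for name in segments if degree[name] >= high_degree)
    return {
        "segments": len(segments),
        "links": n_links,
        "high_degree_segments": branching,
    }
```

Passing an open file handle for the exported graph (for example a hypothetical `graph.gfa`) to `gfa_stats` would return the three counts as a dictionary, which makes it easier to compare graphs built with different settings.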
We suspect this may reflect a combination of:
- biological variation,
- fragmented alignments,
- excessive local graph branching,
- and possibly the limits of whole-genome Cactus alignment for extremely polymorphic eukaryotic genomes.
We would like to ask:
- Are there recommended strategies for highly heterozygous haplotype-aware eukaryotic pangenomes?
- Would you recommend:
  - stronger graph filtering?
  - chromosome-by-chromosome construction?
  - pre-collapsing haplotypes?
  - different minigraph/cactus settings?
  - alternative graph normalization approaches?
We would greatly appreciate any guidance or best practices for this type of dataset.
Thanks a lot!