Aligning to a genome with an "assembly gap" (NNNNN) calls a deletion #410

@dannagifford

Hello,

I noticed something recently while working with some new de novo assemblies I made. For context, we made the assembly using shovill and, because our new strain turned out to be very close to an existing reference genome, we used RagTag to help scaffold the assemblies. (We then used prokka for annotation.)

RagTag introduces N's at assembly gaps.

We then used breseq to map the original Illumina reads back to the RagTag assembly. Where there are N's (assembly gaps), breseq calls a deletion, but:

  • the MC plots show reads spanning this region; they just don't match the N's in the reference genome
  • the JC evidence suggests there's a new junction on either side of the N's
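
To check whether a called deletion lines up with a scaffolding gap, it can help to list where the runs of N's sit in the reference. The following is only a minimal sketch in plain Python: the function name `find_gap_runs` and the example sequence are hypothetical, and it uses no breseq or RagTag API. It reports each run of N's as 1-based inclusive coordinates, which could then be compared by eye against the positions in the MC/JC evidence.

```python
import re


def find_gap_runs(sequence, min_length=1):
    """Return (start, end) pairs, 1-based inclusive, for each run of N/n.

    Hypothetical helper for illustration; min_length sets the shortest
    run of N's that is reported as a gap.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_length)
    # m.start() is 0-based, so add 1; m.end() is exclusive, so it already
    # equals the 1-based inclusive end coordinate.
    return [(m.start() + 1, m.end()) for m in pattern.finditer(sequence)]


# Made-up sequence with a 5-base gap and a 2-base gap.
example = "ACGT" + "N" * 5 + "GGCC" + "nn" + "TT"
print(find_gap_runs(example))  # [(5, 9), (14, 15)]
```

If the gap coordinates coincide with the reported deletion, that would be consistent with the call being driven by the N's rather than by missing sequence in the sample.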

I'm not sure this logic makes sense. Since there are reads that span the assembly gap, I think those reads could possibly be used to correct the gap? I don't think it's a "deletion".

I've attached the HTML files for the JC/MC evidence. I could possibly share more if needed, though I wonder whether you could replicate this just by replacing an arbitrary stretch of a genome with N's...
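
One way to build such a test case is sketched below, assuming a single-record FASTA reference held as text. The helper names `mask_region` and `mask_fasta_record` are hypothetical, and nothing here has been checked against breseq itself. It overwrites a chosen 1-based inclusive interval with N's so that the original reads can be mapped back to the masked reference.

```python
def mask_region(sequence, start, end):
    """Replace the 1-based inclusive interval [start, end] with N's."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("interval must lie within the sequence")
    return sequence[:start - 1] + "N" * (end - start + 1) + sequence[end:]


def mask_fasta_record(fasta_text, start, end, width=60):
    """Mask an interval in a single-record FASTA string and re-wrap it.

    Assumes exactly one record: the first line is the header and all
    remaining lines are sequence.
    """
    lines = fasta_text.strip().splitlines()
    header, seq = lines[0], "".join(lines[1:])
    masked = mask_region(seq, start, end)
    wrapped = [masked[i:i + width] for i in range(0, len(masked), width)]
    return "\n".join([header] + wrapped) + "\n"


# Made-up record: mask bases 3 to 5.
print(mask_fasta_record(">example\nACGTACGT\n", 3, 5))
```

The sequence length is unchanged, so coordinates in the masked reference stay comparable to the unmasked one.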

Ns-called-as-deletion.zip

Links to shovill, RagTag, and prokka, for info:
https://github.com/tseemann/shovill
https://github.com/malonge/RagTag
https://github.com/tseemann/prokka
