trying to build a new eukaryotic species SingleM metapackage #286

@fanch1122

I am attempting to build a custom SingleM metapackage (smpkg) for eukaryotic species using BUSCO single-copy orthologs as marker genes. Package construction completes successfully, but every OTU from short-read metagenomic data is annotated as "Root", which suggests that taxonomic assignment is failing.

  • Construction workflow
  1. Marker selection: Extracted universal single-copy orthologs from BUSCO datasets
  2. Sequence retrieval: Retrieved reference sequences for target eukaryotic clades
  3. Taxonomy integration: Built custom NCBI taxonomy tree incorporating new lineages
  4. Package assembly: Generated smpkg using singlem metapackage create
  • Questions
  1. Does SingleM officially support custom eukaryotic metapackages, or is eukaryotic analysis currently restricted?
  2. Are there known limitations with short-read alignment against BUSCO-derived markers?
  3. What diagnostic steps would help identify whether this is a taxonomy formatting issue or some other failure?
  • Example otu_table.tsv

gene sample sequence num_hits coverage taxonomy
s3.108097 SRR12711264_R1 ACCGGCATCAAGGCCATTGACGGCATGATCCCCATCGGCAAGGGTCAGCGTGAGCTGATC 2 3.30 Root
s3.108097 SRR12711264_R1 ACAGGTATTAAGGCAATTGATGCCATGGTTCCAATCGGAAGAGGTCAGAGAGAGTTAATT 3 4.95 Root
s3.108097 SRR12711264_R1 ACCGGTATTAAATGTATCGACGCTCTCGTACCTATCGGACGTGGCCAACGTGAACTTATC 4 6.59 Root
s3.108097 SRR12711264_R1 ACCGGTATCAAGGTTGTTGACCTGATCTGCCCCTACGCAAAGGGCGGTAAGATCGGTCTG 3 4.95 Root
s3.108097 SRR12711264_R1 ACAGGCATAAAGGTGATTGACCTGCTGGAACCATACTGCAAAGGTGGGAAGATTGGACTC 1 1.65 Root
s3.108097 SRR12711264_R1 ACCGGCTTTAAGGCTATCGACGCGATGATTCCTATCGGTCGTGGTCAGCGTGAGTTGATT 6 9.89 Root
s3.108097 SRR12711264_R1 ACAGGCATTAAGGTAATAGATTTGCTCGAGCCCTACCTTAAAGGCGGCAAGATCGGTCTT 18 29.67 Root
s3.108097 SRR12711264_R1 ACAGGTATCAAGGCTATTGACAGTATGATTCCTATCGGCAGAGGCCAGAGAGAACTTATC 1 1.65 Root
s3.108097 SRR12711264_R1 ACCGGCATCAAGGCCATCGACTCCATGATCCCCATCGGTCGTGGCCAGCGTGAGCTGATC 2 3.30 Root
s3.108097 SRR12711264_R1 ACGGGCATCAAGGTCATCGATCTGCTCGAACCATATCTGAAAGGAGGAAAGATCGGACTT 1 1.65 Root
s3.108097 SRR12711264_R1 ACGGGCATCAAGGTCATCGACCTGATCTGCCCCTACGCCAAGGGTGGCAAGATCGGCCTG 3 4.95 Root
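One quick check related to question 3 is to tally how many OTUs and hits fall under each taxonomy string, to confirm whether every marker collapses to "Root" or only some do. The sketch below assumes the real otu_table.tsv is tab-separated with the column names shown in the header above; `summarize_taxonomy` is a hypothetical helper name, not part of SingleM, and the two example rows are copied from the table above.

```python
import csv
import io
from collections import Counter


def summarize_taxonomy(handle):
    """Count OTUs and summed num_hits per taxonomy string.

    Assumes a tab-separated table with 'taxonomy' and 'num_hits' columns.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    otus = Counter()
    hits = Counter()
    for row in reader:
        taxonomy = row["taxonomy"].strip()
        otus[taxonomy] += 1
        hits[taxonomy] += int(row["num_hits"])
    return otus, hits


# Two rows copied from the example table, joined with tabs for illustration.
header = ["gene", "sample", "sequence", "num_hits", "coverage", "taxonomy"]
rows = [
    ["s3.108097", "SRR12711264_R1",
     "ACCGGCATCAAGGCCATTGACGGCATGATCCCCATCGGCAAGGGTCAGCGTGAGCTGATC",
     "2", "3.30", "Root"],
    ["s3.108097", "SRR12711264_R1",
     "ACAGGCATTAAGGTAATAGATTTGCTCGAGCCCTACCTTAAAGGCGGCAAGATCGGTCTT",
     "18", "29.67", "Root"],
]
example = "\n".join("\t".join(r) for r in [header] + rows) + "\n"

otus, hits = summarize_taxonomy(io.StringIO(example))
for taxonomy, n_otus in otus.most_common():
    print(f"{taxonomy}\t{n_otus} OTUs\t{hits[taxonomy]} hits")
```

To run it on a real file, the same function can be called as `summarize_taxonomy(open("otu_table.tsv", newline=""))`. If "Root" is the only key across all markers, the problem is more likely in the package taxonomy than in individual marker alignments.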

Best wishes,

part_log.txt
