LiftOff generates better BUSCO scores than LiftOn #24

Open
14zac2 opened this issue Aug 20, 2024 · 2 comments
Labels: enhancement (New feature or request), fixed

14zac2 commented Aug 20, 2024

Hi there,

I was testing LiftOff and LiftOn to see which one is "best" for genome annotation. Using RefSeq genomes, I lifted the brown-headed cowbird and red-winged blackbird annotations onto the bronzed cowbird genome. Interestingly, in both cases LiftOff produced better BUSCO scores (93.6% vs. 71.1% complete for brown-headed cowbird, and 97.3% vs. 61.3% for red-winged blackbird). LiftOn also generated some odd features whose end coordinate was smaller than their start coordinate.

On the other hand, GFFCompare suggested that the LiftOn annotations had slightly more matching transcripts (18,765 vs. 18,650 for brown-headed cowbird, 14,176 vs. 13,987 for red-winged blackbird), with slightly lower precision at most levels. As a result, I currently trust LiftOff more as an annotation tool, and I wanted to bring these results to your attention.
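
The GFF3 specification requires a feature's start to be less than or equal to its end, so features like these are malformed. Below is a minimal shell sketch for listing them. It assumes a tab-separated, 9-column GFF3 file. The function names are hypothetical helpers and are not part of either tool.

```shell
#!/usr/bin/env bash
# Sketch: report GFF3 features whose end (column 5) is smaller than
# their start (column 4). Helper names are hypothetical.
set -eu

find_inverted_features() {
  # $1: path to a GFF3 file. Comment/directive lines starting with '#'
  # and short lines (e.g. FASTA sequence after ##FASTA) are skipped.
  awk -F'\t' '
    !/^#/ && NF >= 9 && ($5 + 0) < ($4 + 0) {
      # seqid, feature type, start, end, attributes
      print $1 "\t" $3 "\t" $4 "\t" $5 "\t" $9
    }
  ' "$1"
}

count_inverted_features() {
  # Number of malformed features; tr strips the padding some wc versions add.
  find_inverted_features "$1" | wc -l | tr -d ' '
}
```

For example, `count_inverted_features lifton/output.gff3` (a hypothetical path) would print how many such features a LiftOn output file contains. `find_inverted_features` on the same file would list them.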

Here are my BUSCO results for LiftOff, brown-headed cowbird onto bronzed cowbird:

    ----------------------------------------------------
    |Results from dataset passeriformes_odb10           |
    ----------------------------------------------------
    |C:93.6%[S:60.2%,D:33.4%],F:1.2%,M:5.2%,n:10844     |
    |10148    Complete BUSCOs (C)                       |
    |6526    Complete and single-copy BUSCOs (S)        |
    |3622    Complete and duplicated BUSCOs (D)         |
    |127    Fragmented BUSCOs (F)                       |
    |569    Missing BUSCOs (M)                          |
    |10844    Total BUSCO groups searched               |
    ----------------------------------------------------
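
BUSCO's one-line summary is derived from the counts listed beneath it. C = S + D, C + F + M = n, and each percentage is count / n × 100. The small sketch below rebuilds that line from the raw counts, which makes it easy to check the tables in this thread. The `busco_summary` helper is hypothetical, and BUSCO's own rounding could differ in edge cases.

```shell
#!/usr/bin/env bash
# Sketch: rebuild a BUSCO summary line from raw counts (hypothetical helper).
set -eu

busco_summary() {
  # Arguments: S D F M n, as printed in the BUSCO report.
  local s=$1 d=$2 f=$3 m=$4 n=$5
  awk -v s="$s" -v d="$d" -v f="$f" -v m="$m" -v n="$n" 'BEGIN {
    c = s + d                       # complete = single-copy + duplicated
    if (c + f + m != n) {           # the categories must partition n
      print "counts do not add up to n" > "/dev/stderr"
      exit 1
    }
    printf "C:%.1f%%[S:%.1f%%,D:%.1f%%],F:%.1f%%,M:%.1f%%,n:%d\n",
      100*c/n, 100*s/n, 100*d/n, 100*f/n, 100*m/n, n
  }'
}
```

Running `busco_summary 6526 3622 127 569 10844` prints `C:93.6%[S:60.2%,D:33.4%],F:1.2%,M:5.2%,n:10844`, the same line as the LiftOff table above. For the LiftOn counts further down, `busco_summary 5150 2558 46 3090 10844` gives `C:71.1%[S:47.5%,D:23.6%],F:0.4%,M:28.5%,n:10844`. That puts the gap at 10148 vs. 7708 complete BUSCOs, or 2440 in total.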

And here is the GFFCompare output comparing this brown-headed cowbird LiftOff annotation to the RefSeq annotation:

    #-----------------| Sensitivity | Precision  |
            Base level:    89.2     |    84.4    |
            Exon level:    85.6     |    87.7    |
          Intron level:    90.0     |    92.2    |
    Intron chain level:    59.9     |    58.1    |
      Transcript level:    60.4     |    58.7    |
           Locus level:    71.2     |    74.4    |

         Matching intron chains:   17652
           Matching transcripts:   18650
                  Matching loci:   13081

              Missed exons:   15337/199727  (  7.7%)
               Novel exons:   10077/195294  (  5.2%)
            Missed introns:   11294/180584  (  6.3%)
             Novel introns:    5639/176399  (  3.2%)
               Missed loci:    1914/18371   ( 10.4%)
                Novel loci:    1166/17582   (  6.6%)
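
In GFFCompare, sensitivity is the fraction of reference features that the query matches. Precision is the fraction of query features that match the reference. The sketch below helps line these tables up across runs. It extracts the per-level rows from a GFFCompare `.stats` file as tab-separated values and assumes the table layout shown above. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print "level<TAB>sensitivity<TAB>precision" rows from a
# GFFCompare .stats file (hypothetical helper).
set -eu

stats_table() {
  # $1: a .stats file. Only rows like "Base level:  89.2 | 84.4 |" match.
  awk -F'|' '/level:/ {
    split($1, left, ":")            # left[1] = level name, left[2] = sensitivity
    name = left[1]; sens = left[2]; prec = $2
    gsub(/^[ \t#]+|[ \t]+$/, "", name)
    gsub(/[ \t]/, "", sens)
    gsub(/[ \t]/, "", prec)
    print name "\t" sens "\t" prec
  }' "$1"
}
```

Suppose the stats came from a run such as `gffcompare -r reference.gff -o liftoff liftoff.gff3` (file names hypothetical). Then `stats_table liftoff.stats` prints one row per level, starting with `Base level	89.2	84.4`. `paste` can place two runs side by side.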

Here are the BUSCO results for LiftOn, brown-headed cowbird onto bronzed cowbird:

    -----------------------------------------------------
    |Results from dataset passeriformes_odb10            |
    -----------------------------------------------------
    |C:71.1%[S:47.5%,D:23.6%],F:0.4%,M:28.5%,n:10844     |
    |7708    Complete BUSCOs (C)                         |
    |5150    Complete and single-copy BUSCOs (S)         |
    |2558    Complete and duplicated BUSCOs (D)          |
    |46    Fragmented BUSCOs (F)                         |
    |3090    Missing BUSCOs (M)                          |
    |10844    Total BUSCO groups searched                |
    -----------------------------------------------------

GFFCompare of the LiftOn brown-headed cowbird annotation compared to the RefSeq annotation:

    #-----------------| Sensitivity | Precision  |
            Base level:    89.5     |    84.2    |
            Exon level:    86.0     |    87.2    |
          Intron level:    90.4     |    91.7    |
    Intron chain level:    60.2     |    57.9    |
      Transcript level:    60.8     |    58.4    |
           Locus level:    71.8     |    73.5    |

         Matching intron chains:   17721
           Matching transcripts:   18765
                  Matching loci:   13198

              Missed exons:   14512/199727  (  7.3%)
               Novel exons:   11255/197273  (  5.7%)
            Missed introns:   10598/180584  (  5.9%)
             Novel introns:    6515/177980  (  3.7%)
               Missed loci:    1760/18371   (  9.6%)
                Novel loci:    1388/17956   (  7.7%)

BUSCO for LiftOff, red-winged blackbird onto bronzed cowbird:

    ----------------------------------------------------
    |Results from dataset passeriformes_odb10           |
    ----------------------------------------------------
    |C:97.3%[S:69.3%,D:28.0%],F:0.4%,M:2.3%,n:10844     |
    |10551    Complete BUSCOs (C)                       |
    |7510    Complete and single-copy BUSCOs (S)        |
    |3041    Complete and duplicated BUSCOs (D)         |
    |39    Fragmented BUSCOs (F)                        |
    |254    Missing BUSCOs (M)                          |
    |10844    Total BUSCO groups searched               |
    ----------------------------------------------------

GFFCompare of the LiftOff red-winged blackbird annotation compared to the RefSeq annotation:

    #-----------------| Sensitivity | Precision  |
            Base level:    76.9     |    90.0    |
            Exon level:    82.0     |    85.5    |
          Intron level:    87.0     |    90.2    |
    Intron chain level:    44.3     |    46.0    |
      Transcript level:    45.3     |    46.8    |
           Locus level:    62.1     |    66.1    |

         Matching intron chains:   13052
           Matching transcripts:   13987
                  Matching loci:   11409

              Missed exons:   19028/199727  (  9.5%)
               Novel exons:   10245/191797  (  5.3%)
            Missed introns:   12796/180584  (  7.1%)
             Novel introns:    5096/174094  (  2.9%)
               Missed loci:    2101/18371   ( 11.4%)
                Novel loci:     923/17215   (  5.4%)

BUSCO for LiftOn, red-winged blackbird onto bronzed cowbird:

    -----------------------------------------------------
    |Results from dataset passeriformes_odb10            |
    -----------------------------------------------------
    |C:61.3%[S:44.0%,D:17.3%],F:0.4%,M:38.3%,n:10844     |
    |6649    Complete BUSCOs (C)                         |
    |4768    Complete and single-copy BUSCOs (S)         |
    |1881    Complete and duplicated BUSCOs (D)          |
    |44    Fragmented BUSCOs (F)                         |
    |4151    Missing BUSCOs (M)                          |
    |10844    Total BUSCO groups searched                |
    -----------------------------------------------------

GFFCompare of the LiftOn red-winged blackbird annotation compared to the RefSeq annotation:

    #-----------------| Sensitivity | Precision  |
            Base level:    77.2     |    89.9    |
            Exon level:    82.4     |    85.3    |
          Intron level:    87.4     |    90.1    |
    Intron chain level:    44.9     |    46.2    |
      Transcript level:    45.9     |    47.0    |
           Locus level:    63.0     |    66.1    |

         Matching intron chains:   13209
           Matching transcripts:   14176
                  Matching loci:   11571

              Missed exons:   18326/199727  (  9.2%)
               Novel exons:   10918/193146  (  5.7%)
            Missed introns:   12242/180584  (  6.8%)
             Novel introns:    5639/175141  (  3.2%)
               Missed loci:    1977/18371   ( 10.8%)
                Novel loci:    1050/17477   (  6.0%)

Kuanhao-Chao (Owner) commented:

Thanks @14zac2 for sharing the results with us! I’ll definitely be looking into them closely. If possible, could you please share the genome and annotation files with me? That would be incredibly helpful.

There’s still some work to be done on a few edge cases in LiftOn. I’m confident that once these are resolved, LiftOn will perform as well as, if not better than, current methods on more divergent genes.

I’m currently on an internship until the end of August, so I’ll revisit this in September. It was great meeting you at the conference, and thanks again for testing LiftOn!

Kuanhao-Chao self-assigned this Aug 21, 2024
Kuanhao-Chao added the enhancement (New feature or request) and fixed labels Aug 21, 2024

14zac2 commented Aug 22, 2024

Sure thing! All of these were RefSeq genomes and annotations, so I'll link to the NCBI FTP directory for each species below. In each case, I used the `*.fna` FASTA files and the GFF annotations.

Bronzed cowbird: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/037/042/795/GCF_037042795.1_BPBGC_Maene_1.0/

Brown-headed cowbird: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/012/460/135/GCF_012460135.2_BPBGC_Mater_1.1/

Red-winged blackbird: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/020/745/825/GCF_020745825.1_Agelaius_phoeniceus_1.1/
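
The sketch below fetches the inputs for reproducing this setup. It assumes NCBI's usual file naming inside each assembly directory (`<assembly>_genomic.fna.gz` and `<assembly>_genomic.gff.gz`). The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: download the genome FASTA and GFF from an NCBI assembly
# directory, assuming the standard "<assembly>_genomic.*" naming.
set -eu

genomic_urls() {
  local dir="${1%/}"           # drop a trailing slash, if any
  local asm="${dir##*/}"       # last path component is the assembly name
  printf '%s/%s_genomic.fna.gz\n' "$dir" "$asm"
  printf '%s/%s_genomic.gff.gz\n' "$dir" "$asm"
}

fetch_assembly() {
  # Downloads both files into the current directory and decompresses them.
  genomic_urls "$1" | while IFS= read -r url; do
    curl -fsSLO "$url"         # -O keeps the remote file name
    gunzip -f "${url##*/}"     # leaves the .fna / .gff file in place
  done
}
```

For example, `fetch_assembly https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/037/042/795/GCF_037042795.1_BPBGC_Maene_1.0/` should leave `GCF_037042795.1_BPBGC_Maene_1.0_genomic.fna` and `GCF_037042795.1_BPBGC_Maene_1.0_genomic.gff` in the current directory.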

For LiftOff, I used a Docker container for version 1.6.3 and my script was as follows:

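    #!/usr/bin/env bash
    # Args: $1 target genome FASTA, $2 reference genome FASTA,
    #       $3 reference GFF, $4 output GFF file name
    echo "Target genome: $1"
    echo "Reference genome: $2"
    echo "Reference GFF: $3"
    echo "Output GFF: $4"
    #echo "Feature list: $5"

    mkdir -p liftoff

    # The current directory is mounted at /tmp inside the container,
    # so every path is prefixed with /tmp.
    docker run -v "$(pwd)":/tmp staphb/liftoff liftoff \
     "/tmp/$1" "/tmp/$2" -g "/tmp/$3" -o "/tmp/liftoff/$4" \
     -u "/tmp/liftoff/unmapped_features.txt" \
     -dir "/tmp/liftoff/intermediate_files" \
     -copies -p 20 -polish -flank 0.5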

I tried to replicate the same parameters with LiftOn:

    #!/usr/bin/env bash
    # Same four arguments as the LiftOff script.
    echo "Target genome: $1"
    echo "Reference genome: $2"
    echo "Reference GFF: $3"
    echo "Output GFF: $4"

    source /home/zclarke/anaconda2/etc/profile.d/conda.sh
    source ~/bin/lifton_env/bin/activate
    conda activate lifton

    mkdir -p lifton
    cd lifton || exit 1

    # Inputs are referenced from the parent directory after the cd above.
    lifton "../$1" "../$2" -g "../$3" -o "$4" \
     -copies -t 20 -polish -flank 0.5

It was great meeting you, as well, and best of luck with your tool! I'd be very curious to understand what's going on here.

Cheers,
Zoe
