human herpesvirus 2 missing from database #26

Open
rgiannico opened this issue Dec 11, 2023 · 0 comments

The Human herpesvirus 2 sequence is absent from the viral Kraken database (and from all the databases derived from it).
This is strange because:

  • all the other (non-human) herpesvirus 2 sequences are present in the viral Kraken database (see code below)
  • the Human herpesvirus 2 sequence "NC_001798.2" is present in the NCBI RefSeq viral database (see code below)
  • Human herpesvirus 2 is an important human virus; I dare say it MUST be present in the Kraken database.

Is there a specific reason why it is missing?
Here is some simple code to reproduce this:

# get krakendb viral taxa
$ wget https://genome-idx.s3.amazonaws.com/kraken/viral_20231009/library_report.tsv
$ grep "herpesvirus 2" library_report.tsv | cut -f 2 | sort > krakendb.txt

# get RefSeq viral taxa
$ wget https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.1.genomic.fna.gz
$ zgrep "^>" viral.1.1.genomic.fna.gz | grep "herpesvirus 2" | sort > viralgenomic.txt

# find differences
$ diff -y krakendb.txt viralgenomic.txt
>NC_001350.1 Saimiriine herpesvirus 2 complete genome           >NC_001350.1 Saimiriine herpesvirus 2 complete genome
>NC_001650.2 Equid herpesvirus 2 strain 86/67, complete genom   >NC_001650.2 Equid herpesvirus 2 strain 86/67, complete genom
                                                              > >NC_001798.2 Human herpesvirus 2 strain HG52, complete genome
>NC_002229.3 Gallid herpesvirus 2, complete genome              >NC_002229.3 Gallid herpesvirus 2, complete genome
>NC_003521.1 Panine herpesvirus 2 strain Heberling, complete    >NC_003521.1 Panine herpesvirus 2 strain Heberling, complete
>NC_006560.1 Cercopithecine herpesvirus 2, complete genome      >NC_006560.1 Cercopithecine herpesvirus 2, complete genome
>NC_007646.1 Ovine herpesvirus 2 strain BJ1035, complete geno   >NC_007646.1 Ovine herpesvirus 2 strain BJ1035, complete geno
>NC_007653.1 Papiine herpesvirus 2, complete genome             >NC_007653.1 Papiine herpesvirus 2, complete genome
>NC_008210.1 Ranid herpesvirus 2 strain ATCC VR-568, complete   >NC_008210.1 Ranid herpesvirus 2 strain ATCC VR-568, complete
>NC_019495.1 Cyprinid herpesvirus 2 strain ST-J1, complete ge   >NC_019495.1 Cyprinid herpesvirus 2 strain ST-J1, complete ge
>NC_020231.1 Caviid herpesvirus 2 strain 21222, complete geno   >NC_020231.1 Caviid herpesvirus 2 strain 21222, complete geno
>NC_024382.1 Alcelaphine herpesvirus 2 isolate topi-AlHV-2, c   >NC_024382.1 Alcelaphine herpesvirus 2 isolate topi-AlHV-2, c
>NC_036579.1 Ictalurid herpesvirus 2 strain 760/94, complete    >NC_036579.1 Ictalurid herpesvirus 2 strain 760/94, complete
>NC_038265.1 Porcine lymphotropic herpesvirus 2 isolate 568 l   >NC_038265.1 Porcine lymphotropic herpesvirus 2 isolate 568 l
>NC_038860.1 Pongine herpesvirus 2 (Orangutan herpesvirus) gB   >NC_038860.1 Pongine herpesvirus 2 (Orangutan herpesvirus) gB
>NC_043042.1 Acipenserid herpesvirus 2 strain SRWSHV, partial   >NC_043042.1 Acipenserid herpesvirus 2 strain SRWSHV, partial
>NC_043044.1 Salmonid herpesvirus 2 isolate NeVTA ORF68-like    >NC_043044.1 Salmonid herpesvirus 2 isolate NeVTA ORF68-like
>NC_043059.1 Caprine herpesvirus 2 glycoprotein B (gB) and DN   >NC_043059.1 Caprine herpesvirus 2 glycoprotein B (gB) and DN
>NC_043062.1 Phocid herpesvirus 2 DNA-dependent DNA polymeras   >NC_043062.1 Phocid herpesvirus 2 DNA-dependent DNA polymeras
>NC_043063.1 Iguanid herpesvirus 2 DNA-dependent DNA polymera   >NC_043063.1 Iguanid herpesvirus 2 DNA-dependent DNA polymera
>NC_075563.1 Cervid alphaherpesvirus 2 strain Norway, complet   >NC_075563.1 Cervid alphaherpesvirus 2 strain Norway, complet
>NC_075802.1 Salmonid herpesvirus 2 isolate NeVTA DNA polymer   >NC_075802.1 Salmonid herpesvirus 2 isolate NeVTA DNA polymer
>NC_076512.1 Bovine alphaherpesvirus 2 strain C1Z FZR, comple   >NC_076512.1 Bovine alphaherpesvirus 2 strain C1Z FZR, comple
>NC_076513.1 Macropodid alphaherpesvirus 2 strain V3077/08, c   >NC_076513.1 Macropodid alphaherpesvirus 2 strain V3077/08, c
>NC_076966.1 Cacatuid alphaherpesvirus 2 isolate CaHV2/Melbou   >NC_076966.1 Cacatuid alphaherpesvirus 2 isolate CaHV2/Melbou
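
The side-by-side diff above truncates long headers and shows every matching line. As a more compact check, here is a minimal sketch that prints only the headers found in the RefSeq list but not in the Kraken list. The function name `missing_from_kraken` is hypothetical (it is not part of Kraken or of any NCBI tool), and the sketch assumes that both files hold one header per line in the same text form, as they do in the output above.

```shell
# Hypothetical helper: print lines that occur in the second file
# (RefSeq headers) but not in the first file (Kraken headers).
missing_from_kraken() {
    tmpdir=$(mktemp -d) || return 1
    # comm requires both inputs sorted in the same collation order,
    # so re-sort both with LC_ALL=C instead of trusting the inputs.
    LC_ALL=C sort -u "$1" > "$tmpdir/kraken"
    LC_ALL=C sort -u "$2" > "$tmpdir/refseq"
    # -1 drops lines unique to the first file, -3 drops lines common to both,
    # leaving only the lines unique to the second file.
    LC_ALL=C comm -13 "$tmpdir/kraken" "$tmpdir/refseq"
    rm -rf "$tmpdir"
}
```

Called as `missing_from_kraken krakendb.txt viralgenomic.txt` on the two files generated above, this should list only the NC_001798.2 header, assuming the files match the diff output shown.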