Question on the GTDBr214.1 GTDB taxdump file regarding a TaxID #8
Yes, it's "removed" during taxdump file creation. There is some documentation about this in the help message:
Thank you for your reply. I used your taxdump files to build a Kraken2 database comprising the 85,202 species-representative genomes of GTDBr214.1, and the Kraken2 and Bracken report files show that certain taxonomic units are absent. As I understand it, GTDB taxa that carry a placeholder name repeated across ranks, such as the case I mentioned, are genuine taxa that should be considered in the analysis. In the Kraken2 report, the duplicated intermediate names (at class, order, and family) are simply omitted, which would affect abundance analysis at those ranks. Or perhaps I misunderstand how Kraken2 and Bracken process the taxonomy.

When I converted the Bracken output to an MPA-style taxonomic composition report using KrakenTools (provided by the Kraken2 authors), the output looked like this:

bracken2mpa:d__Archaea|p__Halobacteriota|c__Bog-38 0.0002

At the species, phylum, and class levels the reports will be complete, but at the order and family levels, I suspect, the information simply vanishes. Although Kraken2 and Bracken are not the taxdump itself, the analysis depends heavily on the taxonomy files, so I would like to know your thoughts on this. Thank you very much.
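To check which GTDB ranks are skipped in an MPA-style report, a small script can compare each lineage against the seven standard rank prefixes. This is a minimal sketch, not part of KrakenTools; the helper name `skipped_ranks` is hypothetical, and it assumes each line ends with an abundance column separated by whitespace. The species-level line in the usage comment is also hypothetical, showing what a lineage would look like if "Bog-38" survives only at class.

```python
# Hypothetical helper (not part of KrakenTools): list the GTDB ranks that are
# missing above the deepest rank reported in one MPA-style line.
GTDB_RANKS = ["d", "p", "c", "o", "f", "g", "s"]


def skipped_ranks(mpa_line: str) -> list[str]:
    # Split off the trailing abundance column. rsplit on the last whitespace
    # run keeps species names such as "s__Bog-38 sp003162175" intact.
    lineage, _abundance = mpa_line.rstrip().rsplit(None, 1)
    # Collect rank prefixes, e.g. "c__Bog-38" -> "c".
    present = {field.split("__", 1)[0] for field in lineage.split("|")}
    deepest = max(GTDB_RANKS.index(r) for r in present if r in GTDB_RANKS)
    # Only ranks above the deepest reported one count as skipped.
    return [r for r in GTDB_RANKS[:deepest] if r not in present]


# The class-level line from the report skips nothing above class:
#   skipped_ranks("d__Archaea|p__Halobacteriota|c__Bog-38 0.0002")  -> []
# A hypothetical species-level line would reveal the gap:
#   skipped_ranks("d__Archaea|p__Halobacteriota|c__Bog-38|s__Bog-38 sp003162175\t0.0002")
#   -> ["o", "f", "g"]
```

Flagging skipped ranks per line makes it easy to see how much abundance would be unassigned at order, family, or genus before summarizing at those ranks.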
I understand your concern. In practice, we only summarize at the phylum and species ranks. Besides, predictions with an abundance as low as 0.0002 are probably false positives. You can also ask whether KrakenTools can support these cases.
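The practice described above (summarize only at phylum and species, and treat very low abundances as likely false positives) can be sketched as follows. This is an illustration under assumptions, not the maintainers' pipeline: the function name `summarize` is hypothetical, the input is assumed to be MPA-style lines whose last column holds relative abundances on the same scale as the 0.0002 cutoff, and whether the cutoff itself is kept or dropped is a choice (here values at or below it are dropped).

```python
from collections import defaultdict


def summarize(mpa_lines, min_abundance=0.0002):
    """Hypothetical sketch: sum species-level abundances per phylum.

    Rows at or below `min_abundance` are treated as likely false positives
    and skipped (an assumed convention for the cutoff).
    """
    phylum_totals = defaultdict(float)
    species = {}
    for line in mpa_lines:
        lineage, value = line.rstrip().rsplit(None, 1)
        abundance = float(value)
        fields = lineage.split("|")
        if not fields[-1].startswith("s__"):
            continue  # only species-level rows are aggregated
        if abundance <= min_abundance:
            continue  # likely false positive
        phylum = next((f for f in fields if f.startswith("p__")), "p__unclassified")
        phylum_totals[phylum] += abundance
        species[fields[-1]] = abundance
    return dict(phylum_totals), species
```

Because only the phylum and species fields are read, the result does not depend on whether intermediate ranks such as order or family are present in the lineage.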
Okay.
Hello. Regarding this, I have two questions. Thank you for your great contribution, Wei.
Yes. shenwei356/taxonkit#92 (comment)
0.16.2 is not released yet :). It's 0.16.0.
Yes, old taxonkit versions can still be used with the updated GTDB taxonomy, as the taxdump file format has not changed.
taxonkit v0.2.5 (Oct 12, 2018)
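Because the file format is stable, any reader of the NCBI-style taxdump layout (`nodes.dmp` and `names.dmp`, fields separated by `\t|\t`, each line ending in `\t|`) works on the GTDB taxdump as well. The sketch below is a hypothetical minimal reader, not taxonkit's implementation: it loads parent, rank, and scientific-name maps and walks a TaxID up to the root, dropping the root to match the shape of the lineage output shown in this thread. The function names are illustrative assumptions.

```python
def parse_dmp(text):
    """Split NCBI-style .dmp text into lists of fields."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.endswith("\t|"):
            line = line[:-2]  # drop the trailing "\t|" terminator
        rows.append(line.split("\t|\t"))
    return rows


def load_taxdump(nodes_text, names_text):
    """Return (parent, rank, name) dicts keyed by integer TaxID."""
    parent, rank, name = {}, {}, {}
    for fields in parse_dmp(nodes_text):
        taxid = int(fields[0])
        parent[taxid] = int(fields[1])
        rank[taxid] = fields[2]
    for fields in parse_dmp(names_text):
        # names.dmp may hold several names per TaxID; keep the scientific one.
        if fields[3] == "scientific name":
            name[int(fields[0])] = fields[1]
    return parent, rank, name


def lineage(taxid, parent, name):
    """Walk from taxid to the root; the root is its own parent."""
    path = []
    while True:
        path.append(name[taxid])
        if parent[taxid] == taxid:
            break
        taxid = parent[taxid]
    return ";".join(reversed(path[:-1]))  # omit the root node
```

With real files, pass `open("nodes.dmp").read()` and `open("names.dmp").read()` from the taxdump directory to `load_taxdump`.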
The latest
Hello, thank you for your nice work.
I downloaded the GTDB taxonomy files you created to use them for a Kraken2 database.
I used the GTDBr214.1 taxdump files.
I just found that one specific taxon does not follow the seven-level taxonomy (domain to species).
I also checked it directly on the downloaded taxonomy dataset:
$grep 1830337315 *
GTDBr214.1_taxid_taxonomy:GCA_003162175.1 1830337315 Archaea;Halobacteriota;Bog-38;Bog-38 sp003162175;003162175
$taxonkit lineage <(echo 1830337315) --data-dir /data1/DBs/kraken2/gtdbr214.1/gtdb-taxdump/R214.1/
1830337315 Archaea;Halobacteriota;Bog-38;Bog-38 sp003162175;003162175
I think names that are duplicated across different ranks are somehow removed.
On the official GTDB site, the same name does appear at different ranks.
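One way such a collapse could arise is if taxonomy nodes were keyed by bare name rather than by (rank, name). The sketch below is a hypothetical illustration of that hypothesis, not taxonkit's actual code: the full GTDB lineage for this genome has seven rank-qualified entries, while a name-keyed builder keeps only the first "Bog-38" (at class), which happens to match the four-name lineage and the `c__Bog-38` line seen above.

```python
# Full GTDB lineage for the genome discussed above, with rank prefixes.
GTDB_LINEAGE = ("d__Archaea;p__Halobacteriota;c__Bog-38;o__Bog-38;"
                "f__Bog-38;g__Bog-38;s__Bog-38 sp003162175")


def rank_keyed(lineage):
    """Keep (rank, name) pairs: one node per rank, as GTDB defines them."""
    return [tuple(t.split("__", 1)) for t in lineage.split(";")]


def name_keyed(lineage):
    """Hypothetical builder keyed by bare name: repeated names collapse."""
    seen, nodes = set(), []
    for rank, name in rank_keyed(lineage):
        if name in seen:
            continue  # a lower rank reusing the same name is dropped
        seen.add(name)
        nodes.append((rank, name))
    return nodes
```

Keying on (rank, name), as `rank_keyed` does, preserves all seven ranks even when a placeholder name repeats.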
I don't know whether they are removed during taxonkit execution or during taxdump file creation.
Could you look into it?
Thank you very much.