explore_database.py
Explore the database’s composition using taxonomic terms as search queries, and update its metadata and taxonomy tree color annotations.
explore_database.py [OPTIONS]
Optional arguments:
-d
,--database <db_dir>
Path to the database directory ifconfig.py
has not been run.-t
,--higher_taxonomy
Show higher taxonomy and number of taxa in each group in the database.-l
,--lower_taxonomy
Show lower taxonomy and number of taxa in each group in the database.-r
,--get_higher <HigherTax>
Return table with all taxa assigned the given higher taxonomy that displays the UniqueID, Long Name, Higher and Lower taxonomy, the number of orthologs, and the number of paralogs present in the database.-w
,--get_lower <LowerTax>
Return table with all taxa assigned the given lower taxonomy that displays the UniqueID, Long Name, Higher and Lower Taxonomy, the number of orthologs, and the number of paralogs present in the database.-o
,--get_org <UniqueID>
For the given Unique ID returns the Long Name, Higher and Lower Taxonomic Designation, Data Type, Orthologs (number present in the database for the taxon), Paralogs (number present in the database for the taxon), and the Accession.--update_metadata
Update the metadata in the database with the latest information from a provided metadata TSV file.--show_taxonomy_colors
Display all taxonomy terms with their associated colors.--update_taxonomy_colors
TSV file to update taxonomy colors. Columns: Taxonomy, Color.--update_unique_ids
TSV file to update Unique IDs and sequence headers. Columns: Old ID, New ID.--threads
Number of threads. Default is 1. Only to be used with--update_unique_ids
.--dry_run
Do not update the database, just print what would be changed. For use with –update_metadata, –update_taxonomy_colors, or –update_unique_ids.
--update_metadata
TSV file format:
- The first line should contain the column headers:
Unique ID
,Long Name
,Higher Taxonomy
,Lower Taxonomy
,Data Type
, andSource
. - Each subsequent line should contain the corresponding values for each column, separated by tabs.
Unique ID | Long Name | Higher Taxonomy | Lower Taxonomy | Data Type | Source |
---|---|---|---|---|---|
Crypparv | Cryptosporidium parvum | Alveolata | Apicomplexa | Genomic | GCF_000165345.1 |
Table 8: Example update_metadata.tsv
file for the --update_metadata
option. The first line is the header, and each subsequent line contains the Unique ID, Long Name, Higher Taxonomy, Lower Taxonomy, Data Type, and Source for each taxon to be updated.
--update_taxonomy_colors
TSV file format:
- The first line should contain the column headers:
Taxonomy
andColor
. - Each subsequent line should contain the corresponding values for each column, separated by tabs.
Taxonomy | Color |
---|---|
Amoebozoa | Red |
Table 9: Example update_taxonomy_colors.tsv
file for the --update_taxonomy_colors
option. The first line is the header, and each subsequent line contains the Taxonomy and Color for each taxonomy to be updated.
--update_unique_ids
TSV file format:
- The first line should contain the column headers:
Old ID
andNew ID
. - Each subsequent line should contain the corresponding values for each column, separated by tabs.
Old ID | New ID |
---|---|
Acancast | Acancas2 |
Table 10: Example update_unique_ids.tsv
file for the --update_unique_ids
option. The first line is the header, and each subsequent line contains the Old ID and New ID.