explore_database.py
Explore the database’s composition using taxonomic terms as search queries, and update its metadata and taxonomy tree color annotations.
explore_database.py [OPTIONS]
Optional arguments:
- -d, --database <db_dir>: Path to the database directory if config.py has not been run.
- -t, --higher_taxonomy: Show higher taxonomy and number of taxa in each group in the database.
- -l, --lower_taxonomy: Show lower taxonomy and number of taxa in each group in the database.
- -r, --get_higher <HigherTax>: Return table with all taxa assigned the given higher taxonomy that displays the UniqueID, Long Name, Higher and Lower taxonomy, the number of orthologs, and the number of paralogs present in the database.
- -w, --get_lower <LowerTax>: Return table with all taxa assigned the given lower taxonomy that displays the UniqueID, Long Name, Higher and Lower Taxonomy, the number of orthologs, and the number of paralogs present in the database.
- -o, --get_org <UniqueID>: For the given Unique ID returns the Long Name, Higher and Lower Taxonomic Designation, Data Type, Orthologs (number present in the database for the taxon), Paralogs (number present in the database for the taxon), and the Accession.
- --update_metadata: Update the metadata in the database with the latest information from a provided metadata TSV file.
- --show_taxonomy_colors: Display all taxonomy terms with their associated colors.
- --update_taxonomy_colors: TSV file to update taxonomy colors. Columns: Taxonomy, Color.
- --update_unique_ids: TSV file to update Unique IDs and sequence headers. Columns: Old ID, New ID.
- --threads: Number of threads. Default is 1. Only to be used with --update_unique_ids.
- --dry_run: Do not update the database, just print what would be changed. For use with --update_metadata, --update_taxonomy_colors, or --update_unique_ids.
--update_metadata TSV file format:
- The first line should contain the column headers: Unique ID, Long Name, Higher Taxonomy, Lower Taxonomy, Data Type, and Source.
- Each subsequent line should contain the corresponding values for each column, separated by tabs.
| Unique ID | Long Name | Higher Taxonomy | Lower Taxonomy | Data Type | Source |
|---|---|---|---|---|---|
| Crypparv | Cryptosporidium parvum | Alveolata | Apicomplexa | Genomic | GCF_000165345.1 |
Table 8: Example update_metadata.tsv file for the --update_metadata option. The first line is the header, and each subsequent line contains the Unique ID, Long Name, Higher Taxonomy, Lower Taxonomy, Data Type, and Source for each taxon to be updated.
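The file layout described above can be produced with Python's standard csv module. The following is a minimal sketch, not part of explore_database.py: the function name write_metadata_tsv and the constant METADATA_COLUMNS are hypothetical, and the row is the example from Table 8.

```python
import csv
import io

# Column headers as listed above for the --update_metadata TSV file.
METADATA_COLUMNS = [
    "Unique ID", "Long Name", "Higher Taxonomy",
    "Lower Taxonomy", "Data Type", "Source",
]


def write_metadata_tsv(rows, handle):
    """Write the header line, then one tab-separated line per taxon.

    Hypothetical helper for illustration; `rows` is a list of dicts
    keyed by the column headers above.
    """
    writer = csv.DictWriter(
        handle,
        fieldnames=METADATA_COLUMNS,
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


# The example row from Table 8.
example_rows = [
    {
        "Unique ID": "Crypparv",
        "Long Name": "Cryptosporidium parvum",
        "Higher Taxonomy": "Alveolata",
        "Lower Taxonomy": "Apicomplexa",
        "Data Type": "Genomic",
        "Source": "GCF_000165345.1",
    }
]

# Written to an in-memory buffer here; a real file handle works the same way.
buffer = io.StringIO()
write_metadata_tsv(example_rows, buffer)
metadata_text = buffer.getvalue()
```

To write to disk instead, open the target file with open("update_metadata.tsv", "w", newline="") and pass that handle. Running explore_database.py with --dry_run first shows what would change before the database is modified.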
--update_taxonomy_colors TSV file format:
- The first line should contain the column headers: Taxonomy and Color.
- Each subsequent line should contain the corresponding values for each column, separated by tabs.
| Taxonomy | Color |
|---|---|
| Amoebozoa | Red |
Table 9: Example update_taxonomy_colors.tsv file for the --update_taxonomy_colors option. The first line is the header, and each subsequent line contains the Taxonomy and Color for each taxonomy to be updated.
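Before passing a colors file to --update_taxonomy_colors, its structure can be checked independently. The sketch below is a hypothetical helper (read_taxonomy_colors is not part of explore_database.py) that verifies the header and the two-column layout described above. It does not validate the color values themselves, because the accepted color formats are not stated here; the example only shows a color name.

```python
import csv
import io


def read_taxonomy_colors(handle):
    """Parse a Taxonomy/Color TSV into a dict, checking its structure.

    Hypothetical helper for illustration. Raises ValueError if the
    header or a row does not match the two-column layout.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header != ["Taxonomy", "Color"]:
        raise ValueError(f"unexpected header: {header!r}")

    colors = {}
    for line_number, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 columns, got {len(row)}"
            )
        taxonomy, color = row
        colors[taxonomy] = color
    return colors


# The example row from Table 9, held in memory instead of a file.
example = io.StringIO("Taxonomy\tColor\nAmoebozoa\tRed\n")
colors = read_taxonomy_colors(example)
```

A common failure this catches is a file saved with commas or spaces instead of tabs, which makes the whole header land in a single column.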
--update_unique_ids TSV file format:
- The first line should contain the column headers: Old ID and New ID.
- Each subsequent line should contain the corresponding values for each column, separated by tabs.
| Old ID | New ID |
|---|---|
| Acancast | Acancas2 |
Table 10: Example update_unique_ids.tsv file for the --update_unique_ids option. The first line is the header, and each subsequent line contains the Old ID and New ID.
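Because renaming Unique IDs also rewrites sequence headers, it can be worth checking the mapping for conflicts before running the update. Whether explore_database.py performs such checks itself is not stated here, so the following is only a sketch of a pre-flight check; check_id_mapping is a hypothetical helper and takes (Old ID, New ID) pairs already read from the TSV file.

```python
from collections import Counter


def check_id_mapping(pairs):
    """Return a list of human-readable problems found in the mapping.

    Hypothetical helper for illustration. `pairs` is a list of
    (old_id, new_id) tuples; an empty result means no conflicts found.
    """
    problems = []
    old_counts = Counter(old for old, _ in pairs)
    new_counts = Counter(new for _, new in pairs)

    # An Old ID listed twice makes the intended rename ambiguous.
    for old, count in old_counts.items():
        if count > 1:
            problems.append(f"Old ID listed {count} times: {old}")

    # Two taxa mapped to the same New ID would no longer be unique.
    for new, count in new_counts.items():
        if count > 1:
            problems.append(f"New ID assigned {count} times: {new}")

    # A row that maps an ID to itself changes nothing.
    for old, new in pairs:
        if old == new:
            problems.append(f"Old ID and New ID are identical: {old}")

    return problems


# The example row from Table 10.
pairs = [("Acancast", "Acancas2")]
problems = check_id_mapping(pairs)
```

This check only looks at the file itself. It does not know which IDs already exist in the database, so a --dry_run pass is still the way to see what would actually be changed.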