Metadata Explanation
The file metadata.tsv
initially contains information regarding the data used to construct the database. As users add taxa to the database metadata.tsv
acts as a permanent archival file and is updated after each round of addition of new taxa. The contents of metadata.tsv
can be explored using the provided utility explore_database.py
. A user should never be required to manually open this file to view its contents. However, the structure of metadata.tsv
is detailed below.
metadata.tsv
is a tab delimited file with six columns:- Unique ID - an abbreviated name for each taxon. We have followed an eight-letter convention. For user’s making a
metadata.tsv
for a custom database,Unique IDs cannot contain underscores “_”, at symbols “@”, double dots “..”, or white spaces. - Long Name - Full name of the input taxon. For user’s making a
metadata.tsv
for a custom database, the long name cannot contain underscores “_”, at symbols “@”, double dots “..”, or white spaces. - Higher Taxonomy - The highest taxonomic rank for the input organism desired. We have chosen to use the “supergroup” level here but any user defined rank is accepted. However, we do recommend users choose a highly inclusive taxonomic rank here (phylum or class level in Linnaean terms) to promote the best possible performance of the fisher algorithm and other downstream tools.
- Lower Taxonomy - a taxonomic rank above the genus level for the input taxon
- Notes on Taxonomy: The default taxonomic terms in
metadata.tsv
can be replaced with user preferred terms. If terms are to be replaced though, we recommend replacing them with synonyms of similar rank. DO NOT MIX SYNONYMOUS TERMS (see example below). This will lead to misbehavior in the fisher algorithm and wrongly denoted suspicious clades during generation of files for visualisation and manual inspection of single gene trees. The same erratic behavior will likely occur if our chosen taxonomic ranks are split up and replaced by several less inclusive taxonomic terms.- Example Taxonomic Change: If the term “Chromist” is preferred over “Stramenopiles” replace all instances of “Stramenopiles” in
metadata.tsv
with “Chromist” before adding new taxa with “Chromist” chosen as the higher taxonomic term ininput_metadata.tsv
. DO NOT MIX THE TWO, erratic behavior will ensue. Go here for more information regardinginput_metadata.tsv
requirements related tometadata.tsv
.
- Example Taxonomic Change: If the term “Chromist” is preferred over “Stramenopiles” replace all instances of “Stramenopiles” in
- Notes on Taxonomy: The default taxonomic terms in
- Data Type - A place to add a note about the type of data being added (ex. transcriptomic, genomic, EST . . .
- Source - A place to add notes about the source of the data such as accession numbers, “in-house information”, strain information etc.
- Unique ID - an abbreviated name for each taxon. We have followed an eight-letter convention. For user’s making a
NOTE: COMMA (,) MAY NOT BE USED ANYWHERE IN METADATA.TSV
!