In metabolite identification, software processing of the raw data yields spectral information for metabolite precursor ions (MS1) and their fragment ions (MS2), including mass-to-charge ratio (m/z), retention time, and signal intensity. Matching this information against the MS1 and MS2 spectral data in databases helps determine the detected metabolites. Commonly used databases include METLIN, MassBank, mzCloud, HMDB, KEGG, MetaCyc, Lipidmaps, and MS-Dial for LC-MS, and NIST, Fiehn, and GMD for GC-MS.
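The matching step described above can be illustrated with a minimal sketch. It assumes a two-stage scheme that is common in practice but is not stated in the text: candidates are first filtered by precursor m/z within a ppm tolerance, then ranked by cosine similarity of their fragment spectra. The library entries, the function names, the 10 ppm tolerance, and the 0.01 m/z bin width are all hypothetical choices for illustration, not values taken from any of the databases listed here.

```python
from math import sqrt

# Hypothetical mini-library: name -> (precursor m/z, [(fragment m/z, intensity), ...]).
# Values are illustrative placeholders, not records from any real database.
LIBRARY = {
    "compound_A": (180.0634, [(89.02, 100.0), (119.03, 40.0), (59.01, 25.0)]),
    "compound_B": (180.0640, [(72.04, 100.0), (135.05, 60.0)]),
    "compound_C": (204.0899, [(146.06, 100.0), (118.07, 30.0)]),
}


def ppm_error(observed_mz, reference_mz):
    """Signed mass error of observed_mz relative to reference_mz, in ppm."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def cosine_score(query, reference, bin_width=0.01):
    """Cosine similarity of two fragment lists after binning their m/z values."""
    def binned(peaks):
        vec = {}
        for mz, intensity in peaks:
            key = round(mz / bin_width)  # peaks in the same bin are summed
            vec[key] = vec.get(key, 0.0) + intensity
        return vec

    q, r = binned(query), binned(reference)
    dot = sum(q[k] * r[k] for k in q.keys() & r.keys())
    norm = sqrt(sum(v * v for v in q.values())) * sqrt(sum(v * v for v in r.values()))
    return dot / norm if norm else 0.0


def match_feature(precursor_mz, fragments, library, tol_ppm=10.0):
    """Return (name, ppm error, cosine score) for candidates, best score first."""
    hits = []
    for name, (ref_mz, ref_fragments) in library.items():
        error = ppm_error(precursor_mz, ref_mz)
        if abs(error) <= tol_ppm:  # stage 1: precursor mass filter
            hits.append((name, error, cosine_score(fragments, ref_fragments)))
    # stage 2: rank the surviving candidates by fragment similarity
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


query_fragments = [(89.02, 90.0), (119.03, 45.0), (59.01, 20.0)]
results = match_feature(180.0636, query_fragments, LIBRARY)
for name, error, score in results:
    print(f"{name}: {error:+.2f} ppm, cosine = {score:.3f}")
```

In this toy example two entries pass the precursor filter, and only the fragment comparison separates them, which is why MS2 spectra matter for isobaric candidates. Fixed-width binning is a simplification: peaks that fall close to a bin edge may be split across bins, so real tools usually apply a tolerance-based peak alignment instead.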
METLIN Database
The METLIN database, developed by the Scripps Research Institute in the United States, contains over 960,000 compounds. It is currently the largest MS/MS (tandem mass spectrometry) database and is widely used in metabolomics research. Access to this database requires payment.
METLIN (https://metlin.scripps.edu/)
mzCloud Database
The mzCloud database was created by Thermo Fisher Scientific from chemical standards measured on QE (Q Exactive) series mass spectrometers. It is an online database of high-resolution, accurate-mass MS1 and MS2 spectra. The database includes over 19,000 compounds, more than 3,700 of which are endogenous substances, and it is continuously updated.
mzCloud (https://www.mzcloud.org/)
MassBank Database
The MassBank database primarily includes mass spectra obtained from chemical standards of metabolites. It provides information about the mass spectrometer model and settings used for each standard. MassBank is an open-source database widely used in the field.
MassBank (http://www.massbank.jp/)
HMDB Database
The Human Metabolome Database (HMDB) is a comprehensive human metabolomics database founded by The Metabolomics Innovation Centre (TMIC) in Canada. It is one of the most commonly used metabolomics databases, and it can be accessed and downloaded for free. The database, currently at version 4.0, contains information on over 110,000 metabolites, including chemical, clinical, and molecular biology data. Related databases under the HMDB umbrella include DrugBank (a drug database with information on approximately 2,280 drug metabolites), T3DB (a toxin database with information on around 3,670 common toxins and environmental pollutants), SMPDB (a small molecule pathway database with information on over 30,000 human metabolic and disease pathways), FooDB (a food research database with information on around 28,000 food components and food additive metabolites), and HFMDB (a fecal metabolite database with detailed information, including concentration values, on many small molecule metabolites found in human feces).
HMDB (https://hmdb.ca/)
DrugBank (https://go.drugbank.com/)
T3DB (http://www.t3db.ca/)
SMPDB (https://smpdb.ca/)
FooDB (https://foodb.ca/)
HFMDB (https://fecalmetabolome.ca/)
KEGG Database
The Kyoto Encyclopedia of Genes and Genomes (KEGG) is the most widely used pathway database, containing a large amount of metabolite, reaction, enzyme, and gene information across species. Its content is aimed at understanding the functions and interactions of genes, metabolism, and metabolites in biological systems (e.g., cells and tissues).
KEGG (https://www.kegg.jp/)
MetaCyc Database
Like the KEGG database, MetaCyc is a pathway database containing experimentally elucidated metabolic pathways from all domains of life. MetaCyc contains pathways involved in primary and secondary metabolism, as well as the associated metabolites, reactions, enzymes, and genes, and it is more commonly used in plant metabolomics. MetaCyc currently contains 2,937 pathways, 17,780 reactions, and 18,124 metabolites, and it is continuously updated.
MetaCyc (https://metacyc.org/)
Lipidmaps Database
The Lipidmaps database, created by the National Institutes of Health (NIH), is the largest and most authoritative lipid database, covering the structure, spectrum, and classification information of more than 40,000 lipids. Lipidmaps classifies lipids into 8 categories according to their structure and function, and this classification standard is widely used. The Lipidmaps database is open source and can be accessed and downloaded for free.
Lipidmaps (https://www.lipidmaps.org/)
Lipidblast Database
The Lipidblast database is an open source database assembled by the Fiehn lab. It contains more than 200,000 MS2 mass spectra of about 100,000 lipids, including many bacterial and plant lipids that are not included in the LIPID MAPS Structure Database (LMSD). It can therefore be used to complement a Lipidmaps analysis. Lipidmaps and Lipidblast are currently the most commonly used databases for lipidome identification.
Lipidblast (https://fiehnlab.ucdavis.edu/projects/LipidBlast)
NIST Chemistry Database
The NIST Chemistry Database was established by the National Institute of Standards and Technology (NIST) and others. It contains more than 200,000 EI mass spectra of more than 160,000 metabolites and is the most commonly used database for GC-MS analysis. The latest version also contains ESI MS/MS mass spectra of small molecules.
NIST (https://www.nist.gov/mml/odi)
Fiehn Library
The Fiehn library contains more than 2,200 EI mass spectra for over 1,000 metabolites, including detection information for both quadrupole and TOF mass analyzers. It is commonly used for GC-MS analysis.
Fiehn library (https://fiehnlab.ucdavis.edu/projects/softwaredev)
GMD Database
The Golm Metabolome Database (GMD) is a plant metabolome database designed primarily for non-targeted GC-MS analysis. It contains a large collection of GC-MS spectra of plant metabolites.
GMD (http://gmd.mpimpgolm.mpg.de/)
Because the identified metabolites are numerous and diverse, which makes data retrieval and analysis challenging, they are commonly annotated during identification. Annotations provide the functions, classifications, and summary statistics of the identified metabolites. Commonly used metabolite annotation databases include HMDB (https://hmdb.ca/), KEGG (https://www.kegg.jp/), Lipidmaps (https://www.lipidmaps.org/), and other comprehensive databases. Analysis software and websites often attach some annotation information during metabolite identification, but it tends to be brief. More detailed annotations can be added with programming languages such as R, Python, or Perl.
Super class classification statistics of identified metabolites in the HMDB database
Class and sub class classification statistics of identified metabolites in the KEGG database
Class and sub class classification statistics of identified metabolites in the Lipidmaps database
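Classification statistics of this kind can be produced with a short script. The following is a minimal Python sketch, assuming the annotation database has already been exported to a simple lookup table of metabolite ID to super class. The IDs, the table contents, and the function name are hypothetical placeholders, and no database API is called.

```python
from collections import Counter

# Hypothetical annotation table: metabolite ID -> super class.
# The IDs are placeholders standing in for a table exported from an
# annotation database such as HMDB; they are not real accession numbers.
ANNOTATIONS = {
    "MET0001": "Lipids and lipid-like molecules",
    "MET0002": "Organic acids and derivatives",
    "MET0003": "Lipids and lipid-like molecules",
    "MET0004": "Organoheterocyclic compounds",
}

# Hypothetical list of identified metabolites; MET0005 has no annotation.
identified = ["MET0001", "MET0002", "MET0003", "MET0005"]


def super_class_statistics(metabolite_ids, annotations):
    """Return (super class, count, percentage) rows, most frequent first."""
    total = len(metabolite_ids)
    if total == 0:
        return []
    counts = Counter(annotations.get(m, "Unannotated") for m in metabolite_ids)
    return [(cls, n, 100.0 * n / total) for cls, n in counts.most_common()]


stats = super_class_statistics(identified, ANNOTATIONS)
for cls, n, pct in stats:
    print(f"{cls}: {n} ({pct:.1f}%)")
```

Keeping unmatched IDs in an explicit "Unannotated" group, rather than dropping them, makes it visible how much of the identification list the annotation table actually covers. The same counting applies to class and sub class levels if the table carries those columns.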