Publication Details

ID: 48

MarkerDB 2.0: a comprehensive molecular biomarker database for 2025.

Authors

Jackson H; Oler E; Torres-Calzada C; Kruger R; Hira AS; Lopez-Hernandez Y; Pandit D; Wang J; Yang K; Fatokun O; Berjanskii M; MacKay S; Sajed T; Han S; Woudstra R; Sykes G; Poelzer J; Sivakumaran A; Gautam V; Wong G; Wishart DS

Journal/Conference

Nucleic Acids Research, Vol. 53 (D1), pp. D1415-D1426

Abstract

MarkerDB (https://markerdb.ca) has become a leading resource for comprehensive information on molecular biomarkers. Over the past 3 years, the database has evolved significantly, reflecting the dynamic landscape of biomarker research and increasing demands from its user community. This year's update, which is called MarkerDB 2.0, introduces key improvements to enhance the database's usability, consistency and the range of biomarkers covered. These improvements include (i) the addition of thousands of new biomarkers and associated health conditions, (ii) the inclusion of many new biomarker types and categories, (iii) upgraded searches and data filtering functionalities, (iv) new features for exploring and understanding biomarker panels and (v) significantly expanded and improved descriptions. These upgrades, along with numerous minor improvements in content, interface, layout and overall website performance, have greatly enhanced MarkerDB's usability and capacity to facilitate biomarker interpretation across various research domains. MarkerDB remains committed to providing a free, publicly accessible platform for consolidated information on a wide range of molecular (protein, genetic, chromosomal and chemical/small molecule) biomarkers, covering diagnostic, prognostic, risk, monitoring, safety and response-related biomarkers. We are confident that these upgrades and updates will improve MarkerDB's user friendliness, increase its utility and greatly expand its potential applications to many other areas of clinical medicine and biomedical research.

Publication Info

  • Year: 2025
  • Publication Date: Nov. 13, 2024
  • Source: Google Scholar

PubMed Data

Additional Information

  • Publication Type: Journal Article
  • Language: eng
  • Last PubMed Update: April 22, 2025

Full Text

The term biomarker was first introduced in the 1970s (

Given the enormous variation in the type, character and quality of biomarkers, it is easy to understand why detailed information about clinically useful biomarkers is often hard to find and even harder to understand. Combined with the huge proliferation of biomarker studies over the past two decades, accessing comprehensive, up-to-date and useful biomarker data, especially for molecular biomarkers, remains challenging. Indeed, it is not uncommon to see the same biomarkers being repeatedly ‘rediscovered’, completely contradictory biomarker results being presented or new biomarkers being sought out when outstanding biomarkers already exist. This is especially true with molecular biomarkers where the advent of ‘omics’ measurements has led to thousands of new molecular biomarkers appearing in clinical tests and many more investigational biomarkers appearing in the literature.

It is because of the confusing state of the literature on biomarkers and the enormity of the number of known or suspected molecular biomarkers that MarkerDB was created. First released in 2021, MarkerDB was designed to simplify the challenge of navigating biomarker space by offering free, consolidated and user-friendly information on a wide range of molecular biomarkers in an easily searchable and readily downloadable format. MarkerDB’s definition of a molecular biomarker is relatively broad and includes a diverse array of biological molecules covering chromosomes, DNA, RNA, proteins, lipids and small molecules (both exogenous and endogenous). MarkerDB also focused on providing detailed data on a subset of clinically approved molecular biomarkers for which there was extensive information available, namely clinical diagnostic, risk, monitoring, safety, response and prognostic biomarkers.

As one of the first comprehensive, consolidated resources on molecular biomarkers to be made freely available, MarkerDB has been well received, having already been cited ∼100 times. However, as with any initial database release, MarkerDB was also a work in progress. Over the past 3 years, our team has received invaluable feedback about MarkerDB’s strengths and weaknesses from its many users. We have also carefully reviewed MarkerDB’s content and identified a number of oversights and shortcomings in version 1.0 of the database. For instance, several well-known clinical markers were missing or incompletely described in version 1.0. Thousands of genetic risk [single nucleotide polymorphism (SNP)] biomarkers were either not included or not presented in an easily viewable format. The number of previously defined exposure biomarkers was far too limited and needed updating. The depth and detail of many disease descriptions, as well as several biomarker descriptions, needed improvement and more extensive referencing. Additionally, MarkerDB’s layout and data formatting were outdated, clumsy to search and navigate, and inconsistent with our other databases such as HMDB (

In addition to this internal evaluation, we also did a detailed survey of other recently published biomarker databases to further refine MarkerDB’s position in the biomarker ‘universe’. Specifically, we looked at databases such as CellMarker (

These internal and external assessments allowed us to design a new and improved version of MarkerDB—MarkerDB 2.0. To create this new version of MarkerDB, the curation team primarily focused on making significant additions and improvements to the database’s content. These included adding 12 674 new risk biomarkers, 45 new safety biomarkers, 636 new monitoring biomarkers and 25 new response biomarkers. In total, 694 clinically approved as well as 6329 investigational biomarkers were added to the database. In addition to these data enhancements, the old MarkerDB website design was completely revamped and modernized. This involved the creation of a new and improved layout offering refined searching, filtering and display features. This redesign allows users to navigate through the database and find or interpret biomarker information much faster and far more easily. Other improvements to MarkerDB 2.0 include more complete, fully referenced biomarker and condition descriptions, an updated, more consistent Disease Ontology (DO) and the addition of many more external and internal hyperlinks (and references). Data on biomarker panels (i.e. collections of >1 biomarker) have also been enhanced to provide comprehensive information on individual biomarkers within each panel, displayed in a tabular format with clickable links. Collectively, these changes have further distinguished MarkerDB 2.0 from most other biomarker databases. Indeed, MarkerDB 2.0 is now unique in terms of its size and scope, particularly through its inclusion of both clinically approved and investigational biomarkers across multiple diseases, conditions and molecular marker types as well as across all major biomarker categories (see Table

Comparison of MarkerDB 2.0 with other biomarker databases currently available

More detailed descriptions of each of these improvements to MarkerDB are given under the following subsections: (i) more biomarker entries, (ii) improved layout, design and usability, (iii) biomarker panel, search and filtering improvements and (iv) database implementation, curation and FAIRness.

Since its release in 2021 (

Summary of biomarkers, biomarker classes, molecular categories and word counts that have changed substantially from MarkerDB 1.0 to MarkerDB 2.0

To identify relevant studies for investigational biomarkers, the following medical subject headings terms were used in various combinations: ‘Biomarkers’, ‘Metabolites’, ‘Metabolomics’, ‘Genetic Markers’, ‘Proteomics’, ‘Genomics’, ‘Metabolic Networks and Pathways’, ‘Bioinformatics’, ‘Molecular Diagnostic Techniques’, ‘Metabolic Diseases’, ‘Randomized Controlled Trials’, ‘Chemical Compounds’, ‘Data Integration’ and ‘Enzymes’. Boolean operators (AND, OR) combined terms, such as ‘Biomarkers’ AND ‘Metabolic Diseases’ or ‘Genetic Markers’ AND ‘Metabolomics’. The search covered studies published between 1994 and 2024 without language restrictions and included studies identifying or validating investigational biomarkers in human health, RCTs, cohort studies and those providing quantitative biomarker data, such as genome-wide association studies (GWAS) or bioinformatics integration. Non-peer-reviewed, animal-only and

A significant focus for MarkerDB 2.0 was on expanding the number and type of exposure biomarkers, especially diet-related biomarkers and pollutant (i.e. chemical exposure) biomarkers. Pollutant and other chemical exposure biomarkers were sourced primarily from the National Health and Nutrition Examination Survey (NHANES) (

Another major focus of MarkerDB 2.0 was on expanding the number of clinically approved biomarkers. A particularly rich source of clinically approved diagnostic biomarkers was the Mayo Clinic biomarker database (

Another major focus for MarkerDB 2.0 was the inclusion of additional genetic biomarkers, particularly SNP ‘risk’ biomarkers generated via GWAS. While very few GWAS markers have made it into the clinic, they are of general interest for disease risk assessment and have been used in many consumer-oriented genetic testing services. Given their potential importance, we collected a significant number of GWAS markers or polygenic marker panels and added them to MarkerDB 2.0. Using data from our previously published GWAS-ROCS (

In addition to these SNP biomarkers, we also revisited our genetic mutation biomarker data. While MarkerDB’s collection on mutation biomarkers is already very extensive, we were able to collect new data on clinically significant cancer mutation biomarkers from My Cancer Genome (

MarkerDB 2.0 has also added RNA, circulating tumor DNA (ctDNA) and cell-free DNA (cfDNA) biomarkers, which were not present in version 1.0. The addition of these molecular biomarkers contributed 196 new entries, including 71 clinical and 125 investigational biomarkers, covering a variety of conditions. The COSMIC database (

Counting both clinically approved and investigational biomarkers, a total of 574 new chemical biomarkers and 76 new protein biomarkers were added to MarkerDB 2.0. This led to the inclusion of 325 new medical conditions and the updating of 362 existing medical conditions in MarkerDB 2.0. Additional performance information [sensitivity, specificity, areas under the curve (AUCs), etc.] about many of these new protein and chemical biomarkers was generated via MetaboAnalyst (

While normal reference values (or reference intervals) for proteins or metabolites in various biofluids are not typically considered biomarkers, they are vital to many diagnostic protocols and for interpreting diagnostic tests. Adult reference intervals for many important chemical biomarkers or metabolites have long been available in reference books and journals and have been compiled in the HMDB (

The CALIPER project (

The CALIPER pediatric reference intervals have been integrated into MarkerDB 2.0, adding 905 new biomarker concentration values. These reference interval additions should help support more precise interpretation of pediatric biomarker data, reducing unnecessary follow-up tests and improving clinical decision making. By including these values, MarkerDB reaffirms its commitment to providing comprehensive, age-appropriate reference data wherever possible. Indeed, the CALIPER dataset was recently complemented by our own clinical reference interval data for urine collected from a population of healthy neonates (

To allow these values to be more easily accessed by users, a ‘Browse Reference Concentrations’ page has been added, providing detailed information on healthy concentrations or normal states for all biomarkers in MarkerDB 2.0. Users can access this page from the MarkerDB 2.0 landing page and can view concentrations either directly on the page, or by clicking on the MarkerDB ID.

Significant improvements have been made to the quality of both biomarker and health condition descriptions in MarkerDB 2.0. Every entry in MarkerDB 2.0 now includes a detailed description, typically ranging from 100 to 500 words, providing comprehensive information about each biomarker and its related health condition or conditions along with numerous, in-line references or citations. In total, the curation team has enhanced descriptions for over 300 biomarkers and over 600 conditions using both semi-automated and manual annotation approaches.

For well-known health conditions and biomarkers, ChatGPT 4.0 (

Approximately 80% of the ChatGPT-generated outputs were usable with revisions to ensure accuracy and scientific rigor, while 20% did not meet our acceptance criteria and were manually rewritten. For less familiar conditions and biomarkers, descriptions were manually researched using a variety of well-regarded reference sources (

The MarkerDB 2.0 website has undergone a comprehensive redesign to enhance its visual appearance, improve its layout, simplify its navigation, upgrade its usability and better present useful biomarker data. This updated version (

Screenshots from MarkerDB 2.0 showcasing (

In the previous version of MarkerDB (

Detailed information about MarkerDB 2.0’s collection of biomarkers is now provided in sortable tables, which list biomarker names, MarkerDB IDs (with hyperlinks), associated health conditions and other relevant features (Figure

MarkerDB 2.0’s redesign significantly improves the user’s ability to retrieve and work with data, focusing on enhancing clarity, navigation and data accessibility. The new, consistent color scheme and standardized iconography simplify navigation, allowing users to intuitively explore biomarker categories and health conditions. These improvements, including higher resolution and color-coordinated icons, guide users effortlessly through the database, reducing the time spent searching for specific information. The new icons, representing each of the 11 different biomarker types and categories, are far more intuitive in design and are of higher quality than those in the previous version (Figure

Each biomarker entry is now more richly annotated with descriptions, structural or schematized images, hyperlinks, references and most importantly, quantitative data. This includes

Recent developments in 3D protein structure prediction via tools like AlphaFold (

Screenshot showing a 3D protein structure for glycated hemoglobin from the Mol*Viewer. This feature is accessible by clicking ‘View 3D Structure’ when on a given protein biomarker page.

In addition to displaying 3D protein structures, the 3D structure of all small molecule biomarkers in MarkerDB 2.0 is now interactively viewable via the same Mol*Viewer (

A biomarker panel comprises a group of two or more markers that collectively reflect various pathophysiological processes associated with a disease or health condition. In the initial release of MarkerDB, biomarker panels were displayed similarly to individual biomarkers, but also associated with a logistic equation composed of multiple normalized biomarker levels instead of conventional concentration values. Recognizing the importance and frequency with which biomarker panels are being described, especially with GWAS data, we prioritized enhancing their usability and searchability in MarkerDB 2.0. This was done by reformatting the layout of the ‘old’ biomarker panels so that they would be more easily searched and displayed in a user-friendly tabular format. Each biomarker panel in MarkerDB 2.0 now includes direct links to individual biomarker pages, allowing rapid access to detailed information about each biomarker in the panel. An example of the new biomarker panel format can be seen in Figure

Screenshot from MarkerDB 2.0 of an example of a chemical biomarker panel for

Biomarker panels in MarkerDB 2.0 are identified by querying metabolites specifically linked to the condition of interest. These panels may either come from pre-existing publications or be generated by our team. To calculate logistic tables and AUC values, we use the means and standard deviations of biomarker concentrations to construct synthetic populations, which are then split into training and testing sets. A logistic regression model is trained on the training set, and ROC curves are generated. Sensitivity, specificity and AUC values are derived from the test set, with variability accounted for by adjusting the mean of each ROC curve point by two standard deviations. It is worth noting that individual biomarkers within a panel are not double-counted in categories.
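The synthetic-population procedure described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the MarkerDB team's actual code: the two-biomarker panel, its means and standard deviations, the population size, the 70/30 split, the 0.5 decision threshold and all function names are hypothetical, and the biomarkers are assumed to be normally distributed and independent. The step that adjusts each ROC curve point by two standard deviations is not reproduced here.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def synthetic_group(stats, n, label, rng):
    """Draw n subjects; stats holds one (mean, sd) pair per biomarker."""
    return [([rng.gauss(m, s) for m, s in stats], label) for _ in range(n)]

def standardize(rows, ref):
    """Scale each biomarker by the mean and SD of the reference (training) rows."""
    d = len(ref[0][0])
    mus = [sum(x[j] for x, _ in ref) / len(ref) for j in range(d)]
    sds = [math.sqrt(sum((x[j] - mus[j]) ** 2 for x, _ in ref) / len(ref))
           for j in range(d)]
    return [([(x[j] - mus[j]) / sds[j] for j in range(d)], y) for x, y in rows]

def fit_logistic(rows, lr=0.5, epochs=300):
    """Batch gradient descent on the logistic loss; returns (weights, intercept)."""
    d = len(rows[0][0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, y in rows:
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            gb += err
            for j in range(d):
                gw[j] += err * x[j]
        w = [wj - lr * g / len(rows) for wj, g in zip(w, gw)]
        b -= lr * gb / len(rows)
    return w, b

def predict(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def auc(scores, labels):
    """AUC as the probability that a case outranks a control (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def evaluate_panel(control_stats, case_stats, n=400, train_frac=0.7, seed=0):
    rng = random.Random(seed)
    rows = (synthetic_group(control_stats, n, 0, rng)
            + synthetic_group(case_stats, n, 1, rng))
    rng.shuffle(rows)
    cut = int(train_frac * len(rows))
    train, test = rows[:cut], rows[cut:]
    # The test set is scaled with training statistics to avoid leakage.
    train_z, test_z = standardize(train, train), standardize(test, train)
    model = fit_logistic(train_z)
    scores = [predict(model, x) for x, _ in test_z]
    labels = [y for _, y in test_z]
    tp = sum(s >= 0.5 and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < 0.5 and y == 0 for s, y in zip(scores, labels))
    return {
        "auc": auc(scores, labels),
        "sensitivity": tp / labels.count(1),
        "specificity": tn / labels.count(0),
    }

# Hypothetical two-biomarker panel: (mean, SD) per biomarker in each group.
controls = [(5.0, 1.0), (40.0, 8.0)]
cases = [(7.0, 1.5), (52.0, 10.0)]
result = evaluate_panel(controls, cases)
print(result)
```

All three metrics are computed on the held-out test set only, mirroring the train/test separation described in the text. A production implementation would more likely rely on an established statistics library than on this hand-written gradient descent.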

Advanced filtering options and refined search capabilities allow users to quickly locate relevant biomarkers by class, approval status, name, sequence or structure, streamlining data retrieval and boosting efficiency. For instance, an additional filter for health conditions has been added to the ‘Browse Conditions’ page, which displays schematic icons representing each health condition (Figure

Screenshots from MarkerDB 2.0 demonstrating: (

A biomarker does not necessarily have to be associated with a disease. Indeed, biomarkers can also be associated with different states of health, such as pregnancy, menstrual cycle phases or menopause. As a result, we have chosen to use a more general health status term, ‘health condition’ or simply ‘condition’, to refer to health states associated with biomarkers. MarkerDB 2.0 has reorganized its collection of health conditions for better clarity and improved categorization. Previously, in MarkerDB 1.0, health conditions were linked to broader categories listed under ‘General Conditions’. Version 2.0 introduces refinements to these categories based on insights and standards described in the DO, developed and maintained by disease-ontology.org (

Specifically, the MarkerDB 2.0 curation staff has corrected the names of a number of existing health condition categories, removed less informative ones and added new health categories to match standard DO categories. The broad health condition/disease category ‘Other’ has been eliminated, with all health conditions reclassified into more specific and more appropriate categories. A ‘Browse Conditions’ page now displays these new health condition categories (Figure

MarkerDB 2.0 was developed using the Ruby on Rails web framework (

All data in MarkerDB 2.0 has been vetted and validated by a team of experts, each with at least an undergraduate degree in a relevant field such as genetics, molecular biology, biochemistry or natural products chemistry. Data entry is carefully monitored through a centralized, password-controlled database, allowing any changes and edits to the database to be consistent, time-stamped and automatically transferred. Curators received extensive training in biomarker annotation from the lead curator(s) via hands-on mentoring, text instructions, peer support and tutorials.

MarkerDB 2.0 adheres to FAIR principles (Findable, Accessible, Interoperable and Reusable) (

MarkerDB 2.0 offers a robust web-based user interface with extensive search functionalities and an application programming interface (API) for programmatic data access. The database undergoes continuous improvements and updates, with minor corrections and additions being made on a regular basis. The data in MarkerDB are released under a Creative Commons Attribution-NonCommercial (CC BY-NC) license.

We believe MarkerDB 2.0 represents a significant advancement for online biomarker databases. This version of MarkerDB is characterized by a number of important biomarker additions, enhanced search and display functionalities, improved data organization, expanded accessibility and improved biomarker panels. These biomarker additions include numerous clinical biomarker reference intervals (for adults and children) and the addition of over 6000 new biomarkers, primarily chemical and protein biomarkers, from extensive literature searches and the consolidation of data from a number of online sources such as NHANES (

While this particular update to MarkerDB has emphasized the addition of new categories and types of biomarkers, the expansion of current biomarker categories, and improvements in layout to aid in usability, future updates of MarkerDB will likely focus on expanding the content of other biomarker categories such as prognostic biomarkers and further expanding the newly added response, monitoring and safety biomarkers. Currently, MarkerDB features only molecular biomarkers. Future updates are expected to expand into other biomarker types, including imaging and tissue/histology biomarkers to provide a more holistic view of both clinical biomarkers and research biomarkers. These additions will, no doubt, bring new challenges concerning biomarker data compilation and display. Regardless of where our future plans take us, the MarkerDB team remains committed to implementing continuous improvements, addressing user feedback and enhancing MarkerDB’s functionality, reliability and usability. By staying responsive to community needs, we aim to ensure that MarkerDB remains an invaluable resource for medical researchers and healthcare professionals.