NIAID Data Ecosystem Discovery Portal
Emily Haag, Candice Czech, Ginger Tsueng, Nichollette Acosta, Jason Lin, Dylan Welzel, Everaldo Rodolpho, Chunlei Wu, Andrew I. Su, Laura D. Hughes, Wilbert Van Panhuis, Asiyah Lin, Meghan Hartwick, Jack DiGiovanna, Sai Subrahmanian, Deepti Jain, Poro Burman, Eric Tobin, Sudha Venkatachari, Maria Giovanni, Zorana Mitrovic Vucicevic
Introduction
The world is sitting on a wealth of knowledge that has yet to be fully realized. Researchers around the globe generate huge amounts of biomedical data across an enormous spectrum of specialties, including clinical, genomic, and epidemiological data. These data describe cell responses to infection, improve our understanding of immune-mediated diseases, and can ultimately lead to breakthroughs in fighting or curing disease. Yet many publicly available datasets that could support life-saving research are underutilized because they are difficult to find.
To increase the findability of these resources, the NIAID Office of Data Science and Emerging Technologies (ODSET) created the NIAID Data Ecosystem Discovery Portal. The Discovery Portal allows researchers to search simultaneously across millions of publicly available datasets to find immune-mediated and infectious disease data for reanalysis.
Infrastructure
This new tool aims to promote open science and FAIR data. It harvests and standardizes metadata from a number of sources so that users can find available data, and it links them to the repositories where the data can be accessed. The Discovery Portal focuses on the findability of data, aggregating resources across numerous sources, including NIAID-supported repositories, general biomedical repositories, and other generalist sources.
Metadata is transformed into a common schema derived from Schema.org and BioSchemas. This schema has been extended from the Data Discovery Engine to promote interoperability with other data findability projects, with a number of fields added to tailor it to the needs of immune-mediated and infectious disease researchers. This metadata standardization not only facilitates advanced searching and filtering across varied sources, but also helps researchers better understand what the data contains before accessing it from the source. Additionally, all metadata harvested by the Discovery Portal can be downloaded or accessed via the NIAID Data Ecosystem Discovery API.
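As a rough illustration of what mapping varied source metadata onto a common schema can look like, the sketch below converts records from two hypothetical sources into a small Schema.org-style Dataset shape. The source names (repo_a, repo_b), their field names, and the subset of target fields shown are assumptions made for this example; they are not the Discovery Portal's actual schema or harvesting code.

```python
from __future__ import annotations

from typing import Any

# Hypothetical per-source mappings: source field name -> common-schema field name.
# Target names follow Schema.org's Dataset type (name, description, url, keywords).
FIELD_MAPS: dict[str, dict[str, str]] = {
    "repo_a": {"title": "name", "summary": "description",
               "link": "url", "tags": "keywords"},
    "repo_b": {"dataset_name": "name", "abstract": "description",
               "landing_page": "url", "subjects": "keywords"},
}


def to_common_schema(source: str, record: dict[str, Any]) -> dict[str, Any]:
    """Map one source record onto the illustrative common schema."""
    out: dict[str, Any] = {
        "@type": "Dataset",
        # Record where the metadata came from so users can be linked back.
        "includedInDataCatalog": {"name": source},
    }
    for src_field, target_field in FIELD_MAPS[source].items():
        value = record.get(src_field)
        # Skip missing or empty values instead of emitting empty fields.
        if value not in (None, "", []):
            out[target_field] = value
    # Normalize keywords to a list so filters behave the same for every source.
    keywords = out.get("keywords")
    if isinstance(keywords, str):
        out["keywords"] = [k.strip() for k in keywords.split(",") if k.strip()]
    return out


if __name__ == "__main__":
    print(to_common_schema("repo_a", {"title": "CT scans", "tags": "SARS-CoV-2, CT"}))
    print(to_common_schema("repo_b", {"dataset_name": "Dengue cohort",
                                      "subjects": ["dengue", "COVID-19"]}))
```

Once every record shares the same field names and value types, a single search or filter over "name" or "keywords" works across all sources, which is the practical benefit the standardization step is after.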
Using the tool
The Discovery Portal can be used to:
- Search across more than 2.8 million datasets from 15 sources (and growing) to uncover previously unknown datasets that bring other dimensions into analyses.
- Download metadata, or access it via the API, to gather new insights about what’s available.
- Track research across funding programs or specific scientific areas.
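For readers who want to try the programmatic route mentioned in the list above, here is a minimal sketch of a keyword search against the API using only the Python standard library. The base URL, the "q" and "size" parameters, and the "total", "hits", and "name" response fields are assumptions for illustration and should be checked against the API documentation before use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint and parameter names; confirm against the API documentation.
API_BASE = "https://api.data.niaid.nih.gov/v1/query"


def build_query_url(term, size=10):
    """Build a search URL for a free-text term, returning at most `size` hits."""
    return f"{API_BASE}?{urlencode({'q': term, 'size': size})}"


def summarize(payload):
    """Return (total, names) from a response with the assumed shape."""
    names = [hit.get("name", "(untitled)") for hit in payload.get("hits", [])]
    return payload.get("total", len(names)), names


def search(term, size=10):
    """Run the query over the network and summarize the decoded JSON."""
    with urlopen(build_query_url(term, size), timeout=30) as response:
        return summarize(json.load(response))


if __name__ == "__main__":
    total, names = search("SARS-CoV-2", size=5)
    print(f"{total} matching records; first {len(names)}:")
    for name in names:
        print(" -", name)
```

Keeping URL construction and response parsing in separate functions makes it possible to check both without network access, and to adjust either one if the real parameter or field names differ from the ones assumed here.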
For example, one researcher used the Discovery Portal to find data to support his image segmentation study of SARS-CoV-2-related lesions in non-human primates. He started with a general search for “SARS-CoV-2” and, within minutes, used the pre-built filters to narrow the thousands of results down to a few Computed Tomography (CT) image datasets of interest.
Likewise, another scientist used the Discovery Portal’s Advanced Search tool to find a dataset on co-infections of dengue virus and COVID-19 to aid his research.
Conclusion
The Discovery Portal aims to help accelerate the work of researchers around the world, potentially leading to faster development of diagnostics, therapeutics, and vaccines. By optimizing the reuse of data, NIAID hopes to maximize free information exchange between researchers breaking ground in their investigations into infectious, immunologic, and allergic diseases.
Try the NIAID Data Ecosystem Discovery Portal at https://data.niaid.nih.gov and share your feedback with us by email at [email protected] or on GitHub.