1265 Welch Rd
Stanford, CA 94305
USA
Marcos Martinez-Romero, PhD (Research Software Developer)
Martin J. O’Connor, M.S. (Senior Software Developer)
The CEDAR system: a suite of tools to simplify the authoring of high-quality metadata in biomedicine
Abstract:
The ability to find and access biomedical data stored in online repositories depends on the quality of the associated metadata. There is a growing set of community-developed guidelines and standards for defining such metadata, but the barriers to creating metadata that conform to those standards are high. Producing well-defined metadata takes time and effort, and many investigators view metadata authoring as a burden. The Center for Expanded Data Annotation and Retrieval (CEDAR) is a Center of Excellence, supported by the NIH Big Data to Knowledge (BD2K) initiative, that is developing technologies to assist the process of managing biomedical metadata. We take advantage of emerging community-based standard templates for describing different kinds of biomedical datasets, and we investigate the use of computational techniques to help investigators assemble templates and fill in their values. Our goal is to develop an end-to-end system that supports the creation of comprehensive and expressive metadata to facilitate data discovery, interoperability, and reuse. In this talk, we will provide an overview of the tools that we are developing and outline our future plans for simplifying the process by which biomedical investigators annotate their experimental data with high-quality metadata.