BMIR Research in Progress: Marcos Martinez-Romero “CEDAR’s Predictive Data Entry: Easier and Faster Creation of High-quality Metadata”

February 2, 2017 @ 12:00 pm – 1:00 pm
MSOB, Conference Room X-275
1265 Welch Rd
Stanford, CA 94305
Marta Vitale-Soto
(650) 724-3979


Marcos Martínez-Romero, PhD
Research Software Developer
BMIR, Stanford University

The ability to find and to access biomedical data that are stored in online repositories depends on the quality of the associated metadata. Despite the growing number of community-developed standards for describing biomedical experiments, the practical difficulties of creating accurate, complete, and consistent metadata are still considerable.

The Center for Expanded Data Annotation and Retrieval (CEDAR) is developing novel methods and tools to simplify the process by which investigators annotate their experimental data with metadata. The CEDAR Workbench is a suite of Web-based tools that together form a pipeline for authoring metadata. As a step towards decreasing authoring time and effort while increasing metadata quality, we have enhanced the CEDAR Workbench with predictive data entry capabilities. Our system identifies common patterns in the CEDAR metadata repository, and generates real-time suggestions for filling out metadata acquisition forms. These suggestions are context-sensitive, meaning that the values predicted for a particular field are generated and ranked based on previously entered values.
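To make the idea of context-sensitive suggestions concrete, here is a minimal sketch in Python. It is not CEDAR's implementation: the function name `suggest_values`, the flat dictionary record format, and the sample records are all assumptions made for illustration. The sketch assumes that a suggestion for a field can be ranked by how often each value appears in stored records that agree with the values already entered.

```python
from collections import Counter


def suggest_values(records, target_field, entered, top_k=3):
    """Rank candidate values for target_field (hypothetical helper).

    Only records that agree with every already-entered value are counted,
    so the ranking depends on the context supplied in `entered`.
    Returns a list of (value, relative_frequency) pairs, most frequent first.
    """
    counts = Counter(
        record[target_field]
        for record in records
        if target_field in record
        and all(record.get(field) == value for field, value in entered.items())
    )
    total = sum(counts.values())
    if total == 0:
        return []  # no stored record matches the current context
    return [(value, n / total) for value, n in counts.most_common(top_k)]


# Illustrative records only; not taken from the CEDAR metadata repository.
records = [
    {"organism": "human", "tissue": "liver", "disease": "hepatitis"},
    {"organism": "human", "tissue": "liver", "disease": "hepatitis"},
    {"organism": "human", "tissue": "liver", "disease": "cirrhosis"},
    {"organism": "human", "tissue": "brain", "disease": "glioma"},
    {"organism": "mouse", "tissue": "liver", "disease": "cirrhosis"},
]

# Values the user has already entered in the form.
context = {"organism": "human", "tissue": "liver"}
suggestions = suggest_values(records, "disease", context)
```

With this sample data, "hepatitis" ranks ahead of "cirrhosis" for the "disease" field, because two of the three records matching the context use it. A real system would presumably need to handle nested templates, ontology terms, and contexts that match few or no records.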

In this talk, I will discuss some of the challenges that have arisen while implementing our approach, and our strategies for making this capability useful to the end users of CEDAR. I will demonstrate CEDAR’s intelligent authoring capabilities, and show how the technology that we are developing leverages existing metadata to make the authoring of high-quality metadata a manageable task.