Mark D. Yandell, PhD
Dr. Mark Yandell is an internationally recognized expert in comparative and
Title and Abstract:
Annotating genomes and their sequence-variants using interoperable, machine-readable data standards
Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah and School of Medicine, Salt Lake City, Utah, USA
The ever-falling cost of sequencing is having a dramatic impact on the research community, changing which genomes are sequenced, how, and where. Indeed, costs have now fallen to the point where a sequenced genome is often only one component of a genomics-centered research plan, with many of today’s projects also involving significant transcriptome and re-sequencing efforts. The scale of these projects is staggering, and they present many challenges in quality control and curation. Datasets of this size preclude ad hoc manual curation and require automated approaches to data management and quality control. This in turn makes the use of interoperable, machine-readable data standards essential. Fortunately, there are several widely used data standards available for the genomics domain. These include GFF for the representation of genome annotations and their associated evidence, and VCF and GVF for the representation of sequence variants. I will show how the use of these standardized formats is empowering individual investigators and small collaborative groups to annotate, manage, curate and analyze even very large genomic datasets. I will also discuss the challenges presented by genome re-sequencing, especially the annotation of these data in an interoperable, machine-readable fashion. Finally, I will highlight a few examples from my own group illustrating how genome annotation and re-sequencing efforts can be combined for rapid identification of the genes and alleles underlying human disease and the characteristic traits of plant cultivars and animal breeds.