CS Seminar

Title: Disease Risk Annotation of Genomic and Epigenomic Variants Using Machine Learning Approaches
Defense: Computer Science
Speaker: Yanting Huang, Emory University
Contact: Zhaohui Qin, zhaohui.qin@emory.edu
Date: 2022-07-15 at 12:00PM
Venue: https://zoom.us/j/5398613286?pwd=bWltZWxQVi9GZmU5WWRyUDlqamdTdz09
Abstract:
Understanding the impact of genomic variations and epigenomic modifications is important for uncovering the mechanisms of complex diseases. Over the last two decades, thousands of genome-wide association studies (GWASs) and epigenome-wide association studies (EWASs) have identified tens of thousands of disease-susceptibility loci. Beyond these association studies, many machine learning approaches have been applied to predict the pathogenicity of genetic variants and epigenetic modifications. For example, CADD used logistic regression to prioritize functional, deleterious, and pathogenic variants; GWAVA used random forests to distinguish disease-implicated variants from benign ones; and BioMM used a hybrid two-stage model combining support vector machines, random forests, logistic regression, the lasso, and elastic net to identify epigenetic signatures of schizophrenia.

In my thesis, I proposed several machine learning models for annotating genomic and epigenomic variants, each with a different focus: 1) EWASplus, an ensemble learning framework for predicting DNA methylation loci associated with Alzheimer’s disease; 2) CASAVA (Disease Category-specific Annotation of Variants), which annotates disease-category risk for genome-wide SNPs (single nucleotide polymorphisms); and 3) DRAFT (Disease Risk Annotation with Few shoTs learning), an end-to-end deep learning approach that incorporates contrastive learning to address the scarcity of known risk variants, which has hindered the application of conventional deep learning models in this field. The raw training data were obtained from the ENCODE and REMC projects and processed with our own pipeline.
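To illustrate why contrastive learning can help when labeled risk variants are scarce, here is a minimal sketch of the classic margin-based pairwise contrastive loss (Hadsell, Chopra and LeCun, 2006). This is a generic textbook formulation, not DRAFT's actual loss or architecture, which the abstract does not describe. The function names, the toy two-dimensional embeddings, and the margin value are hypothetical. The key point is that training on pairs multiplies the available signal: n labeled examples yield n(n-1)/2 training pairs.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Margin-based pairwise contrastive loss (generic formulation).

    Same-class pairs are penalized by their squared distance (pulled together);
    different-class pairs are penalized only if closer than `margin` (pushed apart).
    """
    d = euclidean(emb_a, emb_b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


def make_pairs(embeddings, labels):
    """Enumerate every unordered pair; n examples give n*(n-1)/2 pairs."""
    return [
        (embeddings[i], embeddings[j], labels[i] == labels[j])
        for i, j in combinations(range(len(labels)), 2)
    ]


def mean_pair_loss(embeddings, labels, margin=1.0):
    """Average contrastive loss over all pairs of a small labeled set."""
    pairs = make_pairs(embeddings, labels)
    return sum(contrastive_loss(a, b, s, margin) for a, b, s in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical embeddings: two "risk" variants (label 1) and one "benign" (label 0).
    emb = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]]
    labels = [1, 1, 0]
    print(len(make_pairs(emb, labels)))  # 3 pairs from 3 examples
    print(mean_pair_loss(emb, labels))
```

In this toy example, only the close same-class pair contributes to the loss, because both different-class pairs already lie farther apart than the margin. A real model would learn the embedding function so that this loss is minimized across many such pairs.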
