# All Seminars

Title: Computational discovery of interpretable histopathologic prognostic biomarkers in invasive carcinomas of the breast
Defense: Computer Science
Contact: Vaidy Sunderam, VSS@emory.edu
Date: 2021-11-30 at 1:00PM
Venue: https://northwestern.zoom.us/j/95813522155
Abstract:
While microscopic examination of tumor resections and biopsies has been a cornerstone in breast cancer grading for decades, it suffers from considerable inter-rater variability due to perceptual limitations and high clinical caseloads. Computational analysis of whole-slide image scans using convolutional neural networks (CNN) can help address this challenge. Unfortunately, CNNs can be difficult to interpret, which motivates our adoption of an approach called concept bottlenecking, where models first detect various tissue structures then use them to make their prediction. Concept bottleneck models require a large set of manual annotation data to train. Unfortunately, manual delineation of histopathologic structures is very demanding and impractical given pathologists’ time constraints. This dissertation describes contributions that fall under the themes of scalable data collection, deep learning-based tissue detection, and the discovery of novel histopathologic biomarkers and associations.

First, we examine crowdsourcing approaches that engage medical students to collect manual annotation data. Our results show that a structured, collaborative approach with pathologist supervision is scalable; the resultant publicly-released BCSS and NuCLS datasets contain 20,000 and 200,000 annotations of tissue regions and nuclei, respectively. We show that medical students produce accurate annotations for predominant, visually distinctive structures and that algorithmic suggestions help scale and improve the accuracy of annotations.

Second, we describe a set of CNN modeling approaches for the accurate delineation of histopathologic structures. We describe various improvements to enhance the performance of nucleus detection CNN models and introduce a technique called Decision Tree Approximation of Learned Embeddings, which helps explain CNN nucleus classifications without compromising prediction accuracy. Additionally, we offer consensus recommendations from the International Immuno-Oncology Working Group surrounding the computational detection of tumor-infiltrating lymphocytes, a critical emerging biomarker. Following these recommendations, we develop and validate a multi-scale CNN model that jointly detects tissue regions and nuclei, employing pre-defined biological constraints to improve accuracy.

Finally, we describe the development of a morphologic signature based on quantitative features extracted from computationally-delineated histopathologic regions and cells. This morphologic signature relies partly on a set of stromal features not captured by clinical guidelines for breast cancer grading, and has a stronger independent prognostic value.
Title: Knowledge-Aware User Intent Inference for Web Search and Conversational Agents
Seminar: Computer Science
Contact: Eugene Agichtein, eugene.agichtein@emory.edu
Date: 2021-11-24 at 2:00PM
Venue: https://zoom.us/j/9912158487?pwd=aURCWjVpY1BmVzBaSDB6QktmZ2xvZz09
Abstract:
User intent inference is a critical step in designing intelligent information systems (e.g., conversational agents and e-commerce search engines). Accurate user intent inference improves user experience and satisfaction, but is a challenging task since user utterances or queries can be short, ambiguous, and contextually dependent. Moreover, in an e-commerce setting, the collected datasets are often labeled by weak supervision (e.g., click-through data), resulting in an imbalanced and sparse dataset. To address these problems, my dissertation proposes integrating entity knowledge-bases, conversation context, and user profile information to improve user intent inference for conversational agents. Additionally, I investigate joint learning, product taxonomies, and unlabeled domain-specific corpora (e.g., catalog) to improve query intent inference in e-commerce search.

To evaluate the proposed models, I examine user intent inference in two main settings: 1) open-domain conversational agents and 2) e-commerce search engines. The conversational agent research is evaluated on conversations collected from real users as part of Amazon Alexa Prize competitions, and the e-commerce efforts use real query logs collected from The Home Depot's search engine. My dissertation shows that leveraging entity knowledge bases, conversation context, and user profile information accounts for most improvements in the conversational setting. The results demonstrate that the proposed models improve topic classification accuracy by 15% and dialogue act accuracy by 8% for conversational agents. For e-commerce search, the dissertation shows that joint learning, product taxonomies, and unlabeled domain-specific corpora can significantly improve intent inference accuracy. The proposed models improve the performance of the top-1 retrieved documents by 6%-8% on standard metrics for e-commerce search. The results in both settings offer a significant improvement over state-of-the-art deep learning methods. The insights and findings in this dissertation suggest a promising direction for developing user intent inference in both open-domain conversational agents and e-commerce search.
Title: Fairness in Social Networks
Seminar: Computer Science
Speaker: Sucheta Soundarajan, Syracuse University
Contact: Joyce Ho, joyce.ho@emory.edu
Date: 2021-11-19 at 1:00PM
Venue: https://emory.zoom.us/j/98352727203
Abstract:
Social networks play a vital role in the spread of information through a population, and individuals in networks make important life decisions on the basis of the information to which they have access. In many cases, it is important to evaluate whether information is spreading fairly to all groups in a network. For instance, are male and female students equally likely to hear about a new scholarship? In this talk, I present the novel "information unfairness" criterion, which measures whether information spreads fairly to all groups in a network. I then discuss the results of a case study on the DBLP computer science co-authorship network with respect to gender, with several surprising results.

Biography:

Sucheta Soundarajan is an Associate Professor in the Electrical Engineering & Computer Science Department at Syracuse University. Her areas of interest include social network analysis and data mining, and her research covers topics such as network clustering, sampling, information flow, and centrality. She is a recipient of the NSF CAREER award, Army Research Office Young Investigator Award, and the SIAM Science Policy Fellowship. She received her PhD from Cornell University in 2013.
Title: Predicting Rare Clinical Events in Complex and Dynamic Environments
Defense: Computer Science
Contact: Dr. Rishikesan Kamaleswaran, rkamaleswaran@emory.edu
Date: 2021-11-16 at 3:00PM
Venue: https://us02web.zoom.us/j/3978183382?pwd=anFkN2V3d2VBWEErd2VCOEdWL2xiUT09
Abstract:
Traditional machine learning classification algorithms assume a balanced proportion of classes in the data. However, class-imbalanced data is a challenge for training predictive models in many fields, such as the medical domain. Although adverse patient outcomes occur rarely, they are worth predicting in order to improve the quality of care that patients receive; therefore, monitoring systems are needed in the hospital setting to capture rare adverse events and improve patient health outcomes.

To that end, machine learning and natural language processing (NLP) techniques were used along with clinical expert knowledge to address the issue of rare event classification in a complex environment such as a hospital setting. In particular, two different patient cohorts with distinct characteristics and objectives were investigated.

First, strategies were proposed to predict a rare type of infection among hospitalized children with central venous lines (CVLs). This cohort of pediatric patients is at high risk of morbidity and mortality from hospital-acquired infections. Many serious infections in hospitalized children are likely preventable through interventions that prevent the infection or identify it early to initiate antimicrobial therapy. Bloodstream infection is not only a rare clinical event; the definitions that have been proposed for it also commonly have inadequate sensitivity for clinically important infections and may be difficult to generalize across electronic health record (EHR) platforms. To infer the onset of the infection from the EHR and eliminate the need for extensive chart reviews, a surrogate definition for bloodstream infection was proposed and validated. Then, two study designs were tested to improve the prediction accuracy of the onset of the infection during hospitalization. Finally, a data fusion approach was undertaken to integrate structured and unstructured information from the EHR to boost the prediction performance. Incremental but meaningful improvements in the predictions were observed after each step.

Second, an algorithm was proposed to monitor visits to an emergency department to detect intimate partner violence (IPV). IPV is a pervasive social challenge with severe health and demographic consequences. People experiencing IPV may seek care in emergency settings. Despite the urgency of this critical public health issue, IPV continues to be profoundly underdiagnosed and is considered a persistent hidden epidemic. IPV is frequently undercoded, undetected without appropriate screening tools, and underreported, rendering it a rare encounter in EHRs. The early and appropriate detection of and response to such cases is critical in disrupting the cycle of abuse, including IPV-related morbidity and mortality. Our proposed algorithm benefits from NLP techniques and domain expert knowledge. It can identify victims of IPV with high sensitivity by analyzing the recorded provider notes and patient narratives.

We argue that all the techniques incorporated in this thesis are transferable to identify other rare clinical events with the ultimate goal of improving the level of care.
Title: Cracking the diversity code: Understanding computing pathways of those least represented in order to foster their representation
Seminar: Computer Science
Speaker: Monique Ross, Florida International University
Contact: Vaidy Sunderam, VSS@emory.edu
Date: 2021-11-12 at 1:00PM
Venue: https://emory.zoom.us/j/98352727203
Abstract:
A significant gap exists in the understanding of factors that influence the participation of Black and Hispanic women in computer science. The objective is to listen to those often unheard in the conversation around broadening participation in computer science, in order to critically examine efforts and initiatives that impact engagement. This talk will describe the journey towards this objective and preliminary results. The outcomes of this work have the potential to reshape the community’s perceptions of what and who are computer scientists as well as crack the code to diversifying this lucrative and impactful discipline.

Biography: Monique Ross earned a doctoral degree in Engineering Education from Purdue University. She has a Bachelor’s degree in Computer Engineering from Elizabethtown College, a Master’s degree in Computer Science and Software Engineering from Auburn University, eleven years of experience in industry as a software engineer, and five years as a full-time faculty member in the departments of computer science and engineering. Her interests focus on broadening participation in computer science through the exploration of: 1) race, gender, and identity; 2) discipline-based education research (with a focus on computer science courses) in order to better inform pedagogical practices that garner interest and retain women and minorities in computer-related fields. She is the PI on three National Science Foundation grants and one foundation grant, and co-PI on two large-scale grants. Dr. Monique Ross is committed to the expansion of rigorous computer science education research at FIU and nationally.

https://www.cis.fiu.edu/faculty-staff/ross-monique/

Title: Deep Learning with Differential Privacy and Adversarial Robustness
Defense: Computer Science
Speaker: Pengfei Tang, Emory University
Contact: Dr. Li Xiong, lxiong@emory.edu
Date: 2021-11-11 at 3:00PM
Venue: https://us02web.zoom.us/j/7382282740?pwd=QVB4bmU2NnlZN2s1UW0veUtCNklmUT09
Abstract:
Deep learning models have become increasingly powerful on different tasks, such as image classification and data synthesis. However, two major vulnerabilities exist: 1) privacy leakage of the training data through inference attacks, and 2) adversarial examples that are crafted to trick the classifier into misclassifying. Differential privacy (DP) is a popular technique to prevent privacy leakage, which offers a provable guarantee on the privacy of training data through randomized mechanisms such as gradient perturbation. For adversarial example attacks, there are two categories of defense: empirical and theoretical approaches. Adversarial training is one of the most popular empirical approaches; it injects adversarial examples with correct labels into the training dataset and renders the model robust through optimization. Certified robustness is representative of the theoretical approaches; it offers a theoretical guarantee of defense against adversarial examples through randomized mechanisms such as input perturbation.
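
As a rough illustration of the gradient perturbation mentioned above, the sketch below follows the commonly described recipe of clipping each example's gradient and adding Gaussian noise before averaging. It is not taken from the dissertation; the function names, the clipping bound `max_norm`, and the `noise_multiplier` parameter are illustrative assumptions, and no privacy accounting is performed.

```python
import math
import random


def clip_l2(grad, max_norm):
    """Scale a gradient vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum them, add Gaussian noise, then average.

    Hypothetical helper: the noise standard deviation is
    noise_multiplier * max_norm, matching the per-example clipping bound.
    """
    dim = len(per_example_grads[0])
    clipped = [clip_l2(g, max_norm) for g in per_example_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so the demo is repeatable
    grads = [[3.0, 4.0], [0.3, 0.4]]  # made-up per-example gradients
    print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single training example can move the update, which is what lets the added noise mask that example's contribution.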

Title: Measurement and Analysis Methods of Performance Problems in Distributed Systems
Defense: Computer Science
Speaker: Lei Zhang, Emory University
Contact: Ymir Vigfusson, ymir@mathcs.emory.edu
Date: 2021-11-08 at 12:00PM
Venue: https://emory.zoom.us/j/94559953414
Abstract:
Today's distributed systems invest significant computational and storage resources to accommodate their large scale of data, but more resources do not automatically improve performance. To deliver high performance, new types of large-scale solutions, such as the cloud computing and microservices paradigms, deploy loosely coupled components but, in the process, make it harder to maintain a global view of system performance. With the ensuing growth in the complexity of system architectures, diagnosing and understanding performance problems has become both critically important and highly challenging.

The aim of my thesis is to fill in some missing but significant parts of monitoring and analyzing performance problems in distributed systems, by asking the question: What are the performance bottlenecks of distributed systems, and how should we address them? First, my thesis proposes a novel retroactive tracing abstraction, built on an always-on distributed tracing system, where full telemetry information about a distributed request can be retrieved "back in time" soon after a problem is detected without unduly burdening any node in the system. Second, my thesis frames the challenges of data placement in modern memory hierarchies in a generalized paging model outside of traditional assumptions, and provides an offline data placement algorithm towards optimal placement decisions. Last, my thesis derives a rule-of-thumb expression for cache warmup times, specifically how long caches in storage systems and CDNs need to be warmed up before their performance is deemed to be stable.
Title: Fairness-Aware Predictive Modeling of Human Event Data
Seminar: Computer Science
Speaker: Dr. Mingxuan Sun, Louisiana State University
Contact: Joyce Ho, joyce.c.ho@emory.edu
Date: 2021-11-05 at 1:00PM
Venue: https://emory.zoom.us/j/98352727203
Abstract:
Large volumes of human event data, such as online TV viewing records, disaster rescue requests, and electronic records of hospital admissions, are becoming increasingly available in a wide variety of applications including social network analysis, smart cities, and healthcare analytics. Predictive modeling of those collective event sequences is beneficial for improving event response efficiency and promoting nationwide economic development. Although current machine learning algorithms can achieve significant event prediction accuracy, the historic data or the self-excitation property can introduce biased prediction. In this talk, we introduce a series of novel models and algorithms to analyze human events to balance between prediction accuracy and fairness. Specifically, we investigate point processes and deep learning methods to improve event prediction accuracy. Furthermore, we introduce a fairness metric that can efficiently evaluate the ranking fairness in event prediction and use the metric to penalize the event likelihood function and to strike a balance between accuracy and fair loss.

Biography: I am an Associate Professor in the Division of Computer Science and Engineering in the School of Electrical Engineering and Computer Science at Louisiana State University. I received my Ph.D. degree in Computer Science from the Georgia Institute of Technology in 2012. I received my Master's degree in Computer Science from the University of Kentucky in 2006 and my Bachelor's degree in Computer Science and Engineering from Zhejiang University, China in 2004. I was a Senior Scientist with the playlist recommendation group, Pandora Media, Inc. from 2012 to 2015.

http://csc.lsu.edu/~msun/

Title: Implicit User-Generated Content in the service of Public Health
Seminar: Computer Science
Speaker: Dr. Evgeniy Gabrilovich, Google Health
Contact: Eugene Agichtein, eugene.agichtein@emory.edu
Date: 2021-10-29 at 1:00PM
Venue: https://emory.zoom.us/j/98352727203
Abstract:
Every day millions of people use online products and services to satisfy their information needs. In the process of doing so, they produce large volumes of user-generated content (UGC). In this talk, we will distinguish between "explicit" UGC, which is intended to be made public (such as product ratings or reviews), and "implicit" UGC, which can be responsibly anonymized and aggregated in a privacy-preserving way to improve public health. We will analyze implicit UGC as a positive consumption externality, and will discuss its beneficial uses across a range of public health applications.

The bulk of this talk will focus on methods for aggregating and classifying the data to provide timely signals that help guide public health interventions and assess their efficacy. We will discuss applications such as estimating disease incidence, outbreak prediction, mitigating pandemic spread, and improving public health messaging.

Biography:

Dr. Evgeniy Gabrilovich is a research director at Google Health where he leads the Public & Environmental Health team. Prior to joining Google in 2012, he was a director of research and head of the natural language processing and information retrieval group at Yahoo! Research. Evgeniy is an IEEE Fellow and ACM Distinguished Scientist. He is a recipient of the 2014 IJCAI-JAIR Best Paper Prize and the 2010 Karen Sparck Jones Award for his contributions to natural language processing and information retrieval. Evgeniy has served as a technical program chair for WSDM 2021, WWW 2017, and WSDM 2015. He earned his PhD in computer science from the Technion - Israel Institute of Technology. He also graduated (with extra credit) from the Executive MD training program at Harvard Medical School.

Title: Data Collection for Data-Centric AI
Seminar: Computer Science
Speaker: Dr. Fatemeh Nargesian, University of Rochester
Contact: Vaidy Sunderam, vss@emory.edu
Date: 2021-10-22 at 1:00PM
Venue: https://emory.zoom.us/j/98352727203
Abstract:
The holy grail of data-centric AI is to collect high-quality labeled data sets for the purpose of training ML models. Data collection has become an active area of research in the data management community due to the importance of handling large amounts of training data. This talk will examine the data collection techniques that can be used to discover, augment, or generate datasets from existing data lakes. I will also cover data tailoring, which ensures that the data set collected for analysis has an appropriate representation of relevant (demographic) groups, that is, that it meets desired distribution requirements. I will conclude by introducing some of the interesting research challenges that remain in the data collection landscape.

Biography: Fatemeh Nargesian is an assistant professor in the Department of Computer Science at the University of Rochester. She received her PhD from the University of Toronto and was a research intern at IBM Watson. Before the University of Toronto, she worked at the Clinical Health and Informatics Group at McGill University. Her primary research interests are in data intelligence, focused on data discovery, data integration, and data for ML.
