All Seminars

Title: Robust Crowdsourcing and Federated Learning under Poisoning Attacks
Defense: Computer Science
Speaker: Farnaz Tahmasebian, Emory University
Contact: Dr. Li Xiong, lxiong@emory.edu
Date: 2021-03-30 at 1:00PM
Venue: https://zoom.us/j/9828106847
Abstract:
Crowd-based computing distributes tasks among many individuals or organizations, who complete them using their own intelligent or computing devices. Two prominent classes of crowd-based computing are crowdsourcing and federated learning: the first is crowd-based data collection, the second crowd-based model learning. Crowdsourcing provides a cost-effective way to obtain services or data from a large group of users, and it is increasingly used for data collection in domains such as image annotation and real-time traffic reporting. Its openness, however, makes it an easy target for abuse: adversaries can enlist large numbers of users to artificially boost support for organizations, products, or even opinions. Choosing an aggregation method that withstands such attacks is therefore one of the main challenges in building an effective crowdsourcing system. Likewise, the standard aggregation algorithm in federated learning is susceptible to data poisoning attacks, and the framework's dynamic behavior, with clients chosen at random in each iteration, poses further challenges for robust aggregation. In this dissertation, we devise strategies that improve a system's robustness under data poisoning attacks when workers intentionally or strategically misbehave.
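To make the poisoning threat concrete, here is a minimal sketch (an illustration of one standard robustness idea, not the dissertation's method) of federated aggregation: replacing the mean in federated averaging with a coordinate-wise median, which tolerates a minority of arbitrarily corrupted client updates. All names and numbers below are illustrative assumptions.

import numpy as np

def fedavg(updates):
    # Plain federated averaging: a few poisoned updates can drag it anywhere.
    return np.mean(updates, axis=0)

def coordinate_median(updates):
    # Coordinate-wise median: a classic robust aggregator.
    return np.median(updates, axis=0)

# Toy round: 8 honest clients report updates near the true value [1, 1];
# 2 poisoned clients report large adversarial updates.
rng = np.random.default_rng(0)
honest = rng.normal(loc=[1.0, 1.0], scale=0.1, size=(8, 2))
poisoned = np.full((2, 2), -50.0)
updates = np.vstack([honest, poisoned])

print("mean   :", fedavg(updates))             # pulled far off target
print("median :", coordinate_median(updates))  # stays near [1, 1]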
Title: COVID-19 Vaccine Design using Mathematical Linguistics
Seminar: Computer Science
Speaker: Dr. Liang Huang, Oregon State University
Contact: Jinho Choi, jinho.choi@emory.edu
Date: 2021-03-19 at 1:00PM
Venue: https://emory.zoom.us/j/92103915275
Abstract:
To defeat the current COVID-19 pandemic, messenger RNA (mRNA) vaccines have emerged as a promising approach thanks to their rapid and scalable production and non-infectious, non-integrating properties. However, designing an mRNA sequence that achieves high stability and protein yield remains challenging because the search space is exponentially large (e.g., there are $2.4 \times 10^{632}$ candidate mRNA sequences for the spike protein of SARS-CoV-2). We describe two ongoing efforts on this problem, both using linear-time algorithms inspired by my earlier work in natural language parsing. On one hand, the Eterna OpenVaccine project from Stanford Medical School takes a crowdsourcing approach, letting game players all over the world design stable sequences. To evaluate sequence stability (in terms of free energy), they use LinearFold from my group (2019), the only linear-time RNA folding algorithm available and hence the only one fast enough for COVID-scale genomes. On the other hand, we take a computational approach and directly search for the optimal sequence in this exponentially large space via dynamic programming. It turns out this problem reduces to a classical problem in formal language theory and computational linguistics (the intersection of a CFG and a DFA), which can be solved in $O(n^3)$ time, just like lattice parsing for speech. In the end, we can design the optimal mRNA vaccine candidate for the SARS-CoV-2 spike protein in about ten minutes. To conclude, classical results from theoretical computer science and mathematical linguistics (dating back to the 1960s) helped us solve a very challenging and extremely important problem in the fight against the COVID-19 pandemic.

Bio: Liang Huang (PhD, Penn, 2008) is an Associate Professor of Computer Science at Oregon State University and a Distinguished Scientist at Baidu Research USA. He is a leading theoretical computational linguist, recognized at ACL 2008 (Best Paper Award) and ACL 2019 (keynote speech), but in recent years he has been applying his expertise in parsing, translation, and grammar formalisms to biology problems such as RNA folding and RNA design. Since the outbreak of COVID-19, he has shifted his attention to the fight against the virus, which resulted in efficient algorithms for stable mRNA vaccine design, adapted from classical theory and algorithms of mathematical linguistics dating back to the 1960s.
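As a toy illustration of the kind of cubic-time dynamic programming the abstract above alludes to (not LinearFold or the CFG-DFA intersection itself), here is the classic Nussinov algorithm, which folds an RNA sequence in $O(n^3)$ time by maximizing base pairs; real folders optimize thermodynamic free energy instead.

def nussinov(seq, min_loop=3):
    # best[i][j] = maximum number of base pairs in seq[i..j].
    pairs = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):           # widen the interval [i, j]
        for i in range(n - span):
            j = i + span
            best[i][j] = best[i][j - 1]           # case 1: j is unpaired
            for k in range(i, j - min_loop):      # case 2: j pairs with some k
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    best[i][j] = max(best[i][j], left + 1 + best[k + 1][j - 1])
    return best[0][n - 1]

print(nussinov("GGGAAAUCC"))  # toy sequence: 3 base pairs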
Title: Optimal Control Approaches for Designing Neural Ordinary Differential Equations
Defense: Computer Science
Speaker: Derek Onken, Emory University
Contact: Lars Ruthotto, lruthotto@emory.edu
Date: 2021-03-10 at 1:00PM
Venue: https://emory.zoom.us/j/98688786075?pwd=ampLTG4reEV3ak5nbEJZUVdwRnljQT09
Abstract:
Neural network design encompasses both model formulation and numerical treatment for optimization and parameter tuning. Recent research in formulation focuses on interpreting architectures as discretizations of continuous ordinary differential equations (ODEs). These neural ODEs, in which the ODE dynamics are defined by neural network components, benefit from reduced parameterization and smoother hidden states than traditional discrete neural networks, but come at a high computational cost. Training a neural ODE can be phrased as an ODE-constrained optimization problem, which allows for the application of mathematical optimal control (OC). The application of OC theory leads to design choices that differ from popular high-cost implementations. We improve the numerical treatment and formulation of neural ODEs for models used in time-series regression, image classification, continuous normalizing flows, and path-finding problems.
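For readers unfamiliar with the formulation, here is a minimal sketch (an illustration, not the thesis implementation) of a neural ODE forward pass: the hidden state evolves under dynamics dz/dt = f(t, z) given by a small network, integrated here with a fixed-step fourth-order Runge-Kutta scheme, one of the solver choices that optimal-control analysis helps inform.

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 2)), np.zeros(16)   # toy network weights
W2, b2 = rng.normal(size=(2, 16)), np.zeros(2)

def f(t, z):
    # ODE dynamics dz/dt = f(t, z), defined by a tiny two-layer network.
    return W2 @ np.tanh(W1 @ z + b1) + b2

def odeint_rk4(f, z0, t0=0.0, t1=1.0, steps=20):
    # Fixed-step 4th-order Runge-Kutta integration of the hidden state.
    z, t = z0.copy(), t0
    h = (t1 - t0) / steps
    for _ in range(steps):
        k1 = f(t, z)
        k2 = f(t + h / 2, z + h / 2 * k1)
        k3 = f(t + h / 2, z + h / 2 * k2)
        k4 = f(t + h, z + h * k3)
        z = z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return z

print(odeint_rk4(f, np.array([1.0, -1.0])))  # hidden state at t = 1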
Title: Machine Translation for All
Seminar: Computer Science
Speaker: Huda Khayrallah, Johns Hopkins University
Contact: Vaidy Sunderam, vss@emory.edu
Date: 2021-02-15 at 10:00AM
Venue: https://emory.zoom.us/j/92558356951
Abstract:
Machine translation uses machine learning to automatically translate text from one language to another and has the potential to reduce language barriers. Recent improvements in machine translation have made it more widely usable, partly due to deep neural network approaches. However—like most deep learning algorithms—neural machine translation is sensitive to the quantity and quality of training data, and therefore produces poor translations for some languages and styles of text. Machine translation training data typically comes in the form of parallel text—sentences translated between the two languages of interest. Limited quantities of parallel text are available for most language pairs, leading to a low-resource problem. Even when training data is available in the desired language pair, it is frequently formal text—leading to a domain mismatch when models are used to translate a different type of data, such as social media or medical text. Neural machine translation currently performs poorly in low-resource and domain mismatch settings; my work aims to overcome these limitations and make machine translation a useful tool for all users.

In this talk, I will discuss a method for improving translation in low resource settings—Simulated Multiple Reference Training (SMRT; Khayrallah et al., 2020)—which uses a paraphraser to simulate training on all possible translations per sentence. I will also discuss work on improving domain adaptation (Khayrallah et al., 2018), and work on analyzing the effect of noisy training data (Khayrallah and Koehn, 2018).
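As a rough sketch of the SMRT idea (illustrative only; the paraphraser below is a toy stub, whereas SMRT uses a learned sentence-level paraphraser), each training epoch replaces the single gold reference with sampled paraphrases, so the model effectively trains against many plausible translations.

import random

def toy_paraphrase(sentence):
    # Hypothetical stub paraphraser: random synonym substitution.
    synonyms = {"quick": ["quick", "fast", "rapid"], "happy": ["happy", "glad"]}
    return " ".join(random.choice(synonyms.get(w, [w])) for w in sentence.split())

def smrt_style_targets(reference, samples=3):
    # Simulate multiple references for one source sentence; the MT loss
    # would then be computed against these sampled targets.
    return [toy_paraphrase(reference) for _ in range(samples)]

random.seed(1)
print(smrt_style_targets("the quick dog is happy"))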
Title: Mining and Learning from Graph Processes
Seminar: Computer Science
Speaker: Arlei Lopes Da Silva, University of California, Santa Barbara
Contact: Vaidy Sunderam, vss@emory.edu
Date: 2021-02-12 at 1:00PM
Venue: https://emory.zoom.us/j/93293219464
Abstract:
The digital transformation has given rise to a new form of science driven by data. Graphs (or networks) are a powerful framework for the solution of data science problems, especially when the goal is to extract knowledge from and make predictions about the dynamics of complex systems such as those arising from epidemiology, social media and infrastructure. However, this representation power comes at a cost, as graphs are highly combinatorial structures, leading to challenges in search, optimization, and learning tasks that are relevant to modern real-world applications.

In this talk, I will overview my recent work on new algorithms and models for mining and learning from graph data. First, I will show how the interplay between graph structure and its dynamics can be exploited for pattern mining and inference in networked processes, such as improving the effectiveness of testing during a pandemic. Then, I will focus on machine learning on graphs, where novel deep learning and optimization approaches for predicting graph data, such as traffic forecasting, will be described. As the last topic, I will introduce combinatorial algorithms for optimization on graphs that enable us to attack/defend their core structure, among other applications. I will end by briefly contextualizing my ongoing work as part of a broader research agenda with new related problems that I plan to address in the next few years.
Title: Addressing Biases for Robust, Generalizable AI
Seminar: Computer Science
Speaker: Swabha Swayamdipta, Allen Institute for AI
Contact: Vaidy Sunderam, vss@emory.edu
Date: 2021-02-10 at 1:00PM
Venue: https://emory.zoom.us/j/95438087188
Abstract:
Artificial Intelligence has made unprecedented progress in the past decade. However, there still remains a large gap between the decision-making capabilities of humans and machines. In this talk, I will investigate two factors to explain why. First, I will discuss the presence of undesirable biases in datasets, which ultimately hurt generalization, regardless of dataset size. I will then present bias mitigation algorithms that boost the ability of AI models to generalize to unseen data. Second, I will explore task-specific prior knowledge, which aids robust generalization but is often ignored when training modern AI architectures on large amounts of data. In particular, I will show how linguistic structure can provide useful biases for inferring shallow semantics, which help in natural language understanding. I will conclude with a discussion of how this framework of dataset and model biases could play a critical role in the societal impact of AI going forward.
Title: Human-AI Systems for Making Videos Useful
Seminar: Computer Science
Speaker: Amy Pavel, Carnegie Mellon University
Contact: Vaidy Sunderam, vss@emory.edu
Date: 2021-02-08 at 10:00AM
Venue: https://emory.zoom.us/j/99774155333
Abstract:
Video has become a primary medium for communication. Videos, including explainers, how-to tutorials, lectures, and vlogs, increasingly eclipse their text counterparts. While videos can be engaging to watch, they can be challenging to use when seeking information. First, the timeline-based interfaces we use for videos are difficult to skim and browse because they lack the structure of text. Second, the rich audio and visual content in videos can be inaccessible to people with disabilities.

What future systems will make videos useful for all users?

In this talk, I’ll share my work creating hybrid AI and interactive systems that leverage multiple mediums of communication (e.g., text, video, and audio) across two main research areas: 1) helping domain experts surface content of interest through interactive video abstractions, and 2) making videos non-visually accessible through interactions for video accessibility. First, I will share core challenges of video informed by interviews with domain experts. I will then share new interactive systems that leverage state-of-the-art AI/ML techniques, and evaluations demonstrating the efficacy of these systems. I will conclude with future research directions on how hybrid HCI-AI breakthroughs will improve digital communication, and how designing new interactions can help us to realize the full potential of AI/ML advances.
Title: Positive AI with Social Commonsense Models
Seminar: Computer Science
Speaker: Maarten Sap, University of Washington
Contact: Vaidy Sunderam, vss@emory.edu
Date: 2021-02-05 at 1:00PM
Venue: https://emory.zoom.us/j/92294085195
Abstract:
To effectively understand language and safely communicate with humans, machines must not only grasp the surface meanings of texts, but also their underlying social meaning. This requires understanding interpersonal social commonsense, such as knowing to thank someone for giving you a present, as well as understanding harmful social biases and stereotypes. Failure to account for these social and power dynamics could cause models to produce redundant, rude, or even harmful outputs.

In this talk, I will describe my research on enabling machines to reason about social dynamics and social biases in text. I will first discuss ATOMIC, the first large-scale knowledge graph of social and interpersonal commonsense knowledge, with which machines can be taught to reason about the causes and effects of everyday events. Then, I will show how we can make machines understand and mitigate social biases in language, using Social Bias Frames, a new structured formalism for distilling biased implications of language, and PowerTransformer, a new unsupervised model for controllable debiasing of text.
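For a concrete sense of what ATOMIC-style knowledge looks like, here is a simplified sketch: (event, relation, inference) triples queried by relation. The relation names follow ATOMIC's published schema (xIntent, oReact, oWant, ...), but the example entries are illustrative, not actual dataset rows.

# Simplified ATOMIC-style triples: (event, relation, inference).
atomic_triples = [
    ("PersonX gives PersonY a present", "xIntent", "to make PersonY happy"),
    ("PersonX gives PersonY a present", "oReact", "grateful"),
    ("PersonX gives PersonY a present", "oWant", "to thank PersonX"),
]

def inferences(event, relation):
    # Return commonsense inferences for an event under a given relation.
    return [tail for e, r, tail in atomic_triples if e == event and r == relation]

print(inferences("PersonX gives PersonY a present", "oWant"))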

I will conclude with future research directions on making NLP systems more socially-aware and equitable, and how to use language technologies for positive societal impact.
Title: What We Miss if We Don’t Talk to People: Understanding Users’ Diverse and Nuanced Privacy Needs
Seminar: Computer Science
Speaker: Camille Cobb, Carnegie Mellon University
Contact: Vaidy Sunderam, vss@emory.edu
Date: 2021-02-03 at 10:00AM
Venue: https://emory.zoom.us/j/91087670904
Abstract:
In security and privacy research, we usually think about protecting against powerful adversaries who have substantial resources and strong technical abilities. Those types of threats are important to address, but are often not well-aligned with typical users’ privacy concerns. Instead, users frequently worry about information disclosure to their friends, family, coworkers, or employers; and they may face tradeoffs between their desires for privacy and other goals such as convenience, financial security, and personal connection. For example, despite the risks it can pose, people using online dating apps may decide to share personal, potentially sensitive information to increase their chances of finding a romantic partner.

I will discuss my prior and ongoing work, which takes a human-centered approach to understanding and addressing security and privacy concerns that affect users on a daily basis. First, I explore how real users' smart home devices may introduce risks, including to stakeholders who had no choice in their installation or configuration (e.g., children, visitors, neighbors, or household employees such as babysitters). Next, I discuss how online status indicators, a UI element that communicates when users are actively online, can lead to interpersonal tensions or make users contort their behaviors to achieve a desired self-presentation. In each project, I show that users have nuanced and diverse technology goals and risk profiles, and that existing technologies fail to sufficiently support them. I discuss potential solutions and outline future research directions.
Title: Trustworthy Machine Learning: On the Preservation of Individual Privacy and Fairness
Seminar: Computer Science
Speaker: Xueru Zhang, University of Michigan
Contact: Vaidy Sunderam, vss@emory.edu
Date: 2021-02-01 at 10:00AM
Venue: https://emory.zoom.us/j/92280212733
Abstract:
Machine learning (ML) techniques have seen significant advances over the last decade and are playing an increasingly critical role in people's lives. While their potential societal benefits are enormous, they can also inflict great harm if not developed or used with care. In this talk, I will focus on two critical ethical issues in ML systems: fairness and privacy, and present mitigating solutions in various scenarios.

On the fairness front, although many fairness criteria have been proposed to measure and remedy biases in ML systems, their impact is often studied only in a static, one-shot setting. In the first part of my talk, I will present my work on evaluating the long-term impact of (fair) ML decisions on population groups that are repeatedly subject to such decisions. I will illustrate how imposing common fairness criteria intended to protect disadvantaged groups may lead to pernicious long-term consequences by exacerbating inequality. I will then discuss a number of potential mitigations.
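To illustrate how a static fairness constraint can have dynamic consequences (a toy model invented for illustration, not the speaker's), here is a small simulation: accepted-and-qualified individuals raise their group's qualification rate while accepted-but-unqualified ones lower it, so equalizing acceptance rates across groups can accelerate the decline of the initially disadvantaged group. All rates and update rules below are assumptions.

import numpy as np

def simulate(rounds=20, demographic_parity=True):
    mean = {"A": 0.6, "B": 0.4}  # initial qualification rates per group
    for _ in range(rounds):
        for g in mean:
            # Acceptance rate: equal across groups under parity,
            # proportional to qualification otherwise.
            accept = 0.5 if demographic_parity else mean[g]
            drift = mean[g] - (1 - mean[g])  # successes minus defaults
            mean[g] = float(np.clip(mean[g] + 0.05 * accept * drift, 0.0, 1.0))
    return mean

print("with parity constraint:", simulate(demographic_parity=True))
print("unconstrained         :", simulate(demographic_parity=False))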

On the privacy front, when ML models are trained on individuals' personal data, it is critical to preserve individual privacy while maintaining a sufficient level of model accuracy. In the second part of the talk, I will illustrate two key ideas that can be used to balance an algorithm's privacy-accuracy tradeoff: (1) reusing intermediate results to reduce information leakage, and (2) improving algorithmic robustness to accommodate more randomness. I will present a randomized, privacy-preserving algorithm that leverages these ideas in the context of distributed learning and show that it significantly improves the privacy-accuracy tradeoff over existing algorithms.
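A minimal sketch of idea (2), assuming the standard noisy-gradient recipe rather than the speaker's specific algorithm: clip each example's gradient, average, and add Gaussian noise, so a larger noise scale buys stronger privacy at the cost of accuracy.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)  # toy labels

def noisy_gd(X, y, noise_scale, clip=1.0, lr=0.5, steps=200):
    # Logistic regression with per-example clipping + Gaussian noise.
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        grads = (p - y)[:, None] * X                    # per-example gradients
        scale = np.maximum(np.linalg.norm(grads, axis=1) / clip, 1.0)
        g = (grads / scale[:, None]).sum(axis=0) / n    # clip, then average
        g += rng.normal(scale=noise_scale * clip / n, size=w.shape)
        w -= lr * g
    return w

for s in [0.0, 1.0, 10.0]:
    w = noisy_gd(X, y, noise_scale=s)
    acc = (((X @ w) > 0) == y.astype(bool)).mean()
    print(f"noise={s:>4}: accuracy={acc:.2f}")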