As AI and LLMs increasingly power sensitive domains such as healthcare and finance, safeguarding user data has become a central challenge. This tutorial surveys the evolving landscape of privacy risks and protection strategies in the age of AI and LLMs, with particular attention to the Big Data characteristics that make these systems both powerful and vulnerable: longitudinality (data collected and linked over time) and multimodality (structured records, free text, and images).
The tutorial introduces key categories of privacy attacks—including membership inference, attribute inference, and data extraction—followed by case studies on healthcare data. It then discusses defenses across the AI lifecycle, from synthetic data generation and differentially private or privacy-enhanced training to post-training methods such as machine unlearning. Finally, we discuss open challenges arising from longitudinal and multimodal data, as well as broader privacy risks that extend beyond training data in the expanding LLM ecosystem.
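To make the first attack category concrete, the sketch below shows a minimal loss-thresholding membership inference attack: an overfit model assigns lower loss to records it was trained on, so an adversary can guess membership by comparing per-record loss against a threshold. Everything here is an illustrative assumption, not material from the tutorial: the toy Gaussian data, the kernel-smoothed "model" (chosen because a tiny bandwidth makes it memorize its training set), and the fixed threshold all stand in for a real model and dataset.

```python
import math
import random

random.seed(0)

def sample(n):
    """Draw toy (x, y) records from two overlapping Gaussians (a stand-in cohort)."""
    data = []
    for _ in range(n):
        y = random.randint(0, 1)
        x = random.gauss(2.0 * y - 1.0, 1.5)
        data.append((x, y))
    return data

train = sample(30)    # "members": records the model was fit on
holdout = sample(30)  # "non-members": records it never saw

BANDWIDTH = 0.05  # deliberately tiny, so the model memorizes its training points

def predict(x):
    """Kernel-smoothed label estimate; at small bandwidth it memorizes train."""
    num = den = 0.0
    for xi, yi in train:
        w = math.exp(-((x - xi) / BANDWIDTH) ** 2 / 2.0)
        num += w * yi
        den += w
    if den == 0.0:  # all kernel weights underflowed: fall back to a coin flip
        return 0.5
    return num / den

def loss(x, y):
    """Per-record cross-entropy loss, clamped for numerical safety."""
    p = min(max(predict(x), 1e-6), 1.0 - 1e-6)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

member_losses = [loss(x, y) for x, y in train]
nonmember_losses = [loss(x, y) for x, y in holdout]

# Attack rule: records with loss below a threshold are guessed to be members.
THRESHOLD = 0.1
correct = sum(l < THRESHOLD for l in member_losses) + \
          sum(l >= THRESHOLD for l in nonmember_losses)
attack_accuracy = correct / (len(train) + len(holdout))

avg_member_loss = sum(member_losses) / len(member_losses)
avg_nonmember_loss = sum(nonmember_losses) / len(nonmember_losses)
print(f"attack accuracy: {attack_accuracy:.2f}")
```

The gap between average member and non-member loss is exactly the signal that defenses such as differentially private training aim to suppress: by bounding each record's influence on the model, they push the attack's accuracy back toward chance.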