Jiachen Wang (Princeton) - Fueling Responsible AI with Data Attribution

Date & Time:

February 25, 2025, 2:00 pm – 3:00 pm (America/Chicago)

Location:

JCL 390

Abstract: Understanding how training data shapes model behavior is fundamental to building trustworthy AI systems. Data attribution techniques quantify the influence of individual training examples on machine learning models, providing key insights for developing data-centric algorithms (e.g., data curation) as well as addressing data-related challenges (e.g., privacy, safety, and copyright protection).

In this talk, I will present our recent advances in the foundations and practical frameworks of data attribution. First, I will introduce a general, game-theoretic data attribution framework that optimizes for stochastic learning algorithms. I will then discuss how we can efficiently conduct data attribution in the challenging setting of large-scale deep learning models (e.g., large language models). These techniques guide data quality management, explain model predictions, and boost trustworthy AI development from a data-centric perspective.

Speakers

Jiachen Wang

PhD Candidate, Princeton University

Jiachen (“Tianhao”) is a Ph.D. student at Princeton University, advised by Prof. Prateek Mittal. His research focuses on developing theoretical foundations and practical tools for trustworthy machine learning from a data-centric perspective. Most recently, he has been developing scalable, theoretically grounded data attribution and curation techniques for foundation models. His contributions have been recognized through multiple fellowships and oral/spotlight presentations at top AI/ML venues. He was selected as a Rising Star in Data Science in 2024.
