Distinguished Lecture Series: Been Kim (Google DeepMind) - Alignment and interpretability: how we might get it right
Abstract: The main goal of interpretability is to enable communication between humans and machines, whether the object of that communication is a value, knowledge, or an objective. In this talk, I argue that a better way to enable this communication is for humans to expand what they know and learn new things. Doing so also enables us to expand what machines know, by building better-aligned machines. I share why accounting for the representational gap is crucial to solving the alignment problem, and I provide an example of bridging the knowledge gap.
Speakers
Been Kim
Been Kim is a senior staff research scientist at Google DeepMind. Her research focuses on helping humans communicate with complex machine learning models: 1) building tools to aid humans' collaboration with machines (and to detect when those tools fail), 2) studying machines' general nature, and 3) leveraging machines' knowledge to benefit humans. She gave a talk at the G20 meeting in Argentina in 2019 and keynotes at ICLR 2022 and ECML 2020. Her work on TCAV received the UNESCO Netexplo award and was featured at Google I/O 2019. Her work appears in a chapter of Brian Christian's book The Alignment Problem. She is the General Chair of ICLR 2024, was a Senior Program Chair at ICLR 2023, and serves on the advisory board of TRAILS. She has been a senior area chair at NeurIPS, ICML, ICLR, AISTATS, and other venues for the past few years. She is a steering committee member of the FAccT conference and SATML. She received her PhD from MIT.