Shawn Im

Hello! I am a PhD student at UW-Madison working with Prof. Sharon Li. I am thankful to be supported by the NSF GRFP. I also had the chance to intern at Apple MLR last summer. Previously, I was an undergraduate at MIT, studying mathematics and computer science. My interest is in developing a reliable understanding of the behavior of machine learning models, primarily to ensure that they are beneficial. This understanding is crucial for safe and beneficial models. In addition, as models come to understand more about the world and human values or preferences, a strong understanding of their behavior could lead to new insights into the world and ourselves. Currently, I am working on:

Interpretability
Theory of ML
AI Safety

Email: shawnim@cs.wisc.edu

Recent Work

How Do Transformers Learn to Associate Tokens: Gradient Leading Terms Bring Mechanistic Interpretability
Shawn Im, Changdae Oh, Zhen Fang, Yixuan Li
International Conference on Learning Representations (ICLR) Oral, 2026

[Paper][Code]

Research

Can DPO Learn Diverse Human Values? A Theoretical Scaling Law
Shawn Im, Yixuan Li
Neural Information Processing Systems (NeurIPS), 2025

[Paper][Code]

Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders
James Oldfield, Shawn Im, Yixuan Li, Mihalis A. Nicolaou, Ioannis Patras, Grigorios G. Chrysos
Neural Information Processing Systems (NeurIPS), 2025

[Paper][Code]

Visual Instruction Bottleneck Tuning
Changdae Oh, Jiatong Li, Shawn Im, Yixuan Li
Neural Information Processing Systems (NeurIPS), 2025

[Paper]

Position: Challenges and Future Directions of Data-Centric AI Alignment
Min-Hsuan Yeh, Jeffrey Wang, Xuefeng Du, Seongheon Park, Leitian Tao, Shawn Im, Yixuan Li
In Proceedings of the International Conference on Machine Learning (ICML), 2025

[Paper]

Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach
Changdae Oh, Zhen Fang, Shawn Im, Xuefeng Du, Yixuan Li
In Proceedings of the International Conference on Machine Learning (ICML), 2025

[Paper]

A Unified Understanding and Evaluation of Steering Methods
Shawn Im, Yixuan Li
Preprint, 2025

[Paper]

Understanding the Learning Dynamics of Alignment with Human Feedback
Shawn Im, Yixuan Li
In Proceedings of the International Conference on Machine Learning (ICML), 2024

[Paper][Code]

Evaluating the Utility of Model Explanations for Model Development
Shawn Im, Jacob Andreas, Yilun Zhou
NeurIPS Workshop on Attributing Model Behavior at Scale (ATTRIB), 2023

[Paper]