The information bottleneck principle, a mathematical formulation of Occam's Razor, aims to create latent representations that are sufficient for a task and maximally compressed – a minimal sufficient statistic. In this talk, we first critically reflect on the application of the information bottleneck principle in deep learning, addressing the question of whether and how compression can be connected to generalization performance. We discuss theoretical, experimental, and engineering evidence in the form of non-vacuous generalization bounds, information plane analyses, and neural classifiers successfully trained using the information bottleneck principle. Taken together, these three perspectives suggest that compressed representations help improve generalization and robustness.
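As background (a standard formulation from the information bottleneck literature, not taken from the talk itself), the principle can be written as a Lagrangian that trades off compression of the input X against sufficiency for the task variable Y:

```latex
% Information bottleneck Lagrangian (standard formulation):
% find a stochastic encoder p(z|x) that compresses X into a
% representation Z while keeping Z informative about Y.
\min_{p(z \mid x)} \; I(X;Z) \;-\; \beta \, I(Z;Y), \qquad \beta \ge 0
```

Here larger values of β favor sufficiency (keeping task-relevant information) over compression.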
In the second, shorter part of the talk, we argue that the (variational) approaches used to implement the intractable information bottleneck objective can also be successfully used to implement other information-theoretic objectives. We illustrate this with the example of invariant representation learning for fair classification. We show that the resulting method has interesting and desirable properties, suggesting that information-theoretic objectives can be useful ingredients for deep learning.
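To make the variational approach concrete, the sketch below shows a widely used variational bound on the information bottleneck objective, in the style of the Deep Variational Information Bottleneck (Alemi et al.). It is a generic illustration assuming a Gaussian encoder and a standard-normal prior, not the specific method presented in the talk; all names and the `beta` parameter are illustrative.

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    # Sample z = mu + sigma * eps with eps ~ N(0, I),
    # so gradients flow through the encoder parameters.
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

def vib_loss(mu, logvar, logits, labels, beta=1e-3):
    """Variational information bottleneck loss (illustrative sketch).

    mu, logvar : parameters of the Gaussian encoder q(z|x)
    logits     : decoder predictions from a sample z ~ q(z|x)
    labels     : task labels
    beta       : trade-off between compression and task performance
    """
    # Variational upper bound on I(X;Z): KL(q(z|x) || N(0, I)),
    # available in closed form for a diagonal Gaussian encoder.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()
    # Variational lower bound on I(Z;Y): cross-entropy of the decoder.
    ce = F.cross_entropy(logits, labels)
    return ce + beta * kl
```

Swapping the cross-entropy term for a different variational bound is what allows the same machinery to implement other information-theoretic objectives, such as the invariance constraints used for fair classification.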
REGISTER HERE
Bernhard C. Geiger (Know-Center GmbH, Graz, Austria)
Bernhard C. Geiger received the Dipl.-Ing. degree in Electrical Engineering (with distinction), the Dr. techn. degree in Electrical and Information Engineering (with distinction), and the venia docendi in Theoretical Information Engineering from Graz University of Technology, Austria, in 2009, 2014, and 2023, respectively. In 2010, he joined the Signal Processing and Speech Communication Laboratory, Graz University of Technology, as a Research and Teaching Associate. He was a Senior Scientist and Erwin Schrödinger Fellow at the Institute for Communications Engineering, Technical University of Munich, Germany, from 2014 to 2017. He is currently a Key Researcher at Know-Center GmbH, Graz, Austria, where he leads the research area on Methods & Algorithms for Artificial Intelligence. His research interests cover information theory for signal processing and machine learning, theory-assisted machine learning, and information-theoretic model reduction for Markov chains and hidden Markov models.
*********
All the recordings of past AI-Cafés are available on this YouTube channel.
*********