Florian Tramèr, Gautam Kamath and Nicholas Carlini won an International Conference on Machine Learning (ICML 2024) best paper award for their work "Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining". In this interview, Gautam summarises some of the key achievements of the paper.
Differential privacy is a rigorous, provable notion of data privacy. Among other things, training a machine learning model with differential privacy can prevent it from regurgitating its training data. The issue is that training a model with differential privacy generally comes at a significant cost to the model's utility. Incorporating "public data" (i.e., data that is not subject to privacy constraints) into the training procedure can help alleviate this concern and increase the resulting model's utility.
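The standard mechanism for training with differential privacy is DP-SGD: clip each example's gradient to a fixed norm, average, and add calibrated Gaussian noise. The following is a minimal numpy sketch on a toy linear model; the function name and hyperparameter values are illustrative, not from the paper:

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_mult=1.0, rng=None):
    """One DP-SGD step for a linear model under squared loss:
    clip per-example gradients, sum, add Gaussian noise, then average."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(X)
    # Per-example gradients of 0.5 * (x . w - y)^2 with respect to w
    residuals = X @ w - y                      # shape (n,)
    grads = residuals[:, None] * X             # shape (n, d)
    # Clip each example's gradient to L2 norm <= clip_norm
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Gaussian noise scaled to the clipping norm (the sensitivity)
    noise = rng.normal(0.0, noise_mult * clip_norm, size=w.shape)
    g = (grads.sum(axis=0) + noise) / n
    return w - lr * g
```

Because each example's influence is bounded by `clip_norm` and the noise is scaled to that bound, no single training example can move the model much; larger `noise_mult` gives stronger privacy but, as noted above, worse utility.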
We challenge the paradigm of pretraining models with public data, and then privately fine-tuning the weights with sensitive data. We question whether such a model ought to be considered privacy-preserving and further speculate about whether such a model is useful for downstream privacy-sensitive tasks.
The three points we raise are the following:
The broad challenge in private machine learning is how to train high-utility models while preserving privacy of the training data, with respect to meaningful privacy semantics.
To address some of the specific issues we identified above, we give a few broad suggestions, though our position paper is focused primarily on identifying and highlighting these problems for the community, rather than resolving them. We suggest that model curators go beyond a naïve dichotomy of treating data as either "public" or "private": such a dichotomy is misaligned with individuals' expectations of privacy norms. Another suggestion is that privacy researchers evaluate their techniques on datasets and settings that more closely resemble those pertinent to privacy-sensitive applications. Finally, we suggest the community take a more holistic view of privacy: rather than focusing primarily on the (important) task of training models with differential privacy, it should go further and connect that work with real-world privacy norms and considerations.
Gautam Kamath is an Assistant Professor at the David R. Cheriton School of Computer Science at the University of Waterloo, and a Canada CIFAR AI Chair and Faculty Member at the Vector Institute. He has a B.S. in Computer Science and Electrical and Computer Engineering from Cornell University, and an M.S. and Ph.D. in Computer Science from the Massachusetts Institute of Technology. He is interested in reliable and trustworthy statistics and machine learning, including considerations such as data privacy and robustness. He was a Microsoft Research Fellow, as part of the Simons-Berkeley Research Fellowship Program at the Simons Institute for the Theory of Computing. He serves as an Editor-in-Chief of Transactions on Machine Learning Research, and is the program committee co-chair of the 36th International Conference on Algorithmic Learning Theory (ALT 2025). He is the recipient of several awards, including the Caspar Bowden Award for Outstanding Research in Privacy Enhancing Technologies, a best paper award at the Forty-first International Conference on Machine Learning (ICML 2024), and the Faculty of Math Golden Jubilee Research Excellence Award.