The NeurIPS outstanding paper awards highlight some of the most notable papers at the conference. Find out who won the awards and read summaries of their work below.

**Distribution-Independent PAC Learning of Halfspaces with Massart Noise**

Ilias Diakonikolas, Themis Gouleakis and Christos Tzamos

The authors solve a long-standing foundational problem in theoretical machine learning. The problem is to learn a linear boundary between positive and negative examples in a high-dimensional space. They showed how to create a learning algorithm that can find the boundary with guaranteed computational efficiency even when a fraction of the examples are mislabeled as positive when they are actually negative and a possibly different fraction of the examples are mislabeled as negative when they are actually positive. The algorithm also provides a guarantee that the discovered boundary is accurate. The authors go on to argue that it is unlikely that any efficiently computable approach can achieve a better accuracy guarantee. Since many types of machine-learning algorithms are extensions of this boundary-identification problem, resolving this open question is significant and may provide techniques for solving related problems.

**Uniform convergence may be unable to explain generalization in deep learning**

Vaishnavh Nagarajan and J. Zico Kolter

The success of deep learning algorithms has raised important new questions about the foundations of machine learning. Deep networks are typically over parameterized, meaning that they have considerably more complexity than is needed to solve the problem at hand. The authors examine the mathematical tools people have tried to use to explain why over parameterized deep networks perform so well when decades of analysis suggest that over parameterized networks should fall prey to overfitting their training data and thereby failing to generalize. The authors show that these tools, when applied to deep networks, are not going to be able to provide a basis for understanding their benefits. The important message of this paper is that researchers will have to develop new mathematical methods to explain deep networks’ mysterious resistance to overfitting and consequent practical power.

**Nonparametric Density Estimation & Convergence Rates for GANs under Besov IPM Losses**

Ananya Uppal, Shashank Singh and Barnabás Póczos

An essential machine-learning problem is capturing the statistical relationships among the features in a collection of data. The authors study a family of estimation methods for this problem that includes generative adversarial networks (GANs), an approach that has revolutionized the production of novel photographic quality images. They provide mathematical bounds that precisely relate the loss function used to measure the approximation quality of estimates to assumptions on the generative process that created the data. They use these bounds to determine how much data is needed before an accurate estimate can be made, and provide essential insights into why GANs perform so well compared to classical methods. These insights could lead to a better understanding of deep methods and new learning approaches for characterizing complex data sources.

**Fast and Accurate Least-Mean-Squares Solvers**

Alaa Maalouf, Ibrahim Jubran and Dan Feldman

Least-mean squares solving is a core subproblem in many areas of machine learning and statistics. The authors provide a new approach to solving these problems quickly and accurately. They make use of a clever combination of a geometry result from 1907 and modern computer science data structures for fast summarizations of large dynamic data sets. Their divide-and-conquer algorithm is a theoretical advance for solving the problem of summarizing high dimensional points as convex combinations of a small subset of those points and it also suggests a path to improving other related algorithms. The authors report numerical experiments on large data sets showing up to a 100-fold speed up. They also promise open access to their source code so others will be able to benefit directly from this advance. This work will likely produce faster machine-learning tools for machine-learning practitioners.

**Putting An End to End-to-End: Gradient-Isolated Learning of Representations**

Sindy Löwe, Peter O’Connor and Bastiaan S. Veeling

The authors provide a neural network training procedure that could be an attractive alternative to standard deep “end-to-end” learning approaches. In place of training a single giant network with a global optimization objective fed with supervised training examples, the authors show the modules in their network can train themselves based on exposure to input data without requiring access to supervised training examples. Once the modules learn to retain as much information as possible from their inputs, the authors show that the resulting representation can trivially be used to classify speech and image data with a minimum of supervised examples. Their algorithm is efficient to train and is more plausible as a model of learning in natural systems, which also learn in a more piecemeal distributed fashion. A practical benefit of this work is that large neural networks can be trained in distinct modules, allowing much larger networks to be created using existing computer hardware.

**Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations**

Vincent Sitzmann, Michael Zollhöfer and Gordon Wetzstein

To solve the problem of 3d-scene understanding, a machine needs to translate its perceptual inputs to a detailed description of what is in the scene and where those things are. The authors present an approach to this problem in which the scene itself is represented internally as a kind of neural network that maps position to appearance and an associated differentiable neural “renderer” that turns objects into images and vice versa. They show that their method can be trained using 2d images along with knowledge of the position of the camera that took the image. The learned representation can then be used to generate close ups of the object, images of the object taken from new angles, and informed guesses as to an object’s 3d shape from a single image. The work represents a valuable link between the fields of computer graphics and of machine-learning-based perception and could lead to better tools for capturing and processing information about real world objects.

**Dual Averaging Method for Regularized Stochastic Learning and Online Optimization**

Lin Xiao

The author combined two powerful ideas in machine-learning — regularization, a cost that encourages learning algorithms to find simpler models when possible — and online learning, a setting where learning takes place incrementally as new examples are presented. A key influence of some kinds of regularization — not possible in prior work on online learning — is to make models “sparse” by setting many weights in the network to zero. Sparse models can be more efficient to apply in practice and are often easier to understand and explain. The paper was quite influential in the highly mathematical machine-learning optimization community, garnering over 600 citations to date. It has provided a jumping off point for a variety of new algorithms being applied broadly within the community.

Featured image credit: Photo by Giorgio Trovato on Unsplash

Michael Littman
is a Professor and co-director of the Humanity Centered Robotics Initiative, Brown University.

A list of free-to-attend AI-related seminars that are scheduled to take place between 11 May and 30 June 2021.

11 May 2021, by
Lucy Smith

Find out how Cambridge researchers are using deep-learning to assist pathologists.

10 May 2021, by
University of Cambridge

We propose a method for using offline data to build a prediction model that only requires access to the available subset of confounders at prediction time.

07 May 2021, by
ML@CMU