What’s hot on arXiv? Here are the most tweeted papers that were uploaded onto arXiv during February 2021.
Results are powered by Arxiv Sanity Preserver.
How to represent part-whole hierarchies in a neural network
Submitted to arXiv on: 25 February 2021
Abstract: This paper does not describe a working system. Instead, it presents a single idea about representation which allows advances made by several different groups to be combined into an imaginary system called GLOM. The advances include transformers, neural fields, contrastive representation learning, distillation and capsules. GLOM answers the question: How can a neural network with a fixed architecture parse an image into a part-whole hierarchy which has a different structure for each image? The idea is simply to use islands of identical vectors to represent the nodes in the parse tree. If GLOM can be made to work, it should significantly improve the interpretability of the representations produced by transformer-like systems when applied to vision or language.
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan
Submitted to arXiv on: 11 February 2021
Abstract: Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples. Although recent work has succeeded in training deep ResNets without normalization layers, these models do not match the test accuracies of the best batch-normalized networks, and are often unstable for large learning rates or strong data augmentations. In this work, we develop an adaptive gradient clipping technique which overcomes these instabilities, and design a significantly improved class of Normalizer-Free ResNets. Our smaller models match the test accuracy of an EfficientNet-B7 on ImageNet while being up to 8.7x faster to train, and our largest models attain a new state-of-the-art top-1 accuracy of 86.5%. In addition, Normalizer-Free models attain significantly better performance than their batch-normalized counterparts when finetuning on ImageNet after large-scale pre-training on a dataset of 300 million labeled images, with our best models obtaining an accuracy of 89.2%. Our code is available at this https URL.
DOBF: A Deobfuscation Pre-Training Objective for Programming Languages
Baptiste Roziere, Marie-Anne Lachaux, Marc Szafraniec, Guillaume Lample
Submitted to arXiv on: 15 February 2021
Abstract: Recent advances in self-supervised learning have dramatically improved the state of the art on a wide variety of tasks. However, research in language model pre-training has mostly focused on natural languages, and it is unclear whether models like BERT and its variants provide the best pre-training when applied to other modalities, such as source code. In this paper, we introduce a new pre-training objective, DOBF, that leverages the structural aspect of programming languages and pre-trains a model to recover the original version of obfuscated source code. We show that models pre-trained with DOBF significantly outperform existing approaches on multiple downstream tasks, providing relative improvements of up to 13% in unsupervised code translation, and 24% in natural language code search. Incidentally, we found that our pre-trained model is able to de-obfuscate fully obfuscated source files, and to suggest descriptive variable names.
Learning to identify image manipulations in scientific publications
Ghazal Mazaheri, Kevin Urrutia Avila, Amit K. Roy-Chowdhury
Submitted to arXiv on: 3 February 2021
Abstract: Adherence to scientific community standards ensures objectivity, clarity, reproducibility, and helps prevent bias, fabrication, falsification, and plagiarism. To help scientific integrity officers and journal/publisher reviewers monitor if researchers stick with these standards, it is important to have a solid procedure to detect duplication as one of the most frequent types of manipulation in scientific papers. Images in scientific papers are used to support the experimental description and the discussion of the findings. Therefore, in this work we focus on detecting the duplications in images as one of the most important parts of a scientific paper. We propose a framework that combines image processing and deep learning methods to classify images in the articles as duplicated or unduplicated ones. We show that our method leads to a 90% accuracy rate of detecting duplicated images, a ~ 13% improvement in detection accuracy in comparison to other manipulation detection methods. We also show how effective the pre-processing steps are by comparing our method to other state-of-art manipulation detectors which lack these steps.
Zero-Shot Text-to-Image Generation
Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
Submitted to arXiv on: 24 February 2021
Abstract: Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. These assumptions might involve complex architectures, auxiliary losses, or side information such as object part labels or segmentation masks supplied during training. We describe a simple approach for this task based on a transformer that autoregressively models the text and image tokens as a single stream of data. With sufficient data and scale, our approach is competitive with previous domain-specific models when evaluated in a zero-shot fashion.
Neural Data Augmentation via Example Extrapolation
Kenton Lee, Kelvin Guu, Luheng He, Tim Dozat, Hyung Won Chung
Submitted to arXiv on: 2 February 2021
Abstract: In many applications of machine learning, certain categories of examples may be underrepresented in the training data, causing systems to underperform on such “few-shot” cases at test time. A common remedy is to perform data augmentation, such as by duplicating underrepresented examples, or heuristically synthesizing new examples. But these remedies often fail to cover the full diversity and complexity of real examples. We propose a data augmentation approach that performs neural Example Extrapolation (Ex2). Given a handful of exemplars sampled from some distribution, Ex2 synthesizes new examples that also belong to the same distribution. The Ex2 model is learned by simulating the example generation procedure on data-rich slices of the data, and it is applied to underrepresented, few-shot slices. We apply Ex2 to a range of language understanding tasks and significantly improve over state-of-the-art methods on multiple few-shot learning benchmarks, including for relation extraction (FewRel) and intent classification + slot filling (SNIPS).
Abstraction and Analogy-Making in Artificial Intelligence
Submitted to arXiv on: 22 February 2021
Abstract: Conceptual abstraction and analogy-making are key abilities underlying humans’ abilities to learn, reason, and robustly adapt their knowledge to new domains. Despite of a long history of research on constructing AI systems with these abilities, no current AI system is anywhere close to a capability of forming humanlike abstractions or analogies. This paper reviews the advantages and limitations of several approaches toward this goal, including symbolic methods, deep learning, and probabilistic program induction. The paper concludes with several proposals for designing challenge tasks and evaluation measures in order to make quantifiable and generalizable progress in this area.
TransGAN: Two Transformers Can Make One Strong GAN
Yifan Jiang, Shiyu Chang, Zhangyang Wang
Submitted to arXiv on: 14 February 2021
Abstract: The recent explosive interest on transformers has suggested their potential to become powerful “universal” models for computer vision tasks, such as classification, detection, and segmentation. However, how further transformers can go – are they ready to take some more notoriously difficult vision tasks, e.g., generative adversarial networks (GANs)? Driven by that curiosity, we conduct the first pilot study in building a GAN completely free of convolutions, using only pure transformer-based architectures. Our vanilla GAN architecture, dubbed TransGAN, consists of a memory-friendly transformer-based generator that progressively increases feature resolution while decreasing embedding dimension, and a patch-level discriminator that is also transformer-based. We then demonstrate TransGAN to notably benefit from data augmentations (more than standard GANs), a multi-task co-training strategy for the generator, and a locally initialized self-attention that emphasizes the neighborhood smoothness of natural images. Equipped with those findings, TransGAN can effectively scale up with bigger models and high-resolution image datasets. Specifically, our best architecture achieves highly competitive performance compared to current state-of-the-art GANs based on convolutional backbones. Specifically, TransGAN sets new state-of-the-art IS score of 10.10 and FID score of 25.32 on STL-10. It also reaches competitive 8.64 IS score and 11.89 FID score on Cifar-10, and 12.23 FID score on CelebA 64×64, respectively. We also conclude with a discussion of the current limitations and future potential of TransGAN. The code is available at this https URL.
Self-Supervised Learning of Graph Neural Networks: A Unified Review
Yaochen Xie, Zhao Xu, Zhengyang Wang, Shuiwang Ji
Submitted to arXiv on: 22 February 2021
Abstract: Deep models trained in supervised mode have achieved remarkable success on a variety of tasks. When labeled samples are limited, self-supervised learning (SSL) is emerging as a new paradigm for making use of large amounts of unlabeled samples. SSL has achieved promising performance on natural language and image learning tasks. Recently, there is a trend to extend such success to graph data using graph neural networks (GNNs). In this survey, we provide a unified review of different ways of training GNNs using SSL. Specifically, we categorize SSL methods into contrastive and predictive models. In either category, we provide a unified framework for methods as well as how these methods differ in each component under the framework. Our unified treatment of SSL methods for GNNs sheds light on the similarities and differences of various methods, setting the stage for developing new methods and algorithms. We also summarize different SSL settings and the corresponding datasets used in each setting. To facilitate methodological development and empirical comparison, we develop a standardized testbed for SSL in GNNs, including implementations of common baseline methods, datasets, and evaluation metrics.