Hot papers on arXiv from the past month – June 2020
What’s hot on arXiv? Here are the most tweeted papers that were uploaded onto arXiv during June 2020.
Results are powered by Arxiv Sanity Preserver.
End-to-End Adversarial Text-to-Speech
Jeff Donahue, Sander Dieleman, Mikołaj Bińkowski, Erich Elsen, Karen Simonyan
Submitted to arXiv on: 5 June 2020
Abstract: Modern text-to-speech synthesis pipelines typically involve multiple processing stages, each of which is designed or learnt independently from the rest. In this work, we take on the challenging task of learning to synthesise speech from normalised text or phonemes in an end-to-end manner, resulting in models which operate directly on character or phoneme input sequences and produce raw speech audio outputs. Our proposed generator is feed-forward and thus efficient for both training and inference, using a differentiable monotonic interpolation scheme to predict the duration of each input token. It learns to produce high fidelity audio through a combination of adversarial feedback and prediction losses constraining the generated audio to roughly match the ground truth in terms of its total duration and mel-spectrogram. To allow the model to capture temporal variation in the generated audio, we employ soft dynamic time warping in the spectrogram-based prediction loss. The resulting model achieves a mean opinion score exceeding 4 on a 5 point scale, which is comparable to the state-of-the-art models relying on multi-stage training and additional supervision..
DivNoising: Diversity Denoising with Fully Convolutional Variational Autoencoders
Mangal Prakash, Alexander Krull, Florian Jug
Submitted to arXiv on: 10 June 2020
Abstract: Deep Learning based methods have emerged as the indisputable leaders for virtually all image restoration tasks. Especially in the domain of microscopy images, various content-aware image restoration (CARE) approaches are now used to improve the interpretability of acquired data. But there are limitations to what can be restored in corrupted images, and any given method needs to make a sensible compromise between many possible clean signals when predicting a restored image. Here, we propose DivNoising — a denoising approach based on fully-convolutional variational autoencoders, overcoming this problem by predicting a whole distribution of denoised images. Our method is unsupervised, requiring only noisy images and a description of the imaging noise, which can be measured or bootstrapped from noisy data. If desired, consensus predictions can be inferred from a set of DivNoising predictions, leading to competitive results with other unsupervised methods and, on occasion, even with the supervised state-of-the-art. DivNoising samples from the posterior enable a plethora of useful applications. We are (i) discussing how optical character recognition (OCR) applications could benefit from diverse predictions on ambiguous data, and (ii) show in detail how instance cell segmentation gains performance when using diverse DivNoising predictions.
Are we done with ImageNet?
Lucas Beyer, Olivier J. Hénaff, Alexander Kolesnikov, Xiaohua Zhai, Aäron van den Oord
Submitted to arXiv on: 12 June 2020
Abstract: Yes, and no. We ask whether recent progress on the ImageNet classification benchmark continues to represent meaningful generalization, or whether the community has started to overfit to the idiosyncrasies of its labeling procedure. We therefore develop a significantly more robust procedure for collecting human annotations of the ImageNet validation set. Using these new labels, we reassess the accuracy of recently proposed ImageNet classifiers, and find their gains to be substantially smaller than those reported on the original labels. Furthermore, we find the original ImageNet labels to no longer be the best predictors of this independently-collected set, indicating that their usefulness in evaluating vision models may be nearing an end. Nevertheless, we find our annotation procedure to have largely remedied the errors in the original labels, reinforcing ImageNet as a powerful benchmark for future research in visual recognition.
Unsupervised Translation of Programming Languages
Marie-Anne Lachaux, Baptiste Roziere, Lowik Chanussot, Guillaume Lample
Submitted to arXiv on: 5 June 2020
Abstract: A transcompiler, also known as source-to-source translator, is a system that converts source code from a high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules, applied to the source code abstract syntax tree. Unfortunately, the resulting translations often lack readability, fail to respect the target language conventions, and require manual modifications in order to work properly. The overall translation process is timeconsuming and requires expertise in both the source and target languages, making code-translation projects expensive. Although neural models significantly outperform their rule-based counterparts in the context of natural language translation, their applications to transcompilation have been limited due to the scarcity of parallel data in this domain. In this paper, we propose to leverage recent approaches in unsupervised machine translation to train a fully unsupervised neural transcompiler. We train our model on source code from open source GitHub projects, and show that it can translate functions between C++, Java, and Python with high accuracy. Our method relies exclusively on monolingual source code, requires no expertise in the source or target languages, and can easily be generalized to other programming languages. We also build and release a test set composed of 852 parallel functions, along with unit tests to check the correctness of translations. We show that our model outperforms rule-based commercial baselines by a significant margin.
Differentiable Rendering: A Survey
Hiroharu Kato, Deniz Beker, Mihai Morariu, Takahiro Ando, Toru Matsuoka, Wadim Kehl, Adrien Gaidon
Submitted to arXiv on: 22 June 2020
Abstract: Deep neural networks (DNNs) have shown remarkable performance improvements on vision-related tasks such as object detection or image segmentation. Despite their success, they generally lack the understanding of 3D objects which form the image, as it is not always possible to collect 3D information about the scene or to easily annotate it. Differentiable rendering is a novel field which allows the gradients of 3D objects to be calculated and propagated through images. It also reduces the requirement of 3D data collection and annotation, while enabling higher success rate in various applications. This paper reviews existing literature and discusses the current state of differentiable rendering, its applications and open research problems.
Revisiting Few-sample BERT Fine-tuning
Tianyi Zhang, Felix Wu, Arzoo Katiyar, Kilian Q. Weinberger, Yoav Artzi
Submitted to arXiv on: 10 June 2020
Abstract: We study the problem of few-sample fine-tuning of BERT contextual representations, and identify three sub-optimal choices in current, broadly adopted practices. First, we observe that the omission of the gradient bias correction in the \bertadam optimizer results in fine-tuning instability. We also find that parts of the BERT network provide a detrimental starting point for fine-tuning, and simply re-initializing these layers speeds up learning and improves performance. Finally, we study the effect of training time, and observe that commonly used recipes often do not allocate sufficient time for training. In light of these findings, we re-visit recently proposed methods to improve few-sample fine-tuning with BERT and re-evaluate their effectiveness. Generally, we observe a decrease in their relative impact when modifying the fine-tuning process based on our findings.
The Lipschitz Constant of Self-Attention
Hyunjik Kim, George Papamakarios, Andriy Mnih
Submitted to arXiv on: 8 June 2020
Abstract: Lipschitz constants of neural networks have been explored in various contexts in deep learning, such as provable adversarial robustness, estimating Wasserstein distance, stabilising training of GANs, and formulating invertible neural networks. Such works have focused on bounding the Lipschitz constant of fully connected or convolutional networks, composed of linear maps and pointwise non-linearities. In this paper, we investigate the Lipschitz constant of self-attention, a non-linear neural network module widely used in sequence modelling. We prove that the standard dot-product self-attention is not Lipschitz, and propose an alternative L2 self-attention that is Lipschitz. We derive an upper bound on the Lipschitz constant of L2 self-attention and provide empirical evidence for its asymptotic tightness. To demonstrate the practical relevance of the theory, we formulate invertible self-attention and use it in a Transformer-based architecture for a character-level language modelling task.
Cascaded Text Generation with Markov Transformers
Yuntian Deng, Alexander M. Rush
Submitted to arXiv on: 1 June 2020
Abstract: The two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies. This work proposes an autoregressive model with sub-linear parallel time generation. Noting that conditional random fields with bounded context can be decoded in parallel, we propose an efficient cascaded decoding approach for generating high-quality output. To parameterize this cascade, we introduce a Markov transformer, a variant of the popular fully autoregressive model that allows us to simultaneously decode with specific autoregressive context cutoffs. This approach requires only a small modification from standard autoregressive training, while showing competitive accuracy/speed tradeoff compared to existing methods on five machine translation datasets.
Is deep learning necessary for simple classification tasks?
Joseph D. Romano, Trang T. Le, Weixuan Fu, Jason H. Moore
Submitted to arXiv on: 11 June 2020
Abstract: Automated machine learning (AutoML) and deep learning (DL) are two cutting-edge paradigms used to solve a myriad of inductive learning tasks. In spite of their successes, little guidance exists for when to choose one approach over the other in the context of specific real-world problems. Furthermore, relatively few tools exist that allow the integration of both AutoML and DL in the same analysis to yield results combining both of their strengths. Here, we seek to address both of these issues, by (1.) providing a head-to-head comparison of AutoML and DL in the context of binary classification on 6 well-characterized public datasets, and (2.) evaluating a new tool for genetic programming-based AutoML that incorporates deep estimators. Our observations suggest that AutoML outperforms simple DL classifiers when trained on similar datasets for binary classification but integrating DL into AutoML improves classification performance even further. However, the substantial time needed to train AutoML+DL pipelines will likely outweigh performance advantages in many applications.
Discovering Symbolic Models from Deep Learning with Inductive Biases
Miles Cranmer, Alvaro Sanchez-Gonzalez, Peter Battaglia, Rui Xu, Kyle Cranmer, David Spergel, Shirley Ho
Submitted to arXiv on: 19 June 2020
Abstract: We develop a general approach to distill symbolic representations of a learned deep model by introducing strong inductive biases. We focus on Graph Neural Networks (GNNs). The technique works as follows: we first encourage sparse latent representations when we train a GNN in a supervised setting, then we apply symbolic regression to components of the learned model to extract explicit physical relations. We find the correct known equations, including force laws and Hamiltonians, can be extracted from the neural network. We then apply our method to a non-trivial cosmology example-a detailed dark matter simulation-and discover a new analytic formula which can predict the concentration of dark matter from the mass distribution of nearby cosmic structures. The symbolic expressions extracted from the GNN using our technique also generalized to out-of-distribution data better than the GNN itself. Our approach offers alternative directions for interpreting neural networks and discovering novel physical principles from the representations they learn.