AIhub.org

Hot papers on arXiv from the past month – September 2020

by Lucy Smith

07 October 2020

What’s hot on arXiv? Here are the most tweeted papers uploaded to arXiv during September 2020.

Results are powered by Arxiv Sanity Preserver.

The Hardware Lottery
Sara Hooker
Submitted to arXiv on: 14 September 2020

Abstract: Hardware, systems and algorithms research communities have historically had different incentive structures and fluctuating motivation to engage with each other explicitly. This historical treatment is odd given that hardware and software have frequently determined which research ideas succeed (and fail). This essay introduces the term hardware lottery to describe when a research idea wins because it is suited to the available software and hardware and not because the idea is superior to alternative research directions. Examples from early computer science history illustrate how hardware lotteries can delay research progress by casting successful ideas as failures. These lessons are particularly salient given the advent of domain specialized hardware which make it increasingly costly to stray off of the beaten path of research ideas. This essay posits that the gains from progress in computing are likely to become even more uneven, with certain research directions moving into the fast-lane while progress on others is further obstructed.

336 tweets

It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
Timo Schick, Hinrich Schütze
Submitted to arXiv on: 15 September 2020

Abstract: When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance on challenging natural language understanding benchmarks. In this work, we show that performance similar to GPT-3 can be obtained with language models whose parameter count is several orders of magnitude smaller. This is achieved by converting textual inputs into cloze questions that contain some form of task description, combined with gradient-based optimization; additionally exploiting unlabeled data gives further improvements. Based on our findings, we identify several key factors required for successful natural language understanding with small language models.

85 tweets

Brain2Word: Decoding Brain Activity for Language Generation
Nicolas Affolter, Beni Egressy, Damian Pascual, Roger Wattenhofer
Submitted to arXiv on: 10 September 2020

Abstract: Brain decoding, understood as the process of mapping brain activities to the stimuli that generated them, has been an active research area in the last years. In the case of language stimuli, recent studies have shown that it is possible to decode fMRI scans into an embedding of the word a subject is reading. However, such word embeddings are designed for natural language processing tasks rather than for brain decoding. Therefore, they limit our ability to recover the precise stimulus. In this work, we propose to directly classify an fMRI scan, mapping it to the corresponding word within a fixed vocabulary. Unlike existing work, we evaluate on scans from previously unseen subjects. We argue that this is a more realistic setup and we present a model that can decode fMRI data from unseen subjects. Our model achieves 5.22% Top-1 and 13.59% Top-5 accuracy in this challenging task, significantly outperforming all the considered competitive baselines. Furthermore, we use the decoded words to guide language generation with the GPT-2 model. This way, we advance the quest for a system that translates brain activities into coherent text.

83 tweets

Modern Methods for Text Generation
Dimas Munoz Montesinos
Submitted to arXiv on: 10 September 2020

Abstract: Synthetic text generation is challenging and has limited success. Recently, a new architecture, called Transformers, allow machine learning models to understand better sequential data, such as translation or summarization. BERT and GPT-2, using Transformers in their cores, have shown a great performance in tasks such as text classification, translation and NLI tasks. In this article, we analyse both algorithms and compare their output quality in text generation tasks.

77 tweets

Efficient Transformers: A Survey
Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
Submitted to arXiv on: 14 September 2020

Abstract: Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. In the field of natural language processing for example, Transformers have become an indispensable staple in the modern deep learning stack. Recently, a dizzying number of “X-former” models have been proposed – Reformer, Linformer, Performer, Longformer, to name a few – which improve upon the original Transformer architecture, many of which make improvements around computational and memory efficiency. With the aim of helping the avid researcher navigate this flurry, this paper characterizes a large and thoughtful selection of recent efficiency-flavored “X-former” models, providing an organized and comprehensive overview of existing work and models across multiple domains.

61 tweets

Physically Embedded Planning Problems: New Challenges for Reinforcement Learning
Mehdi Mirza, Andrew Jaegle, Jonathan J. Hunt, Arthur Guez, Saran Tunyasuvunakool, Alistair Muldal, Théophane Weber, Peter Karkus, Sébastien Racanière, Lars Buesing, Timothy Lillicrap, Nicolas Heess
Submitted to arXiv on: 11 September 2020

Abstract: Recent work in deep reinforcement learning (RL) has produced algorithms capable of mastering challenging games such as Go, chess, or shogi. In these works the RL agent directly observes the natural state of the game and controls that state directly with its actions. However, when humans play such games, they do not just reason about the moves but also interact with their physical environment. They understand the state of the game by looking at the physical board in front of them and modify it by manipulating pieces using touch and fine-grained motor control. Mastering complicated physical systems with abstract goals is a central challenge for artificial intelligence, but it remains out of reach for existing RL algorithms. To encourage progress towards this goal we introduce a set of physically embedded planning problems and make them publicly available. We embed challenging symbolic tasks (Sokoban, tic-tac-toe, and Go) in a physics engine to produce a set of tasks that require perception, reasoning, and motor control over long time horizons. Although existing RL algorithms can tackle the symbolic versions of these tasks, we find that they struggle to master even the simplest of their physically embedded counterparts. As a first step towards characterizing the space of solution to these tasks, we introduce a strong baseline that uses a pre-trained expert game player to provide hints in the abstract space to an RL agent’s policy while training it on the full sensorimotor control task. The resulting agent solves many of the tasks, underlining the need for methods that bridge the gap between abstract planning and embodied control.

43 tweets

Generative Language Modeling for Automated Theorem Proving
Stanislas Polu, Ilya Sutskever
Submitted to arXiv on: 7 September 2020

Abstract: We explore the application of transformer-based language models to automated theorem proving. This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans — the generation of original mathematical terms — might be addressable via generation from language models. We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance. GPT-f found new short proofs that were accepted into the main Metamath library, which is to our knowledge, the first time a deep-learning based system has contributed proofs that were adopted by a formal mathematics community.

39 tweets

Layered Neural Rendering for Retiming People in Video
Erika Lu, Forrester Cole, Tali Dekel, Weidi Xie, Andrew Zisserman, David Salesin, William T. Freeman, Michael Rubinstein
Submitted to arXiv on: 16 September 2020

Abstract: We present a method for retiming people in an ordinary, natural video—manipulating and editing the time in which different motions of individuals in the video occur. We can temporally align different motions, change the speed of certain actions (speeding up/slowing down, or entirely “freezing” people), or “erase” selected people from the video altogether. We achieve these effects computationally via a dedicated learning-based layered video representation, where each frame in the video is decomposed into separate RGBA layers, representing the appearance of different people in the video. A key property of our model is that it not only disentangles the direct motions of each person in the input video, but also correlates each person automatically with the scene changes they generate—e.g., shadows, reflections, and motion of loose clothing. The layers can be individually retimed and recombined into a new video, allowing us to achieve realistic, high-quality renderings of retiming effects for real-world videos depicting complex actions and involving multiple individuals, including dancing, trampoline jumping, or group running.

36 tweets

Kohn-Sham equations as regularizer: building prior knowledge into machine-learned physics
Li Li, Stephan Hoyer, Ryan Pederson, Ruoxi Sun, Ekin D. Cubuk, Patrick Riley, Kieron Burke
Submitted to arXiv on: 17 September 2020

Abstract: Including prior knowledge is important for effective machine learning models in physics, and is usually achieved by explicitly adding loss terms or constraints on model architectures. Prior knowledge embedded in the physics computation itself rarely draws attention. We show that solving the Kohn-Sham equations when training neural networks for the exchange-correlation functional provides an implicit regularization that greatly improves generalization. Two separations suffice for learning the entire one-dimensional H2 dissociation curve within chemical accuracy, including the strongly correlated region. Our models also generalize to unseen types of molecules and overcome self-interaction error.

35 tweets

tags: arXiv

Lucy Smith is Senior Managing Editor for AIhub.
