The AAAI-22 awards were announced on Thursday 24 February during the opening ceremony of the 36th AAAI Conference on Artificial Intelligence (AAAI2022). In addition to the awards for outstanding paper, distinguished papers, and best demonstration, a new award for outstanding student paper was introduced this year. The winners, and runners-up are as follows:
Online certification of preference-based fairness for personalized recommender systems
Virginie Do, Sam Corbett-Davies, Jamal Atif, Nicolas Usunier
Abstract: Recommender systems are facing scrutiny because of their growing impact on the opportunities we have access to. Current audits for fairness are limited to coarse-grained parity assessments at the level of sensitive groups. We propose to audit for envy-freeness, a more granular criterion aligned with individual preferences: every user should prefer their recommendations to those of other users. Since auditing for envy requires to estimate the preferences of users beyond their existing recommendations, we cast the audit as a new pure exploration problem in multi-armed bandits. We propose a sample-efficient algorithm with theoretical guarantees that it does not deteriorate user experience. We also study the trade-offs achieved on real-world recommendation datasets.
Read the full paper on arXiv.
Bayesian persuasion in sequential decision-making
Jiarui Gan, Rupak Majumdar, Goran Radanovic, Adish Singla
Abstract: We study a dynamic model of Bayesian persuasion in sequential decision-making settings. An informed principal observes an external parameter of the world and advises an uninformed agent about actions to take over time. The agent takes actions in each time step based on the current state, the principal’s advice/signal, and beliefs about the external parameter. The action of the agent updates the state according to a stochastic process. The model arises naturally in many applications, e.g., an app (the principal) can advice the user (the agent) on possible choices between actions based on additional real-time information the app has. We study the problem of designing a signaling strategy from the principal’s point of view. We show that the principal has an optimal strategy against a myopic agent, who only optimizes their rewards locally, and the optimal strategy can be computed in polynomial time. In contrast, it is NP-hard to approximate an optimal policy against a far-sighted agent. Further, we show that if the principal has the power to threaten the agent by not providing future signals, then we can efficiently design a threat-based strategy. This strategy guarantees the principal’s payoff as if playing against an agent who is far-sighted but myopic to future signals.
Read the full paper on arXiv.
Operator-potential heuristics for symbolic search
Daniel Fišer, Alvaro Torralba, Joerg Hoffmann
Abstract: Symbolic search, using Binary Decision Diagrams (BDDs) to represent sets of states, is a competitive approach to optimal planning. Yet heuristic search in this context remains challenging. The many advances on admissible planning heuristics are not directly applicable, as they evaluate one state at a time. Indeed, progress using heuristic functions in symbolic search has been limited and even very informed heuristics have been shown to be detrimental. Here we show how this connection can be made stronger for LP-based potential heuristics. Our key observation is that, for this family of heuristic functions, the change of heuristic value induced by each operator can be precomputed. This facilitates their smooth integration into symbolic search. Our experiments show that this can pay off significantly: we establish a new state of the art in optimal symbolic planning.
Read the full paper on here.
InfoLM: a new metric to evaluate summarization and Data2Text generation
Pierre Colombo, Chloé Clavel, Pablo Piantanida
Abstract: Assessing the quality of natural language generation (NLG) systems through human annotation is very expensive. Additionally, human annotation campaigns are time-consuming and include non-reusable human labour. In practice, researchers rely on automatic metrics as a proxy of quality. In the last decade, many string-based metrics (e.g., BLEU or ROUGE) have been introduced. However, such metrics usually rely on exact matches and thus, do not robustly handle synonyms. In this paper, we introduce InfoLM a family of untrained metrics that can be viewed as a string-based metric that addresses the aforementioned flaws thanks to a pre-trained masked language model. This family of metrics also makes use of information measures allowing the possibility to adapt InfoLM to different evaluation criteria. Using direct assessment, we demonstrate that InfoLM achieves statistically significant improvement and two figure correlation gains in many configurations compared to existing metrics on both summarization and data2text generation tasks.
Read the full paper on arXiv.
Compilation of aggregates in ASP systems
Giuseppe Mazzotta, Francesco Ricca, Carmine Dodaro
Abstract: Answer Set Programming (ASP) is a well-known problem-solving formalism in computational logic. Nowadays, ASP is used in many real world scenarios thanks to ASP solvers. Standard evaluation of ASP programs suffers from an intrinsic limitation, knows as Grounding Bottleneck, due to the grounding of some rules that could fit all the available memory. As a result, there exist instances of real world problems that are untractable using the standard Ground and Solve approach. In order to tackle this problem, different strategies have been proposed. Among them we focus on a recent approach based on compilation of problematic constraints as propagators, which revealed to be very promising but is currently limited to constraints without aggregates. Since aggregates are widely used in ASP, in this paper we extend such an approach also to constraints containing aggregates. Good results, that proof the effectiveness of the proposed approach, have been reached.
Read the full paper on arXiv.
Entropy estimation via normalizing flow
Ziqiao Ao, Jinglai Li
Abstract: Entropy estimation is an important problem in information theory and statistical science. Many popular entropy estimators suffer from fast growing estimation bias with respect to dimensionality, rendering them unsuitable for high dimensional problems. In this work we propose a transformbased method for high dimensional entropy estimation, which consists of the following two main ingredients. First by modifying the k-NN based entropy estimator developed in (Kozachenko & Leonenko, 1987), we propose a new estimator which enjoys small estimation bias for samples that are close to a uniform distribution. Second we design a normalizing flow based mapping that pushes samples toward a uniform distribution, and the relation between the entropy of the original samples and the transformed ones is also derived. As a result the entropy of a given set of samples is estimated by first transforming them toward a uniform distribution and then applying the proposed estimator to the transformed samples. Numerical experiments demonstrate the effectiveness of the method for high dimensional entropy estimation problems.
AlphaHoldem: high-performance artificial intelligence for heads-up no-limit poker via end-to-end reinforcement learning
Enmin Zhao, Renye Yan, Jinqiu Li, Kai Li, Junliang Xing
Abstract: Heads-up no-limit Texas hold’em (HUNL) is the quintessential game with imperfect information. Representative prior works like DeepStack and Libratus heavily rely on counterfactual regret minimization (CFR) and its variants to tackle HUNL. However, the prohibitive computation cost of CFR iteration makes it difficult for subsequent researchers to learn the CFR model in HUNL and apply it in other practical applications. In this work, we present AlphaHoldem, a high-performance and lightweight HUNL AI obtained with an end-to-end self-play reinforcement learning framework. The proposed framework adopts a pseudo-Siamese architecture to directly learn from the input state information to the output actions by competing the learned model with its different historical versions. The main technical contributions include a novel state representation of card and betting information, a multi-task self-play training loss function, and a new model evaluation and selection metric to generate the final model. In a study involving 100,000 hands of poker, AlphaHoldem defeats Slumbot and DeepStack using only one PC with three days training. At the same time, AlphaHoldem only takes four milliseconds for each decision-making using only a single CPU core, more than 1,000 times faster than DeepStack. We will provide an online testing platform of AlphaHoldem to facilitate further studies in this direction.
Certified symmetry and dominance breaking for combinatorial optimisation
Bart Bogaerts, Stephan Gocht, Ciaran McCreesh, Jakob Nordström
Abstract: Symmetry and dominance breaking can be crucial for solving hard combinatorial search and optimisation problems, but the correctness of these techniques sometimes relies on subtle arguments. For this reason, it is desirable to produce efficient, machine-verifiable certificates that solutions have been computed correctly. Building on the cutting planes proof system, we develop a certification method for optimisation problems in which symmetry and dominance breaking are easily expressible. Our experimental evaluation demonstrates that we can efficiently verify fully general symmetry breaking in Boolean satisfiability (SAT) solving, thus providing, for the first time, a unified method to certify a range of advanced SAT techniques that also includes XOR and cardinality reasoning. In addition, we apply our method to maximum clique solving and constraint programming as a proof of concept that the approach applies to a wider range of combinatorial problems.
Online elicitation of necessarily optimal matchings
Abstract: In this paper, we study the problem of eliciting preferences of agents in the house allocation model. For this we build on a recent model of Hosseini et al.[AAAI’21] and focus on the task of eliciting preferences to find matchings which are necessarily optimal, i.e., optimal under all possible completions of the elicited preferences. In particular, we follow the approach of Hosseini et al. and investigate the elicitation of necessarily Pareto optimal (NPO) and necessarily rank-maximal (NRM) matchings. Most importantly, we answer their open question and give an online algorithm for eliciting an NRM matching in the next-best query model which is 3/2-competitive,i.e., it takes at most 3/2 as many queries as an optimal algorithm. Besides this, we extend this field of research by introducing two new natural models of elicitation and by studying both the complexity of determining whether a necessarily optimal matching exists in them, and by giving online algorithms for these models.
Read the full paper on arXiv.
Sampling-based robust control of autonomous systems with non-Gaussian noise
Thom S. Badings, Alessandro Abate, Nils Jansen, David Parker, Hasan A. Poonawala, Marielle Stoelinga
Abstract: Controllers for autonomous systems that operate in safety-critical settings must account for stochastic disturbances. Such disturbances are often modeled as process noise, and common assumptions are that the underlying distributions are known and/or Gaussian. In practice, however, these assumptions may be unrealistic and can lead to poor approximations of the true noise distribution. We present a novel planning method that does not rely on any explicit representation of the noise distributions. In particular, we address the problem of computing a controller that provides probabilistic guarantees on safely reaching a target. First, we abstract the continuous system into a discrete-state model that captures noise by probabilistic transitions between states. As a key contribution, we adapt tools from the scenario approach to compute probably approximately correct (PAC) bounds on these transition probabilities, based on a finite number of samples of the noise. We capture these bounds in the transition probability intervals of a so-called interval Markov decision process (iMDP). This iMDP is robust against uncertainty in the transition probabilities, and the tightness of the probability intervals can be controlled through the number of samples. We use state-of-the-art verification techniques to provide guarantees on the iMDP, and compute a controller for which these guarantees carry over to the autonomous system. Realistic benchmarks show the practical applicability of our method, even when the iMDP has millions of states or transitions.
Read the full paper on arXiv.
Subset approximation of Pareto regions with bi-objective A*
Jorge A. Baier, Carlos Hernández, Nicolás Rivera
Abstract: In bi-objective search, we are given a graph in which each directed arc is associated with a pair of non-negative weights, and the objective is to find the Pareto-optimal solution set. Unfortunately, in many practical settings, this set is too large, and therefore its computation is very time-consuming. In addition, even though bi-objective search algorithms generate the Pareto set incrementally, they do so exhaustively. This means that early during search the solution set covers is not diverse, being concentrated in a small region of the solution set. To address this issue, we present a new approach to subset approximation of the solution set, that can be used as the basis for an anytime bi-objective search algorithm. Our approach transforms the given task into a target bi-objective search task using two real parameters. For each particular parameter setting, the solutions to the target task is a subset of the solution set of the original task. Depending on the parameters used, the solution set of the target task may be computed very quickly. This allows us to obtain, in challenging road map benchmarks, a rich variety of solutions in times that may be orders of magnitude smaller than the time needed to compute the solution set. We show that by running the algorithm with an appropriate sequence of parameters, we obtain a growing sequence of solutions that converges to the full solution set. We prove that our approach is correct and that Bi-Objective A* prunes at least as many nodes when run over the target task.
The SoftCumulative constraint with quadratic penalty
Yanick Ouellet, Claude-Guy Quimper
Abstract: The Cumulative constraint greatly contributes to the success of constraint programming at solving scheduling problems. The SoftCumulative, a version of the Cumulative where overloading the resource incurs a penalty is, however, less studied. We introduce a checker and a filtering algorithm for the SoftCumulative, which are inspired by the powerful energetic reasoning rule for the Cumulative. Both algorithms can be used with classic linear penalty function, but also with a quadratic penalty function, where the penalty of overloading the resource increases quadratically with the amount of the overload. We show that these algorithms are more general than existing algorithms and vastly outperform a decomposition of the SoftCumulative in practice.
A demonstration of compositional, hierarchical interactive task learning
Aaron Mininger, John Laird
Abstract: We present a demonstration of the interactive task learning agent Rosie, where it learns the task of patrolling a simulated barracks environment through situated natural language instruction. In doing so, it builds a sizable task hierarchy composed of both innate and learned tasks, tasks formulated as achieving a goal or following a procedure, tasks with conditional branches and loops, and involving communicative and mental actions. Rosie is implemented in the Soar cognitive architecture, and represents tasks using a declarative task network which it compiles into procedural rules through chunking. This is key to allowing it to learn from a single training episode and generalize quickly.