This article discusses methods proposed in our recent AISTATS and VLDB papers that attempt to answer the question of how much an individual’s data is worth in the machine learning context. This is joint work with David Dao, Boxin Wang, Frances Ann Hubis, Nezihe Merve Gurel, Nick Hynes, Bo Li, Ce Zhang, Costas J. Spanos, and Dawn Song, as well as a collaborative effort between UC Berkeley, ETH Zurich, and UIUC. More information about the work in our group can be found here.
Various ad-hoc data valuation schemes have been studied in the literature, and some of them have been deployed in existing data marketplaces. From a practitioner’s point of view, they fall into three categories:
Query-based pricing attaches values to user-initiated queries. One simple example is to set the price based on the number of queries allowed during a time window. Other more sophisticated examples attempt to adjust the price to some specific criteria, such as arbitrage avoidance.
Data attribute-based pricing constructs a price model that takes into account various parameters, such as data age, credibility, potential benefits, etc. The model is trained to match market prices released in public registries.
Auction-based pricing designs auctions that dynamically set the price based on bids offered by buyers and sellers.
However, existing data valuation schemes do not take into account the following important desiderata:
Task-specificness: The value of data depends on the task it helps to fulfill. For instance, if Alice’s medical record indicates that she has disease A, then her data will be more useful to predict disease A as opposed to other diseases.
Fairness: The quality of data from different sources varies dramatically. In the worst-case scenario, adversarial data sources may even degrade model performance via data poisoning attacks. Hence, the data value should reflect the efficacy of data by assigning high values to data which can notably improve the model’s performance.
Efficiency: Practical machine learning tasks may involve thousands or even billions of data contributors; thus, data valuation techniques must be able to scale.
With the desiderata above, we now discuss a principled notion of data value and computationally efficient algorithms for data valuation.
Due to the task-specific nature of data value, it should depend on the utility of the machine learning model trained on the data. Suppose the machine learning model generates a specific amount of profit. Then, we can reduce the data valuation problem to a profit allocation problem, which splits the total utility of the machine learning model between different data sources. Indeed, fairly allocating profits created by collective effort is a well-studied problem in cooperative game theory. The most prominent profit allocation scheme is the Shapley value, which attaches a real-valued number to each player in the game to indicate the relative importance of their contribution. Specifically, for $N$ players, the Shapley value of player $i$ ($i\in I=\{1,\ldots,N\}$) is defined as

$$s_i = \frac{1}{N}\sum_{S\subseteq I\setminus\{i\}} \frac{1}{\binom{N-1}{|S|}}\left[U(S\cup\{i\})-U(S)\right],$$

where $U(S)$ is the utility function that evaluates the worth of the player subset $S$. In the definition above, the difference in the bracket measures how much the payoff increases when player $i$ is added to a particular subset $S$; thus, the Shapley value measures the average contribution of player $i$ to every possible subset of the other players in the game.
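For a small number of players, the definition can be evaluated directly by enumerating every subset. The sketch below is an illustrative toy implementation (the function name and the toy utility are ours, not from the papers); it is exponential in the number of players, which is exactly the scalability problem discussed later:

```python
from itertools import combinations
from math import comb

def exact_shapley(n, utility):
    """Exact Shapley values by enumerating every subset.

    `utility` maps a frozenset of player indices to a real number.
    Exponential in n, so only feasible for toy games.
    """
    values = []
    for i in range(n):
        others = [p for p in range(n) if p != i]
        s_i = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                S = frozenset(subset)
                marginal = utility(S | {i}) - utility(S)
                # 1/n over subset sizes, 1/C(n-1, |S|) over subsets of a size.
                s_i += marginal / (n * comb(n - 1, size))
        values.append(s_i)
    return values

# A "glove game": player 2 is essential, players 0 and 1 are substitutes.
u = lambda S: 1.0 if 2 in S and (0 in S or 1 in S) else 0.0
print(exact_shapley(3, u))  # [1/6, 1/6, 2/3] up to floating point
```

Note that the values sum to the utility of the full player set, as required by group rationality.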
Relating these game theoretic concepts to the problem of data valuation, one can think of the players as training data sources and the utility function $U(S)$ as a performance measure of the model trained on the subset $S$ of the training data. Thus, the Shapley value can be used to determine the value of each data source. The Shapley value is appealing because it is the only profit allocation scheme that satisfies the following properties:
Group rationality: the total utility of the machine learning model is completely split between different data sources, i.e., $\sum_{i=1}^N s_i = U(I)$. This is a natural requirement because data contributors would expect the total benefit to be fully distributed.
Fairness: Two data sources that have identical contributions to the model utility should have the same value; moreover, data sources with zero contributions to all subsets of the dataset should not receive any payoff.
Additivity: The values under multiple utilities add up to the value under a utility that is the sum of all these utilities. This property generalizes the data valuation for a single task to multiple tasks. Specifically, if each task is associated with a utility function as the performance measure, with the additivity property, we can calculate the multi-task data value by simply computing the Shapley value with respect to the aggregated utility function.
Because the Shapley value uniquely satisfies the aforementioned properties and naturally leads to a payoff scheme dependent on the underlying task, we employ the Shapley value as a data value notion. While the outlined concept appears plausible, it poses some fundamental challenges: computing the Shapley value generally requires evaluating the utility function an exponential number of times; even worse, in the machine learning context, evaluating the utility function means re-training the model. This is clearly intractable even for a small dataset. Interestingly, by focusing on the machine learning context, some opportunities arise to address the scalability challenge. Next, we show that for K-nearest neighbors (KNN) classification, one can obviate the need to re-train models and compute the Shapley value in quasi-linear time—an exponential improvement in computational efficiency!
To understand why KNN is amenable to efficient data valuation, we consider $K=1$ and investigate the following simple utility function defined for 1NN: $U(S)=1$ if the label of a test point is correctly predicted by its nearest neighbor in $S$, and $0$ otherwise. For a given test point, the utility of a set is completely determined by the nearest neighbor in that set to the test point. Thus, the contribution of point $i$ to a subset $S$ is zero if the nearest neighbor in $S$ is closer to the test point than $i$. Re-examining the Shapley value, we observe that for many $S$, $U(S\cup\{i\})-U(S)=0$. Figure 1 illustrates an example of such an $S$. This simple example shows that the computational requirement of the Shapley value can be significantly reduced for KNN.
Figure 1: Illustration of why KNN is amenable to efficient Shapley value
computation.
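The observation can be made concrete with a tiny, hypothetical check in plain Python: once a subset already contains a point closer to the test point than $i$, adding $i$ cannot change the 1NN prediction, so its marginal contribution is exactly zero.

```python
def u_1nn(S, x, y, x_test, y_test):
    """1NN utility: 1 if the nearest neighbor of x_test in S has the right label."""
    if not S:
        return 0.0
    nearest = min(S, key=lambda i: abs(x[i] - x_test))
    return 1.0 if y[nearest] == y_test else 0.0

# Toy 1-D data: index 0 is closest to the test point, index 2 farthest.
x = [1.0, 2.0, 3.0]
y = [1, 0, 1]
x_test, y_test = 0.0, 1

# Adding the far point (i = 2) to a subset that already contains the
# nearer point 0 leaves the nearest neighbor -- and the utility -- unchanged.
S = {0}
print(u_1nn(S | {2}, x, y, x_test, y_test) - u_1nn(S, x, y, x_test, y_test))  # 0.0

# Added to the empty set, however, point 2 (correct label) contributes fully.
print(u_1nn({2}, x, y, x_test, y_test) - u_1nn(set(), x, y, x_test, y_test))  # 1.0
```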
For a given test point $(x_\text{test},y_\text{test})$, we let $\alpha_k(S)$ denote the index of the $k$th nearest neighbor in $S$ to the test point. Consider the following utility function, which measures the likelihood of predicting the right label of a particular test point for KNN:

$$U(S)=\frac{1}{K}\sum_{k=1}^{\min\{K,|S|\}}\mathbb{I}[y_{\alpha_k(S)}=y_\text{test}].$$
Now assume that the training data are sorted according to their similarity to the test point. We develop a simple recursive algorithm that computes the Shapley value of all training points, starting from the furthest neighbor of the test point and moving to the nearest one. Let $\mathbb{I}[\cdot]$ represent the indicator function. Then, the algorithm proceeds as follows:

$$s_{\alpha_N}=\frac{\mathbb{I}[y_{\alpha_N}=y_\text{test}]}{N},$$

$$s_{\alpha_i}=s_{\alpha_{i+1}}+\frac{\mathbb{I}[y_{\alpha_i}=y_\text{test}]-\mathbb{I}[y_{\alpha_{i+1}}=y_\text{test}]}{K}\cdot\frac{\min\{K,i\}}{i},\qquad i=N-1,\ldots,1.$$
This algorithm can be extended to the case where the utility is defined as the likelihood of predicting the right labels for multiple test points. By the additivity property, the Shapley value for multiple test points is the sum of the Shapley values for the individual test points. The computational complexity is $\mathcal{O}(N_\text{test}\,N\log N)$ for $N$ training points and $N_\text{test}$ test points—this is simply the cost of sorting the training data once per test point!
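The per-test-point computation can be sketched in a few lines of Python. This is an illustrative rendering (function name ours) of the recursive algorithm under our reading of the paper: after sorting, it runs in linear time.

```python
def knn_shapley_single(sorted_labels, y_test, K):
    """Exact Shapley value of each training point for the KNN utility and a
    single test point.

    `sorted_labels` are the training labels ordered by increasing distance
    to the test point (nearest first). O(N) after the O(N log N) sort.
    """
    N = len(sorted_labels)
    match = [1.0 if yi == y_test else 0.0 for yi in sorted_labels]
    s = [0.0] * N
    # Start from the farthest point ...
    s[N - 1] = match[N - 1] / N
    # ... and recurse toward the nearest one (i + 1 is the 1-based rank).
    for i in range(N - 2, -1, -1):
        s[i] = s[i + 1] + (match[i] - match[i + 1]) / K * min(K, i + 1) / (i + 1)
    return s

# Five training labels already sorted by distance to a test point with label 1.
print(knn_shapley_single([1, 1, 0, 1, 0], y_test=1, K=2))
```

For multiple test points, one simply sums these per-test-point values for each training point, as the additivity property guarantees.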
We can also develop a similar recursive algorithm to compute the Shapley value for KNN regression. Moreover, in some applications, such as document retrieval, test points arrive sequentially and the value of each training point needs to be updated and accumulated on the fly, which makes it impossible to complete the sorting offline. Sorting a large, high-dimensional dataset in an online manner is expensive, however. To address the scalability challenge in the online setting, we develop an approximation algorithm that computes the Shapley value for KNN with improved efficiency. The efficiency boost is achieved by using locality-sensitive hashing to circumvent the need for sorting. More details of these extensions can be found in our paper.
The Shapley value for KNN is efficient due to the special locality structure of KNN. For general machine learning models, the exact computation of the Shapley value is inevitably slower. To address this challenge, prior work often resorts to Monte Carlo-based approximation algorithms. The central idea behind these approximation algorithms is to treat the Shapley value of a training point as its expected contribution to a random subset and use the sample average to approximate the expectation. By the definition of the Shapley value, the random subset takes each size from $0$ to $N-1$ with equal probability (corresponding to the $1/N$ factor) and is equally likely to be any subset of a given size (corresponding to the $1/\binom{N-1}{|S|}$ factor). In practice, one can implement an equivalent sampler by drawing a random permutation of the training set. The approximation algorithm then computes the marginal utility of each point to the points preceding it and averages these marginal utilities across different permutations. This was the state-of-the-art method for estimating the Shapley value for general utility functions (referred to as the baseline approximation hereafter). To assess the performance of an approximation algorithm, we can look at the number of utility evaluations needed to achieve a given guarantee on the approximation error. Using Hoeffding’s bound, it can be shown that the baseline approximation needs $\mathcal{O}(N^2\log N)$ utility evaluations for the squared error between the estimated and the ground-truth Shapley values to be bounded with high probability. Can we reduce the number of utility evaluations while maintaining the same approximation error guarantee?
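A minimal sketch of this baseline permutation sampler follows (plain Python; the function name is ours, and the utility here is a toy additive function rather than a trained model):

```python
import random

def shapley_permutation_mc(n, utility, num_perms=200, rng=None):
    """Baseline Monte Carlo estimate of Shapley values.

    For each random permutation, a point's marginal contribution to the
    set of points preceding it is one sample of its Shapley value.
    """
    rng = rng or random.Random(0)
    est = [0.0] * n
    for _ in range(num_perms):
        perm = list(range(n))
        rng.shuffle(perm)
        prev, u_prev = frozenset(), utility(frozenset())
        for i in perm:
            cur = prev | {i}
            u_cur = utility(cur)
            est[i] += u_cur - u_prev
            prev, u_prev = cur, u_cur
    return [v / num_perms for v in est]

# Additive toy utility: the exact Shapley values equal the weights, and
# every sampled marginal is exact, so this doubles as a sanity check.
weights = [0.1, 0.4, 0.2, 0.3]
u = lambda S: sum(weights[i] for i in S)
print(shapley_permutation_mc(4, u))  # approximately [0.1, 0.4, 0.2, 0.3]
```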
We developed an approximation algorithm that requires only $\mathcal{O}(N(\log N)^2)$ utility evaluations by sharing information between different random samples. The key idea is that if a data point has a high value, it tends to boost the utility of all subsets containing it. This inspires us to draw some random subsets and record the presence of each training point in these randomly selected subsets. Denote the appearance of the $i$th and the $j$th training points in a random subset by $\beta_i$ and $\beta_j$, respectively. We can design the distribution of the random subsets so that the expectation of $(\beta_i-\beta_j)U(\beta_1,\ldots,\beta_N)$ equals $s_i-s_j$. We can then pick an anchor point, say the first one, and use the sample average of $(\beta_i-\beta_1)U(\beta_1,\ldots,\beta_N)$ for all $i=2,\ldots,N$ to estimate the difference in Shapley value between every other training point and the anchor. Finally, a few more utility evaluations suffice to estimate $s_1$, which allows us to recover the Shapley values of all other points. More details of this algorithm can be found in our paper. Since this algorithm computes the Shapley value by simply examining the utility of groups of data, we refer to it as the group testing-based approximation hereinafter. Our paper also discusses even more efficient ways to estimate the Shapley value when additional assumptions can be made, such as sparsity of the Shapley values or stability of the underlying learning algorithm.
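The difference-estimation step can be sketched as follows. Treat this as an assumption-laden illustration rather than the paper's exact algorithm: the subset-size distribution $q(k)\propto 1/k + 1/(N-k)$ and the normalizing constant $Z$ reflect our reading of the paper, point 0 plays the role of the anchor, and the utility is again a toy function.

```python
import random

def shapley_diffs_group_testing(n, utility, num_tests=20000, rng=None):
    """Estimate the differences s_i - s_0 from utilities of random subsets.

    Subset sizes k are drawn with probability proportional to 1/k + 1/(n-k)
    (an assumption based on our reading of the paper), then a uniform subset
    of that size is drawn, and Z * U(S) * (b_i - b_0) is averaged, where b_i
    indicates whether point i appears in S.
    """
    rng = rng or random.Random(0)
    size_weights = [1.0 / k + 1.0 / (n - k) for k in range(1, n)]
    Z = sum(size_weights)  # equals 2 * (1 + 1/2 + ... + 1/(n-1))
    acc = [0.0] * n
    for _ in range(num_tests):
        k = rng.choices(range(1, n), weights=size_weights)[0]
        S = set(rng.sample(range(n), k))
        u = utility(S)
        b0 = 1.0 if 0 in S else 0.0
        for i in range(n):
            bi = 1.0 if i in S else 0.0
            acc[i] += Z * u * (bi - b0)
    return [a / num_tests for a in acc]

# Glove game with known Shapley values [1/6, 1/6, 2/3]:
u = lambda S: 1.0 if 2 in S and (0 in S or 1 in S) else 0.0
print(shapley_diffs_group_testing(3, u))  # roughly [0.0, 0.0, 0.5]
```

A final handful of utility evaluations pins down the anchor's own value (for instance, via the baseline permutation sampler), after which all remaining values follow from the estimated differences.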
First, we demonstrate the efficiency of the proposed method for computing the exact Shapley value for KNN. We benchmark the runtime on a 2.6 GHz Intel Core i7 CPU and compare the exact algorithm with the baseline Monte Carlo approximation. Figure 2(a) shows that the Monte Carlo estimate of the Shapley value for each training point converges to the result of the exact algorithm given enough simulations, confirming the correctness of our exact algorithm. More importantly, the exact algorithm is several orders of magnitude faster than the baseline approximation, as shown in Figure 2(b).
Figure 2: (a) The Shapley value produced by our proposed exact approach and the
baseline Monte-Carlo approximation algorithm for the KNN classifier constructed
with 1000 randomly selected training points from MNIST. (b) Runtime comparison
of the two approaches as the training size increases.
With the proposed algorithm, for the first time, we can compute data values for a database of practical scale. Figure 3 illustrates the result of a large-scale experiment using the KNN Shapley value. We take 1.5 million images with pre-calculated features and labels from the Yahoo Flickr Creative Commons 100 Million (YFCC100M) dataset. We observe that the KNN Shapley value is intuitive—the top-valued images are semantically correlated with the corresponding test image. This experiment takes only a few seconds per test image on a single CPU and can be parallelized for a large test set.
Figure 3: Data valuation using KNN classifiers (K = 10) on 1.5 million images
(all images with pre-calculated deep feature representations in the Yahoo100M
dataset).
Similarly, Figure 4(a) demonstrates the accuracy of our proposed group testing-based approximation and Figure 4(b) shows that the group testing-based approximation outperforms the baseline approximation by several orders of magnitude for a large number of data points.
Figure 4: (a) The Shapley value produced by our proposed group testing-based
approximation and the baseline approximation algorithm for a logistic
regression classifier trained on the Iris dataset. (b) Runtime comparison of
the two approaches.
We also perform experiments to demonstrate the utility of the Shapley value beyond data marketplace applications. Since the Shapley value tells us how useful a data point is for a machine learning task, we can use it to identify the low-quality or even adversarial data points in the training set. As a simple example, we artificially create a training set with half of the data directly from MNIST and the other half perturbed with random noise. In Figure 5, we compare the Shapley value between normal and noisy data as the noise ratio becomes higher. The figure shows that the Shapley value can be used to effectively detect noisy training data.
Figure 5: The Shapley value of normal and noisy training data as the noise
magnitude becomes higher.
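The detection idea can be illustrated end-to-end with a toy version of this experiment. The sketch below is purely illustrative: instead of adding pixel noise to MNIST images, it corrupts labels in a synthetic 1-D dataset (a simple stand-in for low-quality data) and scores each training point with the exact KNN Shapley recursion for a single test point; all names and parameters are ours.

```python
import random

def knn_shapley_single(sorted_labels, y_test, K):
    """Exact KNN Shapley values for one test point; `sorted_labels` are the
    training labels ordered by increasing distance to the test point."""
    N = len(sorted_labels)
    m = [1.0 if yi == y_test else 0.0 for yi in sorted_labels]
    s = [0.0] * N
    s[N - 1] = m[N - 1] / N
    for i in range(N - 2, -1, -1):
        s[i] = s[i + 1] + (m[i] - m[i + 1]) / K * min(K, i + 1) / (i + 1)
    return s

rng = random.Random(0)
n = 100
xs, ys, corrupted = [], [], []
for j in range(n):
    c = rng.randint(0, 1)                      # true class
    xs.append(rng.gauss(-1.0 if c == 0 else 1.0, 0.5))
    is_bad = j < n // 2                        # first half: noisy source
    ys.append(rng.randint(0, 1) if is_bad else c)
    corrupted.append(is_bad)

# Accumulate each training point's Shapley value over many clean test points.
totals = [0.0] * n
for _ in range(50):
    c = rng.randint(0, 1)
    x_t = rng.gauss(-1.0 if c == 0 else 1.0, 0.5)
    order = sorted(range(n), key=lambda i: abs(xs[i] - x_t))
    s = knn_shapley_single([ys[i] for i in order], c, K=5)
    for rank, i in enumerate(order):
        totals[i] += s[rank]

clean_avg = sum(t for t, b in zip(totals, corrupted) if not b) / (n - n // 2)
noisy_avg = sum(t for t, b in zip(totals, corrupted) if b) / (n // 2)
print(clean_avg > noisy_avg)  # expect True: clean data is worth more
```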
The Shapley value can also be used to understand adversarial training, an effective method for improving the adversarial robustness of a model by introducing adversarial examples into the training dataset. In practice, we measure robustness in terms of the test accuracy on a dataset containing adversarial examples. We expect the adversarial examples in the training dataset to become more valuable as more adversarial examples are added to the test dataset. Based on MNIST, we construct a training dataset that contains both benign and adversarial examples and synthesize test datasets with different adversarial-benign mixing ratios. Two popular attack algorithms, namely the fast gradient sign method (FGSM) and the Carlini-Wagner (CW) attack, are used to generate adversarial examples. Figures 6(a) and (b) compare the average Shapley value of adversarial examples with that of benign examples in the training dataset. The negative test loss of a logistic regression model is used as the utility function. We see that the Shapley value of adversarial examples increases as the test data becomes more adversarial; in contrast, the Shapley value of benign examples decreases. In addition, adversarial examples in the training set are more valuable if they are generated by the same attack algorithm used at test time.
Figure 6: Comparison of the Shapley value of benign and adversarial examples.
FGSM and CW are different attack algorithms used for generating adversarial
examples in the test dataset: (a) (resp. (b)) is trained on Benign+FGSM (resp.
CW) adversarial examples.
We hope that our approaches for data valuation provide the theoretical and computational tools to facilitate data collection and dissemination in future data marketplaces. Beyond data markets, the Shapley value is a versatile tool for machine learning practitioners; for instance, it can be used for selecting features or interpreting black-box model predictions. Our algorithms can also be applied to mitigate the computational challenges in these important applications.
By Ruoxi Jia
People give massive amounts of their personal data to companies every day and these data are used to generate tremendous business values. Some economists and politicians argue that people should be paid for their contributions—but the million-dollar question is: by how much?
This article was initially published on the BAIR blog, and appears here with the authors’ permission.
The AIhub coffee corner captures the musings of AI experts over a 30-minute conversation. This edition focusses on AI as an inventor. This discussion was prompted by news that an artificial intelligence system was named as the inventor of two ideas in patents filed in the UK, Europe and US last summer.
Involved in the discussion for this edition are: Sabine Hauert (University of Bristol), Michael Littman (Brown University), Carles Sierra (CSIC) and Pedro Lima (University of Lisbon).
Sabine Hauert: Should AI be able to file a patent?
Michael Littman: If an AI can read and understand, and then carry out the patent-submission process, I would consider it. Until then, some human being needs to be involved and take primary responsibility.
Carles Sierra: To me the main questions are, “Can the AI that generates a new design contribute to the negotiation process and exploitation of the patent? Can the AI understand the filing process, which is different from the design process?”. If you open this door, then any program that generates anything with a technological impact could become a patent filer. I think patents should be owned by the developer of the AI.
Sabine: But should the AI be seen as an inventor?
Carles: You would need a general intelligence to understand how to handle the impact (for example, in social and commercial terms) of the invention. A patent is exploited in order to result in a product. How does it make sense that the AI would make the money? You need something more general than the act of creating the technology. We are currently far away from this.
Michael: AI is a partner to the invention process. But a human being is the one responsible for the disclosure.
Carles: And the AI doesn’t know who funded it – it doesn’t have the relevant knowledge.
Sabine: Just to clarify, the BBC article says the AI should be recognised as being the inventor, and whoever the AI belonged to should be the patent’s owner, unless they sold it on.
Carles: It’s the same as any other patent filing then. The challenge is that by making the AI the inventor, are we giving it rights? This is a legal issue. If we offer rights to AI – then we enter philosophical realms. However, it is worth noting that times change. For example, in the past it would have been unthinkable to give animals rights, but this is something that I think should happen (there has been a big debate about this in Spain with regards to bullfighting). If attitudes towards animals can change, then it’s also possible that attitudes towards AI and rights will change too.
Michael: An AI is not a cognisant party in this process. Would a songwriter give writing credit to the sounds of a box of toys falling down the steps if he/she uses it in a song? Is the camera responsible for the photo? I’d say no – some person has to recognize the value of what was created and that’s part of the process as well. Could it happen in the future? I think so. Are we there now? We are not even close.
Pedro Lima: Even a program that understands the full process of submitting a patent could itself be an AI. Another question to address is: why would the AI care about being attributed authorship of an invention? Humans do it for glory, money, etc. Why would an AI do it – why is this relevant?
Sabine: So you’re saying the AI could be used in two separate parts of the process. The inventing, and the filing?
Pedro: Yes, you could do both inventing and filing the patent, but why do it (especially with regards to the creation part)?
Carles: I’m not denying the possibility of an AI being creative. I think AI can be radically innovative. I’m just doubtful that the AI should have the legal rights to do it. It can’t refrain from creating; there is no act of consciousness.
Pedro: It’s nice that the program is capable of doing it – but why would the algorithm file the patent?
Sabine: Could the idea of automatic patenting break the patent system?
Michael: There’s a cost to process each patent. The patent office would declare random submissions as nuisance filings.
Carles: You can’t patent everything. In the EU you can’t patent a molecule for example. There was a big discussion around the patenting of genes and eventually that was ruled out. It’s hard to file a patent, it’s difficult to find the novelty and understand what’s new.
Sabine: So really, we’re back to business as usual? Nothing new to see here?
Michael: Yes, the AI is a tool just like a drafting program. It doesn’t make sense to credit the AI software with the invention.
Carles: Going back to rights – only humans should have human rights. Maybe in 100 years’ time we’ll find that AIs are autonomous/sentient enough. Then we could think of giving them rights, hence authorship. This would be an interesting debate for philosophical law.
Pedro: I like your point about AIs being sentient or not. If they are sentient they may enjoy getting the credit.
Carles: It’s really a debate between philosophers, law, and the technology crowd.
Sabine: Reading the article, it looked like the patent crowd were stumped by this.
Carles: They are not used to this. Corporations can’t be inventors, only people, so how could an AI be an inventor?
Sabine: Maybe you need the patent office to be AI powered.
Carles: If you allow this – you will increase inequality. There is the risk that AI powerhouses will own all patents.
Sabine: Alexander Reben, an artist and roboticist, recently designed a robot that could “patent everything” – with the idea that nothing could be patented anymore.
Carles: Or you could just ban all patents?
Pedro: Mariana Mazzucato has been discussing abolishing patents or changing the patenting system. She writes in her blog, “Four trends in recent years have disturbed this delicate balance to the extent that I now worry that – instead of facilitating innovation – the patent system is actually inhibiting it”, and notes, “The consequence is to limit the diffusion of knowledge. There has been an increase in ‘patent trolling’, where firms hold patents not to develop the technology but simply to collect royalties. Here the patent has become an end in itself, disconnected from productive purposes”.
Pedro: It is also worth reading the European Parliament’s report on Robot Law by Mady Delvaux. It generated a lot of interest because there was a discussion of robots having their own rights. Really this was introduced for liability purposes. It is relevant to this discussion as it talks about robots having rights to file patents, or responsibility in the case of autonomous cars. But some people fear that giving electronic personality to AI algorithms may lead to companies exploiting it for profit only, not for innovation, because they will develop algorithms that will hold the patent – as also underlined by Mazzucato.
Michael: Patents are a mechanism for disclosure, i.e., sharing. Banning patents doesn’t seem like it helps with equality. It’s not purely about claiming rights for oneself, it’s for trading some rights – “I’ll tell you about my cool invention that might benefit you in exchange for restrictions on who can make money from this invention”.
Carles: Assigning rights to machines is complicated.
Sabine: So, to summarise our coffee corner: don’t give rights to AIs… yet.
Monday to Wednesday at AAAI-20 saw a multitude of technical sessions, the exhibition and posters. In addition there were a number of interesting debates, invited talks and panels. Here are some tweets from the final three days of the conference.
Great discussion involving Daniel Kahnemann, Bengio, @ylecun, and @geoffreyhinton ! A must watch. #AAAI20 https://t.co/IEoYcgkfqF
— Thomas Paula (@tsp_thomas) February 12, 2020
Time for the 2020 AAAI Debate! "Academic AI researchers should focus their attention on research problems that are not of immediate interest to industry." Vote here; then we'll vote again after 7PM to see if the debate changed any minds. #AAAI20debate #AAAI20
— Kevin Leyton-Brown (@k_leyton_brown) February 10, 2020
Part two of the 2020 AAAI Debate: "Academic AI researchers should focus their attention on research problems that are not of immediate interest to industry." Vote here; we'll compare to the first vote to see if the debate changed any minds. #AAAI20debate #AAAI20
— Kevin Leyton-Brown (@k_leyton_brown) February 11, 2020
We saw so many creative ideas for our limerick challenge at #AAAI20! Here are just a few of the submissions. Which limerick is your favorite? pic.twitter.com/9pelO7d7uK
— Microsoft Research (@MSFTResearch) February 12, 2020
Congrats to the winner of our limerick challenge at #AAAI20!
A limerick concerning AI?
I'd like to give that a try
Is it still something that
We're much better at
Than robots endowed with AI?
Visit our booth for job openings and to chat with our experts: https://t.co/MHTXGm5YVf pic.twitter.com/cpwkyPvZEH
— Microsoft Research (@MSFTResearch) February 11, 2020
Congratulations to this year’s winners of the Outstanding Paper Award! #AAAI20 pic.twitter.com/Gbd3biH8z0
— AAAI (@RealAAAI) February 11, 2020
Take a break at #AAAI20 and learn sign language from AI! Visit Concourse G, one floor down from the main hotel lobby to test your skills. #AAAI2020 pic.twitter.com/klwtV3nkcA
— AAAI (@RealAAAI) February 11, 2020
Garry Kasparov (yes, Grand Master, that one) says we’re focused on the wrong thing if we look at how “perfect” computers can be. Computers can be superhuman simply by making fewer mistakes. #AAAI20 pic.twitter.com/wYpMCIX8NV
— Jeff Chen (@thisisjeffchen) February 11, 2020
Chess legend @Kasparov63 says “For many years I have been pondering whether deep blue was a bless or a curse. now I think it was a blessing.” #aaai20 pic.twitter.com/DZPKdktgam
— will knight (@willknight) February 11, 2020
“Provably beneficial AI is possible and desirable: but it’s not ‘AI safety’ or ‘AI ethics.’ It’s AI.” – Stuart Russell at #AAAI2020 ending his excellent invited talk. #ArtificialIntelligence #AI #AAAI20 #BeneficialAI pic.twitter.com/TMGDRLXCR4
— Prof. Barry O'Sullivan, MRIA (@BarryOSullivan) February 12, 2020
.@dawnsongtweets talking now about AI and security #AAAI20 pic.twitter.com/IZHcnBr6kP
— AAAI (@RealAAAI) February 11, 2020
Ph.D. student @VT_DAC Fanglan Chen followed her earlier talk at yesterday's @RealAAAI conference with more discussion about her research during an evening poster session. #AAAI20 @DTSH4869 @UrbComp @VT_CS @VTSPIA https://t.co/Fd5iAcSE2I pic.twitter.com/GYv0ahITZC
— DAC at Virginia Tech (@VT_DAC) February 12, 2020
Had a great time this week at #AAAI20 where I presented @pedjagogue, Shantanu Jain, and my paper “Fast Nonparametric Estimation of Class Proportions in the Positive-Unlabeled Classification Setting”- https://t.co/qlWbqrDrwb pic.twitter.com/OSyQ6zspf8
— Daniel Zeiberg (@DZeiberg) February 12, 2020
I am presenting “Empirical Bounds on Linear Regions of Deep Rectifier Networks” at the #AAAI20 poster session in Americas Hall 1, poster ML148.
You can find a summary of our paper in the blog post below. #AAAI2020 @RealAAAI https://t.co/Q9pYPxjaKC pic.twitter.com/CgAWVvEsIr
— Thiago Serra (@thserra) February 10, 2020
Thank you Mark Mitton for a great magic show this morning at #AAAI20! pic.twitter.com/HxWH9v0i41
— AAAI (@RealAAAI) February 12, 2020
Gotta love a conference that knows its cartoons and its history. #AAAI20 @RealAAAI pic.twitter.com/pdTCo9lboh
— Kartik Talamadupula (@kr_t) February 10, 2020
The Three Amazing Amigos who made #AAAI20 happen! Three cheers to their mad masochism (without which, they would surely not have taken on this gigantic job!) @conitzer @feishaAI @frossi_t #AAAI2020 pic.twitter.com/Imj9Jqq9fn
— Subbarao Kambhampati (@rao2z) February 12, 2020
If you weren’t able to attend the AAAI-20 conference in New York you can catch some of the invited talks and panel sessions via the livestreamed videos. Featured events include Yolande Gil’s presidential address and the Turing Award winners’ session.
You can also watch the AI history panel. This was a much anticipated event, featuring none other than chess Grandmaster Garry Kasparov, and didn’t disappoint. The other panellists were Murray Campbell (IBM), Michael Bowling (University of Alberta), Hiroaki Kitano (Sony) and David Silver (DeepMind and University College London). They discussed the technology they developed, challenges they encountered, and how building expert game-playing machines furthers progress in AI techniques that can be applied to real-world problems.
Monday evening saw a light-hearted debate with the proposition: “Academic AI researchers should focus their attention on research problems that are not of immediate interest to industry”.
Based on a small sample size, here are the before and after votes (from Kevin Leyton-Brown’s Twitter poll):
Time for the 2020 AAAI Debate! "Academic AI researchers should focus their attention on research problems that are not of immediate interest to industry." Vote here; then we'll vote again after 7PM to see if the debate changed any minds. #AAAI20debate #AAAI20
— Kevin Leyton-Brown (@k_leyton_brown) February 10, 2020
Part two of the 2020 AAAI Debate: "Academic AI researchers should focus their attention on research problems that are not of immediate interest to industry." Vote here; we'll compare to the first vote to see if the debate changed any minds. #AAAI20debate #AAAI20
— Kevin Leyton-Brown (@k_leyton_brown) February 11, 2020
You can find all of the videoed talks and sessions here.
The AAAI-20 outstanding paper awards were presented on Tuesday 11th February at the AAAI conference in New York. Awards and honourable mentions were given for: outstanding paper, outstanding student paper and outstanding paper in the special track on AI for social impact. You can read about the award-winning work below.
WINOGRANDE: An Adversarial Winograd Schema Challenge at Scale
Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi
The Winograd Schema Challenge (WSC) (Levesque, Davis, and Morgenstern 2011), a benchmark for commonsense reasoning, is a set of 273 expert-crafted pronoun resolution problems originally designed to be unsolvable for statistical models that rely on selectional preferences or word associations. However, recent advances in neural language models have already reached around 90% accuracy on variants of WSC. This raises an important question whether these models have truly acquired robust commonsense capabilities or whether they rely on spurious biases in the datasets that lead to an overestimation of the true capabilities of machine commonsense. To investigate this question, we introduce WinoGrande, a large-scale dataset of 44k problems, inspired by the original WSC design, but adjusted to improve both the scale and the hardness of the dataset. The key steps of the dataset construction consist of (1) a carefully designed crowdsourcing procedure, followed by (2) systematic bias reduction using a novel AfLite algorithm that generalizes human-detectable word associations to machine-detectable embedding associations. The best state-of-the-art methods on WinoGrande achieve 59.4-79.1%, which are 15-35% below human performance of 94.0%, depending on the amount of the training data allowed. Furthermore, we establish new state-of-the-art results on five related benchmarks – WSC (90.1%), DPR (93.1%), COPA (90.6%), KnowRef (85.6%), and Winogender (97.1%). These results have dual implications: on one hand, they demonstrate the effectiveness of WinoGrande when used as a resource for transfer learning. On the other hand, they raise a concern that we are likely to be overestimating the true capabilities of machine commonsense across all these benchmarks. We emphasize the importance of algorithmic bias reduction in existing and future benchmarks to mitigate such overestimation.
Read the full paper on arXiv.
A Unifying View on Individual Bounds and Heuristic Inaccuracies in Bidirectional Search
Vidal Alcázar, Pat Riddle, Mike Barley
In the past few years, new very successful bidirectional heuristic search algorithms have been proposed. Their key novelty is a lower bound on the cost of a solution that includes information from the g values in both directions. Kaindl and Kainz (1997) proposed measuring how inaccurate a heuristic is while expanding nodes in the opposite direction, and using this information to raise the f value of the evaluated nodes. However, this comes with a set of disadvantages and remains yet to be exploited to its full potential. Additionally, Sadhukhan (2013) presented BAE∗, a bidirectional best-first search algorithm based on the accumulated heuristic inaccuracy along a path. However, no complete comparison in regards to other bidirectional algorithms has yet been done, neither theoretical nor empirical. In this paper we define individual bounds within the lower-bound framework and show how both Kaindl and Kainz’s and Sadhukhan’s methods can be generalized thus creating new bounds. This overcomes previous shortcomings and allows newer algorithms to benefit from these techniques as well. Experimental results show a substantial improvement, up to an order of magnitude in the number of necessarily-expanded nodes compared to state-of-the-art near-optimal algorithms in common benchmarks.
Fair Division of Mixed Divisible and Indivisible Goods
Xiaohui Bei, Zihao Li, Jinyan Liu, Shengxin Liu, Xinhang Lu
We study the problem of fair division when the resources contain both divisible and indivisible goods. Classic fairness notions such as envy-freeness (EF) and envy-freeness up to one good (EF1) cannot be directly applied to the mixed goods setting. In this work, we propose a new fairness notion, envy-freeness for mixed goods (EFM), which is a direct generalization of both EF and EF1 to the mixed goods setting. We prove that an EFM allocation always exists for any number of agents. We also propose efficient algorithms to compute an EFM allocation for two agents and for n agents with piecewise linear valuations over the divisible goods. Finally, we relax the envy-free requirement, instead asking for ε-envy-freeness for mixed goods (ε-EFM), and present an algorithm that finds an ε-EFM allocation in time polynomial in the number of agents, the number of indivisible goods, and 1/ε.
Read the full paper on arXiv.
Lifelong Learning with a Changing Action Set
Yash Chandak, Georgios Theocharous, Chris Nota, Philip S. Thomas
In many real-world sequential decision making problems, the number of available actions (decisions) can vary over time. While problems like catastrophic forgetting, changing transition dynamics, changing reward functions, etc. have been well-studied in the lifelong learning literature, the setting where the action set changes remains unaddressed. In this paper, we present an algorithm that autonomously adapts to an action set whose size changes over time. To tackle this open problem, we break it into two problems that can be solved iteratively: inferring the underlying, unknown, structure in the space of actions and optimizing a policy that leverages this structure. We demonstrate the efficiency of this approach on large-scale real-world lifelong learning problems.
Read the full paper on arXiv.
A Distributed Multi-Sensor Machine Learning Approach to Earthquake Early Warning
Kévin Fauvel, Daniel Balouek-Thomert, Diego Melgar, Pedro Silva, Anthony Simonet, Gabriel Antoniu, Alexandru Costan, Véronique Masson, Manish Parashar, Ivan Rodero, Alexandre Termier
Our research aims to improve the accuracy of Earthquake Early Warning (EEW) systems by means of machine learning. EEW systems are designed to detect and characterize medium and large earthquakes before their damaging effects reach a certain location. Traditional EEW methods based on seismometers fail to accurately identify large earthquakes due to their sensitivity to the ground motion velocity. The recently introduced high-precision GPS stations, on the other hand, are ineffective to identify medium earthquakes due to their propensity to produce noisy data. In addition, GPS stations and seismometers may be deployed in large numbers across different locations and may produce a significant volume of data, consequently affecting the response time and the robustness of EEW systems. In practice, EEW can be seen as a typical classification problem in the machine learning field: multi-sensor data are given as input, and earthquake severity is the classification result. In this paper, we introduce the Distributed Multi-Sensor Earthquake Early Warning (DMSEEW) system, a novel machine learning-based approach that combines data from both types of sensors (GPS stations and seismometers) to detect medium and large earthquakes. DMSEEW is based on a new stacking ensemble method which has been evaluated on a real-world dataset validated with geoscientists. The system builds on a geographically distributed infrastructure, ensuring an efficient computation in terms of response time and robustness to partial infrastructure failures. Our experiments show that DMSEEW is more accurate than the traditional seismometer-only approach and the combined-sensors (GPS and seismometers) approach that adopts the rule of relative strength.
Read the full paper here.
The Unreasonable Effectiveness of Inverse Reinforcement Learning in Advancing Cancer Research
John Kalantari, Heidi Nelson and Nicholas Chia
The “No Free Lunch” theorem states that for any algorithm, elevated performance over one class of problems is offset by its performance over another. Stated differently, no algorithm works for everything. Instead, designing effective algorithms often means exploiting prior knowledge of data relationships specific to a given problem. This “unreasonable efficacy” is especially desirable for complex and seemingly intractable problems in the natural sciences. One such area that is rife with the need for better algorithms is cancer biology – a field where relatively few insights are being generated from relatively large amounts of data. In part, this is due to the inability of mere statistics to reflect cancer as a genetic evolutionary process – one that involves cells actively mutating in order to navigate host barriers, out-compete neighboring cells, and expand spatially. Our work is built upon the central proposition that the Markov Decision Process (MDP) can better represent the process by which cancer arises and progresses. More specifically, by encoding a cancer cell’s complex behavior as an MDP, we seek to model the series of genetic changes, or evolutionary trajectory, that leads to cancer as an optimal decision process. We posit that using an Inverse Reinforcement Learning (IRL) approach will enable us to reverse engineer an optimal policy and reward function based on a set of expert demonstrations extracted from the DNA of patient tumors. The inferred reward function and optimal policy can subsequently be used to extrapolate the evolutionary trajectory of any tumor. Here, we introduce a Bayesian nonparametric IRL model (PUR-IRL) where the number of reward functions is a priori unbounded in order to account for uncertainty in cancer data, i.e., the existence of latent trajectories and non-uniform sampling. We show that PUR-IRL is “unreasonably effective” in gaining interpretable and intuitive insights about cancer progression from high-dimensional genome data.
It was a busy weekend at the 34th Conference on Artificial Intelligence (AAAI). Although the AAAI technical sessions didn’t start in earnest until the Sunday there were numerous workshops on Saturday as well as associated conferences AIES (AI, Ethics and Society) and EAAI (Educational Advances in AI). Here is a selection of tweets from the weekend.
Been looking forward to @OsondeOsoba's talk, "Technocultural Pluralism: A “Clash of Civilizations” in Technology?" https://t.co/ubbefB76RO #AAAI20 #AI #ethics pic.twitter.com/tz8psY0y9I
— Charlie Oliver (@itscomplicated) February 8, 2020
Nice moment of appreciation and gratitude from @DatapolicyProf for these students that were critical to the success of #AIES2020.
She’s also talking about necessary labour we have to do to bring a broader community into our spaces and work pic.twitter.com/eIW16pnG3h
— Bianca Wylie (@biancawylie) February 8, 2020
.@ginasue calls on us to conduct more on-the-ground research about situated ai users, uses, appropriations, impacts, and beliefs #aies2020 #AAAI2020 as well as more dialogue between policymakers, communities, and technologists https://t.co/VY87giwh10 pic.twitter.com/dPWBe3MNIq
— meg young (@megyoung0) February 8, 2020
Introduced our friendly neighbourhood conversational alien, Zhorai, today at #eaai2020 @RealAAAI with @aigelicwings & Galit Lukin! Zhorai enjoyed NYC, but is ready to head home here https://t.co/XK2PhLYCZ2 & here https://t.co/cWC6W8XnHb #conversationalai #nlp #aieducation pic.twitter.com/TQtBrdPI51
— Jessica Van Brummelen @AAAI (@JessVanBrum) February 9, 2020
AI for everyone! Ben Shapiro and Abigail Zimmermann-Niefield on teaching AI and creating meaningful ML-interactive experiences @RealAAAI #aieducation #AAAI20
–> Note: this includes training a system to recognize wingardia leviosa #harrypotter pic.twitter.com/dE8peJtKT3
— Jessica Van Brummelen @AAAI (@JessVanBrum) February 9, 2020
Caught end of David Silver’s talk at RL for games workshop. Packed house! People still inspired by learning games.
I know I was.
#AAAI2020 pic.twitter.com/Rv0XVPUf7E
— Nikolai Yakovenko (@ivan_bezdomny) February 8, 2020
David Silver is giving an interactive talk! The audience was asked to choose the content, how cool! @RealAAAI #AAAI20 pic.twitter.com/nxe2DFGmBg
— Bilal Kartal (@bll_krtl) February 8, 2020
Highest number of submissions and acceptances from China and USA. Most papers focus on ML, followed by Vision and NLP #AAAI20 pic.twitter.com/sWe16ZbLUF
— Ishtiaque Shams, PhD (@ishtiaque_shams) February 9, 2020
@yolandagil giving the presidential address at #AAAI20. Will AI write the scientific papers of the future? pic.twitter.com/MEKpjJyIuh
— AAAI (@RealAAAI) February 9, 2020
The #AAAI20 Student Outreach Workshop gives undergrads a chance to build #AI #robots and games on Feb 8 and present them to the public on Feb 9 at @AMNH. Join us for the show! #MachineLearning #DeepLearning https://t.co/ueHsWAQTYv
— AAAI (@RealAAAI) February 6, 2020
Try AI became a reality today! We had a wonderful first event at the @RealAAAI conference in NYC. A huge thanks to all of the panelists, judges, mentors, and students that participated in the event. You are all so inspiring! pic.twitter.com/LIqyVWTO3z
— Elizabeth Bondi (@BondiElizabeth) February 8, 2020
We're excited to announce the $1 million AAAI Squirrel AI Award for Artificial Intelligence for the Benefit of Humanity! More information coming soon. https://t.co/uaFeyCqdDC pic.twitter.com/CrbzaJgwyG
— AAAI (@RealAAAI) February 9, 2020
Also, people ask the ConceptNet folks all the time about issues we ran into crowdsourcing common sense. OMG, recency bias/priming of various types. (again @ybisk's slide). #AAAI2020 pic.twitter.com/rzFy1AFwKe
— Catherine Havasi @ AAAI 2020 (@catherinehavasi) February 9, 2020
Fascinating Turing Award session! Hinton’s great-great-grandfather was George Boole, the father of symbolic reasoning. The story of #AI in a single family. #AAAI20 #AAAI2020 @rao2z @frossi_t @conitzer @yolandagil pic.twitter.com/p2najTvBDX
— Prof. Barry O'Sullivan, MRIA (@BarryOSullivan) February 10, 2020
Conference tip. If you want to reduce the constant rattle of badges, just clip them like this:#AAAI20 pic.twitter.com/O6NOHhMKqB
— A Wojcicki AAAI20 (@pretendsmarts) February 9, 2020
The 34th AAAI Conference on Artificial Intelligence (AAAI-20), held in New York, started yesterday (Friday 7 February) and runs until Wednesday 12 February. Our Managing Editor, Lucy Smith, will be attending, covering the conference and meeting researchers.
The purpose of the AAAI conference is to promote research in artificial intelligence (AI) and scientific exchange among AI researchers, practitioners, scientists, and engineers in affiliated disciplines. AAAI-20 will have a diverse technical track, student abstracts, poster sessions, invited speakers, tutorials, workshops, and an exhibit.
Check our blog over the course of the next week or two as we bring you updates from the conference. You can also follow AAAI directly on Twitter at #AAAI20 and @RealAAAI.
Are you presenting work at AAAI this year? We’d love to hear from you, just email Lucy with a blog post, or let her know if you’d like to meet at the conference.
The famous short, silent film L’arrivée d’un train en gare de La Ciotat, produced by Auguste and Louis Lumière in 1896, hit the news this week. AI developer Denis Shiryaev used a combination of Gigapixel AI and depth-aware video frame interpolation (DAIN) to “upscale” the film to 4k, 60 frames-per-second quality.
You can watch the upscaled video here:
Here is the original version for comparison:
There were two parts to creating this upscaled video. Firstly, the enhancement to 4k resolution. The algorithm used for this is based on neural networks and was trained with millions of photos. The training process helped to create a sophisticated network that learned the best way to enlarge, enhance, and create natural details.
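To make the contrast concrete, here is what non-learned upscaling looks like: a plain NumPy sketch of nearest-neighbour and bilinear interpolation, the classical methods that a trained network is meant to outperform by synthesizing plausible detail rather than merely blending existing pixels. (This is purely illustrative; Gigapixel AI's actual network is proprietary and none of its details appear here.)

```python
import numpy as np

def upscale_nearest(img, factor):
    """Nearest-neighbour upscaling: repeat each pixel `factor` times per axis."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def upscale_bilinear(img, factor):
    """Bilinear upscaling of a 2-D (grayscale) image: sample the source grid
    at evenly spaced fractional coordinates and blend the four nearest
    source pixels."""
    h, w = img.shape
    H, W = h * factor, w * factor
    ys = np.linspace(0, h - 1, H)
    xs = np.linspace(0, w - 1, W)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]   # vertical blend weights, shape (H, 1)
    wx = (xs - x0)[None, :]   # horizontal blend weights, shape (1, W)
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy
```

Both methods can only redistribute information already in the frame, which is why a model trained on millions of photos can look so much sharper.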
In addition to the enhanced resolution, Shiryaev utilised DAIN to increase the frame rate. This video frame interpolation method was developed by Wenbo Bao and colleagues (Shanghai Jiao Tong University, University of California, Merced, and Google) and aims to synthesize new frames in between the original frames. In their arXiv article from April 2019 they propose a novel depth-aware video frame interpolation algorithm which explicitly detects occlusion (when one object in a 3D space is blocking another object from view) using depth information. They developed a depth-aware flow projection layer to synthesize intermediate flows that preferentially sample closer objects rather than those further away. Their algorithm also learns hierarchical features to gather contextual information from neighbouring pixels. The model then warps the input frames, depth maps, and contextual features within an adaptive warping layer. Finally, a frame synthesis network generates the output frame using residual learning.
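DAIN's real pipeline involves learned optical flow, the depth-aware flow projection layer and a synthesis network; as a toy illustration of the depth-preference idea only (not the authors' implementation), one can blend two frames with per-pixel weights that favour closer pixels:

```python
import numpy as np

def depth_weighted_midframe(frame0, frame1, depth0, depth1, eps=1e-6):
    """Toy 'depth-aware' blend: weight each source frame by inverse depth,
    so closer pixels (smaller depth values) dominate the synthesized
    middle frame. This stands in for DAIN's learned flow projection and
    synthesis stages, which this sketch does not implement."""
    w0 = 1.0 / (depth0 + eps)
    w1 = 1.0 / (depth1 + eps)
    return (w0 * frame0 + w1 * frame1) / (w0 + w1)
```

With equal depths this reduces to a plain 50/50 cross-fade; the depth terms are what bias the result towards foreground content, mirroring the paper's "sample closer objects preferentially" intuition.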
Shiryaev has also added a colour version which was made using DeOldify. DeOldify was created by Jason Antic and employs Generative Adversarial Networks (GANs) to colorize black and white images.
By Jean Frederic Isingizwe Nturambirwe, Stellenbosch University and Umezuruike Linus Opara, Stellenbosch University
Modern farming has evolved by adopting technical advances such as machines for ploughing and harvesting, controlled irrigation, fertilisers, pesticides, crop breeding and genetics research. These have helped farmers to produce large crops of a good quality in a fairly predictable way.
But there’s still progress to be made in getting the best possible yields from different kinds of soils. And big losses still occur – especially during and after harvest – where monitoring and handling of produce isn’t done well. The industry needs smart and precise solutions and these are becoming available through new technology.
Smart farming aims to use modern technology to improve yield and product quality. One example is precision agriculture, a site-specific crop management concept that uses a decision support system based on monitoring, measuring and responding to inter- and intra-field variability in crops. This allows farmers to optimise their returns and preserve resources. Such monitoring solutions can be achieved by integrating electronic sensing devices that record data in soil, the environment or crops. The data can then provide useful information for decision-making, through a process called data analytics.
The goal is to make the best possible use of soil in a particular field, control crop care and make informed decisions about handling produce after harvest.
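As a minimal, entirely hypothetical sketch of such a decision-support rule (the zone names and the 30% moisture threshold are invented for illustration, not taken from any real deployment), per-zone sensor readings can be mapped to actions like this:

```python
# Hypothetical precision-agriculture decision rule: average per-zone
# soil-moisture readings (% volumetric water content) and decide whether
# to irrigate each zone.
def irrigation_plan(readings, threshold=30.0):
    plan = {}
    for zone, samples in readings.items():
        mean_moisture = sum(samples) / len(samples)
        plan[zone] = "irrigate" if mean_moisture < threshold else "hold"
    return plan
```

Real decision-support systems weigh many more variables (weather forecasts, crop stage, water cost), but the shape is the same: sensor data in, per-site action out.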
We’ve been involved in the development and use of sensors to help establish the quality of a wide range of horticultural products, including fruits. We used computer intelligence methods to detect defects and predict the quality of fruit.
Our latest research found that data-driven solutions have a number of benefits. For instance, they can help reduce the loss of fruit and vegetables along the supply chain from farm to being consumed.
Fruits and vegetables can be damaged before, during and after harvest as well as in storage. This is wasteful. Some decay and spoilage is caused by viruses, fungi, bacteria or microbial pathogens. Products that are tightly packed or bruised are more vulnerable to infections and don’t last as long.
According to the United Nations Food and Agriculture Organisation, around 14% of the world’s food is lost after harvest and before reaching shops and markets. And about one-third of the world’s food is lost or wasted. Minimising food loss and waste is critical to achieving a Zero Hunger world, in which more than 821 million people currently suffer from hunger.
Our research involved reviewing the role that data analytics can play in the detection of defects in fruit and vegetables. We found that machine learning – the ability of computers to find patterns in data, make predictions and propose decisions without being explicitly programmed – far surpasses traditional methods for classifying produce.
Machine learning has made great strides in detecting plant diseases and fruit defects. These could be extended to monitoring the quality of fruit and other foods. Sensors can be used to detect insects and diseases in fruits and vegetables, acting as electronic noses or tongues and measuring chemical composition. They can also measure physical properties, such as firmness and acidity, to determine product quality.
A product’s acceptability depends on colour, shape, size, sweetness, and the absence of defects such as bruises and insect infestations. This is important for customer satisfaction and for the returns that producers and suppliers make.
Sensing devices can supply data about these characteristics to computer algorithms for analysis. These new developments in machine learning allow for fast and effective quality determination and prediction in fresh produce.
For example, imaging techniques have been coupled with machine learning algorithms to detect bruises, cold injury and browning in fruit such as apples, pears and citrus, and to detect various defects in tomatoes. Smartphone-based applications are being developed for use in quality recognition for small berry fruits.
There’s a current global research trend aimed at integrating sensing devices along the food chain to continuously monitor and control quality indicators. We reviewed this research and identified the stages of the food chain where such solutions are used.
Sensors can be used to measure properties of fruit and vegetables while they are growing, such as colour, size and shape. Such information helps control the growth conditions, such as water supply, and accurately determines the best harvest date. This reduces losses at harvest. For example, some smallholder farmers in Germany have been using smartphones to check the quality of their crops by sending crop images to be processed by experts through machine learning models; feedback is then sent to the farmers. Companies are developing models to track environmental factors such as weather changes and predict how these factors affect crop yield. This kind of support is aimed particularly at farmers in developing countries.
In packhouses, products must be graded and sorted according to quality standards to determine their suitability for different consumer destinations. Export products need to keep well during long distance transport and on the shelf.
For local markets, where travel time is shorter, the quality requirements could be of a different standard. To determine whether a product is suitable for animal feed or human consumption, specialised sensors take measurements and generate data to classify, grade and sort the products into categories.
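The grading logic described above can be sketched in code. This is a toy illustration only: the sensor readings (firmness in newtons, sugar content in degrees Brix, defect area) and the thresholds separating export, local-market, processing and animal-feed grades are hypothetical, not taken from the study.

```python
# Toy sensor-based grading sketch; thresholds and categories are hypothetical.

def grade_produce(firmness_n, soluble_solids_brix, defect_area_pct):
    """Assign a destination grade from three hypothetical sensor readings."""
    if defect_area_pct > 20:
        return "animal feed"      # too damaged for human consumption
    if firmness_n >= 60 and soluble_solids_brix >= 12 and defect_area_pct < 2:
        return "export"           # must keep well during long transport
    if firmness_n >= 45 and defect_area_pct < 10:
        return "local market"     # shorter travel time, looser requirements
    return "processing"           # e.g. juice or puree

print(grade_produce(65, 13, 1))   # -> export
print(grade_produce(50, 10, 5))   # -> local market
```

In a real packhouse the thresholds would be learned from labelled data or set by quality standards, but the structure, mapping sensor measurements to destination categories, is the same.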
Sensors can even be integrated into packaging materials to continuously monitor and report on the status of the product in real time, communicating their data to a command centre. Monitoring fresh produce in this way, and detecting and segregating unsafe products, is crucial to ensuring profitability and maintaining market share.
With the increasing world population, which is expected to exceed 9 billion by 2050, food and nutrition security is set to become an even bigger challenge, especially in sub-Saharan Africa. Data-driven automation can contribute to the solution.
Jean Frederic Isingizwe Nturambirwe, Postdoctoral research fellow at the Research Laboratory for Postharvest Technology / SARChI, Stellenbosch University and Umezuruike Linus Opara, Distinguished Professor and DST-NRF South African Chair in Postharvest Technology, Stellenbosch University
This article is republished from The Conversation under a Creative Commons license. Read the original article.
Read the authors’ full review article: Machine learning applications to non-destructive defect detection in horticultural products.
What’s hot on arXiv? Here are the most tweeted papers that were uploaded onto arXiv during January 2020.
Results are powered by Arxiv Sanity Preserver.
Towards a Human-like Open-Domain Chatbot
Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le
Submitted to arXiv on: 27 January 2020
Abstract: We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is trained to minimize perplexity, an automatic metric that we compare against human judgement of multi-turn conversation quality. To capture this judgement, we propose a human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of good conversation. Interestingly, our experiments show strong correlation between perplexity and SSA. The fact that the best perplexity end-to-end trained Meena scores high on SSA (72% on multi-turn evaluation) suggests that a human-level SSA of 86% is potentially within reach if we can better optimize perplexity. Additionally, the full version of Meena (with a filtering mechanism and tuned decoding) scores 79% SSA, 23% higher than the next highest scoring chatbot that we evaluated.
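The SSA metric in the abstract is straightforward to compute: human raters label each chatbot response as sensible (does it make sense in context?) and specific (is it not a generic reply?), and SSA averages the two rates. A minimal sketch, with made-up label data:

```python
# Sketch of the Sensibleness and Specificity Average (SSA) from the abstract.
# Each response carries two 0/1 human labels; SSA averages the two rates.

def ssa(labels):
    """labels: list of (sensible, specific) 0/1 pairs, one per response."""
    n = len(labels)
    sensibleness = sum(s for s, _ in labels) / n
    specificity = sum(p for _, p in labels) / n
    return (sensibleness + specificity) / 2

responses = [(1, 1), (1, 0), (1, 1), (0, 0)]  # 75% sensible, 50% specific
print(ssa(responses))  # -> 0.625
```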
323 tweets
Backward Feature Correction: How Deep Learning Performs Deep Learning
Zeyuan Allen-Zhu, Yuanzhi Li
Submitted to arXiv on: 13 January 2020
Abstract: How does a 110-layer ResNet learn a high-complexity classifier using relatively few training examples and short training time? We present a theory towards explaining this in terms of hierarchical learning. We refer to hierarchical learning as the process in which the learner represents a complicated target function by decomposing it into a sequence of simpler functions, to reduce sample and time complexity. This paper formally analyzes how multi-layer neural networks can perform such hierarchical learning efficiently and automatically simply by applying stochastic gradient descent (SGD). On the conceptual side, we present, to the best of our knowledge, the FIRST theory result indicating how very deep neural networks can still be sample and time efficient on certain hierarchical learning tasks, when NO KNOWN non-hierarchical algorithms (such as kernel method, linear regression over feature mappings, tensor decomposition, sparse coding) are efficient. We establish a new principle called “backward feature correction”, which we believe is the key to understanding hierarchical learning in multi-layer neural networks. On the technical side, we show for regression and even for binary classification, for every input dimension d>0, there is a concept class consisting of degree ω(1) multi-variate polynomials so that, using ω(1)-layer neural networks as learners, SGD can learn any target function from this class in poly(d) time using poly(d) samples to any 1/poly(d) error, through learning to represent it as a composition of ω(1) layers of quadratic functions. In contrast, we present lower bounds stating that several non-hierarchical learners, including any kernel methods and neural tangent kernels, must suffer from d^{ω(1)} sample or time complexity to learn functions in this concept class even to any d^{−0.01} error.
60 tweets
Learning Discrete Distributions by Dequantization
Emiel Hoogeboom, Taco S. Cohen, Jakub M. Tomczak
Submitted to arXiv on: 30 January 2020
Abstract: Media is generally stored digitally and is therefore discrete. Many successful deep distribution models in deep learning learn a density, i.e., the distribution of a continuous random variable. Naïve optimization on discrete data leads to arbitrarily high likelihoods, and instead, it has become standard practice to add noise to datapoints. In this paper, we present a general framework for dequantization that captures existing methods as a special case. We derive two new dequantization objectives: importance-weighted (iw) dequantization and Rényi dequantization. In addition, we introduce autoregressive dequantization (ARD) for more flexible dequantization distributions. Empirically we find that iw and Rényi dequantization considerably improve performance for uniform dequantization distributions. ARD achieves a negative log-likelihood of 3.06 bits per dimension on CIFAR10, which to the best of our knowledge is state-of-the-art among distribution models that do not require autoregressive inverses for sampling.
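The standard practice the abstract builds on, uniform dequantization, is simple to illustrate: each integer value (e.g. an 8-bit pixel) gets Uniform[0, 1) noise added, so a continuous density model cannot place unbounded likelihood on isolated discrete points. A minimal sketch with toy pixel data:

```python
# Uniform dequantization sketch: map integer pixels to continuous values by
# adding u ~ Uniform[0, 1) noise, as described in the abstract.

import random

def dequantize(pixels):
    """Add independent Uniform[0,1) noise to each integer pixel value."""
    return [x + random.random() for x in pixels]

pixels = [0, 17, 255]
continuous = dequantize(pixels)
# Each dequantized value lies in [x, x+1), so rounding down recovers the pixel.
assert [int(v) for v in continuous] == pixels
```

The paper's importance-weighted, Rényi, and autoregressive variants replace this fixed uniform noise distribution with more flexible, learned ones.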
48 tweets
Advbox: a toolbox to generate adversarial examples that fool neural networks
Dou Goodman, Hao Xin, Wang Yang, Wu Yuesheng, Xiong Junfeng, Zhang Huan
Submitted to arXiv on: 13 January 2020
Abstract: In recent years, neural networks have been extensively deployed for computer vision tasks, particularly visual classification problems, where new algorithms are reported to achieve or even surpass human performance. Recent studies have shown that they are all vulnerable to the attack of adversarial examples. Small and often imperceptible perturbations to the input images are sufficient to fool the most powerful neural networks. Advbox is a toolbox to generate adversarial examples that fool neural networks in PaddlePaddle, PyTorch, Caffe2, MxNet, Keras and TensorFlow, and it can benchmark the robustness of machine learning models. Compared to previous work, our platform supports black box attacks on Machine-Learning-as-a-service, as well as more attack scenarios, such as Face Recognition Attack, Stealth T-shirt, and Deepfake Face Detect. The code is licensed under the Apache 2.0 license and is openly available at https://github.com/advboxes/AdvBox.
43 tweets
Reformer: The Efficient Transformer
Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya
Submitted to arXiv on: 13 January 2020
Abstract: Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from O(L^2) to O(L log L), where L is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of N times, where N is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.
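The locality-sensitive hashing idea behind Reformer's attention can be sketched as follows: hash each vector into a bucket so that vectors pointing in similar directions tend to collide, then attend only within buckets instead of over all L positions. The sketch below uses generic sign-based random-projection LSH for clarity; Reformer's actual scheme is an angular, multi-round variant.

```python
# Generic sign-based LSH sketch (not Reformer's exact angular scheme):
# bucket id = sign pattern of dot products with random hyperplanes, so
# vectors with similar directions usually land in the same bucket.

import random

def lsh_bucket(vec, planes):
    bits = 0
    for plane in planes:
        dot = sum(v * p for v, p in zip(vec, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits

random.seed(0)
dim, n_planes = 4, 8
planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

q = [1.0, 0.2, -0.5, 0.3]
scaled = [2 * v for v in q]  # same direction, different magnitude
# Vectors with identical directions are guaranteed to share a bucket.
assert lsh_bucket(q, planes) == lsh_bucket(scaled, planes)
```

Restricting attention to chunks of same-bucket positions is what reduces the O(L^2) all-pairs cost toward O(L log L).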
40 tweets
Everybody’s Talkin’: Let Me Talk as You Want
Linsen Song, Wayne Wu, Chen Qian, Ran He, Chen Change Loy
Submitted to arXiv on: 15 January 2020
Abstract: We present a method to edit a target portrait footage by taking a sequence of audio as input to synthesize a photo-realistic video. This method is unique because it is highly dynamic. It does not assume a person-specific rendering network yet capable of translating arbitrary source audio into arbitrary video output. Instead of learning a highly heterogeneous and nonlinear mapping from audio to the video directly, we first factorize each target video frame into orthogonal parameter spaces, i.e., expression, geometry, and pose, via monocular 3D face reconstruction. Next, a recurrent network is introduced to translate source audio into expression parameters that are primarily related to the audio content. The audio-translated expression parameters are then used to synthesize a photo-realistic human subject in each video frame, with the movement of the mouth regions precisely mapped to the source audio. The geometry and pose parameters of the target human portrait are retained, therefore preserving the context of the original video footage. Finally, we introduce a novel video rendering network and a dynamic programming method to construct a temporally coherent and photo-realistic video. Extensive experiments demonstrate the superiority of our method over existing approaches. Our method is end-to-end learnable and robust to voice variations in the source audio.
33 tweets
Learning a distance function with a Siamese network to localize anomalies in videos
Bharathkumar Ramachandra, Michael J. Jones, Ranga Raju Vatsavai
Submitted to arXiv on: 24 January 2020
Abstract: This work introduces a new approach to localize anomalies in surveillance video. The main novelty is the idea of using a Siamese convolutional neural network (CNN) to learn a distance function between a pair of video patches (spatio-temporal regions of video). The learned distance function, which is not specific to the target video, is used to measure the distance between each video patch in the testing video and the video patches found in normal training video. If a testing video patch is not similar to any normal video patch then it must be anomalous. We compare our approach to previously published algorithms using 4 evaluation measures and 3 challenging target benchmark datasets. Experiments show that our approach either surpasses or performs comparably to current state-of-the-art methods.
31 tweets
SieveNet: A Unified Framework for Robust Image-Based Virtual Try-On
Surgan Jandial, Ayush Chopra, Kumar Ayush, Mayur Hemani, Abhijeet Kumar, Balaji Krishnamurthy
Submitted to arXiv on: 17 January 2020
Abstract: Image-based virtual try-on for fashion has gained considerable attention recently. The task requires trying on a clothing item on a target model image. An efficient framework for this is composed of two stages: (1) warping (transforming) the try-on cloth to align with the pose and shape of the target model, and (2) a texture transfer module to seamlessly integrate the warped try-on cloth onto the target model image. Existing methods suffer from artifacts and distortions in their try-on output. In this work, we present SieveNet, a framework for robust image-based virtual try-on. Firstly, we introduce a multi-stage coarse-to-fine warping network to better model fine-grained intricacies (while transforming the try-on cloth) and train it with a novel perceptual geometric matching loss. Next, we introduce a try-on cloth conditioned segmentation mask prior to improve the texture transfer network. Finally, we also introduce a dueling triplet loss strategy for training the texture translation network which further improves the quality of the generated try-on results. We present extensive qualitative and quantitative evaluations of each component of the proposed pipeline and show significant performance improvements against the current state-of-the-art method.
28 tweets
Unbiased and Efficient Log-Likelihood Estimation with Inverse Binomial Sampling
Bas van Opheusden, Luigi Acerbi, Wei Ji Ma
Submitted to arXiv on: 12 January 2020
Abstract: The fate of scientific hypotheses often relies on the ability of a computational model to explain the data, quantified in modern statistical approaches by the likelihood function. The log-likelihood is the key element for parameter estimation and model evaluation. However, the log-likelihood of complex models in fields such as computational biology and neuroscience is often intractable to compute analytically or numerically. In those cases, researchers can often only estimate the log-likelihood by comparing observed data with synthetic observations generated by model simulations. Standard techniques to approximate the likelihood via simulation either use summary statistics of the data or are at risk of producing severe biases in the estimate. Here, we explore another method, inverse binomial sampling (IBS), which can estimate the log-likelihood of an entire data set efficiently and without bias. For each observation, IBS draws samples from the simulator model until one matches the observation. The log-likelihood estimate is then a function of the number of samples drawn. The variance of this estimator is uniformly bounded, achieves the minimum variance for an unbiased estimator, and we can compute calibrated estimates of the variance. We provide theoretical arguments in favor of IBS and an empirical assessment of the method for maximum-likelihood estimation with simulation-based models. As case studies, we take three model-fitting problems of increasing complexity from computational and cognitive neuroscience. In all problems, IBS generally produces lower error in the estimated parameters and maximum log-likelihood values than alternative sampling methods with the same average number of samples. Our results demonstrate the potential of IBS as a practical, robust, and easy to implement method for log-likelihood evaluation when exact techniques are not available.
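The core IBS estimator described in the abstract is compact: for one observation, draw from the simulator until a sample matches, and if that takes K draws, estimate the log-likelihood as the negative harmonic sum -(1 + 1/2 + ... + 1/(K-1)), which is unbiased for log p. A minimal sketch with a toy Bernoulli simulator standing in for a real model:

```python
# Inverse binomial sampling (IBS) sketch: sample from the simulator until a
# draw matches the observation, then return the harmonic-sum estimator of
# the log-likelihood. The Bernoulli simulator is a toy stand-in.

import random

def ibs_log_likelihood(observation, simulate, rng):
    k = 1
    while simulate(rng) != observation:
        k += 1
    return -sum(1.0 / j for j in range(1, k))  # 0.0 when k == 1

rng = random.Random(0)
simulate = lambda r: 1 if r.random() < 0.3 else 0  # toy model: P(obs=1) = 0.3

# Averaging many unbiased single-trial estimates approaches log(0.3) ~ -1.204.
estimates = [ibs_log_likelihood(1, simulate, rng) for _ in range(20000)]
mean = sum(estimates) / len(estimates)
```

Because each trial's estimate is unbiased, summing estimates over a data set gives an unbiased estimate of the total log-likelihood, which is what makes IBS usable inside maximum-likelihood fitting.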
25 tweets