The AIhub coffee corner captures the musings of AI experts over a 30-minute conversation. With the recent publication on arXiv of the article On the Opportunities and Risks of Foundation Models, this month we discuss the concept of such models.
Joining the discussion this time are: Sanmay Das (George Mason University), Tom Dietterich (Oregon State University), Stephen Hanson (Rutgers University), Sabine Hauert (University of Bristol), Holger Hoos (Leiden University) and Carles Sierra (Artificial Intelligence Research Institute of the Spanish National Research Council).
Sabine Hauert: This month we’ll be discussing this paper on the opportunities and risks of foundation models. Could somebody give an explanation of what these models are and some of the issues?
Tom Dietterich: So these foundation models, like BERT and GPT-3, are deep neural networks trained in an unsupervised fashion on huge collections of data. You’ve probably seen things like the automated writing of short paragraphs based on a prompt of some kind, and, of course, the learned representations are very useful for a wide variety of tasks. The argument of this paper is that these are really changing the way people are building applications and systems using machine learning. Instead of collecting a dataset for a specific task, where you have the inputs and the target outputs, you just build on top of these systems. You can do things like design prompts that then elicit the outputs. You can also use these models to label data for targeted machine learning applications. So, there are a bunch of different ways to use these models. In my view, the paper is very interesting. Not only have the team at Stanford written this huge paper, they’ve also created a centre for the study of foundation models.
I think that most of the controversy on Twitter was about the term foundation models. It caught in people’s throats a lot because, as I tweeted, if these are our foundations then they are foundations of sand. But, they are giving us a glimpse of what we could do if we had a broad, general purpose commonsense knowledge base. These large models do not provide reliable common sense, but they “fake it” statistically. They provide a statistical surrogate for such a knowledge base.
Stephen Hanson: Another connection has to do with what DeepMind keep saying they’re going to do with artificial general intelligence. Sutton and some DeepMind people wrote a paper in the journal Artificial Intelligence recently arguing that reward is enough. Their point is that if you have an agent doing reinforcement learning with some general kinds of datasets, you’re building a system that has this huge scope and, as Tom was saying, can therefore be used, generalized and transferred across many kinds of situations. But this is a kind of holy grail that goes back to the seventies, when people set out to build artificial general intelligence and it didn’t turn out too well. So, there is a hype factor here with GPT-3, and of course our journalist friends would really care if GPT-3 was elected governor of California – that might be interesting to them – but, otherwise, it’s very hard to tell how to test GPT-3. Evaluating it is like what psychologists call a Rorschach test: you say “that’s pretty neat what it just said”, but how is that a test of its general abilities to solve problems? I think there’s a great danger in these very large deep-learning networks – what I call deep-learning blobs now – that basically swallow books and then try to do something. You can’t really evaluate them very easily. By the way, this is getting out into product development: many large companies are trying to figure out how they can use GPT-3 or BERT or something else to basically build upon and sell to somebody. So, this stuff is beyond speculation, it’s going out into the product domain.
Holger Hoos: And that is one of the senses in which they become foundations, right? And that’s also one of the senses in which I agree at least with the wording of Tom’s tweet that if this is the foundation then you are building on sand.
Tom: A lot of sand!
Holger: Yes, exactly, a lot of sand, which doesn’t make it better actually. What I find very interesting, if you dissect the hype swirling around this, and the legitimate excitement as well (it’s not all hype), and you intersect this with the other topic that, at least in Europe, is big now and will get bigger in the fall: the attempt at regulating AI. And, by the way, China just made an interesting move there in terms of their proposed rules for recommender systems. So if you intersect those two, you actually get to the point where you ask yourself, how safe and desirable a product can you build on a thing that’s little understood? That’s a pretty interesting question that will keep us going for a bit.
Tom: If you read the section in the paper where they talk about the term foundation model, they themselves say it’s not a perfect term, but they give a pretty good rationale for why they chose it. And certainly, particularly in the natural language community, there’s a whole sub area that’s known as BERTology, which is basically studying the BERT model and trying to understand its strengths and weaknesses. I think that controlling, censoring and steering these models are all very active research questions right now.
Incidentally, someone has been going through the Stanford Encyclopedia of Philosophy and finding incidental haiku – series of phrases that happen to form a haiku – which they then feed to one of these models to generate a picture that illustrates the haiku. It’s amazing.
Carles Sierra: I think we have to be very cautious in using these systems as the foundation of engineering systems. In engineering systems you need a certain predictability: you need to build upon requirements and satisfy those requirements, and I think these emergent properties are still not well understood, at least not well enough to be able to build something on top of them.
Sabine: What would be a good foundation? What method would deserve the foundation title? Holger was mentioning regulation – maybe that is the foundation and we build from there?
Tom: Well, you need a large knowledge base, and maybe these models point the way, but I think they’re representationally impoverished. They don’t really have objects or agents. They may have them implicitly, but I suspect that we will want to acquire a huge commonsense knowledge base from a large corpus of human behaviour. In some sense they point the way but, right now, we’re doing what’s easy rather than what would be the correct thing scientifically.
Holger: There is also a deeper question here, which is very much connected to how desirable an AGI actually is. Rather than saying AGI, one should say “human-mimicking”. One could argue, and in fact I most vocally would, that this is neither the AI we want nor the AI we need. We know perfectly well how to produce human intelligence [biologically]. What we really need are things that help us compensate for our own weaknesses, to help us recognise and overcome our own biases – the kinds of things that people don’t do well. It’s not clear to me at all that models like this, or their equivalents, are helping us much to advance towards that goal.
Sanmay Das: This is more related to the point about regulation. I think that’s one thing that’s interesting in this whole debate – the regulation part is going to become very important. There was an article recently about how Facebook has done these user surveys and decided that they’re going to deprioritize political content, or something along those lines. Think about what’s actually going on under the hood over there. Again, I’m speculating, but I imagine that they have a whole bunch of deep learning models, probably built on foundations similar to this, which they then use to classify whether something is political or not, and then they decide how to alter people’s news feeds accordingly. Machine learning is just not very good at these kinds of things yet, because it has difficulty with things like sarcasm, or the strategic use of political text, and so on. This idea that we’re going to take all of these models, decide what is or isn’t political text, and then mess around with what people are going to see kind of scares me. Of course, that’s already happening. What you see on Facebook is driven by engagement. (I’m not singling out Facebook, it was just what was coming up in the news.) A lot is being done based on these kinds of models already, and some of this is potentially quite worrying.
Steve: It could just fail and people would find the failure extremely expensive and they would stop using it.
Sanmay: I guess the point is, would we know that it had failed? Maybe in the long term…
You can find all of our previous coffee corner discussions here.