The AIhub coffee corner captures the musings of AI experts over a short conversation. The recent launches of two large language models, ChatGPT and Galactica, have led to much interest and controversy amongst the AI community, and beyond. These models, and in particular their potential use for writing scientific articles (and essays), provided the inspiration for this month’s discussion.
Joining the discussion this time are: Sabine Hauert (University of Bristol), Sarit Kraus (Bar-Ilan University), Michael Littman (Brown University), and Lucy Smith (AIhub).
Sabine Hauert: Has anyone had a chance to use any of these new models yet?
Sarit Kraus: During the summer I played with the previous version of GPT. Have you tried the latest version, Michael?
Michael Littman: Last night I logged in and gave it a couple of queries, but it kept dying on me, so I didn’t get to put it through its paces. I have read a whole bunch of examples that people have posted, and the Twitter examples I’ve seen are kind of mind-blowing. However, I think it’s really important that you don’t take all those at face value. Just because somebody found an example that is breathtaking doesn’t mean that it always behaves that way. You very much have to take these things with a grain of salt. As I couldn’t put it through its paces, it’s very difficult for me to tell how powerful these things really are. One of the things that’s super relevant to me, is that people are saying “oh, the reinforcement learning problem is solved by this too”, and, well, that would undercut all my research. If it’s true, that would be amazing, but I don’t think that’s what’s happening here. I think people are doing some cool examples and then over-generalizing from a small number of examples…like large language models do.
Sabine: Yeah, there was an article in the New York Times discussing whether skilled work is being replaced now, because ChatGPT can come up with really clever things to say on quite sophisticated topics.
Sarit: On quite specific topics let’s say, or on a specific query that it’s been asked.
Sabine: Yes. And in that article, there was a paragraph that was written by ChatGPT which is only revealed later in the article. So, I guess with the right queries you can get useful content out of it. Now, is it useful for scientific papers? In principle, we’re meant to innovate, to do things that haven’t been done before, so I don’t quite understand how it can generate more than what’s in the literature. It could probably write a paper based on things that you give it, and I think it would be useful if we had a tool to help us write our papers, but in terms of generating a paper, or scientific content based on what it sees online, I’m not sure. I guess it depends whether it’s more about integration or if it’s about real innovation that goes a step further forward.
Sarit: I spoke to a researcher who works at a start-up company. English is not his native language, although he speaks it quite well. He told me that now, when he wants to write a document, he goes to one of the GPT models and he inputs the main points for the document, and then he lets GPT write the document for him. He is very happy with this option. I tried it, as English also isn’t my native language, so I wanted to see how it could improve my writing. It didn’t work for me. However, a good English editor does improve my writing, no question about it. It was the same for Hebrew, where I’m not an expert in writing, and I was expecting these models could make improvements. However, they weren’t better than software like Grammarly.
Sabine: I like the idea of improving. One of my students has a startup company and apparently you can use ChatGPT as a business advisor to criticize your startup pitch. So he gave it this startup pitch and asked it to criticize it. He said it was very critical, and actually quite helpful. So, I guess it could be useful to improve yourself and have a critique machine, instead of your peers.
Michael: That can certainly work. Do people know the rubber ducky methodology for debugging code? So, one of the things that people advise is that when you’re working on coding you keep a little rubber ducky next to the computer and whenever you get stuck, you explain to the rubber ducky what’s going wrong, and it’s remarkable how often that will trigger something in somebody’s head. Just the act of trying to put it into words, even for an inanimate object, can be very valuable. So, of course something like this could be valuable even if it’s totally BSing you, it can trigger your own recognition of what the issues are.
The biggest problem for me is that these language models are just masterful improv artists. They can take any topic and just kind of riff on it. But they don’t actually understand science or human relationships, they don’t understand anything. They’re just really, really good at making things sound normal. And, I don’t know, sometimes that’s all you need. But certainly, in science that is not all you need, and something that’s just trying to string together words to sound like science is not what we want. We want an actual causal understanding of the topic. These models aren’t trained for that, they’re not optimized for that, there’s nothing in them that would cause them to be that way, other than to the degree that it’s helpful in making things sound reasonable.
Sabine: What are we going to do with all of our students who hand in essays, and it’s all generated by GPT? And they’re all different because there’s some randomness in this process. What are we going to do?
Sarit: It just means that you don’t need to ask them to do that project. You need to find something else.
Sabine: Something creative, something that really engages them, rather than just producing content. I don’t know what we’re going to do. I think we are going to have GPT mark them, and then we’re good.
Sarit: That would be better. And then the GPT model would say, “I’ve written this”.
Sabine: Maybe we could ask the model if it’s written it, that would be brilliant.
Michael: But it doesn’t know, it would just make something up, like “oh well I did, because of these pronouns or these subordinate clauses”. But then it wouldn’t even necessarily be true, it would just sound true.
Sabine: So, everything needs to be hands-on then, or live discussions.
Michael: I like what you suggested, which is that we could just have the machines write the essays, the machines grade the essays, and we don’t have to be involved at all. Education can just be outsourced to the machines. The machines will teach the machines, the machines will be happy, people won’t have to deal with school anymore…
Sabine: It’s so interesting. I wonder what the implications are for education.
Michael: There have been a lot of essays recently about how college essays are going to have to go away, because it’s just too easy. So much of that is about making an argument sound reasonable and these programs can do it as well as or better than people.
Sabine: How about scientific literature? Lucy, do you think we should be using this to write articles?
Lucy Smith: No, definitely not. This also has implications for the scientific publishing industry, which already receives automatically generated papers, so this could take it to a whole new level. I can see paper mills farming papers and them getting sent off left, right and centre to different publishers, at just a click of a button.
Sabine: You worked in publishing before joining AIhub. Do you think they’ll be worried about this?
Lucy: Yes, I think so. Another issue to think about is plagiarism. There are plagiarism checkers that they use, but you can only use them if the work you are checking has already been published. If the model has been trained on scientific articles, would it reproduce some of them, or parts of them? That would potentially lead to copyright infringement and plagiarism issues.
Sabine: I guess if you are building on other papers, how do you make sure that it’s quoting them correctly? Can it add references appropriately?
Michael: I’ve seen it make up references that sound great, but that don’t actually exist. It has this huge mishmash in its neural connections, which includes factual information and structural information and how to combine them in ways that are common. So, it’s really great at putting these pieces together.
Sarit: On a related note, if you ask GPT “where was IJCAI in 1992 and who was the program chair?”, it will give you reasonable answers, but in 1992 there was no IJCAI. You will get a reasonable answer, and you could imagine that that person was the program chair, but you won’t know that it didn’t really happen that year. IJCAI took place in 1991 and 1993, but not in 1992.
Sabine: That’s terrifying in terms of the facts, and the fact that they can invent scientific references to make things sound good.
Michael: I gave it the title of the book that I wrote and asked it to write a blurb for the back cover, and it gushed about my book, which was great, I felt very good about that. But I didn’t tell it who I was, and it made up an author for the book. I looked up who that person was, and it was a person who had written a book on a related topic, an engineer at Google, who would have been a perfectly reasonable author for the book. But it just asserts it, there’s not even a pause, like “this might have been written by so and so”. No, “this wonderful book by so and so does this, that and the other thing”.
Sabine: To a certain extent it’s somewhat reassuring, in that it’s not all-knowing. It’s just very good at capturing how we explain things and presenting it in a convincing way.
Michael: This is why I like to think of it as improv, because it doesn’t have to be factually grounded, and it’s not trained to be factually grounded, it’s just trained to flow nicely. It does that remarkably well.
Sabine: So maybe these models will be helpful for condensing text, maybe shaping ideas that we have, to make them easier to understand.
Sarit: Some tasks it does very well. For example, translating sentences from passive to active voice. For sure, it doesn’t know anything, but it writes beautifully.
Sabine: So that will be my task for Christmas. I’ll have it write my Christmas cards and send them to all my contacts.