The AIhub coffee corner captures the musings of AI experts over a short conversation. This month, our trustees tackle the topic of conference reviewing. Joining the conversation are: Sanmay Das (George Mason University), Sarit Kraus (Bar-Ilan University), Michael Littman (Brown University) and Carles Sierra (CSIC).
Lucy Smith: Our topic this month is the conference reviewing and publication process. It would be good to discuss some of the issues and then consider some possible improvements.
Sarit Kraus: Well, where do we start…?!
Carles Sierra: I mean, there are so many issues. There are examples of papers that have been published without anybody checking the final version. In some cases, the authors used a large language model to refine or improve the introduction, the result was unacceptable and inappropriate, but no human looked at that text. I think if we want to guarantee that the final versions are OK, we need to do more than just check the format of the paper. However, this all adds extra work and extra cost.
Sarit: Sometimes, I’m not sure that the reviewers even read the papers. With some reviewers, you can see from the reviews that they read the introduction, they read the conclusions and maybe scanned the rest of the paper. And the quality of some of the reviews are really low. I still remember ECAI, around 1994; I was a Program Committee member and we all flew to Amsterdam and we sat around the table and we discussed the papers. Today, there are 5,000 – 10,000 papers, there are 10,000 reviewers who no one knows, the process is really random in some sense. People keep resubmitting their papers until they get the correct set of reviewers. So, this is one problem.
Another problem is cheating, which is not just in AI, but in science in general. I’m talking about issues such as letting LLMs write the papers, falsifying results, collusion between reviewers, etc. A few weeks ago, someone talked to me about the idea of an organization that investigates cheating. He told me they are looking into a code of conduct. If a conference suspects somebody of cheating, they would apply this code, there would be an investigation, and perhaps some punishment (such as banning them from the conference, or announcing the misconduct). This would maybe help reduce the number of misconduct cases. I think this is an interesting idea, but to do this, people need to be able to share information between conferences. However, at the moment we are not allowed to do so because of privacy.
Carles: One of the things going in that direction is an initiative to coordinate the main conferences, which will be called AI SoC. So, conferences such as AAAI, IJCAI, ECAI, PRICAI, and possibly ICML and NeurIPS will form a committee to share practices on reviewing, etc. So, at least there is an initiative to start thinking about it. I don’t know what the conclusions will be because there are many problems, and there is probably no solution for all of them. I mean, how can you prove that a review has been written by an LLM, for example – it’s very complicated.
Sanmay Das: Reviews written by LLMs, about papers written by LLMs…
Michael Littman: We’re going to finally get all that extra time back so we can do other things. There has been an analysis which basically found that particular words LLMs like to use have now skyrocketed in frequency in reviews. And so, as you point out, it’s really hard to point to a specific review and say this review was not written by a human being. But overall, these reviews are definitely being influenced by language models.
Sanmay: All these problems are huge, right? But I think it comes down to incentives. I was comparing this to other fields and how they handle it. Firstly, in some of the other fields, the volume of papers is somewhat lower. I’m also not sure where computer science should sit in the interval between biotech, which is obviously extremely fast and competitive, and the social sciences, where everything is much slower and more meticulous. One question is: unless you’re getting paid for it, why would people review? The one thing that I think used to be the case is reputation – “if I do a bad review and the associate editor or the senior program committee member sees that I did a bad review, they’re going to have a bad impression of me, and that could affect my career”. That has disappeared, not only more generally, but also because we’ve explicitly made it disappear by making reviewers anonymous, even to the senior program committee member or the area chair. Half the time when I’m an area chair, all I see is “reviewer number 25567 wrote the following”, so I have no idea how much to trust this person’s judgment. I think the system kind of falls down because of that.
Sarit: At ECAI, after you submit the review, you see the names of the other reviewers. There was a case at IJCAI that led to the reviewing system being changed. Previously, the reviewers were known to each other, but it was reported that one senior reviewer pressured a more junior reviewer to improve the grades on the paper. Since then, the names of the reviewers have not been revealed to each other.
Carles: In 2017, I was responsible for IJCAI. I followed the IJCAI tradition of not showing the names of the reviewers to one another, but what we added to the system was that each reviewer would come back and judge the other reviews on the same paper. So, if your work as a reviewer was bad, the other reviewers would point that out, your reputation would go down, and your opinion would carry much less weight than the others in the final numerical aggregation. That gave me a different ranking from the traditional one based on reviewers’ self-reported confidence, and in many cases it allowed me to save papers because some of the reviews were judged bad by fellow reviewers. This is a way of scaling things up because, as a program chair, you cannot check the quality of all the thousands of reviews yourself; instead, you use the intelligence of the community for that. I think mechanisms like this are the ones we could probably put in place. If we do that across conferences, looking at the reputation of reviewers across conferences could be a good incentive.
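(A minimal sketch of how a reputation-weighted aggregation of this kind could look, assuming a simple scheme where co-reviewers rate each review’s quality and those ratings become weights in an average; the function names and numbers below are illustrative only, not the actual IJCAI system.)

```python
# Hypothetical sketch of reputation-weighted score aggregation:
# co-reviewers rate each other's reviews, those ratings become a
# reputation weight, and a paper's final score is a weighted average.

def reputation_weights(peer_ratings):
    """peer_ratings maps reviewer -> list of quality ratings (0-1) from co-reviewers."""
    return {r: sum(ratings) / len(ratings) for r, ratings in peer_ratings.items()}

def aggregate_score(review_scores, weights):
    """review_scores maps reviewer -> score given to the paper."""
    total_weight = sum(weights[r] for r in review_scores)
    return sum(weights[r] * s for r, s in review_scores.items()) / total_weight

# Example: reviewer C's review was judged weak by co-reviewers, so C's low
# score drags the paper down less than it would in a plain average.
peer_ratings = {"A": [0.9, 0.8], "B": [0.7, 0.9], "C": [0.3, 0.2]}
weights = reputation_weights(peer_ratings)
paper_scores = {"A": 7, "B": 6, "C": 2}
print(aggregate_score(paper_scores, weights))  # ~5.9, versus a plain mean of 5
```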
Sanmay: But will it make people less likely to agree to review, which has been the other half of the problem?
Carles: Yes, that’s the argument against doing that. We need to find a mechanism that is based on reputation. Probably not publishing the names, but giving awards to those who are at the top of the reputation list. For example, the best 100 reviewers of the year would receive an award from the joint conferences on AI that they could put on their CVs. We need a mechanism like that so that people will compete to write better and better reviews. I think this is the only way.
Sanmay: The other thing that could help is if there’s some form of monetary incentive. But that is very hard to scale. I mean, other disciplines do it. If I look at editors of journals in political science or economics, they get a month of salary.
Carles: How about the best 20 reviewers, for example, get their conference registration waived? That would be an excellent incentive for people to do good reviews.
Sanmay: In addition, we obviously want to do something in terms of punishing cheaters.
Carles: Yes, one option is to not invite them anymore. I passed my reputation list to IJCAI 2018, and reviewers below a certain level weren’t invited the next year.
Sarit: I’m not sure if they were too sad about that! It’s really time consuming to write a review. I recently spent a day writing a review for ECAI.
Carles: On the other hand, young researchers consider an invitation to review for a large conference a form of recognition. Some will add to their CV that they reviewed for a particular conference (or journal).
Lucy: Do any conferences have specific training for new reviewers, or any kind of guidance for them on how to produce a good review?
Michael: I think there are other areas of computer science that do that, and actually have a kind of onboarding for newer reviewers. As far as I know, not in our world.
Sarit: One thing I will say is that deadlines really improve research. I mean deadlines give everybody motivation. That’s the only advantage of our conference system.
Carles: On the other hand, the negative aspect is that when you have a deadline for the reviews, people tend to review on the last day and produce very poor-quality reviews. So, an argument against our model would be that we should publish in a journal first and then go to the conference to present the results.
Sanmay: There’s been a lot of experimentation with these models in different non-AI subfields. For example, SIGGRAPH has models for reviewer continuity between the conference and the ACM Transactions on Graphics (TOG), and SIGGRAPH papers automatically appear in TOG, while TOG papers can be presented at SIGGRAPH.
Carles: We’re doing a bit of it in AI, because IJCAI accepts presentations of papers published in the AI Journal or JAIR, and similarly with AAMAS and the JAAMAS journal. So, we have a bit of that already. Maybe that could be increased to a larger portion of the conference.