The sixth AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES) took place in Montreal, Canada, from 8-10 August 2023. The three-day event included keynote talks, contributed talks and poster sessions, as well as two panel discussions. We summarise some of the key points made by the panellists in the first of these: “Large Language Models: Hype, Hope, and Harms”. The session was moderated by Alex John London (Carnegie Mellon University), and the panellists were: Roxana Daneshjou (Stanford), Atoosa Kasirzadeh (University of Edinburgh), Kate Larson (University of Waterloo) and Gary Marchant (Arizona State University).
The panellists began by talking about some of their hopes for large language models. They also outlined some of their concerns around the hype and harms that surround these models.
Kate started by saying that she believes this technology opens up exciting research questions. In the past, promising lines of work were put on hold because systems lacked adequate language capabilities. Now that we have large language models (LLMs) with strong capabilities, many of those possibilities have opened up again, with implications for how we design systems. In terms of hype, however, there is constant over-promising when it comes to AI systems. Within research, the hype around LLMs has affected funding for other areas of machine learning. Her other concern is that many young researchers see that the LLM resources available in academia are nowhere near those of large companies, and this may dissuade them from pursuing a career in academia. To tackle the harms from these models (and other AI systems), she favours a larger collaborative effort involving people from across society.
Roxana is both a physician and an AI researcher, and she spoke from a healthcare applications standpoint. She is concerned about the narrative surrounding reports of LLM performance on exams. Some media reports have suggested that because an LLM passed a medical exam, such a model could work as a medical practitioner. She is shocked by how quickly things are moving with respect to LLMs in healthcare – you can’t “move fast and break things” in medicine, because the “things” are people’s lives. Getting a new treatment, or system, approved in healthcare is usually a long process, and for very good reason. However, many in the profession have already adopted LLMs with no approval procedure. For example, GPT-4 has already been plugged into patient databases in some areas and is being used to draft responses to patients. There is no framework to vet this, there is a lack of physician training, and patients have not been informed about its use. Nevertheless, she is hopeful that the technology can assist in some areas of the profession. Medical professionals are overworked, and much of that burden comes from administrative tasks. If a framework could be built to ease that burden, it would be very helpful.
Gary comes from a law background and has seen a wide range of adoption levels in legal settings, with some people already fully embracing the technology and others not using it at all. He raised the issue of billing. Traditionally in law, clients are billed by the hour (the merits of which are debatable). What happens if a lawyer uses a tool such as ChatGPT to save time? If, for example, it usually took 10 hours to write a document manually, but with ChatGPT’s help that was reduced to one hour, how much do you charge the client? Gary raised three concerns. The first is the tendency of ChatGPT to output completely false citations. The second is deepfakes, which could cause major harm: courts are increasingly using video evidence, and Gary knows of at least two cases last year in which the video evidence presented was fake. The third concern is around teaching and exams, with these new models making assessment very difficult for teachers and professors.
The panel were asked about the discourse around long-term versus short-term risks. They all agreed that real harms and immediate risks are already occurring, and that the conversation about these is being drowned out by the existential risk narrative.
Atoosa is worried about the lack of conversation between the long-term and short-term risk camps; real, in-depth discussion is needed rather than social media spats. She would like to see the people who talk about existential threats try to understand the everyday harms that are already happening. Her main concern is that the existential threat discourse surrounding AGI (artificial general intelligence) will enter the mainstream, and that governments and policy makers will take decisions based on this position.
Kate and Roxana both questioned what exactly people mean when they use terms like AGI or superintelligence; these terms are often used to hype the capabilities of systems, and it would be helpful if the hype-mongers actually defined what they meant by them. It would take quite a number of leaps to get from the systems we have now to anything representing AGI or superintelligence. In medicine, for example, such a system would have to fulfil all of the competencies of a physician; then imagine scaling that up to cover all professions, and all the tasks that humans carry out.
The panel moved on to talk about the speed of deployment of LLM systems. Roxana proposed that, when considering whether and how something should be deployed, we should look at the potential for harm in that deployment: high-risk settings need a regulatory framework in place before release. Gary was of the opinion that we should proceed with voluntary guidelines rather than regulation, his reasoning being that government moves too slowly to keep up with the technology.
Kate and Atoosa considered the effect on assessment in schools and universities, with Atoosa noting that the new tools are forcing us to analyse our processes and redefine how we assess students. Kate joked that she wished the latest LLMs hadn’t been released so quickly, as the speed of deployment didn’t give her time to change her assessment process.
What is certain is that the debate surrounding LLMs, their workings, development, risks, potential, deployment, and regulation, is sure to continue apace.