AIhub.org
 

“Open” alternatives to ChatGPT are on the rise, but how open is AI really?


25 July 2023




OpenAI’s ChatGPT seems ubiquitous, but open source versions of instruction-tuned text generators are gaining the upper hand. In just six months, at least 15 serious alternatives have emerged, and all of them have at least one important advantage over ChatGPT: they are a lot more transparent. Insight into training data and algorithms is key to the responsible use of generative AI, argues a team of linguists and language technology researchers at Radboud University.

The researchers have mapped this rapidly evolving landscape in a paper and a live-updated website. Their survey shows that there are many working “open source” alternatives to ChatGPT, but also that openness comes in degrees and that many models inherit legal restrictions from the components they build on. They sound a note of cautious optimism. Lead researcher Andreas Liesenfeld: “It’s good to see so many open alternatives emerging. ChatGPT is so popular that it is easy to forget that we don’t know anything about the training data or other tricks being played behind the scenes. This is a liability for anyone who wants to better understand such models or build applications on them. Open alternatives enable critical and fundamental research.”

More and more open

Corporations like OpenAI sometimes claim that AI must be kept under wraps because openness may bring “existential risks”, but the researchers are not impressed. Senior researcher Mark Dingemanse: “Keeping everything closed has allowed OpenAI to hide exploitative labour practices. And talk of so-called existential risk distracts from real and current harms like confabulation, biased output and tidal waves of spam content.” Openness, the researchers argue, makes it easier to hold companies responsible and accountable for the models they make, the data that goes into them (often copyrighted), and the texts that come out of them. 

The research shows that models vary in how open they are: many only share the language model, others also provide insight into the training data, and quite a few are extensively documented. Mark Dingemanse: “In its present form, ChatGPT is unfit for responsible use in research and teaching. It can regurgitate words but has no notion of meaning, authorship, or proper attribution. And that it’s free just means we’re providing OpenAI with free labour and access to our collective intelligence. With open models, at least we can take a look under the hood and make mindful decisions about technology.”

Some additional points

  • New models appear every month, so the paper is mainly a call to action to track their openness and transparency systematically. An accompanying website makes this possible.
  • Many models borrow elements from one another, which can lead to murky legal situations. For instance, the popular Falcon 40B-instruct model builds on a dataset (Baize) that is meant strictly for research purposes, yet the Falcon makers still encourage commercial use.
  • A key reason ChatGPT feels so fluid is the human labour that goes into the instruction-tuning step (reinforcement learning from human feedback, or RLHF), in which model output is trimmed and pruned to make it sound more docile and conversational. Open models enable research into what makes people so susceptible to the suggestion of true interactivity.

The researchers presented their findings at the ACM Conference on Conversational User Interfaces (CUI 2023) in Eindhoven, 19-21 July.

Find out more

Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators, Andreas Liesenfeld, Alianda Lopez, and Mark Dingemanse. In ACM Conference on Conversational User Interfaces (2023).

arXiv version

The website




Radboud University




            AIhub is supported by:


©2024 - Association for the Understanding of Artificial Intelligence


 












©2021 - ROBOTS Association