ΑΙhub.org
 

Ghostbuster: detecting text ghostwritten by large language models


by
02 January 2024



share this:

The structure of Ghostbuster, our new state-of-the-art method for detecting AI-generated text.

By Vivek Verma, Eve Fleisig, Nicholas Tomlin, and Dan Klein

Large language models like ChatGPT write impressively well—so well, in fact, that they’ve become a problem. Students have begun using these models to ghostwrite assignments, leading some schools to ban ChatGPT. In addition, these models are also prone to producing text with factual errors, so wary readers may want to know if generative AI tools have been used to ghostwrite news articles or other sources before trusting them.

What can teachers and consumers do? Existing tools to detect AI-generated text sometimes do poorly on data that differs from what they were trained on. In addition, if these models falsely classify real human writing as AI-generated, they can jeopardize students whose genuine work is called into question.

Our recent paper introduces Ghostbuster, a state-of-the-art method for detecting AI-generated text. Ghostbuster works by finding the probability of generating each token in a document under several weaker language models, then combining functions based on these probabilities as input to a final classifier. Ghostbuster doesn’t need to know what model was used to generate a document, nor the probability of generating the document under that specific model. This property makes Ghostbuster particularly useful for detecting text potentially generated by an unknown model or a black-box model, such as the popular commercial models ChatGPT and Claude, for which probabilities aren’t available. We’re particularly interested in ensuring that Ghostbuster generalizes well, so we evaluated across a range of ways that text could be generated, including different domains (using newly collected datasets of essays, news, and stories), language models, or prompts.

Examples of human-authored and AI-generated text from our datasets.

Why this Approach?

Many current AI-generated text detection systems are brittle to classifying different types of text (e.g., different writing styles, or different text generation models or prompts). Simpler models that use perplexity alone typically can’t capture more complex features and do especially poorly on new writing domains. In fact, we found that a perplexity-only baseline was worse than random on some domains, including non-native English speaker data. Meanwhile, classifiers based on large language models like RoBERTa easily capture complex features, but overfit to the training data and generalize poorly: we found that a RoBERTa baseline had catastrophic worst-case generalization performance, sometimes even worse than a perplexity-only baseline. Zero-shot methods that classify text without training on labeled data, by calculating the probability that the text was generated by a specific model, also tend to do poorly when a different model was actually used to generate the text.

How Ghostbuster Works

Ghostbuster uses a three-stage training process: computing probabilities, selecting features, and classifier training.

Computing probabilities: We converted each document into a series of vectors by computing the probability of generating each word in the document under a series of weaker language models (a unigram model, a trigram model, and two non-instruction-tuned GPT-3 models, ada and davinci).

Selecting features: We used a structured search procedure to select features, which works by (1) defining a set of vector and scalar operations that combine the probabilities, and (2) searching for useful combinations of these operations using forward feature selection, repeatedly adding the best remaining feature.

Classifier training: We trained a linear classifier on the best probability-based features and some additional manually-selected features.

Results

When trained and tested on the same domain, Ghostbuster achieved 99.0 F1 across all three datasets, outperforming GPTZero by a margin of 5.9 F1 and DetectGPT by 41.6 F1. Out of domain, Ghostbuster achieved 97.0 F1 averaged across all conditions, outperforming DetectGPT by 39.6 F1 and GPTZero by 7.5 F1. Our RoBERTa baseline achieved 98.1 F1 when evaluated in-domain on all datasets, but its generalization performance was inconsistent. Ghostbuster outperformed the RoBERTa baseline on all domains except creative writing out-of-domain, and had much better out-of-domain performance than RoBERTa on average (13.8 F1 margin).

Results on Ghostbuster’s in-domain and out-of-domain performance.

To ensure that Ghostbuster is robust to the range of ways that a user might prompt a model, such as requesting different writing styles or reading levels, we evaluated Ghostbuster’s robustness to several prompt variants. Ghostbuster outperformed all other tested approaches on these prompt variants with 99.5 F1. To test generalization across models, we evaluated performance on text generated by Claude, where Ghostbuster also outperformed all other tested approaches with 92.2 F1.

AI-generated text detectors have been fooled by lightly editing the generated text. We examined Ghostbuster’s robustness to edits, such as swapping sentences or paragraphs, reordering characters, or replacing words with synonyms. Most changes at the sentence or paragraph level didn’t significantly affect performance, though performance decreased smoothly if the text was edited through repeated paraphrasing, using commercial detection evaders such as Undetectable AI, or making numerous word- or character-level changes. Performance was also best on longer documents.

Since AI-generated text detectors may misclassify non-native English speakers’ text as AI-generated, we evaluated Ghostbuster’s performance on non-native English speakers’ writing. All tested models had over 95% accuracy on two of three tested datasets, but did worse on the third set of shorter essays. However, document length may be the main factor here, since Ghostbuster does nearly as well on these documents (74.7 F1) as it does on other out-of-domain documents of similar length (75.6 to 93.1 F1).

Users who wish to apply Ghostbuster to real-world cases of potential off-limits usage of text generation (e.g., ChatGPT-written student essays) should note that errors are more likely for shorter text, domains far from those Ghostbuster trained on (e.g., different varieties of English), text by non-native speakers of English, human-edited model generations, or text generated by prompting an AI model to modify a human-authored input. To avoid perpetuating algorithmic harms, we strongly discourage automatically penalizing alleged usage of text generation without human supervision. Instead, we recommend cautious, human-in-the-loop use of Ghostbuster if classifying someone’s writing as AI-generated could harm them. Ghostbuster can also help with a variety of lower-risk applications, including filtering AI-generated text out of language model training data and checking if online sources of information are AI-generated.

Conclusion

Ghostbuster is a state-of-the-art AI-generated text detection model, with 99.0 F1 performance across tested domains, representing substantial progress over existing models. It generalizes well to different domains, prompts, and models, and it’s well-suited to identifying text from black-box or unknown models because it doesn’t require access to probabilities from the specific model used to generate the document.

Future directions for Ghostbuster include providing explanations for model decisions and improving robustness to attacks that specifically try to fool detectors. AI-generated text detection approaches can also be used alongside alternatives such as watermarking. We also hope that Ghostbuster can help across a variety of applications, such as filtering language model training data or flagging AI-generated content on the web.

Try Ghostbuster here: ghostbuster.app

Learn more about Ghostbuster here: [ paper ] [ code ]

Try guessing if text is AI-generated yourself here: ghostbuster.app/experiment


This article was initially published on the BAIR blog, and appears here with the authors’ permission.




BAIR blog




            AIhub is supported by:



Related posts :



Deploying agentic AI: what worked, what broke, and what we learned

  15 Sep 2025
AI scientist and researcher Francis Osei investigates what happens when Agentic AI systems are used in real projects, where trust and reproducibility are not optional.

Memory traces in reinforcement learning

  12 Sep 2025
Onno writes about work presented at ICML 2025, introducing an alternative memory framework.

Apertus: a fully open, transparent, multilingual language model

  11 Sep 2025
EPFL, ETH Zurich and the Swiss National Supercomputing Centre (CSCS) released Apertus today, Switzerland’s first large-scale, open, multilingual language model.

Interview with Yezi Liu: Trustworthy and efficient machine learning

  10 Sep 2025
Read the latest interview in our series featuring the AAAI/SIGAI Doctoral Consortium participants.

Advanced AI models are not always better than simple ones

  09 Sep 2025
Researchers have developed Systema, a new tool to evaluate how well AI models work when predicting the effects of genetic perturbations.

The Machine Ethics podcast: Autonomy AI with Adir Ben-Yehuda

This episode Adir and Ben chat about AI automation for frontend web development, where human-machine interface could be going, allowing an LLM to optimism itself, job displacement, vibe coding and more.

Using generative AI, researchers design compounds that can kill drug-resistant bacteria

  05 Sep 2025
The team used two different AI approaches to design novel antibiotics, including one that showed promise against MRSA.

#IJCAI2025 distinguished paper: Combining MORL with restraining bolts to learn normative behaviour

and   04 Sep 2025
The authors introduce a framework for guiding reinforcement learning agents to comply with social, legal, and ethical norms.



 

AIhub is supported by:






 












©2025.05 - Association for the Understanding of Artificial Intelligence