AIhub.org
 

How the internet and its bots are sabotaging scientific research


03 September 2025




Image: Elise Racine & The Bigger Picture / Web of Influence I / Licenced by CC-BY 4.0

By Mark Forshaw, Edge Hill University and Jekaterina Schneider, University of the West of England

There was a time, just a couple of decades ago, when researchers in psychology and health always had to engage with people face-to-face or by telephone. The worst-case scenario was sending questionnaire packs out to postal addresses and waiting for handwritten replies.

So we either literally met our participants, or we had multiple corroborating points of evidence that indicated we were dealing with a real person who was, therefore, likely to be telling us the truth about themselves.

Since then, technology has done what it always does – creating opportunities to cut costs, save time and access wider pools of participants on the internet. But what most people have failed to fully realise is that internet research brings with it risks of data corruption and impersonation that can deliberately put research projects in jeopardy.

What enthused scientists most about internet research was the new capability to reach people we might not normally be able to involve in research. For example, as more people could afford to go online, poorer people became able to participate, as did those from rural communities who might be many hours and multiple forms of transport away from our laboratories.

Technology then leapt ahead in a very short period of time. The democratisation of the internet opened it up to more and more people, and artificial intelligence grew in pervasiveness and technical capacity. So, where are we now?

As members of an international interest group looking at fraud in research (Fraud Analysis in Internet Research, or Fair), we’ve realised that it is now harder than ever to identify if someone is real. There are companies that scientists can pay to provide us with participants for internet research, and they in turn pay the participants.

While they do have checks and balances in place to reduce fraud, it’s probably impossible to eradicate it completely. Many people live in countries where the standard of living is low, but the internet is available. If they sign up to “work” for one of these companies, they can make a reasonable amount of money this way, possibly even more than they can in jobs involving hard labour and long hours in unsanitary or dangerous conditions.

In itself, this is not a problem. However, there will always be a temptation to maximise the number of studies they can participate in, and one way to do this is to pretend to be relevant to, and eligible for, a larger number of studies. Gaming the system is likely to be happening, and some of us have seen indirect evidence of this (people with extraordinarily high numbers of concurrent illnesses, for example).

It’s not feasible (or ethical) to insist on asking for medical records, so we rely on trust that a person with heart disease in one study is also eligible to take part in a cancer study because they also have cancer, in addition to anxiety, depression, blood disorders or migraines and so on. Or all of these. Short of requiring medical records, there is no easy answer for how to exclude such people.

More insidiously, there will also be people who use other individuals to game the system, often against their will. We are only now starting to consider the possibility of this new form of slavery, the extent of which is largely unknown.

Enter the bots

Similarly, we are seeing the rise of bots that pretend to be participants, answering questions in increasingly sophisticated ways. Multiple identities can be fabricated by a single coder, who can then not only make a lot of money from studies but also seriously undermine the science we are trying to do (very concerning where studies are open to political influence).

It’s getting much more difficult to spot artificial intelligence. There was a time when written interview questions, for example, could not be completed by AI, but they now can.

It is only a matter of time before we find ourselves conducting and recording online interviews with a visual representation of a living, breathing individual who simply does not exist, created for example through deepfake technology.

We are only a few years away from such a profound deception, if not months. The British TV series The Capture might seem far-fetched to some, with its portrayal of real-time fake TV news, but anyone who has seen the current state of the art in AI can easily imagine us being just a short stretch away from its depictions of the “evils” of impersonation using perfect avatars scraped from real data. It is time to worry.

The only answer, for now, will be to simply conduct interviews face-to-face, in our offices or laboratories, with real people who we can look in the eye and shake the hand of. We will have travelled right back in time to the point a few decades ago mentioned earlier.

With this comes the loss of one of the great things about the internet: it is a wonderful platform for democratising participation in research for people who might otherwise not have a voice, such as those who cannot travel because of a physical disability. It is dismaying to think that every fraudster is essentially stealing the voice of a real person who we genuinely want in our studies. Indeed, previous research has found between 20% and 100% of survey responses to be fraudulent.

We must be suspicious going forward, even though our natural propensity, as amenable people trying to serve humanity with our work, is to be trusting and open. This is the real tragedy of the situation we find ourselves in, over and above the corruption of the data that feed into our studies.

It also has ethical implications that we urgently need to consider. We do not, however, seem to have any choice but to “hope for the best but assume the worst”. We must build systems around our research whose sole purpose is to detect and remove false participation of one kind or another.
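To make this concrete, here is a minimal, hypothetical sketch of the kind of screening “firewall” such systems might apply to online survey data. The checks shown – implausibly fast completion, shared network addresses, failed attention checks – are common heuristics in survey research; the field names and thresholds are purely illustrative, not a description of any specific platform's method.

```python
# Illustrative fraud-screening sketch: flag survey responses that complete
# implausibly fast, share a network address with another response, or fail
# an embedded attention check. All field names and thresholds are assumptions.

def flag_suspicious(responses, min_seconds=120):
    """Return (response_id, reasons) pairs for potentially fraudulent responses."""
    # Group response IDs by originating IP address to spot duplicates.
    seen_ips = {}
    for r in responses:
        seen_ips.setdefault(r["ip"], []).append(r["id"])

    flagged = []
    for r in responses:
        reasons = []
        if r["duration_seconds"] < min_seconds:
            reasons.append("completed implausibly fast")
        if len(seen_ips[r["ip"]]) > 1:
            reasons.append("shares an IP with another response")
        if not r["passed_attention_check"]:
            reasons.append("failed attention check")
        if reasons:
            flagged.append((r["id"], reasons))
    return flagged
```

Flagged responses would typically be reviewed by a human rather than dropped automatically, since genuine participants (for example, households sharing one connection) can trip the same heuristics – which is exactly the tension between fraud reduction and breadth of participation discussed below.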

The sad fact is that we are potentially going backwards by decades to rule out a relatively small proportion of false responses. Every “firewall” we erect around our studies will reduce fraud (although probably not eliminate it entirely), but at the cost of reducing the breadth of participation that we desperately want to see.

The Conversation

Mark Forshaw, Professor of Health Psychology, Edge Hill University and Jekaterina Schneider, Research Fellow of Sport Psychology, University of the West of England

This article is republished from The Conversation under a Creative Commons license. Read the original article.




The Conversation is an independent source of news and views, sourced from the academic and research community and delivered direct to the public.




            AIhub is supported by:



©2025 - Association for the Understanding of Artificial Intelligence