A growing number of AI tools are being developed for the legal sector, to help professionals search lengthy texts or check court rulings. Leiden SAILS researcher Masha Medvedeva, an expert on the technical development of these systems, warns: “Users should know what’s under the hood.”
I have technical expertise in building AI systems and I’ve been embedded in various law faculties. My research focuses on technical design choices in systems that may have downstream implications for whoever ends up using them. These choices can have implications for law as a whole, for legal practice or for individuals. Legal search tools, for instance, have been around for a very long time and are generally done pretty well. But they can omit certain things from their search results, rank results in different ways and so on. So if you’re a law professional using these systems, you should at least have some understanding of how they work, to a degree that lets you decide whether or not to rely on a certain tool.
In my experience, developers who build these systems often have little understanding of the sector they’re developing for. This was also one of the findings in my PhD research on the development of systems that predict court decisions. Furthermore, tools are often never properly evaluated on data. I’ll give you an example, again with the systems predicting court decisions. Many scientific papers claim huge successes with their prediction models. But 93 percent of the academic papers evaluate their models on data that only becomes available after the judge has made a decision. That, of course, is not the kind of information that would be available to somebody who actually wants to predict court decisions. So these systems are really predicting what the decision was rather than what it will be. Out of 171 papers in English that predict court decisions, only twelve actually work with data that is available prior to a decision. The same goes for models built to generate legal reasoning. These models are often evaluated as if they were translation models: in essence, counting how many words in the generated reasoning match the original reasoning it is compared to. That’s not necessarily a way to evaluate whether the model’s reasoning is actually sound.
Really try, before adopting any AI tool, to understand what data goes into it, how it makes its decisions and, most importantly, how it was evaluated. The thing with the legal domain is that it moves very quickly. So if you’re relying on databases or producing something based on the most recent information, systems have to be updated and evaluated regularly. And even when the AI you’re using can support those requirements, it’s still important to understand that every new iteration of an AI model has to be evaluated again. Adding more information doesn’t necessarily mean your system starts working better or stays the same: it could start working worse, because most of the time we don’t really know exactly how it works. And I think that today, clients have to be informed. They also have to be aware that these systems may be used for their cases.
Earlier in my career, I was involved in the development of the Typology of Legal Technologies. This is a curated set of legal technologies (applications, scientific papers and datasets) that we handpicked to demonstrate the potential impact of different types of “legal tech” on legal effect. I would recommend it as a guide to figuring out what questions you should ask if you were to adopt a tool. Right now there are no real systematic audits of these systems, and it’s a bit of a Wild West. Maybe with the new AI Act we’ll at least see that very high-risk systems have to be audited, and perhaps they’ll be safer.
Sometimes we do that. For instance, I’ve given talks to judges in The Hague. There was a symposium in January of this year, organized for the employees of The Hague court. We talked about how predicting court decisions works and explored how the tools function, so that judges can also make informed decisions on whether and how these systems can be used.