For over a decade, computer scientist Randy Goebel and his colleagues in Japan have been using a tried-and-true method from his field to advance artificial intelligence in the world of law: a yearly competition.
Drawing on example legal cases taken from the Japanese bar exam, contestants must use an AI system that can retrieve the statutes relevant to each case and, more crucially, make a decision: did the defendant break the law or not?
It's this yes/no answer that AI struggles with the most, says Goebel, and it raises the question of whether AI systems can be ethically and effectively deployed by lawyers, judges and other legal professionals who face giant dockets and narrow time windows to deliver justice.
The contest has provided the foundation for a new paper in which Goebel and his co-authors outline the types of reasoning AI must use to "think" like lawyers and judges, and describe a framework for imbuing large language models (LLMs) with legal reasoning.
"The mandate is to understand legal reasoning, but the passion and the value to society is to improve judicial decision-making," Goebel says.
The need for these kinds of tools has been especially critical since the Supreme Court of Canada's Jordan decision, Goebel says. That 2016 decision set strict limits on how long prosecutors have to bring a case to trial, and it has resulted in cases as serious as sexual assault and fraud being thrown out of court.
"It's a very good motivation to say, 'Let's enable the judicial system to be faster, more effective and more efficient,'" Goebel says.
Making machines "think" like lawyers
The paper highlights three types of reasoning AI tools must possess to think like legal professionals: case-based, rule-based and abductive reasoning.
Some AI systems, such as LLMs, have proven adept at case-based reasoning, in which legal experts examine previous court cases and determine how the law was applied in the past in order to draw parallels to the case at hand.
Rule-based reasoning, which involves applying written laws to unique legal cases, can also be completed to some extent by AI tools.
But where AI tools struggle the most is with abductive reasoning, a type of logical inference that involves stringing together a plausible series of events that could explain, for example, why a defendant is not guilty of a crime. (Did the man with the knife in his hand stab the victim? Or did a gust of wind blow the knife into his hand?)
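To make the contrast concrete, here is a minimal, purely illustrative sketch of abduction in Python. It is not the authors' framework and not how an LLM operates internally; the hypotheses, rules and facts are invented to mirror the knife example above. Abduction works backwards from an observation to the hypotheses that could explain it.

```python
# Toy abductive inference: keep every candidate hypothesis whose consequences
# would explain the observation without contradicting the known evidence.
# All names and data below are hypothetical, chosen to echo the knife scenario.

def abduce(observation, hypotheses, rules, excluded_facts):
    plausible = []
    for h in hypotheses:
        consequences = rules.get(h, set())
        explains = observation in consequences          # hypothesis accounts for what we saw
        consistent = not (consequences & excluded_facts)  # and implies nothing ruled out
        if explains and consistent:
            plausible.append(h)
    return plausible

rules = {
    "defendant stabbed the victim": {"knife in defendant's hand", "victim wounded"},
    "wind blew the knife into defendant's hand": {"knife in defendant's hand"},
}

print(abduce("knife in defendant's hand", list(rules), rules, excluded_facts=set()))
# Both hypotheses survive, because both explain the observation.
```

Both explanations remain plausible, which is exactly the difficulty: abduction yields competing stories for the same facts, and a legal reasoner must weigh them against the rest of the evidence rather than simply predict the next likely word.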
"Not surprisingly, abductive reasoning can't be done by modern large language models, because they don't reason," Goebel says. "They're like your friend who has read every page of Encyclopedia Britannica, who has an opinion on everything but knows nothing about how the logic fits together."
Add to that their tendency to "hallucinate," or invent "facts" wholesale, and generic LLMs applied to the legal field are at best unreliable and, at worst, potentially career-ending for lawyers.
The key challenge for AI scientists, Goebel says, is to develop a reasoning framework that works in conjunction with generic LLMs to deliver accuracy and contextual relevance in legal reasoning.