On January 31, 2023, judge Juan Manuel Padilla issued a seven-page ruling in a case in which the fundamental right to health of a child, diagnosed as being on the autism spectrum, was at stake. It was a relatively simple second-instance case decided by a judge in Cartagena, Colombia, in which the key legal question was whether a health insurance company’s request for co-payments or a fee for authorizing a medical procedure infringed on the child’s fundamental rights to health and a dignified life. Judge Padilla upheld the first-instance ruling that favored the child.
The ruling would have been one of the thousands of health-related judicial decisions adopted in Colombia every year, were it not for the fact that the judge decided to transcribe his interactions with ChatGPT, which he had used to support the reasoning of his decision. In a matter of hours, the ruling made its way into the Colombian national media, and it has since been covered by media outlets all over the world.
Only ten days later, magistrate María Victoria Quiñones of the Administrative Tribunal of Magdalena, also in Colombia, issued a court order in which ChatGPT prompts were likewise transcribed. Magistrate Quiñones’ interactions with the chatbot aimed to answer technical questions that helped her decide how to conduct a judicial hearing in the metaverse. The judicial process concerns a direct remedy claim (reparación directa) brought by a contractor of Colombia’s National Police. On February 15th, the hearing was held through Meta’s Horizon Workrooms and livestreamed on YouTube.
This blogpost examines the challenges of using Large Language Model (LLM) tools, such as ChatGPT, to draft judicial rulings and, more generally, the dangers of using emerging technologies in judicial activities, in Colombia and beyond. My main argument is that current LLMs are not trustworthy sources of information and should only be used – with the utmost care – when other more effective and safer options are not available. Moreover, I contend that the judiciary should promote digital literacy and an informed, transparent, ethical, and responsible use of AI tools, in order to reap their potential benefits and prevent risks.
The judges did not use ChatGPT in an informed or responsible manner
To be clear, the texts of the ruling and court order issued by the Colombian judges were not a simple copy-paste of the queries submitted to ChatGPT and the chatbot’s answers. On the one hand, the decision of judge Padilla succinctly explained the facts of the case, described the logic of the first-instance decision, stated the main constitutional issues at stake, listed the relevant articles of the Colombian Constitution, and cited a ruling of the Constitutional Court that addressed a very similar case (for a more thorough description of the facts and of the tutela constitutional action, see professor Lorena Florez’s post). On the other hand, the court order of magistrate Quiñones explained that the parties agreed to carry out the initial hearing of the administrative procedure in the metaverse, cited legal provisions and case law that justified the use of information technologies in judicial procedures, and explained what the metaverse is and how the hearing would be conducted. If the chatbot’s answers were only one part of the judicial decisions’ statements of reason, why should we care or even be concerned about how the Colombian judges used ChatGPT?
The short answer is that they used ChatGPT as if it were an oracle: a trustworthy source of knowledge that did not require any sort of verification. While the judges were transparent about the fact that they used the tool and included quotation marks to distinguish the content produced by ChatGPT, their use was neither informed nor responsible.
There are three main reasons why the way ChatGPT was used by the judiciary in these cases should concern Colombians and observers elsewhere.
First, the stakes in judicial rulings are too high – especially when human rights are involved – to justify the use of unreliable and insufficiently tested technologies. Due to the way that LLMs like ChatGPT are developed and operate, these tools tend to produce incorrect and imprecise answers and to confuse reality with fiction. Even the CEO of OpenAI acknowledged in December 2022 that “ChatGPT is incredibly limited (…) it’s a mistake to be relying on it for anything important right now”. Furthermore, for structural reasons, it is unlikely that these problems of LLMs will be solved soon.
In the two Colombian cases, ChatGPT’s answers were not incidental but determinative for the decisions adopted by the courts. In judge Padilla’s ruling, two of the seven pages consist of a transcription of four of ChatGPT’s answers to his prompts. This means that about 29% of the ruling is text generated by ChatGPT. Hence, although ChatGPT’s answers were not the sole legal basis of the ruling, they are a key component of the decision. Moreover, the four questions the judge posed to ChatGPT dealt with key legal issues required to decide the case:
- Is an autistic child exempt from co-payments for therapy?
- Should tutela [constitutional] actions in these cases be granted?
- Is requiring a co-payment in these cases a barrier to access to health services?
- Has the jurisprudence of the constitutional court made favourable decisions in similar cases?
Hence, judge Padilla prompted ChatGPT to address core legal questions that are very specific to the Colombian legal system.
In the case of magistrate Quiñones’ court order, the questions dealt with issues that were not substantial for the cause of action. The three questions aimed at supporting procedural decisions required to carry out the hearing in the metaverse:
- What is an avatar?
- What is the most effective method to verify the authenticity of those connecting to a meeting and/or virtual hearing?
- Method for verifying the authenticity of the avatar in the metaverse?
Although these questions appear to be merely technical, they deal with how the magistrate verifies that the people who participate in the hearing legitimately represent the parties, a matter that is essential for ensuring access to justice and due process. The statements included in the court order illustrate the point: “Thus, for a better understanding of some concepts about the metaverse and the administration of the hearing in this environment, this judicial agency will rely on AI, using ChatGPT.” For the sake of brevity, I won’t address in this post the legal and equity implications of using the metaverse for a court hearing, but I recommend professor Lorena Florez’s recent post, where she discusses the need to assess the necessity of the tool and to implement user-centered approaches (design thinking) to decide how to conduct judicial activities.
In sum, the first argument is not that judges – and, in general, public officials – should not innovate or use new technologies. Rather, experimental tools should not be deployed in certain state-related activities, and if judges currently have access to more effective and safer tools, those should be preferred over untested ones.
Secondly, ChatGPT’s answers should not have been accepted at face value, but rather checked against other, more reliable sources. For example, in the case of judge Padilla’s ruling, the answers provided by ChatGPT lacked nuance and were poorly justified. In one reply, the chatbot cited a specific law that is only tangentially pertinent to the case; in another, ChatGPT alluded to the jurisprudence of the Constitutional Court without citing specific cases.
The judgement states that the information offered by ChatGPT would be “corroborated”. However, there is no explicit trace in the text that allows us to conclude that judge Padilla or his clerk effectively checked whether ChatGPT’s responses were accurate. In fact, I replicated the four queries posed by judge Padilla and the chatbot answered slightly differently, a result that is not surprising given how the tool works. Furthermore, when I prompted ChatGPT to provide examples of case law from the Constitutional Court that justified its answers, the chatbot invented the facts and ratio decidendi of one ruling, and cited a judgement that did not exist (inventing both the facts and the verdict).
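That variability is inherent to how these models generate text: they sample each next token from a probability distribution, so two runs of the exact same prompt can diverge. The toy Python sketch below (my own illustrative example with a made-up vocabulary, not the real model) shows why identical inputs can yield different outputs when sampling with a non-zero temperature:

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw model scores into a probability distribution.
    A temperature above zero keeps the sampling stochastic."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def generate(rng, steps=5, temperature=1.0):
    """Sample a short 'answer' token by token from a fixed toy
    distribution, standing in for a fixed prompt."""
    vocab = ["yes", "no", "partly", "per-case-law", "however"]
    logits = [2.0, 1.6, 1.2, 0.8, 0.4]
    probs = softmax(logits, temperature)
    return [rng.choices(vocab, weights=probs, k=1)[0] for _ in range(steps)]

if __name__ == "__main__":
    # The "prompt" (the logits) is identical in both runs;
    # only the random draws differ, so the answers can differ too.
    print(generate(random.Random(1)))
    print(generate(random.Random(2)))
```

The same mechanism operates, at vastly larger scale, inside ChatGPT: unless sampling is made fully deterministic, repeating judge Padilla’s queries is not guaranteed to reproduce the answers quoted in the ruling.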
Hence, the argument is not that ChatGPT or other LLMs should never be used to support judicial work. The point is that any content produced by these systems that is used, directly or indirectly, to draft rulings must be subjected to a rigorous and thorough examination.
This last point introduces the third reason why the two Colombian cases are worrisome. Both the ruling and the court order explicitly claim that emerging technologies can help to streamline judicial processes. For example, judge Padilla’s ruling stated: “The purpose of including these AI texts is not in any way to replace the Judge’s decision. What we are really looking for is to optimize the time spent in writing sentences”. Further, in a radio interview, judge Padilla claimed that: “my only concern is to improve the justice system’s timings […]. (T)hat string of text that artificial intelligence provided me with could also have been provided to me by a clerk”.
It is true that Law 2213 of 2022 and the General Procedure Code (article 103, Law 1564 of 2012), among others, admit the use of information and communication technologies to manage and conduct judicial activity. However, it is also true that these laws indicate that such technologies should only be used when they are “suitable” for the task. If ChatGPT and the other LLMs currently available are evidently unreliable, since their outputs tend to include incorrect and false information, then judges would need significant time to check the validity of the AI-generated content, thereby undoing any meaningful “time savings”. As happens with AI in other areas, under a narrative of supposed “efficiencies”, fundamental rights can be put at risk.
Finally, there is a risk that judges and their clerks over-rely on the AI’s recommendations, incurring what is known as “automation bias”. As explained by professor Florez, “[d]ue to an overconfidence in the impartiality or certainty of the AI system, such as ChatGPT, judges may be hindered in their ability to make exact judgments and understand their surroundings. This could lead to an over-reliance on the outputs of automated systems.”
Challenges for judicial systems in the age of generative AI
It is worrisome that two Colombian judges transcribed their ChatGPT interactions to support their decisions without thoroughly examining whether the information was correct. There is a high risk that judges and their clerks all over Colombia will start transcribing ChatGPT’s outputs as if they were a reliable source. In fact, judge Padilla stated in a radio interview that judges from all over the country would be “very happy” because the system could save “many hours transcribing things that are already on the Internet”. Judge Padilla also claimed that “what ChatGPT does is to help us choose the best of these texts from the Internet and compile them in a very logical and very short way to what we need.” This misunderstanding of how LLMs work illustrates why ensuring the digital literacy of the judiciary is critical in times of generative AI.
There is a trend toward broader access to generative AI tools, offered freely by different companies through web and app-based platforms. Hence, the kind of uninformed use of AI that we saw in Colombia may spread beyond the country. Moreover, attorneys for plaintiffs and defendants may also use LLMs – such as ChatGPT – as an oracle, to the detriment of their clients’ interests. AI tools should only be used in judicial matters when they have been sufficiently tested and when other more effective, less costly, and more accessible tools are not available.
Furthermore, wherever the use of these tools is feasible, the bodies that manage the judiciary should design guidelines and policies on how and when certain AI tools, including LLMs like ChatGPT, can be introduced into judicial processes. The guidelines could establish standards and best practices for judges, clerks, and attorneys who wish to use AI tools.
For example, an informed, transparent, ethical, and responsible use of AI tools by judges, clerks, and attorneys should comply with the following standards: (i) the user understands how the technology works, acknowledges its limitations and risks, and makes sure that the tool is adequate for the required task (informed use); (ii) the user is transparent about the use of the technology in proceedings (transparent use); (iii) the user distinguishes clearly which sections of the judicial decision or legal document are AI-generated text (ethical use); and (iv) the user rigorously checks information retrieved from the AI system against reliable sources and explicitly reports on such examination (responsible use).
The Colombian cases could contribute to a global discussion on the importance of digital literacy for judges, their aides, and attorneys, as well as the need for clear guidelines on when and how to use AI systems in judicial proceedings.