Why Alaska’s AI Legal Chatbot Struggled with Basic Truths

The promise was simple: use technology to bridge the massive gap in legal access for rural citizens. Alaska’s court system, facing a shortage of attorneys and a geography that isolates many residents, attempted to build an AI legal chatbot named AVA (Alaska Virtual Assistant). The goal was to guide users through the complex, emotionally draining process of probate—dealing with a loved one’s estate after death.

It didn’t go according to plan. What was scoped as a three-month sprint stretched into a grueling 16-month slog, resulting in a tool that struggled to distinguish legal fact from digital fiction. The project exposes a fundamental tension in modern legal tech: the gap between what generative AI is sold as and what it actually is.

For developers and court systems watching from the sidelines, Alaska’s experiment offers more value as a warning than a blueprint. It highlights specific technical and design failures that anyone building an AI legal chatbot needs to understand before writing a single line of code.

The Core Problem: When an AI Legal Chatbot Predicts Instead of Knows

The biggest hurdle for the Alaska project wasn't a lack of funding or effort; it was the nature of the technology itself. Users and developers often expect an AI legal chatbot to function like a database—retrieving hard facts upon request. Generative AI, however, does not retrieve. It predicts.

The "Word Prediction" Trap in Legal Advice

Legal advice relies on certainty. A statute either exists or it doesn't. A filing deadline is specific. Large Language Models (LLMs), the engines behind tools like AVA, are built on probability: they predict the next most likely word in a sequence based on statistical patterns in their training data.

In creative writing, a "likely" continuation is a feature. In law, it is a liability. When an AI legal chatbot is asked a question it doesn't have the answer to, it often attempts to bridge the gap with a plausible-sounding string of words. This is where hallucinations happen.
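
To make that concrete, here is a minimal sketch of what "predicting the next word" actually means in code. Everything below is illustrative (a small public model and a made-up prompt), not part of the AVA system, but the mechanism is the same: the model scores candidate next words by likelihood, with no concept of whether the resulting sentence is true.

```python
# A minimal sketch of next-token prediction, assuming the Hugging Face
# `transformers` library and a small open model (gpt2). The prompt is
# illustrative; the point is that the output is a probability
# distribution over possible next words, not a verified fact.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "To find a lawyer in Alaska, you should contact the"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Probability of every possible next token. Nothing in this step checks
# whether any continuation is factually true for Alaska.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for score, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(token_id)])!r}: {score.item():.3f}")
```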

Tech-savvy observers have pointed out that expecting an LLM to strictly adhere to logic is a misunderstanding of the tool. It is a language calculator, not a truth engine. Alaska's system ran into this head-on when it confidently directed users to resources that didn't exist, effectively sending grieving citizens on wild goose chases.

Why RAG Didn't Solve the AI Legal Chatbot's Accuracy Issues

To combat hallucinations, developers often use Retrieval Augmented Generation (RAG). This technique restricts the AI, forcing it to look at a specific set of verified documents (like Alaska's court procedures) before answering.
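
As a rough sketch of the RAG pattern (not the AVA implementation), the pipeline below retrieves the most relevant snippets from a verified corpus and builds a prompt that tells the model to answer only from them. The document snippets, the deadline mentioned in them, and the prompt wording are all hypothetical.

```python
# A minimal RAG sketch: retrieve verified passages, then constrain the prompt.
# The "court documents" here are invented examples, not real Alaska guidance.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Informal probate may be used when the will is uncontested.",
    "The personal representative must file an inventory of the estate.",
    "Small estates under a statutory threshold may use an affidavit process.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k document snippets most similar to the question."""
    matrix = TfidfVectorizer().fit_transform(documents + [question])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).flatten()
    return [documents[i] for i in scores.argsort()[::-1][:k]]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    # The model is told to answer only from the retrieved context --
    # but nothing forces it to comply, which is why errors never hit zero.
    return (
        "Answer using ONLY the context below. If the answer is not "
        "in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

print(build_prompt("Do I need formal probate if the will is uncontested?"))
```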

While RAG significantly reduces error rates (potentially lowering hallucinations to a fraction of a percent), it doesn't hit zero. In the context of government services, even a 1% error rate is dangerous. If 1 out of 100 citizens is given incorrect instructions on how to transfer a property deed or handle a will, the court system is creating new legal messes rather than solving old ones.

Aubrie Souza, a consultant for the National Center for State Courts (NCSC) who worked on the project, noted the persistent difficulty in eliminating these errors. The system proved that even with guardrails, an AI legal chatbot can still veer off the road if the underlying model decides to improvise.

User Experience: The Failure of Artificial Empathy

Beyond the technical accuracy, the AVA project failed on a human level. The development team initially programmed the AI legal chatbot to be conversational and sympathetic. Given that the tool was designed for probate law, users were often dealing with the recent death of a parent or spouse.

Why Users Rejected the AI Legal Chatbot’s Apologies

The AI was programmed to use phrases like "I am so sorry for your loss" when users mentioned a death. While this mimics human interaction, testers found it deeply off-putting.

Feedback from the pilot phase showed that users felt patronized. They knew they were talking to a machine. Having a piece of software feign emotional distress felt disingenuous and annoying. It added friction to an already painful process. When you are trying to figure out which tax form to file for a deceased relative, you don't want a robot offering condolences; you want the form number.

This reaction aligns with broader user experience trends in utility-based AI. Users tolerate "personality" in entertainment bots, but when the stakes are financial or legal, they prefer a tool that functions like a scalpel: sterile, precise, and efficient.

Functionality Over Personality in Government Tech

The Alaska team eventually stripped the "personality" out of AVA, removing the empathetic scripts in favor of a drier, more direct tone. This is a critical lesson for any government agency building an AI legal chatbot.

Trust in these systems comes from competence, not simulated friendship. If a user asks a question about jurisdiction, the answer should be immediate and factual. Rhetorical flourishes and emotional padding only serve to obscure the information the user is desperate to find. The most compassionate thing a legal tool can do is provide the correct answer quickly so the user can get off the computer.

The Alaska Case Study: Ambition vs. Reality

The context of Alaska’s court system is unique, but the administrative errors made are universal. Alaska has a vast rural population with little to no access to lawyers. The intent behind the AI legal chatbot was noble: improve "Access to Justice" (A2J) by automating the guidance provided by understaffed court hotlines.

From Three Months to Stagnation

The original roadmap pegged the project as a three-month build. This timeline suggests a fundamental underestimation of the complexity involved in digitizing probate law. Probate is notoriously filled with edge cases, family disputes, and financial intricacies.

Sixteen months later, the project was still struggling. This wasn't just "scope creep"; it was a collision with reality. By choosing probate as the pilot for their AI legal chatbot, the courts selected one of the hardest possible starting points. A simpler domain, like traffic citations or jury duty scheduling, might have allowed for a working prototype much sooner. Instead, the team got bogged down in a field where nuance is everything.

The "Phantom Law School" Incident

The most publicized failure of the AVA system was a specific hallucination regarding legal assistance. When testers asked where they could find a lawyer, the AI legal chatbot suggested they contact the alumni association of the local law school in Alaska.

The problem? Alaska does not have a law school.

This seemingly small error destroys user trust instantly. If the bot doesn't know whether a law school exists in the state, why would a user trust it to explain the difference between formal and informal probate? This highlights the danger of generative models: they create answers that sound structurally correct (many states do have law schools with alumni referral services) but are factually wrong for the specific context.

How to Use Legal AI Safely (Lessons from Users)

Despite the failures of AVA, the technology isn't useless. Tech-savvy users and legal professionals have found ways to utilize LLMs effectively, but they use them differently than the Alaska court system attempted to.

Summarization vs. Consultation

The most effective use case for current AI technology in law isn't generating advice; it's digesting information.

Users analyzing complex government PDFs or dense legal forms report success when using AI as a summarizer. You can feed a 50-page document into a system and ask it to extract specific clauses, deadlines, or definitions. In this workflow, the AI isn't being asked to "think" or provide counsel; it is being asked to reorganize existing text.
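
A sketch of that "librarian" workflow, assuming the pypdf library for text extraction; the file name and the ask_llm stub are placeholders rather than anything from the Alaska project.

```python
# A sketch of document extraction rather than consultation. The model is
# asked to locate and quote text from a document you supply, so every
# answer can be checked against the source. `ask_llm` is a placeholder
# for whichever model API you use.
from pypdf import PdfReader

def load_pdf_text(path: str) -> str:
    """Concatenate the text of every page in the PDF."""
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def extraction_prompt(document_text: str) -> str:
    # Reorganize existing text; do not ask the model for legal advice.
    return (
        "From the document below, list every deadline and form number it "
        "mentions, and quote the exact sentence each one comes from. If "
        "none are stated, say so.\n\n"
        f"Document:\n{document_text}"
    )

document = load_pdf_text("probate_instructions.pdf")  # hypothetical file
prompt = extraction_prompt(document)
# answer = ask_llm(prompt)  # send to your model of choice, then verify quotes
```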

For an AI legal chatbot to be truly useful today, it might need to pivot from a "consultant" role (telling you what to do) to a "librarian" role (helping you find what you need within a document you provide).

The Necessity of Human-in-the-Loop Verification

The consensus among technical observers is that "Human-in-the-Loop" (HITL) isn't just a buzzword—it's a safety requirement.

In the Alaska example, the goal was to help unrepresented litigants (people without lawyers). The danger here is that these users are the least equipped to spot an AI hallucination. If a lawyer uses ChatGPT and it cites a fake case, the lawyer likely catches it. If a grieving widow uses AVA and it cites a fake filing deadline, the consequences are missed court dates and financial penalties.

Until AI legal chatbots can guarantee accuracy (which probabilistic models currently cannot), they remain dangerous tools for the layperson. The most responsible deployment of this tech might be internal: helping court clerks answer phones faster, rather than replacing the clerk entirely.
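
A minimal sketch of what that internal, clerk-facing setup could look like; the Draft structure and function names below are illustrative, not anything from AVA.

```python
# A sketch of human-in-the-loop review: AI drafts go into a queue and a
# clerk verifies them against the cited sources before anything reaches
# the citizen. All names here are illustrative.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Draft:
    question: str
    ai_answer: str
    sources: list[str] = field(default_factory=list)
    approved: bool = False

review_queue: list[Draft] = []

def submit_draft(question: str, ai_answer: str, sources: list[str]) -> None:
    """Model output never goes straight to the public; it waits for review."""
    review_queue.append(Draft(question, ai_answer, sources))

def clerk_review(draft: Draft, approve: bool, correction: Optional[str] = None) -> str:
    """A clerk checks the draft against its sources, then releases or replaces it."""
    if approve:
        draft.approved = True
        return draft.ai_answer
    # Rejected drafts are replaced by the clerk's own answer, not the model's.
    return correction or "Please contact the court's self-help center."
```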

The failure of AVA wasn't that the technology is "bad." It’s that the application didn't match the capability. We are trying to force a probabilistic creative writer to do the job of a strict logic processor. Until the architecture changes, or we severely restrict the scope, courtrooms should be wary of inviting AI to the bench.

FAQ Section

Why do AI legal chatbots hallucinate facts?

AI chatbots function as word predictors, not logic engines. They generate answers based on statistical patterns in their training data rather than referencing a verified database of facts, leading them to confidently invent incorrect information.

What is RAG in the context of an AI legal chatbot?

RAG (Retrieval Augmented Generation) is a method where the AI is instructed to answer questions only using a specific set of uploaded documents. While it reduces errors, it does not completely eliminate hallucinations if the AI misinterprets the source text.

Why did users dislike the empathy features in the Alaska AI?

Users found the simulated empathy (like "I'm sorry for your loss") to be fake and patronizing. When dealing with serious legal or government tasks, users prefer direct, efficient, and neutral interactions over forced emotional connection.

Is it safe to use AI for probate or estate planning?

Using AI for complex legal tasks like probate without lawyer supervision is risky. While AI can help summarize documents or explain definitions, it frequently misses the nuance required for legal advice and can fabricate laws or institutions.

What was the "phantom law school" error in Alaska's project?

The Alaska court's AI chatbot advised users to seek help from the "Alaska Law School" alumni association. However, Alaska does not have a law school, showing that the AI was applying generic patterns to a specific location where they didn't hold.
