Alaska’s court system has spent more than a year building an artificial intelligence chatbot to guide residents through probate, an effort that shows how difficult it can be for public institutions to deploy generative AI in high‑stakes situations.
The Alaska Virtual Assistant, or AVA, was conceived as a generative AI tool to help people handle the legal process of transferring a deceased person’s property. Project leaders originally expected to finish in about three months. Instead, the work has stretched well past a year as the team tried to make the system accurate and reliable enough for public use.
“With a project like this, we need to be 100% accurate, and that’s really difficult with this technology,” said Stacey Marz, administrative director of the Alaska Court System and a leader on the AVA project.
Marz said other court technology projects can launch as a minimum viable product and be refined later. In this case, she argued, the standard must be higher because people could act on AVA’s guidance in matters that affect families and estates. “If it’s not accurate or not complete, they really could suffer harm,” she said.
That tension captures a broader challenge for government agencies experimenting with generative AI. While local governments are testing tools for tasks such as driver’s license applications or processing housing benefits, a recent Deloitte report found that fewer than 6% of local government practitioners prioritize AI as a way to deliver services. Concerns over reliability, trust and oversight often collide with the broader excitement and investment around AI.
Marz envisioned AVA as a lower-cost digital counterpart to Alaska’s family law helpline, which offers free support on issues from divorce to domestic violence protective orders. The aim was to give users a similar self-help experience via chatbot, guided by a team that includes attorneys, technologists and advisers from the National Center for State Courts (NCSC).
NCSC provided startup funding as part of its expanding AI work, while the technical buildout was handled by Tom Martin, a lawyer, law professor and founder of legal AI firm LawDroid. Martin said the project required careful decisions about how the underlying model should behave, including what kind of “personality” the system should present to users.
Different AI models tend to exhibit different styles, he noted: some follow rules closely, others are more prone to overconfident answers or performative cleverness. For legal information, Martin said, the team wanted a model that was disciplined, accurate and clear, rather than one that tries to “prove that they’re the smartest guy in the room.”
Early user testing showed that even seemingly positive traits could backfire in a sensitive context like probate. Aubrie Souza, a consultant with NCSC who has worked on AVA, said initial versions were overly sympathetic, repeatedly offering condolences to people asking questions about a death. Testers reported that the repeated expressions of sympathy felt grating rather than helpful.
“Through our user testing, everyone said, ‘I’m tired of everybody in my life telling me that they’re sorry for my loss,’” Souza said. The team ultimately stripped out those kinds of scripted condolences.
More serious were the system’s hallucinations—instances where AVA produced confident but incorrect information. Although the chatbot was supposed to limit its answers to the Alaska Court System’s own probate materials, it sometimes pulled in outside references. In one example, AVA advised a user seeking legal help to consult the alumni network of a law school in Alaska. No such law school exists.
The team responded by tightening controls so the chatbot would cite only vetted Alaska probate documents rather than searching more broadly. Across the industry, hallucination rates have fallen as models improve and as companies add layers of automated checking, but AVA’s team still needed close human oversight of the system’s output.
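The restriction described above can be sketched in a few lines: retrieve answers only from a fixed set of vetted documents, and refuse when nothing matches rather than letting the model improvise. The documents, keyword-overlap matching rule and wording below are invented for illustration; AVA's actual retrieval design is not public.

```python
# Minimal sketch of limiting a chatbot to a vetted corpus. The documents
# and matching rule are illustrative placeholders, not AVA's real system.

VETTED_DOCS = {
    "vehicle-title": ("To transfer a deceased relative's vehicle title, "
                      "file the affidavit for transfer of personal property "
                      "with the court."),
    "small-estate": ("A small estate affidavit may be used when the estate "
                     "value is under the statutory limit."),
}

def retrieve(question, corpus, min_overlap=2):
    """Return vetted passages sharing enough keywords with the question."""
    q_words = set(question.lower().split())
    hits = []
    for name, text in corpus.items():
        overlap = len(q_words & set(text.lower().split()))
        if overlap >= min_overlap:
            hits.append((overlap, name, text))
    return sorted(hits, reverse=True)

def answer(question, corpus=VETTED_DOCS):
    hits = retrieve(question, corpus)
    if not hits:
        # Refuse rather than fall back on the model's general knowledge,
        # which is where invented sources (like a nonexistent law school)
        # tend to creep in.
        return "I can only answer from vetted Alaska probate materials."
    _, name, text = hits[0]
    return f"[{name}] {text}"
```

The key design choice is the refusal branch: an out-of-scope question gets a bounded "I can't help with that" instead of a plausible fabrication.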
To gauge performance, the AVA team created a test set of 91 probate-related questions, such as which form to file to transfer a deceased relative’s car title. Each answer required review. Jeannie Sato, the court system’s director of access to justice services, said the process soon became unmanageable given the volume of questions and the need for careful evaluation.
The team narrowed the test set to 16 questions, a mix of previously mishandled queries, complex scenarios and common basic questions they expect AVA to receive regularly. Those responses are now used as a more practical benchmark for accuracy and usefulness.
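A benchmark like the one described above can be run as a simple harness: each case pairs a question with facts the answer must contain and known past failure modes it must avoid, producing pass/fail results for human review. The cases and checks below are invented for illustration; they are not AVA's actual 16-question set.

```python
# Hypothetical benchmark harness: keyword checks flag obvious failures,
# leaving reviewers to judge the answers that pass. Cases are invented.

BENCHMARK = [
    {"question": "Which form transfers a deceased relative's car title?",
     "must_mention": ["affidavit"],          # facts the answer must state
     "must_not_mention": ["law school"]},    # a known past failure mode
    {"question": "Do I need probate for a small estate?",
     "must_mention": ["small estate"],
     "must_not_mention": []},
]

def run_benchmark(chatbot, benchmark=BENCHMARK):
    """Return (question, passed) pairs for human review."""
    results = []
    for case in benchmark:
        reply = chatbot(case["question"]).lower()
        passed = (all(term in reply for term in case["must_mention"]) and
                  not any(term in reply for term in case["must_not_mention"]))
        results.append((case["question"], passed))
    return results
```

Automated checks like these cannot certify an answer as correct, but they make re-running the suite after every change cheap, which is what made the original 91-question set unmanageable by hand.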
Cost is another constant concern. While fees for using large language models have dropped sharply as new versions are released, Alaska’s courts still operate under tight budget constraints. Martin estimated that, under one configuration, 20 AVA queries would cost roughly 11 cents to process, an appealing figure for a mission framed around expanding access to legal information.
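The per-query economics implied by that estimate are easy to work out. The daily volume below is a made-up planning figure, not a number from the project:

```python
# Back-of-the-envelope cost from the figure quoted above
# (20 queries for roughly 11 cents).

cost_per_20_queries = 0.11            # dollars, per Martin's estimate
per_query = cost_per_20_queries / 20  # $0.0055, about half a cent each
daily_queries = 1_000                 # hypothetical volume
annual_cost = per_query * daily_queries * 365
print(f"per query: ${per_query:.4f}; "
      f"annual at {daily_queries}/day: ${annual_cost:,.2f}")
```

At that rate, even heavy statewide usage would cost on the order of a few thousand dollars a year in model fees, a small line item next to the staff time spent reviewing output.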
But the underlying AI services—such as OpenAI’s GPT models—are evolving rapidly. That means AVA’s maintainers will likely have to monitor the chatbot on an ongoing basis for shifts in behavior or quality and update prompts or even switch models over time.
“We anticipate needing to do regular checks and potentially update prompts or the models as new ones come out and others are retired,” Martin said. “It’s definitely something we’ll need to stay on top of rather than a purely hands-off situation.”
After repeated adjustments, AVA is now slated to go live in late January, assuming no further delays. Marz remains hopeful that the tool will make it easier for Alaskans to navigate probate, while acknowledging that current AI systems fall short of fully replicating the court’s human self-help facilitators.
“We wanted to replicate what our human facilitators at the self-help center are able to share with people,” she said. “But we’re not confident that the bots can work in that fashion, because of the issues with some inaccuracies and some incompleteness.” She said future model improvements could eventually raise both accuracy and completeness.
For now, she described the project as far more labor-intensive than early AI enthusiasm might suggest. “All the buzz about generative AI, and everybody saying this is going to revolutionize self-help and democratize access to the courts—it’s quite a big challenge to actually pull that off,” Marz said.
