Part II: Optimizing eDiscovery with AI — Enhancing Transparency

27 June 2024

Recalling our discussion in Part I on data security and cost management, this second installment focuses on the critical element of transparency in using LLMs. Understanding how AI tools derive their outputs is essential for legal professionals to trust and effectively use the technology. We will discuss how Hanzo ensures that users are fully informed about the workings and outputs of LLMs, enhancing reliability and reducing the risk of errors.

Understanding LLMs and Their Training Data

LLMs are trained on huge “foundational datasets” of public data, including (but not limited to) all of Wikipedia, the works of Shakespeare, billions of public web pages, and legal and financial documents. When an LLM generates content, it draws on patterns learned from these foundational datasets to decide what text to produce. As a result, when you prompt one of the common LLM tools on the market today, the output reflects both your prompt and the contents of the foundational dataset.

The Problem of Hallucinations in LLMs

When an LLM generates text based on the user’s input and the foundational dataset, this process can lead to “hallucinations,” where the LLM produces text that is either factually incorrect or irrelevant to the query. Such hallucinations can present significant risks in the legal industry, as they can result in misleading or inaccurate information being considered credible, often without adequate verification.

Transparency Measures to Avoid Hallucinations

Any use of a common LLM carries the risk of hallucinations, so care is needed to avoid them. Hanzo’s strategy is to transparently surface the responses with the largest scope for hallucination and bring them to the user’s attention before the dataset is analyzed. In this context, “scope” means the extent to which the LLM can generate false, misleading, or irrelevant content. Hanzo’s Spotlight AI feature generates questions that show users how the AI will determine whether parts of the dataset are relevant to a given case. This gives the user the best chance to oversee the data analysis and guide the LLM toward the most relevant content. The questions are easy to understand and can be adjusted before any user data is processed, providing full transparency upfront, saving valuable time and cost, and producing better results.
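To illustrate the review-before-run workflow described above, here is a minimal sketch in Python. The function names (`propose_questions`, `review_questions`) and the question drafting are invented for illustration and do not reflect Hanzo's actual API; the point is that the user can inspect and edit the questions before any case data is processed.

```python
def propose_questions(case_summary: str) -> list[str]:
    # Stand-in for the LLM step: a real system would prompt an LLM
    # with the case summary to draft candidate relevancy questions.
    return [
        f"Does the item mention the parties named in: {case_summary}?",
        "Does the item fall within the discovery date range?",
    ]

def review_questions(questions: list[str], edits: dict[int, str]) -> list[str]:
    """Apply the user's edits to the drafted questions BEFORE any data is analyzed."""
    return [edits.get(i, q) for i, q in enumerate(questions)]

drafts = propose_questions("Acme v. Example, contract dispute")
final = review_questions(drafts, {1: "Was the item sent between Jan and Mar 2023?"})
# Only after the user approves `final` would the dataset be analyzed.
print(final)
```

The key design point is ordering: question drafting and human review happen first, and the (potentially expensive) pass over the dataset happens only afterward.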

Spotlight AI: Enhancing Decision-making

Spotlight AI’s workflow reduces the scope for hallucinations to a traditional problem of Type I and Type II errors. Rather than producing responses that are a synthetic mix of fact and fantasy, each answer is tuned to be either correct or incorrect, which makes success easier for users to understand and measure. When the user evaluates content Spotlight AI has flagged as relevant, they see the rationale for the decision, including the questions that led to it.
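Once each decision is binary (relevant or not relevant), measuring quality is a standard exercise: compare the AI's calls against a human-reviewed sample and compute the Type I (false positive) and Type II (false negative) rates. The sketch below is a generic illustration with invented data, not Hanzo's internal evaluation code.

```python
def error_rates(predictions, ground_truth):
    """Compute Type I (false positive) and Type II (false negative) rates."""
    fp = sum(1 for p, t in zip(predictions, ground_truth) if p and not t)
    fn = sum(1 for p, t in zip(predictions, ground_truth) if not p and t)
    negatives = sum(1 for t in ground_truth if not t)
    positives = sum(1 for t in ground_truth if t)
    type_i = fp / negatives if negatives else 0.0   # irrelevant items flagged relevant
    type_ii = fn / positives if positives else 0.0  # relevant items missed
    return type_i, type_ii

# AI relevancy calls vs. a reviewer's calls on a five-document sample
ai_calls = [True, True, False, False, True]
reviewer = [True, False, False, True, True]
t1, t2 = error_rates(ai_calls, reviewer)
print(f"Type I (false positive) rate: {t1:.2f}")   # 0.50
print(f"Type II (false negative) rate: {t2:.2f}")  # 0.33
```

In eDiscovery terms, the Type II rate is usually the one to watch: a missed relevant document is typically costlier than an extra document sent to review.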

Importance of Transparency in Complex Cases

Legal cases and their datasets are rarely straightforward, and the answers they demand are rarely simple. This is why transparency is so important: however complex the case, the decision about each message or document can be black and white and traced back to a clear rationale. Once Spotlight AI’s enhancement process runs, Hanzo returns unique, item-level details explaining why the Spotlight relevancy engine deemed a message, email, or document relevant. This leaves the user in control of the discovery process and with a clear understanding of the output. Any step that makes the discovery process more opaque, or puts unnecessary distance between the user and the dataset, introduces the risk of content being missed. To help, Hanzo exports its data intelligence along with the content for viewing outside the Hanzo platform; this shared intelligence provides enrichment details that outside reviewers can find very helpful.
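As a concrete picture of what item-level "data intelligence" might look like when exported alongside content, here is a hypothetical JSON record. Every field name and value below is invented for illustration; this does not represent Hanzo's actual export schema.

```python
import json

# Hypothetical item-level enrichment record: the decision, the questions
# behind it, and a plain-language rationale travel with the item itself,
# so an outside reviewer can trace why it was flagged as relevant.
enrichment = {
    "item_id": "msg-00421",
    "item_type": "message",
    "decision": "relevant",
    "questions": [
        {"text": "Does the message discuss the disputed contract terms?",
         "answer": "yes"},
        {"text": "Was it sent within the discovery date range?",
         "answer": "yes"},
    ],
    "rationale": "Both relevancy questions answered affirmatively.",
}

print(json.dumps(enrichment, indent=2))
```

Shipping the rationale with each item, rather than only in a separate report, is what lets reviewers working outside the originating platform still see the decision trail.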

Coming up

In our final installment, we will examine how Hanzo addresses the challenges of scaling AI tools in eDiscovery to handle large and complex datasets efficiently. Discover how scalability intersects with security, cost, and transparency to shape the future of legal tech.

Uncover Your Savings! 

Hanzo Spotlight AI automates relevancy analysis for eDiscovery. Try our value calculator now to see how much you can save! Estimates are based on industry averages and real customer results.