Part I: Optimizing eDiscovery with AI — Ensuring Data Security While Controlling Costs

18 June 2024

Introduction to AI in eDiscovery

In this opening installment of our three-part series on optimizing eDiscovery with generative AI, we explore the pressing need for data security and cost-effective strategies in legal tech. The shift toward larger datasets has amplified the demand for tools that not only expand capabilities but also safeguard sensitive information and keep expenses under control. We will look at how Hanzo’s approach to using Large Language Models (LLMs) in legal contexts is designed to deliver both high data security and cost efficiency.

The Evolution of eDiscovery and the Role of Generative AI

For decades, eDiscovery has faced the challenge of processing large amounts of varied data. As datasets keep growing, we need tools that scale efficiently and safely and make the best use of the available resources. In recent years, generative AI has opened up new ways to reduce the workload of overstretched legal professionals dealing with huge datasets. Large Language Models (LLMs) are excellent at summarizing large documents and generating answers to simple questions. At the same time, generative AI has introduced new risks, such as hallucination: the tendency to generate text that contains an incorrect answer and to deliver it with apparent confidence. (We will discuss hallucination in our next blog post series.) LLMs can also be expensive to run, which means that every failure costs time and money.

Security Concerns with Generative AI

It is possible to use LLMs to process large datasets in a manner that is safe, fast, cheap, and accurate, but this requires care and attention to detail. Perhaps the most important concern when dealing with large datasets for eDiscovery is data security. Today’s datasets are likely to contain confidential information, which must be secured in a way that minimizes the opportunities for data leakage. Data protection is one of the hallmarks of Hanzo’s platform, and it is why Hanzo has adopted a single-tenant policy: each customer gets its own environment, and while their data is with Hanzo, it does not leave that environment. This “walled garden” approach also applies to our use of LLMs. When LLMs are needed, they can be spun up within a customer’s environment, ensuring that datasets are never mixed and never processed on shared resources.

Cost Implications of Using LLMs

On the topic of cost, we know that our customers do not have infinitely deep pockets, and any solution must come at a reasonable price. Keeping costs low is largely a matter of choosing the right tools for the right problems and using those tools as effectively as possible. Keeping LLMs “hot” (loaded and running on dedicated machines around the clock) is expensive, and the GPUs required are a limited, precious resource, which is why Hanzo only spins up what it needs when it is needed.
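
To make the "hot" versus on-demand trade-off concrete, here is a minimal back-of-the-envelope sketch. Every number, function name, and parameter in it is a hypothetical assumption chosen for illustration; none of it reflects Hanzo's actual pricing, hardware, or implementation.

```python
# Illustrative only: all prices, hours, and names below are assumptions.

HOURS_PER_MONTH = 730  # approximate average number of hours in a month


def always_on_cost(gpu_count: int, price_per_gpu_hour: float,
                   hours: float = HOURS_PER_MONTH) -> float:
    """Cost of keeping GPUs running for the whole period, busy or idle."""
    return gpu_count * price_per_gpu_hour * hours


def on_demand_cost(gpu_count: int, price_per_gpu_hour: float,
                   busy_hours: float, jobs: int = 1,
                   startup_hours_per_job: float = 0.0) -> float:
    """Cost when GPUs are billed only while a job runs, plus startup time per job."""
    billed_hours = busy_hours + jobs * startup_hours_per_job
    return gpu_count * price_per_gpu_hour * billed_hours


if __name__ == "__main__":
    # Hypothetical month: one GPU at $2.00/hour, 8 jobs totalling 40 busy hours,
    # each job paying 15 minutes of startup overhead.
    hot = always_on_cost(gpu_count=1, price_per_gpu_hour=2.0)
    cold = on_demand_cost(gpu_count=1, price_per_gpu_hour=2.0,
                          busy_hours=40, jobs=8, startup_hours_per_job=0.25)
    print(f"always on: ${hot:.2f}  on demand: ${cold:.2f}")
```

Under these assumed figures the on-demand model pays for startup overhead on every job, so the gap narrows as utilization rises; the sketch only shows how the comparison could be set up, not where the break-even point lies for any real workload.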

Hanzo’s Approach to Data Security and Cost Efficiency

Hanzo understands this problem and has built infrastructure that spins up LLMs when, and only when, they are called upon. This means we don’t need to count API requests or tokens; we can count GPU hours instead. With this approach, we can focus on setting up the workload to make the best use of those GPU hours, which addresses the root source of costs and incentivizes real savings as directly as possible.
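
As a rough illustration of why metering by GPU hours rewards a well-organized workload, the sketch below estimates GPU hours from a job's size and throughput. The function names, token counts, throughput figures, and price are all hypothetical assumptions made for this example, not measurements or rates from Hanzo.

```python
# Illustrative only: throughput, token counts, and price are assumptions.


def gpu_hours(total_tokens: int, tokens_per_second_per_gpu: float) -> float:
    """GPU hours needed to process a job at a given per-GPU throughput."""
    if tokens_per_second_per_gpu <= 0:
        raise ValueError("throughput must be positive")
    return total_tokens / tokens_per_second_per_gpu / 3600


def job_cost(total_tokens: int, tokens_per_second_per_gpu: float,
             price_per_gpu_hour: float) -> float:
    """Cost of a job when billing is by GPU hour rather than by token."""
    return gpu_hours(total_tokens, tokens_per_second_per_gpu) * price_per_gpu_hour


if __name__ == "__main__":
    tokens = 72_000_000  # hypothetical size of a review job
    for throughput in (1000, 2000):  # hypothetical tokens/second per GPU
        hours = gpu_hours(tokens, throughput)
        cost = job_cost(tokens, throughput, price_per_gpu_hour=2.0)
        print(f"{throughput} tok/s -> {hours:.1f} GPU hours, ${cost:.2f}")
```

In this model the token count is fixed by the dataset, so the only lever on cost is how many tokens each GPU hour gets through. Doubling the assumed throughput (for example, through better batching) halves the GPU hours and therefore the bill.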

Coming up

Stay tuned for our next blog post, in which we will unpack the vital aspect of transparency in using LLMs, a cornerstone of maintaining trust and clarity in automated legal processes.
