Leveraging Generative AI in eDiscovery: The Art and Science of Prompt Engineering

6 December 2023

The use of generative AI in eDiscovery is opening new avenues for efficiency and precision. But, as is often the case with powerful tools, the devil is in the details. A significant part of those details? Prompt engineering. Let’s take a look.

What is Prompt Engineering?

We use a “prompt” to interact with a Large Language Model (LLM). LLMs are AI models that have been trained on billions of documents and then used to generate text. In AI lingo, a ‘prompt’ is a query or instruction that tells the LLM what to generate or how to respond. Think of it as a refined search query written in plain language that the average person can understand. Generally, the more precise and relevant your prompt is, the more accurate and useful the AI-generated results will be.

A key part of the process of Spotlight AI developed here at Hanzo relies on prompt engineering to help our users separate out the responsive from non-responsive content in a scalable and transparent manner. It really helps break down the complexity of a case into simple facets that are easy to understand and easy to answer. The facets themselves are also written by generative AI, which parses out the salient features of a case from long and complex documents. These facets can be supplemented or tweaked by the user, and the user can even provide examples to clarify edge cases. This means that Spotlight AI gives the user a lot of control over the process of first pass review and allows computers to take on the most laborious steps.

In this post, we will share some of the lessons learned in developing Spotlight AI.

Crafting Effective Prompts for eDiscovery: Examples & Tips

Know Your AI: Before crafting prompts, understand your AI’s strengths and limitations. For instance, if your model excels in legal jargon but struggles with idioms, you’d want to frame your questions accordingly. All LLMs are trained on what is called a “foundational dataset,” and these foundational datasets shape how the LLM can respond. Some LLMs have foundational datasets that are enriched with legal documents, others with news articles, financial documents, etc. LLMs can also be “fine-tuned” for a specific use case. This allows us to make smaller, faster, cheaper LLMs that perform very well at given tasks.

Spotlight AI, for instance, uses several different LLMs for different tasks. In order to generate the facets, for example, we use a general LLM with a very large number of parameters, which can generate high-quality and diverse facets for a wide range of different topics. These LLMs are usually expensive, verbose, and can be slow. Other lighter LLM can be used for other tasks like classifying documents. We use an LLM with much fewer parameters, which is great at parsing text and generating a very short answer. Smaller LLMs are, as you would expect, cheaper, faster, and can be scaled up to suit the size of the dataset. Combining several types of LLM, like in Spotlight AI, enables one to manage more complex tasks while still being economical and scalable.

Avoid Hallucinations: A “hallucination” is when an LLM generates a response that contains incorrect information. It can be hard to tell when an LLM generates a hallucination; therefore, it is highly recommended that the factual accuracy of the response be verified. Asking the same question in multiple ways can help to identify and avoid hallucinations.

Avoiding hallucinations is key for the legal industry, which is why Hanzo has adopted a complex strategy of multi-step and multi-faceted prompting for a case. With Spotlight AI, each step and facet addresses a different aspect of the case at hand and can be related back to the description of the case. This strategy gives us much greater confidence that the LLM is behaving correctly.

From Broad to Specific: An example of “too broad of a question” may be something like “Explain eDiscovery.” Based on the output, you might refine this to “Outline the steps in the eDiscovery process for intellectual property cases.”

Assign a Role: You can guide an LLM in its response by specifying what role it should have or what tone to set. For example, the following two prompts would generate very different results. “You are a highly qualified lawyer, explain eDiscovery.” versus “You are a teenager at school. Explain eDiscovery.” You can also indicate the qualities you want in the text. For example, you can prefix a prompt with “You are a helpful, honest, modest employee.” This can help generate better, more reliable responses.

Context is King: Instead of “What’s spoliation?” ask, “In the context of eDiscovery, define spoliation and its implications.” LLMs cannot “think”; they can only generate text, and they cannot understand what you are asking. They can, however, generate text closer to what you expect if they are provided with additional context.

Iterative Approach: Think of prompt engineering like sculpting. You start with a rough shape and then continuously refine. For example, if the AI doesn’t provide an exhaustive list initially, re-prompt with, “List more steps involved in the eDiscovery process.” Spotlight AI gives the user the opportunity to interact at key steps of the process, which is particularly useful for complex cases.

Control Parameters at Play: If your initial prompt, “Describe eDiscovery challenges,” yields a very brief answer, you might adjust verbosity controls for a more detailed response. You can adjust your prompt to indicate the length of the response you want and even the format. You could write, “Describe eDiscovery challenges. Write the answer as a list of between 5 and 10 bullet points.” or “Tell me about data privacy in no more than 50 words.”

Avoid Leading the AI: Rather than “Isn’t metadata crucial in eDiscovery?”, use “How is metadata used in eDiscovery?” LLMs tend to give neutral replies, so it is easier to generate text that does not express an opinion.

Direct the Format: Instead of “Tell me about eDiscovery tools,” specify “List five top tools used in modern eDiscovery and their primary functions.” You can also provide examples to better guide the LLM with its answer. For example, you could write “Q: List five animals. A: [Cat, Dog, Sheep, Horse, Cow] Q: List five cities. A: ” This prompt would be more likely to give a list in the form you expect.

Break Down Complex Queries: If you ask, “How does eDiscovery differ internationally?” you might refine it to a series of prompts, like “Describe eDiscovery practices in the US,” followed by “Compare eDiscovery practices in the US and Europe.” LLMs can generate text that is similar to the prompt that you enter, and every clause you add to a sentence makes the text generation harder. If you provide a complex enough prompt, the LLM may generate a response that only contains text relevant to some of the prompt, which can lead to inaccurate results. For example, an LLM (and a human!) would probably not generate a useful response to the following prompt: “Is it safe to cross a road at a crossing if the crossing light is broken, and there is a crossing guard in the road, but the crossing guard cannot see you, nearby traffic is moving slowly, it is a weekday, the weather is sunny, and you are hard of hearing?”.

Challenges & Identifying Ineffective Prompts

Crafting effective prompts for AI in the realm of eDiscovery presents its own set of challenges. One of the most common pitfalls is vagueness. If the AI consistently returns broad or off-topic responses, it’s likely that the prompts are lacking specificity and clarity. However, on the flip side, being overly specific can also be problematic. For instance, a prompt like “List eDiscovery tools for email” might miss tools that are versatile enough to handle both emails and other data types. Another challenge is unintentionally leading the AI, which can introduce bias. If the AI appears to be merely confirming pre-existing biases or seems overly agreeable, it’s an indication that the prompts might be skewing the output. Furthermore, while AI is a potent tool, an over-reliance on its insights without human oversight can derail the entire eDiscovery strategy. Lastly, a clear sign of prompt inefficiency is when the AI’s answers consistently misalign with expert expectations, indicating a need for prompt refinement.

There is always the possibility of hallucination, so it is best to check the results. If in doubt, imagine you are using a search engine and ask if you would just read the results that come back or if you would click through to the source and verify its authenticity. It’s probably okay to ask, “Tell me about vacations in Alaska,” to get a good sense of what a vacation in Alaska would be like, but it would be unwise to rely on only an LLM or search engine to tell you about hotel availability in Alaska. When making real-world decisions, do not rely on only an LLM or a search engine for answers.


Generative AI holds immense promise for eDiscovery, but its success lies in mastering the art of prompt engineering. With clear, iterative, and unbiased prompts, legal professionals can harness AI’s capabilities to their fullest potential. However, not everyone needs to become proficient in this art. As illustrated with the example of Spotlight AI, you can also use a tool that leverages prompt engineering. However, like any tool, it’s advantageous to have a reasonable understanding of its functioning, to some extent at least. We hope this post has aided you in gaining that understanding!