Part III: Navigating AI Success Metrics – Bringing It All Together

18 April 2024

Introduction to mixed data in legal review

In Parts I and II, we tackled the basics of applying Recall, Rejection, and Precision to evaluate AI's performance on document and conversational datasets. Part III brings these threads together for legal professionals seeking to enhance eDiscovery and investigation processes with AI. Here, we'll navigate the complexities of combining emails and conversational data, look at the practicalities of using Large Language Models (LLMs), and rethink our definitions of "Document" and "Recall" to fit this mixed data landscape. Our aim is to demystify how these advanced tools can be tailored to the unique needs of legal review and investigatory workflows.

Understanding the data landscape

With emails, each document arrives at a single point in time and carries substantial content on its own. That makes Recall, Rejection, and Precision straightforward to define, and lets TAR and CAL speed up classification. Conversational data changes this: we must redefine "document" to include surrounding messages and reason about discussions rather than individual messages. This is a challenge for TAR and CAL, because we need a new unit of review, a group of messages, and every grouping scheme has drawbacks. If we group messages into windows of time, we will have many orphan messages that appear on their own. If we group fixed runs of consecutive messages, adjacent messages in a group could be hours or days apart. And however we choose to split the conversations up, some discussions will be split across two or more chunks of messages.

Typical metric values also differ between the two data types: emails tend to show higher Recall but lower Precision, while conversational data tends to show lower Recall but higher Precision. When combining different datasets, it's therefore useful to track separate values of Recall and Precision that reflect the different natures of the datasets.
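The trade-off between grouping schemes can be made concrete with a small sketch. The message data here is hypothetical, and the one-hour gap threshold is an arbitrary illustrative choice; a real pipeline would ingest an actual chat export and tune the threshold to the data.

```python
from datetime import datetime, timedelta

# Hypothetical messages: (timestamp, text). In practice these would come
# from a chat export (Slack, Teams, and so on).
messages = [
    (datetime(2024, 4, 1, 10, 0), "Did you see the draft contract?"),
    (datetime(2024, 4, 1, 10, 5), "Yes, clause 7 looks wrong."),
    (datetime(2024, 4, 1, 13, 30), "Lunch?"),  # ends up as an orphan
    (datetime(2024, 4, 2, 9, 0),  "Picking up clause 7 again..."),
]

def chunk_by_gap(msgs, max_gap=timedelta(hours=1)):
    """Group consecutive messages; start a new chunk when the time gap
    between neighbours exceeds max_gap."""
    chunks, current = [], [msgs[0]]
    for prev, cur in zip(msgs, msgs[1:]):
        if cur[0] - prev[0] > max_gap:
            chunks.append(current)
            current = []
        current.append(cur)
    chunks.append(current)
    return chunks

chunks = chunk_by_gap(messages)
# "Lunch?" sits alone in its own chunk -- an orphan -- while the clause 7
# discussion is split across two chunks that are nearly a day apart.
```

Both failure modes named above show up at once: a time-based split manufactures orphans, and the same split severs a discussion that resumes the next day.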

The role of large language models (LLMs)

LLMs give us more flexibility; in fact, with LLMs there's no need for TAR or CAL. To address the problem of discussions fragmented across multiple chunks, we first segment the conversational data into manageable pieces. We then use an LLM to assess each chunk for relevance. When a chunk is deemed Relevant, we refine the analysis by examining each message within it individually. When a Relevant message is identified, we expand our focus to the surrounding messages, both preceding and following, reconstructing the discussion more completely. This lets us take a high-level view across multiple messages, narrow down to individual messages, and then broaden the scope again to capture context. The process can be automated to deliver a collection of Relevant content from both document-based and conversational datasets at a high level of Recall. With LLMs, the human workload shifts from labeling a corpus to evaluating the LLM's output, replacing a great deal of tedious manual labor.
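The chunk-then-message-then-context loop described above can be sketched as follows. The `llm_is_relevant` function here is a hypothetical stand-in for a real LLM call (a relevance prompt against an API); the keyword match merely keeps the sketch runnable, and the chunk size and context window are illustrative choices.

```python
def llm_is_relevant(text: str) -> bool:
    # Placeholder: a real implementation would call an LLM API with a
    # relevance prompt. A keyword match just makes the sketch runnable.
    return "clause 7" in text.lower()

def review_conversation(messages, chunk_size=3, context=1):
    """Return sorted indices of messages to surface: every message the
    per-message pass flags as relevant, plus `context` neighbours on
    either side."""
    selected = set()
    for start in range(0, len(messages), chunk_size):
        chunk = messages[start:start + chunk_size]
        if llm_is_relevant(" ".join(chunk)):            # high-level chunk pass
            for i, msg in enumerate(chunk, start):
                if llm_is_relevant(msg):                # zoom in per message
                    lo = max(0, i - context)            # then zoom out again,
                    hi = min(len(messages), i + context + 1)  # across chunk edges
                    selected.update(range(lo, hi))
    return sorted(selected)

messages = [
    "Morning all",
    "Clause 7 of the draft looks wrong",
    "Agreed, flag it",
    "Anyone for lunch?",
    "Sure",
]
hits = review_conversation(messages)  # [0, 1, 2]
```

Note that the context expansion works over the flat message list rather than within a chunk, which is what lets it stitch back together a discussion that a chunk boundary had split.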

Navigating the complexity of overlapping datasets

The end user’s new challenge is making sense of multiple, overlapping datasets. Suppose you have one discussion that spans the hours of 10:00-14:00, another discussion that spans the hours of 12:00-16:00, and emails that arrive at 11:00 and 15:00. How can we best present this information in a way that allows the user to follow the content of the various data sources? Traditional chronological review may not be good enough, so we may need to develop new tools to view the data differently.
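Using the example times above, one minimal sketch of such a tool treats discussions as time spans and emails as point events, sorts everything by start time, and flags overlaps so a reviewer can follow parallel threads instead of a single flat chronology. The item names and dates are hypothetical.

```python
from datetime import datetime

def t(hour):
    return datetime(2024, 4, 1, hour)

# Hypothetical items: discussions are time spans, emails are point events.
items = [
    ("discussion A", t(10), t(14)),
    ("discussion B", t(12), t(16)),
    ("email 1",      t(11), t(11)),
    ("email 2",      t(15), t(15)),
]

# Sort by start time and record which other sources were active at the
# same moment, so overlap is visible at a glance.
timeline = []
for name, start, end in sorted(items, key=lambda it: it[1]):
    overlaps = [o for o, s, e in items if o != name and s <= end and e >= start]
    timeline.append((f"{start:%H:%M}-{end:%H:%M}", name, overlaps))

for row in timeline:
    print(*row)
```

Even this simple view surfaces something a flat chronological list hides: the 11:00 email lands inside discussion A, and the 15:00 email inside discussion B, so each can be read in the context of the conversation it interrupted.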

Adapting review strategies for modern data challenges

Conversational data and LLMs will change how we process datasets, and efficiency of effort will not be the only factor to consider. The roles of review teams will need to change to take advantage of these technologies, with effort shifting from tedious manual review to setting up automated reviews and performing more nuanced evaluations of their output. We need a modern approach to solve modern problems, and we need to rethink what we mean by "Document" and "Recall." By doing this, we can combine data from multiple sources to build a narrative, and continue to explore ever larger and more complex datasets with reasonable amounts of time and resources.

Conclusion: Advancing Legal Review with AI

As we conclude our Navigating AI Success Metrics series, we have explored Recall, Rejection, and Precision, along with the innovative use of Large Language Models (LLMs) in handling complex data landscapes. Through this series, a few key themes emerged. First, the integration of diverse data types—ranging from document-based datasets to dynamic conversational threads—demands a reevaluation of traditional metrics and methodologies. We've seen how LLMs offer unparalleled flexibility, allowing us to redefine what constitutes a "Document" and how we measure "Recall" in the age of AI.

The transition towards automated, AI-driven review processes streamlines the identification of relevant information and significantly alleviates the manual burden on legal professionals. By adopting these advanced technologies, review teams can shift their focus from the laborious task of data labeling to the more nuanced and critical role of evaluating AI-generated insights.

Looking forward, the challenge lies in refining these technologies and ensuring they are accessible and adaptable to the specific needs of legal review. As AI continues to evolve, so must our strategies for data analysis, which require ongoing education and adaptation.

This series has aimed to provide a foundation for understanding AI’s pivotal role in transforming legal review processes. By embracing these new tools, we also open the door to further innovation, encouraging a collaborative approach to problem-solving that leverages the best technology to meet modern legal practices’ complex demands.

That’s a wrap for this ‘AI Success Metrics’ Series – make sure to explore the preceding segments for further insights and knowledge or consider exploring alternative topics like the Collaboration Data Survey Results Series!