Unprecedented Discovery Data Volume in FTX Case Highlights Growing Need for AI

21 June 2023

In the ongoing case against Sam Bankman-Fried and his failed crypto exchange FTX, the growing volume of evidence illustrates the new ediscovery challenges created by the breadth of data sources now appearing in corporate litigation.

According to the New York Times, this evidence ranges from snippets of computer code, Slack messages, and other digital records to more than six million pages of emails and a small black notebook filled with handwritten observations.

The Times reports that the evidence is “among the largest ever collected in a white-collar securities fraud case prosecuted by the federal authorities in Manhattan.” For comparison, the 2004 securities fraud prosecution of Martha Stewart produced 525,000 pages of evidence for the defense team.

In the FTX case, Mr. Bankman-Fried’s Google accounts alone contained 2.5 million pages of electronically stored information (ESI). According to Nicolas Roos, a federal prosecutor investigating FTX, “the government had obtained a laptop crammed with so much information that the FBI technicians were struggling to decipher it.”

So far, the government has handed over nearly one million documents obtained from witnesses and other third parties in the case. Roos said, “We have produced 927,000, so that leaves, quick math, maybe 110,000.” To which the judge replied, “Bedtime reading.”

Time is of the Essence

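The Federal Rules of Civil Procedure (FRCP) set a timeline for parties to “discuss any issues about preserving discoverable information; and develop a proposed discovery plan” under Rule 26(f). Because the Rule 26(f) conference must take place at least 21 days before the scheduling order is due under Rule 16(b), and that order is due by default within 90 days after a defendant is served, legal teams may have only 69 days from service of the complaint to begin framing their litigation strategy.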
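The 69-day figure comes from counting backward from the scheduling-order deadline. The sketch below is a simplified illustration only: it assumes a single defendant and ignores local rules, judge-specific orders, good-cause extensions, and Rule 6 time-computation details. The function name `latest_rule_26f_conference` is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def latest_rule_26f_conference(served: date, appeared: Optional[date] = None) -> date:
    """Latest default date for the Rule 26(f) conference (simplified sketch)."""
    # Rule 16(b)(2): the scheduling order is due by the earlier of 90 days
    # after service or 60 days after a defendant appears.
    order_due = served + timedelta(days=90)
    if appeared is not None:
        order_due = min(order_due, appeared + timedelta(days=60))
    # Rule 26(f)(1): the parties must confer at least 21 days before that date.
    return order_due - timedelta(days=21)


served = date(2023, 1, 2)
print(latest_rule_26f_conference(served))                      # 2023-03-12
print((latest_rule_26f_conference(served) - served).days)      # 69
print(latest_rule_26f_conference(served, date(2023, 1, 20)))   # 2023-02-28
```

If a defendant appears early, the 60-day clause can shorten the window further, as the last call shows.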

This is where Early Case Assessment (ECA) comes into play. Conducted at the outset of litigation, ECA evaluates the merits, risks, and potential costs of a case by analyzing its potential data sources. With ECA, legal teams can assess whether a case is worth pursuing further, gauge the risks and costs involved, and develop an effective litigation strategy.

AI and Legal

Machine learning has been used by legal teams for many years in the form of Technology-Assisted Review (TAR) and, more recently, Continuous Active Learning (CAL), but almost exclusively on email datasets during the review stage of the process.
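To make the CAL idea concrete, here is a minimal Python sketch. It is not any vendor's implementation: the scoring model (smoothed per-term log-odds), the batch size, the stop-when-a-batch-is-all-nonrelevant rule, and the helper names are assumptions chosen for brevity. What makes it CAL is the loop itself: retrain on every judgment made so far, rank the unreviewed documents, and send the highest-ranked ones to a human reviewer.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled):
    """Fit per-term log-odds weights from (text, is_relevant) judgments."""
    rel, non = Counter(), Counter()
    for text, is_relevant in labeled:
        (rel if is_relevant else non).update(set(tokenize(text)))
    n_rel = sum(1 for _, r in labeled if r)
    n_non = len(labeled) - n_rel
    # Add-one smoothing keeps unseen counts from producing log(0).
    return {
        t: math.log((rel[t] + 1) / (n_rel + 2)) - math.log((non[t] + 1) / (n_non + 2))
        for t in set(rel) | set(non)
    }


def score(weights, text):
    # Terms never seen in a judged document contribute nothing.
    return sum(weights.get(t, 0.0) for t in set(tokenize(text)))


def cal_review(docs, seed, reviewer, batch_size=2, max_rounds=10):
    """Hypothetical CAL loop; `reviewer` stands in for a human judgment."""
    labeled = list(seed)
    reviewed = {text for text, _ in seed}
    for _ in range(max_rounds):
        pool = [d for d in docs if d not in reviewed]
        if not pool:
            break
        weights = train(labeled)  # retrain on every judgment so far
        ranked = sorted(pool, key=lambda d: score(weights, d), reverse=True)
        judgments = [(d, reviewer(d)) for d in ranked[:batch_size]]
        labeled.extend(judgments)
        reviewed.update(d for d, _ in judgments)
        # Simplistic stopping rule (assumption): stop once a batch yields nothing relevant.
        if not any(label for _, label in judgments):
            break
    return labeled


docs = [
    "transfer customer funds to trading firm",
    "lunch order for friday",
    "move customer deposits into trading account",
    "team offsite agenda",
    "backdate the funds transfer memo",
    "holiday party photos",
]
seed = [("customer funds moved to trading", True), ("weekly lunch menu", False)]
result = cal_review(docs, seed, lambda d: "funds" in d or "deposits" in d)
print([d for d, relevant in result[2:] if relevant])
```

Production systems use much stronger classifiers and more careful stopping criteria. The sketch only shows the feedback loop that lets each reviewer decision reshape what gets reviewed next.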

With the widespread adoption of collaboration platforms like Slack, MS Teams, Asana, Jira, and Confluence, enterprise data is more voluminous than ever and filled with emojis and disjointed one-line messages whose meaning is hard to discern without context. The content is informal, reflecting the shift to remote work, where chat platforms have become the new office water cooler.

To add to the complexity, users easily switch between different conversations, including one-on-one chats, group chats, and even company-wide communications, so that a single conversation may span multiple channels. Examining these communications to determine relevance, sometimes months or years later, can be challenging, to say the least.
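One way to restore that context is to pull every message the same participants sent, across all channels, within a time window around a responsive hit. That way a lone "👍" can be read next to the message it may be answering. The Python sketch below is illustrative only: the `Message` fields, the one-hour window, and the `cross_channel_context` helper are hypothetical, not any platform's or vendor's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Message:
    channel: str      # a DM, group chat, or company-wide channel
    sender: str
    sent_at: datetime
    text: str


def cross_channel_context(messages, hit, participants, window=timedelta(hours=1)):
    """Messages from `participants` in ANY channel within `window` of the hit, oldest first."""
    start, end = hit.sent_at - window, hit.sent_at + window
    nearby = [
        m for m in messages
        if m.sender in participants and start <= m.sent_at <= end
    ]
    return sorted(nearby, key=lambda m: m.sent_at)


t0 = datetime(2022, 11, 1, 9, 0)
msgs = [
    Message("dm:alice-bob", "alice", t0, "can we cover the withdrawals?"),
    Message("#trading", "bob", t0 + timedelta(minutes=5), "👍"),
    Message("group:exec", "carol", t0 + timedelta(minutes=10), "lunch?"),
    Message("dm:alice-bob", "bob", t0 + timedelta(minutes=20), "moving it now"),
    Message("#general", "alice", t0 + timedelta(hours=3), "happy friday"),
]
for m in cross_channel_context(msgs, msgs[3], {"alice", "bob"}):
    print(m.channel, m.sender, m.text)
```

Widening the window or the participant set trades more context for more review volume, which is exactly the tension the volume problem creates.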

On top of the informality and complexity of collaboration data comes its sheer volume, as the FTX case shows. Communication volume has grown especially noticeably since the COVID-19 pandemic, with some legal matters involving only a few custodians generating millions of messages that must be collected and reviewed during discovery. Usage continued to rise through 2021, 2022, and into 2023, with some users continuing to double their data on platforms like MS Teams and Slack, producing enterprise environments with message counts in the hundreds of millions.

Large enterprises can enhance their ediscovery tools by incorporating a data intelligence layer. This enables efficient ECA of the extensive datasets involved in modern litigation. With fast and precise message and file intelligence, in-house legal teams have access to the information they need to support ediscovery, empowering them to sift through vast amounts of data, construct robust case strategies, and expedite a cost-effective resolution to the legal matters at hand.

In a recent conversation, Dave Ruel, VP of Product at Hanzo, said:

“Enterprises across many industries are accelerating the adoption of collaboration applications. Yet, many don’t have solutions to govern this data, discover insights from it, and reliably preserve it to meet their legal and regulatory requirements. Smarter collection solutions and proven, scalable AI enhancements give large enterprises the ability to manage the ever-growing risk associated with short-message business communications.”

With data volumes growing exponentially in corporate litigation, it’s clear that brute-forcing human review is not sustainable. Review has always been the most expensive stage of the ediscovery process, but now with the large datasets involved, simply throwing more junior attorneys in a room to comb through millions of documents isn’t tenable or cost-effective. AI-assisted ECA will continue to increase in importance for establishing scope and relevancy so that attorneys can focus their efforts on building effective case strategies for their clients.