Data, Not Documents: Unlocking the True Value of Your Digital Evidence

Posted on Wednesday, May 28th, 2025 in Database Discovery

To fully harness the power of digital evidence in discovery, legal teams must start to think of the evidence as data, not documents. Here’s why.

The nature of evidence has undergone a seismic shift over the past few decades. A world that was once dominated by paper-based records, physical files, and manually stored information has now become a digital landscape overflowing with emails, chat messages, social media posts, cloud-based collaboration data, and dynamic structured and unstructured data sources. This evolution has profoundly changed how discovery is conducted, spawning an entirely new industry called “eDiscovery” and forcing legal teams to rethink how they manage and analyze evidence in litigation and investigations.

Despite this transformation of evidence to digital data, the legal industry has continued to operate under a document-centric paradigm and still treats everything as a “document” throughout the eDiscovery life cycle. That may be fine for emails and office files (like Word documents and PowerPoint presentations), which are inherently document-centric.

But as electronically stored information (ESI) has continued to evolve, it has come to resemble documents less and less. Text and chat messages, for example, are short forms of communication that are conversation oriented, not document oriented. Yet many legal teams take these messages and convert them into static documents, with messages grouped by an arbitrary time frame (typically 24 hours or the entire conversation), before even analyzing and reviewing them.

The move to the cloud has also increased the use of enterprise solutions (like Salesforce), which are database oriented, not document oriented. Text/chat messages and enterprise solution databases are just two examples of how evidence today is available for discovery as data, not documents.

A data-driven approach treats digital evidence in its native form, preserving its structure, interactivity, and (very importantly) metadata. Taking a “data, not documents” approach enables more advanced analysis, including visualization of communication patterns, automated detection of anomalies, and AI-driven insights that are lost when legal professionals rush to force the evidence into their document paradigm. By embracing a “data, not documents” approach to evidence early in the case, legal professionals can unlock the full evidentiary value of their digital assets, improving efficiency, accuracy, and decision-making in the process.

The Big Myth Regarding Unstructured Data

If you’ve ever worked with data in a Microsoft Excel workbook or Google Sheets, you know that there is an inherent organization to that data. Data is arranged in rows and columns, which makes it easy to sort, filter, and search. That’s great for Excel, but that doesn’t help when working with data types like text messages, right? Wrong. The big myth about unstructured data is that there is no structure to it at all, while the reality is that many unstructured data types are contained within structured databases and other container files.

Take text messages, for example. While text messages are considered unstructured data due to their free-form nature, they are typically stored within a structured SQLite database on mobile devices. SQLite is a lightweight, self-contained relational database management system that organizes data into tables with rows and columns. While the storage structure is highly organized, the content of the messages themselves remains unstructured. This combination allows for efficient searching, retrieval, and indexing while still dealing with free-form text data. Advanced analytics tools can extract insights from these databases by applying natural language processing (NLP) techniques to unstructured message content.
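
To make this concrete, here is a minimal sketch in Python of how free-form message text can be searched through its structured container. The `messages` table, its column names, the sample rows, and the `search_messages` helper are all hypothetical and chosen for illustration only; real mobile message databases use different and more complex schemas.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only; real mobile
# message stores use different and more complex tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        conversation_id INTEGER NOT NULL,
        sender TEXT NOT NULL,
        sent_at TEXT NOT NULL,   -- ISO 8601 timestamp
        body TEXT NOT NULL       -- free-form (unstructured) content
    );
""")
conn.executemany(
    "INSERT INTO messages (conversation_id, sender, sent_at, body) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, "alice", "2025-03-01T09:15:00", "Did you send the contract?"),
        (1, "bob", "2025-03-01T09:17:00", "Yes, sent it this morning."),
        (2, "alice", "2025-03-02T14:00:00", "Lunch tomorrow?"),
        (1, "bob", "2025-03-04T11:30:00", "Contract is signed."),
    ],
)


def search_messages(conn, keyword, start, end):
    """Return messages sent in [start, end) whose body contains keyword.

    The structured columns (sent_at, conversation_id, sender) narrow the
    search; the keyword match runs against the unstructured body text.
    """
    rows = conn.execute(
        """
        SELECT conversation_id, sender, sent_at, body
        FROM messages
        WHERE body LIKE '%' || ? || '%'
          AND sent_at >= ? AND sent_at < ?
        ORDER BY sent_at
        """,
        (keyword, start, end),
    )
    return rows.fetchall()


hits = search_messages(conn, "contract", "2025-03-01", "2025-03-05")
for row in hits:
    print(row)
```

Each result is still an individual message with its sender, timestamp, and conversation intact, rather than a page in a flattened document.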

Text messages aren’t the only example of unstructured data that’s stored within structured container files. Other examples include emails in a PST or MBOX file, documents in a Document Management System (DMS) like SharePoint, and multimedia files (like images, videos, and audio recordings) stored as binary large objects (BLOBs) in relational databases.

Benefits of Working with Structured Data

Are there benefits to working with structured data, not documents, in discovery? Yes! Here are some of those benefits:

  • Consistency and Accuracy – Data stored in structured formats typically follows defined data integrity rules (such as data types and constraints), which supports reliable results when querying.
  • Automation and AI Readiness – Machine learning, predictive coding, and AI-driven analytics perform better on structured datasets because the data is typically clean and well-organized.
  • Faster Processing Speeds – Queries on structured data can run much faster than full-text searches across unstructured documents, particularly when the relevant fields are indexed, reducing processing time and costs.
  • Regulatory Compliance and Audits – Structured data is easier to track, log, and audit, which facilitates compliance with regulations like GDPR, CCPA, and SEC rules.
  • Data Linking and Correlation – Structured data allows legal teams to connect related information across different data sources (e.g., linking email metadata with HR databases to track employee departures before key document deletions).
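
The data linking benefit in the list above can be sketched with a simple join. This is an illustrative sketch only: the `hr_departures` and `deletion_events` tables, their columns, the sample rows, and the 14-day window are all assumptions made for the example, not a description of any particular system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables standing in for an HR system export and
# deletion metadata extracted from an email or file system.
conn.executescript("""
    CREATE TABLE hr_departures (
        employee_email TEXT PRIMARY KEY,
        departure_date TEXT NOT NULL
    );
    CREATE TABLE deletion_events (
        id INTEGER PRIMARY KEY,
        employee_email TEXT NOT NULL,
        deleted_at TEXT NOT NULL,
        item_subject TEXT NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO hr_departures VALUES (?, ?)",
    [
        ("carol@example.com", "2025-02-14"),
        ("dave@example.com", "2025-06-30"),
    ],
)
conn.executemany(
    "INSERT INTO deletion_events (employee_email, deleted_at, item_subject) "
    "VALUES (?, ?, ?)",
    [
        ("carol@example.com", "2025-02-10", "Q4 pricing memo"),
        ("carol@example.com", "2024-11-01", "Team lunch"),
        ("erin@example.com", "2025-02-11", "Budget draft"),
        ("dave@example.com", "2025-07-05", "Vendor list"),
    ],
)


def deletions_near_departure(conn, window_days):
    """Return deletions within window_days of the same employee's departure."""
    rows = conn.execute(
        """
        SELECT d.employee_email, h.departure_date, d.deleted_at, d.item_subject
        FROM deletion_events AS d
        JOIN hr_departures AS h ON h.employee_email = d.employee_email
        WHERE ABS(julianday(d.deleted_at) - julianday(h.departure_date)) <= ?
        ORDER BY d.deleted_at
        """,
        (window_days,),
    )
    return rows.fetchall()


flagged = deletions_near_departure(conn, 14)
for row in flagged:
    print(row)
```

Because both sources share a common key (here, an email address), one query can surface candidate events for closer review. The window size and the choice of key are judgment calls that would depend on the matter.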

Structured data also significantly enhances data queries, visualization, and dashboarding, making early data assessment (EDA) in eDiscovery more efficient. Dashboards built on structured data provide interactive, real-time insights into key metrics. Compared to unstructured data, which requires extensive manual review or advanced text-based searching, structured data enables fast and accurate analysis and retrieval of relevant information.

Example of a dashboard in the Sapling Platform

Use Cases for Working with Structured Data

There is a virtually unlimited set of applicable use cases for working with structured data. Here are some examples that illustrate the importance of working with this evidence as data, not documents:

  • Sales and Customer Support Data: Analyzing structured data from customer support tickets provides valuable insights into how a client manages customer interactions, resolves issues, and maintains service quality.
  • Medical Billing Data: From a healthcare perspective, medical billing data can be a key data source to identify potential billing mistakes or even potential instances of fraud.
  • Wearable Medical Device Data: Analyzing structured data from wearable heart monitors is vital for assessing whether monitoring centers are responding to cardiac events in a timely and effective manner.
  • Data on Employment Disputes: Structured data can be highly important in an employment dispute by providing clear, quantifiable evidence related to workplace activities, policies, and compliance.
  • Text Messages: While the examples above relate to specific types of claims and cases, the discovery of text messages has become common across all types of cases. Many eDiscovery providers arbitrarily convert those messages into PDF files in 24-hour “chunks.” Doing so eliminates the flexibility you get when you work with the messages in their structured data container format: the ability to work with messages individually and to target specific conversations, and the relevant and responsive messages within them.

Example of Text Message Review in the Sapling Platform
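
As a rough illustration of the text message use case above (a sketch only, and not a depiction of how the Sapling Platform works), the Python below contrasts the two approaches. The message records, field names, and both helper functions are hypothetical, and the chunking is assumed to be per conversation per calendar day.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical message records, as they might look after extraction
# from a structured container.
messages = [
    {"conversation": "alice-bob", "sent_at": "2025-03-01T09:15:00",
     "body": "Did you send the contract?"},
    {"conversation": "alice-carol", "sent_at": "2025-03-01T12:00:00",
     "body": "Lunch tomorrow?"},
    {"conversation": "alice-bob", "sent_at": "2025-03-01T23:58:00",
     "body": "Any update on the contract?"},
    {"conversation": "alice-bob", "sent_at": "2025-03-02T00:03:00",
     "body": "Signed and returned."},
]


def chunk_by_day(msgs):
    """Document-style approach: one static chunk per conversation per day."""
    chunks = defaultdict(list)
    for m in msgs:
        day = datetime.fromisoformat(m["sent_at"]).date()
        chunks[(m["conversation"], day)].append(m)
    return dict(chunks)


def select_messages(msgs, conversation, keyword):
    """Data-style approach: pick individual messages by field and content."""
    keyword = keyword.lower()
    return [
        m for m in msgs
        if m["conversation"] == conversation and keyword in m["body"].lower()
    ]


day_chunks = chunk_by_day(messages)
responsive = select_messages(messages, "alice-bob", "contract")

print(len(day_chunks), "day chunks;", len(responsive), "targeted messages")
```

In this sample, the question sent at 23:58 and the reply sent at 00:03 land in different day chunks even though they are part of the same exchange, while message-level selection leaves the reviewer free to pull exactly the messages that matter.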

Conclusion

Do you still rent movies from Blockbuster Video? Of course not – streaming technology has advanced to the point that physically going to a store and renting a movie is obsolete. The “dumbing down” of structured data by forcing it into the document paradigm that legal teams have been accustomed to – for centuries, literally – is just as obsolete. It’s time for the legal industry to cast aside the document paradigm and start thinking about the evidence as data, not documents – doing so can transform your eDiscovery workflows!

Find out more about the considerations of working with evidence as data, not documents (including how we got here, understanding evidence today, and what working with structured data looks like) in our white paper!