Data, Not Documents: Unlocking the True Value of Your Digital Evidence

To harness the power of 21st-century digital evidence in discovery, legal teams must start to think of the evidence as data, not documents. Here’s why.

The nature of evidence has undergone a seismic shift over the past few decades. A world once dominated by paper-based records, physical files, and manually stored information is now a digital landscape overflowing with email, chat messages, social media posts, cloud-based collaboration data, and dynamic structured and unstructured data sources. This evolution has profoundly impacted how discovery is conducted, forcing legal teams to rethink how they manage and analyze evidence in litigation and investigations.

Despite this transformation of evidence to digital data, the legal industry continues to operate under a document-centric paradigm, treating everything as a “document” within the eDiscovery life cycle. That may be fine for emails and office files (like Word documents and PowerPoint presentations), which are inherently document-centric.

But as electronic information evolves, it resembles documents less and less. Text and chat messages, for example, are conversation-oriented, not document-oriented. Yet many legal teams convert these messages into static documents, grouping them by an arbitrary time frame (typically 24 hours or the entire conversation), before anyone has analyzed or reviewed them.

The move to the cloud has also increased the use of enterprise solutions like Salesforce.com, which are database oriented, not document oriented. Text/chat messages and enterprise solution databases are just two examples of how evidence today is available for discovery as data, not documents.

A data-driven approach treats digital evidence in its native form, preserving its structure, interactivity, and (very importantly) metadata. Taking a “data, not documents” approach enables more advanced analysis, including visualization of communication patterns, automated detection of anomalies, and AI-driven insights that are currently unavailable when legal professionals rush to force the evidence into their document paradigm. By embracing a “data, not documents” approach to evidence early in the case, legal professionals can unlock the full evidentiary value of their digital assets, improving efficiency, accuracy, and decision-making in the process.

The Big Unstructured Data Myth

If you’ve ever worked with data in a Microsoft Excel workbook or Google Sheets, you know that there is an inherent organization to that data. It is arranged in rows and columns, which makes it easy to sort, filter, and query. That’s great for spreadsheets, but it doesn’t help when working with data types like text messages, right? Wrong. The big myth about unstructured data is that there is no structure to it at all, while the reality is that many unstructured data types are contained within structured databases.

Take customer service interactions in Zendesk, for example. Information about each customer interaction is often held in large, free-form notes fields, but those fields are stored within a structured database in the cloud alongside significant amounts of contextual information, such as the customer name, account number, organization, and links to past conversations. While the storage structure is highly organized, the content of the messages themselves remains unstructured. This combination allows for efficient searching, retrieval, and indexing while still dealing with free-form text data. Advanced analytics tools can extract insights from these databases using filters, AI agents, or other technologies to flag and separate responsive from non-responsive records.
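To make the idea concrete, here is a minimal Python sketch of filtering records on their structured fields first and only then searching the free-form notes. The `Ticket` fields, sample records, and `find_tickets` helper are hypothetical and invented for illustration; they are not Zendesk’s actual schema or API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Ticket:
    # Hypothetical fields for illustration; not an actual Zendesk schema.
    ticket_id: int
    customer: str
    account_number: str
    opened: date
    notes: str  # free-form, unstructured text


TICKETS = [
    Ticket(101, "Acme Corp", "A-1001", date(2023, 3, 2),
           "Customer reports the valve overheated twice."),
    Ticket(102, "Acme Corp", "A-1001", date(2023, 6, 15),
           "Billing question about the annual invoice."),
    Ticket(103, "Globex", "G-2002", date(2023, 3, 9),
           "Valve overheated during routine testing."),
]


def find_tickets(tickets, account_number, start, end, keyword):
    """Narrow by structured fields, then search the free-form notes."""
    keyword = keyword.lower()
    return [
        t for t in tickets
        if t.account_number == account_number
        and start <= t.opened <= end
        and keyword in t.notes.lower()
    ]


hits = find_tickets(TICKETS, "A-1001", date(2023, 1, 1),
                    date(2023, 12, 31), "overheated")
print([t.ticket_id for t in hits])
```

The structured fields (account number, date) do the narrowing, so the keyword search only has to run over the notes that survive those filters.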

What is not effective in this case is transforming each Zendesk (or insert other system name here) record into a PDF so that it can be handed to a document-based eDiscovery system. This approach is inefficient, handicaps counsel’s ability to filter, and results in the loss of critical insight and analytical power.

Benefits of Working with Structured Data

Are there benefits to working with evidence as structured data, rather than as documents, in discovery? Yes! Here are some of those benefits:

Consistency and Accuracy – Data stored in structured formats follows strict data integrity rules, ensuring reliable results when querying.
Automation and AI Readiness – Machine learning, predictive coding, and AI-driven analytics perform better on structured datasets because the data is typically clean and well-organized.
Faster Processing Speeds – Queries on structured data run much faster than full-text searches in unstructured documents, reducing processing time and costs.
Regulatory Compliance and Audits – Structured data is easier to track, log, and audit, which facilitates compliance with regulations like GDPR, CCPA, and SEC rules.
Data Linking and Correlation – Structured data allows legal teams to connect related information across different data sources (e.g., linking email metadata with HR databases to track employee departures before key document deletions).
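As a rough illustration of the data linking benefit, the sketch below joins a deletion log against an HR departure table to surface deletions made shortly before an employee left. The table names, columns, sample rows, and 30-day window are all assumptions made up for this example, using Python’s built-in `sqlite3` module as a stand-in for whatever database actually holds the data.

```python
import sqlite3

# In-memory database with two hypothetical sources: an HR system export
# and a log of deletion events. All names and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hr_departures (employee TEXT PRIMARY KEY, departure_date TEXT);
CREATE TABLE deletion_events (employee TEXT, item TEXT, deleted_on TEXT);
INSERT INTO hr_departures VALUES
  ('jdoe', '2023-05-31'),
  ('asmith', '2023-09-15');
INSERT INTO deletion_events VALUES
  ('jdoe', 'pricing_model.xlsx', '2023-05-20'),
  ('jdoe', 'lunch_menu.docx', '2023-01-10'),
  ('asmith', 'q3_forecast.pptx', '2023-09-10'),
  ('bnguyen', 'notes.txt', '2023-05-25');
""")

WINDOW_DAYS = 30  # assumed look-back window before each departure

# Join the two sources on employee and keep deletions that fall within
# WINDOW_DAYS before that employee's departure date.
rows = conn.execute(
    """
    SELECT d.employee, d.item, d.deleted_on, h.departure_date
    FROM deletion_events AS d
    JOIN hr_departures AS h ON h.employee = d.employee
    WHERE julianday(h.departure_date) - julianday(d.deleted_on)
          BETWEEN 0 AND ?
    ORDER BY d.deleted_on
    """,
    (WINDOW_DAYS,),
).fetchall()

for employee, item, deleted_on, departure_date in rows:
    print(f"{employee} deleted {item} on {deleted_on} (left {departure_date})")
```

A correlation like this takes a single query when both sources are kept as data; it is far harder to reproduce once the same records have been flattened into separate PDFs.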

Structured data also significantly enhances data queries, visualization, and dashboarding, making early data assessment (EDA) in eDiscovery more efficient. Dashboards built on structured data provide interactive, real-time insights into key metrics. Unstructured data requires extensive manual review or advanced text-based searching; structured data, by comparison, enables fast and accurate analysis and retrieval of relevant information.
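The kind of metric a dashboard might chart can be sketched in a few lines of Python. The example below counts communications per custodian per month; the record fields and sample values are hypothetical and are not drawn from any particular platform.

```python
from collections import Counter
from datetime import datetime

# Hypothetical communication records; field names are illustrative only.
RECORDS = [
    {"custodian": "jdoe", "sent": "2023-03-02T09:15:00", "channel": "email"},
    {"custodian": "jdoe", "sent": "2023-03-18T14:40:00", "channel": "chat"},
    {"custodian": "jdoe", "sent": "2023-04-01T08:05:00", "channel": "chat"},
    {"custodian": "asmith", "sent": "2023-03-07T11:30:00", "channel": "email"},
]


def volume_by_custodian_month(records):
    """Count records per (custodian, YYYY-MM) pair."""
    counts = Counter()
    for record in records:
        month = datetime.fromisoformat(record["sent"]).strftime("%Y-%m")
        counts[(record["custodian"], month)] += 1
    return counts


volume = volume_by_custodian_month(RECORDS)
for (custodian, month), count in sorted(volume.items()):
    print(f"{custodian} {month}: {count}")
```

Because every record carries its own custodian and timestamp fields, the same data can be regrouped by channel, week, or any other field without reprocessing the underlying evidence.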

Example of a dashboard in the Sapling Platform

Use Cases for Working with Structured Data

There is a virtually unlimited set of applicable use cases for working with structured data. Here are some examples that illustrate the importance of working with this evidence as data, not documents:

Sales and Customer Support Data: Analyzing structured data from customer support tickets provides valuable insights into how a client manages customer interactions, resolves issues, and maintains service quality.
Medical Billing Data: From a healthcare perspective, medical billing data can be a key data source to identify potential billing mistakes or even potential instances of fraud.
Wearable Medical Device Data: Analyzing structured data from wearable heart monitors is vital for assessing whether monitoring centers are responding to cardiac events in a timely and effective manner.
Data on Employment Disputes: Structured data can be highly important in an employment dispute by providing clear, quantifiable evidence related to workplace activities, policies, and compliance.
Text Messages: While the examples above relate to specific types of claims and cases, the discovery of text messages has become common across all types of cases. Many eDiscovery providers arbitrarily convert those text messages into PDF files in 24-hour “chunks.” That conversion eliminates the flexibility you get when you work with the messages in their structured data container format: the ability to work with messages individually, target specific conversations, and isolate the relevant and responsive messages within those conversations.

*Example of Text Message Review in the Sapling Platform*
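To show what working with individual messages can look like, here is a minimal Python sketch that groups messages by conversation and returns each keyword hit together with its neighboring messages. The `Message` fields, sample data, and `hits_with_context` helper are hypothetical; this is an illustration under those assumptions, not a description of how any particular platform works.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Message:
    # Hypothetical fields for illustration only.
    conversation_id: str
    sender: str
    sent: datetime
    body: str


MESSAGES = [
    Message("c1", "Ana", datetime(2023, 4, 1, 23, 50),
            "Did the shipment pass inspection?"),
    Message("c1", "Raj", datetime(2023, 4, 2, 0, 5),
            "No, it failed. Keep that quiet for now."),
    Message("c2", "Ana", datetime(2023, 4, 2, 8, 30), "Lunch at noon?"),
    Message("c1", "Ana", datetime(2023, 4, 2, 0, 7), "Understood."),
]


def hits_with_context(messages, keyword, context=1):
    """Return each keyword hit with `context` messages on either side,
    taken from the same conversation rather than the same calendar day."""
    threads = {}
    for message in sorted(messages, key=lambda m: m.sent):
        threads.setdefault(message.conversation_id, []).append(message)

    excerpts = []
    for thread in threads.values():
        for i, message in enumerate(thread):
            if keyword.lower() in message.body.lower():
                start = max(0, i - context)
                excerpts.append(thread[start:i + context + 1])
    return excerpts


excerpts = hits_with_context(MESSAGES, "failed")
for excerpt in excerpts:
    for m in excerpt:
        print(f"[{m.sent:%Y-%m-%d %H:%M}] {m.sender}: {m.body}")
```

In this sample, the question and its answer fall on opposite sides of midnight. A 24-hour chunk would place them in two separate PDFs, while grouping by conversation keeps the exchange together.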

Conclusion

Do you still rent movies from Blockbuster Video? Of course not – streaming technology has advanced to the point that physically going to a store and renting a movie is obsolete. The “dumbing down” of structured data by forcing it into the document paradigm that legal teams have been accustomed to – for centuries, literally – is just as obsolete. It’s time for the legal industry to cast aside the document paradigm and start thinking about the evidence as data, not documents – doing so can transform your eDiscovery workflows!

Find out more about the considerations of working with evidence as data, not documents – including how we got here, understanding evidence today and what working with structured data looks like – in our white paper here!
