Integrating Structured Data Into the E-Discovery Process

This blog post initially appeared on Exterro and is reblogged here with their consent.

As data volumes increase and become more complex, having an integrated e-discovery environment where systems and data sources are automatically synching information and exchanging data with e-discovery applications has become critical for organizations. This is true of unstructured and semi-structured data sources, such as email servers and content management systems, as well as structured data sources, like databases and data archives. The topic of systems integration will be discussed on Exterro’s E-Discovery Masters series webcast, “Optimizing E-Discovery in a Multi-Vendor Environment.”

I recently interviewed Jim FitzGerald, Sr. Director for Exterro, and Josh Alpern, VP ILM Domain Experts for Informatica, about the important and often overlooked role structured data plays during the course of e-discovery.

Q: E-Discovery demands are often discussed in the context of unstructured data, like email. What are some of the complications that arise when a matter involves structured data?

Jim: A lot of e-discovery practitioners are comfortable with unstructured data sources like email, file shares, or documents in SharePoint, but freeze up when they have to deal with structured data. They are unfamiliar with the technology and terminology of databases, extracts, report generation, and archives. They’re unsure about the best ways to preserve or collect from these sources. If the application is an old one, this fear often gets translated into a mandate to keep everything just as it is, which translates to mothballed applications that just sit there in case data might be needed down the road. Beyond the costs, there’s also the issue that IT staff turnover means that it’s increasingly hard to generate the reports Legal and Compliance need from these old systems.

Josh: Until now, e-discovery has largely been applied to unstructured data and email for two main reasons: 1) a large portion of relevant data resides in these types of data stores, and 2) these are the data formats that everyone is most familiar with and can relate to most easily. We all use email, and we all use files and documents. So it’s easy for people to look at an email or a document and understand that everything is self-contained in that one “thing.” But structured data is different, although not necessarily any less relevant. For example, someone might understand conceptually what a “purchase order” is, but not realize that in a financial application a purchase order consists of data that is spread across 50 different database tables. Unlike with an email or a PDF document, there might not be an easy way to simply produce a purchase order, in this example, for legal discovery without understanding how those 50 database tables are related to each other. Furthermore, to use email as a comparison, everyone understands what an email “thread” is. It’s easy to ask for all the emails in a single thread, and usually it’s relatively easy to identify all of those emails: they all have the same subject line. But in structured data the situation can be much more complicated. If someone asks to see every financial document related to a single purchase order, you would have to understand all of the connections between the many database tables that comprise all of those related documents and how they relate back to the requested purchase order. Solutions that are focused on email and unstructured data have no means to do this.
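To make Josh's purchase order example concrete, here is a minimal sketch of what "producing" a single purchase order can involve once its data lives in related tables. Everything in it is an assumption for illustration: the four-table schema, the column names, and the `collect_purchase_order` helper are hypothetical, and a real financial application may spread the same record across many more tables.

```python
import sqlite3

# Hypothetical schema for illustration only; not taken from any real product.
SCHEMA = """
CREATE TABLE vendors (vendor_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase_orders (
    po_id INTEGER PRIMARY KEY,
    vendor_id INTEGER REFERENCES vendors(vendor_id),
    ordered_on TEXT
);
CREATE TABLE po_lines (
    line_id INTEGER PRIMARY KEY,
    po_id INTEGER REFERENCES purchase_orders(po_id),
    item TEXT, qty INTEGER, unit_price REAL
);
CREATE TABLE invoices (
    invoice_id INTEGER PRIMARY KEY,
    po_id INTEGER REFERENCES purchase_orders(po_id),
    amount REAL
);
"""


def build_demo_db():
    """Create an in-memory database holding one sample purchase order."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO vendors VALUES (1, 'Acme Supply')")
    conn.execute("INSERT INTO purchase_orders VALUES (100, 1, '2014-03-01')")
    conn.executemany(
        "INSERT INTO po_lines VALUES (?, ?, ?, ?, ?)",
        [(1, 100, "Widget", 10, 2.5), (2, 100, "Gasket", 4, 1.0)],
    )
    conn.execute("INSERT INTO invoices VALUES (500, 100, 29.0)")
    return conn


def collect_purchase_order(conn, po_id):
    """Assemble one self-contained 'document' from the related tables."""
    header = conn.execute(
        "SELECT po.po_id, po.ordered_on, v.name AS vendor "
        "FROM purchase_orders po JOIN vendors v ON v.vendor_id = po.vendor_id "
        "WHERE po.po_id = ?",
        (po_id,),
    ).fetchone()
    if header is None:
        return None  # no such purchase order
    lines = [dict(row) for row in conn.execute(
        "SELECT item, qty, unit_price FROM po_lines "
        "WHERE po_id = ? ORDER BY line_id", (po_id,))]
    invoices = [dict(row) for row in conn.execute(
        "SELECT invoice_id, amount FROM invoices "
        "WHERE po_id = ? ORDER BY invoice_id", (po_id,))]
    return {**dict(header), "lines": lines, "invoices": invoices}
```

Calling `collect_purchase_order(build_demo_db(), 100)` returns one dictionary combining the header, line items, and related invoices. The point of the sketch is that the collection logic has to encode the table relationships, which is the knowledge an email-oriented tool does not have.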

Q: What types of matters tend to implicate structured data and are they becoming more or less common?

Jim: The ones I hear about most commonly are product liability cases where they need to look back at warranty claims or drug trial data, or employment disputes around pay history and practices, or financial cases where they need to look at pricing or trading patterns.

Josh: The ones that Jim mentioned are certainly prevalent. But in addition, I would add that all kinds of financial data are now governed by retention policies largely because of the same concerns that arise from potential legal situations: at some point, someone may ask for it. Anything related to consumer packaged goods, vehicle parts (planes, boats, cars, trucks, etc.), or industrial and durable goods, which tend to have very long lifecycles, is increasingly subject to these types of inquiries.

Q: Simply accessing legacy data to determine its relevance to a matter can present significant challenges. What are some methods by which organizations can streamline the process?

Jim: If you are keeping around mothballed applications and databases purely for reporting purposes, these are prime targets to migrate to a structured data archive. Cost savings from licenses, CPU, and storage can run to 65% per year, with the added benefits that it’s much easier to enforce a retention policy on this data and roll it off when it expires, and that compliance reporting is easier to do with modern tools.

Josh: One huge challenge that comes from these legacy applications stems from the fact that there are typically a lot of them. That means that when a discovery request arises, someone – or more likely multiple people – have to go to each one of those applications one by one to search for and retrieve relevant data. Not only is that time consuming and cumbersome, but it also assumes that there are people with the skill sets and application knowledge necessary to interact with all of those different applications. In any given company, that might not be a problem *today*, shortly after the applications have been decommissioned, because all the people that used the applications when they were live are still around. But will that still be the case 5, 7, 10 or 20 years from now? Probably not. Retiring all of these legacy applications into a “platform neutral” format is a much more sustainable, not to mention cost effective, approach.

Q: How can e-discovery preservation and collection technologies be leveraged to help organizations identify and “lock down” structured data?

Jim: Integrating e-discovery (legal holds and collections) with your structured data archive can make it a lot easier to coordinate preservation and collection activities across the two systems. This reduces the chances of stranded holds (data under preservation that could have been released) and reduces the ambiguity about what needs to happen to the data to support the needs of legal and compliance teams.

Josh: Just as there are solutions for “locking down” unstructured and semi-structured (email) data, there are solutions for locking down structured data. The first and perhaps most important step is recognizing that the solutions for unstructured and semi-structured data are simply incapable of handling structured data. Without something that is purpose built for structured data, your discovery preservation and collection process is going to ignore this entire category of data. The good news is that some of the solutions that are purpose built for structured data have built in integrations to the leading e-discovery platforms.
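As a rough illustration of the coordination Jim describes, the sketch below shows how an archive's retention roll-off might consult the list of active legal holds before disposing of anything, and how stranded holds could be spotted. It is not based on any particular product: the `ArchivedRecord` type, the function names, and the idea of exchanging hold IDs between the two systems are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchivedRecord:
    """Hypothetical archive entry: an ID, a retention expiry, and hold tags."""
    record_id: str
    expires_on: date
    hold_ids: frozenset = frozenset()


def partition_for_disposal(records, today, active_holds):
    """Split records into those safe to roll off and those that must stay.

    A record is disposable only if its retention period has expired AND
    none of the holds tagged on it are still active.
    """
    disposable, retained = [], []
    for record in records:
        on_active_hold = bool(record.hold_ids & active_holds)
        if record.expires_on <= today and not on_active_hold:
            disposable.append(record)
        else:
            retained.append(record)
    return disposable, retained


def stranded_holds(records, active_holds):
    """Return hold IDs still tagged in the archive but no longer active."""
    tagged = set()
    for record in records:
        tagged |= record.hold_ids
    return tagged - set(active_holds)
```

In this sketch, `active_holds` stands in for whatever the e-discovery system reports as currently open matters. Any hold ID returned by `stranded_holds` marks data that is still being preserved in the archive even though the matter behind it has been released.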

You can hear more from Informatica’s Josh Alpern and Exterro’s Jim FitzGerald by attending Exterro’s CLE-accredited webcast, “Optimizing E-Discovery in a Multi-Vendor Environment,” airing on Thursday, September 4. Learn more and register here.