Unlocking the Unstructured Data Treasure Trove

“Why are enterprises still clueless about unstructured data?”

That’s the question asked by Reetika Fleming, writing at the HfS site. By “clueless,” Fleming is referring to an HfS study in which close to half of executives (45%) believe that only a small percentage (0-25%) of their data is unstructured. She defines unstructured data as “digital pictures, videos, social media feeds, web content, handwriting and sketches, and voice memos.”

This aligns with a survey I helped conduct several years ago, which found executives to be generally unaware of how much unstructured data, or what kind, was flowing through their organizations. At the time, 45% reported that their managers and professionals were aware of these data assets, while 55% said there was little or no awareness, or were not sure of the level of awareness. Notably, only 15% of respondents could report a high degree of awareness of these data assets.

That remains the case today, especially as data flows in from all corners of the enterprise and, for that matter, all corners of the world. The initiatives that will matter most to doing business in the months and years ahead, such as analytics, artificial intelligence, and robotic process automation, all rely on the ability to leverage the volumes of unstructured data flowing through organizations.

“A surprising number of executives underestimate how much unstructured data their business generates or uses, which can ultimately impact the benefits,” Fleming writes. “Overcoming our manual, disjointed processes will require us to deal with unstructured data, and make it usable for our automation engines. When you start down the path of say, implementing a robotics tool, you realize, for example, how many handwritten scribbles your OCR tool can’t pick up, or non-standard forms, PDFs and images your processes involve today. You have to be able to recognize and use this unstructured data or risk your organization still having disjointed workflows and gaps going into the digital age.”

Estimates of how much enterprise data is unstructured vary, ranging as high as 80% of all data assets. Whether that figure holds for a given organization is hard to say because, again, many executives simply do not know how much unstructured data they have under their roofs.

Here are ways to get a handle on the unstructured data assets that are available to today’s enterprises:

Develop an enterprise strategy that addresses unstructured data: “Organizations need to create their data strategy down to the process layer across functions, investigating the role of unstructured data, the extent to which it can be interpreted where it is most meaningful, or invest in ways to harness it down the road,” says Fleming.

Survey business users on their unstructured data usage. Find out which types of data are most important to their jobs and how they currently use that data. What would make it easier for them to manage and analyze these data sources? Part of this process will also involve educating users on the value of their unstructured data to the enterprise, since they may not be aware that certain documents are available or shareable.

Categorize data at the point of ingestion. To avoid losing sight of unstructured data assets, “the best time to add relevant metadata and structure to data is at the point of ingestion,” advises Courtney Wilson of CloudFactory. “This is often easier said than done. Many programs simply don’t provide a way to add categorization upon creation, and those that do are often woefully inaccurate meaning that human intervention and intuition are needed to complete the job.” (Note: Informatica provides tools for this purpose.) 
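The ingestion-time tagging Wilson describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a depiction of CloudFactory's or Informatica's tools: it assumes files arrive on a local filesystem, guesses a content type from the file extension with the standard `mimetypes` module, and maps it to a coarse category using a hypothetical `CATEGORY_RULES` table. Anything it cannot classify is flagged for human review, reflecting Wilson's point that automated categorization often needs a person to finish the job.

```python
import mimetypes
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Hypothetical rules: keys ending in "/" match a MIME type prefix,
# other keys must match the MIME type exactly.
CATEGORY_RULES = {
    "image/": "image",
    "video/": "video",
    "audio/": "audio",
    "application/pdf": "document",
    "text/html": "web_content",
    "text/csv": "structured",
    "application/json": "semi_structured",
}


@dataclass
class IngestRecord:
    """Metadata attached to a file at the moment it is ingested."""
    path: str
    size_bytes: int
    mime_type: Optional[str]
    category: str
    needs_review: bool  # True when no rule matched; route to a person
    ingested_at: str    # UTC timestamp in ISO 8601 format


def categorize(mime_type: Optional[str]) -> str:
    """Map a guessed MIME type to a coarse category, or 'unknown'."""
    if mime_type is None:
        return "unknown"
    for key, category in CATEGORY_RULES.items():
        if key.endswith("/") and mime_type.startswith(key):
            return category
        if mime_type == key:
            return category
    return "unknown"


def ingest(path: Path) -> IngestRecord:
    """Tag a single file with metadata as it enters the system."""
    # Extension-based guess only; the file contents are never inspected.
    mime_type, _encoding = mimetypes.guess_type(path.name)
    category = categorize(mime_type)
    return IngestRecord(
        path=str(path),
        size_bytes=path.stat().st_size,
        mime_type=mime_type,
        category=category,
        needs_review=(category == "unknown"),
        ingested_at=datetime.now(timezone.utc).isoformat(),
    )
```

Extension-based guessing is deliberately crude. A real pipeline would more likely inspect file contents or apply OCR and classification models, with the `needs_review` flag catching whatever those steps miss.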

Merge your unstructured data strategy with a knowledge management strategy. Many organizations have developed or implemented knowledge management systems that capture and index much of the content that is moving through their networks. All too often, the knowledge management system is managed separately from the data infrastructure. There needs to be a convergence of these two domains.

Design systems and interfaces that capture and present insights based on unstructured data. Many of today’s business intelligence tools and dashboards are configured to provide insights based on relational data. The next generation of systems and dashboards needs to be capable of incorporating unstructured data as well.