Tag Archives: Analytics

Where Were You on January 21, 2009?

January 21, 2009. Why in the world would that be a date to recall? Well, for one, it was the day after Barack Obama was inaugurated as the 44th President of the United States. And secondly, it was the day President Obama released an arguably game-changing document, his Memorandum on Transparency and Open Government. This one document set the stage for a new era in how government would look at the data it collects and creates. Since that time, the world of data has changed dramatically! Consider this – new analytics tools, new data types, new devices creating data, new storage ideas, new visualization applications, new concepts, new laws – the list of innovations goes on.

But, all these great innovations are not really why I’m writing today. Today, I’d like to call your attention to a news article I read in NextGov, “Amid Open Data Push, Agencies Feel Urge for Analytics”. I have to admit, as I read this article, I found myself getting just a little bit giddy. Why? Great question, thanks for asking. Before going on with my thoughts, please take a moment to read the article. Go ahead, I have time. I’ll wait.

Picking up where I left off…

Since 2009, the notion of “open data” has been discussed primarily from two main perspectives:

  • Transparency of government to citizens – Accountability
  • What the private sector can do – Innovation

No doubt, there have been significant advances on both of these topics. Yet, as important as these concepts are, budget and resource constraints can cause open data efforts to be prioritized lower than, say, a mission-critical program.

Of course, I get this – mission first – but, a couple years ago it hit me: maybe government agencies are not seeing a potential opportunity that’s sitting right in front of them. Along with the mandate to publish open data is the opportunity to consume open data and get it into their analytics engines, thus supporting the agency’s mission! Just this slight mind shift has the potential to turn open data initiatives into a means to create value. Now do you see why I am excited by the article? (If not, I’ll assume you’ve yet to read it.) I’m thrilled to see agencies adding a third perspective to the open data conversation:

  • Consumption of open data – Improving an agency’s ability to deliver on its mission(s)

I am looking forward to following the success of any agency effort to take advantage of open data as a strategic resource. If you have other examples beyond the cases noted in the NextGov article, please share!

Informatica Doubled Big Data Business in 2014 As Hadoop Crossed the Chasm

2014 was a pivotal year for Informatica as our investments in Hadoop and efforts to innovate in big data gathered momentum and became a core part of Informatica’s business. Our Hadoop-related big data revenue growth was in the ballpark of leading Hadoop startups – more than doubling over 2013.

In 2014, Informatica reached about 100 enterprise customers of our big data products with an increasing number going into production with Informatica together with Hadoop and other big data technologies.  Informatica’s big data Hadoop customers include companies in financial services, insurance, telecommunications, technology, energy, life sciences, healthcare and business services.  These innovative companies are leveraging Informatica to accelerate their time to production and drive greater value from their big data investments.

These customers are in production or implementing a wide range of use cases, leveraging Informatica’s great data pipeline capabilities to better put the scale, efficiency and flexibility of Hadoop to work.  Many Hadoop customers start by optimizing their data warehouse environments by moving data storage, profiling, integration and cleansing to Hadoop in order to free up capacity in their traditional analytics data warehousing systems. Customers that are further along in their big data journeys have expanded to use Informatica on Hadoop for exploratory analytics of new data types, 360-degree customer analytics, fraud detection, predictive maintenance, and analysis of massive amounts of Internet of Things machine data for optimization of energy exploration, manufacturing processes, network data, security and other large-scale systems initiatives.

2014 was not just a year of market momentum for Informatica, but also one of new product development innovations.  We shipped enhanced functionality for entity matching and relationship building at Hadoop scale (a key part of Master Data Management), end-to-end data lineage through Hadoop, as well as high-performance real-time streaming of data into Hadoop. We also launched connectors to NoSQL and analytics databases including Datastax Cassandra, MongoDB and Amazon Redshift. Informatica advanced our capabilities to curate great data for self-serve analytics with a connector to output Tableau’s data format and launched our self-service data preparation solution, Informatica Rev.

Customers can now quickly try out Informatica on Hadoop by downloading the free trials for the Big Data Edition and Vibe Data Stream that we launched in 2014.  Now that Informatica supports all five of the leading Hadoop distributions, customers can build their data pipelines on Informatica with confidence that no matter how the underlying Hadoop technologies evolve, their Informatica mappings will run.  Informatica provides highly scalable data processing engines that run natively in Hadoop and leverage the best of open source innovations such as YARN, MapReduce, and more.  Abstracting data pipeline mappings from the underlying Hadoop technologies, combined with visual tools that enable team collaboration, empowers large organizations to put Hadoop into production with confidence.
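To illustrate the general principle of keeping a pipeline mapping independent of the engine that executes it, here is a minimal, generic sketch – this is not Informatica’s implementation or API; the class and engine names are hypothetical, purely to show the abstraction:

```python
from abc import ABC, abstractmethod

class Mapping:
    """A logical pipeline mapping: what to read, how to transform, where to write."""
    def __init__(self, source, transform, target):
        self.source = source
        self.transform = transform
        self.target = target

class ExecutionEngine(ABC):
    """Engines are interchangeable; the mapping definition never changes."""
    @abstractmethod
    def run(self, mapping: Mapping) -> None: ...

class MapReduceEngine(ExecutionEngine):
    def run(self, mapping: Mapping) -> None:
        print(f"Running {mapping.source} -> {mapping.target} as MapReduce jobs")

class SparkEngine(ExecutionEngine):
    def run(self, mapping: Mapping) -> None:
        print(f"Running {mapping.source} -> {mapping.target} as a Spark job")

# The same mapping runs unchanged on whichever engine the platform chooses.
mapping = Mapping("hdfs://raw/orders", transform=str.strip, target="hdfs://clean/orders")
for engine in (MapReduceEngine(), SparkEngine()):
    engine.run(mapping)
```

The design point is simply that the mapping object never references a specific engine, so swapping the execution technology underneath does not require rewriting the pipeline.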

As we look ahead into 2015, we have ambitious plans to continue to expand and evolve our product capabilities with enhanced productivity to help customers rapidly get more value from their data in Hadoop. Stay tuned for announcements throughout the year.

Try some of Informatica’s products for Hadoop on the Informatica Marketplace here.

Informatica and Pivotal Delivering Great Data to Customers

As we head into Strata + Hadoop World San Jose, Pivotal has made some interesting announcements that are sure to be the talk of the show. Pivotal’s moves to open-source some of their advanced products and to form a new organization to foster Hadoop community cooperation are signs of the dynamism and momentum of the Big Data market.

Informatica applauds these initiatives by Pivotal and we hope that they will contribute to the accelerating maturity of Hadoop and its expansion beyond early adopters into mainstream industry adoption. By contributing HAWQ, GemFire and the Greenplum Database to the open source community, Pivotal creates further open options in the evolving Hadoop data infrastructure technology. We expect this to be well received by the open source community.

As Informatica has long served as the industry’s neutral data connector for more than 5,500 customers and has developed a rich set of capabilities for Hadoop, we are also excited to see efforts to try to reduce fragmentation in the Hadoop community.

Even before the new company Pivotal was formed, Informatica had a long history of working with the Greenplum team to ensure that joint customers could confidently use Informatica tools to include the Greenplum Database in their enterprise data pipelines. Informatica has mature and high-performance native connectivity to load data in and out of Greenplum reliably using Informatica’s codeless, visual data pipelining tools. In 2014, Informatica expanded our Hadoop support to include Pivotal HD Hadoop, and we have joint customers using Informatica to do data profiling, transformation, parsing and cleansing using Informatica Big Data Edition running on Pivotal HD Hadoop.

We expect these innovative developments driven by Pivotal in the Big Data technology landscape to help to move the industry forward and contribute to Pivotal’s market progress. We look forward to continuing to support Pivotal technology and to an ever increasing number of successful joint customers. Please reach out to us if you have any questions about how Informatica and Pivotal can help your organization to put Big Data into production. We want to ensure that we can help you answer the question … Are you Big Data Ready?

Stop Trying to Manage Data Growth!(?)

Talking to architects about analytics at a recent event, I kept hearing the familiar theme: data scientists are spending 80% of their time on “data wrangling,” leaving only 20% for delivering the business insights that will drive the company’s innovation.  It was clear to everybody I spoke to that the situation will only worsen.  The coming growth everybody sees in data volume and complexity will only lengthen the time to value.

Gartner recently predicted that:

“by 2015, 50% of organizations will give up on managing growth and will redirect funds to improve classification and analytics.”

Some of the details of this study are interesting.  In the end, many organizations are coming to two conclusions:

  • It’s risky to delete data, so they keep it around as insurance.
  • All data has potential business value, so more organizations are keeping it around for potential analytical purposes.

The other mega-trend here is that more and more organizations are looking to compete on analytics – and they need data to do it, both internal data and external data.

From an architect’s perspective, here are several observations:

  • The floodgates are open and analytics is a top priority. Given that, the emphasis should be on architecting to manage the dramatic increases in both data quantity and data complexity rather than on trying to stop it.
  • The immediate architectural priority has to be on simplifying and streamlining your current enterprise data architecture. Break down those data silos and standardize your enterprise data management tools and processes as much as possible.  As discussed in other blogs, data integration is becoming the biggest bottleneck to business value delivery in your environment. Gartner has projected that “by 2018, more than half the cost of implementing new large systems will be spent on integration.”  The more standardized your enterprise data management architecture is, the more efficient it will be.
  • With each new data type, new data tool (Hive, Pig, etc.), and new data storage technology (Hadoop, NoSQL, etc.) ask first if your existing enterprise data management tools can handle the task before people go out and create a new “data silo” based on the cool, new technologies. Sometimes it will be necessary, but not always.
  • The focus needs to be on speeding value delivery for the business. And the key bottleneck is highly likely to be your enterprise data architecture.

Rather than trying to stop data growth, the priority should be on managing data in the most standardized and efficient way possible.  It is time to think about enterprise data management as a function with standard processes, skills and tools (just like Finance, Marketing or Procurement).

Several of our leading customers have built or are building a central “Data as a Service” platform within their organizations.  This is a single, central place where all developers and analysts can go to get trustworthy data that is managed by IT through a standard architecture and served up for use by all.
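As a rough illustration of the “serve trustworthy data from one central place” idea, here is a minimal sketch using only Python’s standard library – the dataset names and records are made up, and a real Data as a Service platform would of course sit on governed enterprise sources rather than an in-memory dictionary:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

# Stand-in for centrally governed, IT-managed datasets.
CURATED_DATASETS = {
    "customers": [{"id": 1, "name": "Acme Corp", "segment": "enterprise"}],
    "claims":    [{"id": 77, "amount": 1250.0, "status": "approved"}],
}

class DataServiceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        name = self.path.strip("/")
        if name in CURATED_DATASETS:
            body = json.dumps(CURATED_DATASETS[name]).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404, "Unknown dataset")

if __name__ == "__main__":
    # Every developer and analyst pulls from the same governed endpoint,
    # e.g. GET http://localhost:8000/customers
    HTTPServer(("localhost", 8000), DataServiceHandler).serve_forever()
```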

For more information, see “The Big Big Data Workbook.”

*Gartner Predicts 2015: Managing ‘Data Lakes’ of Unprecedented Enormity, December 2014  http://www.gartner.com/document/2934417#

Healthcare Consumer Engagement

The signs that healthcare is becoming a more consumer-driven industry (think patients, members, providers) are evident all around us. I see provider and payer organizations clamoring for more data, specifically data that is actionable, relatable and has integrity. Armed with this data, healthcare organizations are able to differentiate around a member/patient-centric view.

These consumer-centric views convey the total value of the relationships healthcare organizations have with consumers. Understanding the total value creates a more comprehensive understanding of consumers because these views deliver a complete picture of an individual’s critical relationships, including: patient to primary care provider, member to household, provider to network and even members to legacy plans. This is the type of knowledge that informs new treatments, targets preventative care programs and improves outcomes.

Payer organizations are collecting and analyzing data to identify opportunities for more informed care management and segmentation to reach new, high-value customers in individual markets. By segmenting and targeting messaging to specific populations, health plans generate increased member satisfaction and cost-effectively expand and manage provider networks.

How will they accomplish this? Enabling members to interact in health and wellness forums, analyzing member behavior and trends and informing care management programs with a 360 view of members… to name a few. Payers will also drive new member programs, member retention and member engagement marketing and sales programs by investigating complete views of member households and market segments.

In the provider space, this relationship building can be a little more challenging because often consumers as patients do not interact with their doctor unless they are sick, creating gaps in data. When provider organizations have a better understanding of their patients and providers, they can increase patient satisfaction and proactively offer preventative care to the sickest (and most likely to engage) of patients before an episode occurs. These activities result in increased market share and improved outcomes.

Where can providers start? By creating a 360 view of the patient, organizations can now improve care coordination, open new patient service centers and develop patient engagement programs.

Analyzing populations of patients, fostering patient engagement based on Meaningful Use or Accountable Care requirements, building out referral networks and developing physician relationships are all essential ingredients in consumer engagement. Knowing your patients and providing a better patient experience than your competition will differentiate provider organizations.

You may say, “This all sounds great, but how does it work?” An essential ingredient is clean, safe and connected data.  Clean, safe and connected data requires an investment in data as an asset… just like you invest in real estate and human capital, you must invest in the accessibility and quality of your data. To be successful, arm your team with tools to govern data – ensuring ongoing integrity and quality of data, removing duplicate records and dynamically incorporating data validation/quality rules. These tools – master data management, data quality and metadata management – are focused on information quality and support a total customer relationship view of members, patients and providers.
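To make the kinds of rules such tools automate a bit more tangible, here is a tiny, hedged sketch in plain pandas with made-up member records – not a stand-in for any real MDM or data quality product, just an illustration of one validation rule and one de-duplication rule:

```python
import pandas as pd

members = pd.DataFrame([
    {"member_id": 1, "name": "Ann Lee", "email": "ann@example.com", "dob": "1980-02-11"},
    {"member_id": 2, "name": "Ann Lee", "email": "ann@example.com", "dob": "1980-02-11"},  # duplicate
    {"member_id": 3, "name": "Bo Chan", "email": "not-an-email",    "dob": "1975-07-30"},
])

# Validation rule: the email field must look like an actual address.
valid_email = members["email"].str.contains(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", regex=True)
print("Records failing the email rule:")
print(members[~valid_email])

# De-duplication rule: identical name, date of birth and email means one person.
deduped = members.drop_duplicates(subset=["name", "dob", "email"])
print("Records after de-duplication:")
print(deduped)
```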

Big Data Is Neither-Part II

You Say Big Dayta, I Say Big Dahta

Some say Big Data is a great challenge while others say Big Data creates new opportunities. Where do you stand?  For most companies concerned with their Big Data challenges, it shouldn’t be so difficult – at least on paper. Computing costs (both hardware and software) have vastly shrunk. Databases and storage techniques have become more sophisticated and scale massively, and companies such as Informatica have made connecting and integrating all the “big” and disparate data sources much easier, helping companies achieve a sort of “big data synchronicity”.

In the process of creating solutions to Big Data problems, humans (and the supra-species known as IT Sapiens) have a tendency to use theories based on linear thinking and the scientific method. There is data as our systems know it and data as our systems don’t. The reality, in my opinion, is that “Really Big Data” problems, now and in the future, will have complex correlations and unintuitive relationships that require mathematical disciplines, data models and algorithms that haven’t even been discovered or invented yet – and that, when eventually discovered, will make current database science look positively primordial.

At some point in the future, machines will be able to predict, based on big, perhaps unknown, data types, when someone is having a bad day or a good day, or more importantly whether a person may behave in a good or bad way. Many people do this now when they take a glance at someone across a room and infer how that person is feeling or what they will do next. They see eyes that are shiny or dull, crinkles around eyes or sides of mouths, then hear the “tone” in a voice, and then their neurons put it all together that this is a person who is having a bad day and needs a hug. Quickly. No one knows exactly how the human brain does this, but it does what it does and we go with it and we are usually right.

And some day, Big Data will be able to derive this, and it will be an evolution point and also a big business opportunity. Through bigger and better data ingestion and integration techniques and more sophisticated math and data models, a machine will do this fast and, relatively speaking, cheaply. The vast majority won’t understand why or how it’s done, but it will work and it will be fairly accurate.

And my question to you all is this.

Do you see any other alternate scenarios regarding the future of big data? Is contextual computing an important evolution, and will big data integration be more or less of a problem in the future?

PS. Oh yeah, one last thing to chew on concerning Big Data… If Big Data becomes big enough, does that spell the end of modelling as we know it?

Big Data Is Neither-Part I

I’ve been having some interesting conversations with work colleagues recently about the Big Data hubbub, and I’ve come to the conclusion that “Big Data” as hyped is neither, really. In fact, both terms are relative. “Big” 20 years ago to many may have been 1 terabyte. “Data” 20 years ago may have been flat files, Sybase, Oracle, Informix, SQL Server or DB2 tables. Fast forward to today, and “Big” is now exabytes (or millions of terabytes). “Data” has now expanded to include events, sensors, messages, RFID, telemetry, GPS, accelerometers, magnetometers, IoT / M2M and other new and evolving data classifications.

And then there’s social and search data.

Surely you would classify Google’s data as really, really big data – when I do a search and get 487,464,685 answers within fractions of a second, I can tell they appear to have gotten a handle on their big data speeds and feeds. However, it’s also telling that nearly all of those bazillion results are actually not relevant to what I am searching for.

My conclusion is that if you have the right algorithms, invest in and use the right hardware and software technology and make sure to measure the pertinent data sources, harnessing big data can yield speedy and “big” results.

So what’s the rub then?

It usually boils down to having larger and more sophisticated data stores and still not understanding their structure, OR the data can’t be integrated into cohesive formats, OR there is important hidden meaning in the data that we don’t have the wherewithal to derive, see or understand a la Google. So how DO you find the timely and important information in your company’s big data (AKA the needle in the haystack)?

More to the point, how do you better ingest, integrate, parse, analyze, prepare, and cleanse your data to get the speed, but also the relevancy in a Big Data world?

Hadoop-related tools are among the current technologies of choice when it comes to solving Big Data problems, and as an Informatica customer, you can leverage these tools regardless of whether you have Big Data or Not So Big Data, fast data or slow data. In fact, it actually astounds me that many IT professionals would go back to hand coding on Hadoop just because they don’t know that the tools to avoid it are right under their nose, installed and running in their familiar Informatica user interface (and working with Hadoop right out of the box).

So what does your company get out of using Informatica in conjunction with Hadoop tools? Namely, better customer service and responsiveness, better operational efficiencies, more effective supply chains, better governance, service assurance, and the ability to discover previously unknown opportunities as well as to stop problems as they arise – not after the fact. In other words, Big Data done right can be a great advantage to many of today’s organizations.

Much more to say on this subject as I delve into the future of Big Data. For more, see Part 2.

The Quality of the Ingredients Makes the Dish – and It Applies to Data Quality

Data Quality Leads to Other Integrated Benefits

In a previous life, I was a pastry chef in a now-defunct restaurant. One of the things I noticed while working there (and frankly while cooking at home) is that the better the ingredients, the better the final result. If we used poor quality apples in the apple tart, we ended up with a soupy, flavorless mess with a chewy crust.

The same analogy can be applied to data analytics. With poor quality data, you get poor results from your analytics projects. We all know that the companies that can implement fantastic analytics solutions providing near real-time access to consumer trends are the same companies that can run successful, up-to-the-minute targeted marketing campaigns. The Data Warehousing Institute estimates that data quality problems cost U.S. businesses more than $600 billion a year.

The business impact of poor data quality cannot be overstated. If not identified and corrected early on, defective data can contaminate all downstream systems and information assets, jacking up costs, jeopardizing customer relationships, and causing imprecise forecasts and poor decisions.

  • To help you quantify: Let’s say your company receives 2 million claims per month with 377 data elements per claim. Even at an error rate of 0.001, the claims data contains more than 754,000 errors per month and more than 9.04 million errors per year! If you determine that 10 percent of the data elements are critical to your business decisions and processes, you still must fix almost 1 million errors each year!
  • What is your exposure to these errors? Let’s estimate the risk at $10 per error (including staff time required to fix the error downstream after a customer discovers it, the loss of customer trust and loyalty, and erroneous payouts). Your company’s risk exposure to poor quality claims data is roughly $10 million a year – the short calculation below reproduces these figures.
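For reference, the arithmetic behind those two bullets can be reproduced in a few lines (the figures are the same illustrative assumptions used above, not real claims data):

```python
# Worked version of the claims-data example above.
claims_per_month   = 2_000_000
elements_per_claim = 377
error_rate         = 0.001   # one bad value per thousand data elements
critical_share     = 0.10    # share of elements that drive decisions and processes
cost_per_error     = 10.00   # staff time, lost trust and loyalty, erroneous payouts

errors_per_month = claims_per_month * elements_per_claim * error_rate
errors_per_year  = errors_per_month * 12
critical_errors  = errors_per_year * critical_share
annual_exposure  = critical_errors * cost_per_error

print(f"Errors per month:      {errors_per_month:,.0f}")   # 754,000
print(f"Errors per year:       {errors_per_year:,.0f}")    # 9,048,000
print(f"Critical errors/year:  {critical_errors:,.0f}")    # 904,800
print(f"Annual risk exposure: ${annual_exposure:,.0f}")    # ~9,048,000 (about $10 million)
```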

Once your company values quality data as a critical resource, it is much easier to perform high-value analytics that have an impact on your bottom line. Start with the creation of a Data Quality program. Data is a critical asset in the information economy, and the quality of a company’s data is a good predictor of its future success.

The New Marketing Technology Landscape Is Here… And It’s Music to Our Ears!

How Do You Like It? How Do You Like It? More, More, More!
Chiefmartec came out with their 2015 Marketing Technology Landscape, and if there’s one word that comes to mind, it’s MORE. 1,876 corporate logos dot the page, up from 947 in 2014. That’s definitely more, more, more – just about double, to be exact. I’m honestly not sure it’s possible to squeeze any more into a single image.

But it’s strangely fitting, because this is the reality that we marketers live in.  There are an infinite number of new technologies, approaches, social media platforms, operations tools, and vendors that we have to figure out. New, critical categories of technology roll out constantly. New vendors enter and exit the landscape. As Chiefmartec says, “at least on the immediate horizon, I don’t think we’re going to see a dramatic shrinking of this landscape. The landscape will change, for sure. What qualifies as ‘marketing’ and ‘technology’ under the umbrella of marketing technology will undoubtedly morph. But if mere quantity is the metric we’re measuring, I think it’s going to be a world of 1,000+ marketing technology companies — perhaps even a world of 2,000+ of them — for some time to come.”

Middleware: I’m Coming Up So You’d Better Get This Party Started!
One thing you’ll notice if you look carefully between last year’s and this year’s versions is the arrival of the middleware layer. Chiefmartec spends quite a bit of time talking about middleware, pointing out that great tools in the category are making the marketing technology landscape easier to manage – particularly those that handle a hybrid of on-premise and cloud.

Marketers have long cared about the things on the top – the red “Marketing Experiences” and the orange “Marketing Operations”. They’ve also put a lot of focus on the dark gray/black/blue layer of “Backbone Platforms” like marketing automation and e-commerce. But only recently has that yellow middleware layer become front and center for marketers. Data integration, data management platforms, connectivity, data quality, and APIs are definitely not new to the technology landscape, and have been a critical domain of IT for decades. But as marketers become more and more skilled in and reliant on analytics and focused customer experience management, data is entering the forefront.

Marketers cannot focus exclusively on their Salesforce CRM, their Marketo automation, or their Adobe Experience Manager web management. Data Ready marketers realize that each of these applications can no longer be run in a silo; they need to be looked at collectively as a powerful set of tools designed to engage the customer and push them through the buying cycle, as critical pieces of the same puzzle. And to do that, marketers need to be connecting their data sources, powering them with great data, analyzing and measuring their results, and then deciding what to do.

If you squint, you can see Informatica in the yellow Middleware layer. (I could argue that it belongs in several of these yellow boxes, not just Cloud integration, but I’ll save that for another blog!) Some might say that’s not very exciting, but I would argue that Informatica is in a tremendous place to help marketers succeed with great data. And it all comes down to two words… complexity and change.

Why You Have to Go and Make Things So Complicated?
Ok, admittedly terrible grammar, but you get the picture. Marketers live in a tremendously complex world. Sure, you don’t have all 1,876 of the logos on the Technology Landscape in house. You probably don’t even have one from each of the 43 categories. But you definitely have a lot of different technology solutions that you rely upon on a day-to-day basis. According to a September article by ChiefMarTech, most marketers already regularly rely on more than 100 software programs.

Data ready marketers realize that their environments are complicated, and that they need a foundation. They need a platform of great data that all of their various applications and tools can leverage, and that can actually connect all of their various applications and tools together. They need to be able to connect to just about anything from just about anything. They need a complete view of all of their interactions with their customers. In short, they need to make their extremely complicated world more simple, streamlined, and complete.

Ch-Ch-Ch-Ch-Changes. Turn and Face the Strange!
I have a tendency to misunderstand lyrics, so I have to confess that until I looked up this song today, I thought the lyric was “time to face the pain” (Bowie fans, I hang my head in shame!).  But quite honestly, “turn and face the strange” illustrates my point just as well!

There is no question that marketing has changed dramatically in the past few years.  Your most critical marketing tools and processes two years ago are almost certainly different than those this year, and will almost certainly be different from what you see two years from now.  Marketers realize this.  The Marketing Technology Landscape illustrates this every year!

The data ready marketer understands that their toolbox will change, but that their data will be the foundation for whatever new piece of the technology puzzle they embrace or get rid of.  Building a foundation of great data will power any technology solution or new approach.

Data ready marketers also work with their IT counterparts to engineer for change.  They make sure that no matter what technology or data source they want to add – no matter how strange or unthinkable it is today – they never have to start from scratch.  They can connect to what they want, when they want, leveraging great data, and ultimately making great decisions.

Get Ready ‘Cause Here I Come. The Era of the Data Ready Marketer is Here
Now that you have a few catchy tunes stuck in your head, it’s time to ask yourself: are you data ready? Are you ready to embrace the complexity of the marketing technology landscape? Are you ready to think about change as a competitive weapon?

I encourage you to take our survey about data ready marketing. The results are coming out soon, so don’t miss your chance to be a part of it. You can find the link here.

Also, follow me on twitter – The Data Ready Marketer (@StephanieABest) for some of the latest & greatest news and insights on the world of data ready marketing.

And stay tuned because we have several new Data Ready Marketing pieces coming out soon – InfoGraphics, eBooks, SlideShares, and more!

Dark Data in Government: Sounds Sinister

Anytime I read about something characterized as “dark”, my mind immediately jumps to a vision of something sneaky or sinister, something better left unsaid or undiscovered. Maybe I watched too many Alfred Hitchcock movies in my youth, who knows. However, when coupled with the word “data”, “dark” is anything BUT sinister. Sure, as you might agree, the word “undiscovered” may still apply, but only with a more positive connotation.

To level set, let’s make sure you understand my definition of dark data. I prefer using visualizations when I can so, picture this: the end of the first Indiana Jones movie, Raiders of the Lost Ark. In this scene, we see the Ark of the Covenant, stored in a generic container, being moved down the aisle in a massive warehouse full of other generic containers. What’s in all those containers? It’s pretty much anyone’s guess. There may be a record somewhere, but, for all intents and purposes, the materials stored in those boxes are useless.

Applying this to data, once a piece of data gets shoved into some generic container and is stored away, just like the Ark, the data becomes essentially worthless. This is dark data.

Opening up a government agency to all its dark data can have significant impacts, both positive and negative. Here are a few initial tips to get you thinking in the right direction:

  1. Begin with the end in mind – identify quantitative business benefits of exposing certain dark data.
  2. Determine what’s truly available – perform a discovery project – seek out data hidden in the corners of your agency – databases, documents, operational systems, live streams, logs, etc.
  3. Create an extraction plan – determine how you will get access to the data, how often the data updates, and how you will handle varied formats.
  4. Ingest the data – transform the data if needed, integrate if needed, and capture as much metadata as possible (never assume you won’t need a metadata field – that’s just about the time you will be proven wrong); a minimal sketch of this step follows the list.
  5. Govern the data – establish standards for quality, access controls, security protections, semantic consistency, etc. – don’t skimp here, the impact of bad data can never really be quantified.
  6. Store it – it’s interesting how often agencies think this is the first step.
  7. Get the data ready to be useful to people, tools and applications – think about how to minimize the need for users to manipulate data – reformatting, parsing, filtering, etc. – to better enable self-service.
  8. Make it available – at this point, the data should be easily accessible, easily discoverable, easily used by people, tools and applications.
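To make step 4 slightly more concrete, here is a minimal sketch of recording metadata for each ingested file as it arrives (the file path, catalog name and captured fields are hypothetical; a real agency would use its own ingestion and cataloging tooling):

```python
import hashlib
import json
import os
from datetime import datetime, timezone

def ingest_with_metadata(path, catalog="catalog.jsonl"):
    """Record what we know about a source file in an append-only metadata catalog."""
    with open(path, "rb") as f:
        content = f.read()
    record = {
        "source_path": os.path.abspath(path),
        "size_bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "format_guess": os.path.splitext(path)[1].lstrip(".") or "unknown",
    }
    with open(catalog, "a") as out:
        out.write(json.dumps(record) + "\n")
    return record

# Example (hypothetical file):
# ingest_with_metadata("legacy_exports/permits_2009.csv")
```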

Clearly, there’s more to shining the light on dark data than I can offer in this post. If you’d like to take the next step toward learning what is possible, I suggest you download the eBook, The Dark Data Imperative.
