Beware of Data Anarchists, because Nobody ETLs Like It’s 1999 in a Data Democracy

A growing group of self-proclaimed data anarchists is making headlines by claiming “Let IT tremble at a data revolution. Analysts have nothing to lose but their chains!”  With efficient and scalable data platforms like Hadoop emerging, their rallying cry is “Down with ETL and long, complex, siloed data integration processes!”  On one hand, the call for revolution is intriguing.  With data sources proliferating, is it inevitable that Chief Data Officers running centralized data organizations are set up to fail?  Data anarchists propose a new world order in which data flows without control and Chief Data Officers simply abdicate their responsibility to manage organizational data as an asset.

Well, there’s just one problem with the argument.

Data anarchists are living in the past.  Nobody ETLs like it’s 1999.

What’s even lamer is that calling for a revolution is a red herring: successful organizations have already evolved their information fabrics to give analysts more autonomy and IT more agility while still meeting security and governance requirements.  Much of this innovation comes from technologies that understand the “data about the data,” or metadata.  That understanding has become increasingly important in a world of high-volume ingestion of unstructured data and dynamic schemas.  Technology has an ever greater role to play in helping organizations extract valuable information from greater volumes and varieties of data.

Data chaos is not the answer.  Data democracy is.

So how can organizations evolve their information fabrics to deliver data democracy?  Here are seven specific best practices that we’ve seen lead to more successful, decision-ready analytics without the anarchy of data chaos.

  • Automate data ingestion with high-performance pre-built connectors or data processing engines.  With efficient and scalable data platforms like Hadoop, unprepared data can be landed in raw form without manual or complex processes.  Automating that ingestion is what delivers agility and speed.
  • Develop fit-for-purpose data for different classes of analysts. New classes of data scientists may prefer getting data that is 70% clean very fast, unlike traditional BI users who were willing to wait for 100% clean data.  Different classes of fit-for-purpose data can be delivered in platforms like Hadoop with rapid data profiling and data validation tools.
  • Empower data consumers to easily blend data on their own. Data quality need not be a totally IT-driven process anymore.  After some initial data cleansing, give technical data consumers such as data scientists direct access to data, along with lightweight data blending tools, so they can collaborate in the data curation process.
  • Execute agile data curation with multi-persona collaboration and rapid prototyping. Fast and successful data projects always come out of collaboration between business stakeholders and IT.  Multi-persona tools with rapid prototyping capabilities turn data curation into an agile process that delivers value quickly.
  • Monitor data quality instead of micromanaging it. Data quality need not be a fixed mandate that is applied equally to all data.  Rule-based monitoring and alerting enable IT to flexibly monitor data processes and collaboratively drive visibility into the quality of different data assets.
  • Empower data accessibility with virtualization or brokering. Before the value of data is known, it might not make sense to move it from a source system.  Data virtualization can provide fast views into datasets without moving them.  Once the value of data is known, data movement and curation can be automated with a data broker that automatically ingests data into a central hub and then enables consumers to subscribe to it.
  • Leverage a universal metadata catalog powered by data intelligence for highest return on data. New machine learning techniques and graph-based technologies can be used to infer the structure, meaning, value, and risk of data assets and surface that understanding to business analysts and IT to drive greater autonomy while also ensuring greater security and governance.
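
To make the "monitor data quality instead of micromanaging it" practice a little more concrete, here is a minimal Python sketch of rule-based monitoring with different thresholds for different classes of data consumers.  It is an illustration only, not a description of any particular product: the names QualityRule and monitor, the sample records, and the 70% and 100% thresholds (borrowed from the fit-for-purpose example above) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class QualityRule:
    """A hypothetical data quality rule with an alerting threshold."""
    name: str
    check: Callable[[dict], bool]  # returns True when a record passes
    min_pass_rate: float           # alert when the pass rate drops below this


def pass_rate(rule: QualityRule, records: List[dict]) -> float:
    """Fraction of records that satisfy the rule (1.0 for an empty batch)."""
    if not records:
        return 1.0
    passed = sum(1 for record in records if rule.check(record))
    return passed / len(records)


def monitor(rules: Iterable[QualityRule], records: Iterable[dict]) -> List[str]:
    """Return one alert message per rule whose pass rate is below its threshold."""
    batch = list(records)
    alerts = []
    for rule in rules:
        rate = pass_rate(rule, batch)
        if rate < rule.min_pass_rate:
            alerts.append(
                f"{rule.name}: pass rate {rate:.0%} is below {rule.min_pass_rate:.0%}"
            )
    return alerts


def has_customer_id(record: dict) -> bool:
    return record.get("customer_id") is not None


def has_email(record: dict) -> bool:
    return record.get("email") is not None


# Illustrative sample data: one record lacks an email, one lacks an id.
records = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": None},
    {"customer_id": None, "email": "c@example.com"},
    {"customer_id": 4, "email": "d@example.com"},
]

# Same checks, different thresholds for different classes of analysts.
exploration_rules = [
    QualityRule("customer_id present", has_customer_id, 0.70),
    QualityRule("email present", has_email, 0.70),
]
reporting_rules = [
    QualityRule("customer_id present", has_customer_id, 1.00),
    QualityRule("email present", has_email, 1.00),
]

exploration_alerts = monitor(exploration_rules, records)
reporting_alerts = monitor(reporting_rules, records)
```

With this sample data, the exploration rules raise no alerts while the reporting rules raise two, which is the point: the same dataset can be good enough for one class of analyst and not yet good enough for another, and IT watches the thresholds rather than gating every record.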

These are the best practices we’ve found among Informatica’s leading customers in 2015.  This is what modern information fabrics are delivering to help organizations repeatably deliver the right data at the right time for more pervasive, elastic, and trusted analytics.  This is the foundation of a well-secured, well-governed data democracy.

Raj Patil, Head of Data Strategy, Architecture, and Decision Science at BNY Mellon, put it best at the recent MIT CDO Symposium: “Ungoverned transformation is what causes a mess... I’m not suggesting that you build a data warehouse that requires you to model the whole world.  Pick an approach that enables you to evolve as you go, like an ontological approach... Our aim is to deliver an enterprise data graph that supports curation, security, governance, etc.”

Data anarchists only create data chaos that turns data assets into a liability.  Successful data leaders are using evolved information fabrics to power a well-secured, well-governed data democracy.