Tag Archives: Data Quality

Master Data Management in Oil and Gas Industry

The Oil and Gas (O&G) industry is a backbone of every economy. It is also an industry that has weathered the storm of constantly changing economic trends, regulations and technological innovations. O&G companies by nature run complex and data-intensive processes, and to operate profitably under shifting trends, policies and guidelines, they need to manage those data processes exceptionally well.

The industry is subject to pricing volatility driven by patterns of supply and demand, which are in turn affected by geopolitical developments, economic downturns and public scrutiny. Competition from alternative sources such as cheap natural gas, combined with low margins, adds fuel to the fire and makes it hard for O&G companies to achieve sustainable, predictable outcomes.

A recent PwC survey of oil and gas CEOs similarly concluded that “energy CEOs can’t control market factors such as the world’s economic health or global oil supply, but can change how they respond to market conditions, such as getting the most out of technology investments, more effective use of partnerships and diversity strategies.” The survey also revealed that nearly 80% of respondents agreed that digital technologies are creating value for their companies when it comes to data analysis and operational efficiency.

O&G firms run three distinct business operations: upstream exploration & production (E&P), midstream (storage & transportation) and downstream (refining & distribution). All of these operations need a few core data domains standardized across every major business process. However, a key challenge faced by O&G companies is that this critical core information is often spread across multiple disparate systems, making it hard to take timely decisions. To ensure effective operations and to grow their asset base, it is vital for these companies to capture and manage the critical data in these domains.

Upstream (E&P) core data domains include the wellhead and materials (asset), geospatial location data, and engineers and technicians (associate). Midstream adds trading partners and distribution, while downstream covers commercial and residential customers. Classic distribution use cases range from shipping locations and large-scale clients, such as airlines and logistics providers buying millions of gallons of fuel and industrial lube products, down to gas station customers. The industry also relies heavily on reference data and the chart of accounts for financial cost and revenue roll-ups.

The main E&P asset, the well, changes characteristics over its life cycle (location, ID, name, physical characterization, depth, crew, ownership, etc.), all of which are master data aspects to consider for this baseline entity. If we master this data and create a consistent representation across the organization (a simple sketch follows the list below), it can then be linked to transaction and interaction data so that O&G companies can drive investment decisions and split cost and revenue through reporting and real-time processes around:

  • Crew allocation
  • Royalty payments
  • Safety and environmental inspections
  • Maintenance and overall production planning
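
As a rough illustration only (the entity and field names below are hypothetical, not Informatica’s or any vendor’s actual data model), a mastered well record carries a single cross-enterprise identifier that transactions such as royalty payments reference, so cost and revenue roll up consistently:

    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class WellMaster:
        # Golden record for a well, consolidated from E&P, land and finance systems.
        well_id: str                  # cross-enterprise master identifier
        source_ids: List[str]         # IDs carried by the contributing source systems
        name: str
        latitude: float
        longitude: float
        depth_ft: float
        crew_id: str                  # link to the associate (crew) master
        ownership: Dict[str, float]   # working-interest split by owner

    @dataclass
    class RoyaltyPayment:
        # Transactions reference the master well_id, not a source-system ID.
        well_id: str
        period: str
        amount_usd: float

    well = WellMaster(
        well_id="W-0001",
        source_ids=["ERP-778", "LAND-42"],
        name="Example 12H",
        latitude=28.97, longitude=-98.51, depth_ft=11500.0,
        crew_id="CREW-07",
        ownership={"OperatorCo": 0.6, "PartnerCo": 0.4},
    )
    payment = RoyaltyPayment(well_id=well.well_id, period="2015-01", amount_usd=125000.00)
    print(well.name, payment.amount_usd)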

E&P firms need a solution that allows them to:

  • Have a flexible multidomain platform that permits easier management of different entities under one solution
  • Create a single, cross-enterprise instance of a wellhead master
  • Capture and master the relationships between the well, equipment, associates, land and location
  • Govern end-to-end management of assets, facilities, equipment and sites throughout their life cycle

The upstream O&G industry is uniquely positioned to take advantage of the vast amounts of data its operations generate. Thousands of sensors at the wellhead, millions of parts in the supply chain, global capital projects and a large, highly trained workforce create a data-rich environment. A well-implemented MDM program gives this data-driven industry a strong foundation of clean, consistent and connected core master data, so companies can cut material costs, reduce IT maintenance and increase margins.

To learn more about how you can achieve upstream operational excellence with Informatica Master Data Management, check out this recorded webinar with @OilAndGasData.

~Prash
@MDMGeek
www.mdmgeek.com

Building an Impactful Data Governance Program – One Step at a Time

Let’s face it, building a data governance program is no overnight task. As one CDO puts it, “data governance is a marathon, not a sprint.” Why? Because data governance is a complex business function that encompasses technology, people and process, all of which have to work together effectively to ensure the success of the initiative. Because of its scope, data governance often calls for participants from different business units within an organization, and it can be disruptive at first.

Why bother, then, given that data governance is complex, disruptive and can introduce additional cost to a company? The drivers vary from organization to organization, so let’s take a closer look at some of the motivations behind a data governance program.

For companies in heavily regulated industries, establishing a formal data governance program is a mandate. When a company is not compliant, the consequences can be severe: hefty fines, brand damage, lost revenue, and even potential jail time for the person held accountable for the noncompliance. To meet ongoing regulatory requirements and adhere to data security policies and standards, companies need clean, connected and trusted data that enables transparency and auditability in their reporting and answers critical questions from auditors. Without a dedicated data governance program in place, compliance can become an ongoing nightmare for companies in regulated industries.

A data governance program can also be established to support a customer centricity initiative. To cross-sell and up-sell effectively and grow your business, you need clear visibility into purchasing behaviors across multiple shopping channels and touch points. Because customers’ shopping behaviors and attributes are captured in data, a holistic data governance program is essential to gaining a thorough understanding of your customers and boosting your sales.

Other reasons to start a data governance program include improving efficiency, reducing operational cost, supporting better analytics and driving more innovation. As long as the area is business-critical, data is at the core of the process and the business case is clear, there is a compelling reason to launch a data governance program.

Now that we have identified the drivers for data governance, how do we start? This rather loaded question gets into the details of implementation. A few critical elements come into consideration: establishing task forces such as a steering committee, a data governance team and business sponsors; identifying roles and responsibilities for the stakeholders involved in the program; and defining metrics for tracking results. And soon you will find that, on top of everything, communication, communication and more communication is probably the most important tactic of all for driving the initial success of the program.

A rule of thumb? Start small, take one step at a time and focus on producing something tangible.

Sounds easy, right? Well, let’s hear what real-world practitioners have to say. Join us at this Informatica webinar to hear Michael Wodzinski, Director of Information Architecture, Lisa Bemis, Director of Master Data, and Fabian Torres, Director of Project Management at Houghton Mifflin Harcourt, a global leader in publishing, as well as David Lyle, VP of Product Strategy at Informatica, discuss how to implement a successful data governance practice that brings business impact to an enterprise organization.

If you are currently kicking the tires on setting up a data governance practice in your organization, I’d like to invite you to visit a member-only website dedicated to data governance: http://governyourdata.com/. The site currently has over 1,000 members and is designed to foster open communication on everything data governance. There you will find conversations on best practices, methodologies, frameworks, tools and metrics. I would also encourage you to take the data governance maturity assessment to see where you stand on the maturity curve and compare your result against industry benchmarks. More than 200 members have taken the assessment to gain a better understanding of their current data governance programs, so why not give it a shot?

Data governance is a journey, and likely a never-ending one. We wish you the best of luck on this effort and a joyful ride! We’d love to hear your stories.

Informatica Doubled Big Data Business in 2014 As Hadoop Crossed the Chasm

2014 was a turning point for Informatica, as our investments in Hadoop and our efforts to innovate in big data gathered momentum and became a core part of Informatica’s business. Our Hadoop-related big data revenue growth was in the ballpark of the leading Hadoop startups – more than doubling over 2013.

In 2014, Informatica reached about 100 enterprise customers of our big data products, with an increasing number going into production with Informatica alongside Hadoop and other big data technologies. Informatica’s big data Hadoop customers include companies in financial services, insurance, telecommunications, technology, energy, life sciences, healthcare and business services. These innovative companies are leveraging Informatica to accelerate their time to production and drive greater value from their big data investments.

These customers are in production or implementing a wide range of use cases, leveraging Informatica’s great data pipeline capabilities to better put the scale, efficiency and flexibility of Hadoop to work. Many Hadoop customers start by optimizing their data warehouse environments, moving data storage, profiling, integration and cleansing to Hadoop to free up capacity in their traditional analytics data warehousing systems. Customers further along in their big data journeys have expanded to use Informatica on Hadoop for exploratory analytics of new data types, 360-degree customer analytics, fraud detection, predictive maintenance, and analysis of massive amounts of Internet of Things machine data to optimize energy exploration, manufacturing processes, network data, security and other large-scale systems initiatives.

2014 was not just a year of market momentum for Informatica, but also one of product innovation. We shipped enhanced functionality for entity matching and relationship building at Hadoop scale (a key part of Master Data Management), end-to-end data lineage through Hadoop, and high-performance real-time streaming of data into Hadoop. We also launched connectors to NoSQL and analytics databases including DataStax Cassandra, MongoDB and Amazon Redshift. Informatica advanced our capabilities to curate great data for self-service analytics with a connector that outputs Tableau’s data format, and launched our self-service data preparation solution, Informatica Rev.

Customers can now quickly try out Informatica on Hadoop by downloading the free trials of the Big Data Edition and Vibe Data Stream that we launched in 2014. Now that Informatica supports all five of the leading Hadoop distributions, customers can build their data pipelines on Informatica with confidence that, no matter how the underlying Hadoop technologies evolve, their Informatica mappings will run. Informatica provides highly scalable data processing engines that run natively in Hadoop and leverage the best of open source innovations such as YARN, MapReduce and more. Abstracting data pipeline mappings from the underlying Hadoop technologies, combined with visual tools that enable team collaboration, empowers large organizations to put Hadoop into production with confidence.

As we look ahead into 2015, we have ambitious plans to continue to expand and evolve our product capabilities with enhanced productivity to help customers rapidly get more value from their data in Hadoop. Stay tuned for announcements throughout the year.

Try some of Informatica’s products for Hadoop on the Informatica Marketplace here.

Great Data Increases Value and De-Risks the Drone

At long last, the anxiously awaited rules from the FAA have brought some clarity to the world of commercial drone use. Until now, commercial drone use has been prohibited. The new rules, of course, won’t sit well with Amazon, which would like to drop merchandise on your porch at all hours, but they work well for insurers who would like to use drones to service their policyholders. Soon drones, and eventually fleets of unmanned cars, will be operating in any number of capacities. It seems to me an ambulance chaser’s dream come true. I mean, who wouldn’t want a seven- or eight-figure payday from Google for getting rear-ended?

What about “great data”? What does that mean in the context of unmanned vehicles, both aerial and terrestrial? Let’s talk about two aspects, starting with the business benefits of great data when using unmanned drones.

An insurance adjuster or catastrophe responder can use an aerial drone to survey large areas from a central location, pinpointing the locations that need further investigation. This is a common scenario many insurers describe when the topic of aerial drone use comes up. Second is the ability to survey damage in hard-to-reach places such as roofs or difficult terrain (like farmland). This is where great data comes into play. Surveying and servicing with unmanned vehicles demands that your data can answer some of the following questions for your staff operating in this new world:

Where am I?

Quality data, including geocoded locations, is critical. To locate key risk locations, your data must correlate with the latitude/longitude recorded by your unmanned vehicles and the location of your operator. Ensure clean data through robust data quality practices.
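
As a minimal sketch of what that correlation implies (the coordinates, policy IDs and helper function below are made up for illustration, not part of any product), matching a drone’s reported position to the nearest geocoded risk location is only possible if those locations were captured and cleansed in the first place:

    from math import radians, sin, cos, asin, sqrt

    def haversine_miles(lat1, lon1, lat2, lon2):
        # Great-circle distance in miles between two latitude/longitude points.
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 3956 * asin(sqrt(a))  # 3956 = Earth's approximate radius in miles

    # Hypothetical geocoded risk locations pulled from the policy master.
    risk_locations = [
        {"policy_id": "P-1001", "lat": 29.7604, "lon": -95.3698},
        {"policy_id": "P-1002", "lat": 29.8000, "lon": -95.4100},
    ]

    def nearest_risk(drone_lat, drone_lon, locations):
        # Return the mastered location closest to the drone's reported position.
        return min(locations,
                   key=lambda loc: haversine_miles(drone_lat, drone_lon, loc["lat"], loc["lon"]))

    print(nearest_risk(29.7610, -95.3700, risk_locations)["policy_id"])  # -> P-1001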

Where are my policyholders?

Knowing the location of your policyholders relies not only on good data quality, but also on knowing who they are and which risks you are there to service. This requires a total customer relationship solution that gives you a full view not only of locations, but of the risks, coverages and entities that make up each policyholder.

What am I looking at?

Archived, current and in-process imagery is a key area where a big data environment can assist over time. By comparing saved images against new and in-process claims, the drone operator can quickly detect claims fraud as well as additional opportunities for service.

Now that we’ve answered the business value questions and leveraged this new technology to better service policyholders and speed claims, let’s turn to how great data can protect the insurer and drone operator from liability claims. This is important: the FAA has stopped short of requiring commercial drone operators to carry special liability insurance, leaving that to drone operators to arrange with their insurers. And now we’re back to great data. As everyone knows, accidents happen. Technology, especially robotic mobile technology, is not infallible. Something will crash somewhere, hopefully without causing injury or death, but sadly that too will likely happen. And nothing will keep the ambulance chasers at bay better than robust, great data. Any insurer offering liability cover for a drone operator should require that the commercial enterprise be able to answer some of the following questions. Interestingly, this information should already be readily available if the business questions above have been answered.

  • Where was my drone?
  • What was it doing?
  • Was it functioning properly?

Using the same data management technology as in the previous questions will provide valuable data that can serve as evidence in a liability case against a drone operator. Insurers would be wise to ask these questions of liability policyholders who are using unmanned technology, as a way to gauge liability exposure in this brave new world. The key to assessing that risk is robust data management and great data feeding the workers who service the insurer’s policyholders with unmanned technology.

Time will tell all the great and imaginative things that will take place with this new technology. One thing is certain: great data management is required in every aspect, from amazing customer service to risk mitigation in operations. Happy flying, everyone!

Guiding Your Way to Master Data Management Nirvana

Achieving and maintaining a single, semantically consistent version of master data is crucial for every organization. As many companies move from an account- or product-centric approach to a customer-centric model, master data management is becoming an important part of their enterprise data management strategy. MDM provides the clean, consistent and connected information your organization needs to:

  1. Empower customer facing teams to capitalize on cross-sell and up-sell opportunities
  2. Create trusted information to improve employee productivity
  3. Be agile with data management so you can make confident decisions in a fast-changing business landscape
  4. Improve information governance and be compliant with regulations

But there are challenges ahead for organizations. As Andrew White of Gartner very aptly wrote in a blog post, we are only half pregnant with master data management. Andrew described the increasing number of inquiries he gets from organizations that are making some pretty simple mistakes in their approach to MDM without realizing the long-run impact of those decisions.

Over the last 10 years, I have seen many organizations struggle to implement MDM the right way. A few MDM implementations have failed outright, and many have taken more time and incurred more cost than planned before showing value.

So, what is the secret sauce?

A key factor for a successful MDM implementation lies in mapping your business objectives to the features and functionality offered by the product you are selecting. This is the phase where you ask the right questions and get them answered. There are a few great ways for organizations to do this, and talking to analysts is one of them. Another is to attend MDM-focused events that let you talk to experts, learn from other customers’ experiences and hear about best practices.

We at Informatica have been working hard to deliver a flexible MDM platform that provides complete capabilities out of the box. But as we have learned over the years, the MDM journey is about more than technology and product features. To ensure our customers’ success, we are sharing the knowledge and best practices we have gained from hundreds of successful MDM and PIM implementations. Informatica MDM Day is a great opportunity for organizations, where we will:

  • Share best practices and demonstrate our latest features and functionality
  • Show product capabilities that address your current and future master data challenges
  • Provide you the opportunity to learn from other customers’ MDM and PIM journeys
  • Share knowledge about MDM-powered applications that can help you realize early benefits
  • Share our product roadmap and vision
  • Provide you an opportunity to network with other like-minded MDM and PIM experts and practitioners

So, join us by registering today for our MDM Day event in New York on February 24. We are excited to see you all there and to walk with you toward MDM Nirvana.

~Prash
@MDMGeek
www.mdmgeek.com

How to Ace Application Migration & Consolidation (Hint: Data Management)

Will your application consolidation or migration go live on time and on budget? According to Gartner, “through 2019, more than 50% of data migration projects will exceed budget and/or result in some form of business disruption due to flawed execution.” [1] That is a scary number by any measure. A colleague of mine put it well: “I wouldn’t get on a plane that had a 50% chance of failure.” So should you be losing sleep over your migration or consolidation project? Well, that depends. Are you the former CIO of Levi Strauss who, according to Harvard Business Review, was forced to resign after a botched SAP migration project and a $192.5 million earnings write-off? [2] If so, perhaps you would feel a bit apprehensive. Otherwise, I say you can be cautiously optimistic if you go into it with a healthy dose of reality: a good understanding of the potential pitfalls and how to address them, and an appreciation for the myths and realities of application consolidation and migration.

First off, let me get one thing off my chest. If you don’t pay close attention to your data throughout the application consolidation or migration process, you are almost guaranteed delays and budget overruns. Data consolidation and migration is at least 30%-40% of the application go-live effort; we have learned this by helping customers deliver over 1,500 projects of this type. What’s worse, if you are not meticulous about your data, you can be assured of unhappy business stakeholders at the end of this treacherous journey. The users of your new application expect all their business-critical data to be there at the end of the road, and all the bells and whistles in your new application will matter little if the data falls apart. Imagine, if you will, students’ transcripts gone missing, or your frequent-flyer balance 100,000 miles short! Need I say more? You may already be guessing where I am going with this. That’s right: the myths and realities related to your data. Let’s explore a few of these.

Myth #1: All my data is there.

Reality #1: It may be there… but can you get it? To find, access and move all the data out of your legacy systems, you need a good set of connectivity tools that can easily and automatically find, access and extract the data from your source systems. You don’t want to hand-code this for each source. Ouch!

Myth #2: I can just move my data from point A to point B.

Reality #2: You can try that approach if you want, but you might not be happy with the results. The reality is that there can be significant gaps and format mismatches between the data in your legacy system and the data required by your new application. Additionally, you will likely need to assemble data from disparate systems. You need sophisticated tools to profile, assemble and transform your legacy data so that it is purpose-fit for your new application.

Myth #3: All my data is clean.

Reality #3: It’s not. Here is a tip: profile, scrub and cleanse your data before you migrate it. You don’t want to put a shiny new application on top of questionable data. In other words, let’s get a fresh start on the data in your new application!

Myth #4: All my data will move over as expected.

Reality #4: It will not. Any time you move and transform large sets of data, there is room for logical or operational errors and surprises. The best way to avoid them is to automatically validate that your data has moved over as intended.
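
As a minimal sketch of what “automatically validate” can mean in practice (the table contents and helper functions below are made up for illustration), comparing row counts and order-independent checksums between source and target catches both missing rows and silently altered values:

    import hashlib

    def table_fingerprint(rows):
        # Order-independent fingerprint: row count plus a checksum over the sorted rows.
        digest = hashlib.sha256()
        for row in sorted("|".join(map(str, r)) for r in rows):
            digest.update(row.encode("utf-8"))
        return len(rows), digest.hexdigest()

    def validate_migration(source_rows, target_rows):
        # Flag count or content mismatches between the legacy and the new system.
        src_count, src_hash = table_fingerprint(source_rows)
        tgt_count, tgt_hash = table_fingerprint(target_rows)
        if src_count != tgt_count:
            return "FAIL: row counts differ ({0} vs {1})".format(src_count, tgt_count)
        if src_hash != tgt_hash:
            return "FAIL: row contents differ"
        return "PASS: data moved over as intended"

    source = [(1, "ACME", "2015-01-05"), (2, "Globex", "2015-01-07")]
    target = [(2, "Globex", "2015-01-07"), (1, "ACME", "2015-01-05")]
    print(validate_migration(source, target))  # PASS despite a different load order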

Myth #5: It’s a one-time effort.

Reality #5: ‘Load and explode’ is a formula for disaster. Our proven methodology recommends that you first prototype your migration path with a small subset of the data, then test it, tweak your model, try it again and gradually expand. More importantly, your application architecture should not be a one-time effort; it is a work in progress and really an ongoing journey. Regardless of where you are on that journey, we recommend paying close attention to managing your application’s data foundation.

As you can see, a multitude of data issues can plague an application consolidation or migration project and lead to its doom. These potential challenges are not always recognized and understood early on, and this perception gap is a root cause of project failure. That is why we are excited to host Philip Russom of TDWI in our upcoming webinar to discuss data management best practices and methodologies for application consolidation and migration. If you are undertaking any IT modernization or rationalization project, such as consolidating applications or migrating legacy applications to the cloud or to an on-premises application such as SAP, this webinar is a must-see.

So what will your reality be? Will your project run like a dream, or will it escalate into a scary nightmare? Here’s hoping for the former. And also hoping you can join us for this upcoming webinar to learn more:

Webinar with TDWI:
Successful Application Consolidation & Migration: Data Management Best Practices.

Date: Tuesday March 10, 10 am PT / 1 pm ET

Don’t miss out, Register Today!

[1] Gartner, “Best Practices Mitigate Data Migration Risks and Challenges,” December 9, 2014.

[2] Harvard Business Review, “Why Your IT Project May Be Riskier Than You Think.”

Patient Experience: The Quality of Your Data is Important!

Patient experience is key to growth and success for all health delivery organizations. Gartner has stated that the patient experience needs to be one of the highest priorities for organizations, and the quality of your data is critical to achieving that goal. My recent experience with my physician’s office demonstrates how easily data quality can influence the patient experience and undermine a patient’s trust in their physician and the organization with which they are interacting.

I have a great relationship with my doctor and have always been impressed by the efficiency of the office.  I never wait beyond my appointment time, the care is excellent and the staff is friendly and professional.  There is an online tool that allows me to see my records, send messages to my doctor, request an appointment and get test results. The organization enjoys the highest reputation for clinical quality.  Pretty much perfect from my perspective – until now.

I needed to change a scheduled appointment due to a business conflict. Since I expected some negotiation, I decided to make a phone call rather than request it online… there are still transactions for which human-to-human is optimal! I had all my information at hand and made the call. The phone was pleasantly answered and the request given. The receptionist asked for my name and date of birth, but then stated that I did not have a future appointment. I was looking at the online tool, which clearly stated that I was scheduled for February 17 at 8:30 AM. The pleasant young woman confirmed my name, date of birth and address and then told me that I did not have an appointment scheduled. I am reasonably savvy about these things and figured out the core problem: my last name is hyphenated. Armed with that information, my other record was found and a new appointment scheduled. The transaction was completed.

But now I am worried. My name has been like this for many years and none of my other key data has changed. Are there parts of my clinical history missing from the record my doctor is using? Will that have a negative impact on the quality of my care? If I were unable to respond clearly, might that older record be accessed and my current medications and history not be available? The receptionist never said she would attend to merging the records, so I have no reason to believe she will. My confidence is shaken and I am less trustful of the system and how well it will serve me going forward. I resolved my issue, but not everyone would be able to push back to ensure that their records are accurate.
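
Duplicate records created by simple name variations are exactly what a data quality or master data matching step is meant to catch. Below is a minimal sketch (with made-up records and a deliberately naive matching rule, nothing like a production patient-matching algorithm) of how a hyphenated surname can still land in the same candidate bucket as its unhyphenated twin:

    import re
    from collections import defaultdict

    # Hypothetical patient records from two encounters; the hyphenated surname
    # caused a second record to be created instead of matching the first.
    records = [
        {"mrn": "A100", "last": "Smith-Jones", "first": "Mary", "dob": "1970-03-12"},
        {"mrn": "B204", "last": "Smith",       "first": "Mary", "dob": "1970-03-12"},
    ]

    candidates = defaultdict(list)
    for rec in records:
        # Index each record under every surname fragment plus first initial and
        # date of birth, so "Smith-Jones" and "Smith" share a candidate bucket.
        for fragment in re.split(r"[-\s]+", rec["last"].strip().lower()):
            key = (fragment, rec["first"].strip().lower()[0], rec["dob"])
            candidates[key].append(rec["mrn"])

    duplicates = {key: mrns for key, mrns in candidates.items() if len(mrns) > 1}
    print(duplicates)  # {('smith', 'm', '1970-03-12'): ['A100', 'B204']}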

Many millions of dollars are being spent on electronic health records, and many millions more on redesigning workflows to accommodate the new EHRs. Physicians and other clinicians are learning new ways to access data and treat their patients. The foundation for all of this is accurate data. Nicely displayed but inaccurate data will not result in improved care or an enhanced patient experience. As healthcare organizations move forward with the razzle-dazzle of new systems, they need to remember the basics of good quality data and ensure that it is available to these new applications.

Big Data Is Neither – Part I

I’ve been having some interesting conversations with colleagues recently about the Big Data hubbub, and I’ve come to the conclusion that “Big Data” as hyped is neither, really. In fact, both terms are relative. “Big” 20 years ago to many may have meant 1 terabyte. “Data” 20 years ago may have meant flat files or Sybase, Oracle, Informix, SQL Server or DB2 tables. Fast forward to today: “Big” is now exabytes (or millions of terabytes), and “Data” has expanded to include events, sensors, messages, RFID, telemetry, GPS, accelerometers, magnetometers, IoT/M2M and other new and evolving data classifications.

And then there’s social and search data.

Surely you would classify Google’s data as really, really big data – when I do a search and get 487,464,685 answers within a fraction of a second, they appear to have gotten a handle on their big data speeds and feeds. However, it’s also telling that nearly all of those bazillion results are not actually relevant to what I am searching for.

My conclusion is that if you have the right algorithms, invest in and use the right hardware and software technology, and make sure to measure the pertinent data sources, harnessing big data can yield speedy and “big” results.

So what’s the rub then?

It usually boils down to having larger and more sophisticated data stores and still not understanding their structure, OR not being able to integrate the data into cohesive formats, OR important hidden meaning in the data that we don’t have the wherewithal to derive, see or understand à la Google. So how DO you find the timely and important information in your company’s big data (a.k.a. the needle in the haystack)?

More to the point, how do you better ingest, integrate, parse, analyze, prepare and cleanse your data to get not just the speed, but also the relevancy, in a Big Data world?

Hadoop-related tools are among the current technologies of choice for solving Big Data problems, and as an Informatica customer you can leverage these tools regardless of whether you have Big Data or Not So Big Data, fast data or slow data. In fact, it astounds me that many IT professionals would go back to hand coding on Hadoop simply because they don’t know that the tools to avoid it are right under their nose, installed and running in their familiar Informatica user interface (and working with Hadoop right out of the box).

So what does your company get out of using Informatica in conjunction with Hadoop tools? Namely, better customer service and responsiveness, better operational efficiency, more effective supply chains, better governance, service assurance, and the ability to discover previously unknown opportunities as well as to stop problems as they emerge – not after the fact. In other words, Big Data done right can be a great advantage to many of today’s organizations.

Much more to say on this subject as I delve into the future of Big Data. For more, see Part 2.

The Quality of the Ingredients Makes the Dish – and the Same Applies to Data Quality

In a previous life, I was a pastry chef in a now-defunct restaurant. One of the things I noticed while working there (and frankly while cooking at home) is that the better the ingredients, the better the final result. If we used poor quality apples in the apple tart, we ended up with a soupy, flavorless mess with a chewy crust.

The same analogy applies to data analytics: with poor quality data, you get poor results from your analytics projects. We all know that the companies that implement fantastic analytics solutions providing near real-time access to consumer trends are the same companies that run successful, up-to-the-minute targeted marketing campaigns. The Data Warehousing Institute estimates that data quality problems cost U.S. businesses more than $600 billion a year.

The business impact of poor data quality cannot be overstated. If not identified and corrected early on, defective data can contaminate all downstream systems and information assets, jacking up costs, jeopardizing customer relationships, and causing imprecise forecasts and poor decisions.

  • To help you quantify: let’s say your company receives 2 million claims per month, with 377 data elements per claim. Even at an error rate of 0.001, the claims data contains more than 754,000 errors per month and more than 9.04 million errors per year. If 10 percent of those data elements are critical to your business decisions and processes, you still must fix almost 1 million errors each year (the arithmetic is worked out in the short sketch after these bullets).
  • What is your exposure to these errors? Let’s estimate the risk at $10 per error (including the staff time required to fix the error downstream after a customer discovers it, the loss of customer trust and loyalty, and erroneous payouts). Your company’s risk exposure to poor quality claims data is roughly $10 million a year.
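
The arithmetic in those bullets works out as follows (a trivial worked calculation using the figures quoted above; the $10 million exposure rounds up from the roughly $9 million this yields):

    # Back-of-the-envelope data quality exposure, using the figures from the bullets above.
    claims_per_month = 2000000
    elements_per_claim = 377
    error_rate = 0.001
    critical_share = 0.10
    cost_per_error = 10  # dollars: downstream rework, lost trust and loyalty, erroneous payouts

    errors_per_month = claims_per_month * elements_per_claim * error_rate  # 754,000
    errors_per_year = errors_per_month * 12                                # 9,048,000
    critical_errors_per_year = errors_per_year * critical_share            # 904,800 (almost 1 million)
    annual_exposure = critical_errors_per_year * cost_per_error            # ~$9 million, roughly $10M

    print("{:,.0f} errors per month".format(errors_per_month))
    print("{:,.0f} errors per year".format(errors_per_year))
    print("{:,.0f} critical errors per year".format(critical_errors_per_year))
    print("${:,.0f} annual risk exposure".format(annual_exposure))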

Once your company values quality data as a critical resource, it is much easier to perform high-value analytics that have an impact on your bottom line. Start with the creation of a data quality program. Data is a critical asset in the information economy, and the quality of a company’s data is a good predictor of its future success.

“It’s not you, it’s me!” – says Data Quality to Big Data

I couldn’t help but start this blog with George Costanza’s line: “You’re giving me the ‘it’s not you, it’s me’ routine? I invented ‘it’s not you, it’s me’…”

The thing that resonates today, in the odd context of big data, is that we may all need to look in the mirror, hold a thumb drive full of information in our hands, and concede once and for all: it’s not the data… it’s us.

Many organizations have a hard time making something useful from the ever-expanding universe of big-data, but the problem doesn’t lie with the data: It’s a people problem.

The contention is that big-data is falling short of the hype because people are:

  1. too unwilling to create cultures that value standardized, efficient, and repeatable information, and
  2. too complex to be reduced to “thin data” created from digital traces.

Evan Stubbs describes poor data quality as the data analyst’s single greatest problem:

“About the only satisfying thing about having bad data is the schadenfreude that goes along with it. There’s cold solace in knowing that regardless of how poor your data is, everyone else’s is equally as bad. The thing is, poor quality data doesn’t just appear from the ether. It’s created. Leave the dirty dishes for long enough and you’ll end up with cockroaches and cholera. Ignore data quality and eventually you’ll have black holes of untrustworthy information. Here’s the hard truth: we’re the reason bad data exists.”

I will tell you that most data teams make “large efforts” to scrub their data. Those “infrequent” big cleanups, however, only treat the symptom, not the cause – and ultimately lead to inefficiency, cost and even more frustration.

It’s intuitive and natural to think that data quality is a technological problem. It’s not; it’s a cultural problem. The real answer is that you need to create a culture that values standardized, efficient, and repeatable information.

If you do that, then you’ll be able to create data that is reusable, efficient and high quality. Rather than trying to manage a shantytown of half-baked source tables, effective teams put the effort into designing, maintaining and documenting their data. Instead of being a one-off activity, it becomes part of business as usual, something that’s simply part of daily life.

However, even if that data is the best it can possibly be, is it even capable of delivering on the big-data promise of greater insights about things like the habits, needs, and desires of customers?

Despite the enormous growth of data and the success of a few companies like Amazon and Netflix, “the reality is that deeper insights for most organizations remain elusive,” write Mikkel Rasmussen and Christian Madsbjerg in a Bloomberg Businessweek blog post that argues “big-data gets people wrong.”


“Big-data delivers thin data. In the social sciences, we distinguish between two types of human behavior data. The first – thin data – is from digital traces: He wears a size 8, has blue eyes, and drinks pinot noir. The second – rich data – delivers an understanding of how people actually experience the world: He could smell the grass after the rain, he looked at her in that special way, and the new running shoes made him look faster. Big-data focuses solely on correlation, paying no attention to causality. What good is thin ‘information’ when there is no insight into what your consumers actually think and feel?”


Accenture reported that only 20 percent of the companies it profiled had found a proven causal link between “what they measure and the outcomes they are intending to drive.”

Now, I contend that the keys to transforming big data into strategic value are critical thinking skills.

Where do we get such skills? People, it seems, are both the problem and the solution. Are we failing on two fronts: failing to create the right data-driven cultures, and failing to interpret the data we collect?

Twitter @bigdatabeat
