Tag Archives: Data Governance

Building an Impactful Data Governance Program – One Step at a Time

Let’s face it: building a data governance program is no overnight task. As one CDO puts it, "data governance is a marathon, not a sprint." Why? Because data governance is a complex business function that encompasses technology, people, and process, all of which have to work together effectively to ensure the success of the initiative. Because of its scope, a data governance program often calls for participants from different business units within an organization, and it can be disruptive at first.

Why bother, then, given that data governance is complex, disruptive, and could introduce additional cost to a company? The drivers for data governance vary across organizations, so let's take a close look at some of the motivations behind a data governance program.

For companies in heavily regulated industries, establishing a formal data governance program is a mandate. When a company is not compliant, the consequences can be severe: hefty fines, brand damage, loss of revenue, and even potential jail time for the person held accountable for the noncompliance. To meet ongoing regulatory requirements and adhere to data security policies and standards, companies need clean, connected, and trusted data that enables transparency and auditability in their reporting, so they can satisfy mandatory requirements and answer critical questions from auditors. Without a dedicated data governance program in place, the compliance initiative can become an ongoing nightmare for companies in regulated industries.

A data governance program can also be established to support a customer-centricity initiative. To make effective cross-sells and up-sells to your customers and grow your business, you need clear visibility into customer purchasing behaviors across multiple shopping channels and touch points. Because customers' shopping behaviors and attributes are captured in the data, a holistic data governance program is essential to gaining a thorough understanding of your customers and boosting your sales.

Other reasons for companies to start a data governance program include improving efficiency, reducing operational cost, supporting better analytics, and driving more innovation. As long as the area is business-critical, data is at the core of the process, and the business case is sound, there is a compelling reason to launch a data governance program.

Now that we have identified the drivers for data governance, how do we start? This rather loaded question gets into the details of implementation. A few critical elements come into consideration: identifying and establishing task forces such as the steering committee, the data governance team, and business sponsors; defining roles and responsibilities for the stakeholders involved in the program; and defining metrics for tracking results. And soon you will find that, on top of everything, communication, communication, and more communication is probably the most important tactic of all for driving the initial success of the program.

A rule of thumb? Start small, take one step at a time, and focus on producing something tangible.

Sounds easy, right? Well, let's hear what real-world practitioners have to say. Join us at this Informatica webinar to hear Michael Wodzinski, Director of Information Architecture; Lisa Bemis, Director of Master Data; and Fabian Torres, Director of Project Management from Houghton Mifflin Harcourt, a global leader in publishing, as well as David Lyle, VP of Product Strategy at Informatica, discuss how to implement a successful data governance practice that brings business impact to an enterprise organization.

If you are currently kicking the tires on setting up a data governance practice in your organization, I'd like to invite you to visit a member-only website dedicated to data governance: http://governyourdata.com/. The site currently has over 1,000 members and is designed to foster open communication on everything data governance. There you will find conversations on best practices, methodologies, frameworks, tools, and metrics. I would also encourage you to take the data governance maturity assessment to see where you currently stand on the data governance maturity curve, and compare the result against industry benchmarks. More than 200 members have taken the assessment to gain a better understanding of their current data governance programs, so why not give it a shot?

Governyourdata.com

Data governance is a journey, and likely a never-ending one. We wish you the best of luck on this effort and a joyful ride! We'd love to hear your stories.

Posted in Big Data, Data Governance, Data Integration, Data Quality, Enterprise Data Management, Master Data Management

Informatica Doubled Big Data Business in 2014 As Hadoop Crossed the Chasm


2014 was a pivotal turning point for Informatica, as our investments in Hadoop and efforts to innovate in big data gathered momentum and became a core part of Informatica's business. Our Hadoop-related big data revenue growth was in the ballpark of leading Hadoop startups – more than doubling over 2013.

In 2014, Informatica reached about 100 enterprise customers of our big data products, with an increasing number going into production with Informatica together with Hadoop and other big data technologies. Informatica's big data Hadoop customers include companies in financial services, insurance, telecommunications, technology, energy, life sciences, healthcare, and business services. These innovative companies are leveraging Informatica to accelerate their time to production and drive greater value from their big data investments.

These customers are in production or implementing a wide range of use cases, leveraging Informatica's data pipeline capabilities to put the scale, efficiency, and flexibility of Hadoop to work. Many Hadoop customers start by optimizing their data warehouse environments, moving data storage, profiling, integration, and cleansing to Hadoop to free up capacity in their traditional analytics data warehousing systems. Customers further along in their big data journeys have expanded to use Informatica on Hadoop for exploratory analytics of new data types, 360-degree customer analytics, fraud detection, predictive maintenance, and analysis of massive amounts of Internet of Things machine data to optimize energy exploration, manufacturing processes, network data, security, and other large-scale systems initiatives.

2014 was not just a year of market momentum for Informatica, but also one of product innovation. We shipped enhanced functionality for entity matching and relationship building at Hadoop scale (a key part of Master Data Management), end-to-end data lineage through Hadoop, and high-performance real-time streaming of data into Hadoop. We also launched connectors to NoSQL and analytics databases including DataStax Cassandra, MongoDB, and Amazon Redshift. Informatica advanced our capabilities to curate data for self-service analytics with a connector that outputs Tableau's data format, and launched our self-service data preparation solution, Informatica Rev.

Customers can now quickly try out Informatica on Hadoop by downloading the free trials for the Big Data Edition and Vibe Data Stream that we launched in 2014. Now that Informatica supports all five of the leading Hadoop distributions, customers can build their data pipelines on Informatica with confidence that their Informatica mappings will run no matter how the underlying Hadoop technologies evolve. Informatica provides highly scalable data processing engines that run natively in Hadoop and leverage the best of open source innovations such as YARN and MapReduce. Abstracting data pipeline mappings from the underlying Hadoop technologies, combined with visual tools that enable team collaboration, empowers large organizations to put Hadoop into production with confidence.
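The general idea of abstracting a mapping from the engine that executes it can be illustrated with a tiny strategy-pattern sketch. This is purely conceptual and hypothetical; it does not reflect Informatica's actual internals, only the design principle of defining a pipeline once and letting interchangeable engines run it:

```python
from abc import ABC, abstractmethod

class Engine(ABC):
    """An execution backend; the mapping itself stays engine-agnostic."""
    @abstractmethod
    def run(self, mapping: list) -> list: ...

class LocalEngine(Engine):
    """A trivial in-process engine; a cluster engine could run the same mapping."""
    def run(self, mapping):
        data = mapping[0]  # first element supplies the source data
        for step in mapping[1:]:
            data = [step(row) for row in data]
        return data

# A "mapping": a data source followed by transformation steps,
# defined once, independent of the engine that runs it.
mapping = [
    [" alice ", "BOB", "carol"],
    str.strip,
    str.title,
]

print(LocalEngine().run(mapping))  # ['Alice', 'Bob', 'Carol']
```

Swapping in a different `Engine` subclass would run the same mapping unchanged, which is the point of the abstraction: the pipeline definition survives changes in the execution technology underneath.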

As we look ahead into 2015, we have ambitious plans to continue to expand and evolve our product capabilities with enhanced productivity to help customers rapidly get more value from their data in Hadoop. Stay tuned for announcements throughout the year.

Try some of Informatica’s products for Hadoop on the Informatica Marketplace here.

Posted in B2B Data Exchange, Big Data, Data Integration, Data Services, Hadoop

Guiding Your Way to Master Data Management Nirvana

Achieving and maintaining a single, semantically consistent version of master data is crucial for every organization. As many companies move from an account- or product-centric approach to a customer-centric model, master data management (MDM) is becoming an important part of their enterprise data management strategy. MDM provides the clean, consistent, and connected information your organization needs to:

  1. Empower customer-facing teams to capitalize on cross-sell and up-sell opportunities
  2. Create trusted information to improve employee productivity
  3. Be agile with data management so you can make confident decisions in a fast-changing business landscape
  4. Improve information governance and stay compliant with regulations

But there are challenges ahead. As Andrew White of Gartner aptly wrote in a blog post, we are only "half pregnant" with Master Data Management. In his post, Andrew talked about the increasing number of inquiries he gets from organizations that are making some pretty simple mistakes in their approach to MDM without realizing the long-run impact of those decisions.

Over the last 10 years, I have seen many organizations struggle to implement MDM the right way. A few MDM implementations have failed, and many have taken more time and incurred more cost than expected before showing value.

So, what is the secret sauce?

A key factor for a successful MDM implementation lies in mapping your business objectives to the features and functionality offered by the product you are selecting. It is a phase where you ask the right questions and get them answered. There are a few great ways organizations can get this done, and talking to analysts is one of them. Another option is to attend MDM-focused events that allow you to talk to experts, learn from other customers' experiences, and hear about best practices.

We at Informatica have been working hard to deliver a flexible MDM platform that provides complete capabilities out of the box. But as we have learned over the years, the MDM journey is about more than technology and product features. To ensure our customers' success, we are sharing the knowledge and best practices we have gained from hundreds of successful MDM and PIM implementations. Informatica MDM Day is a great opportunity for organizations, where we will:

  • Share best practices and demonstrate our latest features and functionality
  • Show product capabilities that address your current and future master data challenges
  • Provide you the opportunity to learn from other customers' MDM and PIM journeys
  • Share knowledge about MDM-powered applications that can help you realize early benefits
  • Share our product roadmap and our vision
  • Provide you an opportunity to network with like-minded MDM and PIM experts and practitioners

So, join us by registering today for our MDM Day event in New York on February 24. We are excited to see you all there and to walk with you toward MDM nirvana.

~Prash
@MDMGeek
www.mdmgeek.com

Posted in Big Data, Customers, DaaS, Data Governance, Master Data Management, PiM, Product Information Management

Data Governance, Transparency and Lineage with Informatica and Hortonworks

Informatica users leveraging HDP are now able to see a complete end-to-end visual data lineage map of everything done through the Informatica platform. In this blog post, Scott Hedrick, director of Big Data Partnerships at Informatica, tells us more about end-to-end visual data lineage.

Hadoop adoption continues to accelerate within mainstream enterprise IT and, as always, organizations need the ability to govern their end-to-end data pipelines for compliance and visibility purposes. Working with Hortonworks, Informatica has extended the metadata management capabilities in Informatica Big Data Governance Edition to include data lineage visibility of data movement, transformation and cleansing beyond traditional systems to cover Apache Hadoop.

Informatica users are now able to see a complete end-to-end visual data lineage map of everything done through Informatica, which includes sources outside Hortonworks Data Platform (HDP) being loaded into HDP, all data integration, parsing and data quality transformation running on Hortonworks and then loading of curated data sets onto data warehouses, analytics tools and operational systems outside Hadoop.

Regulated industries such as banking, insurance and healthcare are required to have detailed histories of data management for audit purposes. Without tools to provide data lineage, compliance with regulations and gathering the required information for audits can prove challenging.

With Informatica, data scientists and analysts can now visualize data lineage and a detailed history of data transformations, giving them unprecedented transparency into their data analysis. They can be more confident in their findings thanks to this visibility into the origins and quality of the data they use to create valuable insights for their organizations. Web-based access to visual data lineage also facilitates team collaboration on challenging and evolving data analytics and operational system projects.

The Informatica and Hortonworks partnership brings together leading enterprise data governance tools with open source Hadoop leadership to extend governance to this new platform. Deploying Informatica for data integration, parsing, data quality and data lineage on Hortonworks reduces risk to deployment schedules.

A demo of Informatica's end-to-end metadata management capabilities on Hadoop and beyond is available here.

Learn More

  • A free trial of Informatica Big Data Edition in the Hortonworks Sandbox is available here.
Posted in B2B, Data Governance, Data Security, Data Services

A Date with Data

As Valentine's Day approaches and retailers and restaurants prepare to sell millions of cards, teddy bears, bottles of champagne and, for the lucky few, some expensive jewels, I started to think about my love affair with data and the many ups and downs we have had over the years!

Our first date was arranged by a third party, and everything I was told was from their perspective. I had many questions: could I trust data? Was I getting the complete picture from the third party? Would we be compatible and ultimately "fit for purpose," or would data break my heart?

As we shared information, we were both apprehensive; not everything was fitting together, and there were gaps in data's story. I just could not make an informed decision, and this led to mistrust between the two of us. I started to ask friends and associates for their information and tried to reconcile it with my view of data. I wanted it to work, but what could I do?

A close friend, Stewart, recommended I get some professional advice to help with my issues with data and pointed me towards Doctor Rob, one of the leading authorities on data, specialising in data governance.

The first bit of advice Doctor Rob gave me was: it should never have been just about data. The dream must be about your long-term goals together, your commitment to get it right, and your interactions with others in your circle of friends and dependents.

The second piece of advice was to decide what roles and responsibilities each of us would take on in the relationship, and to evaluate whether we have the right skills or need external support or training to succeed.

While we are still on our journey together, data and I are now in a long-term committed relationship and look forward to many years on Cloud 9.

Now all I have to decide is will I go to Tiffany’s or Claire’s for that piece of jewellery!


Posted in Data Governance, Retail

Healthcare Consumer Engagement

The signs that healthcare is becoming a more consumer-driven industry (think patients, members, providers) are evident all around us. I see provider and payer organizations clamoring for more data, specifically data that is actionable, relatable, and has integrity. Armed with this data, healthcare organizations can differentiate around a member/patient-centric view.

These consumer-centric views convey the total value of the relationships healthcare organizations have with consumers. Understanding that total value creates a more comprehensive picture of consumers, because it covers an individual's critical relationships: patient to primary care provider, member to household, provider to network, and even member to legacy plan. This is the type of knowledge that informs new treatments, targets preventative care programs, and improves outcomes.

Payer organizations are collecting and analyzing data to identify opportunities for more informed care management and for segmentation to reach new, high-value customers in individual markets. By segmenting and targeting messaging to specific populations, health plans generate increased member satisfaction and cost-effectively expand and manage provider networks.

How will they accomplish this? By enabling members to interact in health and wellness forums, analyzing member behavior and trends, and informing care management programs with a 360-degree view of members, to name a few. Payers will also drive new member programs, member retention, and member engagement marketing and sales programs by investigating complete views of member households and market segments.

In the provider space, this relationship building can be a little more challenging, because consumers as patients often do not interact with their doctor unless they are sick, creating gaps in the data. When provider organizations have a better understanding of their patients and providers, they can increase patient satisfaction and proactively offer preventative care to the sickest (and most likely to engage) patients before an episode occurs. These activities result in increased market share and improved outcomes.

Where can providers start? By creating a 360 view of the patient, organizations can now improve care coordination, open new patient service centers and develop patient engagement programs.

Analyzing patient populations, fostering patient engagement based on Meaningful Use or Accountable Care requirements, building out referral networks, and developing physician relationships are essential ingredients in consumer engagement. Knowing your patients and providing a better patient experience than your competition will differentiate provider organizations.

You may say, "This all sounds great, but how does it work?" An essential ingredient is clean, safe, and connected data. That requires an investment in data as an asset: just as you invest in real estate and human capital, you must invest in the accessibility and quality of your data. To be successful, arm your team with tools to govern data, ensuring the ongoing integrity and quality of data, removing duplicate records, and dynamically incorporating data validation and quality rules. These tools, which include master data management, data quality, and metadata management, are focused on information quality, and they support a total customer relationship view of members, patients, and providers.

Posted in Customer Acquisition & Retention, Data Governance, Data Quality, Healthcare, Master Data Management

The Magnificent Seven Facts on B2C eCommerce in North America

The latest North American B2C e-commerce market report is out now. For my followers, I took the liberty of summarizing some "Magnificent Seven Facts on B2C eCommerce in North America" in a short blog. The report covers the United States, Canada, and Mexico, as well as comparisons to Europe and Asia. According to the report, the North American B2C e-commerce market is expected to reach $494.0 billion in 2014.

The Magnificent Seven Facts

  1. There are 122.5 million households in North America
  2. There are 336 million internet users in North America
  3. North America made up 29.2% of total global online sales ($1,552.0bn) in 2013
  4. In terms of global B2C e-commerce, North America ranked third in 2013, behind Asia-Pacific and Europe
  5. North American consumers spent on average $2,116 online in 2013, significantly above the global average of €1,280
  6. With an average spending per e-shopper of $2,216, American consumers spent the most online in 2013; Canadians ranked second with an average spending of $1,577, while Mexican e-shoppers on average spent $1,133 online in 2013
  7. Canadians are more likely to shop on mobile

Mobile Commerce: Canada Leads the Pack

Within North America, mobile commerce is most popular in Canada, where more than half of online purchases per week are made through a mobile device. In the US, 38.2% of mobile purchases are still made in the safe surroundings of the home.

What are the barriers preventing mobile purchasing?


Free downloads available now

Would you like to find out more about global e-commerce? The free light versions of our Regional/Continental Reports can be downloaded here.


Posted in Data Governance, PiM, Product Information Management, Retail

Garbage In, Garbage Out? Don’t Take Data for Granted in Analytics Initiatives!

The verdict is in. Data is now broadly perceived as a source of competitive advantage. We all feel the heat to deliver good data. It is no wonder organizations view Analytics initiatives as highly strategic. But the big question is, can you really trust your data? Or are you just creating pretty visualizations on top of bad data?

We also know there is a shift toward self-service Analytics. But did you know that, according to Gartner, "through 2016, less than 10% of self-service BI initiatives will be governed sufficiently to prevent inconsistencies that adversely affect the business"?1 This means you may show up at your next big meeting with data that contradicts your colleague's. Perhaps you are not working off the same version of the truth. Maybe you have siloed data on different systems that are not working in concert. Or is your definition of 'revenue' or 'leads' different from your colleague's?

So are we taking our data for granted? Are we just assuming that it’s all available, clean, complete, integrated and consistent?  As we work with organizations to support their Analytics journey, we often find that the harsh realities of data are quite different from perceptions. Let’s further investigate this perception gap.

For one, people may assume they can easily access all data. In reality, if data connectivity is not managed effectively, we often need to beg, borrow, and steal to get the right data from the right person. If we are lucky. In less fortunate scenarios, we may need to settle for partial data or a cheap substitute for the data we really wanted. And you know what they say: the only thing worse than no data is bad data. Right?

Another common misperception is, "Our data is clean. We have no data quality issues." Wrong again. When we work with organizations to profile their data, they are often quite surprised to learn that their data is full of errors and gaps. One company recently discovered, within one minute of starting their data profiling exercise, that millions of their customer records contained the company's own address instead of the customers' addresses. Oops.
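As a toy illustration of how even a one-minute profiling pass can surface that kind of problem, here is a minimal sketch. The records, addresses, and the 50% threshold are all hypothetical; the point is simply that frequency-counting a column flags suspiciously over-represented values:

```python
from collections import Counter

# Hypothetical customer records; names and addresses are invented for the example.
records = [
    {"name": "Acme Corp",  "address": "100 Main St, Springfield"},
    {"name": "Beta LLC",   "address": "1 Vendor Way, Anytown"},  # the vendor's own address
    {"name": "Gamma Inc",  "address": "1 Vendor Way, Anytown"},
    {"name": "Delta Co",   "address": "1 Vendor Way, Anytown"},
]

# A basic profiling step: frequency-count the address column.
address_counts = Counter(r["address"] for r in records)
most_common_address, count = address_counts.most_common(1)[0]
share = count / len(records)

print(f"{most_common_address!r} appears in {share:.0%} of records")
if share > 0.5:
    print("Warning: one address dominates the column -- possible default/placeholder value")
```

Real profiling tools do far more (patterns, nulls, ranges, cross-column rules), but even this crude check would have caught millions of records sharing one address.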

Another myth is that all data is integrated.  In reality, your data may reside in multiple locations: in the cloud, on premise, in Hadoop and on mainframe and anything in between. Integrating data from all these disparate and heterogeneous data sources is not a trivial task, unless you have the right tools.

And here is one more consideration to mull over. Do you find yourself manually hunting down and combining data to reproduce the same ad hoc report over and over again? Perhaps you often find yourself doing this in the wee hours of the night? Why reinvent the wheel? It would be more productive to automate the process of data ingestion and integration for reusable and shareable reports and Analytics.

Simply put, you need great data for great Analytics. We are excited to host Philip Russom of TDWI in a webinar to discuss how data management best practices can enable successful Analytics initiatives. 

And how about you?  Can you trust your data?  Please join us for this webinar to learn more about building a trust-relationship with your data!

  1. Gartner Report, ‘Predicts 2015: Power Shift in Business Intelligence and Analytics Will Fuel Disruption’; Authors: Josh Parenteau, Neil Chandler, Rita L. Sallam, Douglas Laney, Alan D. Duncan; Nov 21 2014
Posted in Architects, Business/IT Collaboration, Data Governance, Data Integration, Data Warehousing

Is Your Data Ready to Maximize Value from Your CRM Investments?


A friend of mine recently reached out for advice on CRM solutions in the market. Though I have never worked for a CRM vendor, my experience spans both working for companies that implemented such solutions and, in my current role, interacting with large and small organizations across industries about the data requirements that support their ongoing application investments. As we spoke, memories surfaced from when he and I worked on implementing Salesforce.com (SFDC) many years ago: memories we wanted to forget, but important to call out given his new situation.

We worked together at a large mortgage lending software vendor selling loan origination solutions to brokers and small lenders, mainly through email and snail mail marketing. He was responsible for Marketing Operations, and I ran Product Marketing. The company looked at Salesforce.com to help streamline our sales operations and improve how we marketed to and serviced our customers. The existing CRM system dated from the early 90s, and though it did what the company needed, it was heavily customized, costly to operate, and had served its useful life. It was time to upgrade to help grow the business, improve business productivity, and enhance customer relationships.

After 90 days of rolling out SFDC, we ran into some old familiar problems across the business. Sales reps still struggled to know who was a current customer using our software, marketing managers could not create quality mailing lists for prospecting, and call center reps could not tell whether the person on the other end was a customer or a prospect. Everyone wondered why this was happening, given that we had adopted the best CRM solution in the market. You can imagine the heartburn and ulcers we all had after making such a huge investment in our new CRM solution. C-level executives questioned our decisions and blamed the applications. The truth was, the issues were not related to SFDC. They were caused by the data we had migrated into the system and by the lack of proper governance and a capable information architecture to support the required data integration between systems.

During the implementation phase, IT imported our entire customer database of 200K+ unique customer entities from the old system into SFDC. Unfortunately, the mortgage industry was very transient: on average there were only roughly 55K licensed mortgage brokers and lenders in the market, and because no one had ever validated who was really a customer versus someone who had once bought our product, we had serious data quality issues, including:

  • Trial users whose evaluation copies of our products had expired were tagged as current customers
  • Duplicate records, caused by manual data entry errors where the same company at the same business address was entered slightly differently, were tagged as unique customers
  • Subsidiaries of parent companies in different parts of the country were each tagged as unique customers
  • Lastly, we imported the marketing contact database of prospects, who were incorrectly counted as customers in the new system
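Duplicate records of the kind described above are often caught with a simple normalization pass over names and addresses. The sketch below is purely illustrative; the records and the normalization rules are invented for the example and are far cruder than what real matching engines do:

```python
import re

# Hypothetical customer records with a near-duplicate pair.
customers = [
    {"id": 1, "company": "Acme Mortgage, Inc.", "address": "12 Elm St."},
    {"id": 2, "company": "ACME MORTGAGE INC",   "address": "12 Elm Street"},
    {"id": 3, "company": "Bay Lending LLC",     "address": "9 Oak Ave"},
]

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, drop common suffixes, unify abbreviations."""
    t = re.sub(r"[^\w\s]", "", text.lower())
    t = re.sub(r"\b(inc|llc|corp)\b", "", t)
    t = re.sub(r"\bstreet\b", "st", t)
    return " ".join(t.split())

# Group records by their normalized (company, address) key.
groups = {}
for c in customers:
    key = (normalize(c["company"]), normalize(c["address"]))
    groups.setdefault(key, []).append(c["id"])

duplicates = [ids for ids in groups.values() if len(ids) > 1]
print(duplicates)  # ids 1 and 2 collapse to the same normalized key
```

Production-grade matching adds fuzzy comparison, survivorship rules, and hierarchy handling (the subsidiary case above), but the core idea is the same: records that look different as raw strings can be identical after normalization.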

We also failed to integrate real-time purchasing data and information from our procurement systems for sales and support to handle customer requests. Instead of integrating that data in real time with proper technology, IT manually loaded these records at the end of each week via FTP, resulting in incorrect billing information and statement processing, and a ton of complaints from customers through our call center. The price we paid for not paying attention to our data quality and integration requirements before we rolled out Salesforce.com was significant for a company of our size. For example:

  • Marketing got hit pretty hard. Each quarter we mailed evaluation copies of new products to our customer database of 200K, each costing the company $12 to produce and mail. Total cost = $2.4M annually. Because we had such bad data, 60% of our mailings were returned due to invalid addresses or wrong contact information. The cost of bad data to marketing = $1.44M annually.
  • Next, Sales struggled miserably when trying to upgrade customers through cold call campaigns using the names in the database. As a result, sales productivity dropped by 40%, and we experienced over 35% sales turnover that year. Within a year of using SFDC, our head of sales was let go. Not good!
  • Customer support used SFDC to service customers, and our average call times were 40 minutes per service ticket. We believed that was "business as usual" until we surveyed how reps were spending their time each day, and over 50% said it was dealing with billing issues caused by bad contact information in the CRM system.
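The marketing figures above can be sanity-checked with a quick back-of-the-envelope calculation using the numbers from the post:

```python
# Figures quoted in the post.
customers_mailed = 200_000   # customer database size
cost_per_piece = 12          # dollars to produce and mail each evaluation copy
return_rate = 0.60           # share of mailings returned due to bad data

total_mailing_cost = customers_mailed * cost_per_piece
wasted_cost = int(total_mailing_cost * return_rate)

print(f"Total mailing cost: ${total_mailing_cost:,}")  # the $2.4M in the post
print(f"Cost of bad data:   ${wasted_cost:,}")         # the $1.44M in the post
```

In other words, well over half of the campaign budget was spent mailing to addresses the database had wrong, which is the concrete, dollar-denominated cost of skipping data quality work.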

At the end of our conversation, this was my advice to my friend:

  • Conduct a data quality audit of the systems that would interact with the CRM system. Audit how complete your critical master and reference data is, including names, addresses, customer IDs, etc.
  • Do this before you invest in a new CRM system. You may find that many of the challenges with your existing applications are caused by data gaps rather than by the legacy application.
  • If there is a data governance program, involve it in the CRM initiative so the team understands your requirements and can see how to help.
  • If you do decide to modernize, collaborate with and involve your IT teams, especially your Application Development teams and your Enterprise Architects, to ensure all the best options are considered for your data sharing and migration needs.
  • Lastly, consult your technology partners, including your new CRM vendor; they may be working with solution providers to address these data issues, as you are probably not the only one in this situation.

Looking Ahead!

CRM systems have come a long way in today's big data and cloud era. Many firms are adopting more flexible solutions offered through the cloud, like Salesforce.com, Microsoft Dynamics, and others. Regardless of how old or new, on premise or in the cloud, companies invest in CRM not just to serve their sales teams or increase marketing conversion rates, but to improve their business relationships with their customers. Period! It's about ensuring the data in these systems is trustworthy, complete, up to date, and actionable, so you can improve customer service and drive sales of new products and services to increase wallet share. So how do you maximize the business potential of these critical business applications?

Whether you are adopting your first CRM solution or upgrading an existing one, keep in mind that Customer Relationship Management is a business strategy, not just a software purchase. It’s also about having a sound and capable data management and governance strategy supported by people, processes, and technology to ensure you can:

  • Access and migrate data from old systems to new, avoiding development cost overruns and project delays.
  • Identify, detect, and distribute transactional and reference data from existing systems into your front-line business applications in real time.
  • Manage data quality errors, including duplicate records and invalid names and contact information, through proper data governance and proactive data quality monitoring and measurement during and after deployment.
  • Govern and share authoritative master records of customer, contact, product, and other master data between systems in a trusted manner.

Will your data be ready for your new CRM investments?  To learn more:

Follow me on Twitter @DataisGR8

Posted in Architects, Cloud, Cloud Application Integration, Cloud Computing, Cloud Data Integration, Cloud Data Management, CMO, Customer Acquisition & Retention, SaaS

There are Three Kinds of Lies: Lies, Damned lies, and Data


The phrase Benjamin Disraeli used in the 19th century was: There are three kinds of lies: lies, damned lies, and statistics.

Not so long ago, Google created a website to figure out just how many people had influenza. It did this by tracking "flu-related search queries" and the "location of the query," and applying an estimation algorithm. According to the website, at the flu season's peak in January, nearly 11 percent of the United States population may have had influenza, meaning nearly 44 million of us would have had the flu or flu-like symptoms. In its weekly report, the Centers for Disease Control and Prevention put the figure at 5.6%, which means that fewer than 23 million of us actually went to the doctor's office to be tested for flu or to get a flu shot.

Now, imagine if I were a drug manufacturer. There is a theory about what went wrong: the problems may be due to widespread media coverage of this year's flu season, amplified by social media, which helped news of the flu spread quicker than the virus itself. In other words, the algorithm was looking only at the numbers, not at the context of the search results.

In today’s digitally connected world, data is everywhere: in our phones, search queries, friendships, dating profiles, cars, food, and reading habits. Almost everything we touch is part of a larger data set. The people and companies that interpret the data may fail to apply background and outside conditions to the numbers they capture.

So while we build our big data repositories, we have to spend some time explaining how we collected the data and under what context.

Twitter @bigdatabeat

Posted in Big Data, Cloud Data Management, Data Governance, Data Transformation, Data Warehousing, Hadoop