Category Archives: Pervasive Data Quality
When I talk to customers about dealing with poor data quality, I consistently hear something like, “We know we have data quality problems, but we can’t get the business to take ownership and do something about it.” I think this is taking the easy way out. Throwing your hands up in the air doesn’t make change happen – it only prolongs the pain. If you want to effect positive change in data quality and are looking for ways to engage the business, then join Barbara Latulippe, Director of Enterprise Information Management for EMC, and Kristen Kokie, VP of IT Enterprise Strategic Services for Informatica, for our webinar on Thursday, October 24, to hear how they have dealt with data quality in their combined 40+ years in IT.
Now, understandably, tackling data quality problems is no small undertaking, and it isn’t easy. In many instances, organizations choose to do nothing about data quality because bad data has been present for so long that manual workarounds have become ingrained in the business processes that consume data. In these cases, changing the way people do things becomes the largest obstacle to dealing with the root cause of the issues. But that is also where you will find the costs associated with bad data: lost productivity, ineffective decision making, missed opportunities, etc.
As discussed in a previous webinar (link to the replay at the bottom of the page), successfully dealing with poor data quality takes initiative, and it takes communication. IT departments are the engineers of the business: they are the ones who understand processes and workflows; they are the ones who build the integration paths between applications and systems. Even if they don’t own the data, they do end up owning the data-driven business processes that consume it. As such, IT is uniquely positioned to provide customized suggestions based on insight from multiple previous interactions with the data.
Bring facts to the table when talking to the business. As the people who interact with data daily, IT is in a position to measure and monitor data quality and to identify key data quality metrics; data quality scorecards and dashboards can shine a light on bad data and relate it directly to the business via the downstream workflows and business processes. Armed with hard facts about the impact on specific business processes, a business user has an easier time putting a dollar value on bad data. Here are some helpful resources where you can start to build your case for improved data quality. With these tools and insight, IT can start to effect change.
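To make the scorecard idea concrete, here is a minimal sketch of one common data quality metric, column completeness. The field names and customer records are hypothetical examples, not from any real system or product.

```python
def completeness(records, field):
    """Fraction of records with a non-empty value for `field`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

# Hypothetical sample data with some missing values.
customers = [
    {"name": "Acme Corp", "email": "info@acme.example",  "phone": ""},
    {"name": "Globex",    "email": "",                   "phone": "555-0100"},
    {"name": "Initech",   "email": "hr@initech.example", "phone": "555-0199"},
]

# A tiny "scorecard": one completeness score per field.
scorecard = {f: completeness(customers, f) for f in ("name", "email", "phone")}
print(scorecard)  # name is fully populated; email and phone are 2/3 complete
```

A real scorecard would track such metrics over time and per business process, so the trend (not just the snapshot) can be shown to the business.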
Data is becoming the lifeblood of organizations, and IT organizations have a huge opportunity to get closer to the business by really knowing the data of the business. While data quality invariably involves technological intervention, it is above all a process and change management issue, and that is what ends up being critical to success. The easier it is to tie bad data to specific business processes, the more constructive the conversation with the business can be.
In the Information Age we live and work in, where it’s hard to go even one day without a Google search, where do you turn for insights that can help you solve work challenges and progress your career? This is a tough question. How can we deal with the challenges of information overload – which some have called information pollution? (more…)
In my previous blog I explored the importance of a firm understanding of commercial packaged applications on data quality success. In this final post, I will examine the benefits of having operational experience as a key enabler of effective data quality delivery. (more…)
So goes the line in the 1999 Oliver Stone film, Any Given Sunday. In the film, Al Pacino plays Tony D’Amato, a “been there, done that” football coach who, faced with a new set of challenges, has to re-evaluate his tried and true assumptions about everything he had learned through his career. In an attempt to rally his troops, D’Amato delivers a wonderful stump speech challenging them to look for ways to move the ball forward, treating every inch of the field as something sacred and encouraging them to think differently about how to do so.
It’s that time of year again: Gartner has published its annual Magic Quadrant for Data Quality Tools, and Informatica has been positioned as a leader. Informatica continues to build on a strong heritage of success in delivering enterprise data quality solutions through a continued focus on industry-leading data management solutions that address the end-to-end data quality lifecycle. In large part, our progress can be attributed to our product roadmap, specifically our recent release of Informatica Data Quality 9.5, aimed at helping organizations maximize their return on data. Let’s look at a few of the more significant additions in this release:
Data Governance and Stewardship
Data governance as a corporate initiative is certainly here to stay, and increasingly, organizations are looking for ways to leverage technology more effectively in support of their governance efforts. In Informatica Data Quality 9.5 we’ve added new capabilities so data stewards can use technology to their advantage. In particular, new workflow and task management capabilities help automate otherwise manual, error-prone data reconciliation steps. Armed with such capabilities, data stewards can effectively align the activities of both business and IT in addressing data quality issues without excessive overhead.
With the rise of social computing platforms, customer analytics has a new wealth of information from which to produce unforeseen insight into customer and product patterns. The challenge, however, is how to make sense of what is otherwise unstructured information. One solution is natural language processing (NLP), which applies probabilistic techniques to derive structure from free-form text. With NLP available in Informatica Data Quality 9.5, organizations can now make sense of free-form text from feeds like Facebook or LinkedIn and use it to sharpen their decision-making processes while harnessing the benefits of big data.
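Real NLP is far beyond a short sketch, but this toy example illustrates the underlying idea: turning free-form social text into structured signals (product mentions and a crude sentiment count) that analytics can consume. The word lists and product names are entirely hypothetical.

```python
# Hypothetical vocabularies; a real system would use trained models, not lists.
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"hate", "slow", "broken"}
PRODUCTS = {"phonex", "tabletpro"}

def structure(post):
    """Reduce a free-form post to structured fields for downstream analytics."""
    words = {w.strip(".,!?").lower() for w in post.split()}
    return {
        "products": sorted(words & PRODUCTS),
        "sentiment": len(words & POSITIVE) - len(words & NEGATIVE),
    }

print(structure("I love my PhoneX but the app is slow!"))
# {'products': ['phonex'], 'sentiment': 0}  (one positive, one negative word)
```

The point is the shape of the output: once text is reduced to fields like these, ordinary data quality and analytics tooling can operate on it.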
Data discovery, quite simply, is the process of uncovering specific types of data when its meaning is not otherwise known. With new data discovery capabilities now available in Informatica Data Quality 9.5, organizations can easily and effectively identify key pieces of information without the need for manual intervention. This is particularly of value in areas such as data masking, where the identification of sensitive information such as personally identifiable information (PII) is of utmost importance.
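As a minimal sketch of the idea (not of any product's implementation), rule-based data discovery can be as simple as scanning a column's values against patterns that suggest sensitive data. The patterns and threshold below are illustrative assumptions; real discovery combines pattern, dictionary, and statistical inference.

```python
import re

# Hypothetical patterns for two kinds of PII: US SSNs and email addresses.
PATTERNS = {
    "ssn":   re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

def classify_column(values, threshold=0.8):
    """Label a column if most of its non-null values match a known pattern."""
    values = [v for v in values if v]
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if values and hits / len(values) >= threshold:
            return label
    return "unknown"

print(classify_column(["123-45-6789", "987-65-4321"]))  # ssn
print(classify_column(["a@b.com", "c@d.org", "bad"]))   # unknown (2/3 < 0.8)
```

Once a column is flagged as likely PII, it can be routed to masking or access-control policies without anyone having inspected it by hand.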
These are just a few of the capabilities introduced in Informatica Data Quality 9.5 last month. As is evidenced in the 2012 Magic Quadrant, we believe the innovation and roadmap for our products is clearly going in the right direction. We’re excited by what lies ahead and have plans to continue to drive new and innovative capabilities for the data quality market.
About the Magic Quadrant
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
A supermarket in Telluride, Colorado, struck me as a funny place to find an analogy for data quality, but on a recent trip, there it was. You see, supermarkets here require you to bring your own bags to cart your groceries home. Those disposable plastic bags are banned here – the town has made a firm commitment to the philosophy of Reduce, Reuse and Recycle. By adhering to this same philosophy, data integration teams can develop and deploy successful data quality strategies across the enterprise despite the constraints of today’s “do more with less” IT budgets.
In the decade that I’ve been in the Information Management space, I’ve noticed that success in data integration usually comes in small increments – typically on a project-by-project basis. However, by leveraging those small incremental successes and deploying them in a repeatable, consistent fashion – either as standardized rule sets or as data services in an SOA – development teams can maximize their impact at the enterprise level.
In a May 2012 survey by the Ponemon Institute, 66 percent of respondents said they are not confident their organization would be able to detect the loss or theft of sensitive personal information contained in systems operated by third parties, including cloud providers. In addition, the majority are not confident that their organization would be able to detect the loss or theft of sensitive personal information in their company’s production environment.
Which aspect of data security for your cloud solution is most important?
1. Is it to protect the data in copies of production/cloud applications used for test or training purposes? For example, do you need to secure data in your Salesforce.com Sandbox?
2. Is it to protect the data so that a user will see data based on her/his role, privileges, location and data privacy rules?
3. Is it to protect the data before it gets to the cloud?
Compliance continues to drive people to action, and compliance with contractual agreements, especially for cloud infrastructure, continues to drive investment. In addition, many organizations are supporting Salesforce.com as well as packaged solutions such as Oracle E-Business Suite, PeopleSoft, SAP, and Siebel.
Of the available data protection solutions, tokenization is well established and well known for supporting PCI data while preserving the format and width of a table column. But because many tokenization solutions today require creating database views or changing application source code, it has been difficult for organizations to support packaged applications that don’t allow these changes. In addition, databases and applications take a measurable performance hit to process tokens.
What might work better is to dynamically tokenize data before it gets to the cloud: a transparent layer between the cloud and on-premise data integration would replace the sensitive data with tokens. This way, no additional application code would be required.
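The format- and width-preserving property mentioned above can be sketched very simply: replace each character with a random character of the same class, keeping separators in place, and record the mapping in a vault. This is purely illustrative; production tokenization relies on hardened vaults or vaultless format-preserving encryption, not an in-memory dictionary.

```python
import secrets
import string

_vault = {}  # token -> original value (a stand-in for a real token vault)

def tokenize(value):
    """Replace digits with digits and letters with letters, keeping width."""
    token = "".join(
        secrets.choice(string.digits) if ch.isdigit()
        else secrets.choice(string.ascii_uppercase) if ch.isalpha()
        else ch  # keep separators like '-' so the column format is preserved
        for ch in value
    )
    _vault[token] = value
    return token

def detokenize(token):
    """Look the original value back up in the vault."""
    return _vault[token]

card = "4111-1111-1111-1111"
tok = tokenize(card)
assert len(tok) == len(card) and detokenize(tok) == card
```

Because the token has the same width and character classes as the original, it passes through packaged applications and database column constraints unchanged, which is exactly why no application code changes are needed.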
In the Ponemon survey, most respondents said the best control is to dynamically mask sensitive information based on the user’s privilege level; encrypting all sensitive information contained in the record ranked second.
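Dynamic masking can be sketched in a few lines: the stored record never changes, but the value returned depends on the requester's role. The roles and masking rules below are hypothetical examples, not any vendor's policy model.

```python
def mask_ssn(ssn, role):
    """Return a view of an SSN appropriate to the caller's privilege level."""
    if role == "admin":
        return ssn                      # full value for privileged users
    if role == "support":
        return "***-**-" + ssn[-4:]     # partial reveal: last four digits only
    return "***-**-****"                # fully masked for everyone else

record = {"name": "Pat", "ssn": "123-45-6789"}
print(mask_ssn(record["ssn"], "admin"))    # 123-45-6789
print(mask_ssn(record["ssn"], "support"))  # ***-**-6789
print(mask_ssn(record["ssn"], "guest"))    # ***-**-****
```

The key design point is that masking happens at read time, per request, so the same production data can safely serve users with very different privilege levels.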
The strange thing is that people recognize the problem but are not spending accordingly. In the same Ponemon survey, 69% of organizations find it difficult to restrict user access to sensitive information in IT and business environments, yet only 33% say they have adequate budgets to invest in the solutions needed to reduce the insider threat.
Is this an opportunity for you?
Hear Larry Ponemon discuss the survey results in more detail during a CSOonline.com/Computerworld webinar, Data Privacy Challenges and Solutions: Research Findings with Ponemon Institute, on Wednesday, June 13.
I regularly receive questions regarding the types of skills data quality analysts should have in order to be effective. In my experience, regardless of scope, high-performing data quality analysts need to possess a well-rounded, balanced skill set – one that marries technical “know-how” and aptitude with solid business understanding and acumen. But, far too often, it seems that undue importance is placed on what I call the data quality “hard skills”, which include: a firm grasp of database concepts, hands-on data analysis experience using standard analytical tool sets, expertise with commercial data quality technologies, knowledge of data management best practices, and an understanding of the software development life cycle. (more…)
If you haven’t already, I think you should read The Forrester Wave™: Data Virtualization, Q1 2012. For several reasons – one, to truly understand the space, and two, to understand the critical capabilities required to be a solution that solves real data integration problems.
At the very outset, let’s clearly define Data Virtualization. Simply put, Data Virtualization is foundational to Data Integration. It enables fast and direct access to the critical data and reports that the business needs and trusts. It is not to be confused with simple, traditional Data Federation. Instead, think of it as a superset that must complement existing data architectures to support BI agility, MDM and SOA. (more…)