Tag Archives: Data Governance
Today, I am going to take a stab at rationalizing why one would even consider solving a problem with a solution that is well known to be sub-par. Consider the Ford Pinto: Would you choose this car for your personal, land-based transportation simply because of the new plush dice in the window? For my European readers, replace the Pinto with the infamous Trabant and you get my meaning. The fact is, both of these vehicles made the list of the “worst cars ever built” due to their mediocre design, environmental hazards, or plain poor safety records.
Rational people would never choose a vehicle this way. So I always ask myself, “How can IT organizations rationalize buying product X just because product Y is thrown in for free?” Consider the case in which an organization chooses their CRM or BPM system simply because the vendor throws in an MDM or Data Quality Solution for free: Can this be done with a straight face? You often hear vendors claim that “everything in our house is pre-integrated”, “plug & play” or “we have accelerators for this.” I would hope that IT procurement officers have come to understand that these phrases don’t close a deal in a cloud-based environment. An on-premise construct can never achieve this Nirvana unless it is customized based on client requirements.
Anyone can see the logic in getting “2 for the price of 1.” However, as IT procurement organizations seek to shave a percentage off every deal, they can’t lose sight of this key fact:
Standing up software (configuring, customizing, maintaining) and operating it over several years requires CLOSE inspection and scrutiny.
Like a Ford Pinto, software cannot just be driven off the lot without a care, leaving you only to worry about changing the oil and filters at recommended intervals. Customization, operational risk and maintenance are a significant cost, as all my seasoned padawans will know. If Pinto buyers had understood the Total Cost of Ownership before they made their purchase, they would have opted for Toyotas instead. Here is the bottom line:
If less than 10% of the overall requirements are solved by the free component
AND (and this is a big AND)
If less than 12% of the overall financial value is provided by the free component
Then it makes ZERO sense to select a solution based on freebie add-ons.
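To make the rule of thumb concrete, here is a minimal sketch of the decision logic. The function name and thresholds are mine, purely for illustration; plug in your own numbers from the deal on the table.

```python
# Hypothetical sketch of the "10% requirements / 12% value" rule of thumb.
# The function name and thresholds are illustrative only.

def freebie_should_influence_decision(requirements_covered_pct: float,
                                      financial_value_pct: float) -> bool:
    """Return True only if the free add-on clears BOTH thresholds."""
    return requirements_covered_pct >= 10.0 and financial_value_pct >= 12.0

# Example: a bundled MDM add-on covers 6% of requirements and 4% of the deal value.
print(freebie_should_influence_decision(6.0, 4.0))  # False -> don't let the freebie decide
```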
When an add-on component is of significantly lower quality than industry-leading solutions, it becomes even more illogical to rely on it simply because it’s “free.” If analysts have affirmed that the leading solutions have stronger capabilities, flexibility and scalability, what does an IT department truly “save” by choosing an inferior “free” add-on?
So just why DO procurement officers gravitate toward “free” add-ons rather than high-quality solutions? As a former procurement manager, I remember the motivations perfectly. Procurement teams are often measured by, and rewarded for, the savings they achieve. Because their motivation is near-term savings, long-term quality issues are not the primary decision driver. And, if IT fails to successfully communicate the risks, cost drivers and potential failure rates to Procurement, the motivation to save up-front money will win every time.
Both sellers and buyers need to avoid these dances of self-deception, the “Pre-Integration Tango” and the “Freebie Cha-Cha”. No matter how much you loved driving that Pinto or Trabant off the dealer lot, your opinion changed after you drove it for 50,000 miles.
I’ve been in procurement. I’ve built, sold and implemented “accelerators” and “blueprints.” In my opinion, 2-for-1 is usually a bad idea in software procurement. The best software is designed to make 1+1=3. I would love to hear from you if you agree with my above “10% requirements/12% value” rule-of-thumb. If not, let me know what your decision logic would be.
Leo Eweani makes the case that the data tsunami is coming. “Businesses are scrambling to respond and spending accordingly. Demand for data analysts is up by 92%; 25% of IT budgets are spent on the data integration projects required to access the value locked up in this data “ore” – it certainly seems that enterprise is doing The Right Thing – but is it?”
Data is exploding within most enterprises. However, most enterprises have no clue how to manage this data effectively. While you would think that an investment in data integration would be an area of focus, many enterprises don’t have a great track record in making data integration work. “Scratch the surface, and it emerges that 83% of IT staff expect there to be no ROI at all on data integration projects and that they are notorious for being late, over-budget and incredibly risky.”
The core message from me is that enterprises need to ‘up their game’ when it comes to data integration. This recommendation is based upon the amount of data growth we’ve already experienced, and will experience in the near future. Indeed, a “data tsunami” is on the horizon, and most enterprises are ill prepared for it.
So, how do you get prepared? While many would say it’s all about buying anything and everything when it comes to big data technology, the best approach is to splurge on planning. This means defining exactly what data assets are in place now and will be in place in the future, and how they should or will be leveraged.
To face the forthcoming wave of data, certain planning aspects and questions about data integration rise to the top:
Performance, including data latency. Or, how quickly does the data need to flow from point (or points) A to point (or points) B? As the volume of data rises, the data integration engines have to keep up.
Data security and governance. Or, how will the data be protected both at-rest and in-flight, and how will the data be managed in terms of controls on use and change?
Abstraction, and removing data complexity. Or, how will the enterprise remap and re-purpose key enterprise data that may not currently exist in a well-defined and functional structure?
Integration with cloud-based data. Or, how will the enterprise link existing enterprise data assets with those that exist on remote cloud platforms?
While this may seem like a complex and risky process, think through the problems, leverage the right technology, and you can remove the risk and complexity. The enterprises that seem to fail at data integration do not follow that advice.
I suspect the explosion of data will be the biggest challenge enterprise IT has faced in many years. While a few will take advantage of their data, most will struggle, at least initially. Which route will you take?
Nine years ago, when I started in the data integration and quality space, data quality was all about algorithms and cleansing technology. Data went in, and the “best” solution was the one that could do the best job of fuzzy matching the data and cleaning more data than the other products. Of course, no data quality solution could clean 100% of the data, so “exceptions” were dumped into a file and left as an “exercise for the user” to deal with on their own. This usually meant using the data management product of choice when there is nothing else: the spreadsheet. Data goes into a spreadsheet, users remediate the mistakes by hand, and then someone writes an SQL query to push the corrections back into the database. In the end, managing the exceptions was a very manual process with little to no governance.
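For readers who never lived through it, here is roughly what that “exercise for the user” looked like. This is a minimal sketch with hypothetical file, table, and column names; the real versions were hand-written scripts with no approval workflow and no audit trail.

```python
# Minimal sketch of the manual exception write-back described above.
# File, table, and column names are hypothetical.
import csv
import sqlite3

conn = sqlite3.connect("crm.db")  # hypothetical operational database

# Stewards fixed the rejected records by hand in a spreadsheet, then exported it.
with open("dq_exceptions_fixed.csv", newline="") as f:
    for row in csv.DictReader(f):
        conn.execute(
            "UPDATE customers SET email = ?, phone = ? WHERE customer_id = ?",
            (row["corrected_email"], row["corrected_phone"], row["customer_id"]),
        )

conn.commit()  # no approval step, no audit trail -- just a direct write-back
conn.close()
```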
The problem with this, of course, is that for many companies data stewardship is not the person’s day job. So if they have to spend time checking to see if someone else has corrected an error in the data, or getting approval to make a data change, or consolidating all the manual changes they made and then communicating those changes to management, then they don’t have much time left to sleep, much less eat. In the end, the business of creating quality data just doesn’t get done, or doesn’t get done well. Data quality is a business issue, supported by IT, but the business-facing part of the solution has been missing.
But that is about to change. Informatica already provides the most scalable data quality product for handling the automated portion of the data quality process. And now, in the latest release of Informatica Data Quality 9.6, we have created a new edition called the Data Quality Governance Edition to fully manage the exception process. This edition provides a completely governed process for managing remediation of data exceptions by business data stewards. It allows organizations to create their own customized process with different levels of review. Additionally, it makes it possible for business users to create their own data quality rules, describing the rules in plain language…. no coding necessary.
And of course, every organization wants to be able to track how it is improving. Informatica Data Quality 9.6 includes embeddable dashboards that show how data quality is improving and impacting the business in a positive way.
Great data isn’t an accident. Great data happens by design. And for the first time, data cleansing has been combined with a holistic data stewardship process, allowing business and IT to collaborate to create quality data that supports critical business processes.
Over the last 40 years, data has become increasingly distributed. It used to all sit on storage connected to a mainframe. It used to be that the application of computing power to solve business problems was limited by the availability of CPU, memory, network and disk. Those limitations are no longer big inhibitors. Data fragmentation is now the new inhibitor to business agility. Data is now generated from distributed data sources not just within a corporation, but from business partners, from device sensors and from consumers Facebook-ing and tweeting away on the internet.
So to solve any interesting business problem in today’s fragmented data world, you now have to pull data together from a wide variety of data sources. That means business agility 100% depends on data integration agility. But how do you deliver that agility in a way that is not just fast, but reliable, and delivers high-quality data?
First, to achieve data integration agility, you need to move from a traditional waterfall development process to an agile development process.
Second, if you need reliability, you have to think about how you start treating your data integration process as a critical business process. That means thinking about how you will make your integration processes highly available. It also means you need to monitor and validate your operational data integration processes on an ongoing basis. The good news is that the capabilities you need for data validation as well as operational monitoring and alerting for your data integration process are now built into Informatica’s newest PowerCenter Edition, PowerCenter Premium Edition.
Lastly, the days when you could just move data from A to B without including a data quality process are over. Great data doesn’t happen by accident, it happens by design. And that means you also have to build data quality directly into your data integration process.
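As a rough illustration of what building data quality into the integration process (rather than cleansing after the fact) can mean, here is a minimal, generic sketch. It is not Informatica-specific; the rule, field names, and helper functions are hypothetical.

```python
# Generic sketch: a data quality check embedded in the integration flow itself,
# so bad records are routed to an exceptions queue instead of the target.
# Field names and the validation rule are hypothetical.
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def integrate(records, load_to_target, queue_exception):
    for rec in records:
        if rec.get("customer_id") and EMAIL_RE.match(rec.get("email", "")):
            load_to_target(rec)      # quality gate passed -> load to the target
        else:
            queue_exception(rec)     # route to stewardship, never to the target

# Example usage with in-memory stand-ins for the target and the exception queue.
target, exceptions = [], []
integrate(
    [{"customer_id": "42", "email": "ada@example.com"},
     {"customer_id": "", "email": "not-an-email"}],
    target.append,
    exceptions.append,
)
print(len(target), len(exceptions))  # 1 1
```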
Great businesses depend on great data. And great data means data that is delivered on time, with confidence and with high quality. So think about how your understanding of data integration and great data can make your career. Great businesses depend on great data and on people like you who have the skills to make a difference. As a data professional, the time has never been better for you to make a contribution to the greatness of your organization. You have the opportunity to make a difference and have an impact because your skills and your understanding of data integration have never been more critical.
When I was seven years old, Danny Weiss had a birthday party where we played the telephone game. The idea is this: there are 8 people sitting around a table, and the first person tells the next person a little story. That person tells the next person the story, and so on, all the way around the room. At the end of the game, you compare the original story the first person told to the story the 8th person tells. Of course, the stories are very different and everyone giggles hysterically… we were seven years old, after all.
The reason I was thinking about this story is that data integration development is about as inefficient as a seven-year-old’s birthday party. The typical process is that a business analyst, using the knowledge in their head about the business applications they are responsible for, creates a spreadsheet in Microsoft Excel that lists database tables and columns along with a set of business rules for how the data is to be transformed as it moves to a target system (a data warehouse or another application). The spreadsheet, which is never checked against real data, is then passed to a developer, who creates code in a separate system to move the data; that work is then checked by a QA person and checked again by the business analyst at the end of the process. This is the first time the business analyst verifies their specification against real data.
99 times out of 100, the data in the target system doesn’t match what the business analyst was expecting. Why? Either the original specification was wrong because the business analyst made a typo, or the data is inaccurate. Or the data in the original system wasn’t organized the way the analyst thought it was. Or the developer misinterpreted the spreadsheet. Or the business analyst simply doesn’t need this data anymore – he needs some other data. The result is lots of errors, just like the telephone game. And the only way to fix it is with rework and then more rework.
But there is a better way. What if the data analyst could validate their specification against real data and self-correct on the fly before passing the specification to the developer? What if the specification were not just a specification, but a prototype that could be passed directly to the developer, who wouldn’t recode it but would just modify it to add scalability and reliability? The result is much less rework and much faster time to development. In fact, up to 5 times faster.
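As a rough sketch of what “validate the specification against real data” can look like: the spec format, file names, and SQLite source below are assumptions purely for illustration. The point is that every mapping row gets checked against actual source structures before a developer ever touches it.

```python
# Minimal sketch: check an analyst's mapping spec against the real source system.
# File names, spec columns, and the SQLite source are hypothetical.
import csv
import sqlite3

conn = sqlite3.connect("source_system.db")  # hypothetical source database

def source_columns(table: str) -> set:
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}

with open("mapping_spec.csv", newline="") as f:  # the analyst's exported spreadsheet
    for row in csv.DictReader(f):  # expected columns: source_table, source_column, rule
        cols = source_columns(row["source_table"])
        if not cols:
            print(f"MISSING TABLE:  {row['source_table']}")
        elif row["source_column"] not in cols:
            print(f"MISSING COLUMN: {row['source_table']}.{row['source_column']}")
```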
That is what Agile Data Integration is all about. Rapid prototyping and self-validation against real data up front by the business analyst. Sharing results back and forth with the developer in a common toolset to improve the accuracy of communication.
Because we believe the agile process is so important to your success, Informatica is giving all of our PowerCenter Standard Edition (and higher editions) customers agile data integration for FREE!!! That’s right, if you are a current customer of Informatica PowerCenter, we are giving you the tools you need to go from the old-fashioned, error-prone, waterfall, telephone-game style of development to a modern 21st-century Agile process.
• FREE rapid prototyping and data profiling for the data analyst.
• Go from prototype to production with no recoding.
• Better communication and better collaboration between analyst and developer.
PowerCenter 9.6. Agile Data Integration built in. No more telephone game. It doesn’t get any better than that.
“Opportunity for the large community to share experiences, lessons learnt, and help those that are starting the MDM journey get on the right track.”
Next month, Informatica will host its third MDM Day conference. Our past two events in Las Vegas and London have been huge successes thanks to the active participation of our customers, partners, and colleagues. The conference is structured to provide opportunities for you to share your ideas, provide guidance to our product management team, and learn from other customers’ MDM and PIM journeys.
When: February 12th, 8:30 AM – 5:00 PM
Where: Westin Times Square
How: Register Here
I’m excited to share that, since its launch in January 2013, the GovernYourData.com community has been very well received. With over 4,700 unique visitors and nearly 600 registered members, many data management practitioners recognize it as a valuable go-to resource to support their data governance efforts. While maintaining our core objective of vendor and product neutrality, the site offers over 100 best-practice blog posts from over 17 different contributors, shares details on a dozen upcoming industry events, and links to a wide variety of white papers, analyst research, recommended books, and other educational resources. (more…)
A front office as defined by Wikipedia is “a business term that refers to a company’s departments that come in contact with clients, including the marketing, sales, and service departments” while a back office is “tasks dedicated to running the company….without being seen by customers.” Wikipedia goes on to say that “Back office functions can be outsourced to consultants and contractors, including ones in other countries.” Data Management was once a back office activity but in recent years it has moved to the front office. What changed? (more…)
I’m glad to hear you feel comfortable explaining data to your friends, and I completely understand why you’ll avoid discussing metadata with them. You’re in great company – most business leaders also avoid discussing metadata at all costs! You mentioned during our last call that you keep reading articles in the New York Times about this thing called “Big Data” so as promised I’ll try to explain it as best I can. (more…)