Tag Archives: Best Practices
In my last blog post, I described the dreadful experience of cleaning raw data by hand back when I worked as an analyst a few years ago. Well, the truth is, I was not alone. At a recent data mining Meetup event in the San Francisco Bay Area, I asked a few analysts: “How much time do you spend on cleaning your data at work?” “More than 80% of my time” and “most my days,” the analysts said, adding that “they are not fun.”
But check this out: there are over a dozen Meetup groups focused on data science and data mining here in the Bay Area, where I live. Those groups put on events multiple times a month, often on hot, emerging technologies such as machine learning, graph analysis, real-time analytics, new algorithms for analyzing social media data and, of course, anything Big Data. Cool BI tools, new programming models and algorithms for better analysis are a big draw for data practitioners these days.
That got me thinking… if what the analysts told me is true, i.e., they spend 80% of their time prepping data and only the remaining 20% analyzing the data and visualizing the results (which, by the way, “is actually fun,” to quote one data analyst), then why are they drawn to events about tools that can help them only 20% of the time? Why wouldn’t they want to explore technologies that could help with the dreadful data scrubbing that takes up the other 80%, the part they complain about?
Having been there myself, I thought perhaps a little self-reflection would help answer the question.
As a student of math, I love data and am fascinated by the stories I can discover in it. My two-year graduate math program focused primarily on learning how to build fabulous mathematical models that simulate real events, and on using those formulas to predict the future or look for meaningful patterns.
I used BI and statistical analysis tools while at school, and continued to use them at work after I graduated. That software was great: it helped me get to results and see what was in my data, so I could draw conclusions and make recommendations to my clients based on those insights. Without BI and visualization tools, I could not have delivered any results.
That was the fun and glamorous part of my job as an analyst. But when I was not creating nice charts and presentations to tell the stories in my data, I was spending a great amount of time, sometimes until the wee hours, cleaning and verifying my data. I was convinced that was part of my job and I just had to suck it up.
It was only a few months ago that I stumbled upon data quality software – it happened when I joined Informatica. At first I thought they were talking to the wrong person when they started pitching me data quality solutions.
It turns out that data quality automation is a highly relevant and extremely intuitive subject for me, and for anyone who deals with data on a regular basis. Data quality software automates the data cleansing process; it is much faster and delivers more accurate results than a manual process. To put that in mathematical terms: if a data quality tool can reduce the data cleansing effort from 80% to 40% of an analyst’s time (by the way, this is hardly a random number; some of our customers have reported much better results), analysts can free up 40% of their time from scrubbing data and use that time to do the things they like: playing with data in BI tools, building new models or running more scenarios, producing different views of the data and discovering things they might not have been able to before, all with clean, trusted data. No more being bored to death. What they are left with is improved productivity, more accurate and consistent results, compelling stories about their data and, most importantly, the freedom to focus on the things they like! Not too shabby, right?
I am excited about trying out the data quality tools we have here at Informatica. My fellow analysts, you should start looking into them too. And I will check back in soon with more stories to share.
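To make the 80%-to-40% arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes (this is not from the post) a 40-hour work week in which every hour not spent cleansing data goes to analysis and visualization; `split_week` and `WEEK_HOURS` are illustrative names only.

```python
# Back-of-the-envelope model of cutting data cleansing from 80% to 40% of an
# analyst's time. ASSUMPTIONS (not from the post): a 40-hour week, and every
# hour not spent cleansing is spent on analysis and visualization.

def split_week(total_hours: float, cleansing_share: float) -> dict[str, float]:
    """Split a work week into cleansing hours and analysis hours."""
    cleansing = total_hours * cleansing_share
    return {"cleansing": cleansing, "analysis": total_hours - cleansing}

WEEK_HOURS = 40.0  # hypothetical week length

manual = split_week(WEEK_HOURS, 0.80)     # cleansing by hand
automated = split_week(WEEK_HOURS, 0.40)  # with a data quality tool

hours_freed = manual["cleansing"] - automated["cleansing"]
analysis_multiplier = automated["analysis"] / manual["analysis"]

print(f"Manual:    {manual}")
print(f"Automated: {automated}")
print(f"Hours freed per week: {hours_freed:.1f}")
print(f"Analysis time grows by a factor of {analysis_multiplier:.1f}")
```

Under these assumptions, the 16 freed hours do more than add a little analysis time: they triple it, from 8 to 24 hours a week, because the “fun” 20% was such a small base to begin with.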
Question: What do American Airlines, Liberty Mutual, Discount Tire and MD Anderson all have in common?
a) They are all at the top of their fields.
b) They all view data as critical to their business success.
c) They are all using Agile Data Integration to drive business agility.
d) They have spoken about their Data Integration strategy at Informatica World in Vegas.
Did you answer “all of the above”? If so, then give yourself a Ding Ding Ding. Or shall we say Ka-Ching, in honor of our host city?
Indeed, data experts from these companies and many more flocked to Las Vegas for Informatica World. They shared their enthusiasm for the important role data plays in their business, and these industry leaders discussed best practices that facilitate an Agile Data Integration process.
American Airlines recently completed a merger with US Airways, making it the largest airline in the world. To serve critical reporting requirements for the merged airline, the enterprise data team undertook a huge Data Integration task. This effort involved large-scale data migration and included many legacy data sources. The project required transferring over 4TB of current and historical data for Day 1 reporting. A major task still remains: integrating multiple combined subject areas to give a full picture for combined reporting.
American Airlines architects recommend using Data Integration design patterns to improve agility. The architects shared success factors for merger Data Integration. They discussed the importance of ownership by leadership from both IT and business, and emphasized the benefit of open and honest communication between teams. The architects also highlighted the need to identify integration teams and priorities. Finally, the architects discussed the significance of understanding cultural differences and celebrating success. The team summarized its merger Data Integration lessons learned: metadata is key, IT and business collaboration is critical, and profiling and access to the data are helpful.
Liberty Mutual, the third-largest property and casualty insurer in the US, has grown through acquisitions, and its Data Integration team needs to support this business process. The team has been busy integrating five claims systems into one, a large-scale Data Integration challenge. To add to the complexity, the business requires that each phase be completed in one weekend, that no data be lost in the process, and that all finances balance at the end of each merge. Integrating all claims in a single location was critical for the smooth processing of insurance claims. A single system also reduces the cost and complexity of support and maintenance.
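Liberty Mutual’s requirements that no data be lost and that all finances balance after each merge lend themselves to an automated reconciliation check. The post does not say how Liberty Mutual implemented this, so what follows is only a minimal Python sketch under stated assumptions: each claim is a hypothetical `(claim_id, amount)` pair, claim IDs are unique across source systems, and `reconcile` is an illustrative name.

```python
from decimal import Decimal

# Hypothetical record layout: (claim_id, amount). Claim IDs are assumed to be
# unique across all source systems.
Record = tuple[str, Decimal]

def reconcile(sources: dict[str, list[Record]], target: list[Record]) -> list[str]:
    """Compare source claim systems with the consolidated target.

    Returns a list of human-readable problems; an empty list means the
    merge lost no claims and the money balances.
    """
    problems: list[str] = []
    source_rows = [row for rows in sources.values() for row in rows]

    # Check 1: no data lost and nothing duplicated.
    source_ids = {claim_id for claim_id, _ in source_rows}
    target_ids = [claim_id for claim_id, _ in target]
    missing = sorted(source_ids - set(target_ids))
    if missing:
        problems.append(f"missing claims: {missing}")
    if len(target_ids) != len(set(target_ids)):
        problems.append("duplicate claims in target")
    if len(source_rows) != len(target):
        problems.append(f"row count mismatch: {len(source_rows)} vs {len(target)}")

    # Check 2: finances balance to the cent (Decimal avoids float rounding).
    source_total = sum((amount for _, amount in source_rows), Decimal("0"))
    target_total = sum((amount for _, amount in target), Decimal("0"))
    if source_total != target_total:
        problems.append(f"amount mismatch: {source_total} vs {target_total}")
    return problems

# Toy example with two hypothetical claim systems.
sources = {
    "system_a": [("A-1", Decimal("100.00")), ("A-2", Decimal("250.50"))],
    "system_b": [("B-1", Decimal("75.25"))],
}
merged = [("A-1", Decimal("100.00")), ("A-2", Decimal("250.50")), ("B-1", Decimal("75.25"))]

print(reconcile(sources, merged))      # a clean merge reports no problems
print(reconcile(sources, merged[:2]))  # dropping B-1: missing claim, row count and amount mismatches
```

Running a check like this after each weekend phase gives a clear pass/fail signal before the next phase starts; a real implementation would reconcile by account, period and currency, not just a single grand total.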
Liberty Mutual experts recommend a methodology of work preparation, profiling, delivery and validation. Rinse and repeat. The company also chose a visual Data Integration tool, which was quick and easy for the team to learn and greatly enhanced development agility.
Discount Tire, the largest independent tire dealer in the USA, shared tips and tricks from migrating legacy data into a new SAP system. This complex project included data conversion from 50 legacy systems. The company needs to combine and aggregate data from many systems, including customer, sales, financial and supply chain. This integrated system helps Discount Tire make key business decisions and stay ahead in a highly competitive market.
Discount Tire has automated its data validation process in both development and production. This reduces testing time, minimizes data defects and increases the agility of development and operations. The company has also implemented proactive monitoring to detect and correct data problems in production early.
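The profiling step in that methodology usually means measuring what the data actually looks like before moving it. As a rough illustration only (the post does not describe Liberty Mutual’s tooling, and the `profile` function and sample fields here are hypothetical), a minimal Python sketch that counts missing values, distinct values and the most common value per column might look like this:

```python
from collections import Counter
from typing import Any

def profile(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Compute basic per-column statistics for a list of records.

    A value counts as missing if the key is absent, None, or an empty string.
    """
    columns = sorted({key for row in rows for key in row})
    report: dict[str, dict[str, Any]] = {}
    for column in columns:
        values = [row.get(column) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        counts = Counter(present)
        report[column] = {
            "missing": len(values) - len(present),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0][0] if counts else None,
        }
    return report

# Hypothetical claims extract with a few typical defects.
claims = [
    {"claim_id": "C1", "state": "MA", "amount": 100},
    {"claim_id": "C2", "state": "", "amount": 250},
    {"claim_id": "C3", "state": "MA"},
]

for column, stats in profile(claims).items():
    print(column, stats)
```

Even this small report surfaces the kinds of gaps (a blank state, a missing amount) that the delivery and validation steps then have to handle.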
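Automated validation plus proactive monitoring can be pictured as a set of rules run against every batch, with an alert when too many records fail. The post does not describe Discount Tire’s implementation; the sketch below is a generic, rule-based Python example in which the field names, the `Rule`/`validate`/`needs_alert` names and the 1% alert threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[dict[str, Any]], bool]  # True means the record passes

# Hypothetical rules for a sales record; real rules would come from the business.
RULES = [
    Rule("customer_id present", lambda r: bool(r.get("customer_id"))),
    Rule("quantity positive", lambda r: isinstance(r.get("quantity"), int) and r["quantity"] > 0),
    Rule("price non-negative",
         lambda r: isinstance(r.get("price"), (int, float)) and r["price"] >= 0),
]

def validate(records: list[dict[str, Any]], rules: list[Rule] = RULES) -> list[tuple[int, str]]:
    """Run every rule against every record; return (record index, rule name) for each failure."""
    return [(i, rule.name)
            for i, record in enumerate(records)
            for rule in rules
            if not rule.check(record)]

def needs_alert(records: list[dict[str, Any]], failures: list[tuple[int, str]],
                max_failure_rate: float = 0.01) -> bool:
    """Monitoring hook: flag a batch whose share of failing records exceeds the threshold."""
    if not records:
        return False
    failing_records = {i for i, _ in failures}
    return len(failing_records) / len(records) > max_failure_rate

batch = [
    {"customer_id": "C100", "quantity": 4, "price": 89.99},
    {"customer_id": "", "quantity": 0, "price": 120.0},
]
failures = validate(batch)
print(failures)
print("alert" if needs_alert(batch, failures) else "ok")
```

Because the same rule list can run in a test pipeline and against production loads, one definition serves both the “reduce testing time” and the “early detection” goals the post mentions.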
MD Anderson Cancer Center is the No. 1 hospital for cancer care in the US according to U.S. News and World Report. They are pursuing the lofty goal of erasing cancer from existence. Data Integration is playing an important role in this fight against cancer. In order to accomplish their goal, MD Anderson researchers rely on integration of vast amounts of genomic, clinical and pharmaceutical data to facilitate leading-edge cancer research.
MD Anderson experts pursue Agile Data Integration through close collaboration between IT and business stakeholders, which enables them to meet the data requirements of the business faster and better. They shared that data insights, through metadata management, offer significant value to the organization. Finally, the experts at MD Anderson believe in ‘Map Once, Deploy Anywhere’ as a way to accomplish Agile Data Integration.
So let’s recap. Data Integration is helping:
- An airline to continue serving its customers and running its business smoothly post-merger,
- A tire retail company to procure and provide tires to its customers and maintain its leadership,
- An insurance company to process claims accurately and in a timely manner while minimizing costs, and
- A cancer research center in its fight to cure cancer.
Not too shabby, right? Data Integration is clearly essential to business success!
So OK, I know, I know… what happens in Vegas stays in Vegas. Still, this was one love-fest I was compelled to share! Wish you had been there. Hopefully you will be next year!
To learn more about Agile Data Integration, check out this webinar: Great Data by Design II: How to Get Started with Next-Gen Data Integration
Now you can experience the next best thing by attending InformaticaWorld 2014 and hearing the American Airlines US Airways Data Architects talk about the data challenges they faced. They will discuss the role of architecture in M&A, integrating legacy data, lessons learned, and best practices in Data Integration.
While you are at the show, you will have the opportunity to hear many industry experts discuss current trends in Agile end-to-end Data Integration.
Agile Data Integration Development
To deliver the agility that your business requires, IT and Business must pursue a collaborative Data Integration process, with the appropriate Analyst self-service Data Integration tools. At InformaticaWorld, you can learn about Agile Data Integration development from the experts at GE Aviation, who will discuss Agile Data Integration for Big Data Analytics. Experts from Roche will discuss how Agile Data Integration has led to a 5x reduction in development time, improved business self-service capabilities and increased data credibility.
Another aspect of agility is your ability to scale your Data Warehouse to rapidly support more data, data sources, users and projects. Come hear the experts from Liberty Mutual share challenges, pitfalls, best practices and recommendations for those considering large-scale Data Integration projects, including successful implementation of complex data migrations, data quality and data distribution processes.
Managing an enterprise-scale Data Warehouse means operating a mature, complex, mission-critical environment, which is commonly driven through an Integration Competency Center (ICC) initiative. At this stage, you need to inspect and adapt your production system and expedite data validation and monitoring through automation, so that data issues can be caught and corrected quickly and resources can be freed up to focus on development.
Experts from the University of Pittsburgh Medical Center, along with Informatica Professional Services experts, will discuss best practices, lessons learned and the process of transitioning from ‘analytics as a project’ to an enterprise initiative through the use of an Integration Competency Center.
Hear from the Informatica Product Experts
You will have many opportunities to hear directly from the Informatica product experts about end-to-end Data Integration Agility delivered in the recent 9.6 release of PowerCenter.
See PowerCenter 9.6 in Action
Don’t miss the opportunity to see live demos of the cool new features of the PowerCenter 9.6 release at the many hands-on labs being offered at InformaticaWorld this year.
For example, you can learn how to empower business users through self-service Data Integration with the PowerCenter Analyst tool, how to reduce the testing time of Data Integration projects through automated validation tests, and how to scale your Data Integration with High Availability and Grid.
The sessions described here are a sampling of the rich variety of Data Integration sessions that will be offered at the show. We hope you will join us at InformaticaWorld this year in Las Vegas on May 13-15. As you plan your visit, please check out the complete listing of sessions and labs focused on Data Integration.
Please feel free to leave a comment and let us know which InformaticaWorld sessions you are most looking forward to! See you there!
Recently, we posted an initial discussion between Informatica’s CMO Marge Breya and CIO Eric Johnson, explaining how CIOs and CMOs can align and thrive. In the dialog below, Breya and Johnson provide additional detail on how their departments partner effectively.
Q: Pretty much everyone agrees that marketing has changed from an art to a science. How does that shift translate into how you work together day to day?
Eric: The number of ways marketers now have to reach prospects and customers to grow their market share has exploded. It used to be a single marketing solution, an afterthought bolted on to the CRM solution. Now there are so many ways that marketers have to consider how they market to people. It’s driven by things going on in the market, like how people interact with companies and the lifestyle changes people have made around mobile devices.
Marge: Just look at the sheer number of systems and sources of data we care about. If you want to understand upsell and cross-sell for customers, you have to look at what’s happening in the ERP system, what’s happened from a bookings standpoint, whether the customer is a parent or child of another customer, and how you think about data by region, by industry, by job title. And there’s how you think about successful conversion of leads. Is it the way you’d predicted? What’s your most valuable content? Who’s your most valuable outlet or event? What’s your ROI? You can’t get that from any one single system. More and more, it’s all about conversion rates, about forecasting and theories about how the business is working from a model standpoint. And I haven’t even talked about social.
Q: With so many emerging technologies to look at, how do CMOs reconcile the need to quickly add new products, while CIOs reconcile the need for everything to work securely and well together?
Eric: There’s this yin and yang that’s starting to build between the CIO and the CMO as we both understand each other and the world we each live in, and therefore collaborate and partner more. But at the same time, there’s a tension between a CMO’s need to bring in solutions very quickly, and the CIO’s need to do some basic vetting of that technology. It’s a tension between speed vs. scale and liability to the company. It’s on a case-by-case basis, but as a CIO you don’t say “no.” You give options. You show CMOs the tradeoffs they’re going to make.
There are also risks that are easy to take and worth taking. They won’t cause any problems for the enterprise from a security or integration perspective, so let’s just try it. It may not work, and that’s OK.
Marge: There’s temptation across departments for the shiny new object. You’ll hear about a new technology, and you think this might solve our problems, or move the business faster. The tension even within the marketing department is: do we understand how and if it will impact the business process? And do we understand how that business process will have to change if the shiny new object comes on board?
Q: CMOs are getting data from potentially hundreds of sources, including partners, third parties, LinkedIn and Google. How do the two of you work together to determine a trustworthy data source? Do you talk about it?
Eric: The issue of trusting your data and making sure you’re doing your due diligence on it is incredibly important. Without doing that, you run the risk of finding yourself in a very tricky situation from a legal perspective, and potentially a liability perspective. To do that, we have a lot of technology that helps us manage a lot of data sources coming into a single source of truth.
On top of that, we are working with marketers who are much more savvy about technology and data. And that makes IT’s job easier — and our partnership better — because we are now talking the same language. Sometimes it’s even hard to tell where the line between the two groups actually sits. Some of the marketing people are as technical as the IT people, and some of the IT people are becoming pretty well-versed in marketing.
Q: How do you decide what technologies to buy?
Marge: A couple of weeks ago we went on a shopping trip, and spent the day at a venture capital firm looking at new companies. It was fun. He and I were brainstorming and questioning each other to see if each technology would be useful, and whether we could imagine how everything would fit together. We first explored possibilities, and then we considered whether they were practical.
Eric: Ultimately, Marge owns the budget. But before the budgeting cycle, we sit down to discuss what she wants to work on and whether she wants to swap any technology out. I make sure Marge is getting what she needs from the technologies. There’s a reliance on the IT team to do some due diligence on the technical aspects of the technology: Does it work? Do we want to do business with these people? Is it going to scale? So each party has a role to play in evaluating whether it’s a good solution for the company. As a CIO, you don’t say “no” unless there’s something really bad, and you hope you have a relationship with the CMO where you can say, here are the tradeoffs you’re making. You say no one has an agenda here, but here are the risks you have to be OK taking. It’s not a “no.” It’s options.
A few years back, there was a movement in some businesses to establish “data stewards”: individuals who would sit at the heart of the enterprise and make it their job to ensure that data being consumed by the organization is of the highest possible quality, is secure, is contextually relevant, and is capable of interoperating across any applications that need to consume it. While the data steward concept came along when everything was relational and structured, these individuals are now earning their pay managing the big data boom.
The rise of big data is creating more than simple headaches for data stewards; it is creating turf wars across enterprises. As pointed out in a recent article in The Wall Street Journal, there isn’t yet much clarity about who owns and cares for such data. Is it IT? Is it the lines of business? Is it legal? Arguments can be made for each.
In organizations these days, for example, marketing executives are generating, storing and analyzing large volumes of their own data within content management systems and social media analysis solutions. Many marketing departments even have their own IT budgets. Along with marketing, of course, everyone else within enterprises is seeking to pursue data analytics to better run their operations as well as foresee trends.
Typically, data has been under the domain of the CIO, the person who oversaw the collection, management and storage of information. In the Wall Street Journal article, however, it’s suggested that legal departments may be the best caretakers of big data, since big data poses a “liability exposure,” and legal departments are “better positioned to understand how to use big data without violating vendor contracts and joint-venture agreements, as well as keeping trade secrets.”
However, legal being legal, insightful data may well end up locked away, never to see the light of day. Others may argue that the IT department needs to retain control, but there again, IT isn’t trained to recognize information that may set the business on a new course.
Focusing on big data ownership isn’t just an academic exercise. The future of the business may depend on the ability to get on top of big data. Gartner, for one, predicts that within the next three years, at least a third of Fortune 100 organizations will experience an information crisis, “due to their inability to effectively value, govern and trust their enterprise information.”
This ability to “value, govern and trust” goes way beyond the traditional maintenance of data assets that IT has specialized in over the past few decades. As Gartner’s Andrew White put it: “Business leaders need to manage information, rather than just maintain it. When we say ‘manage,’ we mean ‘manage information for business advantage,’ as opposed to just maintaining data and its physical or virtual storage needs. In a digital economy, information is becoming the competitive asset to drive business advantage, and it is the critical connection that links the value chain of organizations.”
For starters, then, it is important that the business have full say over what data needs to be brought in, what data is important for further analysis, and what should be done with data once it matures. IT, however, needs to take a leadership role in ensuring the data meets the organization’s quality standards and is well vetted, so that business decision-makers can be confident in the data they are using.
The bottom line is that big data is a team effort involving the whole enterprise. IT has a role to play, as does legal, as do the lines of business.
Research firm Gartner, Inc., sent shockwaves across the technology landscape when it forecast CMOs will spend more on IT than CIOs by 2017[i]. The rationale? “We frequently hear our technology and service provider clients tell us they are dealing with business buyers more, and need to ‘speak the language.’ Gartner itself has fueled this inferno with assertions such as, ‘By 2017 the CMO will spend more on IT than the CIO’ (see ‘Webinar: By 2017 the CMO Will Spend More on IT Than the CIO’).”[ii] In the two years since Gartner first made that prediction, analysts and pundits have talked about a CIO/CMO battle for data supremacy, describing the two roles as “foes” inhabiting “separate worlds[iii]” that don’t even speak the same language.
But when CIOs are from Mars and CMOs are from Venus, their companies can end up with disjointed technologies that don’t play well together. The result? Security flaws, no single version of “truth,” and regulatory violations that can damage the business. The trick, then, is aligning the CIO and CMO planets.
Informatica’s CMO Marge Breya and CIO Eric Johnson show how they do it.
Q: There’s been a lot of talk lately about how CMOs are now the biggest users of data. That represents a shift in how CMOs and CIOs traditionally have worked together. How do you think the roles of the CMO and CIO need to mesh?
Eric: As I look across the lines of business, and evaluate the level of complexity, the volume of data and the systems we’re supporting, marketing is now by far the most complex part of the business we support. Their systems and their data have grown exponentially over the last four or five years. Now more than ever, [CMOs and CIOs are] very much attached at the hip. We have to be working in conjunction with one another.
Marge: Just to add to that, I’d say that over the last five years, we’ve been attached to things like CRM systems or partner relationship systems. From a marketing standpoint, it has really been about management: how do you get visibility into what’s happening with the business? But over the last couple of years, it’s become increasingly important to focus on the “R” word, the relationship: how do you look at a customer name and understand how it relates to that customer’s past buying behavior? As a result, you need to understand how information lives from system to system, all across a time series, in order to make really great decisions. The “relate” word is probably the most important, at least on my team right now, and it’s not possible for me to relate data across the organization without having a great relationship with IT.
Q: So how often do you find yourselves talking together?
Eric: We talk to each other probably weekly, and I think our teams work together daily. There’s constant collaboration to make sure we’re in sync. You hear about the CIO/CMO relationship. I think it should be an easy relationship: there’s so much going on technology-wise and data-wise that CMOs are becoming much more technically knowledgeable, and CIOs are starting to understand more and more of what’s going on in their business, so the line between them should be all about how you work together.
Marge: Of all the business partners in the company, Eric … helps us in marketing reimagine how marketing can be done. If the two of us can go back and forth, understand what’s working and what’s not working, and reimagine how we can be far more effective, or productive or know new things — to me that’s the judge of a healthy relationship between a CIO and a CMO. And luckily, we have that.
Q: It seems as if 2013 was the year of “big data.” But a Gartner survey[iv] said “The adoption is still at the early stages with fewer than 8% of all respondents indicating their organization has deployed big data solutions.” What do you think are the issues that are making it so difficult for companies?
Eric: The concept of big data is something companies want to get involved in. They want to understand how they can leverage this fast-growing volume of data from various sources. But the challenge is being able to understand what you’re looking for, and to know what kind of questions you have.
Marge: There’s a big focus on big data, almost for its own sake in some cases. People get confused about whether it’s about the haystack or the needle. Having a haystack for the heck of it isn’t usually the point. It’s for a purpose. It’s important to understand what part of that haystack is important for what part of your business. How up-to-date is it? How much can you trust the data? How much can you make real decisions from it? And frankly, who should have access to it? So much of the data we have today is sensitive, affected by privacy laws and other kinds of regulations. I think big data is appropriately a great term right now, but more importantly, it’s not just about big data, it’s about great data. How are you going to use it? And how is it going to affect your business process?
Eric: You could go down into a rat hole if you’re chasing something and you’re not really sure what you’re going to do with it.
Marge: On the other hand, you can explore years of behavior and maybe come up with a great predictive model for what a new buying signal scoring engine could look like.
Q: One promise of big data is the ability to pull in data from so many sources. That would suggest a real need for you two to work together to ensure the quality and the integrity of the data. How do you collaborate on those issues?
Eric: There’s definitely a lot of work that has to be done working with the CMO and the marketing organization: To sit down and understand where’s this data coming from, what’s it going to be used for, and making sure you have the people and processing components. Especially with the level of complexity we have, with all the data coming in from so many sources, making sure that we really map that out, understand the data and what it looks like and what some of the challenges could be. So it’s partnering very closely with marketing to understand those processes, understand what they want to do with the data, and then putting the people, the processes and the technology in place so you can trust the data and have a single source of truth.
Marge: You hit the nail on the head with “people, process and technology.” Often, folks think of data quality or accuracy as an IT problem. It’s a business problem. Most people know their business; they know what their data should look like. They know what revenue shapes should look like, what’s normal for the business. If the business people aren’t there from a governance and stewardship standpoint, literally saying “does this data make sense?”, then without that partnership, forget it.
Gartner does a nice job of describing the digital landscape that marketers face today in its infographic below. To use technology as a differentiator, organizations need to get the most value from their data. The relationships between these technologies will make the difference between organizations that gain a competitive advantage from their operations and the laggards.
[i] Gartner Research, December 20, 2013, “Market Trends: The Rising Importance of the Business Buyer – Fact or Fiction?” Derry N. Finkeldey
[ii] Gartner Research, December 20, 2013, “Market Trends: The Rising Importance of the Business Buyer – Fact or Fiction?” Derry N. Finkeldey
[iii] Gartner blog, January 25, 2013, “CMOs: Are You Cheating on Your CIO?”, Jennifer Beck, Vice President & Gartner Fellow
[iv] Gartner Research, September 12, 2013, “Survey Analysis: Big Data Adoption in 2013 Shows Substance Behind the Hype,” Lisa Kart, Nick Heudecker, Frank Buytendijk
Rob Karel has been doing a nice job explaining Big Data, Metadata and other topics for Mom, so now I’d like to tackle another key group of stakeholders – your children. My kids have been asking me for years what I do at work. It hasn’t been easy to come up with an explanation that they can understand, so I usually just end up with something like “I go to meetings and stuff.” That works for a while, but it’s not very informative or inspiring. So if their friends ask “what does your dad do for work”, I can’t imagine what stories they make up. So here goes my attempt to explain to a sixth-grader what the job of a systems integration professional is.
I’m excited to share that, since its launch in January 2013, the GovernYourData.com community has been very well received. With over 4,700 unique visitors and nearing 600 registered members, many data management practitioners recognize it as a valuable go-to resource to support their data governance efforts. While maintaining our core objective of vendor- and product-neutrality, the site offers over 100 best practice blog posts from over 17 different contributors, shares the details on a dozen upcoming industry events, and has links to a wide variety of white papers, analyst research, recommended books, and other educational resources.
A front office as defined by Wikipedia is “a business term that refers to a company’s departments that come in contact with clients, including the marketing, sales, and service departments” while a back office is “tasks dedicated to running the company….without being seen by customers.” Wikipedia goes on to say that “Back office functions can be outsourced to consultants and contractors, including ones in other countries.” Data Management was once a back office activity but in recent years it has moved to the front office. What changed?
For some of you “old timers” in the IT industry, you will remember the days when we used to hand-code our own Database Management Systems. Of course today we just go out and buy a general purpose DBMS like MySQL, Oracle, dBASE, or IBM DB2 to name a few. Or, if we wind the clock back further, there was a time when we used to write our own operating systems. Today it comes with the hardware or we can buy an OS like UNIX, iOS, Linux, OS X, Windows, and IBM z/OS. And I can still remember hand-coding network protocols in the days before TCP/IP became ubiquitous. Today we select from UDP, HTTP, POP3, FTP, IMAP, RMI, SOAP and others.