Tag Archives: Best Practices
A few years back, there was a movement in some businesses to establish “data stewards” – individuals who would sit at the heart of the enterprise and make it their job to assure that the data consumed by the organization is of the highest possible quality, is secure, is contextually relevant, and is capable of interoperating across any applications that need to consume it. While the data steward concept came along when everything was relational and structured, these individuals are now earning their pay when it comes to managing the big data boom.
The rise of big data is creating more than simple headaches for data stewards; it is creating turf wars across enterprises. As pointed out in a recent article in The Wall Street Journal, there isn’t yet a lot of clarity as to who owns and cares for such data. Is it IT? Is it lines of business? Is it legal? There are arguments to be made for all jurisdictions.
In organizations these days, for example, marketing executives are generating, storing and analyzing large volumes of their own data within content management systems and social media analysis solutions. Many marketing departments even have their own IT budgets. Along with marketing, of course, everyone else within enterprises is seeking to pursue data analytics to better run their operations as well as foresee trends.
Typically, data has been under the domain of the CIO, the person who oversaw the collection, management and storage of information. In the Wall Street Journal article, however, it’s suggested that legal departments may be the best caretakers of big data, since big data poses a “liability exposure,” and legal departments are “better positioned to understand how to use big data without violating vendor contracts and joint-venture agreements, as well as keeping trade secrets.”
However, legal being legal, it’s likely that insightful data may end up getting locked away, never to see the light of day. Others may argue the IT department needs to retain control, but there again, IT isn’t trained to recognize information that may set the business on a new course.
Focusing on big data ownership isn’t just an academic exercise. The future of the business may depend on the ability to get on top of big data. Gartner, for one, predicts that within the next three years, at least a third of Fortune 100 organizations will experience an information crisis, “due to their inability to effectively value, govern and trust their enterprise information.”
This ability to “value, govern and trust” goes way beyond the traditional maintenance of data assets that IT has specialized in over the past few decades. As Gartner’s Andrew White put it: “Business leaders need to manage information, rather than just maintain it. When we say ‘manage,’ we mean ‘manage information for business advantage,’ as opposed to just maintaining data and its physical or virtual storage needs. In a digital economy, information is becoming the competitive asset to drive business advantage, and it is the critical connection that links the value chain of organizations.”
For starters, then, it is important that the business have full say over what data needs to be brought in, what data is important for further analysis, and what should be done with data once it gains in maturity. IT, however, needs to take a leadership role in assuring the data meets the organization’s quality standards, and that it is well-vetted so that business decision-makers can be confident in the data they are using.
The bottom line is that big data is a team effort, involving the whole enterprise. IT has a role to play, as does legal, as do the lines of business.
Research firm Gartner, Inc., sent shockwaves across the technology landscape when it forecast that CMOs will spend more on IT than CIOs by 2017[i]. The rationale? “We frequently hear our technology and service provider clients tell us they are dealing with business buyers more, and need to ‘speak the language.’ Gartner itself has fueled this inferno with assertions such as, ‘By 2017 the CMO will spend more on IT than the CIO’ (see ‘Webinar: By 2017 the CMO Will Spend More on IT Than the CIO’).”[ii] In the two years since Gartner first made that prediction, analysts and pundits have talked about a CIO/CMO battle for data supremacy — describing the two roles as “foes” inhabiting “separate worlds”[iii] that don’t even speak the same language.
But when CIOs are from Mars and CMOs are from Venus, their companies can end up with disjointed technologies that don’t play well together. The result? Security flaws, no single version of “truth,” and regulatory violations that can damage the business. The trick, then, is aligning the CIO and CMO planets.
Informatica’s CMO Marge Breya and CIO Eric Johnson show how they do it.
Q: There’s been a lot of talk lately about how CMOs are now the biggest users of data. That represents a shift in how CMOs and CIOs traditionally have worked together. How do you think the roles of the CMO and CIO need to mesh?
Eric: As I look across the lines of business, and evaluate the level of complexity, the volume of data and the systems we’re supporting, marketing is now by far the most complex part of the business we support. The systems that they have, the data that they have, has grown exponentially over the last four or five years. Now more than ever, [CMOs and CIOs are] very much attached at the hip. We have to be working in conjunction with one another.
Marge: Just to add to that, I’d say over the last five years, we’ve been attached to things like CRM systems or partner relationship systems. From a marketing standpoint, it has really been about management: How do you have visibility into what’s happening with the business? But over the last couple of years it’s become increasingly important to focus on the “R” word — the relationship: How do you look at a customer name and understand how it relates to their past buying behavior? As a result, you need to understand how information lives from system to system, all across a time series, in order to make really great decisions. The “relate” word is probably most important, at least in my team right now, and it’s not possible for me to relate data across the organization without having a great relationship with IT.
Q: So how often do you find yourselves talking together?
Eric: We talk to each other probably weekly, and I think our teams work together daily. There’s constant collaboration and making sure that we’re in sync. You hear about the CIO/CMO relationship; I think it should be an easy relationship. There’s so much going on technology-wise and data-wise that CMOs are becoming much more technically knowledgeable, and CIOs are starting to understand more and more of what’s going on in their business, so the relationship should be all about how you work together.
Marge: Of all the business partners in the company, Eric … helps us in marketing reimagine how marketing can be done. If the two of us can go back and forth, understand what’s working and what’s not working, and reimagine how we can be far more effective, or productive or know new things — to me that’s the judge of a healthy relationship between a CIO and a CMO. And luckily, we have that.
Q: It seems as if 2013 was the year of “big data.” But a Gartner survey[iv] said “The adoption is still at the early stages with fewer than 8% of all respondents indicating their organization has deployed big data solutions.” What do you think are the issues that are making it so difficult for companies?
Eric: The concept of big data is something companies want to get involved in. They want to understand how they can leverage this fast-growing volume of data from various sources. But the challenge is being able to understand what you’re looking for, and to know what kind of questions you have.
Marge: There’s a big focus on big data, almost for the sake of it in some cases. People get confused about whether it’s about the haystack or the needle. Having a haystack for the heck of it isn’t usually what’s done; it’s for a purpose. It’s important to understand what part of that haystack is important for what part of your business. How up to date is it? How much can you trust the data? How much can you make real decisions from it? And frankly, who should have access to it? So much of the data we have today is sensitive, affected by privacy laws and other kinds of regulations. I think big data is appropriately a great term right now, but more importantly, it’s not just about big data, it’s about great data. How are you going to use it? And how is it going to affect your business processes?
Eric: You could go down into a rat hole if you’re chasing something and you’re not really sure what you’re going to do with it.
Marge: On the other hand, you can explore years of behavior and maybe come up with a great predictive model for what a new buying signal scoring engine could look like.
Q: One promise of big data is the ability to pull in data from so many sources. That would suggest a real need for you two to work together to ensure the quality and the integrity of the data. How do you collaborate on those issues?
Eric: There’s definitely a lot of work that has to be done working with the CMO and the marketing organization: To sit down and understand where’s this data coming from, what’s it going to be used for, and making sure you have the people and processing components. Especially with the level of complexity we have, with all the data coming in from so many sources, making sure that we really map that out, understand the data and what it looks like and what some of the challenges could be. So it’s partnering very closely with marketing to understand those processes, understand what they want to do with the data, and then putting the people, the processes and the technology in place so you can trust the data and have a single source of truth.
Marge: You hit the nail on the head with “people, process and technology.” Often, folks think of data quality or accuracy as an IT problem. It’s a business problem. Most people know their business; they know what their data should look like. They know what revenue shapes should look like, what’s the norm for the business. If the business people aren’t there from a governance standpoint, from a stewardship standpoint — literally asking, “Does this data make sense?” — then forget it.
Gartner does a nice job of describing the digital landscape that marketers face today in its infographic below. In order to use technology as a differentiator, organizations need to get the most value from their data. The relationships between these technologies are going to make the difference between the organizations that gain a competitive advantage from their operations and the laggards.
[i] Gartner Research, December 20, 2013, “Market Trends: The Rising Importance of the Business Buyer – Fact or Fiction?” Derry N. Finkeldey
[ii] Gartner Research, December 20, 2013, “Market Trends: The Rising Importance of the Business Buyer – Fact or Fiction?” Derry N. Finkeldey
[iii] Gartner blog, January 25, 2013, “CMOs: Are You Cheating on Your CIO?”, Jennifer Beck, Vice President & Gartner Fellow
[iv] Gartner Research, September 12, 2013, “Survey Analysis: Big Data Adoption in 2013 Shows Substance Behind the Hype,” Lisa Kart, Nick Heudecker, Frank Buytendijk
Rob Karel has been doing a nice job explaining Big Data, Metadata and other topics for Mom, so now I’d like to tackle another key group of stakeholders – your children. My kids have been asking me for years what I do at work. It hasn’t been easy to come up with an explanation they can understand, so I usually just end up with something like “I go to meetings and stuff.” That works for a while, but it’s not very informative or inspiring. And if their friends ask, “What does your dad do for work?”, I can’t imagine what stories they make up. So here goes my attempt to explain the job of a systems integration professional to a sixth-grader. (more…)
I’m excited to share that, since its launch in January 2013, the GovernYourData.com community has been very well received. With over 4,700 unique visitors and nearing 600 registered members, many data management practitioners recognize it as a valuable go-to resource to support their data governance efforts. While maintaining our core objective of vendor- and product-neutrality, the site offers over 100 best practice blog posts from over 17 different contributors, shares the details on a dozen upcoming industry events, and has links to a wide variety of white papers, analyst research, recommended books, and other educational resources. (more…)
A front office as defined by Wikipedia is “a business term that refers to a company’s departments that come in contact with clients, including the marketing, sales, and service departments” while a back office is “tasks dedicated to running the company….without being seen by customers.” Wikipedia goes on to say that “Back office functions can be outsourced to consultants and contractors, including ones in other countries.” Data Management was once a back office activity but in recent years it has moved to the front office. What changed? (more…)
Some of you “old timers” in the IT industry will remember the days when we used to hand-code our own database management systems. Of course, today we just go out and buy a general-purpose DBMS like MySQL, Oracle, dBASE, or IBM DB2, to name a few. Or, if we wind the clock back further, there was a time when we used to write our own operating systems. Today the OS comes with the hardware, or we can buy one like UNIX, iOS, Linux, OS X, Windows, or IBM z/OS. And I can still remember hand-coding network protocols in the days before TCP/IP became ubiquitous. Today we select from UDP, HTTP, POP3, FTP, IMAP, RMI, SOAP and others. (more…)
Last week I described how Informatica Identity Resolution (IIR) can be used to match data from different lists or databases even when the data includes typos, translation mistakes, transcription errors, invalid abbreviations, and other errors. IIR has a wide range of use cases. Here are a few. (more…)
Whether you are establishing a new outsourced delivery model for your integration services or getting ready for the next round of contract negotiations with your existing supplier, you need a way to hold the supplier accountable – especially when it is an exclusive arrangement. Here are four key metrics that should be included in the multi-year agreement. (more…)
If you have been following publications in the Potential at Work Community or any number of LinkedIn discussions, such as this one on the DrJJ group (a think tank for information management best practices), you will have noticed the Agile methodology topic come up time and time again. For instance, check out the article Architect Your Way From Sluggish to Speed or the video Focus on Agility Adaptability. It hasn’t always been this way. For many years the architectural focus was on RASP.
In previous posts, we introduced the concept of Informatica ILM Nearline and discussed how it could help your business. To recapitulate: the major advantage of Informatica ILM Nearline is its superior data access performance, which enables a more aggressive approach to migrating huge volumes of data out of the online repository into an accessible, highly compressed archive (on inexpensive 2nd- and 3rd-tier storage infrastructure).
Today, I will consider the question of when an enterprise should implement Informatica ILM Nearline. Broadly speaking, such implementations fall into two categories: they either offer a “cure” for an existing data management problem or represent a proactive application of data best practices within the organization.
Cure or Prevention?
The “cure” type of implementation is typically associated with a data warehouse or business application “rescue” project. This is undertaken when the production system grows to the point where database size causes major performance problems and affects the ability to meet Service Level Agreements (SLAs) and manage business processes in a timely manner. In these kinds of situations, it is mainly the operations division of the organization that is affected, and it demands an immediate fix, which can take the form of an Informatica ILM Nearline implementation. The question here is: How quickly can the “cure” implementation stabilize performance and ensure satisfaction of SLAs?
On the other hand, the best-practice approach, much like current thinking on healthy living, focuses on prevention rather than cure. In this respect, best practices dictate that the Informatica ILM Nearline implementation should start as soon as some of the data in the production system becomes “infrequently accessed,” or “cold.” In data warehouses and data marts, where the most recent month or two is analyzed most often, this means data older than about 90 days. For transactional systems, the archiving cutoff may be a year or two, depending on the typical length of your business processes. The main idea is to keep production databases from inflating for no good business reason, and to “nearline” data as soon as possible without interrupting business operations or diminishing the value of your data. Ultimately, this protects the enterprise from an operational crisis arising from deteriorating performance and unmet SLAs.
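To make the cutoff rules of thumb concrete, here is a minimal sketch of how a team might encode them. The retention values, table name, and column name are hypothetical illustrations, not Informatica defaults; the only substance taken from the text is the 90-day and one-to-two-year guidance.

```python
from datetime import date, timedelta

# Hypothetical retention policies, in days. The 90-day and two-year cutoffs
# are the rules of thumb discussed above, not product defaults.
RETENTION_DAYS = {
    "data_warehouse": 90,    # analysis focuses on the most recent month or two
    "transactional": 730,    # business processes may span a year or two
}

def cold_data_cutoff(system_type, today=None):
    """Return the date before which data counts as 'cold' for this system."""
    today = today or date.today()
    return today - timedelta(days=RETENTION_DAYS[system_type])

def archive_candidate_sql(table, date_column, cutoff):
    """Build a rule-based SELECT identifying rows eligible for nearlining."""
    return (
        "SELECT * FROM {t} WHERE {c} < DATE '{d}'"
        .format(t=table, c=date_column, d=cutoff.isoformat())
    )

cutoff = cold_data_cutoff("data_warehouse", today=date(2014, 1, 1))
print(archive_candidate_sql("sales_facts", "posting_date", cutoff))
# -> SELECT * FROM sales_facts WHERE posting_date < DATE '2013-10-03'
```

In practice the cutoff would be reviewed with the business, since “cold” is a business judgment, not just an age threshold.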
In order to better judge the impact of these two approaches, it is important to understand the steps involved in the “nearlining” process. What do we find when we “dissect” the Informatica ILM Nearline process?
Dissecting the “Informatica ILM Nearline” Process
Informatica ILM Nearline involves multiple processes whose performance characteristics can significantly influence the speed at which data is migrated out of the online database. These processes are managed by Informatica’s integrated nearline solution coupled with an SAP Business Warehouse system:
- The first step is to lock the data targeted by the archiving process, in order to ensure that the data is not modified while the process is running. SAP Business Warehouse does this automatically when you execute a Data Archiving Process (DAP) for the cold data.
- Next comes the extraction of the data to be migrated. This is usually achieved via an SQL statement based on business rules for data migration. Often, the extraction can be performed using multiple extraction/consumer processes working in parallel.
- The next step is to secure the newly extracted data, so that it is recoverable.
- Then, the integrity of the extracted data must be validated (normally by comparing it to its online counterpart).
- Next, delete the online data that has been moved to nearline.
- Then, reorganize the tablespace of the deleted data.
- Finally, rebuild/reorganize the index associated with the online table from which data has been nearlined.
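The seven steps above can be sketched as a control flow. The classes and method names below are hypothetical in-memory stand-ins, not the Informatica or SAP BW API; the point of the sketch is the ordering, in particular that the online delete happens only after the nearline copy has been validated.

```python
class OnlineDB:
    """Toy stand-in for the production database."""
    def __init__(self, tables):
        self.tables = tables              # {table_name: [row, ...]}
        self.locked = set()

    def lock(self, table):                # 1. lock data targeted for archiving
        self.locked.add(table)

    def extract(self, table, is_cold):    # 2. extract rows matching the rule
        return [r for r in self.tables[table] if is_cold(r)]

    def delete(self, table, is_cold):     # 5. delete the nearlined rows online
        self.tables[table] = [r for r in self.tables[table] if not is_cold(r)]

    def housekeeping(self, table):        # 6 & 7. reorg tablespace, rebuild index
        self.locked.discard(table)

class NearlineStore:
    """Toy stand-in for the compressed nearline archive."""
    def __init__(self):
        self.archives = {}

    def write(self, table, rows):         # 3. secure the extracted data
        self.archives[table] = list(rows)

    def validate(self, table, rows):      # 4. compare against online counterpart
        return self.archives.get(table) == rows

def nearline_table(db, store, table, is_cold):
    """Run the steps in order; never delete online data before validation."""
    db.lock(table)
    rows = db.extract(table, is_cold)
    store.write(table, rows)
    if not store.validate(table, rows):
        raise RuntimeError("validation failed for " + table)
    db.delete(table, is_cold)
    db.housekeeping(table)

db = OnlineDB({"sales": [{"year": y} for y in (2011, 2012, 2013, 2014)]})
store = NearlineStore()
nearline_table(db, store, "sales", lambda r: r["year"] < 2013)
print(db.tables["sales"])   # only the warm rows remain online
```

A real implementation would, of course, run the extraction in parallel and push the housekeeping steps into a maintenance window, as discussed next.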
The Database Housekeeping process is often the slowest part of a Data Nearlining process, and thus can dictate the pace and scheduling of the implementation. In a production environment, the database housekeeping process is frequently decoupled from ongoing operations and performed over a weekend. It may be surprising to learn that deleting data can be a more expensive process than inserting it, but just ask an enterprise DBA about what is involved in deleting 1 TB from an Enterprise Data Warehouse and see what answer you get: for many, the task of fitting such a process into standard Batch Windows would be a nightmare.
So it is easy to see that adopting Informatica ILM Nearline early, as a best practice, can massively reduce not only the cost of the implementation but also the time required to perform it. The main recommendation to take away from this discussion is therefore: don’t wait too long to embark on your Informatica ILM Nearline strategy!
That’s it for today. In my next post, I will take up the topic of which data should be initially considered as a candidate for migration.