Category Archives: Business Impact / Benefits
This got me thinking: What is the biggest bottleneck in the delivery of business value today? I know I look at things from a data perspective, but data is the biggest bottleneck. Consider this prediction from Gartner:
“Gartner predicts organizations will spend one-third more on app integration in 2016 than they did in 2013. What’s more, by 2018, more than half the cost of implementing new large systems will be spent on integration. “
When we talk about application integration, we’re talking about moving data, synchronizing data, cleansing, data, transforming data, testing data. The question for architects and senior management is this: Do you have the Data Foundation for Execution you need to drive the business results you require to compete? The answer, unfortunately, for most companies is; No.
All too often data management is an add-on to larger application-based projects. The result is unconnected and non-interoperable islands of data across the organization. That simply is not going to work in the coming competitive environment. Here are a couple of quick examples:
- Many companies are looking to compete on their use of analytics. That requires collecting, managing, and analyzing data from multiple internal and external sources.
- Many companies are focusing on a better customer experience to drive their business. This again requires data from many internal sources, plus social, mobile and location-based data to be effective.
When I talk to architects about the business risks of not having a shared data architecture, and common tools and practices for enterprise data management, they “get” the problem. So why aren’t they addressing it? The issue is that they find that they are only funded to do the project they are working on and are dealing with very demanding timeframe requirements. They have no funding or mandate to solve the larger enterprise data management problem, which is getting more complex and brittle with each new un-connected project or initiative that is added to the pile.
Studies such as “The Data Directive” by The Economist show that organizations that actively manage their data are more successful. But, if that is the desired future state, how do you get there?
Changing an organization to look at data as the fuel that drives strategy takes hard work and leadership. It also takes a strong enterprise data architecture vision and strategy. For fresh thinking on the subject of building a data foundation for execution, see “Think Data-First to Drive Business Value” from Informatica.
* By the way, Informatica is proud to announce that we are now a sponsor of the MIT Center for Information Systems Research.
Last time I talked about how benchmark data can be used in IT and business use cases to illustrate the financial value of data management technologies. This time, let’s look at additional use cases, and at how to philosophically interpret the findings.
So here are some additional areas of investigation for justifying a data quality based data management initiative:
- Compliance or any audits data and report preparation and rebuttal (FTE cost as above)
- Excess insurance premiums on incorrect asset or party information
- Excess tax payments due to incorrect asset configuration or location
- Excess travel or idle time between jobs due to incorrect location information
- Excess equipment downtime (not revenue generating) or MTTR due to incorrect asset profile or misaligned reference data not triggering timely repairs
- Equipment location or ownership data incorrect splitting service cost or revenues incorrectly
- Party relationship data not tied together creating duplicate contacts or less relevant offers and lower response rates
- Lower than industry average cross-sell conversion ratio due to inability to match and link departmental customer records and underlying transactions and expose them to all POS channels
- Lower than industry average customer retention rate due to lack of full client transactional profile across channels or product lines to improve service experience or apply discounts
- Low annual supplier discounts due to incorrect or missing alternate product data or aggregated channel purchase data
I could go on forever, but allow me to touch on a sensitive topic – fines. Fines, or performance penalties by private or government entities, only make sense to bake into your analysis if they happen repeatedly in fairly predictable intervals and are “relatively” small per incidence. They should be treated like M&A activity. Nobody will buy into cost savings in the gazillions if a transaction only happens once every ten years. That’s like building a business case for a lottery win or a life insurance payout with a sample size of a family. Sure, if it happens you just made the case but will it happen…soon?
Use benchmarks and ranges wisely but don’t over-think the exercise either. It will become paralysis by analysis. If you want to make it super-scientific, hire an expensive consulting firm for a 3 month $250,000 to $500,000 engagement and have every staffer spend a few days with them away from their day job to make you feel 10% better about the numbers. Was that worth half a million dollars just in 3rd party cost? You be the judge.
In the end, you are trying to find out and position if a technology will fix a $50,000, $5 million or $50 million problem. You are also trying to gauge where key areas of improvement are in terms of value and correlate the associated cost (higher value normally equals higher cost due to higher complexity) and risk. After all, who wants to stand before a budget committee, prophesy massive savings in one area and then fail because it would have been smarter to start with something simpler and quicker win to build upon?
The secret sauce to avoiding this consulting expense and risk is a natural curiosity, willingness to do the legwork of finding industry benchmark data, knowing what goes into them (process versus data improvement capabilities) to avoid inappropriate extrapolation and using sensitivity analysis to hedge your bets. Moreover, trust an (internal?) expert to indicate wider implications and trade-offs. Most importantly, you have to be a communicator willing to talk to many folks on the business side and have criminal interrogation qualities, not unlike in your run-of-the-mill crime show. Some folks just don’t want to talk, often because they have ulterior motives (protecting their legacy investment or process) or hiding skeletons in the closet (recent bad performance). In this case, find more amenable people to quiz or pry the information out of these tough nuts, if you can.
Lastly; if you find ROI numbers, which appear astronomical at first, remember that leverage is a key factor. If a technical capability touches one application (credit risk scoring engine), one process (quotation), one type of transaction (talent management self-service), a limited set of people (procurement), the ROI will be lower than a technology touching multiple of each of the aforementioned. If your business model drives thousands of high-value (thousands of dollars) transactions versus ten twenty-million dollar ones or twenty-million one-dollar ones, your ROI will be higher. After all, consider this; retail e-mail marketing campaigns average an ROI of 578% (softwareprojects.com) and this with really bad data. Imagine what improved data can do just on that front.
I found massive differences between what improved asset data can deliver in a petrochemical or utility company versus product data in a fashion retailer or customer (loyalty) data in a hospitality chain. The assertion of cum hoc ergo propter hoc is a key assumption how technology delivers financial value. As long as the business folks agree or can fence in the relationship, you are on the right path.
What’s your best and worst job to justify someone giving you money to invest? Share that story.
Earlier this month, CNBC.com published its first ever R&D All-Stars: CNBC RQ 50, ranking the top 50 public companies by return on research and development investment. Coming in the top ten, and the first pure software play was Informatica, mentioned as first among great software companies like Google, Amazon, and Salesforce. CNBC.com is referencing a companion article by David Spiegel – Boring stocks that generate R&D heat-and profits. The article made an excellent point: When R&D productivity links R&D spending to corporate revenue growth and market value, it is a better gauge of the productivity of that spending.
Unlike other R&D lists or rankings, the RQ50 was less concerned with pure dollars than what the company actually did with it. The RQ50 measures increase in revenue as it relates to increase in R&D expenditures. Its methodology was provided by Professor Anne Marie Knott, of Washington University in St. Louis, who tracks and studies corporate R&D investment, and has found that the companies that regularly turn R&D into income typically place innovation at the forefront of the corporate mission and have a structure and culture that support it.
Informatica is on the list because its revenue gains between 2006 and 2013 correlate directly with its increased R&D investment over the same period. While the list specifically cites the 2013 figures, the result is due to a systematic and long-term strategic initiative to place innovation at the core of our business plan.
Informatica has innovated broadly across its product spectrum. I can personally speak to one area where it has invested smartly and made significant gains – Informatica Cloud. Informatica decided to make its initial investment in the cloud in 2006 and was early in the market with regards to cloud integration. In fact, back in 2006, very few of today’s well-known SaaS companies were even publicly traded. The most popular SaaS app today, Salesforce.com had revenues of just $309 million in FY2006 compared with over $4 billion in FY2014. Amazon EC2, one of the core services of Amazon Web Services (AWS) itself had only been announced in that year. Apart from EC2, Amazon only had six other services in 2006. In 2014, that number has ballooned to over 30.
In his article about the RQ50, Spiegel talks about how the companies on the list aren’t just listening to what customers want or need now. They’re also challenging themselves to come up with the things the market can use two or ten years into the future. In 2006, Informatica took the same approach with its initial investment in cloud integration.
For us, it started with an observation and then a commitment to the belief that we were at an inflection point with the cloud, and on the cusp of what was going to become a true megatrend that represented a huge opportunity for the integration industry. Informatica assembled a small, agile group made up of strong leaders with varying skills and experience pulled from different areas—sales, engineering, and product management — throughout the company. It also meant throwing away the traditional measures of success and identifying new and more appropriate metrics to benchmark our progress. And finally, it included partnering with like-minded companies like Salesforce and NetSuite initially, and later on with Amazon, and taking our core strength – on-premise data integration technology – and pivoting it into a new direction.
The result was the first iteration of the Informatica Cloud. It leveraged the fruit of our R&D investment – the Vibe Virtual Data Machine – to provide SaaS administrators and line of business IT with the ability to perform lightweight cloud integrations between their on-premise and cloud applications without the involvement of an integration developer. Subsequent work and innovation have continued along the same path, adding tools like drag-and-drop design interfaces and mapping wizards, with the end goal of giving line-of-business (LOB) IT, cloud application administrators and citizen integrators a single platform to perform all the integration patterns they require, on their timeline. Informatica Cloud has consistently delivered 2-3 releases every year, and is now already on Release 20. From originally starting out with Data Replication for Salesforce, the Cloud team added bigger and better functionality such as developing connectivity for over 100 applications and data protocols, opening up our integration services through REST APIs, going beyond integration by incorporating cloud master data management and cloud test data management capabilities, and most recently announcing optimized batch and real-time cloud integration under a single unified platform.
And it goes on to this day, with investments in new innovations and directions, like Informatica Project Springbok. With Project Springbok, we’re duplicating what we did with Informatica Cloud but this time for citizen integrators. We’re using our vast experiences working with customers and building cutting-edge technology IP over the last 20 years and enabling citizen integrators to harmonize data faster for better insights (and hopefully, less late nights writing spreadsheet formulas). What we do after Project Springbok is anyone’s guess, but wherever that is, it will be sure to put us on lists like the RQ 50 for some time to come.
Get connected. Be connected. Make connections. Find connections. The Internet of Things (IoT) is all about connecting people, processes, data and, as the name suggests, things. The recent social media frenzy surrounding the ALS Ice Bucket Challenge has certainly reminded everyone of the power of social media, the Internet and a willingness to answer a challenge. Fueled by personal and professional connections, the craze has transformed fund raising for at least one charity. Similarly, IoT may potentially be transformational to the business of the public sector, should government step up to the challenge.
Government is struggling with the concept and reality of how IoT really relates to the business of government, and perhaps rightfully so. For commercial enterprises, IoT is far more tangible and simply more fun. Gaming, televisions, watches, Google glasses, smartphones and tablets are all about delivering over-the-top, new and exciting consumer experiences. Industry is delivering transformational innovations, which are connecting people to places, data and other people at a record pace.
It’s time to accept the challenge. Government agencies need to keep pace with their commercial counterparts and harness the power of the Internet of Things. The end game is not to deliver new, faster, smaller, cooler electronics; the end game is to create solutions that let devices connecting to the Internet interact and share data, regardless of their location, manufacturer or format and make or find connections that may have been previously undetectable. For some, this concept is as foreign or scary as pouring ice water over their heads. For others, the new opportunity to transform policy, service delivery, leadership, legislation and regulation is fueling a transformation in government. And it starts with one connection.
One way to start could be linking previously siloed systems together or creating a golden record of all citizen interactions through a Master Data Management (MDM) initiative. It could start with a big data and analytics project to determine and mitigate risk factors in education or linking sensor data across multiple networks to increase intelligence about potential hacking or breaches. Agencies could stop waste, fraud and abuse before it happens by linking critical payment, procurement and geospatial data together in real time.
This is the Internet of Things for government. This is the challenge. This is transformation.
You probably know this already, but I’m going to say it anyway: It’s time you changed your infrastructure. I say this because most companies are still running infrastructure optimized for ERP, CRM and other transactional systems. That’s all well and good for running IT-intensive, back-office tasks. Unfortunately, this sort of infrastructure isn’t great for today’s business imperatives of mobility, cloud computing and Big Data analytics.
Virtually all of these imperatives are fueled by information gleaned from potentially dozens of sources to reveal our users’ and customers’ activities, relationships and likes. Forward-thinking companies are using such data to find new customers, retain existing ones and increase their market share. The trick lies in translating all this disparate data into useful meaning. And to do that, IT needs to move beyond focusing solely on transactions, and instead shine a light on the interactions that matter to their customers, their products and their business processes.
They need what we at Informatica call a “Data First” perspective. You can check out my first blog first about being Data First here.
A Data First POV changes everything from product development, to business processes, to how IT organizes itself and —most especially — the impact IT has on your company’s business. That’s because cloud computing, Big Data and mobile app development shift IT’s responsibilities away from running and administering equipment, onto aggregating, organizing and improving myriad data types pulled in from internal and external databases, online posts and public sources. And that shift makes IT a more-empowering force for business change. Think about it: The ability to connect and relate the dots across data from multiple sources finally gives you real power to improve entire business processes, departments and organizations.
I like to say that the role of IT is now “big I, little t,” with that lowercase “t” representing both technology and transactions. But that role requires a new set of priorities. They are:
- Think about information infrastructure first and application infrastructure second.
- Create great data by design. Architect for connectivity, cleanliness and security. Check out the eBook Data Integration for Dummies.
- Optimize for speed and ease of use – SaaS and mobile applications change often. Click here to try Informatica Cloud for free for 30 days.
- Make data a team sport. Get tools into your users’ hands so they can prepare and interact with it.
I never said this would be easy, and there’s no blueprint for how to go about doing it. Still, I recognize that a little guidance will be helpful. In a few weeks, Informatica’s CIO Eric Johnson and I will talk about how we at Informatica practice what we preach.
Malcolm Gladwell wrote an article in The New Yorker magazine in January, 2007 entitled “Open Secrets.” In the article, he pointed out that a national-security expert had famously made a distinction between puzzles and mysteries.
Osama bin Laden’s whereabouts were, for many years, a puzzle. We couldn’t find him because we didn’t have enough information. The key to the puzzle, it was assumed, would eventually come from someone close to bin Laden, and until we could find that source, bin Laden would remain at large. In fact, that’s precisely what happened. Al-Qaida’s No. 3 leader, Khalid Sheikh Mohammed, gave authorities the nicknames of one of bin Laden’s couriers, who then became the linchpin to the CIA’s efforts to locate Bin Laden.
By contrast, the problem of what would happen in Iraq after the toppling of Saddam Hussein was a mystery. It wasn’t a question that had a simple, factual answer. Mysteries require judgments and the assessment of uncertainty, and the hard part is not that we have too little information but that we have too much.
This was written before “Big Data” was a household word and it begs the very interesting question of whether organizations and corporations that are, by anyone’s standards, totally deluged with data, are facing puzzles or mysteries. Consider the amount of data that a company like Western Union deals with.
Western Union is a 160-year old company. Having built scale in the money transfer business, the company is in the process of evolving its business model by enabling the expansion of digital products, growth of web and mobile channels, and a more personalized online customer experience. Sounds good – but get this: the company processes more than 29 transactions per seconds on average. That’s 242 million consumer-to-consumer transactions and 459 million business payments in a year. Nearly a billion transactions – a billion! As my six-year-old might say, that number is big enough “to go to the moon and back.” Layer on top of that the fact that the company operates in 200+ countries and territories, and conducts business in 120+ currencies. Senior Director and Head of Engineering Abhishek Banerjee has said, “The data is speaking to us. We just need to react to it.” That implies a puzzle, not a mystery – but only if data scientists are able to conduct statistical modeling and predictive analysis, systematically noting trends in sending and receiving behaviors. Check out what Banerjee and Western Union CTO Sanjay Saraf have to say about it here.
Or consider General Electric’s aggressive and pioneering move into what’s dubbed as the industrial internet. In a white paper entitled “The Case for an Industrial Big Data Platform: Laying the Groundwork for the New Industrial Age,” GE reveals some of the staggering statistics related to the industrial equipment that it manufactures and supports (services comprise 75% of GE’s bottom line):
- A modern wind turbine contains approximately 50 sensors and control loops which collect data every 40 milliseconds.
- A farm controller then receives more than 30 signals from each turbine at 160-millisecond intervals.
- At every one-second interval, the farm monitoring software processes 200 raw sensor data points with various associated properties with each turbine.
Phew! I’m no electricity operations expert, and you probably aren’t either. And most of us will get no further than simply wrapping our heads around the simple fact that GE turbines are collecting a LOT of data. But what the paper goes on to say should grab your attention in a big way: “The key to success for this wind farm lies in the ability to collect and deliver the right data, at the right velocity, and in the right quantities to a wide set of well-orchestrated analytics.” And the paper goes on to recommend that anyone involved in the Industrial Internet revolution strongly consider its talent requirements, with the suggestion that Chief Data officers and/or Data Scientists may be the next critical hires.
Which brings us back to Malcolm Gladwell. In the aforementioned article, Gladwell goes on to pull apart the Enron debacle, and argues that it was a prime example of the perils of too much information. “If you sat through the trial of (former CEO) Jeffrey Skilling, you’d think that the Enron scandal was a puzzle. The company, the prosecution said, conducted shady side deals that no one quite understood. Senior executives withheld critical information from investors…We were not told enough—the classic puzzle premise—was the central assumption of the Enron prosecution.” But in fact, that was not true. Enron employed complicated – but perfectly legal–accounting techniques used by companies that engage in complicated financial trading. Many journalists and professors have gone back and looked at the firm’s regulatory filings, and have come to the conclusion that, while complex and difficult to identify, all of the company’s shenanigans were right there in plain view. Enron cannot be blamed for covering up the existence of its side deals. It didn’t; it disclosed them. As Gladwell summarizes:
“Puzzles are ‘transmitter-dependent’; they turn on what we are told. Mysteries are ‘receiver dependent’; they turn on the skills of the listener.”
I would argue that this extremely complex, fast moving and seismic shift that we call Big Data will favor those who have developed the ability to attune, to listen and make sense of the data. Winners in this new world will recognize what looks like an overwhelming and intractable mystery, and break that mystery down into small and manageable chunks and demystify the landscape, to uncover the important nuggets of truth and significance.
A mid-sized insurer recently approached our team for help. They wanted to understand how they fell short in making their case to their executives. Specifically, they proposed that fixing their customer data was key to supporting the executive team’s highly aggressive 3-year growth plan. (This plan was 3x today’s revenue). Given this core organizational mission – aside from being a warm and fuzzy place to work supporting its local community – the slam dunk solution to help here is simple. Just reducing the data migration effort around the next acquisition or avoiding the ritual annual, one-off data clean-up project already pays for any tool set enhancing data acquisitions, integration and hygiene. Will it get you to 3x today’s revenue? It probably won’t. What will help are the following:
Hard cost avoidance via software maintenance or consulting elimination is the easy part of the exercise. That is why CFOs love it and focus so much on it. It is easy to grasp and immediate (aka next quarter).
Soft cost reduction, like staff redundancies are a bit harder. Despite them being viable, in my experience very few decision makers want work on a business case to lay off staff. My team had one so far. They look at these savings as freed up capacity, which can be re-deployed more productively. Productivity is also a bit harder to quantify as you typically have to understand how data travels and gets worked on between departments.
However, revenue effects are even harder and esoteric to many people as they include projections. They are often considered “soft” benefits, although they outweigh the other areas by 2-3 times in terms of impact. Ultimately, every organization runs their strategy based on projections (see the insurer in my first paragraph).
The hardest to quantify is risk. Not only is it based on projections – often from a third party (Moody’s, TransUnion, etc.) – but few people understand it. More often, clients don’t even accept you investigating this area if you don’t have an advanced degree in insurance math. Nevertheless, risk can generate extra “soft” cost avoidance (beefing up reserve account balance creating opportunity cost) but also revenue (realizing a risk premium previously ignored). Often risk profiles change due to relationships, which can be links to new “horizontal” information (transactional attributes) or vertical (hierarchical) from parent-child relationships of an entity and the parent’s or children’s transactions.
Given the above, my initial advice to the insurer would be to look at the heartache of their last acquisition, use a benchmark for IT productivity from improved data management capabilities (typically 20-26% – Yankee Group) and there you go. This is just the IT side so consider increasing the upper range by 1.4x (Harvard Business School) as every attribute change (last mobile view date) requires additional meetings on a manager, director and VP level. These people’s time gets increasingly more expensive. You could also use Aberdeen’s benchmark of 13hrs per average master data attribute fix instead.
You can also look at productivity areas, which are typically overly measured. Let’s assume a call center rep spends 20% of the average call time of 12 minutes (depending on the call type – account or bill inquiry, dispute, etc.) understanding
- Who the customer is
- What he bought online and in-store
- If he tried to resolve his issue on the website or store
- How he uses equipment
- What he cares about
- If he prefers call backs, SMS or email confirmations
- His response rate to offers
- His/her value to the company
If he spends these 20% of every call stringing together insights from five applications and twelve screens instead of one frame in seconds, which is the same information in every application he touches, you just freed up 20% worth of his hourly compensation.
Then look at the software, hardware, maintenance and ongoing management of the likely customer record sources (pick the worst and best quality one based on your current understanding), which will end up in a centrally governed instance. Per DAMA, every duplicate record will cost you between $0.45 (party) and $0.85 (product) per transaction (edit touch). At the very least each record will be touched once a year (likely 3-5 times), so multiply your duplicated record count by that and you have your savings from just de-duplication. You can also use Aberdeen’s benchmark of 71 serious errors per 1,000 records, meaning the chance of transactional failure and required effort (% of one or more FTE’s daily workday) to fix is high. If this does not work for you, run a data profile with one of the many tools out there.
If standardization of records (zip codes, billing codes, currency, etc.) is the problem, ask your business partner how many customer contacts (calls, mailing, emails, orders, invoices or account statements) fail outright and/or require validation because of these attributes. Once again, if you apply the productivity gains mentioned earlier, there are you savings. If you look at the number of orders that get delayed in form of payment or revenue recognition and the average order amount by a week or a month, you were just able to quantify how much profit (multiply by operating margin) you would be able to pull into the current financial year from the next one.
The same is true for speeding up the introduction or a new product or a change to it generating profits earlier. Note that looking at the time value of funds realized earlier is too small in most instances especially in the current interest environment.
If emails bounce back or snail mail gets returned (no such address, no such name at this address, no such domain, no such user at this domain), e(mail) verification tools can help reduce the bounces. If every mail piece (forget email due to the miniscule cost) costs $1.25 – and this will vary by type of mailing (catalog, promotion post card, statement letter), incorrect or incomplete records are wasted cost. If you can, use fully loaded print cost incl. 3rd party data prep and returns handling. You will never capture all cost inputs but take a conservative stab.
If it was an offer, reduced bounces should also improve your response rate (also true for email now). Prospect mail response rates are typically around 1.2% (Direct Marketing Association), whereas phone response rates are around 8.2%. If you know that your current response rate is half that (for argument sake) and you send out 100,000 emails of which 1.3% (Silverpop) have customer data issues, then fixing 81-93% of them (our experience) will drop the bounce rate to under 0.3% meaning more emails will arrive/be relevant. This in turn multiplied by a standard conversion rate (MarketingSherpa) of 3% (industry and channel specific) and average order (your data) multiplied by operating margin gets you a benefit value for revenue.
If product data and inventory carrying cost or supplier spend are your issue, find out how many supplier shipments you receive every month, the average cost of a part (or cost range), apply the Aberdeen master data failure rate (71 in 1,000) to use cases around lack of or incorrect supersession or alternate part data, to assess the value of a single shipment’s overspend. You can also just use the ending inventory amount from the 10-k report and apply 3-10% improvement (Aberdeen) in a top-down approach. Alternatively, apply 3.2-4.9% to your annual supplier spend (KPMG).
You could also investigate the expediting or return cost of shipments in a period due to incorrectly aggregated customer forecasts, wrong or incomplete product information or wrong shipment instructions in a product or location profile. Apply Aberdeen’s 5% improvement rate and there you go.
Consider that a North American utility told us that just fixing their 200 Tier1 suppliers’ product information achieved an increase in discounts from $14 to $120 million. They also found that fixing one basic out of sixty attributes in one part category saves them over $200,000 annually.
So what ROI percentages would you find tolerable or justifiable for, say an EDW project, a CRM project, a new claims system, etc.? What would the annual savings or new revenue be that you were comfortable with? What was the craziest improvement you have seen coming to fruition, which nobody expected?
Next time, I will add some more “use cases” to the list and look at some philosophical implications of averages.
This post was written by guest author Dale Kim, Director of Industry Solutions at MapR Technologies, a valued Informatica partner that provides a distribution for Apache Hadoop that ensures production success for its customers.
Apache Hadoop is growing in popularity as the foundation for an enterprise data hub. An Enterprise Data Hub (EDH) extends and optimizes the traditional data warehouse model by adding complementary big data technologies. It focuses your data warehouse on high value data by reallocating less frequently used data to an alternative platform. It also aggregates data from previously untapped sources to give you a more complete picture of data.
So you have your data, your warehouses, your analytical tools, your Informatica products, and you want to deploy an EDH… now what about Hadoop?
Requirements for Hadoop in an Enterprise Data Hub
Let’s look at characteristics required to meet your EDH needs for a production environment:
You already expect these from your existing enterprise deployments. Shouldn’t you hold Hadoop to the same standards? Let’s discuss each topic:
Enterprise-grade is about the features that keep a system running, i.e., high availability (HA), disaster recovery (DR), and data protection. HA helps a system run even when components (e.g., computers, routers, power supplies) fail. In Hadoop, this means no downtime and no data loss, but also no work loss. If a node fails, you still want jobs to run to completion. DR with remote replication or mirroring guards against site-wide disasters. Mirroring needs to be consistent to ensure recovery to a known state. Using file copy tools won’t cut it. And data protection, using snapshots, lets you recover from data corruption, especially from user or application errors. As with DR replicas, snapshots must be consistent, in that they must reflect the state of the data at the time the snapshot was taken. Not all Hadoop distributions can offer this guarantee.
Hadoop interoperability is an obvious necessity. Features like a POSIX-compliant, NFS-accessible file system let you reuse existing, file system-based applications on Hadoop data. Support for existing tools lets your developers get up to speed quickly. And integration with REST APIs enables easy, open connectivity with other systems.
You should be able to logically divide clusters to support different use cases, job types, user group, and administrators as needed. To avoid a complex, multi-cluster setup, choose a Hadoop distribution with multi-tenancy capabilities to simplify the architecture. This gives you less risk for error and no data/effort duplication.
Security should be a priority to protect against the exposure of confidential data. You should assess how you’ll handle authentication (with or without Kerberos), authorization (access controls), over-the-network encryption, and auditing. Many of these features should be native to your Hadoop distribution, and there are also strong security vendors that provide technologies for securing Hadoop.
Any large scale deployment needs fast read, write, and update capabilities. Hadoop can support the operational requirements of an EDH with integrated, in-Hadoop databases like Apache HBase™ and Accumulo™, as well as MapR-DB (the MapR NoSQL database). This in-Hadoop model helps to simplify the overall EDH architecture.
Using Hadoop as a foundation for an EDH is a powerful option for businesses. Choosing the correct Hadoop distribution is the key to deploying a successful EDH. Be sure not to take shortcuts – especially in a production environment – as you will want to hold your Hadoop platform to the same high expectations you have of your existing enterprise systems.
“Inaccurate, inconsistent and disconnected supplier information prohibits us from doing accurate supplier spend analysis, leveraging discounts, comparing and choosing the best prices, and enforcing corporate standards.”
This is quotation from a manufacturing company executive. It illustrates the negative impact that poorly managed supplier information can have on a company’s ability to cut costs and achieve revenue targets.
Many supply chain and procurement teams at large companies struggle to see the total relationship they have with suppliers across product lines, business units and regions. Why? Supplier information is scattered across dozens or hundreds of Enterprise Resource Planning (ERP) and Accounts Payable (AP) applications. Too much valuable time is spent manually reconciling inaccurate, inconsistent and disconnected supplier information in an effort to see the big picture. All this manual effort results in back office administrative costs that are higher than they should be.
Do these quotations from supply chain leaders and their teams sound familiar?
“We have 500,000 suppliers. 15-20% of our supplier records are duplicates. 5% are inaccurate.”
“I get 100 e-mails a day questioning which supplier to use.”
“To consolidate vendor reporting for a single supplier between divisions is really just a guess.”
“Every year 1099 tax mailings get returned to us because of invalid addresses, and we play a lot of Schedule B fines to the IRS.”
“Two years ago we spent a significant amount of time and money cleansing supplier data. Now we are back where we started.”
Please join me and Naveen Sharma, Director of the Master Data Management (MDM) Practice at Cognizant for a Webinar, Supercharge Your Supply Chain Applications with Better Supplier Information, on Tuesday, July 29th at 11 am PT.
During the Webinar, we’ll explain how better managing supplier information can help you achieve the following goals:
- Accelerate supplier onboarding
- Mitiate the risk of supply disruption
- Better manage supplier performance
- Streamline billing and payment processes
- Improve supplier relationship management and collaboration
- Make it easier to evaluate non-compliance with Service Level Agreements (SLAs)
- Decrease costs by negotiating favorable payment terms and SLAs
I hope you can join us for this upcoming Webinar!
By now, the business benefits of effectively leveraging big data have become well known. Enhanced analytical capabilities, greater understanding of customers, and ability to predict trends before they happen are just some of the advantages. But big data doesn’t just appear and present itself. It needs to be made tangible to the business. All too often, executives are intimidated by the concept of big data, thinking the only way to work with it is to have an advanced degree in statistics.
There are ways to make big data more than an abstract concept that can only be loved by data scientists. Four of these ways were recently covered in a report by David Stodder, director of business intelligence research for TDWI, as part of TDWI’s special report on What Works in Big Data.
The time is ripe for experimentation with real-time, interactive analytics technologies, Stodder says. The next major step in the movement toward big data is enabling real-time or near-real-time delivery of information. Real-time data has been a challenge with BI data for years, with limited success, Stodder says. The good news is that Hadoop framework, originally built for batch processing, now includes interactive querying and streaming applications, he reports. This opens the way for real-time processing of big data.
Design for self-service
Interest in self-service access to analytical data continues to grow. “Increasing users’ self-reliance and reducing their dependence on IT are broadly shared goals,” Stodder says. “Nontechnical users—those not well versed in writing queries or navigating data schemas—are requesting to do more on their own.” There is an impressive array of self-service tools and platforms now appearing on the market. “Many tools automate steps for underlying data access and integration, enabling users to do more source selection and transformation on their own, including for data from Hadoop files,” he says. “In addition, new tools are hitting the market that put greater emphasis on exploratory analytics over traditional BI reporting; these are aimed at the needs of users who want to access raw big data files, perform ad-hoc requests routinely, and invoke transformations after data extraction and loading (that is, ELT) rather than before.”
Nothing gets a point across faster than having data points visually displayed – decision-makers can draw inferences within seconds. “Data visualization has been an important component of BI and analytics for a long time, but it takes on added significance in the era of big data,” Stodder says. “As expressions of meaning, visualizations are becoming a critical way for users to collaborate on data; users can share visualizations linked to text annotations as well as other types of content, such as pictures, audio files, and maps to put together comprehensive, shared views.”
Unify views of data
Users are working with many different data types these days, and are looking to bring this information into a single view – “rather than having to move from one interface to another to view data in disparate silos,” says Stodder. Unstructured data – graphics and video files – can also provide a fuller context to reports, he adds.