Category Archives: Big Data

If Data Projects Weather, Why Not Corporate Revenue?

Every fall Informatica sales leadership puts together its strategy for the following year.  The revenue target is typically a function of the number of sellers, the addressable market size and key accounts in a given territory, average spend and conversion rate given prior years’ experience, etc.  This straight forward math has not changed in probably decades, but it assumes that the underlying data are 100% correct. This data includes:

  • Number of accounts with a decision-making location in a territory
  • Related IT spend and prioritization
  • Organizational characteristics like legal ownership, industry code, credit score, annual report figures, etc.
  • Key contacts, roles and sentiment
  • Prior interaction (campaign response, etc.) and transaction (quotes, orders, payments, products, etc.) history with the firm

Every organization, no matter if it is a life insurer, a pharmaceutical manufacturer, a fashion retailer or a construction company knows this math and plans on getting somewhere above 85% achievement of the resulting target.  Office locations, support infrastructure spend, compensation and hiring plans are based on this and communicated.

data revenue

We Are Not Modeling the Global Climate Here

So why is it that when it is an open secret that the underlying data is far from perfect (accurate, current and useful) and corrupts outcomes, too few believe that fixing it has any revenue impact?  After all, we are not projecting the climate for the next hundred years here with a thousand plus variables.

If corporate hierarchies are incorrect, your spend projections based on incorrect territory targets, credit terms and discount strategy will be off.  If every client touch point does not have a complete picture of cross-departmental purchases and campaign responses, your customer acquisition cost will be too high as you will contact the wrong prospects with irrelevant offers.  If billing, tax or product codes are incorrect, your billing will be off.  This is a classic telecommunication example worth millions every month.  If your equipment location and configuration is wrong, maintenance schedules will be incorrect and every hour of production interruption will cost an industrial manufacturer of wood pellets or oil millions.

Also, if industry leaders enjoy an upsell ratio of 17%, and you experience 3%, data (assuming you have no formal upsell policy as it violates your independent middleman relationship) data will have a lot to do with it.

The challenge is not the fact that data can create revenue improvements but how much given the other factors: people and process.

Every industry laggard can identify a few FTEs who spend 25% of their time putting one-off data repositories together for some compliance, M&A customer or marketing analytics.  Organic revenue growth from net-new or previously unrealized revenue is what the focus of any data management initiative should be.  Don’t get me wrong; purposeful recruitment (people), comp plans and training (processes) are important as well.  Few people doubt that people and process drives revenue growth.  However, few believe data being fed into these processes has an impact.

This is a head scratcher for me. An IT manager at a US upstream oil firm once told me that it would be ludicrous to think data has a revenue impact.  They just fixed data because it is important so his consumers would know where all the wells are and which ones made a good profit.  Isn’t that assuming data drives production revenue? (Rhetorical question)

A CFO at a smaller retail bank said during a call that his account managers know their clients’ needs and history. There is nothing more good data can add in terms of value.  And this happened after twenty other folks at his bank including his own team delivered more than ten use cases, of which three were based on revenue.

Hard cost (materials and FTE) reduction is easy, cost avoidance a leap of faith to a degree but revenue is not any less concrete; otherwise, why not just throw the dice and see how the revenue will look like next year without a central customer database?  Let every department have each account executive get their own data, structure it the way they want and put it on paper and make hard copies for distribution to HQ.  This is not about paper versus electronic but the inability to reconcile data from many sources on paper, which is a step above electronic.

Have you ever heard of any organization move back to the Fifties and compete today?  That would be a fun exercise.  Thoughts, suggestions – I would be glad to hear them?

FacebookTwitterLinkedInEmailPrintShare
Posted in Banking & Capital Markets, Big Data, Business Impact / Benefits, Business/IT Collaboration, Data Governance, Data Integration, Data Quality, Data Warehousing, Enterprise Data Management, Governance, Risk and Compliance, Master Data Management, Product Information Management | Tagged , | Leave a comment

Has Hadoop Crossed The Chasm? Thoughts About Strata 2014

Well, it’s been a little over a week since the Strata conference so I thought I should give some perspective on what I learned.  I think it was summed up at my first meeting, on the first morning of the conference. The meeting was with a financial services company who has significance experience with Hadoop. The first words out of their mouths were, “Hadoop is hard.” 

Later in the conference, after a Western Union representative spoke about their Hadoop deployment, they were mobbed by end user questions and comments. The audience was thrilled to hear about an actual operational deployment: Not just a sandbox deployment, but an actual operational Hadoop deployment from a company that is over 160 years old.

The market is crossing the chasm from early adopters who love to hand code (and the macho culture of proving they can do the hard stuff) to more mainstream companies that want to use technology to solve real problems. These mainstream companies aren’t afraid to admit that it is still hard. For the early adopters, nothing is ever hard. They love hard. But the mainstream market doesn’t view it that way.  They don’t want to mess around in the bowels of enabling technology.  They want to use the technology to solve real problems.  The comment from the financial services company represents the perspective of the vast majority of organizations. It is a sign Hadoop is hitting the mainstream market.

More proof we have moved to a new phase?  Cloudera announced they were going from shipping six versions a year down to just three.  I have been saying for awhile that we will know that Hadoop is real when the distribution vendors stop shipping every 2 months and go to a more typical enterprise software release schedule.  It isn’t that Hadoop engineering efforts have slowed down.  It is still evolving very rapidly.  It is just that real customers are telling the Hadoop suppliers that they won’t upgrade as fast because they have real business projects running and they can’t do it.  So for those of you who are disappointed by the “slow down,” don’t be.  To me, this is news that Hadoop is reaching critical mass.

Technology is closing the gap to allow organizations to use Hadoop as a platform without having to actually have an army of Hadoop experts.  That is what Informatica does for data parsing, data integration,  data quality and data lineage (recent product announcement).  In fact, the number one demo at the Informatica booth at Strata was the demonstration of “end to end” data lineage for data, going from the original source all the way to how it was loaded and then transformed within Hadoop.  This is purely an enterprise-class capability that becomes more interesting and important when you actually go into true production.

Informatica’s goal is to hide the complexity of Hadoop so companies can get on with the work of using the platform with the skills they already have in house.  And from what I saw from all of the start-up companies that were doing similar things for data exploration and analytics and all the talk around the need for governance, we are finally hitting the early majority of the market.  So, for those of you who still drop down to the underlying UNIX OS that powers a Mac, the rest of us will keep using the GUI.   To the extent that there are “fit for purpose” GUIs on top of Hadoop, the technology will get used by a much larger market.

So congratulations Hadoop, you have officially crossed the chasm!

P.S. See me on theCUBE talking about a similar topic at: youtu.be/oC0_5u_0h2Q

FacebookTwitterLinkedInEmailPrintShare
Posted in Banking & Capital Markets, Big Data, Hadoop, Informatica Events | Tagged , , , | Leave a comment

Fast and Fasterer: Screaming Streaming Data on Hadoop

Hadoop

Guest Post by Dale Kim

This is a guest blog post, written by Dale Kim, Director of Product Marketing at MapR Technologies.

Recent published research shows that “faster” is better than “slower.” The point, ladies and gentlemen, is that speed, for lack of a better word, is good. But granted, you won’t always have the need for speed. My Lamborghini is handy when I need to elude the Bakersfield fuzz on I-5, but it does nothing for my Costco trips. There, I go with capacity and haul home my 30-gallon tubs of ketchup with my Ford F150. (Note: this is a fictitious example, I don’t actually own an F150.)

But if speed is critical, like in your data streaming application, then Informatica Vibe Data Stream and the MapR Distribution including Apache™ Hadoop® are the technologies to use together. But since Vibe Data Stream works with any Hadoop distribution, my discussion here is more broadly applicable. I first discussed this topic earlier this year during my presentation at Informatica World 2014. In that talk, I also briefly described architectures that include streaming components, like the Lambda Architecture and enterprise data hubs. I recommend that any enterprise architect should become familiar with these high-level architectures.

Data streaming deals with a continuous flow of data, often at a fast rate. As you might’ve suspected by now, Vibe Data Stream, based on the Informatica Ultra Messaging technology, is great for that. With its roots in high speed trading in capital markets, Ultra Messaging quickly and reliably gets high value data from point A to point B. Vibe Data Stream adds management features to make it consumable by the rest of us, beyond stock trading. Not surprisingly, Vibe Data Stream can be used anywhere you need to quickly and reliably deliver data (just don’t use it for sharing your cat photos, please), and that’s what I discussed at Informatica World. Let me discuss two examples I gave.

Large Query Support. Let’s first look at “large queries.” I don’t mean the stuff you type on search engines, which are typically no more than 20 characters. I’m referring to an environment where the query is a huge block of data. For example, what if I have an image of an unidentified face, and I want to send it to a remote facial recognition service and immediately get the identity? The image would be the query, the facial recognition system could be run on Hadoop for fast divide-and-conquer processing, and the result would be the person’s name. There are many similar use cases that could leverage a high speed, reliable data delivery system along with a fast processing platform, to get immediate answers to a data-heavy question.

Data Warehouse Onload. For another example, we turn to our old friend the data warehouse. If you’ve been following all the industry talk about data warehouse optimization, you know pumping high speed data directly into your data warehouse is not an efficient use of your high value system. So instead, pipe your fast data streams into Hadoop, run some complex aggregations, then load that processed data into your warehouse. And you might consider freeing up large processing jobs from your data warehouse onto Hadoop. As you process and aggregate that data, you create a data flow cycle where you return enriched data back to the warehouse. This gives your end users efficient analysis on comprehensive data sets.

Hopefully this stirs up ideas on how you might deploy high speed streaming in your enterprise architecture. Expect to see many new stories of interesting streaming applications in the coming months and years, especially with the anticipated proliferation of internet-of-things and sensor data.

To learn more about Vibe Data Stream you can find it on the Informatica Marketplace .


 

FacebookTwitterLinkedInEmailPrintShare
Posted in Big Data, Business Impact / Benefits, Data Services, Hadoop | Tagged , , , , | Leave a comment

Ebola: Why Big Data Matters

Ebola: Why Big Data Matters

Ebola: Why Big Data Matters

The Ebola virus outbreak in West Africa has now claimed more than 4,000 lives and has entered the borders of the United States. While emergency response teams, hospitals, charities, and non-governmental organizations struggle to contain the virus, could big data analytics help?

A growing number of Data Scientists believe so.

If you recall the Cholera outbreak of Haiti in 2010 after the tragic earthquake, a joint research team from Karolinska Institute in Sweden and Columbia University in the US analyzed calling data from two million mobile phones on the Digicel Haiti network. This enabled the United Nations and other humanitarian agencies to understand population movements during the relief operations and during the subsequent cholera outbreak. They could allocate resources more efficiently and identify areas at increased risk of new cholera outbreaks.

Mobile phones, widely owned even in the poorest countries in Africa. Cell phones are also a rich source of data irrespective of which region where other reliable sources are sorely lacking. Senegal’s Orange Telecom provided Flowminder, a Swedish non-profit organization, with anonymized voice and text data from 150,000 mobile phones. Using this data, Flowminder drew up detailed maps of typical population movements in the region.

Today, authorities use this information to evaluate the best places to set up treatment centers, check-posts, and issue travel advisories in an attempt to contain the spread of the disease.

The first drawback is that this data is historic. Authorities really need to be able to map movements in real time especially since people’s movements tend to change during an epidemic.

The second drawback is, the scope of data provided by Orange Telecom is limited to a small region of West Africa.

Here is my recommendation to the Centers for Disease Control and Prevention (CDC):

  1. Increase the area for data collection to the entire region of Western Africa which covers over 2.1 million cell-phone subscribers.
  2. Collect mobile phone mast activity data to pinpoint where calls to helplines are mostly coming from, draw population heat maps, and population movement. A sharp increase in calls to a helpline is usually an early indicator of an outbreak.
  3. Overlay this data over censuses data to build up a richer picture.

The most positive impact we can have is to help emergency relief organizations and governments anticipate how a disease is likely to spread. Until now, they had to rely on anecdotal information, on-the-ground surveys, police, and hospital reports.

FacebookTwitterLinkedInEmailPrintShare
Posted in B2B Data Exchange, Big Data, Business Impact / Benefits, Business/IT Collaboration, Data Governance, Data Integration | Tagged , , , , , , , | Leave a comment

Informatica’s Hadoop Connectivity Reaches for the Clouds

The Informatica Cloud team has been busy updating connectivity to Hadoop using the Cloud Connector SDK.  Updated connectors are available now for Cloudera and Hortonworks and new connectivity has been added for MapR, Pivotal HD and Amazon EMR (Elastic Map Reduce).

Informatica Cloud’s Hadoop connectivity brings a new level of ease of use to Hadoop data loading and integration.  Informatica Cloud provides a quick way to load data from popular on premise data sources and apps such as SAP and Oracle E-Business, as well as SaaS apps, such as Salesforce.com, NetSuite, and Workday, into Hadoop clusters for pilots and POCs.  Less technical users are empowered to contribute to enterprise data lakes through the easy-to-use Informatica Cloud web user interface.

Hadoop

Informatica Cloud’s rich connectivity to a multitude of SaaS apps can now be leveraged with Hadoop.  Data from SaaS apps for CRM, ERP and other lines of business are becoming increasingly important to enterprises. Bringing this data into Hadoop for analytics is now easier than ever.

Users of Amazon Web Services (AWS) can leverage Informatica Cloud to load data from SaaS apps and on premise sources into EMR directly.  Combined with connectivity to Amazon Redshift, Informatica Cloud can be used to move data into EMR for processing and then onto Redshift for analytics.

Self service data loading and basic integration can be done by less technical users through Informatica Cloud’s drag and drop web-based user interface.  This enables more of the team to contribute to and collaborate on data lakes without having to learn Hadoop.

Bringing the cloud and Big Data together to put the potential of data to work – that’s the power of Informatica in action.

Free trials of the Informatica Cloud Connector for Hadoop are available here: http://www.informaticacloud.com/connectivity/hadoop-connector.html

FacebookTwitterLinkedInEmailPrintShare
Posted in Big Data, Data Services, Hadoop, SaaS | Tagged , , , | Leave a comment

Go On, Flip Your Division of Labor: More Time Analyzing and Less Time Prepping Data

Are you in Sales Operations, Marketing Operations, Sales Representative/Manager, or Marketing Professional? It’s no secret that if you are, you benefit greatly from the power of performing your own analysis, at your own rapid pace. When you have a hunch, you can easily test it out by visually analyzing data in Tableau without involving IT. When you are faced with tight timeframes in which to gain business insight from data, being able to do it yourself in the time you have available and without technical roadblocks makes all the difference.

Self-service Business Intelligence is powerful!  However, we all know it can be even more powerful. When needing to put together an analysis, we know that you spend about 80% of your time putting together data, and then just 20% of your time analyzing data to test out your hunch or gain your business insight. You don’t need to accept this anymore. We want you to know that there is a better way!

We want to allow you to Flip Your Division of Labor and allow you to spend more than 80% of your time analyzing data to test out your hunch or gain your business insight and less than 20% of your time putting together data for your Tableau analysis! That’s right. You like it. No, you love it. No, you are ready to run laps around your chair in sheer joy!! And you should feel this way. You now can spend more time on the higher value activity of gaining business insight from the data, and even find copious time to spend with your family. How’s that?

Project Springbok is a visionary new product designed by Informatica with the goal of making data access and data quality obstacles a thing of the past.  Springbok is meant for the Tableau user, a data person would rather spend their time visually exploring information and finding insight than struggling with complex calculations or waiting for IT. Project Springbok allows you to put together your data, rapidly, for subsequent analysis in Tableau. Project Springbok tells you things about your data that even you may not have known. It does it through Intelligent Suggestions that it presents to the User.

Let’s take a quick tour:

  • Project Springbok tells you, that you have a date column and that you likely want to obtain the Year and Quarter for your analysis (Fig 1)., And if you so wish, by a single click, voila, you have your corresponding years and even the quarters. And it all happened in mere seconds. A far cry from the 45 minutes it would have taken a fluent user of Excel to do using VLOOKUPS.

data

                                                                      Fig. 1

VALUE TO A MARKETING CAMPAIGN PROFESSIONAL: Rapidly validate and accurately complete your segmentation list, before you analyze your segments in Tableau. Base your segments on trusted data that did not take you days to validate and enrich.

  • Then Project Springbok will tell you that you have two datasets that could be joined on a common key, email for example, in each dataset, and would you like to move forward and join the datasets (Fig 2)? If you agree with Project Springbok’s suggestion, voila, dataset joined in a mere few seconds. Again, a far cry from the 45 minutes it would have taken a fluent user of Excel to do using VLOOKUPS.

Data

  Fig. 2

VALUE TO A SALES REPRESENTATIVE OR SALES MANAGER: You can now access your Salesforce.com data (Fig 3) and effortlessly combine it with ERP data to understand your true quota attainment. Never miss quota again due to a revenue split, be it territory or otherwise. Best of all, keep your attainment datatset refreshed and even know exactly what datapoint changed when your true attainment changes.

Data

Fig. 3

  • Then, if you want, Project Springbok will tell you that you have emails in the dataset, which you may or may not have known, but more importantly it will ask you if you wish to determine which emails can actually be mailed to. If you proceed, not only will Springbok check each email for correct structure (Fig 4), but will very soon determine if the email is indeed active, and one you can expect a response from. How long would that have taken you to do?

VALUE TO A TELESALES REPRESENTATIVE OR MARKETING EMAIL CAMPAIGN SPECIALIST : Ever thought you had a great email list and then found out most emails bounced? Now, confidently determine which emails are truly ones will be able to email to, before you send the message. Email prospects who you know are actually at the company and be confident you have their correct email addresses. You can then easily push the dataset into Tableau to analyze the trends in email list health.

Data

Fig. 4

 And, in case you were wondering, there is no training or install required for Project Springbok. The 80% of your time you used to spend on data preparation is now shrunk considerably, and this is after using only a few of Springbok’s capabilities. One more thing: You can even directly export from Project Springbok into Tableau via the “Export to Tableau TDE” menu item (Fig 5).  Project Springbok creates a Tableau TDE file and you just double click on it to open Tableau to test out your hunch or gain your business insight.

Data

Fig. 5

Here are some other things you should know, to convince you that you, too, can only spend no more than 20% of you time on putting together data for your subsequent Tableau analysis:

  • Springbok Sign-Up is Free
  • Springbok automatically finds problems with your data, and lets you fix them with a single click
  • Springbok suggests useful ways for you to combine different datasets, and lets you combine them effortlessly
  • Springbok suggests useful summarizations of your data, and lets you follow through on the summarizations with a single click
  • Springbok allows you to access data from your cloud or on-premise systems with a few clicks, and the automatically keep it refreshed. It will even tell you what data changed from the last time you saw it
  • Springbok allows you to collaborate by sharing your prepared data with others
  • Springbok easily exports your prepared data directly into Tableau for immediate analysis. You do not have to tell Tableau how to interpret the prepared data
  • Springbok requires no training or installation

Go on. Shift your division of labor in the right direction, fast. Sign-Up for Springbok and stop wasting precious time on data preparation. http://bit.ly/TabBlogs

———-

Are you going to be at Dreamforce this week in San Francisco?  Interested in seeing Project Springbok working with Tableau in a live demonstration?  Visit the Informatica or Tableau booths and see the power of these two solutions working hand-in-hand.Informatica is Booth #N1216 and Booth #9 in the Analytics Zone. Tableau is located in Booth N2112.

FacebookTwitterLinkedInEmailPrintShare
Posted in B2B, Big Data, Business Impact / Benefits, Business/IT Collaboration, General | Tagged , , , | Leave a comment

The Pros and Cons: Data Integration from the Bottom-Up and the Top-Down

Data Integration from the Bottom-Up and the Top-Down

Data Integration from the Bottom-Up and the Top-Down

What are the first steps of a data integration project?  Most are at a loss.  There are several ways to approach data integration, and your approach depends largely upon the size and complexity of your problem domain.

With that said, the basic approaches to consider are from the top-down, or the bottom-up.  You can be successful with either approach.  However, there are certain efficiencies you’ll gain with a specific choice, and it could significantly reduce the risk and cost.  Let’s explore the pros and cons of each approach.

Top-Down

Approaching data integration from the top-down means moving from the high level integration flows, down to the data semantics.  Thus, you an approach, perhaps even a tool-set (using requirements), and then define the flows that are decomposed down to the raw data.

The advantages of this approach include:

The ability to spend time defining the higher levels of abstraction without being limited by the underlying integration details.  This typically means that those charged with designing the integration flows are more concerned with how they have to deal with the underlying source and target, and this approach means that they don’t have to deal with that issue until later, as they break down the flows.

The disadvantages of this approach include:

The data integration architect does not consider the specific needs of the source or target systems, in many instances, and thus some rework around the higher level flows may have to occur later.  That causes inefficiencies, and could add risk and cost to the final design and implementation.

Bottom-Up

For the most part, this is the approach that most choose for data integration.  Indeed, I use this approach about 75 percent of the time.  The process is to start from the native data in the sources and targets, and work your way up to the integration flows.  This typically means that those charged with designing the integration flows are more concerned with the underlying data semantic mediation than the flows.

The advantages of this approach include:

It’s typically a more natural and traditional way of approaching data integration.  Called “data-driven” integration design in many circles, this initially deals with the details, so by the time you get up to the integration flows there are few surprises, and there’s not much rework to be done.  It’s a bit less risky and less expensive, in most cases.

The disadvantages of this approach include:

Starting with the details means that you could get so involved in the details that you miss the larger picture, and the end state of your architecture appears to be poorly planned, when all is said and done.  Of course, that depends on the types of data integration problems you’re looking to solve.

No matter which approach you leverage, with some planning and some strategic thinking, you’ll be fine.  However, there are different paths to the same destination, and some paths are longer and less efficient than others.  As you pick an approach, learn as you go, and adjust as needed.

FacebookTwitterLinkedInEmailPrintShare
Posted in Big Data, Data Aggregation, Data Integration | Tagged , , , | Leave a comment

The Case for Smart Data: When Big Data Isn’t Enough

Every two years, the typical company doubles the amount of data they store. However, this Data is inherently “dumb.” Acquiring more of it only seems to compound its lack of intellect.

When revitalizing your business, I won’t ask to look at your data – not even a little bit. Instead, we look at the process of how you use the data. What I want to know is this:

How much of your day-to-day operations are driven by your data?

The Case for Smart Data

I recently learned that 7-Eleven Japan has pushed decision-making down to the store level – in fact, to the level of clerks. Store clerks decide what goes on the shelves in their individual 7-Eleven stores. These clerks push incredible inventory turns. Some 70% of the products on the shelves are new to stores each year. As a result, this chain has been the most profitable Japanese retailer for 30 years running.

Click to enlarge

Click to enlarge

Instead of just reading the data and making wild guesses on why something works and why something doesn’t, these clerks acquired the skill of looking at the quantitative and the qualitative and connected dots. Data told them what people are talking about, how it’s related to their product and how much weight it carried. You can achieve this as well. To do so, you must introduce a culture that emphasizes discipline around processes. A disciplined process culture uses:

  1. A template approach to data with common processes, reuse of components, and a single face presented to customers
  2. Employees who consistently follow standard procedures

If you cannot develop such company-wide consistency, you will not gain benefits of ERP or CRM systems.

Make data available to the masses. Like at 7-Eleven Japan, don’t centralize the data decision-making process. Instead, push it out to the ranks. By putting these cultures and practices into play, businesses can use data to run smarter.

FacebookTwitterLinkedInEmailPrintShare
Posted in Big Data, Business Impact / Benefits, Business/IT Collaboration, Data Synchronization, Data Transformation, Data Warehousing, Master Data Management, Retail | Tagged , , , , , | Leave a comment

Are You Looking For A New Information Governance Framework?

A few months ago, while addressing a room full of IT and business professional at an Information Governance conference, a CFO said – “… if we designed our systems today from scratch, they will look nothing like the environment we own.” He went on to elaborate that they arrived there by layering thousands of good and valid decisions on top of one another.

Similarly, Information Governance has also evolved out of the good work that was done by those who preceded us. These items evolve into something that only a few can envision today. Along the way, technology evolved and changed the way we interact with data to manage our daily tasks. What started as good engineering practices for mainframes gave way to data management.

Then, with technological advances, we encountered new problems, introduced new tasks and disciplines, and created Information Governance in the process. We were standing on the shoulders of data management, armed with new solutions to new problems. Now we face the four Vs of big data and each of those new data system characteristics have introduced a new set of challenges driving the need for Big Data Information Governance as a response to changing velocity, volume, veracity, and variety.

Information GovernanceDo you think we need a different framework?

Before I answer this question, I must ask you “How comprehensive is the framework you are using today and how well does it scale to address the new challenges?

While there are several frameworks out in the marketplace to choose from. In this blog, I will tell you what questions you need to ask yourself before replacing your old framework with a new one:

Q. Is it nimble?

The focus of data governance practices must allow for nimble responses to changes in technology, customer needs, and internal processes. The organization must be able to respond to emergent technology.

Q. Will it enable you to apply policies and regulations to data brought into the organization by a person or process?

  • Public company: Meet the obligation to protect the investment of the shareholders and manage risk while creating value.
  • Private company: Meet privacy laws even if financial regulations are not applicable.
  • Fulfill the obligations of external regulations from international, national, regional, and local governments.

Q. How does it Manage quality?

For big data, the data must be fit for purpose; context might need to be hypothesized for evaluation. Quality does not imply cleansing activities, which might mask the results.

Q. Does it understanding your complete business and information flow?

Attribution and lineage are very important in big data. Knowing what is the source and what is the destination is crucial in validating analytics results as fit for purpose.

Q. How does it understanding the language that you use, and can the framework manage it actively to reduce ambiguity, redundancy, and inconsistency?

Big data might not have a logical data model, so any structured data should be mapped to the enterprise model. Big data still has context and thus modeling becomes increasingly important to creating knowledge and understanding. The definitions evolve over time and the enterprise must plan to manage the shifting meaning.

Q. Does it manage classification?

It is critical for the business/steward to classify the overall source and the contents within as soon as it is brought in by its owner to support of information lifecycle management, access control, and regulatory compliance.

Q. How does it protect data quality and access?

Your information protection must not be compromised for the sake of expediency, convenience, or deadlines. Protect not just what you bring in, but what you join/link it to, and what you derive. Your customers will fault you for failing to protect them from malicious links. The enterprise must formulate the strategy to deal with more data, longer retention periods, more data subject to experimentation, and less process around it, all while trying to derive more value over longer periods.

Q. Does it foster stewardship?

Ensuring the appropriate use and reuse of data requires the action of an employee. E.g., this role cannot be automated, and it requires the active involvement of a member of the business organization to serve as the steward over the data element or source.

Q. Does it manage long-term requirements?

Policies and standards are the mechanism by which management communicates their long-range business requirements. They are essential to an effective governance program.

Q. How does it manage feedback?

As a companion to policies and standards, an escalation and exception process enables communication throughout the organization when policies and standards conflict with new business requirements. It forms the core process to drive improvements to the policy and standard documents.

Q. Does it Foster innovation?

Governance must not squelch innovation. Governance can and should make accommodations for new ideas and growth. This is managed through management of the infrastructure environments as part of the architecture.

Q. How does it control third-party content?

Third-party data plays an expanding role in big data. There are three types and governance controls must be adequate for the circumstances. They must consider applicable regulations for the operating geographic regions; therefore, you must understand and manage those obligations.

FacebookTwitterLinkedInEmailPrintShare
Posted in Big Data, Data Governance, Master Data Management | Tagged , , , , , | Leave a comment

Build Your Modern Data Architecture with Hadoop and Informatica

Hortonworks, Hadoop and Informatica

Build Your Modern Data Architecture

 

This is a guest post by John Kreisa, Vice President Strategic Marketing, Hortonworks

Today, 80% of the efforts in Big Data projects are related to extracting, transforming and loading data (ETL). Hortonworks and Informatica have teamed-up to leverage the power of Informatica Big Data Edition to use their existing skills to improve the efficiency of these operations and better leverage their resources in a modern data architecture. (MDA)

Next Generation Data Management

The Hortonworks Data Platform and Informatica BDE enable organizations to optimize their ETL workloads with long-term storage and processing at scale in Apache Hadoop. With Hortonworks and Informatica, you can:

• Leverage all internal and external data to achieve the full predictive power that drives the success of modern data-driven businesses.
• Optimize the entire big data supply chain on Hadoop, turning data into actionable information to drive business value.

Key Advantages

Imagine a world where you would have access to your most strategic data in a timely fashion, no matter how old the data is, where it is stored, or under what format. By leveraging Hadoop’s power of distributed processing, organizations can lower costs of data storage and processing and support large data distribution with high through put and concurrency.

Overall, the alignment between business and IT grows. The Big Data solution based on Informatica and Hortonworks allows for a complete data pipeline to ingest, parse, integrate, cleanse, and prepare data for analysis natively on Hadoop thereby increasing developer productivity by 5x over hand-coding.

Where Do We Go From Here?

At the end of the day, Big Data is not about the technology. It is about the deep business and social transformation every organization will go through. The possibilities to make more informed decisions, identify patterns, proactively address fraud and threats, and predict pretty much anything are endless.

This transformation will happen as the technology is adopted and leveraged by more and more business users. We are already seeing the transition from 20-node clusters to 100-node clusters and from a handful of technology-savvy users relying on Hadoop to hundreds of business users. Informatica and Hortonworks are accelerating the delivery of actionable Big Data insights to business users by automating the entire data pipeline.

Try It For Yourself

On September 10, 2014, Informatica announced the 60-day trial version of the Informatica Big Data Edition into the Hortonworks Sandbox. This free trial enables you to download and test out the Big Data Edition on your notebook or spare computer and experience your own personal Modern Data Architecture (MDA).

If you happen to be at Strata this October 2014, please meet us at our booths: Informatica #352 and Hortonworks #117. Don’t forget to participate in our Passport Program and join our session at 5:45 pm ET on Thursday, October 16, 2014.

FacebookTwitterLinkedInEmailPrintShare
Posted in Big Data, Hadoop | Tagged , , | Leave a comment