Tag Archives: Data Integration
It takes a village to build mainstream big data solutions. We often get so caught up in Hadoop use cases and customer successes that sometimes we don’t talk enough about the innovative partner technologies and integrations that enable our customers to put the enterprise data hub at the core of their data architecture and innovate with confidence. Cloudera and Informatica have been working together to integrate our products to enable new levels of productivity and lower deployment and production risk.
Going from Hadoop to an enterprise data hub means a number of things. It means that you recognize the business value of capturing and leveraging all your data for exploration and analytics. It means you’re ready to make the move from Hadoop pilot project to production. And it means your data is important enough that it’s worth securing and making data pipelines visible. It’s the visibility layer, and in particular, the unique integration between Cloudera Navigator and Informatica that I want to focus on in this post.
The era of big data has ushered in increased regulations in a number of industries – banking, retail, healthcare, energy – most of which deal in how data is managed throughout its lifecycle. Cloudera Navigator is the only native end-to-end solution for governance in Hadoop. It provides visibility for analysts to explore data in Hadoop, and enables administrators and managers to maintain a full audit history for HDFS, HBase, Hive, Impala, Spark and Sentry, then run reports on data access for auditing and compliance. The integration of Informatica Metadata Manager in the Big Data Edition and Cloudera Navigator extends this level of visibility and governance beyond the enterprise data hub.
Today, only Informatica and Cloudera provide end-to-end data lineage from source systems through Hadoop, and into BI/analytic and data warehouse systems. And you can view it from a single pane within Informatica.
This is important because Hadoop, and the enterprise data hub in particular, doesn’t function in a silo. It’s an integrated part of a larger enterprise-wide data management architecture. The better the insight into where data originated, where it traveled, who had access to it and what they did with it, the greater our ability to report and audit. No other combination of technologies provides this level of audit granularity.
But more than that, the visibility Cloudera and Informatica provide gives our joint customers the ability to confidently stand up an enterprise data hub as a part of their production enterprise infrastructure, because they can verify the integrity of the data that undergirds their analytics. I encourage you to check out a demo of the Informatica-Cloudera Navigator integration at this link: http://infa.media/1uBpPbT
You can also check out a demo and learn a little more about Cloudera Navigator and the Informatica integration in the recorded TechTalk hosted by Informatica at this link:
Building an Enterprise Data Hub with proper Data Integration
Data flows into the enterprise from many sources, in many formats, sizes, and levels of complexity. And as enterprise architectures have evolved over the years, traditional data warehouses have become less a final staging center for data and more one component of the enterprise that interfaces with significant data flows. But since data warehouses should focus on being powerful engines for high value analytics, they should not be the central hub for data movement and data preparation (e.g. ETL/ELT), especially for the newer data types in use today, such as social media, clickstream data, sensor data, and internet-of-things data.
When you start seeing data warehouse capacity consumed too quickly and performance degradation where end users are complaining about slower response times, and you risk not meeting your service-level agreements, then it might be time to consider an enterprise data hub (EDH). With an EDH, especially one built on Apache™ Hadoop®, you can plan a strategy around data warehouse optimization to get better use out of your entire enterprise architecture.
Of course, whenever you add another new technology to your data center, you care about interoperability. And since many systems in today’s architectures interoperate via data flows, it’s clear that sophisticated data integration technologies will be an important part of your EDH strategy. Today’s big data presents new challenges related to a wide variety of data types and formats, and the right technologies are needed to glue all the pieces together, whether those pieces are data warehouses, relational databases, Hadoop, or NoSQL databases.
Choosing a Data Integration Solution
Data integration software, at a high level, has one broad responsibility: to help you process and prepare your data with the right technology. This means it has to get your data to the right place in the right format in a timely manner. So it actually includes many tasks, but the end result is that timely, trusted data can be used for decision-making and risk management throughout the enterprise. You end up with a complete, ready-for-analysis picture of your business, as opposed to segmented snapshots based on a limited data set.
When evaluating a data integration solution for the enterprise, look for:
- Ease of use to boost developer productivity
- A proven track record in the industry
- Widely available technology expertise
- Experience with production deployments with newer technologies like Hadoop
- Ability to reuse data pipelines across different technologies (e.g. data warehouse, RDBMS, Hadoop, and other NoSQL databases)
Data integration is only part of the story. When you’re depending on data to drive business decisions and risk management, you clearly want to ensure the data is reliable. Data governance, data lineage, data quality, and data auditing remain important topics in an EDH. Oftentimes, data privacy regulatory demands must be met, and the enterprise’s own intellectual property must be protected from accidental exposure.
To help ensure that data is sound and secure, look for a solution that provides:
- Centralized management and control
- Data certification prior to publication, transparent data and integration processes, and the ability to track data lineage
- Granular security, access controls, and data masking to protect data both in transit and at the source to prevent unauthorized access to specific data sets
Informatica is the data integration solution selected by many enterprises. Informatica’s family of enterprise data integration, data quality, and other data management products can manage data of any format, complexity level, or size from any business system, and then deliver that data across the enterprise at the desired speed.
Watch the latest Gartner video to see Todd Goldman, Vice President and General Manager for Enterprise Data Integration at Informatica, as well as executives from Cisco and MapR, give their perspective on how businesses today can gain even more value from big data.
How would you like to wake up to an extra billion dollars, or maybe nine, in the bank? This has happened to a teacher in India. He discovered to his astonishment a balance of $9.8 billion in his bank account!
How would you like to be the bank that gave the client an extra nine billion dollars? Oh, to be a fly on the wall when the IT department got that call. How do you even begin to explain? Imagine the scrambling to track down the source of the data error.
This was a glaringly obvious error, which is easily caught. But there is potential for many smaller data errors. These errors may go undetected and add up, hurting your bottom line. How could this type of data glitch happen? More importantly, how can you protect your organization from these types of errors in your data?
A primary source of data mistakes is insufficient testing during Data Integration. Any change or movement of data harbors risk to its integrity. Unfortunately there are often insufficient IT resources to adequately validate the data. Some organizations validate the data manually. This is a lengthy, unreliable process, fraught with data errors. Furthermore manual testing does not scale well to large data volumes or complex data changes. So the validation is often incomplete. Finally some organizations simply lack the resources to conduct any level of data validation altogether.
Many of our customers have been able to successfully address this issue via automated data validation testing (also known as DVO). In a recent TechValidate survey, Informatica customers have told us that they:
- Reduce costs associated with data testing.
- Reduce time associated with data testing.
- Increase IT productivity.
- Increase the business trust in the data.
Customers tell us some of the biggest potential costs relate to damage control which occurs when something goes wrong with their data. The tale above, of our fortunate man and not so fortunate bank, can be one example. Bad data can hurt a company’s reputation and lead to untold losses in market-share and customer goodwill. In today’s highly regulated industries, such as healthcare and financial services, consequences of incorrect data can be severe. This can include heavy fines or worse.
Using automated data validation testing allows customers to save on ongoing testing costs and deliver reliable data. Just as important, it prevents pricey data errors, which require costly and time-consuming damage control. It is no wonder many of our customers tell us they are able to recoup their investment in less than 12 months!
The TechValidate survey shows us that customers are using data validation testing in a number of common use cases including:
- Regression (Unit) testing
- Application migration or consolidation
- Software upgrades (Applications, databases, PowerCenter)
- Production reconciliation
One of the most beneficial use cases for data validation testing has been application migration and consolidation. Many SAP migration projects undertaken by our customers have greatly benefited from automated data validation testing. Application migration or consolidation projects are typically large and risky. A Bloor Research study has shown that 38% of data migration projects fail, either incurring overages or being aborted altogether. According to a Harvard Business Review article, 1 in 6 large IT projects runs 200% over budget. Poor data management is one of the leading pitfalls in these types of projects. However, according to Bloor Research, Informatica’s data validation testing is a capability they have not seen elsewhere in the industry.
A particularly interesting example of this use case is a merger or acquisition. The merged company is required to deliver ‘day-1 reporting’. However, FTC regulations forbid the separate entities from seeing each other’s data prior to the merger. What a predicament! The automated nature of data validation testing (automatically deploying preconfigured rules on large data sets) enables our customers to prepare for successful day-1 reporting under these harsh conditions.
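To make the idea of preconfigured validation rules a little more concrete, here is a minimal sketch in Python. It is not how Informatica's product works internally; the table contents, the `validate` helper and the one-million ceiling are all assumptions made up for illustration. It simply applies four common reconciliation rules (row count, column total, missing keys, and a sanity range) to a source and a target after a data move.

```python
from decimal import Decimal

# Hypothetical rows before and after a migration (illustrative data only).
source_rows = [
    {"account_id": 1, "balance": Decimal("120.50")},
    {"account_id": 2, "balance": Decimal("98.00")},
    {"account_id": 3, "balance": Decimal("4300.25")},
]
target_rows = [
    {"account_id": 1, "balance": Decimal("120.50")},
    {"account_id": 2, "balance": Decimal("9800000000.00")},  # corrupted in transit
]


def validate(source, target, key, amount, max_amount):
    """Apply four simple rules; return a list of failures (empty means all passed)."""
    failures = []
    # Rule 1: the number of rows must survive the move.
    if len(source) != len(target):
        failures.append(f"row count: source={len(source)} target={len(target)}")
    # Rule 2: the column total must reconcile.
    src_total = sum(row[amount] for row in source)
    tgt_total = sum(row[amount] for row in target)
    if src_total != tgt_total:
        failures.append(f"sum({amount}): source={src_total} target={tgt_total}")
    # Rule 3: every source key must exist in the target.
    missing = {row[key] for row in source} - {row[key] for row in target}
    if missing:
        failures.append(f"missing keys in target: {sorted(missing)}")
    # Rule 4: no value may exceed an agreed sanity ceiling.
    for row in target:
        if row[amount] > max_amount:
            failures.append(f"{key}={row[key]}: {amount} {row[amount]} exceeds {max_amount}")
    return failures


failures = validate(source_rows, target_rows, "account_id", "balance", Decimal("1000000"))
for failure in failures:
    print(failure)
```

Because the rules only compare counts, totals and keys, they can be defined ahead of time and run automatically, which is the property that matters when nobody is allowed to eyeball the data by hand.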
And what about you? What are the costs to your business for potentially delivering incorrect, incomplete or missing data? To learn more about how you can provide the right data on time, every time, please visit www.datavalidation.me
According to the article, in Hamilton County, Ohio, it’s not unusual to see kids from the same neighborhoods coming to the hospital for asthma attacks. Thus, researchers wanted to know if it was fact or mistaken perception that an unusually high number of children in the same neighborhood were experiencing asthma attacks. The next step was to review existing data to determine the extent of the issues, and perhaps how to solve the problem altogether.
“The researchers studied 4,355 children between the ages of 1 and 16 who visited the emergency department or were hospitalized for asthma at Cincinnati Children’s between January 2009 and December 2012. They tracked those kids for 12 months to see if they returned to the ED or were readmitted for asthma.”
Not only were the researchers able to determine a sound correlation between the two data sets, but they were able to advance the research to predict which kids were at high-risk based upon where they live. Thus, some of the cause and the effects have been determined.
This came about when researchers began thinking out of the box, when it comes to dealing with traditional and non-traditional medical data. They integrated housing and census data, in this case, with that of the data from the diagnosis and treatment of the patients. These are data sets unlikely to find their way to each other, but together they have a meaning that is much more valuable than if they just stayed in their respective silos.
“Non-traditional medical data integration has begun to take place in some medical collaborative environments already. The New York-Presbyterian Regional Health Collaborative created a medical village, which ‘goes beyond the established patient-centered medical home mode.’ It not only connects an academic medical center with a large ambulatory network, medical homes, and other providers with each other, but community resources such as school-based clinics and specialty-care centers (the ones that are a part of NYP’s network).”
The fact of the matter is that data is the key to understanding what the heck is going on when cells of sick people begin to emerge. While researchers and doctors can treat the individual patients, there is not a good understanding of the larger issues that may be at play, which in this case was poor air quality in poor neighborhoods. With that understanding, they know what problem needs to be corrected.
The universal sharing of data is really the larger solution here, but one that won’t be approached without a common understanding of the value, and funding. As we pass laws around the administration of health care, as well as how data is to be handled, perhaps it’s time we look at what the data actually means. This requires a massive deployment of data integration technology, and the fundamental push to share data with a central data repository, as well as with health care providers.
Why now? Because companies need help making sense of the data deluge, Salesforce’s CEO Marc Benioff said at Dreamforce: “Did you know 90% of the world’s data was created in the last two years? There’s going to be 10 times more mobile data by 2020, 19 times more unstructured data, and 50 times more product data by 2020.” Average business users want to understand what that data is telling them, he said. Given Salesforce’s marketing expertise, this could be the spark that gets mainstream businesses to adopt the Data-First perspective I’ve been talking about.
As I’ve said before, a Data First POV shines a light on important interactions so that everyone inside a company can see and understand what matters. As a trained process engineer, I can tell you, though, that good decisions depend on great data — and great data doesn’t just happen: At the most basic level, you have to clean it, relate it, connect and secure it — so that information from, say, SAP, can be viewed in the same context as data from Salesforce. Informatica obviously plays a role in this. If you want to find out more, click on this link to download our Salesforce Integration for Dummies brochure.
But that’s the basics for getting started. The bigger issue — and the one so many people seem to have trouble with — is deciding which metrics to explore. Say, for example, that the sales team keeps complaining about your marketing leads. Chances are, it’s a familiar complaint. How do you discover what’s really the problem?
One obvious place to start is to look at the conversion rates for every sales rep and group. Next, explore the characteristics of the marketing leads they do accept, such as deal size, product type or customer category. Now take it deeper. Examine which sales reps like to hunt for new customers and which prefer to mine their current base. That will tell you if you’re sending opportunities to the right profiles.
The key is never looking at the sales organization as a whole. If it’s EMEA, for instance, have a look to see how France is doing selling to emerging markets vs. the team in Germany. These metrics are digital trails of human behavior. Data First allows you to explore that behavior and either optimize it or change it.
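As a rough illustration of slicing the same metric at different levels, the sketch below computes conversion rates by region and by rep. The lead records, field order and the `conversion_rates` helper are hypothetical, and "conversion rate" is assumed here to mean converted leads divided by leads that sales accepted.

```python
from collections import defaultdict

# Hypothetical lead records: (region, sales_rep, accepted_by_sales, converted_to_deal)
leads = [
    ("France", "rep_a", True, True),
    ("France", "rep_a", True, False),
    ("France", "rep_b", False, False),
    ("Germany", "rep_c", True, True),
    ("Germany", "rep_c", True, True),
    ("Germany", "rep_d", True, False),
]


def conversion_rates(records, group_index):
    """Converted leads divided by accepted leads, for each value of the grouping field."""
    accepted = defaultdict(int)
    converted = defaultdict(int)
    for record in records:
        group = record[group_index]
        if record[2]:  # only leads that sales accepted count toward the rate
            accepted[group] += 1
            if record[3]:
                converted[group] += 1
    return {group: converted[group] / accepted[group] for group in accepted}


by_region = conversion_rates(leads, 0)  # the EMEA-level split: France vs. Germany
by_rep = conversion_rates(leads, 1)     # the same metric, one level deeper

print(by_region)
print(by_rep)
```

Note that a rep who accepted no leads at all (rep_b here) drops out of the rate entirely, which is itself a signal worth looking at.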
But for this exploration to pay off, you actually have to do some of the work. You can’t just job it out to an analyst. This exercise doesn’t become meaningful until you are mentally engaged in the process. And that’s how it should be: If you are a Data First company, you have to be a Data First leader.
Q: What was the driver for this project?
A: The initiative fell out of a procure-to-pay (P2P) initiative. We engaged a consulting firm to help centralize Accounts Payable operations. One required deliverable was an executive P2P dashboard. This dashboard would provide enterprise insights by relying on the enterprise data warehousing and business intelligence platform.
Q: What did the dashboard illustrate?
A: The dashboard integrated data from many sources to provide a single view of information about all of our suppliers. By visualizing this information in one place, we were able to rapidly gain operational insights. There are approximately 30,000 suppliers in the supplier master who manufacture, distribute, or both manufacture and distribute over 150,000 unique products.
Q: From which sources is Informatica consuming data to power the P2P dashboard?
A: There are 8 sources of data:
3 ERP Systems:
- HBOC STAR
5 Enrichment Sources:
- Dun & Bradstreet – for associating suppliers together from disparate sources.
- GDSN – Global Data Pool for helping to cleanse healthcare products.
- McKesson Pharmacy Spend – spend file from third party pharmaceutical distributor. Helps capture detailed pharmacy spend which we procure from this third party.
- Office Depot Spend – spend file from third party office supply distributor. Helps capture detailed office supply spend.
- MedAssets – third party group purchasing organization (GPO) who provides detailed contract pricing.
Q: Did you tackle clinical scenarios first?
A: No. While we certainly have many clinical scenarios we want to explore, like cost per procedure per patient, we knew that we should establish a few quick, operational wins to gain traction and credibility.
Q: Great idea – capturing quick wins is certainly the way we are seeing customers have the most success in these transformative projects. Where did you start?
A: We started with supply chain cost containment; increasing pressures on healthcare organizations to reduce cost made this low hanging fruit the right place to start. There may be as much as 20% waste to be eliminated through strategic and actionable analytics.
Q: What did you discover?
A: Through the P2P dashboard, insights were gained into days to pay on invoices as well as early payment discounts and late payment penalties. With the visualization we quickly saw that we were paying a large amount of late fees. With this awareness, we dug into why the late fees were so high. What was discovered is that, with one large supplier, the original payment terms were net 30 but in later negotiations the terms were changed to 20 days. Late fees were accruing after 20 days. Through this complete view we were able to rapidly home in on the issue and change operations, avoiding costly late fees.
Q: That’s a great example of straightforward analytics powered by an integrated view of data, thank you. What’s a more complex use case you plan to tackle?
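The arithmetic behind that finding is simple enough to sketch. The Python below is only an illustration: the invoices, supplier names and the `late_invoices` helper are invented, and the only facts taken from the interview are the two sets of terms (net 30 assumed, 20 days actually contracted).

```python
from datetime import date

# Hypothetical invoices: (invoice_id, supplier, invoice_date, paid_date)
invoices = [
    ("INV-1", "Supplier A", date(2014, 3, 1), date(2014, 3, 26)),
    ("INV-2", "Supplier A", date(2014, 3, 10), date(2014, 3, 28)),
    ("INV-3", "Supplier B", date(2014, 3, 5), date(2014, 4, 2)),
]

# Terms the payables process assumes vs. terms in the renegotiated contract.
assumed_terms = {"Supplier A": 30, "Supplier B": 30}
contract_terms = {"Supplier A": 20, "Supplier B": 30}


def late_invoices(invoice_rows, terms):
    """Return (invoice_id, days_to_pay) for invoices paid after the supplier's terms."""
    late = []
    for invoice_id, supplier, invoiced, paid in invoice_rows:
        days_to_pay = (paid - invoiced).days
        if days_to_pay > terms[supplier]:
            late.append((invoice_id, days_to_pay))
    return late


print(late_invoices(invoices, assumed_terms))   # nothing looks late under net 30
print(late_invoices(invoices, contract_terms))  # the 20-day terms expose the late payment
```

The same invoice is on time under one set of terms and late under the other, which is why the problem only becomes visible once contract data and payment data sit in the same view.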
A: Now that we have the systems in place along with data stewardship, we will start to focus on clinical supply chain scenarios like cost per procedure per patient. We have all of the data in one data warehouse to answer questions like – which procedures are costing the most, do procedure costs vary by clinician? By location? By supply? – and what is the outcome of each of these procedures? We always want to take the right and best action for the patient.
We were also able to identify where negotiated payment discounts were not being taken advantage of or where there were opportunities to negotiate discounts.
These insights were revealed through the dashboard and immediate value was realized the first day.
Fueling knowledge with data is helping procurement negotiate the right discounts, i.e. they can seek discounts on the most used supplies vs discounts on supplies rarely used. Think of it this way… you don’t want to get a discount on OJ if you are buying milk.
Q: Excellent example and metaphor. Let’s talk more about stewardship. Do you have a data governance organization within IT that is governing supply chain?
A: No, we have a data governance team within supply chain… Supply chain staff who used to be called “content managers” are now “data stewards”. They were doing the stewardship work of defining data, its use, its source and its quality before, but it wasn’t a formally recognized part of their jobs… now it is. Armed with Informatica Data Director, they are managing the quality of supply chain data across four domains: suppliers/vendors, locations, contracts and items. Data from each of these domains resides in our EMR, our ERP applications and our ambulatory EMR/Practice Management application, creating redundancy and manual reconciliation effort.
By adding Master Data Management (MDM) to the architecture, we were able to centralize management of master data about suppliers/vendors, items, contracts and locations, augment this data with enrichment data like that from D&B, reduce redundancy and reduce manual effort.
MDM shares this complete and accurate information with the enterprise data warehouse and we can use it to run analytics against. Having a confident, complete view of master data allows us to trust analytical insights revealed through the P2P dashboard.
Q: What lessons learned would you offer?
A: Having recognized operational value, I’d encourage health systems to focus on data driven supply chain because there are savings opportunities through easier identification of unmanaged spend.
I really enjoyed learning more about this project with valuable, tangible and nearly immediate results. I will keep you posted as the customer moves onto the next phase. If you have comments or questions, leave them here.
From this analysis in “What’s Reasonable Security? A Moving Target,” IAPP extrapolated the best practices from the FTC’s enforcement actions.
While the white paper and article indicate that “reasonable security” is a moving target, they do provide recommendations that will help organizations assess and baseline their current data security efforts. Interesting is the focus on data centric security, from overall enterprise assessment to the careful control of access of employees and 3rd parties. Here are some of the recommendations derived from the FTC’s enforcements that call for Data Centric Security:
- Perform assessments to identify reasonably foreseeable risks to the security, integrity, and confidentiality of personal information collected and stored on the network, online or in paper files.
- Limited access policies curb unnecessary security risks and minimize the number and type of network access points that an information security team must monitor for potential violations.
- Limit employee access to (and copying of) personal information, based on employee’s role.
- Implement and monitor compliance with policies and procedures for rendering information unreadable or otherwise secure in the course of disposal. Securely disposed information must not practicably be read or reconstructed.
- Restrict third party access to personal information based on business need, for example, by restricting access based on IP address, granting temporary access privileges, or similar procedures.
How does Data Centric Security help organizations achieve this inferred baseline?
- Data Security Intelligence (Secure@Source, coming Q2 2015) provides the ability to “…identify reasonably foreseeable risks.”
- Data Masking (Dynamic and Persistent Data Masking) provides the controls to limit access of information to employees and 3rd parties.
- Data Archiving provides the means for the secure disposal of information.
Other data centric security controls would include encryption for data at rest and in motion, and tokenization for securing payment card data. All of the controls help organizations secure their data, whether a threat originates internally or externally. And based on the never ending news of data breaches and attacks this year, it is a matter of when, not if, your organization will be significantly breached.
For 2015, “Reasonable Security” will require ongoing analysis of sensitive data and the deployment of reciprocal data centric security controls to ensure that the organizations keep pace with this “Moving Target.”
This is a guest author post by Philip Howard, Research Director, Bloor Research.
I recently posted a blog about an interview style webcast I was doing with Informatica on the uses and costs associated with data integration tools.
I’m not sure that the poet John Donne was right when he said that it was strange, let alone fatal. Somewhat surprisingly, I have had a significant amount of feedback following this webinar. I say “surprisingly” because the truth is that I very rarely get direct feedback. Most of it, I assume, goes to the vendor. So, when a number of people commented to me that the research we conducted was both unique and valuable, it was a bit of a thrill. (Yes, I know, I’m easily pleased).
There were a number of questions that arose as a result of our discussions. Probably the most interesting was whether moving data into Hadoop (or some other NoSQL database) should be treated as a separate use case. We certainly didn’t include it as such in our original research. In hindsight, I’m not sure that the answer I gave at the time was fully correct. I acknowledged that you certainly need some different functionality to integrate with a Hadoop environment and that some vendors have more comprehensive capabilities than others when it comes to Hadoop. The same also applies (but with different suppliers) when it comes to integrating with, say, MongoDB or Cassandra or graph databases. However, as I pointed out in my previous blog, functionality is ephemeral. And just because a particular capability isn’t supported today doesn’t mean it won’t be supported tomorrow. So that doesn’t really affect use cases.
However, where I was inadequate in my reply was that I only referenced Hadoop as a platform for data warehousing, stating that moving data into Hadoop was not essentially different from moving it into Oracle Exadata or Teradata or HP Vertica. And that’s true. What I forgot was the use of Hadoop as an archiving platform. As it happens we didn’t have an archiving use case in our survey either. Why not? Because archiving is essentially a form of data migration – you have some information lifecycle management and access and security issues that are relevant to archiving once it is in place but that is after the fact: the process of discovering and moving the data is exactly the same as with data migration. So: my bad.
Aside from that little caveat, I quite enjoyed the whole event. Somebody or other (there’s always one!) didn’t quite get how quantifying the number of end points in a data integration scenario was a surrogate measure for complexity (something we took into account) and so I had to explain that. Of course, it’s not perfect as a metric, but it’s the only alternative to asking eye-of-the-beholder type questions, which aren’t very satisfactory.
Anyway, if you want to listen to the whole thing you can find it HERE:
According to the Accenture 2013 Global Consumer Pulse Survey, “85 percent of customers are frustrated by dealing with a company that does not make it easy to do business with them, 84 percent by companies promising one thing, but delivering another; and 58 percent are frustrated with inconsistent experiences from channel to channel.”
Consumers expect more from the companies they do business with. In response, many companies are shifting from managing their business based on an application-, account- or product-centric approach to a customer-centric approach. And this is one of the main drivers for master data management (MDM) adoption. According to a VP of Data Strategy & Services at one of the largest insurance companies in the world, “Customer data is the lifeblood of a company that is serious about customer-centricity.” So, better managing customer data, which is what MDM enables you to do, is a key to the success of any customer-centricity initiative. MDM provides a significant competitive differentiation opportunity for any organization that’s serious about improving customer experience. It enables customer-facing teams to assess the value of any customer, at the individual, household or organization level.
Amongst the myriad business drivers of a customer-centricity initiative, key benefits include delivering an enhanced customer experience – leading to higher customer loyalty and greater share of wallet, more effective cross-sell and upsell targeting to increase revenue, and improved regulatory compliance.
To truly achieve all the benefits expected from a customer-first, customer-centric strategy, we need to look beyond the traditional approaches of data quality and MDM implementations, which often consider only one foundational (yet important) aspect of the technology solution. The primary focus has always been to consolidate and reconcile internal sources of customer data with the hope that this information brought under a single umbrella of a database and a service layer will provide the desired single view of customer. But in reality, this data integration mindset misses the goal of creating quality customer data that is free from duplication and enriched to deliver significant value to the business.
Today’s MDM implementations need to take their focus beyond mere data integration to be successful. In the following section, I will explain 3 levels of customer views which can be built incrementally to make the most of your MDM solution. When implemented fully, these customer views act as key ingredients for improving the execution of your customer-centric business functions.
Trusted Customer View
The first phase of the solution should cover creation of a trusted customer view. This view empowers your organization with an ability to see complete, accurate and consistent customer information.
In this stage, you take the best information from all the applications and compile it into a single golden profile. You not only use data integration technology for this, but also employ data quality tools to ensure the correctness and completeness of the customer data. Advanced matching, merging and a trust framework are used to derive the most up-to-date information about your customer. You also guarantee that the golden record you create is accessible to business applications and systems of choice so everyone who has the authority can leverage the single version of the truth.
At the end of this stage, you will be able to clearly say John D. who lives at 123 Main St and Johnny Doe at 123 Main Street, who are both doing business with you, are not really two different individuals.
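To give a feel for what matching involves, here is a deliberately simplified Python sketch. It is not the matching engine of any MDM product; the abbreviation table, the 0.6 similarity threshold and the helper names are assumptions chosen for illustration, and real implementations use far richer rules plus a trust framework to decide which source wins when records merge. It also assumes every record has a non-empty name.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real one would be much larger.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def tokens(text):
    """Lowercase, strip punctuation and split into words."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def normalize_address(address):
    """Expand common abbreviations so '123 Main St' equals '123 Main Street'."""
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens(address))


def names_compatible(name_a, name_b, threshold=0.6):
    """First names must be similar; one last name must be a prefix of the other."""
    a, b = tokens(name_a), tokens(name_b)
    first_ok = SequenceMatcher(None, a[0], b[0]).ratio() >= threshold
    last_ok = a[-1].startswith(b[-1]) or b[-1].startswith(a[-1])
    return first_ok and last_ok


def is_same_customer(record_a, record_b):
    """Treat two records as one customer if address and name both line up."""
    same_address = normalize_address(record_a["address"]) == normalize_address(record_b["address"])
    return same_address and names_compatible(record_a["name"], record_b["name"])


crm_record = {"name": "John D.", "address": "123 Main St"}
billing_record = {"name": "Johnny Doe", "address": "123 Main Street"}
other_record = {"name": "Jane Smith", "address": "123 Main St"}

print(is_same_customer(crm_record, billing_record))
print(is_same_customer(crm_record, other_record))
```

Even this toy version shows why matching and data quality go together: without normalizing "St" to "Street" first, the two John Doe records would never be compared as the same address.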
Customer Relationships View
The next level of visibility is about providing a view into the customer’s relationships. It takes advantage of the single customer view and layers in all valuable family and business relationships as well as account and product information. Revealing these relationships is where the real value of multidomain MDM technology comes into action.
At the end of this phase, you not only see John Doe’s golden profile, but also the products he has. He might have a personal checking account from the Retail Bank, a mortgage from the Mortgage line of business, and brokerage and trust accounts with the Wealth Management division. You can see that John has his own consulting firm. You can see he has a corporate credit card and checking account with the Commercial division under the name John Doe Consulting Company.
At the end of this phase, you will have a consolidated view of all important relationship information that will help you evaluate the true value of each customer to your organization.
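As a rough illustration of what such a relationship view holds, the Python sketch below models John Doe, his accounts and his consulting firm. The class names, fields and the `all_accounts` helper are hypothetical and are not taken from any MDM product; the sketch only follows relationships one level deep.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    product: str
    line_of_business: str


@dataclass
class Party:
    name: str
    accounts: list = field(default_factory=list)
    # Each entry is a (relationship label, related Party) pair.
    related_parties: list = field(default_factory=list)


def all_accounts(party):
    """Collect accounts held directly and through directly related parties."""
    accounts = list(party.accounts)
    for _relationship, related in party.related_parties:
        accounts.extend(related.accounts)
    return accounts


consulting_firm = Party(
    "John Doe Consulting Company",
    accounts=[
        Account("corporate credit card", "Commercial"),
        Account("checking", "Commercial"),
    ],
)

john = Party(
    "John Doe",
    accounts=[
        Account("personal checking", "Retail Bank"),
        Account("mortgage", "Mortgage"),
        Account("brokerage", "Wealth Management"),
        Account("trust", "Wealth Management"),
    ],
    related_parties=[("owner of", consulting_firm)],
)

lines_of_business = {a.line_of_business for a in all_accounts(john)}
```

Here `lines_of_business` shows that John touches four divisions of the bank, which no single application would reveal on its own.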
Customer Interactions and Transactions View
The third level of visibility is in the form of your customer’s interactions and transactions with your organization.
During this phase, you tie transactional information, historical data and social interactions your customer has with your organization to further enhance the system. Building this view provides you a whole new world of opportunities because you can see everything related to your customer in one central place. Once you have this comprehensive view, when John Doe calls your call center, you know how valuable he is to your business, which product he just bought from you (transactional data), and what problem he is facing (social interactions).
A widely accepted rule of thumb holds that 80 percent of your company’s future revenue will come from 20 percent of your existing customers. Many organizations are trying to ensure they are doing everything they can to retain existing customers and grow wallet share. Starting with the Trusted Customer View is the first step towards making your existing customers stay. Once you have established all three views discussed here, you can arm your customer-facing teams with a comprehensive view of customers so they can:
- Deliver the best customer experiences possible at every touch point,
- Improve customer segmentation for tailored offers and boost marketing and sales productivity,
- Increase cross-sell and up-sell success, and
- Streamline regulatory reporting.
Achieving the three views discussed here requires a solid data management platform. You not only need an industry-leading multidomain MDM technology, but also tools that help you integrate data, control its quality and connect all the dots. These technologies should work together seamlessly to make your implementation easier and help you gain rapid benefits. Therefore, choose your data management platform carefully. To know more about MDM vendors, read Gartner’s recently released Magic Quadrant for MDM of Customer Data Solutions.
California reported a total of 167 data breaches in 2013, which is up 28 percent from 2012. Two major data breaches caused most of this uptick: the Target attack that was reported in December 2013, and the LivingSocial attack that occurred in April 2013. This year, you can add the Home Depot data breach to that list, as well as the recent breach at the US Post Office.
So, what the heck is going on? And how does this news impact data integration? Should we be concerned, as we place more and more data on public clouds, or within big data systems?
Almost all of these breaches were made possible by traditional systems with security technology and security operations that fell far enough behind that outside attackers found a way in. You can count on many more of these attacks, as enterprises and governments don’t look at security as what it is: an ongoing activity that may require massive and systemic changes to make sure the data is properly protected.
As enterprises and government agencies stand up cloud-based systems, and new big data systems, either inside (private) or outside (public) of the enterprise, there are some emerging best practices around security that those who deploy data integration should understand. Here are a few that should be at the top of your list:
First, start with Identity and Access Management (IAM) and work your way backward. These days, most cloud and non-cloud systems are complex distributed systems. That means IAM is clearly the best security model and best practice to follow with the emerging use of cloud computing.
The concept is simple: provide a security approach and technology that enables the right individuals to access the right resources, at the right times, for the right reasons. The concept follows the principle that everything and everyone gets an identity. This includes humans, servers, APIs, applications, data, etc. Once that verification occurs, it’s just a matter of defining which identities can access other identities, and creating policies that define the limits of that relationship.
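The idea of identities plus policies can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the model of any particular IAM product: the `Policy` fields, the identity names and the hour-based time limit are all hypothetical, and anything without a matching policy is denied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    subject: str        # identity requesting access (human, server, API...)
    resource: str       # identity being accessed (data set, application...)
    actions: frozenset  # what the subject may do to the resource
    start_hour: int = 0   # access allowed from this hour (inclusive)
    end_hour: int = 24    # access allowed until this hour (exclusive)


def is_allowed(policies, subject, resource, action, hour):
    """Default deny: allow only if some policy covers the whole request."""
    return any(
        policy.subject == subject
        and policy.resource == resource
        and action in policy.actions
        and policy.start_hour <= hour < policy.end_hour
        for policy in policies
    )


# Hypothetical identities for a data integration job and an analyst.
policies = [
    Policy("etl-service", "customer-db", frozenset({"read"})),
    Policy("analyst-jane", "sales-reports", frozenset({"read"}), 8, 18),
]
```

With these policies, the integration job can read the customer database at any hour but cannot write to it, and the analyst can read sales reports only during the working day.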
Second, work with your data integration provider to identify solutions that work best with their technology. Most data integration solutions address security in one way, shape, or form. Understanding those solutions is important to secure data at rest and in flight.
Finally, splurge on monitoring and governance. Many of the issues around this growing number of breaches stem from the system managers’ inability to spot and stop attacks. Creative approaches to monitoring system and network utilization, as well as data access, will allow those in IT to spot most of the attacks and correct the issues before they ‘go nuclear.’ Typically, an increasing number of breach attempts lead up to the complete breach.
The issue and burden of security won’t go away. Systems will continue to move to public and private clouds, and data will continue to migrate to distributed big data types of environments. And that means the need for data integration and data security will continue to explode.
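One simple way to watch for that build-up of attempts is to count failed access attempts per identity inside a sliding time window. The Python sketch below assumes timestamps in seconds and uses a made-up class name, threshold and window size; a production monitoring tool would look at many more signals.

```python
from collections import defaultdict, deque


class FailedAccessMonitor:
    """Flags an identity once it accumulates too many failed access
    attempts within a sliding time window (values are assumptions)."""

    def __init__(self, threshold=5, window_seconds=60):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # identity -> recent timestamps

    def record_failure(self, identity, timestamp):
        """Record one failed attempt; return True if an alert should fire."""
        failures = self._failures[identity]
        failures.append(timestamp)
        # Drop attempts that have fallen out of the window.
        while failures and timestamp - failures[0] > self.window_seconds:
            failures.popleft()
        return len(failures) >= self.threshold


monitor = FailedAccessMonitor(threshold=3, window_seconds=60)
alerts = [monitor.record_failure("etl-service", t) for t in (0, 10, 20)]
```

In this run the third failure within a minute trips the alert, while the same three failures spread over several minutes would not.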