Category Archives: Data Integration
Every fall Informatica sales leadership puts together its strategy for the following year. The revenue target is typically a function of the number of sellers, the addressable market size and key accounts in a given territory, average spend and conversion rate given prior years’ experience, etc. This straightforward math has probably not changed in decades, but it assumes that the underlying data is 100% correct. This data includes:
- Number of accounts with a decision-making location in a territory
- Related IT spend and prioritization
- Organizational characteristics like legal ownership, industry code, credit score, annual report figures, etc.
- Key contacts, roles and sentiment
- Prior interaction (campaign response, etc.) and transaction (quotes, orders, payments, products, etc.) history with the firm
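To make the dependency on data quality concrete, here is a minimal sketch of that planning math. It assumes one territory per seller and a simple product of account count, average spend and conversion rate; the function names and all figures are hypothetical and are not Informatica's actual planning model.

```python
def territory_target(accounts: int, avg_spend: float, conversion_rate: float) -> float:
    """Expected revenue for one territory: accounts x average spend x conversion rate."""
    return accounts * avg_spend * conversion_rate


def plan_target(territories: list[dict]) -> float:
    """Sum the territory targets (one territory per seller in this sketch)."""
    return sum(
        territory_target(t["accounts"], t["avg_spend"], t["conversion_rate"])
        for t in territories
    )


def target_error(recorded: list[dict], actual: list[dict]) -> float:
    """Relative over- (+) or understatement (-) of the plan built on recorded data."""
    planned = plan_target(recorded)
    real = plan_target(actual)
    return (planned - real) / real


# Hypothetical example: the CRM lists 220 accounts, but 20 are duplicates
# or subsidiaries without a decision-making location in the territory.
recorded = [{"accounts": 220, "avg_spend": 50_000.0, "conversion_rate": 0.1}]
actual = [{"accounts": 200, "avg_spend": 50_000.0, "conversion_rate": 0.1}]
overstatement = target_error(recorded, actual)
```

Because the model is a plain product, any relative error in the account count passes straight through to the target, before hiring and compensation plans are layered on top of it.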
Every organization, whether it is a life insurer, a pharmaceutical manufacturer, a fashion retailer or a construction company, knows this math and plans on getting somewhere above 85% achievement of the resulting target. Office locations, support infrastructure spend, compensation and hiring plans are based on this and communicated.
So why is it that when it is an open secret that the underlying data is far from perfect (accurate, current and useful) and corrupts outcomes, too few believe that fixing it has any revenue impact? After all, we are not projecting the climate for the next hundred years here with a thousand plus variables.
If corporate hierarchies are incorrect, your spend projections based on incorrect territory targets, credit terms and discount strategy will be off. If every client touch point does not have a complete picture of cross-departmental purchases and campaign responses, your customer acquisition cost will be too high as you will contact the wrong prospects with irrelevant offers. If billing, tax or product codes are incorrect, your billing will be off. This is a classic telecommunication example worth millions every month. If your equipment location and configuration are wrong, maintenance schedules will be incorrect and every hour of production interruption will cost an industrial manufacturer of wood pellets or oil millions.
Also, if industry leaders enjoy an upsell ratio of 17% and you experience 3%, data will have a lot to do with it (assuming you have no formal upsell policy as it violates your independent middleman relationship).
The challenge is not the fact that data can create revenue improvements but how much given the other factors: people and process.
Every industry laggard can identify a few FTEs who spend 25% of their time putting one-off data repositories together for some compliance, M&A, customer or marketing analytics effort. Organic revenue growth from net-new or previously unrealized revenue is what the focus of any data management initiative should be. Don’t get me wrong; purposeful recruitment (people), comp plans and training (processes) are important as well. Few people doubt that people and process drive revenue growth. However, few believe that the data fed into these processes has an impact.
This is a head scratcher for me. An IT manager at a US upstream oil firm once told me that it would be ludicrous to think data has a revenue impact. They just fixed data because it is important so his consumers would know where all the wells are and which ones made a good profit. Isn’t that assuming data drives production revenue? (Rhetorical question)
A CFO at a smaller retail bank said during a call that his account managers know their clients’ needs and history. There is nothing more good data can add in terms of value. And this happened after twenty other folks at his bank including his own team delivered more than ten use cases, of which three were based on revenue.
Hard cost (materials and FTE) reduction is easy to quantify and cost avoidance is a leap of faith to a degree, but revenue is not any less concrete; otherwise, why not just throw the dice and see what revenue looks like next year without a central customer database? Let every account executive in every department get their own data, structure it the way they want, put it on paper and make hard copies for distribution to HQ. This is not about paper versus electronic but about the inability to reconcile data from many sources on paper, which is even harder than doing so electronically.
Have you ever heard of any organization moving back to the Fifties and competing today? That would be a fun exercise. If you have thoughts or suggestions, I would be glad to hear them.
The post is by Philip Howard, Research Director, Bloor Research.
One of the standard metrics used to support buying decisions for enterprise software is total cost of ownership. Typically, the other major metric is functionality. However, functionality is ephemeral. Not only does it evolve with every new release, but while particular features may be relevant to today’s project there is no guarantee that those same features will be applicable to tomorrow’s needs. A broader metric than functionality is capability: how suitable is this product for a range of different project scenarios, and will it support both simple and complex environments?
Earlier this year Bloor Research published some research into the data integration market, which investigated exactly these issues: how often were tools reused, how many targets and sources were involved, for what sort of projects were products deemed suitable? We then compared these with total cost of ownership figures that we also captured in our survey. I will be discussing the results of our research live with Kristin Kokie, who is the interim CIO of Informatica, on Guy Fawkes’ day (November 5th). I don’t promise anything explosive but it should be interesting and I hope you can join us. The discussions will be vendor neutral (mostly: I expect that Kristin has a degree of bias).
To Register for the Webinar, click Here.
In his recent article: “The catalog is dead – long live the catalog,” Informatica’s Ben Rund spoke about how printed catalogs are positioned as a piece of the omnichannel puzzle and are a valuable touch point on the connected customer’s informed purchase journey. The overall response was far greater than what we could have hoped for; we would like to thank all those that participated. Seeing how much interest this topic generated, we decided to investigate further, in order to find out which factors can help in making print publishing successful.
5 key Factors for Successful Print Publishing Projects
Today’s digital world impacts every facet of our lives. Deloitte recently reported that approximately 50% of purchases are influenced by our digital environment. Often, companies have no idea how much savings can be generated through the production of printed catalogues that leverage pre-existing data sources. The research at www.pim-roi.com talks of several such examples. After looking back at many successful projects, Michael and his team realized the potential to generate substantial savings when the focus is to optimize “time to market.” (If, of course, business teams operate asynchronously!)
For this new blog entry, we interviewed Michael Giesen, IT Consultancy and Project Management at Laudert, to get his experience and thoughts on the key factors behind the success of print publishing projects. Michael also provides insight on which steps to prioritize and which pitfalls to avoid at all costs in order to ensure the best results.
1. Publication Analysis
How are objects in print (like products) structured today? What about individual topics and design of creative pages? How is the placement of tables, headings, prices and images organized nowadays? Are there standards? If so, what can be standardized and how? To get an overall picture, you have to thoroughly examine these points. You must do so for all the content elements involved in the layout, ensuring that, in the future, they can be used for Dynamic Publishing. It is conceivable that individual elements, such as titles or pages used in subject areas, could be excluded and reused in separate projects. Gaining the ability to automate catalog creation may require compromise in certain areas. We shall discuss this later. In the future, product information will probably be presented in far fewer variations, with 4 instead of 24 table types, for example. Great, now we are on the right path!
2. Data Source Analysis
Where is the data used in today’s printed material being sourced from? Are there several data sources that need to be combined? How is pricing handled? What about product attributes or the structure of product description tables in the case of an individual item? Is all the marketing content and subsequent variations included as well? What about numerous product images or multiple languages? What about seasonally adjusted texts that pull from external sources?
This case requires a very detailed analysis, leading us to the following question:
What is the role and the value of storing product information using a standardized method in print publishing?
The benefits of utilizing such processes should be clear by now: The more standards are in place, the greater the amount of time you will save and the greater your ability to generate positive ROI. Companies that currently operate with complex systems supporting well-structured data are already ahead in the game. Furthermore, yielding positive results doesn’t necessarily require them to start from scratch and rebuild from the ground up. As a matter of fact, companies that have already invested in database systems (E.g. MSSQL) can leverage their existing infrastructures.
3. Process Analysis
In this section of our analysis, we will be getting right down to the details: What does the production process look like, from the initial layout phase to the final release process? Who is responsible for the “how”? Who maintains the linear progression? Who has the responsibilities and release rights? Lastly, where are the bottlenecks? Are there safeguard mechanisms in place? Once all these roles and processes have been put in place and have received the right resources, you are ready to tackle the next key factor: Implementation.
4. Implementation
Here you should be adventurous, creative and open minded, seeing that compromise might be needed. If your existing data sources do not meet the requirements, a solution must be found! A certain technical and creative pragmatism may facilitate the short- and medium-term planning (see point 2). You must extract and prepare your data sources for a printed medium such as a catalog. The priint:suite of WERK II has proven itself as a robust all-round solution for Database Publishing and Web2Print. All-inclusive PIM solutions, such as Informatica PIM, already have a standard interface to priint:suite available. Depending on the specific requirements, an important decision must then be made: Is there a need for an InDesign Server? Simply put, it enables the fully automatic production of large-volume objects and offers accurate data preview. While slightly less featured, the WERK II PDF renderers offer similar functionality at a significantly more affordable price.
Based on the software and interfaces selected, an optimized process which supports your system can be developed and be structured to be fully automated if needed.
For individual groups of goods, templates can be defined, placeholders and page layouts developed. Production can start!
5. Selecting an Implementation Partner
In order to facilitate a smooth transition from day one, the support of a partner to carry out the implementation should be considered from the beginning. Since not only technology, but more importantly practical expertise provides maximum process efficiency, it is recommended that you inquire about a potential partner’s references. Getting insight from existing customers will provide you with feedback about their experience and successes. Any potential partner will be pleased to put you in touch with their existing customers.
What are Your Key Factors for Successful Print Publishing?
I would like to know what your thoughts are on this topic. Has anyone tried PDF renderers other than WERK II, such as Codeware’s XActuell? Furthermore, if there are any other factors you think are important in managing successful print publishing, feel free to mention them in the comments here. I’d be happy to discuss here or on Twitter at @nicholasgoupil.
Key findings from the report include:
- 65% of organizations cite data processing and integration as hampering distribution capability, with nearly half claiming their existing software and ERP is not suitable for distribution.
- Nearly two-thirds of enterprises have some form of distribution process, involving products or services.
- More than 80% of organizations have at least some problem with product or service distribution.
- More than 50% of CIOs in organizations with distribution processes believe better distribution would increase revenue and optimize business processes, with a further 38% citing reduced operating costs.
The core findings: “With better data integration comes better automation and decision making.”
This report is one of many I’ve seen over the years that come to the same conclusion. Most of those involved with the operations of the business don’t have access to key data points they need, thus they can’t automate tactical decisions, and also cannot “mine” the data, in terms of understanding the true state of the business.
The more businesses deal with building and moving products, the more data integration becomes an imperative value. As stated in this survey, as well as others, the large majority cite “data processing and integration as hampering distribution capabilities.”
Of course, these issues go well beyond Australia. Most enterprises I’ve dealt with have some gap between the need to share key business data to support business processes and decision support, and what currently exists in terms of data integration capabilities.
The focus here is on the multiple values that data integration can bring. This includes:
- The ability to track everything as it moves from manufacturing, to inventory, to distribution, and beyond. You can bind these to core business processes, such as automatic reordering of parts to make more products, to fill inventory.
- The ability to see into the past, and to see into the future. The emerging approaches to predictive analytics allow businesses to finally see into the future. Also, to see what went truly right and truly wrong in the past.
While data integration technology has been around for decades, most businesses that both manufacture and distribute products have not taken full advantage of this technology. The reasons range from perceptions around affordability, to the skills required to maintain the data integration flow. However, the truth is that you really can’t afford to ignore data integration technology any longer. It’s time to create and deploy a data integration strategy, using the right technology.
This survey is just an instance of a pattern. Data integration was considered optional in the past. With today’s emerging notions around the strategic use of data, clearly, it’s no longer an option.
A growing number of Data Scientists believe so.
If you recall the cholera outbreak in Haiti in 2010 after the tragic earthquake, a joint research team from Karolinska Institute in Sweden and Columbia University in the US analyzed calling data from two million mobile phones on the Digicel Haiti network. This enabled the United Nations and other humanitarian agencies to understand population movements during the relief operations and during the subsequent cholera outbreak. They could allocate resources more efficiently and identify areas at increased risk of new cholera outbreaks.
Mobile phones are widely owned, even in the poorest countries in Africa. They are also a rich source of data in regions where other reliable sources are sorely lacking. Senegal’s Orange Telecom provided Flowminder, a Swedish non-profit organization, with anonymized voice and text data from 150,000 mobile phones. Using this data, Flowminder drew up detailed maps of typical population movements in the region.
Today, authorities use this information to evaluate the best places to set up treatment centers, check-posts, and issue travel advisories in an attempt to contain the spread of the disease.
The first drawback is that this data is historic. Authorities really need to be able to map movements in real time especially since people’s movements tend to change during an epidemic.
The second drawback is that the scope of data provided by Orange Telecom is limited to a small region of West Africa.
Here is my recommendation to the Centers for Disease Control and Prevention (CDC):
- Increase the area for data collection to the entire region of Western Africa which covers over 2.1 million cell-phone subscribers.
- Collect mobile phone mast activity data to pinpoint where calls to helplines are mostly coming from, draw population heat maps, and track population movement. A sharp increase in calls to a helpline is usually an early indicator of an outbreak.
- Overlay this data on census data to build up a richer picture.
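As a rough illustration of the helpline idea, the sketch below flags masts whose latest daily helpline call count jumps well above their recent average. The data layout, the seven-day baseline and the 2x threshold are all assumptions made for the example; a real early-warning system would need epidemiological tuning.

```python
from statistics import mean


def flag_spikes(daily_calls_by_mast: dict, baseline_days: int = 7, ratio: float = 2.0) -> list:
    """Return masts whose latest daily count is at least `ratio` times the
    mean of the preceding `baseline_days` days (hypothetical rule of thumb)."""
    flagged = []
    for mast, counts in daily_calls_by_mast.items():
        if len(counts) <= baseline_days:
            continue  # not enough history to form a baseline
        baseline = mean(counts[-baseline_days - 1:-1])  # the days before the latest one
        latest = counts[-1]
        if baseline > 0 and latest >= ratio * baseline:
            flagged.append(mast)
    return sorted(flagged)


# Hypothetical helpline call counts per mast, oldest day first.
calls = {
    "mast_a": [10, 10, 10, 10, 10, 10, 10, 35],
    "mast_b": [10, 10, 10, 10, 10, 10, 10, 12],
    "mast_c": [5, 5],
}
spiking = flag_spikes(calls)
```

The flagged masts could then be placed on a map and overlaid on census data, as suggested in the list above.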
The most positive impact we can have is to help emergency relief organizations and governments anticipate how a disease is likely to spread. Until now, they had to rely on anecdotal information, on-the-ground surveys, police, and hospital reports.
Informatica Cloud Powers a New Era in Cloud Analytics with Salesforce Wave Analytics Cloud at Dreamforce 2014
We are halfway through Dreamforce and it’s been an eventful and awesome couple of days so far. The biggest launch by far was the announcement of Wave, the Salesforce Analytics Cloud, Salesforce’s new entry into Cloud analytics and business intelligence. Informatica has been the integration leader for enterprise analytics for 20 years, and our leadership continues with Cloud analytics, as our Informatica Cloud portfolio is the only solution that Completes Salesforce Analytics Cloud for Big Data, fully enabling companies to use Salesforce Analytics Cloud to understand their customers like never before. But don’t take our word for it, view the Analytics Cloud Keynote from Dreamforce 2014, and see Alex Dayon uniquely call out Informatica as their key integration partner during his keynote.
The Informatica Cloud Portfolio delivers a broad set of analytics-centric services for the Salesforce Analytics Cloud, including bulk and real time application integration, data integration, data preparation, test data management, data quality and master data management (MDM) services. The portfolio is designed for high volume data sets from transactional applications such as SAP, cloud applications like Workday and new data sources such as Hadoop, Microsoft Azure and Amazon Web Services.
We have a great booth in the Analytics Zone, Moscone West, 3rd floor, where you can see demos of Informatica Cloud for Salesforce Wave Analytics and get lots more details from product experts.
And, you don’t need to wait till Dreamforce is over to try out Informatica Cloud for Salesforce Analytics. The free trial of Informatica Cloud, including Springbok, for Salesforce Analytics Cloud is available now. Trial users have unlimited usage of Informatica Cloud capabilities for Salesforce Analytics Cloud for 60 days, free of charge.
Aside from new product launches, and tons of partner activities going on, we’ve also got some great customers speaking at DF. Today, we have a great session on “Get Closer to Your Customers Using Agile Data Management with Salesforce” with executive speakers from BT, Dolby and Travel Corporation explaining how they achieve customer insight with use cases ranging from integrating 9 Salesforce orgs into a single business dashboard to unifying 30+ acquired travel brands into a single customer view.
On Monday, we had Qualcomm and Warranty Group present how their companies have moved to the Cloud using Salesforce and Informatica Cloud to meet the agility needs of their businesses while simultaneously resolving the challenges of data scaling, organization complexity and evolving technology strategy to make it all happen.
Drop by our main booth in Moscone North, N1216 to see live demos showcasing solutions for Customer Centricity, Salesforce Data Lifecycle and Analytics Cloud. If you want a preview of our Informatica Cloud solutions for the Salesforce ecosystem, click here.
During Dreamforce, we also announced a significant milestone for Informatica Cloud, which now processes over 100 Billion transactions per month, on behalf of our 3,000+ joint customers with Salesforce.
Oh, and one more thing we announced at DF: the Informatica Cloud Data Wizard, our next-generation data loader for Salesforce, that delivers a beautifully simple user experience, natively inside Salesforce for non-technical business analysts and admins to easily bring external data into Salesforce with a one-touch UI, really!
For more information on how you can connect with Informatica at Dreamforce 2014, get all the details at informaticacloud.com/dreamforce
Which Method of Controls Should You Use to Protect Sensitive Data in Databases and Enterprise Applications? Part II
- Do you need to protect data at rest (in storage), during transmission, and/or when accessed?
- Do some privileged users still need the ability to view the original sensitive data or does sensitive data need to be obfuscated at all levels?
- What is the granularity of controls that you need?
- Datafile level
- Table level
- Row level
- Field / column level
- Cell level
- Do you need to be able to control viewing vs. modification of sensitive data?
- Do you need to maintain the original characteristics / format of the data (e.g. for testing, demo, development purposes)?
- Is response time latency / performance of high importance for the application? This can be the case for mission critical production applications that need to maintain response times in the order of seconds or sub-seconds.
In order to help you determine which method of control is appropriate for your requirements, the following table provides a comparison of the different methods and their characteristics.
A combination of protection methods may be appropriate based on your requirements. For example, to protect data in non-production environments, you may want to use persistent data masking to ensure that no one has access to the original production data, since they don’t need to. This is especially true if your development and testing is outsourced to third parties. In addition, persistent data masking allows you to maintain the original characteristics of the data to ensure test data quality.
In production environments, you may want to use a combination of encryption and dynamic data masking. This is the case if you would like to ensure that all data at rest is protected against unauthorized users, yet you need to protect sensitive fields only for certain sets of authorized or privileged users, but the rest of your users should be able to view the data in the clear.
The best method or combination of methods will depend on each scenario and set of requirements for your environment and organization. As with any technology and solution, there is no one size fits all.
Which Method of Controls Should You Use to Protect Sensitive Data in Databases and Enterprise Applications? Part I
- Which types of data should be protected?
- Which data should be classified as “sensitive?”
- Where is this sensitive data located?
- Which groups of users should have access to this data?
Because these questions come up frequently, it seems ideal to share a few guidelines on this topic.
When protecting the confidentiality and integrity of data, the first level of defense is authentication and access control. However, data with higher levels of sensitivity or confidentiality may require additional levels of protection, beyond regular authentication and authorization methods.
There are a number of control methods for securing sensitive data available in the market today, including:
- Encryption
- Persistent (Static) Data Masking
- Dynamic Data Masking
- Tokenization
- Retention management and purging
Encryption is a cryptographic method of encoding data. There are generally two methods of encryption: symmetric (using a single secret key) and asymmetric (using public and private keys). Although there are methods of deciphering encrypted information without possessing the key, a good encryption algorithm makes it very difficult to decode the encrypted data without knowledge of the key. Key management is usually a key concern with this method of control. Encryption is ideal for mass protection of data (e.g. an entire data file, table, partition, etc.) against unauthorized users.
Persistent or static data masking obfuscates data at rest in storage. There is usually no way to retrieve the original data: the data is permanently masked. There are multiple techniques for masking data, including shuffling, substitution, aging, encryption, domain-specific masking (e.g. email address, IP address, credit card, etc.), dictionary lookup, randomization, etc. Depending on the technique, there may be ways to perform reverse masking, but this should be used sparingly. Persistent masking is ideal for cases where no user should see the original sensitive data (e.g. for test / development environments) and field level data protection is required.
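To illustrate a few of these techniques, here is a minimal sketch of domain-specific masking, substitution and shuffling applied to a copy of a data set. The field names (`email`, `card`, `salary`) and the masking rules are hypothetical; a commercial masking tool would offer far more options.

```python
import random


def mask_email(email: str) -> str:
    """Domain-specific masking: keep the email shape, hide the local part."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


def mask_card(card_number: str) -> str:
    """Substitution: replace all but the last four digits, keep separators."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    seen = 0
    out = []
    for ch in card_number:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total_digits - 4 else "X")
        else:
            out.append(ch)
    return "".join(out)


def shuffle_column(rows: list, column: str, seed: int = 0) -> list:
    """Shuffling: redistribute real values across rows so that a value is no
    longer tied to its original row (a value may land back in place by chance)."""
    values = [row[column] for row in rows]
    random.Random(seed).shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


def mask_rows(rows: list, seed: int = 0) -> list:
    """Build a masked copy of the rows; the input list is left untouched."""
    masked = [
        {**row, "email": mask_email(row["email"]), "card": mask_card(row["card"])}
        for row in rows
    ]
    return shuffle_column(masked, "salary", seed)
```

Note that the masked values keep the original format (an email still looks like an email), which is what makes the result usable as test data.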
Dynamic data masking de-identifies data when it is accessed. The original data is still stored in the database. Dynamic data masking (DDM) acts as a proxy between the application and database and rewrites the user / application request against the database depending on whether the user has the privilege to view the data or not. If the requested data is not sensitive or the user is a privileged user who has the permission to access the sensitive data, then the DDM proxy passes the request to the database without modification, and the result set is returned to the user in the clear. If the data is sensitive and the user does not have the privilege to view the data, then the DDM proxy rewrites the request to include a masking function and passes the request to the database to execute. The result is returned to the user with the sensitive data masked. Dynamic data masking is ideal for protecting sensitive fields in production systems where application changes are difficult or disruptive to implement and performance / response time is of high importance.
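The request rewriting described above can be sketched in a few lines. This is a simplified illustration using an in-memory SQLite database, not how any particular DDM product works: the request is assumed to arrive as a structured column list rather than raw SQL, and the sensitive-column set, role names and masking expression are all hypothetical.

```python
import sqlite3

SENSITIVE_COLUMNS = {"ssn"}      # hypothetical data classification
PRIVILEGED_ROLES = {"auditor"}   # hypothetical roles allowed to see clear text


def rewrite_query(columns: list, table: str, role: str) -> str:
    """Build the SQL for a request, wrapping sensitive columns in a masking
    expression unless the role is privileged. Column and table names are
    assumed to come from a trusted allowlist, never from user input."""
    select_list = []
    for col in columns:
        if col in SENSITIVE_COLUMNS and role not in PRIVILEGED_ROLES:
            # Keep only the last four characters; keep the column name via AS.
            select_list.append(f"'***-**-' || substr({col}, -4) AS {col}")
        else:
            select_list.append(col)
    return f"SELECT {', '.join(select_list)} FROM {table}"


def run_as(conn: sqlite3.Connection, role: str, columns: list, table: str) -> list:
    """Act as the proxy: rewrite the request, then let the database execute it."""
    return conn.execute(rewrite_query(columns, table, role)).fetchall()


# The original data stays in the database unchanged.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, ssn TEXT)")
conn.execute("INSERT INTO customers VALUES ('Ann', '123-45-6789')")

clerk_view = run_as(conn, "clerk", ["name", "ssn"], "customers")
auditor_view = run_as(conn, "auditor", ["name", "ssn"], "customers")
```

The application issues the same request in both cases; only the proxy decides whether the masking function is added, which is why no application change is needed.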
Tokenization substitutes a sensitive data element with a non-sensitive data element or token. The first generation tokenization system requires a token server and a database to store the original sensitive data. The mapping from the clear text to the token makes it very difficult to reverse the token back to the original data without the token system. The existence of a token server and database storing the original sensitive data renders the token server and mapping database as a potential point of security vulnerability, bottleneck for scalability, and single point of failure. Next generation tokenization systems have addressed these weaknesses. However, tokenization does require changes to the application layer to tokenize and detokenize when the sensitive data is accessed. Tokenization can be used in production systems to protect sensitive data at rest in the database store, when changes to the application layer can be made relatively easily to perform the tokenization / detokenization operations.
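A first-generation design of this kind can be sketched as follows. The `TokenVault` class is a hypothetical in-memory stand-in for the token server and its mapping database; a real system would persist the mapping securely and restrict who may call `detokenize`.

```python
import secrets


class TokenVault:
    """In-memory stand-in for a token server plus mapping database."""

    def __init__(self) -> None:
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        """Return a random token for the value; the same value always maps to the same token."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_value:  # regenerate on the unlikely collision
            token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        """Look up the original value; raises KeyError for an unknown token."""
        return self._token_to_value[token]


vault = TokenVault()
card = "4111111111111111"
token = vault.tokenize(card)      # what the application database would store
restored = vault.detokenize(token)
```

Because the token is random rather than derived from the card number, it cannot be reversed without the vault, which is also why the vault itself becomes the point of vulnerability and the potential bottleneck mentioned above.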
Retention management and purging is more of a data management method to ensure that data is retained only as long as necessary. The best method of reducing data privacy risk is to eliminate the sensitive data. Therefore, appropriate retention, archiving, and purging policies should be applied to reduce the privacy and legal risks of holding on to sensitive data for too long. Retention management and purging is a data management best practice that should always be put to use.
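A purging policy can be as simple as comparing each record's age with a retention period. The sketch below assumes records carry a `created` timestamp and that a single retention period applies; both the record layout and the seven-year figure are hypothetical, and real policies usually vary by data class and jurisdiction.

```python
from datetime import datetime, timedelta


def purge_expired(records: list, retention_days: int, now: datetime) -> tuple:
    """Split records into (kept, purged) based on a retention period in days."""
    cutoff = now - timedelta(days=retention_days)
    kept = [r for r in records if r["created"] >= cutoff]
    purged = [r for r in records if r["created"] < cutoff]
    return kept, purged


# Hypothetical example with a seven-year retention period.
records = [
    {"id": 1, "created": datetime(2015, 6, 1)},
    {"id": 2, "created": datetime(2023, 6, 1)},
]
kept, purged = purge_expired(records, retention_days=365 * 7, now=datetime(2024, 1, 1))
```

Returning the purged records separately, rather than deleting them silently, leaves room for an archiving or audit step before the data is finally removed.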
With that said, the basic approaches to consider are from the top-down, or the bottom-up. You can be successful with either approach. However, there are certain efficiencies you’ll gain with a specific choice, and it could significantly reduce the risk and cost. Let’s explore the pros and cons of each approach.
Approaching data integration from the top-down means moving from the high-level integration flows down to the data semantics. Thus, you define an approach, perhaps even a tool-set (using requirements), and then define the flows that are decomposed down to the raw data.
The advantages of this approach include:
The ability to spend time defining the higher levels of abstraction without being limited by the underlying integration details. This typically means that those charged with designing the integration flows are more concerned with how they have to deal with the underlying source and target, and this approach means that they don’t have to deal with that issue until later, as they break down the flows.
The disadvantages of this approach include:
The data integration architect does not consider the specific needs of the source or target systems, in many instances, and thus some rework around the higher level flows may have to occur later. That causes inefficiencies, and could add risk and cost to the final design and implementation.
For the most part, bottom-up is the approach that most choose for data integration. Indeed, I use this approach about 75 percent of the time. The process is to start from the native data in the sources and targets, and work your way up to the integration flows. This typically means that those charged with designing the integration flows are more concerned with the underlying data semantic mediation than the flows.
The advantages of this approach include:
It’s typically a more natural and traditional way of approaching data integration. Called “data-driven” integration design in many circles, this initially deals with the details, so by the time you get up to the integration flows there are few surprises, and there’s not much rework to be done. It’s a bit less risky and less expensive, in most cases.
The disadvantages of this approach include:
Starting with the details means that you could get so involved in the details that you miss the larger picture, and the end state of your architecture appears to be poorly planned, when all is said and done. Of course, that depends on the types of data integration problems you’re looking to solve.
No matter which approach you leverage, with some planning and some strategic thinking, you’ll be fine. However, there are different paths to the same destination, and some paths are longer and less efficient than others. As you pick an approach, learn as you go, and adjust as needed.