Category Archives: Enterprise Data Management
With the increasing importance of enterprise analytics, the question becomes who should own the analytics and data agenda. This question really matters today because, according to Thomas Davenport, “business processes are among the last remaining points of differentiation.” For this reason, Davenport even suggests that businesses that create a sustainable right to win use analytics to “wring every last drop of value from their processes”.
The CFO is the logical choice?
In talking with CIOs about both enterprise analytics and data, they are clear that they do not want to become their company’s data steward. They insist instead that they want to be an enabler of the analytics and data function. So what business function then should own enterprise analytics and data? Last week an interesting answer came from a CFO Magazine Article by Frank Friedman. Frank contends that CFOs are “the logical choice to own analytics and put them to work to serve the organization’s needs”.
To justify his position, Frank made the following claims:
- CFOs own most of the unprecedented quantities of data that businesses create from supply chains, product processes, and customer interactions
- Many CFOs already use analytics to address their organization’s strategic issues
- CFOs uniquely can act as a steward of value and an impartial guardian of truth across the organization. This gives them the credibility and trust needed when analytics produce insights that debunk currently accepted wisdom
Frank contends as well that owning the analytics agenda is a good thing because it allows CFOs to expand their strategic leadership role in doing the following:
- Growing top line revenue
- Strengthening their business ties
- Expanding the CFO’s influence outside the finance function.
Frank suggests as well that analytics empowers the CFO to exercise more centralized control of operational business decision making. The question is what do other CFOs think about Frank’s position?
CFOs clearly have an opinion about enterprise analytics and data
A major Retail CFO says that finance needs to own “the facts for the organization”—the metrics and KPIs. And while he honestly admits that finance organizations in the past have not used data well, he claims finance departments need to make the time to become truly data centric. He said “I do not consider myself a data expert, but finance needs to own enterprise data and the integrity of this data”. This CFO claims as well that “finance needs to use data to make sure that resources are focused on the right things; decisions are based on facts; and metrics are simple and understandable”. A Food and Beverage CFO agrees with the Retail CFO by saying that almost every piece of data is financial in one way or another. CFOs need to manage all of this data since they own operational performance for the enterprise. CFOs should own the key performance indicators of the business.
CIOs should own data, data interconnect, and system selection
A Healthcare CFO said, however, that he wants the CIO to own data systems, data interconnect, and system selection. He sees the finance organization as the recipient of data. “CFOs have a major stake in data. CFOs need to dig into operational data to be able to relate operations to internal accounting and to analyze things like costs versus price”. He said that “the CFOs can’t function without good operational data”.
An Accounting Firm CFO agreed with the Healthcare CFO by saying that CIOs are a means to get data. She said that CFOs need to make sense out of data in their performance management role. CFOs, therefore, are big consumers of both business intelligence and analytics. An Insurance CFO concurred by saying CIOs should own how data is delivered.
CFOs should be data validators
The Insurance CFO said, however, that CFOs need to be validators of data and reports. As a result, in his opinion, they should be very knowledgeable about BI and analytics. In other words, CFOs need to be the Underwriters Laboratories (UL) for corporate data.
Now it is your chance
So the question is what do you believe? Does the CFO own analytics, data, and data quality as a part of their operational performance role? Or is it a group of people within the organization? Please share your opinions below.
Solution Brief: The Intelligent Data Platform
CFOs Move to Chief Profitability Officer
CFOs Discuss Their Technology Priorities
The CFO Viewpoint upon Data
How CFOs can change the conversation with their CIO?
New type of CFO represents a potent CIO ally
Competing on Analytics
The Business Case for Better Data Connectivity
ERP systems were a true competitive advantage 20+ years ago, but not so today. ERP systems gave people the best view into their business at a time when there really were only ERP systems and databases; today that critical data resides in many other places. There are several reasons why ERP systems act as a data trap: technical factors, out-of-date management theory, and big data trends. First, let’s talk about management theory.
There are two fundamental concepts that have been driving much of the strategic planning in modern organizations in recent decades. The idea of economies of scale is deeply embedded in our thinking. The concept was first introduced by Adam Smith in the 18th century and reinforced throughout the 20th century by thinkers such as Bruce Henderson. In 1968 Henderson wrote, “Costs characteristically decline by 20-30% in real terms each time accumulated experience doubles.” The basic idea is that bigger is better. (more…)
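To make Henderson's statement concrete, here is a minimal sketch of the arithmetic behind it. It assumes the usual power-law reading of the experience curve, in which unit cost falls by a fixed fraction every time cumulative volume doubles. The function name and the sample figures are illustrative, not taken from Henderson's article.

```python
import math

def unit_cost(first_unit_cost, cumulative_units, decline_per_doubling):
    """Estimate unit cost after `cumulative_units` of accumulated experience.

    Assumption: cost falls by `decline_per_doubling` (e.g. 0.20 for 20%)
    each time cumulative volume doubles, which gives a power law.
    """
    exponent = math.log2(1.0 - decline_per_doubling)  # negative for any decline
    return first_unit_cost * cumulative_units ** exponent

# Illustrative figures only: a 20% decline per doubling, starting at 100.
for units in (1, 2, 4, 8):
    print(units, round(unit_cost(100.0, units, 0.20), 2))
```

Under these assumptions each doubling multiplies the cost by 0.8, which is why scale looks so attractive in this line of thinking.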
The title of this article may seem counterintuitive, but the reality is that the business doesn’t care about data. They care about their business processes and outcomes that generate real value for the organization. All IT professionals know there is huge value in quality data and in having it integrated and consistent across the enterprise. The challenge is how to prove the business value of data if the business doesn’t care about it. (more…)
Every fall Informatica sales leadership puts together its strategy for the following year. The revenue target is typically a function of the number of sellers, the addressable market size and key accounts in a given territory, average spend and conversion rate given prior years’ experience, etc. This straightforward math has probably not changed in decades, but it assumes that the underlying data are 100% correct. This data includes:
- Number of accounts with a decision-making location in a territory
- Related IT spend and prioritization
- Organizational characteristics like legal ownership, industry code, credit score, annual report figures, etc.
- Key contacts, roles and sentiment
- Prior interaction (campaign response, etc.) and transaction (quotes, orders, payments, products, etc.) history with the firm
Every organization, no matter if it is a life insurer, a pharmaceutical manufacturer, a fashion retailer or a construction company knows this math and plans on getting somewhere above 85% achievement of the resulting target. Office locations, support infrastructure spend, compensation and hiring plans are based on this and communicated.
So why is it that when it is an open secret that the underlying data is far from perfect (accurate, current and useful) and corrupts outcomes, too few believe that fixing it has any revenue impact? After all, we are not projecting the climate for the next hundred years here with a thousand plus variables.
If corporate hierarchies are incorrect, your spend projections based on incorrect territory targets, credit terms and discount strategy will be off. If every client touch point does not have a complete picture of cross-departmental purchases and campaign responses, your customer acquisition cost will be too high as you will contact the wrong prospects with irrelevant offers. If billing, tax or product codes are incorrect, your billing will be off. This is a classic telecommunication example worth millions every month. If your equipment location and configuration are wrong, maintenance schedules will be incorrect and every hour of production interruption will cost an industrial manufacturer of wood pellets or oil millions.
Also, if industry leaders enjoy an upsell ratio of 17% and you experience 3%, data will have a lot to do with it (assuming you have no formal upsell policy because it violates your independent middleman relationship).
The challenge is not whether data can improve revenue but how much of the improvement it accounts for, given the other factors: people and process.
Every industry laggard can identify a few FTEs who spend 25% of their time putting one-off data repositories together for some compliance, M&A, customer or marketing analytics. Organic revenue growth from net-new or previously unrealized revenue should be the focus of any data management initiative. Don’t get me wrong; purposeful recruitment (people), comp plans and training (processes) are important as well. Few people doubt that people and process drive revenue growth. However, few believe that the data being fed into these processes has an impact.
This is a head scratcher for me. An IT manager at a US upstream oil firm once told me that it would be ludicrous to think data has a revenue impact. His team fixed data simply because it is important, so that his consumers would know where all the wells are and which ones made a good profit. Isn’t that assuming data drives production revenue? (Rhetorical question)
A CFO at a smaller retail bank said during a call that his account managers know their clients’ needs and history. There is nothing more good data can add in terms of value. And this happened after twenty other folks at his bank including his own team delivered more than ten use cases, of which three were based on revenue.
Hard cost (materials and FTE) reduction is easy to demonstrate and cost avoidance is a leap of faith to a degree, but revenue is not any less concrete; otherwise, why not just throw the dice and see what the revenue will look like next year without a central customer database? Let every department have each account executive get their own data, structure it the way they want and put it on paper and make hard copies for distribution to HQ. This is not about paper versus electronic records but about the inability to reconcile data from many sources, which is even harder on paper than it is electronically.
Have you ever heard of any organization moving back to the Fifties and competing today? That would be a fun exercise. Thoughts, suggestions: I would be glad to hear them.
Which Method of Controls Should You Use to Protect Sensitive Data in Databases and Enterprise Applications? Part II
- Do you need to protect data at rest (in storage), during transmission, and/or when accessed?
- Do some privileged users still need the ability to view the original sensitive data or does sensitive data need to be obfuscated at all levels?
- What is the granularity of controls that you need?
  - Datafile level
  - Table level
  - Row level
  - Field / column level
  - Cell level
- Do you need to be able to control viewing vs. modification of sensitive data?
- Do you need to maintain the original characteristics / format of the data (e.g. for testing, demo, development purposes)?
- Is response time latency / performance of high importance for the application? This can be the case for mission critical production applications that need to maintain response times in the order of seconds or sub-seconds.
In order to help you determine which method of control is appropriate for your requirements, the following table provides a comparison of the different methods and their characteristics.
A combination of protection methods may be appropriate based on your requirements. For example, to protect data in non-production environments, you may want to use persistent data masking to ensure that no one has access to the original production data, since they don’t need it. This is especially true if your development and testing is outsourced to third parties. In addition, persistent data masking allows you to maintain the original characteristics of the data to ensure test data quality.
In production environments, you may want to use a combination of encryption and dynamic data masking. This is the case if you would like to ensure that all data at rest is protected against unauthorized users, while masking sensitive fields only for certain sets of authorized or privileged users and letting the rest of your users view the data in the clear.
The best method or combination of methods will depend on each scenario and set of requirements for your environment and organization. As with any technology and solution, there is no one size fits all.
Which Method of Controls Should You Use to Protect Sensitive Data in Databases and Enterprise Applications? Part I
- Which types of data should be protected?
- Which data should be classified as “sensitive?”
- Where is this sensitive data located?
- Which groups of users should have access to this data?
Because these questions come up frequently, it seems worthwhile to share a few guidelines on this topic.
When protecting the confidentiality and integrity of data, the first level of defense is authentication and access control. However, data with higher levels of sensitivity or confidentiality may require additional levels of protection, beyond regular authentication and authorization methods.
There are a number of control methods for securing sensitive data available in the market today, including:
- Encryption
- Persistent (Static) Data Masking
- Dynamic Data Masking
- Tokenization
- Retention management and purging
Encryption is a cryptographic method of encoding data. There are generally two methods of encryption: symmetric (using a single secret key) and asymmetric (using public and private keys). Although there are methods of deciphering encrypted information without possessing the key, a good encryption algorithm makes it very difficult to decode the encrypted data without knowledge of the key. Key management is usually a major concern with this method of control. Encryption is ideal for mass protection of data (e.g. an entire data file, table, partition, etc.) against unauthorized users.
Persistent or static data masking obfuscates data at rest in storage. There is usually no way to retrieve the original data: the data is permanently masked. There are multiple techniques for masking data, including shuffling, substitution, aging, encryption, domain-specific masking (e.g. email address, IP address, credit card, etc.), dictionary lookup, randomization, etc. Depending on the technique, there may be ways to perform reverse masking; this should be used sparingly. Persistent masking is ideal for cases where no user should see the original sensitive data (e.g. for test / development environments) and field level data protection is required.
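As a rough illustration of three of the techniques named above (domain-specific masking, substitution and shuffling), here is a minimal Python sketch. The function names, the salt and the sample values are assumptions made for this example; they do not describe how any particular masking product works, and a hash with a fixed salt is shown for shape only, not as a strong de-identification method.

```python
import hashlib
import random

def mask_email(email, salt="demo-salt"):
    """Domain-specific masking: keep a valid email shape, replace the local part."""
    local, _, domain = email.partition("@")
    digest = hashlib.sha256((salt + local).encode("utf-8")).hexdigest()[:10]
    return f"user_{digest}@{domain}"

def mask_digits(value, seed):
    """Substitution: swap every digit for a pseudo-random digit and keep
    separators, so the masked value has the same length and format."""
    rng = random.Random(seed)
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in value)

def shuffle_column(rows, column, seed=0):
    """Shuffling: redistribute one column's real values across the rows."""
    values = [row[column] for row in rows]
    random.Random(seed).shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]

# Hypothetical test records, masked once before they leave production.
rows = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
print(mask_email("alice@example.com"))
print(mask_digits("4111-1111-1111-1111", seed=42))
print(shuffle_column(rows, "name", seed=1))
```

Because the masked values keep the original length and format, downstream test and development code can usually run against them unchanged.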
Dynamic data masking de-identifies data when it is accessed. The original data is still stored in the database. Dynamic data masking (DDM) acts as a proxy between the application and database and rewrites the user / application request against the database depending on whether the user has the privilege to view the data or not. If the requested data is not sensitive or the user is a privileged user who has the permission to access the sensitive data, then the DDM proxy passes the request to the database without modification, and the result set is returned to the user in the clear. If the data is sensitive and the user does not have the privilege to view the data, then the DDM proxy rewrites the request to include a masking function and passes the request to the database to execute. The result is returned to the user with the sensitive data masked. Dynamic data masking is ideal for protecting sensitive fields in production systems where application changes are difficult or disruptive to implement and performance / response time is of high importance.
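The request-rewriting behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `customers` table, the `ssn` column, the masking expression and the `privileged` flag are all hypothetical, and an in-memory SQLite database stands in for the production database behind the proxy.

```python
import sqlite3

# Hypothetical masking policy: column name -> SQL expression that masks it.
SENSITIVE_COLUMNS = {"ssn": "'XXX-XX-' || substr(ssn, -4)"}

def rewrite_select(columns, table, privileged):
    """Rewrite a simple SELECT so sensitive columns are masked for
    non-privileged users. Column and table names are assumed to come
    from trusted configuration, not from end-user input."""
    parts = []
    for col in columns:
        if col in SENSITIVE_COLUMNS and not privileged:
            parts.append(f"{SENSITIVE_COLUMNS[col]} AS {col}")
        else:
            parts.append(col)
    return f"SELECT {', '.join(parts)} FROM {table}"

def run_through_proxy(conn, columns, table, privileged):
    """Pass the (possibly rewritten) request to the database and return rows."""
    return conn.execute(rewrite_select(columns, table, privileged)).fetchall()

def demo_connection():
    """Build a throwaway database; the stored data is never modified."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, ssn TEXT)")
    conn.execute("INSERT INTO customers VALUES ('Ada', '123-45-6789')")
    return conn

conn = demo_connection()
print(run_through_proxy(conn, ["name", "ssn"], "customers", privileged=True))
print(run_through_proxy(conn, ["name", "ssn"], "customers", privileged=False))
```

Note that the application issues the same request in both cases; only the proxy decides whether the masking function is added, which is why no application change is needed.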
Tokenization substitutes a sensitive data element with a non-sensitive data element or token. A first generation tokenization system requires a token server and a database to store the original sensitive data. The mapping from the clear text to the token makes it very difficult to reverse the token back to the original data without the token system. However, the token server and the mapping database that stores the original sensitive data become a potential point of security vulnerability, a bottleneck for scalability, and a single point of failure. Next generation tokenization systems have addressed these weaknesses. However, tokenization does require changes to the application layer to tokenize and detokenize when the sensitive data is accessed. Tokenization can be used in production systems to protect sensitive data at rest in the database store, when changes to the application layer can be made relatively easily to perform the tokenization / detokenization operations.
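A first generation token system of the kind described above can be sketched as a small in-memory vault. The class name, the `tok_` prefix and the token length are assumptions for illustration; a real token server would persist the mapping securely and control who may call `detokenize`.

```python
import secrets

class TokenVault:
    """Toy token server: maps sensitive values to random tokens and back."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        """Return the existing token for a value, or issue a new random one."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_value:  # extremely unlikely collision
            token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        """Look up the original value; possible only with access to the vault."""
        return self._token_to_value[token]

vault = TokenVault()
token = vault.tokenize("4111-1111-1111-1111")
print(token)                    # random, carries no information about the card
print(vault.detokenize(token))  # original value, recovered via the vault
```

The sketch also makes the weakness visible: every tokenize and detokenize call goes through the one vault, which is why it can become both a bottleneck and an attractive target.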
Retention management and purging is more of a data management method to ensure that data is retained only as long as necessary. The best method of reducing data privacy risk is to eliminate the sensitive data. Therefore, appropriate retention, archiving, and purging policies should be applied to reduce the privacy and legal risks of holding on to sensitive data for too long. Retention management and purging is a data management best practice that should always be put to use.
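A retention policy ultimately comes down to a cutoff date. The following minimal sketch assumes each record carries a `created` date and that a single retention period applies; real policies usually vary by data type and legal hold status, and the field names here are hypothetical.

```python
from datetime import date, timedelta

def purge_expired(records, retention_days, today):
    """Split records into those still within the retention period and
    those that are past it and should be purged (or archived first)."""
    cutoff = today - timedelta(days=retention_days)
    kept = [r for r in records if r["created"] >= cutoff]
    purged = [r for r in records if r["created"] < cutoff]
    return kept, purged

# Hypothetical records with a one-year retention period.
records = [
    {"id": 1, "created": date(2022, 12, 31)},
    {"id": 2, "created": date(2023, 6, 15)},
]
kept, purged = purge_expired(records, retention_days=365, today=date(2024, 1, 1))
print([r["id"] for r in kept], [r["id"] for r in purged])
```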
The Need for Scalable Enterprise Analytics
In 2012, Forbes published an article predicting an upcoming problem.
Specifically, increased exploration of Big Data opportunities would place pressure on the typical corporate infrastructure. The generic hardware used to run most tech industry enterprise applications was not designed to handle real-time data processing. As a result, the explosion of mobile usage and the proliferation of social networks were increasing the strain on the system. Most companies now faced real-time processing requirements beyond what the traditional model was designed to handle.
In the past two years, the volume of data and the speed of data growth have increased significantly. As a result, the problem has become more severe. It is now clear that enterprises can’t overcome these challenges by simply doubling or tripling their IT spending on infrastructure sprawl. Today, enterprises seek consolidated solutions that offer scalability, performance and ease of administration. The present need is for scalable enterprise analytics.
A Clear Solution Is Available
Informatica PowerCenter and Data Quality is the market-leading data integration and data quality platform. This platform has now been certified by Oracle as an optimal solution for both the Oracle Exadata Database Machine and the Oracle SuperCluster.
As the high-speed on-ramp for data into Oracle Exadata, PowerCenter and Data Quality deliver up to five times faster performance on data load, query, profiling and cleansing tasks. Informatica’s data integration customers can now easily reuse data integration code, skills and resources to access and transform any data from any data source and load it into Exadata, with the highest throughput and scalability.
Customers adopting Oracle Exadata for high-volume, high-speed analytics can now be confident with Informatica PowerCenter and Data Quality. With these products, they can ingest, cleanse and transform all types of data into Exadata with the highest performance and scale required to maximize the value of their Exadata investment.
Proving the Value of Scalable Enterprise Analytics
In order to demonstrate the efficacy of their partnership, the two companies worked together on a Proof of Value (POV) project. The goal was to prove that using PowerCenter with Exadata would improve both performance and scalability. The project involved PowerCenter and Data Quality 9.6.1 and an X4-2 Exadata machine. Oracle 11g was considered for both standard Oracle and Exadata versions.
The first test was a 1TB load test to Exadata and standard Oracle in a typical PowerCenter use case. The second test consisted of querying a 1TB profiling warehouse database in a Data Quality use case scenario. Performance data was collected for both tests. The scalability factor was also captured. A variant of the TPC-H dataset was used to generate the test data. The results were significantly better than those of the prior Exabyte 1TB test. In particular:
- The data query tests achieved 5x performance.
- The data load tests achieved a 3x-5x speed increase.
- Linear scalability was achieved with read/write tests on Exadata.
What Business Benefits Could You Expect?
Informatica PowerCenter and Data Quality, along with Oracle Exadata, now provide the best-of-breed combination of software and hardware, optimized to deliver the highest possible total system performance. These comprehensive tools drive agile reporting and analytics, while empowering IT organizations to meet SLAs and quality goals like never before.
- Extend Oracle Exadata’s access to even more business critical data sources. Utilize optimized out-of-the-box Informatica connectivity to easily access hundreds of data sources, including all the major databases, on-premise and cloud applications, mainframe, social data and Hadoop.
- Get more data, more quickly into Oracle Exadata. Move higher volumes of trusted data quickly into Exadata to support timely reporting with up-to-date information (i.e. up to 5x performance improvement compared to Oracle database).
- Centralize management and improve insight into large scale data warehouses. Deliver the necessary insights to stakeholders with intuitive data lineage and a collaborative business glossary. Contribute to high quality business analytics, in a timely manner across the enterprise.
- Instantly re-direct workloads and resources to Oracle Exadata without compromising performance. Leverage existing code and programming skills to execute high-performance data integration directly on Exadata by performing push down optimization.
- Roll-out data integration projects faster and more cost-effectively. Customers can now leverage thousands of Informatica certified developers to execute existing data integration and quality transformations directly on Oracle Exadata, without any additional coding.
- Efficiently scale-up and scale-out. Customers can now maximize performance and lower the costs of data integration and quality operations of any scale by performing Informatica workload and push down optimization on Oracle Exadata.
- Save significant costs involved in administration and expansion. Customers can now easily and economically manage large-scale analytics data warehousing environments with a single point of administration and control, and consolidate a multitude of servers on one rack.
- Reduce risk. Customers can now leverage Informatica’s data integration and quality platform to overcome the typical performance and scalability limitations seen in databases and data storage systems. This will help reduce quality-of-service risks as data volumes rise.
Oracle Exadata is a well-engineered system that offers customers out-of-box scalability and performance on demand. Informatica PowerCenter and Data Quality are optimized to run on Exadata, offering customers business benefits that speed up data integration and data quality tasks like never before. Informatica’s certified, optimized, and purpose-built solutions for Oracle can help you enable more timely and trustworthy reporting. You can now benefit from Informatica’s optimized solutions for Oracle Exadata to make better business decisions by unlocking the full potential of the most current and complete enterprise data available. As shown in our test results, you can attain up to 5x performance by scaling Exadata. Informatica Data Quality customers can profile 1TB datasets, which was previously unheard of. We urge you to deploy the combined solution to solve your data integration and quality problems today while achieving high speed business analytics in these days of big data exploration and the Internet of Things.
Listen to what Ash Kulkarni, SVP, at OOW14 has to say on how @InformaticaCORP PowerCenter and Data Quality, certified by Oracle as optimized for Exadata, can deliver up to five times faster performance on data load, query, profiling, cleansing and mastering tasks for Exadata.
Do We Really Need Another Information Framework?
The EIM Consortium is a group of nine companies that formed this year with the mission to:
“Promote the adoption of Enterprise Information Management as a business function by establishing an open industry reference architecture in order to protect and optimize the business value derived from data assets.”
That sounds nice, but do we really need another framework for EIM or Data Governance? Yes we do, and here’s why. (more…)