Category Archives: Application ILM
In my first article on the topic of citizens’ digital health and safety we looked at the states’ desire to keep their citizens healthy and safe and also at the various laws and regulations they have in place around data breaches and losses. The size and scale of the problem together with some ideas for effective risk mitigation are in this whitepaper.
Let’s now start delving a little deeper into the situation states are faced with. It’s pretty obvious that citizen data that enables an individual to be identified (PII) needs to be protected. We immediately think of the production data: data that is used in integrated eligibility systems; in health insurance exchanges; in data warehouses and so on. In some ways the production data is the least of our problems; our research shows that the average state has around 10 to 12 full copies of data for non-production (development, test, user acceptance and so on) purposes. This data tends to be much more vulnerable because it is widespread and used by a wide variety of people – often subcontractors or outsourcers, and often the content of the data is not well understood.
Obviously production systems need access to real production data (I’ll cover how best to protect that in the next issue), on the other hand non-production systems of every sort do not. Non-production systems most often need realistic, but not real data and realistic, but not real data volumes (except maybe for the performance/stress/throughput testing system). What need to be done? Well to start with, a three point risk remediation plan would be a good place to start.
1. Understand the non-production data using sophisticated data and schema profiling combined with NLP (Natural Language Processing) techniques help to identify previously unrealized PII that needs protecting.
2. Permanently mask the PII so that it is no longer the real data but is realistic enough for non-production uses and make sure that the same masking is applied to the attribute values wherever they appear in multiple tables/files.
3. Subset the data to reduce data volumes, this limits the size of the risk and also has positive effects on performance, run-times, backups etc.
Gartner has just published their 2013 magic quadrant for data masking this covers both what they call static (i.e. permanent or persistent masking) and dynamic (more on this in the next issue) masking. As usual the MQ gives a good overview of the issues behind the technology as well as a review of the position, strengths and weaknesses of the leading vendors.
It is (or at least should be) an imperative that from the top down state governments realize the importance and vulnerability of their citizens data and put in place a non-partisan plan to prevent any future breaches. As the reader might imagine, for any such plan to success needs a combination of cultural and organizational change (getting people to care) and putting the right technology – together these will greatly reduce the risk. In the next and final issue on this topic we will look at the vulnerabilities of production data, and what can be done to dramatically increase its privacy and security.
Informatica announced, once again, that it is listed as a leader in the industry’s second Gartner Magic Quadrant for Data Masking Technology. With data security continuing to grow as one of the fastest segments in the enterprise software market, technologies such as data masking are becoming the solution of choice for data-centric security.
Increased fear of cyber-attacks and internal data breaches has made predictions that 2014 is the year of preventative and tactical measures to ensure corporate data assets are safe. Data masking should be included in those measures. According to Gartner,
“Security program managers need to take a strategic approach with tactical best-practice technology configurations in order to properly address the most common advanced targeted attack scenarios to increase both detection and prevention capabilities.”
Without these measures, the cost of an attack or breach is growing every year. The Ponemon Institute posted in a recent study:
“The 2013 Cost of Cyber Crime Study states that the average annualized cost of cybercrime incurred by a benchmark sample of US organizations was $11.56 million, nearly 78% more than the cost estimated in the first analysis conducted 4 years ago.”
Informatica believes that the best preventative measures include a layered approach for data security but without sacrificing agility or adding unnecessary costs. Data Masking delivers data-centric security with improved productivity and reduced overall costs.
Data Masking prevents internal data theft and abuse of sensitive data by hiding it from users. Data masking techniques include replacing some fields with similar-looking characters, masking characters (for example, “x”), substituting real last names with fictional last names and shuffling data within columns – to name a few. Other terms for data masking include data obfuscation, sanitization, scrambling, de-identification, and anonymization . Call it what you like, but without it – organizations may continue to expose sensitive data to those with mal intentions.
To learn more, Download the Gartner Magic Quadrant Data Masking Report now. And visit the Informatica website for data masking product information.
About the Magic Quadrant
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose
This is the first in a series of articles where I will take an in-depth look at how state and local governments are affected by data breaches and what they should be considering as part of their compliance, risk-avoidance and remediation plans.
Each state has one or more agencies that are focused on the lives, physical and mental health and overall welfare of their citizens. The mission statement of the Department of Public Welfare of Pennsylvania, my home state is typical, it reads “Our vision is to see Pennsylvanians living safe, healthy and independent lives. Our mission is to improve the quality of life for Pennsylvania’s individuals and families. We promote opportunities for independence through services and supports while demonstrating accountability for taxpayer resources.”
Just as in the enterprise, over the last couple of decades the way an agency deals with citizens has changed dramatically. No longer is everything paper-based and manually intensive – each state has made enormous efforts not just to automate more and more of their processes but more lately to put everything online. The combination of these two factors has led to the situation where just about everything a state knows about each citizen is stored in numerous databases, data warehouses and of course accessed through the Web.
It’s interesting that in the PA mission statement two of the three focus areas are safety and health– I am sure when written these were meant in the physical sense. We now have to consider what each state is doing to safeguard and promote the digital safety and health of its citizens. You might ask what digital safety and health means – at the highest level this is quite straightforward – it means that each state must ensure the data it holds about its’ citizens is safe from inadvertent or deliberate exposure or disclosure. It seems that each week we read about another data breach – high profile data breach infographic - either accidental (a stolen laptop for instance) or deliberate (hacking as an example) losses of data about people – the citizens. Often that includes data contents that can be used to identify the individuals, and once an individual citizen is identified they are at risk of identity theft, credit card fraud or worse.
Of the 50 states, 46 now have a series of laws and regulations in place about when and how they need to report on data breaches or losses – this is all well and good, but is a bit like shutting the stable door after the horse has bolted – but with higher stakes as there are potentially dire consequences to the digital safety and health of their citizens.
In the next article I will look at the numerous areas that are often overlooked when states establish and execute their data protection and data privacy plans.
Certainly, it is easy to see how it would be preferable to manage a database that is 5 TB rather than 40 TB in size, particularly when it comes to critical tasks like backup and recovery, disaster recovery, off-site backups and historical analytics. Today, however, I want to focus on another benefit of Informatica Data Vault that is less obvious but still very important: data modeling flexibility for data warehouses and data marts.Informatica Data Vault permits organizations to keep a much greater amount of useful data accessible, without requiring compromises on SLAs, TCO and reporting performance. This in turn makes a variety of flexible data modeling options available.
The Physical Table Partitioning Model
The first of these new data modeling options is based on physical table partitioning. The largest tables in a data warehouse or data mart can be physically divided between an online component and the archive counterpart. This allows the existing data model to be maintained, while introducing a “right-sizing” concept where only the regularly accessed data is kept online, and all data that doesn’t require such an expensive and/or hard to manage environment is put into the Informatica Data Vault solution. A typical rule of thumb for defining partition boundaries for data warehouses is based on the 90-day aging principle, so that any static data older than 90 days is migrated from the online warehouse to the Informatica Data Vault repository.
Now, many forms of enterprise data, such as CDR, POS, Web, Proxy or Log data, are static by definition, and are furthermore usually the main sources of data warehouse growth. This is very good news, because it means that as soon as the data is captured, it can be moved to the Informatica Data Vault store (in fact, it is conceivable that this kind of data could be fed directly to Informatica Data Vault from the source system – but that is a topic for another post). Because of the large volumes involved, this kind of detail data has usually been aggregated at one or more levels in the enterprise data warehouse. Users generally query the summary table in order to identify trends, only drilling down into the details for a specific range of records when specific needs or opportunities are identified. This data access technique is well known, and has been in use for quite some time.
The Online Summary Table Model
This leads me to the second novel design option offered by Informatica Data Vault : the ability to store all static detail data in the archive store, and then use this as the basis for building online summary tables, with the ability to quickly drill to detail in the Informatica Data Vault when required. More specifically, the Informatica Data Vault can be used to feed the online system’s summary tables directly because the data structures and SQL access remain intact. The advantage of this implementation is that it substantially reduces the size of the online database, optimizes its performance, and permits trend analysis on even very long periods. This is particularly useful when looking for emerging trends (positive or negative) related to specific products or offerings, because it gives managers the chance to analyze and respond to issues and opportunities within a realistic time frame.
Some organizations are already building this type of data hierarchy, using Data Marts or analytic cubes fed by the main Data Warehouse. I call this kind of architecture “data pipelining”. Informatica Data Vault can play an important role in such an implementation, since its repository can be shared between all the analytic platforms. This not only reduces data duplication, management/operational overhead, and requirements for additional hardware and software, it also relieves pressure on batch windows and lowers the risk of data being out of synch. Furthermore, this implementation can assist organizations with data governance and Master Data Management while improving overall data quality.
The Just-In-Case Data Model
Another important data modeling option offered by Informatica Data Vault relates to what we can call “just-in-case” data. In many cases, certain kinds of data will also be maintained outside the warehouse just in case an analyst requires ad hoc access to it for a specific study. Sometimes, for convenience, this “exceptional” data is stored in the data warehouse. However, keeping this data in an expensive storage and software environment, or even storing it on tape or inexpensive disks as independent files, can create a data management nightmare. At the same time, studies demonstrate that a very large portion of the costs associated with ad hoc analysis are concentrated in the data preparation phase. As part of this phase, the analyst needs to “shop” for the just-in-case data to be analyzed, meaning that he or she needs to find, “slice”, clean, transform and use it to build a temporary analytic platform, sometimes known as an Exploration Warehouse or “Exploration Mart.
Informatica Data Vault can play a very important role in such a scenario. Just-in-case data can be stored in the archive store, and analysts can then query it directly using standard SQL-based front-end tools to extract, slice and prepare the data for analytic use. Since much less time is spent on data preparation, far more time is available for data analysis — and there is no impact on the performance of the main reporting system. This acceleration of the data preparation phase results from the availability of a central catalog describing all the available data. The archive repository can be used to directly feed the expert’s preferred analytic platform, generally resulting in a substantial improvement in analyst productivity. Analysts can focus on executing their analyses, and on bringing more value to the enterprise, rather than on struggling to get access to clean and reliable data.
Informatica announced yesterday the Informatica ILM Nearline product is SAP-certified. ILM Nearline helps IT organizations reduce costs of managing data growth in existing implementations of the SAP NetWeaver Business Warehouse (SAP NetWeaver BW) and SAP HANA. By doing so, customers can leverage freed budgets and resources to invest in its application landscape and data center modernization initiatives. Informatica ILM Nearline v6.1A for use with SAP NetWeaver BW and SAP HANA, available today, is purpose-built for SAP environments leveraging native SAP interfaces.
Data volumes are growing the fastest in data warehouse and reporting applications, yet a significant amount of it is rarely used or infrequently accessed. In deployments of SAP NetWeaver BW, standard SAP archiving can reduce the size of a production data warehouse database to help preserve its performance, but if users ever want to query or manipulate the archived data, the data needs to be loaded back into the production system disrupting data analytics processes and extending time to insight. The same holds true for SAP HANA.
To address this, ILM Nearline enables IT to migrate large volumes of largely inactive SAP NetWeaver BW or SAP HANA data from the production database or in memory store to online, secure, highly compressed, immutable files in a near-line system while maintaining end-user access. The result is a controlled environment running SAP NetWeaver BW or SAP HANA with predictable, ongoing hardware, software and maintenance costs. This helps ensure service-level agreements (SLAs) can be met while freeing up ongoing budget and resources so IT can focus on innovation.
Informatic ILM Nearline for use with SAP NetWeaver BW and SAP HANA has been certified with the following interfaces:
- NW-BW-NLS Nearline Storage SAP NetWeaver BW 7.30 on SAP HANA for Informatica Data Archive 6.1A
- NW-BW-NLS 7.30 – Nearline Storage – SAP NetWeaver BW 7.30 for Informatica Data Archive 6.1A
- BC-HCS 6.20 – HTTP Content Server 6.20 for Interface for Informatica Data Archive 6.1
“Informatica ILM Nearline for use with SAP NetWeaver BW and SAP HANA is all about reducing the costs of data while keeping the data easily accessible and thus valuable,” said Adam Wilson, general manager, ILM, Informatica. “As data volumes continue to soar, the solution is especially game-changing for organizations implementing SAP HANA as they can use the Informatica-enabled savings to help offset and control the costs of their SAP HANA licenses without disrupting the current SAP NetWeaver BW users’ access to the data.”
Specific advantages of Informatica ILM Nearline include:
- Industry-leading compression rates – Informatica ILM Nearline’s compression rates exceed standard database compression rates by a sizable margin. Customers typically achieve rates in excess of 90 percent, and some have reported rates as high as 98 percent.
- Easy administration and data access – No database administration is required for data archived by Informatica ILM Nearline. Data is accessible from the user’s standard SAP application screen without any IT interventions and is efficiently stored to simplify backup, restore and data replication processes.
- Limitless capacity – Highly scalable, the solution is designed to store limitless amounts of data without affecting data access performance.
- Easy storage tiering – As data is stored in a highly compressed format, the nearline archive can be easily migrated from one storage location to another in support of a tiered storage strategy.
Available now, Informatica ILM Nearline for use with SAP NetWeaver BW and SAP HANA is based on intellectual property acquired from Sand Technology in Q4 2011 and enhanced by Informatica.
 Informatica Survey Results, January 23, 2013 (citation from Enterprise Data Archive for Hybrid IT Webinar)
The Oracle Application User Group (OAUG) Archive and Purge Special Interest Group (SIG) held its semi-annual session first thing in the morning, Sunday September 22, 2013 – 8:00am. The chairman of the SIG, Brian Bent, must have lost in the drawing straws contest for session times. Regardless, attendance was incredibly strong and the topic, ‘Cleaning up your Oracle E-Business Suite Mess’, was well received.
From the initial audience survey, most attendees have made the jump to OEBS R12 and very few have implemented an Information Lifecycle Management (ILM) strategy. As organizations migrate to the latest version, the rate of data growth increases significantly such that performance takes a plunge, costs for infrastructure and storage spike, and DBAs are squeezed with trying to make due.
The bulk of the discussion was on what Oracle offers for purging Concurrent Programs. The focus was on system tables – not functional archive and purge routines, like General Ledger or Accounts Receivable. That will be a topic of another SIG day.
For starters, Oracle provides Concurrent Programs to purge administrative data. Look for ‘Big Tables’ owned by APPLSYS for more candidates and search for the biggest tables / indexes. Search for ‘PURGE’ on MyOracleSupport (MOS) – do your homework to decide if the Purge programs apply to you. If you are concerned about deleting data, you can create an archive table, add an ‘on delete’ trigger to the original table, run the purge and automatically save the data in the archive table (Guess what? This is a CUSTOMIZATION).
Some areas to look at include FND_Concurrent Requests and FND_LOBS.
- Most customers purge data older than 7-30 days
- Oracle recommends keeping this table under 25,000 rows
- Consider additional Purges that delete data about concurrent requests that run frequently
- DBAs do not delete from FND_LOBS; the only way to get rid of them is for Oracle to provide a concurrent Program for the module that users used to load them up
- Can take an enormous amount of space and make exporting and importing your database take a long time
- You can also look to store FND_LOBS as secure files, but requires advanced compression licenses
- Log enhancement requests for more concurrent programs to clean up FND_LOBS
- Look to third party solutions, such as Informatica
Other suggestions include WORKFLOW, but this requires more research.
For more information, join the Oracle Application User Group and sign up for the Archive and Purge Special Interest Group.
In my last blog on this topic, I discussed several areas where a database archiving solution can complement or help you to better leverage the Oracle In-Database Archiving feature. For an introduction of what the new In-Database Archiving feature in Oracle 12c is, refer to Part 1 of my blog on this topic.
Here, I will discuss additional areas where a database archiving solution can complement the new Oracle In-Database Archiving feature:
- Graphical UI for ease of administration – In database archiving is currently a technical feature of Oracle database, and not easily visible or mange-able outside of the DBA persona. This is where a database archiving solution provides a more comprehensive set of graphical user interfaces (GUI) that makes this feature easier to monitor and manage.
- Enabling application of In-Database Archiving for packaged applications and complex data models – Concepts of business entities or transactional records composed of related tables to maintain data and referential integrity as you archive, move, purge, and retain data, as well as business rules to determine when data has become inactive and can therefore be safely archived allow DBAs to apply this new Oracle feature to more complex data models. Also, the availability of application accelerators (prebuilt metadata of business entities and business rules for packaged applications) enables the application of In-Database Archiving to packaged applications like Oracle E-Business Suite, PeopleSoft, Siebel, and JD Edwards
- Data growth monitoring and analysis – available in some database archiving solution to enable monitoring and tracking of data growth trends and the identification of which tables, modules, and business entities are the largest and fastest growing to focus your ILM policies on.
- Performance monitoring and analysis – also available in some database archiving solution — allows Oracle administrators to easily and more meaningfully monitor and analyze database and application performance. They can identify the root cause of performance issues, and from there, administrators can define smart partitioning policies to segment data (i.e. mark them as inactive) and monitor the impact of the policy on improving query performance. This capability helps you to identify which set of records should potentially be “marked as inactive” and segmented.
- Automatic purging of unused or aged data based on policies – database archiving solutions allow administrators to define ILM policies to automate the purging of records that are truly no longer used and have been in the inactive state for some time.
- Optimal data organization, placement, and purging, leveraging Oracle partitioning – a database archiving solution like Informatica Data Archive is optimized to leverage Oracle partitioning to optimally move data to inactive tablespaces, and purge inactive data by dropping or truncating partitions. All of these actions are automated based on policies, again eliminating the need for scripting by the DBA.
- Extreme compression to reduce cost and storage capacity consumption – up to 98% (90%-95% on average) compression is available in some database archiving solutions as compared to the 30%-60% compression available in native database compression.
- Compliance management – Enforcement of retention and disposal policies with the ability to apply legal holds on archived data are part of a comprehensive database archiving solution.
- Central policy management, across heterogeneous databases – a database archiving solution helps you to manage data growth, improve performance, reduce costs, ensure compliance to retention regulations, and define and apply data management policies across multiple heterogeneous database types, beyond Oracle.
In my last post, I discussed how our Informatica ILM Nearline allows vast amounts of detail data to be accessed at speeds that rival the performance of online systems, which in turn gives business analysts and application managers the power to assess and fine-tune important business initiatives on the basis of actual historical facts. We saw that the promise of Informatica ILM Nearline is basically to give you all the data you want, when and how you want it — without compromising the performance of existing data warehouse and business reporting systems.
Today, I want to consider what this capability means specifically for a business. What are the concrete benefits of implementing Informatica ILM Nearline? Here are a few of the most important ones.
Informatica ILM Nearline enables you to keep all your valuable data available for analysis.
Having more data accessible – more details, covering longer periods – enables a number of improvements in Business Intelligence processes:
- A clearer understanding of emerging trends in the business – what will go well in the future as well as what is now “going south”
- Better support for iterative analyses, enabling more intensive Business Performance Management (BPM)
- Better insight into customer behavior over the long term
- More precise target marketing, bringing a three- to five-fold improvement in campaign yield
Informatica ILM Nearline enables you to dramatically increase information storage and maintain service levels without increasing costs or administration requirements.
- Extremely high compression rates give the ability to store considerably more information in a given hardware configuration
- A substantially reduced data footprint means much faster data processing, enabling effective satisfaction of Service Level Agreements without extensive investments in processing power
- Minimal administration requirements bring reductions in resource costs, and ensure that valuable IT and business resources will not be diverted from important tasks just to manage and maintain the Informatica ILM Nearline implementation
- High data compression also substantially reduces the cost of maintaining a data center by reducing requirements for floor space, air conditioning and so on.
Informatica ILM Nearline simplifies and accelerates Disaster Recovery scenarios.
A reduced data footprint means more data can be moved across existing networks, making Informatica ILM Nearline an ideal infrastructure for implementing and securing an offsite backup process for massive amounts of data,
Informatica ILM Nearline keeps all detail data in an immutable form, available for delivery on request.
Having read-only detail data available on-demand enables quick response to audit requests, avoiding the possibility of costly penalties for non-compliance. Optional security packages can be used to control user access and data privacy.
Informatica ILM Nearline makes it easy to offload data from the online database before making final decisions about what is to be moved to an archiving solution.
The traditional archiving process typically involves extensive analysis of data usage patterns in order to determine what should be moved to relatively inaccessible archival storage. With an Informatica ILM Nearline solution, it’s a simple matter to move large amounts of data out of the online database — thereby improving performance and guaranteeing satisfaction of SLA’s, — while still keeping the data available for access when required. Data that is determined to be no longer used, but which still needs to be kept around to comply with data retention policies or regulations, can then be easily moved into an archiving solution.
Taken together, these benefits make a strong case for implementing an Informatica ILM Nearline solution when the data tsunami threatens to overwhelm the enterprise data warehouse. In future posts, I will be investigating each of these in more detail.
In today’s post, I want to write about the “Informatica ILM Nearline 6.1A″. Although this Nearline concept is not new, it is still not very known and represents the logical evolution of business applications, data warehouses and information lifecycle approaches that have struggled to maintain acceptable performance levels in the face of the increasingly intense “data tsunami” that looms over today’s business world. Whereas older archiving solutions based their viability on the declining prices of hardware and storage, ILM Nearline 6.1A embraces the dynamism of a software and services approach to fully leverage the potential of large enterprise data architectures.
Looking back, we can now see that the older data management solutions presented a paradox: in order to mitigate performance issues and meet Service Level Agreements (SLA) with users, they actually prevented or limited ad-hoc access to data. On the basis of system monitoring and usage statistics, this inaccessible data was then declared to be unused, and this was cited as an excuse for locking it away entirely. In effect, users were told: “Since you can’t get at it, you can’t use it, and therefore we’re not going to give it to you”!
ILM Nearline 6.1A, by contrast, allows historical data to be accessed with near-online speeds, empowering business analysts to measure and perfect key business initiatives through analysis of actual historical details. In other words, ILM Nearline 6.1A gives you all the data you want, when and how you want it (without impacting the performance of existing warehouse reporting systems!).
Aside from the obvious economic and environmental benefits of this software-centric approach and the associated best practices, the value of ILM Nearline 6.1A can be assessed in terms of the core proposition cited by Tim O’Reilly when he coined the term “Web 2.0″:
“The value of the software is proportional to the scale and dynamism of the data it helps to manage.”
In this regard, ILM Nearline 6.1A provides a number of important advantages over prior methodologies:
Keeps data accessible: ILM Nearline 6.1A enables optimal performance from the online database while keeping all data easily accessible. This massively reduces the work required to identify, access and restore archived data, while minimizing the performance hit involved in doing so in a production environment.
Keeps the online database “lean”: Because data archived to the ILM Nearline 6.1A can still be easily accessed by users at near-online speeds, it allows for much more recent data to be moved out of the online system than would be possible with archiving. This results in far better online system performance and greater flexibility to further support user requirements without performance trade-offs. It is also a big win for customers moving their systems to HANA.
Relieves data management stress: Data can be moved to ILM Nearline 6.1A without the substantial ongoing analysis of user access patterns that is usually required by archiving products. The process is typically based on a rule as simple as “move all data older than x months from the ten largest InfoProviders”.
Mitigates administrative risk: Unlike archived data, ILM Nearline 6.1A data requires little or no additional ongoing administration, and no additional administrative intervention is required to access it.
Lets analysts be analysts: With ILM Nearline 6.1A, far less time is taken up in gaining access to key data and “cleansing it”, so much more time can be spent performing “what if” scenarios before recommending a course of action for the company. This improves not only the productivity but also the quality of work of key business analysts and statistical gurus.
Copes with data structure changes: ILM Nearline 6.1A can easily deal with data model changes, making it possible to query data structured according to an older model alongside current data. With archive data, this would require considerable administrative work.
Leverages existing storage environments: Compared to older archiving products/strategies, the high degree of compression offered by ILM Nearline 6.1A greatly increases the amount of information that can be stored as well as the speed at which it can be accessed.
Keeps data private and secure: ILM Nearline 6.1A has privacy and security features that protect key information from being seen by ad-hoc business analysts (for example: names, social security numbers, credit card information).
In short, ILM Nearline 6.1A offers a significant advantage over other nearline and archiving technologies. When data needs be removed from the online database in order to improve performance, but still needs to be readily accessible by users to conduct long-term analysis, historical reporting, or to rebuild aggregates/KPIs/InfoCubes for period-over-period analysis, ILM Nearline 6.1A is currently the only workable solution available.
In my next post, I’ll discuss more specifically how implementing the ILM Nearline 6.1A solution can benefit your business apps, data warehouses and your business processes.