Category Archives: Data Archiving
In my first article on the topic of citizens’ digital health and safety we looked at the states’ desire to keep their citizens healthy and safe and also at the various laws and regulations they have in place around data breaches and losses. The size and scale of the problem together with some ideas for effective risk mitigation are in this whitepaper.
Let’s now start delving a little deeper into the situation states are faced with. It’s pretty obvious that citizen data that enables an individual to be identified (PII) needs to be protected. We immediately think of the production data: data that is used in integrated eligibility systems; in health insurance exchanges; in data warehouses and so on. In some ways the production data is the least of our problems; our research shows that the average state has around 10 to 12 full copies of data for non-production (development, test, user acceptance and so on) purposes. This data tends to be much more vulnerable because it is widespread and used by a wide variety of people – often subcontractors or outsourcers, and often the content of the data is not well understood.
Production systems obviously need access to real production data (I’ll cover how best to protect that in the next issue); non-production systems of every sort do not. Non-production systems most often need realistic, but not real, data and realistic, but not real, data volumes (except maybe for the performance/stress/throughput testing system). What needs to be done? A three-point risk remediation plan is a good place to start.
1. Understand the non-production data. Sophisticated data and schema profiling, combined with NLP (Natural Language Processing) techniques, helps to identify previously unrecognized PII that needs protecting.
2. Permanently mask the PII so that it is no longer the real data but is realistic enough for non-production uses, and make sure that the same masking is applied to the attribute values wherever they appear in multiple tables/files.
3. Subset the data to reduce data volumes. This limits the size of the risk and also has positive effects on performance, run-times, backups etc.
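The three steps above can be sketched in a few lines of Python. This is a minimal illustration and not how any particular masking product works: the regular expression stands in for real profiling, the keyed hash (HMAC) is one assumed way to make masking repeatable across tables, and the key, table names and column names are all hypothetical.

```python
import hashlib
import hmac
import re

SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")
# Hypothetical secret; it would be kept outside the non-production environment.
MASKING_KEY = b"example-key-kept-outside-non-production"


def find_pii_columns(rows):
    """Step 1 (very crude profiling): columns whose values all look like SSNs."""
    columns = rows[0].keys() if rows else []
    return [c for c in columns if all(SSN_PATTERN.match(str(r[c])) for r in rows)]


def mask_value(value):
    """Step 2: map a real value to a stable fake one in the same format."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    digits = f"{int(digest, 16) % 10**9:09d}"  # always nine digits
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


def mask_table(rows):
    """Mask every detected PII column; other columns are left untouched."""
    pii = find_pii_columns(rows)
    return [{**r, **{c: mask_value(r[c]) for c in pii}} for r in rows]


def subset(rows, keep_every=10):
    """Step 3: keep every Nth row to cut data volume."""
    return rows[::keep_every]


# Two hypothetical tables that share the same attribute.
eligibility = [{"ssn": "123-45-6789", "program": "SNAP"}]
claims = [{"ssn": "123-45-6789", "amount": 120.0}]

masked_eligibility = mask_table(eligibility)
masked_claims = mask_table(claims)
```

Because the masked value depends only on the key and the input, the same person gets the same fake identifier in both tables, so joins between the masked tables still line up.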
Gartner has just published its 2013 Magic Quadrant for data masking, which covers both what it calls static (i.e. permanent or persistent) masking and dynamic masking (more on this in the next issue). As usual, the MQ gives a good overview of the issues behind the technology as well as a review of the position, strengths and weaknesses of the leading vendors.
It is (or at least should be) imperative that state governments, from the top down, realize the importance and vulnerability of their citizens’ data and put in place a non-partisan plan to prevent future breaches. As the reader might imagine, for any such plan to succeed it needs a combination of cultural and organizational change (getting people to care) and the right technology. Together these will greatly reduce the risk. In the next and final issue on this topic we will look at the vulnerabilities of production data, and what can be done to dramatically increase its privacy and security.
Informatica announced yesterday that the Informatica ILM Nearline product is SAP-certified. ILM Nearline helps IT organizations reduce the costs of managing data growth in existing implementations of the SAP NetWeaver Business Warehouse (SAP NetWeaver BW) and SAP HANA. By doing so, customers can leverage freed budgets and resources to invest in their application landscape and data center modernization initiatives. Informatica ILM Nearline v6.1A for use with SAP NetWeaver BW and SAP HANA, available today, is purpose-built for SAP environments and leverages native SAP interfaces.
Data volumes are growing fastest in data warehouse and reporting applications, yet a significant amount of that data is rarely used or infrequently accessed. In deployments of SAP NetWeaver BW, standard SAP archiving can reduce the size of a production data warehouse database to help preserve its performance, but if users ever want to query or manipulate the archived data, it needs to be loaded back into the production system, disrupting data analytics processes and extending time to insight. The same holds true for SAP HANA.
To address this, ILM Nearline enables IT to migrate large volumes of largely inactive SAP NetWeaver BW or SAP HANA data from the production database or in-memory store to online, secure, highly compressed, immutable files in a near-line system while maintaining end-user access. The result is a controlled environment running SAP NetWeaver BW or SAP HANA with predictable, ongoing hardware, software and maintenance costs. This helps ensure service-level agreements (SLAs) can be met while freeing up ongoing budget and resources so IT can focus on innovation.
Informatica ILM Nearline for use with SAP NetWeaver BW and SAP HANA has been certified with the following interfaces:
- NW-BW-NLS Nearline Storage SAP NetWeaver BW 7.30 on SAP HANA for Informatica Data Archive 6.1A
- NW-BW-NLS 7.30 – Nearline Storage – SAP NetWeaver BW 7.30 for Informatica Data Archive 6.1A
- BC-HCS 6.20 – HTTP Content Server 6.20 for Interface for Informatica Data Archive 6.1
“Informatica ILM Nearline for use with SAP NetWeaver BW and SAP HANA is all about reducing the costs of data while keeping the data easily accessible and thus valuable,” said Adam Wilson, general manager, ILM, Informatica. “As data volumes continue to soar, the solution is especially game-changing for organizations implementing SAP HANA as they can use the Informatica-enabled savings to help offset and control the costs of their SAP HANA licenses without disrupting the current SAP NetWeaver BW users’ access to the data.”
Specific advantages of Informatica ILM Nearline include:
- Industry-leading compression rates – Informatica ILM Nearline’s compression rates exceed standard database compression rates by a sizable margin. Customers typically achieve rates in excess of 90 percent, and some have reported rates as high as 98 percent.
- Easy administration and data access – No database administration is required for data archived by Informatica ILM Nearline. Data is accessible from the user’s standard SAP application screen without any IT interventions and is efficiently stored to simplify backup, restore and data replication processes.
- Limitless capacity – Highly scalable, the solution is designed to store limitless amounts of data without affecting data access performance.
- Easy storage tiering – As data is stored in a highly compressed format, the nearline archive can be easily migrated from one storage location to another in support of a tiered storage strategy.
Available now, Informatica ILM Nearline for use with SAP NetWeaver BW and SAP HANA is based on intellectual property acquired from Sand Technology in Q4 2011 and enhanced by Informatica.
 Informatica Survey Results, January 23, 2013 (citation from Enterprise Data Archive for Hybrid IT Webinar)
The Oracle Application User Group (OAUG) Archive and Purge Special Interest Group (SIG) held its semi-annual session first thing in the morning, Sunday September 22, 2013 – 8:00am. The chairman of the SIG, Brian Bent, must have lost in the drawing straws contest for session times. Regardless, attendance was incredibly strong and the topic, ‘Cleaning up your Oracle E-Business Suite Mess’, was well received.
From the initial audience survey, most attendees have made the jump to OEBS R12 and very few have implemented an Information Lifecycle Management (ILM) strategy. As organizations migrate to the latest version, the rate of data growth increases significantly: performance takes a plunge, costs for infrastructure and storage spike, and DBAs are squeezed trying to make do.
The bulk of the discussion was on what Oracle offers for purging Concurrent Programs. The focus was on system tables – not functional archive and purge routines, like General Ledger or Accounts Receivable. That will be a topic of another SIG day.
For starters, Oracle provides Concurrent Programs to purge administrative data. Look for ‘Big Tables’ owned by APPLSYS for more candidates and search for the biggest tables / indexes. Search for ‘PURGE’ on MyOracleSupport (MOS) – do your homework to decide if the Purge programs apply to you. If you are concerned about deleting data, you can create an archive table, add an ‘on delete’ trigger to the original table, run the purge and automatically save the data in the archive table (Guess what? This is a CUSTOMIZATION).
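The archive-table-plus-trigger idea can be illustrated as follows. This is a sketch only: it uses SQLite (via Python's standard library) as a stand-in rather than Oracle, and the `requests` table, its columns and the 30-day rule are hypothetical, not the real APPLSYS schema or an Oracle purge program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE requests (request_id INTEGER PRIMARY KEY, program TEXT, age_days INTEGER);
    CREATE TABLE requests_archive (request_id INTEGER, program TEXT, age_days INTEGER);

    CREATE TRIGGER requests_archive_on_delete
    BEFORE DELETE ON requests
    FOR EACH ROW
    BEGIN
        INSERT INTO requests_archive (request_id, program, age_days)
        VALUES (OLD.request_id, OLD.program, OLD.age_days);
    END;
""")

conn.executemany(
    "INSERT INTO requests (request_id, program, age_days) VALUES (?, ?, ?)",
    [(1, "REPORT_A", 45), (2, "REPORT_B", 3), (3, "REPORT_C", 90)],
)

# Stand-in for the purge program: delete rows older than 30 days.
# The trigger copies each deleted row into the archive table first.
conn.execute("DELETE FROM requests WHERE age_days > 30")
conn.commit()

remaining = [r[0] for r in conn.execute("SELECT request_id FROM requests ORDER BY request_id")]
archived = [r[0] for r in conn.execute("SELECT request_id FROM requests_archive ORDER BY request_id")]
```

The purge itself stays unchanged; the trigger is the only added object, which is exactly why it counts as a customization that has to be maintained through upgrades.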
Some areas to look at include FND_CONCURRENT_REQUESTS and FND_LOBS.
FND_CONCURRENT_REQUESTS:
- Most customers purge data older than 7-30 days
- Oracle recommends keeping this table under 25,000 rows
- Consider additional purges that delete data about concurrent requests that run frequently
FND_LOBS:
- DBAs do not delete from FND_LOBS; the only way to get rid of the rows is for Oracle to provide a Concurrent Program for the module that users used to load them
- It can take an enormous amount of space and make exporting and importing your database take a long time
- You can also look to store FND_LOBS as SecureFiles, but this requires Advanced Compression licenses
- Log enhancement requests for more Concurrent Programs to clean up FND_LOBS
- Look to third-party solutions, such as Informatica
Other suggestions include WORKFLOW, but this requires more research.
For more information, join the Oracle Application User Group and sign up for the Archive and Purge Special Interest Group.
In previous posts, we introduced the concept of the Informatica ILM Nearline and discussed how Informatica ILM Nearline could help your business. To recapitulate: the major advantage of Informatica ILM Nearline is its superior data access performance, which enables a more aggressive approach to migrating huge volumes of data out of the online repository to an accessible, highly compressed archive (on inexpensive 2nd and 3rd tier storage infrastructure).
Today, I will be considering the question of when an enterprise should consider implementing Informatica ILM Nearline. Broadly speaking, such implementations fall into two categories: they either offer a “cure” for an existing data management problem or represent a proactive implementation of data best practices within the organization.
Cure or Prevention?
The “cure” type of implementation is typically associated with a data warehouse or business application “rescue” project. This is undertaken when the production system grows to a point where database size causes major performance problems and affects the ability to meet Service Level Agreements (SLAs) and manage business processes in a timely manner. In these kinds of situations, it is mainly the operations division of the organization that is affected and that demands an immediate fix, which can take the form of an Informatica ILM Nearline implementation. The question here is: How quickly can the “cure” implementation stabilize performance and ensure satisfaction of SLAs?
On the other hand, the best practice approach, much like current practices related to healthy living, focuses on prevention rather than on curing. In this respect, best practices dictate that the Informatica ILM Nearline implementation should start as soon as some of the data in the production system becomes “infrequently accessed”, or “cold”. In data warehouses and data marts where the current month or two is being analyzed most often, this means data older than 90 days. For transactional systems the archiving cutoff may be a year or two, depending on the typical length of your business processes. The main idea is to keep production databases from inflating for no good business reason by ‘nearlining’ the data as soon as possible without interrupting business operations or hurting the value of your data. Ultimately this should work to protect the enterprise from an operational crisis arising from deteriorating performance and unmet SLAs.
In order to better judge the impact of using either of these two approaches, it is important to understand the various steps involved in the “Nearlining” process. What do we find when we “dissect” the process of leveraging the Informatica ILM Nearline?
Dissecting the “Informatica ILM Nearline” Process
Informatica ILM Nearline involves multiple processes, whose performance characteristics can significantly influence the speed at which data is migrated out of the online database. The various processes are managed by Informatica's integrated nearline solution coupled with an SAP Business Warehouse system:
- The first step is to lock the data that is targeted by the archiving process, in order to ensure that the data is not modified while the process is going on. SAP Business Warehouse does this automatically when you execute Data Archive Processes (DAP) for the cold data.
- Next comes the extraction of the data to be migrated. This is usually achieved via an SQL statement based on business rules for data migration. Often, the extraction can be performed using multiple extraction/consumer processes working in parallel.
- The next step is to secure the newly extracted data, so that it is recoverable.
- Then, the integrity of the extracted data must be validated (normally by comparing it to its online counterpart).
- Next, delete the online data that has been moved to nearline.
- Then, reorganize the tablespace of the deleted data.
- Finally, rebuild/reorganize the index associated with the online table from which data has been nearlined.
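To make the sequence concrete, here is a minimal sketch of the same steps against a toy database. It is not the Informatica or SAP implementation: SQLite stands in for the warehouse, a gzip-compressed JSON blob stands in for the nearline file, and the `sales` table and the cutoff rule are hypothetical.

```python
import gzip
import json
import sqlite3

# Autocommit mode so the transaction boundaries below are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, period TEXT, amount REAL)")
conn.execute("CREATE INDEX sales_period ON sales (period)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2012-01", 10.0), (2, "2012-02", 20.0), (3, "2013-09", 30.0)],
)

CUTOFF = "2013-01"  # hypothetical business rule: earlier periods are "cold"


def nearline(conn, cutoff):
    query = "SELECT id, period, amount FROM sales WHERE period < ? ORDER BY id"
    conn.execute("BEGIN IMMEDIATE")  # lock: block other writers
    try:
        rows = conn.execute(query, (cutoff,)).fetchall()           # extract
        archive = gzip.compress(json.dumps(rows).encode("utf-8"))  # secure
        restored = [tuple(r) for r in json.loads(gzip.decompress(archive))]
        if restored != rows:                                        # validate
            raise ValueError("archive does not match online data")
        conn.execute("DELETE FROM sales WHERE period < ?", (cutoff,))  # delete
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("VACUUM")         # reorganize storage
    conn.execute("REINDEX sales")  # rebuild the table's indexes
    return archive


archive = nearline(conn, CUTOFF)
remaining = [r[0] for r in conn.execute("SELECT id FROM sales ORDER BY id")]
```

Note that the delete only happens after the archived copy has been read back and compared, and that the reorganize and reindex steps run outside the locked section, which mirrors how housekeeping is often scheduled separately.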
The database housekeeping at the end of this sequence (deleting the online data, reorganizing the tablespace and rebuilding the indexes) is often the slowest part of a data nearlining process, and thus can dictate the pace and scheduling of the implementation. In a production environment, the database housekeeping process is frequently decoupled from ongoing operations and performed over a weekend. It may be surprising to learn that deleting data can be a more expensive process than inserting it, but just ask an enterprise DBA about what is involved in deleting 1 TB from an Enterprise Data Warehouse and see what answer you get: for many, the task of fitting such a process into standard batch windows would be a nightmare.
So, it is easy to see that starting earlier in implementing Informatica ILM Nearline as a best practice can help to massively reduce not only the cost of the implementation, but also the time required to perform it. Therefore, the main recommendation to take away from this discussion is: Don’t wait too long to consider embarking on your Informatica ILM Nearline strategy!
That’s it for today. In my next post, I will take up the topic of which data should be initially considered as a candidate for migration.
In today’s post, I want to write about “Informatica ILM Nearline 6.1A”. Although the nearline concept is not new, it is still not very well known, and it represents the logical evolution of business applications, data warehouses and information lifecycle approaches that have struggled to maintain acceptable performance levels in the face of the increasingly intense “data tsunami” that looms over today’s business world. Whereas older archiving solutions based their viability on the declining prices of hardware and storage, ILM Nearline 6.1A embraces the dynamism of a software and services approach to fully leverage the potential of large enterprise data architectures.
Looking back, we can now see that the older data management solutions presented a paradox: in order to mitigate performance issues and meet Service Level Agreements (SLA) with users, they actually prevented or limited ad-hoc access to data. On the basis of system monitoring and usage statistics, this inaccessible data was then declared to be unused, and this was cited as an excuse for locking it away entirely. In effect, users were told: “Since you can’t get at it, you can’t use it, and therefore we’re not going to give it to you”!
ILM Nearline 6.1A, by contrast, allows historical data to be accessed with near-online speeds, empowering business analysts to measure and perfect key business initiatives through analysis of actual historical details. In other words, ILM Nearline 6.1A gives you all the data you want, when and how you want it (without impacting the performance of existing warehouse reporting systems!).
Aside from the obvious economic and environmental benefits of this software-centric approach and the associated best practices, the value of ILM Nearline 6.1A can be assessed in terms of the core proposition cited by Tim O’Reilly when he coined the term “Web 2.0”:
“The value of the software is proportional to the scale and dynamism of the data it helps to manage.”
In this regard, ILM Nearline 6.1A provides a number of important advantages over prior methodologies:
Keeps data accessible: ILM Nearline 6.1A enables optimal performance from the online database while keeping all data easily accessible. This massively reduces the work required to identify, access and restore archived data, while minimizing the performance hit involved in doing so in a production environment.
Keeps the online database “lean”: Because data archived to the ILM Nearline 6.1A can still be easily accessed by users at near-online speeds, it allows for much more recent data to be moved out of the online system than would be possible with archiving. This results in far better online system performance and greater flexibility to further support user requirements without performance trade-offs. It is also a big win for customers moving their systems to HANA.
Relieves data management stress: Data can be moved to ILM Nearline 6.1A without the substantial ongoing analysis of user access patterns that is usually required by archiving products. The process is typically based on a rule as simple as “move all data older than x months from the ten largest InfoProviders”.
Mitigates administrative risk: Unlike archived data, ILM Nearline 6.1A data requires little or no additional ongoing administration, and no additional administrative intervention is required to access it.
Lets analysts be analysts: With ILM Nearline 6.1A, far less time is taken up in gaining access to key data and “cleansing it”, so much more time can be spent performing “what if” scenarios before recommending a course of action for the company. This improves not only the productivity but also the quality of work of key business analysts and statistical gurus.
Copes with data structure changes: ILM Nearline 6.1A can easily deal with data model changes, making it possible to query data structured according to an older model alongside current data. With archive data, this would require considerable administrative work.
Leverages existing storage environments: Compared to older archiving products/strategies, the high degree of compression offered by ILM Nearline 6.1A greatly increases the amount of information that can be stored as well as the speed at which it can be accessed.
Keeps data private and secure: ILM Nearline 6.1A has privacy and security features that protect key information from being seen by ad-hoc business analysts (for example: names, social security numbers, credit card information).
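The kind of simple rule mentioned above ("move all data older than x months from the ten largest InfoProviders") could be expressed roughly as follows. This is only an illustration of the rule itself; the provider names, sizes and the dictionary layout are hypothetical and do not reflect how SAP BW or ILM Nearline store this information.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months from one date to a later one."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def select_candidates(providers, today, older_than_months, top_n=10):
    """Pick cold periods from the top_n largest providers."""
    largest = sorted(providers, key=lambda p: p["size_gb"], reverse=True)[:top_n]
    plan = {}
    for provider in largest:
        cold = [
            period
            for period in provider["periods"]
            if months_between(period, today) > older_than_months
        ]
        if cold:
            plan[provider["name"]] = cold
    return plan


# Hypothetical inventory of InfoProviders and the periods they hold.
providers = [
    {"name": "SALES", "size_gb": 900, "periods": [date(2012, 1, 1), date(2013, 8, 1)]},
    {"name": "TINY", "size_gb": 5, "periods": [date(2011, 1, 1)]},
]

plan = select_candidates(providers, today=date(2013, 10, 1), older_than_months=12, top_n=1)
```

The point of the sketch is that the rule needs only size and age, not an analysis of who accessed which records.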
In short, ILM Nearline 6.1A offers a significant advantage over other nearline and archiving technologies. When data needs to be removed from the online database in order to improve performance, but still needs to be readily accessible by users to conduct long-term analysis, historical reporting, or to rebuild aggregates/KPIs/InfoCubes for period-over-period analysis, ILM Nearline 6.1A is currently the only workable solution available.
In my next post, I’ll discuss more specifically how implementing the ILM Nearline 6.1A solution can benefit your business apps, data warehouses and your business processes.
Under the hood: decommissioning an SAP system with Informatica Data Archive for Application Retirement
If you reached this blog, you are already familiar with the reasons why you need to do a house cleaning on your old applications. If not, this subject has been explored in other discussions, like this one from Claudia Chandra.
All the explanations below are based on Informatica Data Archive for application retirement.
Very often, customers are surprised to learn that Informatica’s solution for application retirement can also decommission SAP systems. The market has the feeling that SAP is different, or “another beast”. And it really is!
A typical SAP system requires software licenses, maintenance contracts, and hardware for the transactional application itself, the corresponding data warehouse and databases, operating systems, servers, storage, and any additional software and hardware licenses that you may have on top of the application. Your company may want to retire older versions of the application or consolidate multiple instances in order to save costs. Our engineering group has some very experienced SAP resources, myself included, with more than 16 years of hands-on work with SAP technology. We were able to simplify the SAP retirement process so that the Informatica Data Archive solution decommissions SAP like any other type of application.
Next are the steps to decommission an SAP system using Informatica Data Archive.
Let’s start with some facts: SAP has some “special” tables which can only be read by the SAP kernel itself. In a typical SAP ECC 6.0, around 9% of all tables fall into these categories, representing around 6,000 tables.
More specifically, these tables are known as “clusters” and “pools”. I created a third category for transparent tables that have a binary column, or RAW data type, which only the SAP application can unravel.
In this step, we will get all the metadata of the SAP system being retired, including all transparent, cluster and pool tables, and all columns with their data types. This metadata will be kept with the data in the optimized archive.
2) Extraction from source
Informatica Data Archive 6.1.x is able to connect to all database servers certified by SAP, to retrieve rows from the transparent tables.
On the SAP system, an ABAP agent must be installed. It contains the programs developed by Informatica to read all the rows from the special tables and archive files and to pull all the attachments in their original format. These programs are delivered as an SAP transport, which is imported into the SAP system prior to the beginning of the decommissioning process.
Leveraging the Java connector publicly available through the SAP portal (SAPJCo), Informatica Data Archive connects to an SAP application server on the system being decommissioned and makes calls to the programs imported through the transport. The tasks are performed using background threads and the process is monitored from the Informatica Data Archive environment, including all the logging, status and monitoring of the whole retirement process happening in the SAP system.
Extraction of table rows in database
The table below lists the SAP table types and how our solution handles each:
| Table type | Table name in SAP | Table name in the database (physical table) | How we handle it |
|---|---|---|---|
| Cluster tables | BSEG | RFBLG | The engine reads all the rows from the logical tables by connecting at the SAP application level and stores them in the archive store as if the table existed in the database as a physical table. The engine also reads all rows of the physical tables and stores them as they are, as an insurance policy only, since that data cannot be read without an SAP system up and running. |
| Transparent tables with RAW field | PCL2, STXL | PCL2, STXL | The engine creates a new table in the archive store and reads all rows from the original table, with the RAW field unraveled. The engine also reads all rows of the original table (PCL2 or STXL) and stores them as they are, as an insurance policy only, since that data cannot be read without an SAP system up and running. |
The Informatica Data Archive will extract the data of all tables, independently of their types.
Table rows in archive files
Another source of table rows is the archived data. SAP has its own archiving framework, which is based on the creation of archive files, also known as ADK files. These files store table rows in an SAP proprietary compacted form, which can only be read by ABAP code running in an SAP system.
Once created, these files are located in the file system and can be stored in an external storage using an ArchiveLink implementation.
The Informatica Data Archive engine also reads the table rows from all ADK files, independent of their location, as long as the files are accessible by the SAP application being retired. These table rows will be stored in the archive store as well, along with the original table.
Very important: After the SAP system is retired, any implementation of ArchiveLink can be retired as well, along with the storage that was holding the ADK files.
Business transactions in SAP systems can have attachments linked to them. The SAP Generic Object Services (GOS) is a way to upload documents, add notes to a transaction, and add URLs relevant to the document, all still referencing a business document, like a purchase order or a financial document. Some other SAP applications, like CRM, have their own mechanisms for attaching documents, complementing GOS features.
All these methods can store the attachments in the SAP database, in the SAP Knowledge Provider (KPro), or in external storage, leveraging an ArchiveLink implementation.
Informatica’s engine is able to download all the attachment files, notes and URLs as discrete files, independent of where they are stored, keeping the relationship to the original business document. The relationship is stored in a table created by Informatica in the archive store, which contains the key of the business document and the link to the attachments, notes and URLs that were assigned to it in the original SAP system.
All these files are stored in the archive store, along with the structured data – or tables.
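As an illustration of the relationship table described above, the sketch below keeps one row per attachment, note or URL, keyed by the business document it belonged to. The table name, column names, document keys and file paths are all hypothetical; the actual layout used by the product is not described in this post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attachment_links (
        document_key TEXT NOT NULL,
        item_type    TEXT NOT NULL,
        location     TEXT NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO attachment_links VALUES (?, ?, ?)",
    [
        ("PO-4500000001", "attachment", "files/po_4500000001_contract.pdf"),
        ("PO-4500000001", "note", "files/po_4500000001_note.txt"),
        ("FI-1900000042", "url", "files/fi_1900000042_link.url"),
    ],
)


def items_for(document_key):
    """Return (item_type, location) pairs linked to one business document."""
    cursor = conn.execute(
        "SELECT item_type, location FROM attachment_links "
        "WHERE document_key = ? ORDER BY item_type",
        (document_key,),
    )
    return cursor.fetchall()


purchase_order_items = items_for("PO-4500000001")
```

With a link table of this shape, a report on a purchase order can list its attachments without the original SAP system being available.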
4) Load into optimized archive
All data and attachments are then loaded into Informatica’s optimized archive. The archive store will compress the archived data by up to 98%.
5) Search and data visualization
All structured data is accessible through JDBC/ODBC, as with any other relational database. The user has the option to use the search capability that comes with the product, which allows users to run simple queries and view data as business entities.
Another option is to use the integrated reporting capability within the product, which allows users to create pixel-perfect reports, using drag and drop technology, querying the data using SQL and displaying the data as business entities, which are defined in prebuilt SAP application accelerators.
Informatica also has a collection of reports for SAP to display data for customers, vendors, general ledger accounts, assets and financial documents.
Some customers prefer to use their own corporate standard 3rd party reporting tool. That is also possible as long as the tool can connect to JDBC/ODBC sources, which is a market standard for connecting to databases.
Hopefully this blog helped you to understand what Informatica Data Archive for Application Retirement does to decommission an SAP system. If you need any further information, please comment below. Thank you.
Healthcare organizations are currently engaged in major transformative initiatives. The American Recovery and Reinvestment Act of 2009 (ARRA) provided the healthcare industry incentives for the adoption and modernization of point-of-care computing solutions including electronic medical and health records (EMRs/EHRs). Funds have been allocated, and these projects are well on their way. In fact, the majority of hospitals in the US are engaged in implementing EPIC, a software platform that is essentially the ERP for healthcare.
These Cadillac systems are being deployed from scratch with very little data being ported from the old systems into the new. The result is a glut of legacy applications running in aging hospital data centers, consuming every last penny of HIS budgets. Because the data still resides on those systems, hospital staff continue to use them, making them difficult to shut down or retire.
Most of these legacy systems are not running on modern technology platforms – they run on systems such as HP Turbo Image, Intercache Mumps, and embedded proprietary databases. Finding people who know how to manage and maintain these systems is costly and risky: risky in that the data residing in those applications may be subject to data retention requirements (patient records, etc.) and could become inaccessible.
A different challenge for CFOs of these hospitals is the ROI on these EPIC implementations. Because these projects are multi-phase and multi-year, boards of directors are asking about the value realized from these investments. Many are coming up short because they are maintaining both the old and the new applications in parallel. Relief will come when systems can be retired – but getting hospital staff and regulators to approve a retirement project requires evidence that they can still access data while adhering to compliance needs.
Many providers have overcome these hurdles by successfully implementing an application retirement strategy based on the Informatica Data Archive platform. Several of the largest children’s hospitals in the US are either already saving or expecting to save $2 million or more annually from retiring legacy applications. The savings come from:
- Eliminating software maintenance and license costs
- Eliminating hardware dependencies and costs
- Reducing storage requirements by 95% (data archived is stored in a highly compressed, accessible format)
- Improving efficiencies in IT by eliminating specialized processes or skills associated with legacy systems
- Freeing IT resources – teams can spend more of their time working on innovations and new projects
Informatica Application Retirement Solutions for Healthcare provide hospitals with the ability to completely retire legacy applications while maintaining access to archived data for hospital staff. And with built-in security and retention management, records managers and legal teams can satisfy compliance requirements. Contact your Informatica Healthcare team for more information on how you can get that EPIC ROI the board of directors is asking for.
What is In-Database Archiving in Oracle 12c and Why You Still Need a Database Archiving Solution to Complement It (Part 2)
In my last blog on this topic, I discussed several areas where a database archiving solution can complement or help you to better leverage the Oracle In-Database Archiving feature. For an introduction of what the new In-Database Archiving feature in Oracle 12c is, refer to Part 1 of my blog on this topic.
Here, I will discuss additional areas where a database archiving solution can complement the new Oracle In-Database Archiving feature:
- Graphical UI for ease of administration – In-Database Archiving is currently a technical feature of the Oracle database, and is not easily visible or manageable outside of the DBA persona. A database archiving solution provides a more comprehensive set of graphical user interfaces (GUIs) that make this feature easier to monitor and manage.
- Enabling application of In-Database Archiving for packaged applications and complex data models – A database archiving solution adds the concept of business entities (transactional records composed of related tables), which maintains data and referential integrity as you archive, move, purge, and retain data, along with business rules that determine when data has become inactive and can therefore be safely archived. Together these allow DBAs to apply this new Oracle feature to more complex data models. Also, the availability of application accelerators (prebuilt metadata of business entities and business rules for packaged applications) enables the application of In-Database Archiving to packaged applications like Oracle E-Business Suite, PeopleSoft, Siebel, and JD Edwards.
What is In-Database Archiving in Oracle 12c and Why You Still Need a Database Archiving Solution to Complement It (Part 1)
What is the new In-Database Archiving in the latest Oracle 12c release?
On June 25, 2013, Oracle introduced a new feature called In-Database Archiving with its new release of Oracle 12. “In-Database Archiving enables you to archive rows within a table by marking them as inactive. These inactive rows are in the database and can be optimized using compression, but are not visible to an application. The data in these rows is available for compliance purposes if needed by setting a session parameter. With In-Database Archiving you can store more data for a longer period of time within a single database, without compromising application performance. Archived data can be compressed to help improve backup performance, and updates to archived data can be deferred during application upgrades to improve the performance of upgrades.”
This is an Oracle specific feature and does not apply to other databases.
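The behavior described in the quote (rows marked inactive stay in the table but are hidden from the application unless explicitly requested) can be mimicked in a few lines. This sketch is an emulation of the idea using SQLite and a view, not Oracle's syntax or implementation; the `orders_all` table, the `archive_state` column and the `orders` view are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# archive_state: "0" means active, any other value means archived.
conn.executescript("""
    CREATE TABLE orders_all (
        order_id      INTEGER PRIMARY KEY,
        customer      TEXT,
        archive_state TEXT NOT NULL DEFAULT '0'
    );
    CREATE VIEW orders AS
        SELECT order_id, customer FROM orders_all WHERE archive_state = '0';
""")
conn.executemany(
    "INSERT INTO orders_all (order_id, customer) VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex"), (3, "Initech")],
)

# "Archive" order 1 by marking it inactive; the row stays in the same table.
conn.execute("UPDATE orders_all SET archive_state = '1' WHERE order_id = 1")

# The application reads the view and sees active rows only.
visible_to_app = [r[0] for r in conn.execute("SELECT order_id FROM orders ORDER BY order_id")]
# A compliance query reads the base table and sees everything.
visible_for_compliance = [
    r[0] for r in conn.execute("SELECT order_id FROM orders_all ORDER BY order_id")
]
```

Nothing is moved or deleted here, which is why the approach keeps data available for compliance while leaving the question of how much the database grows to other measures such as compression.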