Tag Archives: Data archiving
The Oracle Applications Users Group Archive and Purge Special Interest Group held its semi-annual meeting on Sunday, September 30th at Oracle OpenWorld. Once again, this session was very well attended – even more so this year because of the expert panel, which included Ahmed Alomari, Founder of Cybermoor; Isam Alyousfi, Senior Director, who leads the Oracle Applications Tuning Group; Sameer Barakat, Oracle Applications Tuning Group; and Ziyad Dahbour, now at Informatica (Founder of TierData and Founder of Outerbay).
The need for more robust data retention management and enforcement is more than just good data management practice. Financial services organizations across the globe are legally required to comply with a myriad of local, federal, and international laws that mandate the retention of certain types of data. For example:
- Dodd-Frank Act: Under Dodd-Frank, firms are required to maintain records for no less than five years.
- Basel Accord: The Basel guidelines call for the retention of risk and transaction data over a period of three to seven years. Noncompliance can result in significant fines and penalties.
- MiFID II: Transactional data must be stored in a way that meets the new records retention requirements (such data must now be retained for up to five years) and can be easily retrieved, in context, to prove best execution.
- Bank Secrecy Act: All BSA records must be retained for a period of five years and must be filed or stored in such a way as to be accessible within a reasonable period of time.
- Payment Card Industry Data Security Standard (PCI): PCI requires card issuers and acquirers to retain an audit trail history for a period that is consistent with its effective use, as well as legal regulations. An audit history usually covers a period of at least one year, with a minimum of three months available online.
- Sarbanes-Oxley: Section 103 requires firms to prepare and maintain, for a period of not less than seven years, audit work papers and other information related to any audit report, in sufficient detail to support the conclusions reached and reported to external regulators.
Each of these laws has distinct data collection, analysis, and retention requirements that must be factored into existing information management practices. Unfortunately, existing data archiving methods, including traditional database and tape backups, lack the capabilities needed to effectively enforce and automate data retention policies in compliance with industry regulations. In addition, a number of internal and external trends make it even more difficult for financial institutions to archive and retain the required data.
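To make that enforcement gap concrete, here is a minimal sketch of how retention rules like the ones above could be captured in a form a policy engine can act on. The table and column names are illustrative, not taken from any particular product:

```sql
-- Illustrative only: encode regulatory retention rules in a policy table
-- so an archiving tool can enforce them automatically.
CREATE TABLE retention_policy (
  regulation      VARCHAR2(40),
  data_domain     VARCHAR2(40),
  retention_years NUMBER
);

INSERT INTO retention_policy VALUES ('Dodd-Frank',     'firm records',      5);
INSERT INTO retention_policy VALUES ('MiFID II',       'transaction data',  5);
INSERT INTO retention_policy VALUES ('Sarbanes-Oxley', 'audit work papers', 7);
```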
Data warehouses are applications, so why not manage them like one? In fact, data grows at a much faster rate in data warehouses, since they integrate data from multiple applications and cater to many different groups of users who need different types of analysis. Data warehouses also keep historical data for a long time, so data grows exponentially in these systems. Infrastructure costs in data warehouses also escalate quickly, since analytical processing on large amounts of data requires big, beefy boxes. Not to mention the software license and maintenance costs that come with such a large amount of data. Imagine how much backup media is required to back up a data warehouse of tens to hundreds of terabytes on a regular basis. But do you really need to keep all that historical data in production?
One of the challenges of managing data growth in data warehouses is that it’s hard to determine which data is actually used, which data is no longer being used, or even whether the data was ever used at all. Unlike transactional systems, where the application logic determines when records are no longer being transacted upon, the usage of analytical data in data warehouses follows no definite business rules. Age or seasonality may determine data usage in data warehouses, but business users are usually loath to let go of the availability of all that data at their fingertips. The only clear-cut way to prove that some data is no longer being used in a data warehouse is to monitor its usage.
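For Oracle-based warehouses, one place to start is the database's own audit trail. A minimal sketch, assuming standard auditing is enabled and a warehouse schema named SALES_DW (both assumptions, not details from the post):

```sql
-- Sketch: audit reads on a candidate table, then find audited tables whose
-- last SELECT is more than a year old. Tables that never appear in the audit
-- trail at all were never read during the monitoring window.
AUDIT SELECT ON sales_dw.fact_orders;

SELECT obj_name,
       MAX(timestamp) AS last_read
FROM   dba_audit_trail
WHERE  owner = 'SALES_DW'
AND    action_name = 'SELECT'
GROUP  BY obj_name
HAVING MAX(timestamp) < ADD_MONTHS(SYSDATE, -12);
```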
As the final part of our series, Architecting a Database Archiving Solution, we will review a process I use to assess a client’s existing Total Cost of Ownership (TCO) for their database application, and how to justify a database archiving solution. The process starts from a handful of key metrics.
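One obvious first metric is the current database footprint and its growth rate. As a generic sketch (not from the original post), an Oracle sizing query might look like this:

```sql
-- Current storage footprint by schema: a common first input to a TCO
-- assessment. Querying DBA_SEGMENTS requires DBA privileges.
SELECT owner,
       ROUND(SUM(bytes) / POWER(1024, 3), 1) AS size_gb
FROM   dba_segments
GROUP  BY owner
ORDER  BY size_gb DESC;
```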
During this series of “Architecting a Database Archiving Solution”, we discussed the Anatomy of A Database Archiving Solution and End User Access Requirements. In this post we will review the archive repository options at a very high level. Each option has its pros and cons and needs to be evaluated in more detail to determine which will be the best fit for your situation.
Series: Architecting A Database Archiving Solution Part 3: End User Access & Performance Expectations
In my previous post in this series, Architecting a Database Archiving Solution, we discussed the major architecture components. In this post, we will focus on how end user access requirements and expected performance service levels drive the core of the architecture discussion.
End user access requirements can be determined by answering the following questions. When data is archived from a source database:
- How long does the archived data need to be retained? The longer the retention period, the more the solution architecture needs to account for potentially significant data volumes and for technology upgrades or obsolescence. This drives the cost trade-off between keeping data online in a database or an archive file versus nearline or offline on other media such as tape; the sketch below shows one quick way to size that trade-off.
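As a hedged illustration, profiling row counts by age shows how much data would land in each storage tier. The table and column names (orders, created_date) are assumptions, not from the post:

```sql
-- Bucket rows by age in years; older buckets are candidates for nearline
-- or offline storage tiers. Table and column names are illustrative.
SELECT FLOOR(MONTHS_BETWEEN(SYSDATE, created_date) / 12) AS age_years,
       COUNT(*)                                          AS row_count
FROM   orders
GROUP  BY FLOOR(MONTHS_BETWEEN(SYSDATE, created_date) / 12)
ORDER  BY age_years;
```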
Before we can go into more detail on how to architect a database archiving solution, let’s review at a high level the major components of one. In general, a database archiving solution consists of four key pieces – application metadata, a policy engine, an archive repository, and an archive access layer.
Application Metadata – This component contains the information used to define which tables participate in a database archiving activity. It stores the relationships between those tables, including database- or application-level constraints and any criteria that need to be considered when selecting data to be archived. The metadata for packaged applications, such as Oracle E-Business Suite, PeopleSoft, or SAP, can usually be purchased in pre-populated repositories, such as Informatica’s Application Accelerators for Data Archive, to speed implementation times.
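As a rough illustration (not any vendor's actual schema), the core of such metadata is a set of parent/child relationships the archive engine can traverse when it selects data:

```sql
-- Illustrative metadata only: which tables participate in archiving and how
-- child rows join to their parents. Not modeled on any specific product.
CREATE TABLE archive_table_map (
  parent_table VARCHAR2(30),
  child_table  VARCHAR2(30),
  join_column  VARCHAR2(30)
);

-- Example: general ledger lines hang off their headers via HEADER_ID.
INSERT INTO archive_table_map VALUES ('GL_HEADERS', 'GL_LINES', 'HEADER_ID');
```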
Policy Engine – This component is where business users define their retention policies in terms of time durations and possibly other related rules (e.g., keep all financial data for the current quarter plus seven years, and the general and sub-ledgers must have a status of “Closed”). The policy engine is also responsible for executing the policy within the database and moving data to a configured archive repository. This involves translating the policy and metadata into structured query language that the database understands, as sketched below. Depending on the policy, users may want to move the data to the archive (meaning it is removed from the source application) or just create a copy in the archive. The policy engine takes care of all of those steps.
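A minimal sketch of the SELECT such an engine might generate for the example policy above; the table and column names (gl_ledger, period_end_date, status) are illustrative:

```sql
-- Rows eligible for archiving under "current quarter plus seven years,
-- status must be Closed". 84 months = 7 years before the current quarter.
SELECT *
FROM   gl_ledger
WHERE  period_end_date < ADD_MONTHS(TRUNC(SYSDATE, 'Q'), -84)
AND    status = 'Closed';
```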
Archive Repository – This is where the archived database records are stored. The repository choices vary and will be determined by a number of factors, typically driven by end user archive access requirements (we will discuss this in the next post). The choices include another archive database, highly compressed queryable archive files, and XML files, to name a few.
Archive Access Layer – This is the mechanism that makes the database archive accessible to a native application, a standard business reporting tool, or a data discovery portal. Again, the options vary and will be determined by the end user access requirements and the technology standards in the organization’s data center.
In the next post in the series, we will discuss in further detail how end user access and performance requirements impact the selection of these components.
Julie Lockner, Founder, www.CentricInfo.com
Classifying database data for an ILM project requires a process for categorizing and classifying that involves the business owners, Records Management, Security, IT, DBAs, and developers. In an ideal scenario, a company has documented every single business process down to the data flows and database tables, and IT can map database tables to the underlying infrastructure. Since most of us work in realistic scenarios, here is one approach you can take to classify information without knowing all the interrelations.
Many of my clients struggle with how to design a database archiving solution. Database archiving is not as clean as email or file archiving. Project owners who have done their research understand why they need an archiving solution: either to address performance degradation or increased costs (or both) due to uncontrolled data volume growth in their production databases. Where help is appreciated is during the planning phase of a project: defining which requirements are critical and how those requirements translate into an archive architecture.