To Partition or to Archive

Both partitioning and archiving are alternative methods of improving database and application performance. Depending on a database administrator’s comfort level for one technology or method over another, either partitioning or archiving could be implemented to address performance issues due to data growth in production applications. But what are the best practices for utilizing one or the other method and how can they be used better together?

Partitioning is ideal for improving performance in production systems, when data is still actively used, and therefore needs to be kept in the production database. Partitioning should also be used if reducing storage capacity and cost is not the primary concern as data grows. Partitioning can be implemented entirely under the purview of the database administrator, without the need to involve the functional users of the application. Therefore, from an approval process perspective, it is easier to implement since you need to involve fewer people.

On the other hand, there are limits to the effectiveness of partitioning as a method of improving performance, since continuous monitoring and tuning of partitioning strategies is required as data grows and as query patterns change. Also, since data still stays in the production system, it is not as easy to implement storage tiering strategies involving multiple storage subsystems. Partitioning also has limited impact on reducing maintenance windows and effort.

As data becomes inactive and infrequently used, then archiving is the ideal solution for improving performance and reducing overall storage, infrastructure and maintenance costs, by moving the data out of the production database completely.  However, archiving usually does require support from the business, since data is being moved out of the production database.  This means that the database administrator needs approval from the business users and application owners for the retention rules for keeping data in production vs. archive. This approval process becomes much easier if archived data can still be easily accessed from reporting tools without restoring the data or can be seamlessly accessed from the same application interface as before, causing minimal or no impact to the business users.

Archiving is also the most cost-effective method for long term retention of data that is governed by retention regulations. This is because some archiving software offers extreme compression of data, reducing storage capacity requirements. Some archiving solutions can also help to automate enforcement of retention and disposal policies, as well as applying legal holds to lock down data relevant to eDiscovery requests.

Since partitioning is ideal for improving database performance when data is still actively used, while archiving is ideal for improving performance and reducing costs when data has become inactive, as a best practice, both should be used as complementary methods to manage data growth.  Since partitioning doesn’t require involvement from the business, while archiving usually does, partitioning could also be used first while archiving can be used as the longer term solution for managing data retention and compliance. Data growth management solutions that allow you to manage and automate both partitioning and archiving may also support faster and more optimal archiving of entire partitions and enable you to more easily define complementary retention policies.

This entry was posted in Application ILM, Database Archiving and tagged , , , , , . Bookmark the permalink.

4 Responses to To Partition or to Archive

  1. Naveen Sidda says:

    MapReduce might actually act like an archive with an added advantage to access it when you need it..?

    • Naveen Sidda says:

      Is Informatica considering MapReduce for any background processes?

    • Interesting question, Naveen. MapReduce/HDFS/Hadoop do certainly play a role in enterprise data architecture. Whether they’d be an alternative to partitioning/archiving depends on your objectives.

      If the goal is analysis of enormous quantities of data, especially unstructured data like log files, etc., the MR model is terrific. What it generally will not do is reduce storage requirements (since data is stored redundantly) or provide seamless, application-level access to the archived data. It also isn’t usually tooled up to support retention/deletion policies, guaranteed immutability, and other requirements that are unique to data archiving.

      If the goals are reduced storage requirements, transparent, application-level access to archived data, retention policy enforcement, and improved application performance (and the flip-side: reduced hardware upgrade requirements), then partitioning and archiving technologies are a more likely choice.

      These distinct objectives and requirements are why there are distinct archive/performance and “Big Data” technologies. Depending on what you’re trying to accomplish you might need one, the other, or both. Having them on a common data integration platform, as INFA does, makes the solution economical and simplifies life for adminstrators, operators and developers.

  2. Naveen,
    I’m not sure how your comment regarding MapReduce and background processes relates to partitioning and archiving?

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>