In a recent InformationWeek blog, “Big Data A Big Backup Challenge”, George Crump aptly pointed out the problems of backing up big data and outlined some best practices that should be applied to address them, including:
- Identifying which data can be re-derived and therefore doesn’t need to be backed up
- Eliminating redundancy through file de-duplication and applying data compression
- Using storage tiering and a combination of online disk and tape to reduce storage cost and optimize performance
In addition to those best practices, complementing your backup strategy with archiving will also reduce the amount of data that needs to be backed up and the associated storage costs. Archiving allows you to move inactive data to a lower-cost infrastructure, supporting the storage tiering strategy above. Moving inactive data out of your production system reduces the amount of data that must be backed up frequently, which in turn reduces storage requirements. The inactive data that has been moved out of production systems can follow a much less frequent backup schedule, since it is no longer being modified.
Archiving solutions also support the de-duplication and compression strategy above. Many archiving technologies include advanced file de-duplication and compression techniques. Specifically, a database archiving solution moves inactive data out of production databases and converts it to an optimized, highly compressed file archive, with compression of up to 98%. Backing up the archive for disaster recovery therefore requires much less storage space: 100 terabytes of data may need only 2 to 5 terabytes each of archive and backup space. This is critical in the age of big data.
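The arithmetic behind those figures can be sketched in a few lines of Python. This is a rough illustration only: the function name `archive_size_tb` is hypothetical, and the 95% lower bound is an assumption inferred from the 2 to 5 terabyte range quoted above, not a figure from any particular product.

```python
def archive_size_tb(source_tb: float, compression_ratio: float) -> float:
    """Return the space left after `compression_ratio` of the data is squeezed out.

    `compression_ratio` is the fraction of space saved (0.98 means 98%).
    """
    if not 0.0 <= compression_ratio < 1.0:
        raise ValueError("compression_ratio must be in the range [0, 1)")
    return source_tb * (1.0 - compression_ratio)


source_tb = 100.0  # size of the inactive data in production, in terabytes

# Assumed range: 95% to 98% space savings.
for ratio in (0.95, 0.98):
    archive = archive_size_tb(source_tb, ratio)
    # The backup of the archive is assumed to be roughly the same size as the archive.
    print(f"{ratio:.0%} compression: archive ~{archive:.1f} TB, backup ~{archive:.1f} TB")
```

Actual savings depend heavily on the data itself, so treat the ratios as inputs to test rather than guarantees.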
In addition, archiving solutions allow you to keep the data online, maintaining easy access to it without the need to locate it on tape and restore it. Backups, on the other hand, can be used for disaster recovery of both production and archived data. By ensuring online access to data, archiving solutions support compliance and eDiscovery by reducing the time it takes to find and produce data for review. With more data to cull through, it is important that you can find all relevant data in a timely manner to reduce the potential for fines due to delayed responses. There are many cases in the press, especially in the financial industry, where the inability to produce all relevant data in a timely manner has cost companies hundreds of millions of dollars in fines.
A database archiving solution may also offer retention management capabilities, allowing customers to enforce retention and disposal policies so that data is held long enough to meet requirements, but not so long that it becomes a legal risk.
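To make the "long enough, but not too long" idea concrete, here is a minimal sketch of a retention check. Everything in it is an assumption for illustration: the function `retention_action`, the two thresholds, and the three outcome labels are hypothetical and do not describe how any specific archiving product implements its policies.

```python
from datetime import date


def retention_action(archived_on: date, today: date,
                     min_days: int, max_days: int) -> str:
    """Classify an archived record against a simple two-threshold policy.

    min_days: hypothetical minimum period the record must be kept.
    max_days: hypothetical age at which the record should be disposed of.
    """
    if min_days > max_days:
        raise ValueError("min_days must not exceed max_days")
    age_days = (today - archived_on).days
    if age_days < min_days:
        return "retain"   # still inside the mandatory retention period
    if age_days < max_days:
        return "review"   # may be kept or disposed of, per business need
    return "dispose"      # held past the limit; keeping it adds legal risk


# Example: keep for at least one year, dispose after two (assumed values).
print(retention_action(date(2020, 1, 1), date(2020, 6, 1), 365, 730))
print(retention_action(date(2020, 1, 1), date(2023, 1, 1), 365, 730))
```

A real policy would usually vary by record type and jurisdiction, so the thresholds here would come from a policy table rather than from hard-coded arguments.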
Implementing an archiving strategy for big data is critical to addressing the big data backup challenge, because it shortens backup windows and reduces overall storage costs. It also has the added benefit of supporting regulatory compliance and reducing the risk of fines during eDiscovery and audits.