SAP’s data warehouse solution (SAP BW) provides enterprises with the ability to easily build a warehouse on top of their existing operational systems using predefined extraction and reporting objects and methods. Data loaded into SAP BW is stored in a layered architecture that encourages reuse of data throughout the system in a standardized way. SAP’s implementation also enables easy audits of the data delivery mechanisms used to produce the various reports within the system.
To achieve this level of standardization and auditability, SAP BW must persistently store large amounts of data within the different layers of its architecture. Managing the size of the objects within these layers becomes increasingly important as the system grows, to ensure high levels of performance for end-user queries and data delivery.
The question is how best to manage data growth while taking into consideration the data access requirements of SAP BW.
An easy way to address data growth is to constantly add more hardware to the system. This band-aid approach can be quite costly in the long run, considering the overall cost of the high-end storage a production system requires (high availability, disaster recovery, mirroring, electricity, cooling, floor space, etc.). The growth of an SAP BW system also increases the time required for administrative tasks (such as backups and restores, indexing, and database reorganization), which will eventually no longer fit within the system’s batch windows. Add to this that most SAP BW customers replicate their production system to multiple QA or development environments, and the hardware approach greatly increases the cost of the overall SAP BW landscape while leaving the performance and administration issues associated with data growth unaddressed.
SAP provides a better approach to managing the size and growth of a system than the traditional hardware approach. Integrated into SAP BW are predefined mechanisms to archive data from its various objects. Archiving allows customers to greatly reduce the size of their system by moving data that is infrequently used to alternate storage.
Since most data access in a data warehouse touches only a very small percentage of the stored data, keeping objects within the production system small, by focusing them on the data that is regularly used, increases the overall system’s access speed. Another important aspect to take into consideration is data loads. Data loaded into SAP BW must pass through multiple layers before it is reportable, and some objects within these layers require activation steps that can be quite time consuming when the object is large. Smaller objects therefore improve not only query speed but also the speed at which data can be delivered to end-users.
SAP’s standard ADK archiving strategy enables customers to reduce the size of objects within their SAP BW system. One drawback of the standard ADK-based archiving solution, however, is that once data has been pushed to an archive, it is no longer accessible from SAP BW. In SAP BW, analysis is often done on wide segments of data that include trend comparisons between new and old data. A solution that makes older data accessible only by reloading it into the system limits the ability of end-users to perform their required analysis. Also, since SAP’s layered architecture promotes reusability of data, data that is no longer available because it has been archived must likewise be reloaded whenever a target object needs to be built or modified from it. In both of these cases, the standard SAP ADK archive approach is not the best fit.
To overcome this issue, SAP BW has integrated into its archiving strategy the possibility of sending data either to a standard ADK archive or to a nearline repository. Nearline provides all the size-reduction benefits of the standard SAP ADK strategy, but also provides direct, transparent access to nearline data via the same access methods currently available in SAP BW (including query access and data delivery processes). The data federation required to provide seamless combined access to production and nearline data is integrated directly into the SAP BW layer. Because of this, SAP BW does not depend on the underlying database or on an external tool to provide this access. The result is completely transparent combined access to production and nearline data without having to build or use alternate access methods, which greatly reduces the cost of implementing such a solution.
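The federation idea can be illustrated with a minimal sketch. SAP's actual implementation is internal to BW (and written in ABAP); the names and data structures below are purely hypothetical, showing only the principle that a query over a time-sliced object reads recent slices from the online store and older slices from the nearline repository, then merges the results transparently.

```python
# Hypothetical illustration of query federation between an online store
# and a nearline repository. All names (ONLINE, NEARLINE, federated_query)
# are illustrative; they are not SAP APIs.

ONLINE = {2023: [("Q1", 500), ("Q2", 520)]}            # recent, "hot" data
NEARLINE = {2020: [("Q1", 410)], 2021: [("Q2", 450)]}  # archived data

def federated_query(years):
    """Read each requested year from whichever store holds it and
    merge the results into one combined answer set."""
    rows = []
    for year in years:
        if year in ONLINE:        # served from the production object
            rows.extend((year, q, v) for q, v in ONLINE[year])
        elif year in NEARLINE:    # served from the nearline repository
            rows.extend((year, q, v) for q, v in NEARLINE[year])
    return sorted(rows)

# A trend comparison spanning archived and current data runs unchanged:
print(federated_query([2020, 2023]))
```

Because the merge happens in one place above the storage layer, the caller never needs to know which store actually held the data, which is the point made about SAP BW not depending on the database or an external tool.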
The nearline repository itself is provided by third-party vendors, which must be certified against SAP’s generic nearline storage interface. All data movements (archives, reloads, and queries) are handled by the SAP BW system, while the third-party vendor is responsible for storing the nearline data and providing immediate access to it when required.
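This division of responsibilities can be sketched as an interface contract. To be clear, SAP's real certification interface is an ABAP contract with its own signatures; the class and method names below are invented for illustration only, to show that BW orchestrates all movements while the vendor merely stores and serves slices.

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of the vendor-side contract behind a nearline
# interface. Method names are illustrative, not SAP's actual signatures.

class NearlineRepository(ABC):
    """What a certified vendor must supply: durable storage plus
    immediate read access to archived data slices."""

    @abstractmethod
    def write_slice(self, object_name: str, slice_id: str, rows: list) -> None: ...

    @abstractmethod
    def read_slice(self, object_name: str, slice_id: str) -> list: ...

class InMemoryRepository(NearlineRepository):
    """Toy implementation used only to demonstrate the contract."""
    def __init__(self):
        self._store = {}
    def write_slice(self, object_name, slice_id, rows):
        self._store[(object_name, slice_id)] = list(rows)
    def read_slice(self, object_name, slice_id):
        return self._store[(object_name, slice_id)]

# BW would drive the movement; the vendor only stores and serves:
repo = InMemoryRepository()
repo.write_slice("SALES_CUBE", "2020-Q1", [("A", 100), ("B", 200)])
print(repo.read_slice("SALES_CUBE", "2020-Q1"))
```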
A nearline solution that can compress data at a rate of 98% and provide fast, direct access to it will not only reduce the storage cost of the SAP BW landscape (development, QA, and production), but also increase the performance of the system for queries and data delivery while reducing operational strain (batch windows, backups, etc.). In the case of SAP HANA or BWA, the reduction in object size greatly reduces the cost of implementing these solutions, as less memory and fewer CPUs are required to operate the production environment. The storage savings alone, or in the case of HANA the memory and CPU savings, can provide an ROI within 6 to 12 months of the initial implementation.
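A back-of-the-envelope calculation makes the savings concrete. The 98% compression rate comes from the text above; the other figures (a 10 TB object, 80% of it archivable, a three-copy landscape) are illustrative assumptions, not vendor benchmarks.

```python
# Illustrative savings arithmetic. Only the 98% compression rate is from
# the text; the remaining figures are assumed for the example.
total_tb = 10.0        # assumed size of a production object
archivable = 0.80      # assumed share of data moved to nearline
compression = 0.98     # size reduction on archived data (from the text)
copies = 3             # Dev, QA, and production landscape copies

online_after = total_tb * (1 - archivable)                 # stays online
nearline_size = total_tb * archivable * (1 - compression)  # archived footprint
saved_per_copy = total_tb - online_after - nearline_size

print(f"online: {online_after} TB, nearline: {nearline_size:.2f} TB")
print(f"storage freed across landscape: {saved_per_copy * copies:.2f} TB")
```

Under these assumptions, each copy shrinks from 10 TB to roughly 2.16 TB, and the freed storage across the three-copy landscape is what drives the 6-to-12-month ROI claim; for HANA or BWA, the same shrinkage translates into memory and CPU savings.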
A nearline solution can therefore help large SAP BW environments run more smoothly, and considering such a solution early in an SAP BW implementation can avoid issues related to growth and size altogether.