Tag Archives: DBA
Columnar Deduplication and Column Tokenization: Improving Database Performance, Security and Interoperability
For some time now, a special technique called columnar deduplication has been implemented by a number of commercially available relational database management systems. In today’s blog post, I discuss the nature and benefits of this technique, which I will refer to as column tokenization for reasons that will become evident.
Column tokenization is a process in which a unique identifier (called a Token ID) is assigned to each unique value in a column and then used to represent that value wherever it appears in the column. Using this approach, data size reductions of up to 50% can be achieved, depending on the number of unique values in the column (that is, on the column's cardinality): the fewer the distinct values, the more often each one is replaced by a compact token ID. Some RDBMSs use this technique simply as a way of compressing data. In those systems, column tokenization is integrated into the buffer and I/O subsystems, and when a query is executed, each row must be materialized, with the token IDs replaced by their corresponding values. At Informatica, in the File Archive Service (FAS), part of the Information Lifecycle Management product family, column tokenization is the core of our technology: the tokenized structure is actually used during query execution, and rows are materialized only when the final result set is returned. We also use special compression algorithms to reduce size further, typically by around 95%.
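To make the idea concrete, here is a minimal sketch of column tokenization in Python. It is not the FAS implementation. The class and function names (`TokenizedColumn`, `tokenize`, `rows_equal_to`, `materialize`, `token_width_bytes`, `estimated_sizes`) are hypothetical. The sketch assumes string values, fixed-width token IDs, and equality predicates only. It shows the two ideas from the paragraph above: each distinct value is stored once and referenced by a token ID, and a filter can run directly on the token IDs, with values materialized only for the rows in the result.

```python
from dataclasses import dataclass, field


@dataclass
class TokenizedColumn:
    """Hypothetical tokenized column: distinct values stored once, rows hold token IDs."""
    dictionary: list[str] = field(default_factory=list)    # token ID -> value
    lookup: dict[str, int] = field(default_factory=dict)   # value -> token ID
    tokens: list[int] = field(default_factory=list)        # one token ID per row

    def append(self, value: str) -> None:
        # Assign a new token ID the first time a value is seen; reuse it afterwards.
        token_id = self.lookup.get(value)
        if token_id is None:
            token_id = len(self.dictionary)
            self.dictionary.append(value)
            self.lookup[value] = token_id
        self.tokens.append(token_id)

    def rows_equal_to(self, value: str) -> list[int]:
        # Evaluate `column = value` by translating the literal to a token ID once,
        # then comparing integers; no row values are materialized here.
        token_id = self.lookup.get(value)
        if token_id is None:
            return []
        return [row for row, t in enumerate(self.tokens) if t == token_id]

    def materialize(self, rows: list[int]) -> list[str]:
        # Replace token IDs with their values, only for rows in the final result.
        return [self.dictionary[self.tokens[row]] for row in rows]


def tokenize(values: list[str]) -> TokenizedColumn:
    col = TokenizedColumn()
    for v in values:
        col.append(v)
    return col


def token_width_bytes(n_unique: int) -> int:
    # Smallest whole number of bytes that can hold token IDs 0..n_unique-1.
    return max(1, ((n_unique - 1).bit_length() + 7) // 8)


def estimated_sizes(col: TokenizedColumn) -> tuple[int, int]:
    """Rough byte counts: (raw UTF-8 values, dictionary + fixed-width token IDs).

    Ignores length headers, page overhead, and any further compression.
    """
    raw = sum(len(col.dictionary[t].encode("utf-8")) for t in col.tokens)
    dictionary = sum(len(v.encode("utf-8")) for v in col.dictionary)
    return raw, dictionary + len(col.tokens) * token_width_bytes(len(col.dictionary))
```

For example, tokenizing `["Boston", "Chicago", "Boston", "Denver", "Boston", "Chicago"]` yields the dictionary `["Boston", "Chicago", "Denver"]` and the token IDs `[0, 1, 0, 2, 0, 1]`. The filter `city = 'Boston'` then becomes an integer scan that returns rows `[0, 2, 4]`. The size estimate drops from 38 to 25 bytes. The more often values repeat, the larger the saving. This is the cardinality dependence described above.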
Many of us read reports from industry pundits and the latest industry rags and hear the "experts" describe the latest trends. For the most part, many of these trends never come to fruition. The most unfortunate part is that our bosses and internal customers read these reports as gospel, and we spend our time debunking the myths.
An area that is much hyped is the Cloud. The question is whether it is fact or fiction. Today, I will focus on SaaS in particular, as even the word Cloud conjures up conflicting images. In my world, SaaS is real. The most compelling metric is that in 2010, the number of our Cloud apps exceeded the number of our on-premise apps, and the impact on our business and IT has been profound. Let me repeat: our SaaS apps outnumber our on-premise apps!