Data Masking: What It Is, Techniques and Examples
Did you read the heading of this post correctly? I bet you did not. I am talking about “opjuqzsdof bube vs da** ma*****”. We will come back to it at the end. The aim of this post is to explain what lies behind that jumbled and starred text.
In the data security world, encryption and masking are two of the most effective techniques for protecting sensitive data against unauthorized access. Before moving ahead, say it out loud: “Encryption and masking are different ways of protecting data.” Encryption is not masking, and masking is not encryption; each is intended to solve a different data security problem. Let’s start with encryption, then look at masking, and understand how the two differ.
What is data encryption and how does it help to achieve data security?
Encryption is the process of transforming original data into a coded format using an encryption technique (symmetric key encryption or public key encryption), so that only authorized users can decode the message and unauthorized eavesdroppers cannot. Decryption is the reverse process: retrieving the original data from the encoded data using the decryption key (which, in symmetric encryption, is the same as the encryption key).
In general, an encryption technique uses a key (the encryption key) to encode the original data, and only authorized users have access to the corresponding decryption key. With that key, the encoded data is converted back to the original data. This highlights a fundamental feature of encryption: every encryption algorithm is reversible (provided you are authorized and hold the key).
Original data is encoded using an encryption algorithm and a key. Encoded data is converted back to the original data using the same algorithm and the decryption key. See the following diagram, which shows the sequence of events and summarizes these encryption fundamentals:
For the sake of simplicity, we have considered the simplest encryption method. (Yes, it is far too weak to be a real encryption algorithm.) We have also agreed with the other party on a symmetric key (the same for encryption and decryption), the number 1: the sender adds 1 to each digit of the SSN, and the receiver subtracts 1 from each digit of the encoded value to recover the original SSN.
Let’s walk through the events that occur in this encryption and decryption. Joye requests an employee’s details from Bob, who is authorized to access the data source. The two of them use the key they agreed upon earlier. Bob encodes the sensitive data, such as the SSN, by incrementing each digit by 1 and sends it to Joye. Joye knows the key, so she decodes the SSN by decrementing each digit by 1 and gets the original data.
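The exchange between Bob and Joye can be sketched in a few lines of Python. This is only an illustration of the toy scheme above, not a real cipher. The function names and the sample SSN are made up, and wrapping 9 around to 0 is an assumption, since the example does not say what happens to the digit 9.

```python
DIGITS = "0123456789"
KEY = 1  # the shared symmetric key from the example


def shift_digits(text: str, key: int) -> str:
    """Shift every decimal digit by `key` (mod 10); leave other characters alone.

    The wrap-around (9 -> 0) is an assumption, not part of the original example.
    """
    return "".join(
        str((int(ch) + key) % 10) if ch in DIGITS else ch
        for ch in text
    )


def encrypt_ssn(ssn: str) -> str:
    """Bob's side: add the key to each digit."""
    return shift_digits(ssn, KEY)


def decrypt_ssn(encoded: str) -> str:
    """Joye's side: subtract the same key from each digit."""
    return shift_digits(encoded, -KEY)


original = "123-45-6789"        # made-up SSN for illustration
encoded = encrypt_ssn(original)
print(encoded)                  # 234-56-7890
print(decrypt_ssn(encoded))     # 123-45-6789
```

Anyone who knows (or guesses) the key can run the same decryption, which is why this works as a teaching example only.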
Note: In principle, no encryption algorithm is impossible to break. With a brute-force approach, an attacker can try every possible key until one works. However, the time needed to search all possible keys is enormous: on the order of 10 to the power 27 years or more with a 256-bit encryption key. It is the size of the key that makes breaking the encryption harder and harder, because every additional bit doubles the number of keys to try. In 1997, a 40-bit RC4 key was cracked in only 3.5 hours, and in the year 2000, a 56-bit DES key was cracked in less than 4 days. So, the strength of an encryption algorithm is expressed in terms of how much time it would take to break it.
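To get a feel for how key size drives brute-force time, here is a rough back-of-the-envelope calculation. The search rate of one trillion keys per second is a purely hypothetical figure chosen for illustration; real attack rates depend on the algorithm and the hardware.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_search(key_bits: int, keys_per_second: float) -> float:
    """Worst-case time, in years, to try every key of the given length."""
    return (2 ** key_bits) / keys_per_second / SECONDS_PER_YEAR


ASSUMED_RATE = 1e12  # hypothetical: one trillion keys tested per second

for bits in (40, 56, 128, 256):
    years = years_to_search(bits, ASSUMED_RATE)
    print(f"{bits:>3}-bit key: {years:.3e} years")
```

Whatever rate you plug in, adding one bit to the key doubles the result, so the gap between a 56-bit key and a 256-bit key is a factor of 2 to the power 200.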
What is data masking and how does it help to achieve data security?
Masking is the process of safeguarding original data by showing the user an obscured version of it, without producing intermediate data from which the original is meant to be recovered. The data sent to the user is called masked data.
In other words, with masking we never have to reconstruct the original data from any intermediate data. This is the most fundamental difference between encryption (original data is transformed into encoded data, and the original data is later retrieved from it) and masking (no recoverable transformation takes place; the original data is simply hidden). The most significant property of masking is that it is not reversible. Its strength lies in exactly that: masking should be done in such a way that there is no way to retrieve the original data from the masked data.
Let’s understand masking with the following example. Suppose we have two types of users, administrators and business analysts, with different system and data access privileges. Administrators can see and edit original data. For business analysts, the SSN is not relevant (the employee ID is enough to maintain referential integrity), so it is always masked, and the system is designed so that outside working hours (9 AM to 5 PM) the business analyst cannot see other sensitive data such as the bank account number either.
To keep things simple, we will use the simplest masking method (sometimes described as applying a masking rule): replace the original data with “XXXXX” if the user is not authorized to see it.
From the above diagram, we can easily figure out what data masking does. For the business analyst, the SSN is masked (“XXXXX” is displayed instead of the original data), and other sensitive information is masked outside business hours. Even though this is the most rudimentary form of data masking, the fundamental concept is the same: obscure data from unauthorized users by applying a masking rule or masking algorithm, and make the masking irreversible (we should not be able to retrieve the original data from the masked data).
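The masking rule from this example can be sketched as a small Python function. The role names, field names, and sample record are all hypothetical, and treating any unknown role as fully restricted is an added assumption rather than something the example specifies.

```python
from datetime import time

WORK_START = time(9, 0)    # 9 AM
WORK_END = time(17, 0)     # 5 PM
MASK = "XXXXX"


def mask_record(record: dict, role: str, now: time) -> dict:
    """Return a copy of `record` with sensitive fields masked for this role and time."""
    if role == "administrator":
        return dict(record)                 # administrators see original data
    masked = dict(record)
    masked["ssn"] = MASK                    # SSN is never shown to non-administrators
    in_working_hours = WORK_START <= now < WORK_END
    # Assumption: roles other than business_analyst never see the bank account.
    if role != "business_analyst" or not in_working_hours:
        masked["bank_account"] = MASK
    return masked


# Made-up employee record for illustration.
employee = {"employee_id": "E1001", "ssn": "123-45-6789", "bank_account": "9876543210"}

print(mask_record(employee, "administrator", time(20, 0)))
print(mask_record(employee, "business_analyst", time(10, 0)))
print(mask_record(employee, "business_analyst", time(20, 0)))
```

Every SSN maps to the same “XXXXX”, so the masked value carries no information about the original, and no key exists that could reverse it.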
Before concluding this article, let’s go back to the heading and decode it. Take “opjuqzsdof bube”, reverse the string, and shift each letter back by one position in the alphabet: we get the original string, “data encryption”. Similarly, “da** ma*****” is the masked version of the original string “data masking”. The starred characters carry no information, so unlike the first string, it cannot be turned back into the original.
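Both halves of the heading can be reproduced with a short Python sketch. The function names are made up for illustration, and keeping the first two letters of each word is simply the rule that matches “da** ma*****”.

```python
def shift_letters(text: str, offset: int) -> str:
    """Shift each lowercase letter by `offset`, wrapping around the alphabet."""
    return "".join(
        chr((ord(ch) - ord("a") + offset) % 26 + ord("a")) if "a" <= ch <= "z" else ch
        for ch in text
    )


def encode(text: str) -> str:
    """Reverse the string, then shift every letter forward by one."""
    return shift_letters(text[::-1], 1)


def decode(text: str) -> str:
    """Undo encode(): reverse the string and shift every letter back by one."""
    return shift_letters(text[::-1], -1)


def mask_words(text: str, keep: int = 2) -> str:
    """Keep the first `keep` characters of each word and star out the rest."""
    return " ".join(
        word[:keep] + "*" * max(len(word) - keep, 0)
        for word in text.split(" ")
    )


print(encode("data encryption"))   # opjuqzsdof bube
print(decode("opjuqzsdof bube"))   # data encryption
print(mask_words("data masking"))  # da** ma*****
```

Note that there is a decode() to pair with encode(), but nothing to pair with mask_words(): “data masking” and “dave mangoes” would both come out as “da** ma*****”.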
Encryption: Original data is encoded (intermediate encrypted data), and the original data can be retrieved from this encoded data.
Masking: Original data is masked/obscured, and there should be no provision for retrieving the original data from the obscured data.