Data Masking and Encryption Are Different

A common misconception in the data community is that encryption is a form of data masking. Worse, some people treat the two as one and the same. Data masking and data encryption are technically distinct processes, even though field-level encryption is sometimes classified as a data masking function, which is part of why the two get confused.

Data masking and data encryption have much in common, but the differences are substantial. Both are designed to protect data, and that protection improves considerably when the two are used together.

Fundamental Difference: For encryption, reversibility is required, and for masking, reversibility is a weakness.

If a masking algorithm is reversible, then it is potentially weak, because the masked output still contains the original information in a different form. The best methods for data masking are those that are not based on the original data at all.
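To see why reversibility is a weakness, consider a minimal Python sketch. The functions `shift_mask` and `shift_unmask` and the shift of 3 are hypothetical, invented here for illustration only; they do not describe any particular product.

```python
def shift_mask(digits: str, shift: int = 3) -> str:
    """'Mask' a string by rotating each digit; other characters pass through."""
    return "".join(
        str((int(ch) + shift) % 10) if ch.isdigit() else ch
        for ch in digits
    )


def shift_unmask(masked: str, shift: int = 3) -> str:
    """Undo shift_mask by rotating each digit back by the same amount."""
    return shift_mask(masked, -shift)


original = "123-45-6789"
masked = shift_mask(original)      # looks different from the original
recovered = shift_unmask(masked)   # but the original is fully recoverable
```

Anyone who guesses the rule gets the real value back, so the "masked" output is only the original information in a different form.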

Data masking hides data elements that users in certain roles should not see and replaces them with similar-looking fake data, typically characters that satisfy the format requirements of the system under test, so that the system still works with the masked results. Masking ensures that vital parts of personally identifiable information (PII), such as the first 5 digits of a social security number, are obscured or otherwise de-identified.

Dynamic data masking has a niche of its own. It transforms data on the fly based on the user’s role (privileges). It is used to secure real-time transactional systems, and it speeds up the implementation and maintenance of data privacy and compliance.

Data masking does not encrypt information. We see data records in their native form, and no decryption key is necessary. But we see only what we are allowed to see today and not a byte more, and tomorrow we may see even less if the rules change overnight.

Even the best ciphers can in principle be cracked (maybe in a million years using today’s technology), while properly masked data cannot be unmasked. The resulting data set does not contain any references to the original information, which makes it useless to attackers.
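The role-based behaviour of dynamic masking can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real product's API: the role names, the `FULL_ACCESS_ROLES` policy set, and the functions `mask_ssn` and `view_ssn` are all hypothetical.

```python
# Hypothetical policy: roles that may see the full social security number.
FULL_ACCESS_ROLES = {"compliance_officer"}


def mask_ssn(ssn: str) -> str:
    """Replace the first five digits with X, keeping separators and the last four."""
    out = []
    digits_seen = 0
    for ch in ssn:
        if ch.isdigit():
            digits_seen += 1
            out.append("X" if digits_seen <= 5 else ch)
        else:
            out.append(ch)
    return "".join(out)


def view_ssn(ssn: str, role: str) -> str:
    """Return the SSN as the given role is allowed to see it."""
    return ssn if role in FULL_ACCESS_ROLES else mask_ssn(ssn)


print(view_ssn("123-45-6789", "support_agent"))       # XXX-XX-6789
print(view_ssn("123-45-6789", "compliance_officer"))  # 123-45-6789
```

The stored value is never changed and no key is involved. Changing what a role sees tomorrow only means editing the policy set.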

Data encryption is the process of transforming information with an algorithm (a cipher) into scrambled ciphertext that is unreadable to anyone except those possessing a key. Restoring the message requires the corresponding decryption algorithm and the correct key. Encryption is widely used to protect files on local, network, or cloud drives, as well as network communications such as web and email traffic.
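The round trip that defines encryption can be shown with a small Python sketch. It uses a one-time pad (XOR with a random key as long as the message) only because that fits in the standard library; real systems would use a vetted algorithm such as AES through an established library. The helper name `xor_bytes` is an assumption made for this example.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a key of the same length (one-time pad)."""
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"SSN: 123-45-6789"
key = secrets.token_bytes(len(plaintext))   # random key, used only once

ciphertext = xor_bytes(plaintext, key)      # unreadable without the key
decrypted = xor_bytes(ciphertext, key)      # the same key restores the message
```

Unlike masking, the original is meant to come back: whoever holds the key recovers every byte.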

When would we choose to use data masking versus data encryption?

Data masking is often used by those who need to test with sensitive data or perform research and development on sensitive projects. Companies commonly request production data for testing. Because this sensitive data passes through many hands, it is at great risk of theft or misuse. Redacting (stripping, covering over, or removing) the important elements of the data set, such as names, addresses, and patient information, protects them. This process, however, is often irreversible.

Common terms such as anonymization and de-identification also refer to processes that irreversibly sever the identifying information from the data set. They prevent future identification of the original data even by the people conducting the research or testing. For example, one cannot discern or re-identify a social security number whose first 5 digits are covered by X’s.
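As a rough illustration of this kind of redaction, the Python sketch below strips identifying fields from a record before it is handed to a test team. The field names (`name`, `address`, `ssn`, `diagnosis_code`) and the function `redact_record` are hypothetical and assume SSNs in the `NNN-NN-NNNN` format.

```python
import re

# Matches NNN-NN-NNNN and captures the last four digits.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def redact_record(record: dict) -> dict:
    """Return a copy with name and address removed and the SSN partly covered."""
    redacted = {k: v for k, v in record.items() if k not in {"name", "address"}}
    if "ssn" in redacted:
        redacted["ssn"] = SSN_PATTERN.sub(r"XXX-XX-\1", redacted["ssn"])
    return redacted


production_row = {
    "name": "Jane Doe",
    "address": "1 Main St",
    "ssn": "123-45-6789",
    "diagnosis_code": "J45",
}
test_row = redact_record(production_row)
```

The removed fields and the covered digits are simply absent from `test_row`, so nothing in the test data set allows them to be restored.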

Data encryption is often used to protect data that is transferred between computers or networks so that it can later be restored. Such data, whether in transit or at rest, can be extremely vulnerable to a breach. Converting it into unreadable gibberish (or into format-preserving ciphertext, which is also hard to crack) produces highly secure results. The only way to gain access to the data is to unlock it with a key or password that only authorized parties can access.

Security Perspective: From a data security point of view, the best masking solution is random generation as it is completely independent of the underlying data. Encryption does not constitute good masking. We do not need reversibility. We can abandon the concept of one input one output (a 1‐1 map) and the concept of determinism. Abandoning these two core principles of encryption allows for more secure data masking solutions.
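A minimal Python sketch of masking by random generation follows. The function names `random_ssn_like` and `mask_column` are assumptions made for this example, and the generator produces SSN-shaped strings without checking any real issuance rules.

```python
import secrets


def random_ssn_like() -> str:
    """Generate an SSN-shaped value that is independent of any real record."""
    digits = "".join(secrets.choice("0123456789") for _ in range(9))
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


def mask_column(values: list[str]) -> list[str]:
    """Replace every value with a fresh random one; the input values are never read."""
    return [random_ssn_like() for _ in values]


# The same input appears twice but is masked independently each time,
# so there is no 1-1 map and no determinism to exploit.
masked = mask_column(["123-45-6789", "123-45-6789"])
```

A generated value may coincide with a real one by chance, but it carries no information about the input, so there is nothing to reverse.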

To summarize: if we want to protect our production data from unauthorized access, but the data is important in its current context, then we use encryption. However, if we need to use our production data in a test environment, where the actual content of the data is meaningless, then we use masking. In that setting masking is not only more secure than encryption; we may also find it to be a much more efficient process.

It may be easy to think of data masking and data encryption as the same thing, since both are data-centric means of protecting sensitive data. However, their inherent procedures and purposes differentiate them. Both technologies are relatively easy to implement, as long as we know what to do and how to do it right. And an investment in a secure environment will preserve company reputation and customer loyalty for years to come.

