Tag Archives: tokenization
Column-oriented Database Management Systems (CDBMS), also referred to as columnar databases and CBAT, have been getting a lot of attention recently in the data warehouse marketplace and trade press. Interestingly, some of the newer companies offering CDBMS-based products give the impression that this is an entirely new development in the RDBMS arena. This technology has actually been around for quite a while. But the market has only recently started to recognize the many benefits of CDBMS. So, why is CDBMS now coming to be recognized as the technology that offers the best support for very large, complex data warehouses intended to support ad hoc analytics? In my opinion, one of the fundamental reasons is the reduction in I/O workload that it enables. (more…)
Columnar Deduplication and Column Tokenization: Improving Database Performance, Security and Interoperability
For some time now, a special technique called columnar deduplication has been implemented by a number of commercially available relational database management systems. In today’s blog post, I discuss the nature and benefits of this technique, which I will refer to as column tokenization for reasons that will become evident.
Column tokenization is a process in which a unique identifier (called a Token ID) is assigned to each unique value in a column, and then employed to represent that value anywhere it appears in the column. Using this approach, data size reductions of up to 50% can be achieved, depending on the number of unique values in the column (that is, on the column’s cardinality). Some RDBMSs use this technique simply as a way of compressing data; the column tokenization process is integrated into the buffer and I/O subsystems, and when a query is executed, each row needs to be materialized and the token IDs replaced by their corresponding values. At Informatica for the File Archive Service (FAS) part of the Information Lifecycle Management product family, column tokenization is the core of our technology: the tokenized structure is actually used during query execution, with row materialization occurring only when the final result set is returned. We also use special compression algorithms to achieve further size reduction, typically on the order of 95%.
Personally Identifiable Information is under attack like never before. In the news recently two prominent organizations—institutions—were attacked. What happened:
- A data breach at a major U.S. Insurance company exposed over a million of their policyholders to identity fraud. The data stolen included Personally Identifiable information such as names, Social Security numbers, driver’s license numbers and birth dates. In addition to Nationwide paying million dollar identity fraud protection to policyholders, this breach is creating fears that class action lawsuits will follow. (more…)
In a May 2012 survey by the Ponemon Institute, 66 percent said they are not confident their organization would be able to detect the loss or theft of sensitive personal information contained in systems operated by third parties, including cloud providers. In addition, the majority are not confident that their organization would be able detect the loss or theft of sensitive personal information in their company’s production environment.
Which aspect of data security for your cloud solution is most important?
1. Is it to protect the data in copies of production/cloud applications used for test or training purposes? For example, do you need to secure data in your Salesforce.com Sandbox?
2. Is it to protect the data so that a user will see data based on her/his role, privileges, location and data privacy rules?
3. Is it to protect the data before it gets to the cloud?
As compliance continues to drive people to action, compliance with contractual agreements, especially for the cloud infrastructure continues to drive investment. In addition, many organizations are supporting Salesforce.com as well as packaged solutions such as Oracle eBusiness, Peoplesoft, SAP, and Siebel.
Of the available data protection solutions, tokenization has been used and is well known for supporting PCI data and preserving the format and width of a table column. But because many tokenization solutions today require creating database views or changing application source code, it has been difficult for organizations to support packaged applications that don’t allow these changes. In addition, databases and applications take a measurable performance hit to process tokens.
What might work better is to dynamically tokenize data before it gets to the cloud. So there would be a transparent layer between the cloud and on-premise data integration that would replace the sensitive data with tokens. In this way, additional code to the application would not be required.
In the Ponemon survey, most said the best control is to dynamically mask sensitive information based on the user’s privilege level. After dynamically masking sensitive data, people said encrypting all sensitive information contained in the record is the best option.
The strange thing is that people recognize there is a problem but are not spending accordingly. In the same survey from Ponemon, 69% of organizations find it difficult to restrict user access to sensitive information in IT and business environments. However, only 33% say they have adequate budgets to invest in the necessary solutions to reduce the insider threat.
Is this an opportunity for you?
Hear Larry Ponemon discuss the survey results in more detail during a CSOonline.com/Computerworld webinar, Data Privacy Challenges and Solutions: Research Findings with Ponemon Institute, on Wednesday, June 13.
Recently, Oracle announced that its latest April critical patch update does not address the TNS Poison vulnerability uncovered by a researcher 4 years ago. In addition to this vulnerability from an attacker, organizations face data breaches from internal negligence and insiders. In a May 2012 survey by the Ponemon Institute, 50% say sensitive data contained in databases and applications has been compromised or stolen by malicious insiders such as privileged users. On top of that 68% find it difficult to restrict user access to sensitive information in IT and business environments.
While databases offer basic security features that can be programmed and configured to protect data, it may not be enough and may not scale with your growing organizations. The problem stems from the fact that application development and DBA teams need to have a solid understanding of database vendor specific offerings in order to ensure that the security feature has been properly set up and deployed. If your organization has a number of different databases (Oracle, DB2, Microsoft SQL Server) and that number is growing, it can be costly to maintain all the database specific solutions. Many Informatica customers have faced this problem and looked to Informatica to provide a complete, end-to-end solution that addresses database security on an enterprise-wide level.
Come talk to us at Informatica World and hear from our customers about how they’ve used Informatica to minimize the risk of breaches across a number of use cases including:
- Test data management
- Production support in off-shore projects
- Dynamically protecting PII or PHI data for research portals
- Dynamically protecting data in cross-border applications
At Informatica, you can meet us in our sessions on Thursday, May 17, at the Aria in Las Vegas:
10:10 – 11:10 – Ensuring Data Privacy for Warehouses and Applications with Informatica Data Masking in Room Juniper 3
11:20 – 12:20 – Protecting Sensitive Data Using Informatica’s Test Data Management Solution in Room Starvine 12
Also come to the Informatica Data Privacy booth and lab for in depth demonstrations and presentations of our data privacy solutions and customer deployments.
Data breaches in healthcare have increased 32 percent in the past year and have cost the industry an estimated $6.5 billion annually according to the Ponemon Institute. Responsible for these breaches were largely employee handling of data and the increasing use of mobile devices. Forty-one percent of healthcare executive surveyed attributed data breaches related to protected health information (PHI) to employee mistakes. Half of the respondents said their organization does nothing to protect the information contained on mobile devices. “Healthcare data breaches are an epidemic,” said Dr. Larry Ponemon, chairman and founder, Ponemon Institute, in an announcement of the study results.
Why are healthcare data breaches becoming more common?
PHI data is in all production and test systems, as well as numerous copies that are created of production systems for test, training and application development purposes. In addition to these production systems, PHI data lives in servers inside and outside of the organization. As more mobile devices are used to access critical patient data, and doctors are using their mobile devices to address medical issues from all over the country (if not the world), more sensitive patient data is exposed. In addition to PHI data such as social security number, a lot of sensitive data that healthcare organizations have is contained in textual notes. So the textual data also needs to be protected. But patient data needs to be protected not only within the hospital or healthcare organization. As patient data is used for clinical trial and research purposes, it is important to protect the data that leaves the organization.
To address these concerns, Informatica has seen organizations move towards an end-to-end, enterprise wide data privacy solution that enables them to:
- Consistently define sensitive data and set data privacy policies
- Identify where sensitive data lives throughout the organization
- Create subsets of production data for testing purposes, greatly reducing costs of managing test data (reducing hardware and software)
- Mask data according to all required PHI rules
- Report / provide audit trail that data has been masked and data is secure
Maintaining many, individual privacy solutions can be both costly and risky. An enterprise wide solution centralizes data privacy management, streamlining development and ongoing maintenance.
For more information on healthcare privacy challenges and how to address them, please join us in our upcoming webinar.