Tag Archives: security
Columnar Deduplication and Column Tokenization: Improving Database Performance, Security and Interoperability
For some time now, a number of commercially available relational database management systems have implemented a special technique called columnar deduplication. In today’s blog post, I discuss the nature and benefits of this technique, which I will refer to as column tokenization, for reasons that will become evident.
Column tokenization is a process in which a unique identifier (called a Token ID) is assigned to each unique value in a column and then used to represent that value wherever it appears in the column. Using this approach, data size reductions of up to 50% can be achieved, depending on the number of unique values in the column (that is, on the column’s cardinality). Some RDBMSs use this technique simply as a way of compressing data: the column tokenization process is integrated into the buffer and I/O subsystems, and when a query is executed, each row must be materialized and the Token IDs replaced by their corresponding values. At Informatica, in the File Archive Service (FAS), part of the Information Lifecycle Management product family, column tokenization is the core of our technology: the tokenized structure is used directly during query execution, with row materialization occurring only when the final result set is returned. We also use special compression algorithms to achieve further size reduction, typically on the order of 95%.
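To make the idea concrete, here is a minimal sketch of column tokenization (also known as dictionary encoding) in plain Python. The function and variable names are illustrative only and do not reflect Informatica’s actual implementation; the point is simply that each unique value gets a Token ID, the column stores only IDs, and materialization swaps IDs back for values when the result set is returned.

```python
def tokenize_column(values):
    """Return (dictionary, token_ids) for one column.

    The dictionary maps each unique value to a Token ID;
    the column itself is stored as the list of Token IDs.
    """
    dictionary = {}            # value -> Token ID
    token_ids = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)   # assign the next free ID
        token_ids.append(dictionary[v])
    return dictionary, token_ids


def materialize(dictionary, token_ids):
    """Replace Token IDs with their values, as done for the final result set."""
    reverse = {tid: v for v, tid in dictionary.items()}
    return [reverse[tid] for tid in token_ids]


cities = ["Paris", "London", "Paris", "Paris", "London"]
d, ids = tokenize_column(cities)
# Low cardinality pays off: 5 stored values collapse to 2 dictionary entries.
assert materialize(d, ids) == cities
```

The space saving comes from the cardinality gap: a column with millions of rows but only a few thousand distinct values stores each value once in the dictionary and small integer IDs everywhere else.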
Informatica recently hosted a webinar on Enterprise Data Archiving Best Practices with guest speakers Tony Baer from Ovum and Murali Rathnam from Symantec IT. With over 600 registrations, I would say that enterprise data archiving is not just hot, it is white hot. At least for Informatica. With Big Data entering the data center, organizations are looking for ways to make room, either in the budget or in the data center itself. Archiving is a proven approach that achieves both. Given the complexities and interconnections of enterprise applications, Enterprise Data Archive solutions based on market-leading technologies such as Informatica Data Archive can deliver on the value proposition while meeting tough requirements. (more…)
In this video, Rob Karel, vice president of product strategy, Informatica, outlines the Informatica Data Governance Framework, highlighting the 10 facets that organizations need to focus on for an effective data governance initiative:
- Vision and Business Case to deliver business value
- Tools and Architecture to support architectural scope of data governance
- Policies that make up the data governance function (security, archiving, etc.)
- Measurement: measuring the level of influence of a data governance initiative and measuring its effectiveness (business value metrics, ROI metrics, such as increasing revenue, improving operational efficiency, reducing risk, reducing cost or improving customer satisfaction)
- Change Management: incentives for the workforce, partners and customers to get better-quality data in, and potential repercussions if data quality is poor
- Organizational Alignment: how the organization will work together across silos
- Dependent Processes: identifying data lifecycles (capturing, reporting, purchasing and updating data in your environment), all processes that consume the data, and the processes that store and manage it
- Program Management: effective program management skills to build out communication strategy, measurement strategy and a focal point to escalate issues to senior management when necessary
- Defined Processes that make up the data governance function (discovery, definition, application, and measuring and monitoring)
For more information from Rob Karel on the Informatica Data Governance Framework, visit his Perspectives blogs.
This week the EMC World 2012 conference is taking place in Las Vegas. Informatica is participating as a partner, continuing its commitment to the EMC Select Partnership for the Informatica ILM and MDM solutions. Informatica has continued to expand the partnership to include support for EMC’s Greenplum Hadoop distribution, mostly to support organizations’ needs for big data integration while making big data manageable and secure. (more…)
As a routine matter of delivering care, billing for services and operating their hospitals and physician practices, healthcare providers deal with patients’ protected health information all day, every day. Dealing with the data becomes routine, and it’s easy for sometimes-onerous security and privacy policies and procedures to be overlooked. While we’d all like that not to be the case, delivering healthcare (and getting paid for it) is a hugely complex undertaking, and focusing exclusively on human processes and calling for constant vigilance and attention to detail can only go so far. (more…)
Security is a work-in-progress for the Apache Hadoop project and sub-projects, as I discuss as part of an O’Reilly Hadoop tutorial, “Get started with Hadoop: from evaluation to your first production cluster”. Below are several of the security tips and best practices that I discuss in that article. (more…)
Enterprises use Hadoop in data-science applications that improve operational efficiency, grow revenues or reduce risk. Many of these data-intensive applications use Hadoop for log analysis, data mining, machine learning or image processing.
Commercial, open source or internally developed data-science applications have to tackle a lot of semi-structured, unstructured or raw data. They benefit from Hadoop’s combination of storage and processing in each data node spread across a cluster of cost-effective commodity hardware. Hadoop’s lack of a fixed schema works particularly well for answering ad-hoc queries and exploratory “what if” scenarios.
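The schema-on-read approach described above can be sketched in a few lines of plain Python (standing in for a Hadoop-style job; the record layout and field names here are invented for illustration). Raw lines are stored as-is, and structure is imposed only at query time, so dirty or unexpected records don’t have to be rejected up front as they would be by a fixed relational schema.

```python
import json

# Raw, semi-structured input stored exactly as it arrived -- no schema declared.
raw_records = [
    '{"user": "alice", "action": "login"}',
    '{"user": "bob", "action": "purchase", "amount": 42.5}',
    'corrupt line that a fixed schema would have rejected at load time',
]


def query(lines, predicate):
    """Parse each line at read time; silently skip records that don't parse."""
    for line in lines:
        try:
            rec = json.loads(line)
        except ValueError:
            continue                      # tolerate raw or dirty data
        if predicate(rec):
            yield rec


# An ad-hoc "what if" query, decided long after the data was written:
purchases = list(query(raw_records, lambda r: r.get("action") == "purchase"))
assert purchases[0]["amount"] == 42.5
```

Because the predicate and the fields it inspects are chosen at query time, a new exploratory question needs no schema migration, only a new pass over the raw data.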
On Friday June 10th USA Today’s front page had an article about companies being required to report data breaches entitled: Citigroup latest to report data breach.
This came on the heels of Citigroup’s acknowledgement, a day earlier, of a major data breach of customer account information. Thus far in 2011, there have been 251 reported data breaches, on track to meet or exceed last year’s total of 597. The article went on to reference a recent survey by Symantec and the Ponemon Institute, which covered 51 data breaches and found that each data breach costs an average of $7.2 million, with costs continuing to climb. (more…)
The Informatica World pre-conference sessions kicked off on Monday at the Gaylord National Harbor, Washington DC. One of the four sessions was “Leveraging the Flexibility of Informatica MDM – An Architecture Deep Dive,” conducted by Dmitri Korablev, VP of MDM Strategy; Ron Matusof, VP of MDM Solution Architecture; and Steve Hoskin, MDM Chief Architect. The key objective was to explain the advanced architectural concepts of Informatica MDM relating to security, high performance, high availability, concurrency, and integration.
Dmitri started off by quoting Albert Einstein’s “make everything as simple as possible, but not simpler” as the guiding principle that drove the design of Informatica MDM. He and Ron presented a six-step process for designing an MDM solution: defining usage scenarios, selecting solution options, evaluating consumption patterns, defining the data model, defining the solution architecture, and applying non-functional requirements to the solution. Deeper conversations in each of the steps emphasized the guiding principle: keep the MDM solution design simple! (more…)