Category Archives: Data Integration
Join us this year at Informatica World!
We have a great lineup of speakers and events to help you become a data-driven healthcare organization. I’ve provided a few highlights below:
Participate in the Informatica World Keynote sessions with Sohaib Abbasi and Rick Smolan, who wrote “The Human Face of Big Data.” Learn more via this quick YouTube video: http://www.youtube.com/watch?v=7K5d9ArRLJE&feature=player_embedded
With more than 100 interactive and in-depth breakout sessions spanning 6 different tracks (Platform & Products, Architecture, Best Practices, Big Data, Hybrid IT and Tech Talk), Informatica World is an excellent way to ensure you are getting the most from your Informatica investment. Learn best practices from organizations that are realizing the potential of their data, such as Ochsner Health, Sutter Health, UMass Memorial, Qualcomm and PayPal.
Finally, we want you to balance work with a little play. We invite you to network with industry peers at our Healthcare Cocktail Reception on the evening of Wednesday, June 5th and again during our Data Driven Healthcare Breakfast Roundtable on Thursday, June 6th.
See you there!
In my recent blog posts, we have looked at ways that master data management can become an integral component of the enterprise architecture, and I would be remiss if I did not look at how MDM dovetails with an emerging data management imperative: big data and big data analytics. Fortunately, identity resolution and MDM have the potential both to contribute to performance improvement and to enable efficient entity extraction and recognition. (more…)
In my previous blog, I explained how Column-oriented Database Management Systems (CDBMS), also known as columnar databases or CBAT, offer a distinct advantage over the traditional row-oriented RDBMS in terms of I/O workload, deriving primarily from basing the granularity of I/O operations on the column rather than the entire row. This technological advantage has a direct impact on the complexity of data modeling tasks and on the end-user’s experience of the data warehouse, and this is what I will discuss in today’s post. (more…)
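To make the I/O argument concrete, here is a back-of-the-envelope sketch in Python. It is only an illustration under simplifying assumptions: the table, its column names and byte widths, and the two helper functions are hypothetical, and the model ignores compression, page layout, caching and indexes. It simply compares how many bytes a full scan must read when the unit of I/O is the whole row versus only the columns a query references.

```python
from typing import Dict, List

def bytes_read_row_store(num_rows: int, column_widths: Dict[str, int]) -> int:
    """A row-oriented scan fetches every column of every row."""
    return num_rows * sum(column_widths.values())

def bytes_read_column_store(num_rows: int, column_widths: Dict[str, int],
                            needed: List[str]) -> int:
    """A column-oriented scan fetches only the columns the query references."""
    return num_rows * sum(column_widths[name] for name in needed)

# Hypothetical table: column name -> fixed width in bytes (assumed values).
widths = {"order_id": 8, "customer_id": 8, "order_date": 4, "amount": 8, "notes": 100}
rows = 1_000_000

# A query such as "total amount per order_date" touches only two columns.
needed = ["order_date", "amount"]

row_bytes = bytes_read_row_store(rows, widths)             # 128 bytes per row
col_bytes = bytes_read_column_store(rows, widths, needed)  # 12 bytes per row

print(f"row store:    {row_bytes:,} bytes")
print(f"column store: {col_bytes:,} bytes")
```

In this toy model the wide "notes" column dominates the row-oriented scan even though the query never uses it, which is the effect described above.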
We’ve been spending a lot of time here at Informatica preparing for Informatica World. That means taking a big step back to take the broader view of all the change happening in the world of information management and data integration today. New data sources and new data technologies are emerging almost daily, and the pace is only accelerating. Our mission is to help our customers and our market not only cope with all this change, but to harness it for competitive advantage.
But even as we’re putting together the latest take on the big picture, we’re also zooming in on the technology “secret sauce” that makes it possible to manage all this change. Informatica has the “secret sauce”: it’s what makes our architecture unique, and it’s what allows us to deliver the most value to our customers.
I’m not going to tell you what the “secret sauce” is now. You have to come to Informatica World to find out. Our executives, including Sohaib Abbasi, Ivan Chong and James Markarian, will be laying out the big picture, as well as revealing the “secret sauce.” And I’ll be diving into more detail in my Informatica Platform overview breakout session.
I hope to see you in Vegas next month. (By the way, the special hotel rate ends this Friday, May 3rd, so register today!)
According to Doug Henschen, Executive Editor at InformationWeek, “Despite the weak economy and zero growth in many IT salary categories, business intelligence (BI), analytics, information-integration and data warehousing professionals are seeing a slow-but-steady rise in income.” (more…)
One of the biggest issues I have with most MDM implementations is that the assessment of data consumer requirements is sacrificed in favor of data consolidation. In general, my complaint is that creating a master data repository does not guarantee any creation or improvement of value for the organization unless there are clearly defined ways in which the master data sets are to be used. (more…)
Column-oriented Database Management Systems (CDBMS), also referred to as columnar databases and CBAT, have been getting a lot of attention recently in the data warehouse marketplace and trade press. Interestingly, some of the newer companies offering CDBMS-based products give the impression that this is an entirely new development in the RDBMS arena. This technology has actually been around for quite a while. But the market has only recently started to recognize the many benefits of CDBMS. So, why is CDBMS now coming to be recognized as the technology that offers the best support for very large, complex data warehouses intended to support ad hoc analytics? In my opinion, one of the fundamental reasons is the reduction in I/O workload that it enables. (more…)
On a recent trip to a new city, someone said that the easiest way from the airport to the hotel was to use the Metro. I could speak the language, but reading it was another matter. I was surprised by how quickly I navigated to the hotel by following the Metro map. The Metro map is based on the successful design of the London Underground map.
Harry Beck was not a cartographer. He was an engineering draftsman. He started drawing a different type of map in his spare time. Beck believed that passengers were not concerned with how accurately the map showed distances. He reduced the map to straight lines and sharp angles, which produced something closer to an electrical schematic diagram than to a conventional geographic map. The company that ran the London Underground was skeptical of Beck’s map, since it was radically different and they had not commissioned the project. (more…)
Columnar Deduplication and Column Tokenization: Improving Database Performance, Security and Interoperability
For some time now, a special technique called columnar deduplication has been implemented by a number of commercially available relational database management systems. In today’s blog post, I discuss the nature and benefits of this technique, which I will refer to as column tokenization for reasons that will become evident.
Column tokenization is a process in which a unique identifier (called a token ID) is assigned to each unique value in a column and then used to represent that value wherever it appears in the column. Using this approach, data size reductions of up to 50% can be achieved, depending on the number of unique values in the column (that is, on the column’s cardinality). Some RDBMSs use this technique simply as a way of compressing data: the column tokenization process is integrated into the buffer and I/O subsystems, and when a query is executed, each row must be materialized and its token IDs replaced by their corresponding values. In Informatica’s File Archive Service (FAS), part of the Information Lifecycle Management product family, column tokenization is the core of our technology: the tokenized structure is actually used during query execution, with row materialization occurring only when the final result set is returned. We also use special compression algorithms to achieve further size reduction, typically on the order of 95%.
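The idea can be sketched in a few lines of Python. This is a minimal illustration of the general technique (often called dictionary encoding), not a description of how FAS or any particular RDBMS implements it; the function names and the sample column are hypothetical. It shows a column being tokenized, a filter evaluated directly on the token IDs, and values materialized only for the rows in the final result.

```python
from typing import Dict, List, Tuple

def tokenize_column(values: List[str]) -> Tuple[List[str], List[int]]:
    """Assign a token ID to each unique value, in order of first appearance.

    Returns (dictionary, token_ids), where dictionary[token_id] is the value.
    """
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    token_ids: List[int] = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        token_ids.append(index[value])
    return dictionary, token_ids

def filter_equals(dictionary: List[str], token_ids: List[int], wanted: str) -> List[int]:
    """Return row positions where the column equals `wanted`.

    The value is looked up once in the dictionary; after that, only
    integer token IDs are compared, and no row is materialized.
    """
    if wanted not in dictionary:
        return []
    target = dictionary.index(wanted)
    return [row for row, token in enumerate(token_ids) if token == target]

def materialize(dictionary: List[str], token_ids: List[int], rows: List[int]) -> List[str]:
    """Replace token IDs with their values, only for the selected rows."""
    return [dictionary[token_ids[row]] for row in rows]

# Hypothetical low-cardinality column: six rows, three unique values.
city = ["Paris", "London", "Paris", "Rome", "London", "Paris"]
city_dict, city_tokens = tokenize_column(city)

matching_rows = filter_equals(city_dict, city_tokens, "Paris")
result = materialize(city_dict, city_tokens, matching_rows)

print(city_dict)
print(city_tokens)
print(matching_rows)
print(result)
```

The lower the column’s cardinality, the smaller the dictionary is relative to the column, which is why the achievable size reduction depends on the number of unique values.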
One theme I briefly touched on in my previous post was the desire for consistency and synchronization of shared master data for the community of consuming business processes and applications. However, in our experience, from a business process perspective, synchronization goes beyond timeliness or currency of the data associated with any particular data domain. Yet in retrospect, one of the most common approaches to designing and deploying master data management projects has focused on implementing a single data domain at a time, concentrating on the consolidation (or as I often term it, “dump”) of data into the master repository. (more…)