David Loshin

David Loshin, president of Knowledge Integrity, Inc. (www.knowledge-integrity.com), is a recognized thought leader and expert consultant in the areas of data quality, master data management, and business intelligence. David is a prolific author on business intelligence best practices, having written numerous books and papers on data management, including the just-published “Practitioner’s Guide to Data Quality Improvement,” with additional content provided at www.dataqualitybook.com. David is a frequent invited speaker at conferences, web seminars, and sponsored web sites and channels including www.b-eye-network.com. His best-selling book, “Master Data Management,” has been endorsed by data management industry leaders, and his valuable MDM insights can be reviewed at www.mdmbook.com. David can be reached at loshin@knowledge-integrity.com.

Data Prefetching and Caching

Treating Big Data Performance Woes with the Data Replication Cure Blog Series – Part 3

OK, so in our last two discussions, we looked at the memory bottleneck and how, even in high-performance environments, there will still be situations in which streaming the data to where it needs to be introduces latencies that throttle processing speed. I also noted that we needed some additional strategies to address that potential bottleneck, so here goes: (more…)

Posted in Big Data

Some Thoughts about Data Proximity for Big Data Calculations – Part 2

Treating Big Data Performance Woes with the Data Replication Cure Blog Series – Part 2 

In my last posting, I suggested that the primary bottleneck for performance computing of any type, including big data applications, is the latency associated with getting data from where it is to where it needs to be. If the presumptive big data analytics platform/programming model is Hadoop (which is also often presumed to provide in-memory analytics), though, there are three key issues: (more…)

Posted in Big Data

The Pain is the Performance (or is it the Lack Thereof?)

Treating Big Data Performance Woes with the Data Replication Cure Blog Series – Part 1

“Big Data” is all the rage – it is virtually impossible to check out any information management media channel, online resource, or community of interest without having your eyeballs bathed in articles touting the benefits and inevitability of what has come to be known as big data. I have watched this transformation over the past few years as data warehousing and business analytics appliances have entered the mainstream. Pure and simple: what was the bleeding edge of technology twenty years ago in performance computing is now commonplace, with Hadoop being the primary platform (or more accurately, programming environment) for developing big data analytics applications. (more…)

Posted in Big Data, Data Warehousing

Simplistic Approaches to Data Federation Solve (Only) Part of the Puzzle – We Need Data Virtualization (Part 2 of 3)

In my last post, I introduced the concept of data federation, which for now I would like to differentiate from data virtualization – a term that I’ll bring into focus in a bit. But first, we explored two issues: data accessibility and data latency. In recent years, data accessibility services have matured greatly, to the point where one can somewhat abstract those accessibility services from the downstream consumer (or “reuser”) of data. (more…)

Posted in SOA, Ultra Messaging

Fundamental Challenges in Data Reusability and Repurposing (Part 1 of 3)

Apparently, everyone’s favorite words these days are “big data.” But just because some new tools and techniques promise the potential of absorbing and analyzing huge amounts of data from a variety of sources, it does not mean that installing Hadoop in your enterprise will automatically help you get new insights faster from existing data and “big data.” (more…)

Posted in Big Data, Data Services, SOA

Considerations for Multi-Domain Master Data Modeling – Part 4

OK, so which master data modeling approach is better? Pre-packaged or business model-driven? My personal opinion is to look at which approach best addresses my hot-button issues (inconsistent reference data, inconsistencies in structure, inconsistencies in meanings). (more…)

Posted in Master Data Management

Master Data Consolidation Versus Master Data Sharing: Modeling Matters! – Part 3

In my last post, Master Data Modelers, I alluded to a fundamental issue with the way that some organizations drive their master data management project plan, and noted that this issue influences the modeling approach that is taken. It centers on what should be a very simple question: is MDM about data consolidation or data sharing? (more…)

Posted in Master Data Management, Uncategorized

Master Data Model Alternatives – Part 2

Last time I introduced two different approaches for master data models and thought it would be worth examining the differences in greater detail.

The first approach is to use pre-packaged core models provided by a vendor as part of an overall MDM suite of tools. Often these types of products evolved out of industry applications in which a common information model was used to support specific types of enterprise applications. For example, a vendor might have analyzed the property and casualty insurance industry and developed core data models for customer, policy, claim, service, financial products, etc. A set of application layers may have been developed on top of these models to implement common workflows (customer risk rating for establishing premium rates, or initiating a claim). However, there is a perception that aspects of those industry-oriented models can be segregated into a more universal format, which can become the starting point for a prepackaged master domain. (more…)

Posted in Master Data Management

Structure, Semantics and Master Data Models – Part 1

Looking back at some of my Informatica Perspectives posts over the past year or so, I reflected on some common themes about data management and data governance, especially in the context of master data management and particularly, master data models. As both the tools and the practices around MDM mature, we have seen some disillusionment in attempts to deploy an MDM solution, with our customers noting that they continue to hit bumps in the road in the technical implementation associated with both master data consolidation and then with publication of shared master data.

Almost every issue we see can be characterized into one of three buckets: (more…)

Posted in Master Data Management