David Loshin

David Loshin, president of Knowledge Integrity, Inc. (www.knowledge-integrity.com), is a recognized thought leader and expert consultant in the areas of data quality, master data management, and business intelligence. David is a prolific author on business intelligence best practices, having written numerous books and papers on data management, including the just-published “Practitioner’s Guide to Data Quality Improvement,” with additional content provided at www.dataqualitybook.com. David is a frequent invited speaker at conferences, web seminars, and sponsored web sites and channels, including www.b-eye-network.com. His best-selling book, “Master Data Management,” has been endorsed by data management industry leaders, and his valuable MDM insights can be reviewed at www.mdmbook.com. David can be reached at loshin@knowledge-integrity.com.

Aligning Big Data with MDM

In my recent blog posts, I have looked at ways that master data management can become an integral component of the enterprise architecture, and I would be remiss if I did not look at how MDM dovetails with an emerging data management imperative: big data and big data analytics. Fortunately, identity resolution and MDM have the potential both to contribute to performance improvement and to enable efficient entity extraction and recognition. (more…)
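To make the identity resolution idea concrete, here is a minimal sketch of matching incoming records against a master index using a normalized match key. The attribute names, the normalization rule, and the master id format are all illustrative assumptions, not taken from any particular MDM product:

```python
# Minimal identity-resolution sketch: match incoming records against a
# master index via a normalized composite key. Real MDM matching uses
# far richer rules (phonetic matching, scoring, survivorship).

def normalize(name, email):
    """Build a simple match key from cleaned-up attributes."""
    return (name.strip().lower(), email.strip().lower())

# Master index: match key -> master entity identifier (illustrative).
master = {
    normalize("David Loshin", "loshin@knowledge-integrity.com"): "MDM-001",
}

def resolve(name, email):
    """Return the master entity id if the record matches, else None."""
    return master.get(normalize(name, email))

print(resolve(" DAVID LOSHIN ", "loshin@knowledge-integrity.com"))  # MDM-001
print(resolve("D. Loshin", "other@example.com"))  # None
```

The point of the sketch is that once entities in a big data set resolve to master identifiers, downstream entity extraction and recognition can key off a single consistent id rather than raw attribute values.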

Posted in Big Data, Data Integration, Master Data Management

Rationalizing Master Data as a Utility

One of the biggest issues I have with most MDM implementations is that assessing data consumer requirements is sacrificed in deference to data consolidation. In general, my complaint is that creating a master data repository does not guarantee any creation or improvement of value to the organization unless there are clearly defined ways in which the master data sets are to be used. (more…)

Posted in Data Integration, Master Data Management

Master Data and Integration – Data, Function, and Process

One theme I briefly touched on in my previous post was the desire for consistency and synchronization of shared master data for the community of consuming business processes and applications. However, in our experience, from a business process perspective, synchronization goes beyond timeliness or currency of the data associated with any particular data domain. Yet in retrospect, one of the most common approaches to designing and deploying master data management projects has focused on implementing a single data domain at a time, concentrating on the consolidation (or as I often term it, “dump”) of data into the master repository. (more…)

Posted in Data Integration, Master Data Management

Cloud Computing and Master Data Management

In my conversations with different clients and prospects, I have detected both desire and apprehension when the topic of master data management in the cloud is raised. There are two competing pressures. The first is the steady migration of internal capability to cloud-based solutions, in which business process application functions are enabled via off-premise infrastructure. These capabilities are effectively activated through the migration of internal data to the cloud environment. The second is the concern about protection of sensitive data once it has been moved outside the organization’s administrative domain. (more…)

Posted in Cloud Computing, Master Data Management

Data Replication is the Cure

Treating Big Data Performance Woes with the Data Replication Cure Blog Series – Part 4

To bring the discussion full circle, though, our original focus was on big data and the fact that even with an implementation of an analytical application (especially one developed using Hadoop), there are still scenarios in which data access latency proves to be a performance bottleneck. Luckily, the same data replication principles we discussed last week can be put to use in supporting big data analytics. (more…)
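As a rough illustration of the replication idea, here is a minimal sketch of read routing that prefers a local replicated copy over a remote primary to avoid data access latency. The key names and data are invented for the example, and the synchronization of the replica is assumed to happen elsewhere:

```python
# Sketch: serve reads from a local replica when possible, falling back
# to the (notionally remote, higher-latency) primary copy.

primary = {"cust:42": {"name": "Acme"}}   # authoritative remote store
local_replica = dict(primary)             # replicated copy, kept in sync elsewhere

def read(key):
    """Answer the read locally when the replica has the key."""
    if key in local_replica:
        return local_replica[key]
    return primary.get(key)  # fallback: pay the remote-access cost

print(read("cust:42"))  # {'name': 'Acme'}
```

For a big data analytics workload, the same principle means replicating hot reference data next to the compute nodes so that the analysis is not throttled by repeated remote fetches.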

Posted in Big Data

Data Prefetching and Caching

Treating Big Data Performance Woes with the Data Replication Cure Blog Series – Part 3

OK, so in our last two discussions, we looked at the memory bottleneck and how, even in high-performance environments, there are still going to be situations in which streaming the data to where it needs to be will introduce latencies that throttle processing speed. And I also noted that we needed some additional strategies to address that potential bottleneck, so here goes: (more…)
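One such strategy can be sketched in a few lines: cache what has already been fetched, and prefetch the blocks likely to be requested next so the latency is paid before the consumer asks. The block-id scheme and the lookahead of two are illustrative assumptions; `fetch_block` stands in for a slow remote read:

```python
# Sketch of caching plus simple sequential prefetching to hide data
# access latency. fetch_block() stands in for a slow network/disk read.
from functools import lru_cache

@lru_cache(maxsize=128)
def fetch_block(block_id):
    # In a real system this would be a remote read; here it is simulated.
    return f"data-{block_id}".encode()

def read_with_prefetch(block_id, lookahead=2):
    """Return the requested block, then warm the cache for the next ones."""
    result = fetch_block(block_id)
    for nxt in range(block_id + 1, block_id + 1 + lookahead):
        fetch_block(nxt)  # prefetch into the LRU cache
    return result

print(read_with_prefetch(0))  # b'data-0'; blocks 1 and 2 are now cached
```

With sequential access patterns, subsequent reads of blocks 1 and 2 are satisfied from the cache rather than incurring the fetch latency again.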

Posted in Big Data

Some Thoughts about Data Proximity for Big Data Calculations – Part 2

Treating Big Data Performance Woes with the Data Replication Cure Blog Series – Part 2

In my last posting, I suggested that the primary bottleneck for performance computing of any type, including big data applications, is the latency associated with getting data from where it is to where it needs to be. If the presumptive big data analytics platform/programming model is Hadoop (which is also often presumed to provide in-memory analytics), though, there are three key issues: (more…)

Posted in Big Data

The Pain is the Performance (or is it the Lack Thereof?)

Treating Big Data Performance Woes with the Data Replication Cure Blog Series – Part 1

“Big Data” is all the rage – it is virtually impossible to check out any information management media channel, online resource, or community of interest without having your eyeballs bathed in articles touting the benefits and inevitability of what has come to be known as big data. I have watched this transformation over the past few years as data warehousing and business analytics appliances have entered the mainstream. Pure and simple: what was the bleeding edge of technology twenty years ago in performance computing is now commonplace, with Hadoop being the primary platform (or more accurately, programming environment) for developing big data analytics applications. (more…)

Posted in Big Data, Data Warehousing

Simplistic Approaches to Data Federation Solve (Only) Part of the Puzzle – We Need Data Virtualization (Part 2 of 3)

In my last post, I introduced the concept of data federation, which for now I would like to differentiate from data virtualization – a term that I’ll bring into focus in a bit. But first, we explored two issues: data accessibility and data latency. In recent years, data accessibility services have matured greatly in sophistication, to the point where one can somewhat abstract those accessibility services from the downstream consumer (or “reuser”) of data. (more…)
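The kind of abstraction this refers to can be sketched very simply: a single lookup interface in front of several heterogeneous sources, so the consumer never needs to know where a record actually lives. The source names and records below are invented for the example:

```python
# Sketch of a federated access layer: one query interface over multiple
# sources, hiding from the consumer which source actually answered.

sources = {
    "crm": {"42": {"name": "Acme", "source": "crm"}},
    "erp": {"99": {"name": "Globex", "source": "erp"}},
}

def lookup(customer_id):
    """Consult each source in turn; the caller sees one uniform interface."""
    for data in sources.values():
        if customer_id in data:
            return data[customer_id]
    return None

print(lookup("99"))  # {'name': 'Globex', 'source': 'erp'}
```

This is federation in its simplest form; the gap it leaves open, which the post goes on to address, is the latency and semantics question that motivates full data virtualization.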

Posted in SOA, Ultra Messaging