Visualizing Hierarchies as Graphs in MDM

Today’s world is all about relationships! Whether on social media such as Facebook or on business-oriented networks such as LinkedIn, relationships play an extremely important role in bringing real value to our lives.

The traditional way of defining relationships in a database has always been the RDBMS. Strangely, although we call it a “Relational” DBMS, the name almost contradicts itself: finding relationships in it often requires complex queries!

Graph databases have recently emerged as a preferred way of defining relationships. Most social media sites have defined their own social graph that stores information about people, what they do, and how they relate to each other. Similarly, Google defined a web graph to capture how different web documents are connected to each other.

Today, big companies are implementing Master Data Management (MDM) to master customer data, product data, location data, etc., and to relate these domains so that the data makes more sense. However, this becomes difficult when the data is not strictly hierarchical, and most of the time master data is not. As complexity grows, an RDBMS can make even hierarchical data difficult to pull.

The challenges with MDM systems have been increasing lately, mainly because of the lack of support for hierarchical data relationships. Moreover, master data is rarely strictly hierarchical. MDM systems based on the RDBMS concept generally tend to follow a waterfall approach to data modeling, and business leaders typically understand little about the physical data model (though they may have a better understanding of the conceptual data model).

Relational databases are good at storing information, but data analysis with them is a challenge. Moreover, complex relational data models and hierarchical approaches do not align completely with the real world.

Many MDM systems came into existence when nobody imagined that companies would receive feedback over social channels. These days, many companies try to tap consumers’ sentiments and feedback about their products on Twitter or other social channels, e.g. what do consumers have to say about a particular drug? These MDM systems don’t have the capability to ingest log files, machine-level data, videos and other unstructured data. In fact, they struggle to make sense of it and to expose it in real time to external applications.

In a typical organizational hierarchy model, one employee reports to one other employee, who in turn reports to another employee, and so on. This kind of recursive relationship can be easily implemented in an RDBMS. However, it does not work as gracefully once there are changes in the organizational hierarchy, e.g. somebody gets promoted, somebody leaves the organization, or a merger happens.

In the real world, an employee does not report to only one manager. Most of the time the employee reports to multiple managers: he may work on a project under a project manager while his mentor in the organization is a different manager, and so on. These hierarchies are actually graphs.
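To make the difference concrete, here is a minimal sketch in plain Python of reporting relationships stored as typed edges rather than a single "manager" column. The employee names, the relationship types (`LINE_MANAGER`, `PROJECT_MANAGER`, `MENTOR`) and the helper functions are all hypothetical, chosen only for illustration; a real graph database would offer its own query language for the same idea.

```python
from collections import defaultdict

# Hypothetical employees and relationship types, for illustration only.
REPORTS = [
    ("asha", "LINE_MANAGER", "bob"),
    ("asha", "PROJECT_MANAGER", "carol"),
    ("asha", "MENTOR", "dave"),
    ("bob", "LINE_MANAGER", "erin"),
    ("carol", "LINE_MANAGER", "erin"),
    ("dave", "LINE_MANAGER", "erin"),
]


def build_graph(edges):
    """Index edges by source node: node -> list of (relationship, target)."""
    graph = defaultdict(list)
    for source, relationship, target in edges:
        graph[source].append((relationship, target))
    return graph


def managers_of(graph, employee, relationship=None):
    """Return direct managers, optionally filtered by relationship type."""
    return [
        target
        for rel, target in graph.get(employee, [])
        if relationship is None or rel == relationship
    ]


def all_superiors(graph, employee):
    """Follow every outgoing edge transitively; the seen-set guards against cycles."""
    seen = set()
    stack = [employee]
    while stack:
        current = stack.pop()
        for _, target in graph.get(current, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


graph = build_graph(REPORTS)
print(managers_of(graph, "asha"))            # every kind of manager
print(managers_of(graph, "asha", "MENTOR"))  # only the mentor
print(sorted(all_superiors(graph, "asha")))  # everyone above asha
```

Because each relationship is its own edge, a promotion or a change of project manager only adds or removes one edge; nothing else in the structure has to be rewritten.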

Relational databases provide good structure and accessibility for most data, but they also have limitations, which have given rise to a new class of databases that address specific needs for dealing with extremely large or complex data. Called “NoSQL” (or “not-only SQL”), they are designed to overcome specific data management challenges such as providing rapid data access to power real-time applications, bringing order to data in non-traditional formats, or avoiding the costs and turnaround time required to develop a conventional database schema. Five major classes of NoSQL databases have emerged: column families (also known as “wide-column stores” or “columnar databases”), document, graph, key-value and XML (also known as “native XML”).

Graph – These use a graph structure, essentially a diagram of the relationships within the data, in place of tables.

Graph database modeling is a very natural and intuitive way of modeling data, because it captures the “as-is” image of relationships as drawn on a piece of paper. As a result, a business analysts’ team may end up owning the model instead of the IT team.

In a graph database, unlike in an RDBMS, there is not much difference between the logical and the physical data model. Most of the time, whatever we draw on the whiteboard in terms of nodes, edges and relationships ends up being the physical data model.

Any MDM project needs close collaboration between the IT and business teams to achieve a common goal. A graph database can help bridge the gap between them.

Graph databases can even be used to identify potential duplicates in an MDM system, not just by matching data elements but also by inferring relationships between instances of master data.
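One simple way to illustrate relationship-based matching is to compare the sets of relationships two records participate in. The sketch below scores record pairs with Jaccard similarity over their edges; this is one possible technique picked for illustration, not a method prescribed by any MDM product. The record ids, relationship types, values and the 0.75 threshold are all hypothetical.

```python
from itertools import combinations

# Hypothetical master records: id -> set of (relationship, target) pairs.
RECORDS = {
    "doc-1": {
        ("WORKS_AT", "GMC"),
        ("GRADUATED_FROM", "University X"),
        ("SPECIALIZES_IN", "Cardiology"),
    },
    "doc-2": {
        ("WORKS_AT", "GMC"),
        ("GRADUATED_FROM", "University X"),
        ("SPECIALIZES_IN", "Cardiology"),
    },
    "doc-3": {
        ("WORKS_AT", "Columbia"),
        ("GRADUATED_FROM", "University X"),
        ("SPECIALIZES_IN", "Oncology"),
    },
}


def jaccard(a, b):
    """Share of relationships two records have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def duplicate_candidates(records, threshold=0.75):
    """Return (left, right, score) for pairs whose relationships overlap heavily."""
    pairs = []
    for left, right in combinations(sorted(records), 2):
        score = jaccard(records[left], records[right])
        if score >= threshold:
            pairs.append((left, right, score))
    return pairs


for left, right, score in duplicate_candidates(RECORDS):
    print(f"{left} and {right} look like duplicates (score {score:.2f})")
```

Here `doc-1` and `doc-2` share every relationship and are flagged, while `doc-3` shares only the university and is not. In practice such a score would be combined with attribute matching and reviewed by a data steward rather than used on its own.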

To make things simpler and to take a close look at how we can design a graph database, let us take an example from Life Sciences. Assume two doctors, Ram and Tim, work for a hospital, GMC, and have graduated from the same university. Each has his own specialization and works in a different capacity for the hospital. Tim also visits another hospital, Columbia, on emergency calls and has a role defined there too.

Business users and management leaders may be the ones who can define the relationships and use their functional knowledge to come up with a simple data model. Here is how it would look in a graph database:
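As a minimal sketch of that model in code (not the output of any particular graph database), the example can be written as nodes and typed edges in plain Python. The names Ram, Tim, GMC and Columbia come from the example; the node ids, relationship types, role values and helper functions are illustrative assumptions.

```python
# Nodes: id -> (label, properties).
NODES = {
    "ram": ("Doctor", {"name": "Ram"}),
    "tim": ("Doctor", {"name": "Tim"}),
    "gmc": ("Hospital", {"name": "GMC"}),
    "columbia": ("Hospital", {"name": "Columbia"}),
    "university": ("University", {"name": "University"}),
}

# Edges: (source, relationship, target, properties).
# The role values are placeholders; the example does not name the actual roles.
EDGES = [
    ("ram", "WORKS_AT", "gmc", {"role": "role A"}),
    ("tim", "WORKS_AT", "gmc", {"role": "role B"}),
    ("tim", "VISITS", "columbia", {"role": "emergency on-call"}),
    ("ram", "GRADUATED_FROM", "university", {}),
    ("tim", "GRADUATED_FROM", "university", {}),
]


def neighbors(source, relationship=None):
    """Targets reachable from a node, optionally filtered by relationship type."""
    return [
        target
        for s, rel, target, _ in EDGES
        if s == source and (relationship is None or rel == relationship)
    ]


def sources(target, relationship):
    """Nodes that point at a target through the given relationship type."""
    return [s for s, rel, t, _ in EDGES if t == target and rel == relationship]


def hospitals_of(doctor):
    """Every hospital a doctor is connected to, whatever the relationship."""
    return [t for t in neighbors(doctor) if NODES[t][0] == "Hospital"]


print(hospitals_of("tim"))                      # Tim's hospitals
print(sources("university", "GRADUATED_FROM"))  # doctors who share a university
```

Each question ("which hospitals is Tim attached to?", "who graduated from the same university?") is a one-step walk along edges, which is the property that keeps such a model readable for business users.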

Had it been a data model designed for an RDBMS, it would have been in the hands of professional RDBMS DBAs. As a result, a few relational tables with foreign keys to the primary tables of Doctor, Hospital and University would have been created. Needless to say, complex queries would have become another challenge within a few months!

To ensure a successful MDM project that takes care of complex and growing relationships, one must take the following points into consideration:

  • Keep the way you store and retrieve data faithful to the real-world scenarios it represents, so that the data remains dependable and trustworthy
  • Define your complex relationship challenges and adopt new-age ideas to address them
  • Make way for new relationships inside your data

Most of these are addressed by a graph database. Are you ready to embark on this journey of implementing graphs in the MDM world?

Comments

  • Tim Duffy

    Amol,
    Great job. This is an excellent discussion about how Graph technology is becoming a disruptor to the old way of doing things and why it is better