Big Data is Adding a New Dimension to Customer Relationships

Matching Customer Data on Hadoop

Customer data has started to explode across every major line of business. Whether you look at insurance companies or airlines, each is adding terabytes of data with every passing day. The challenge has become maintaining that data, let alone taking advantage of it. Typically, many third-party and non-traditional sources feed data into the client's hub. Adding more data, however, does not necessarily add dollars to your profit. In fact, it is quite the opposite: the more data you accumulate, the greater your responsibility to understand it and derive the desired value from it. Yes, even when it arrives in terabytes per day.

Imagine that your customer data is scattered across silos with no way to match it. At the end of each month or quarter, it becomes a nightmare to query each system and build a dashboard from the results. Needless to say, data quality will be low and redundancy high, and the resulting hit to accuracy directly weakens the customer's decision-making power.

Big Data Relationship Management

One possible solution is to land the data on HDFS clusters and match the customer records into match groups. Informatica Big Data Relationship Management (BDRM) provides a strong solution for matching on Hadoop. Here are a few things BDRM has to offer:

  • Creates clusters of related records on Hadoop, using multiple match criteria to identify groups
  • Matches data from third-party and non-traditional sources
  • Rapidly adds new data sources to augment existing data
  • Supports real-time identity search
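
To illustrate the first point, here is a minimal sketch of how pairwise matches can be chained into clusters using a union-find structure. The record IDs and match pairs are illustrative only; BDRM's actual clustering on Hadoop is far more sophisticated.

```python
# Minimal union-find sketch: chain pairwise matches into match groups.
# Record IDs and matched pairs are illustrative, not BDRM's algorithm.

def find(parent, x):
    # Walk up to the root representative of x's group, compressing the path.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def cluster(ids, matched_pairs):
    # Start with every record in its own group, then merge each matched pair.
    parent = {i: i for i in ids}
    for a, b in matched_pairs:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
    # Collect the final groups keyed by their root representative.
    groups = {}
    for i in ids:
        groups.setdefault(find(parent, i), []).append(i)
    return list(groups.values())

# Records 1, 2 and 3 match transitively; record 4 stands alone.
print(cluster([1, 2, 3, 4], [(1, 2), (2, 3)]))
```

Note that union-find gives transitive closure: 1 and 3 end up in the same group even though they were never directly compared, which is exactly why match-group quality depends on the quality of the pairwise criteria.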

BDRM uses the powerful SSA-NAME3 match engine, which runs distributed matching and linking in parallel across multiple Hadoop nodes.
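
SSA-NAME3's internals are proprietary, but the general key-based matching pattern it represents can be sketched: generate a fuzzy key per record, block records that share a key, and score only those candidate pairs. The key function and 0.8 threshold below are stand-ins, not SSA-NAME3's actual logic.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def match_key(name):
    # Crude stand-in for a fuzzy key: first 3 letters of the surname.
    return name.split()[-1].upper()[:3]

def candidate_pairs(records):
    # Block records by key so only plausible pairs are compared,
    # then score each pair with a simple string similarity.
    blocks = defaultdict(list)
    for rec_id, name in records:
        blocks[match_key(name)].append((rec_id, name))
    pairs = []
    for block in blocks.values():
        for i in range(len(block)):
            for j in range(i + 1, len(block)):
                (a, na), (b, nb) = block[i], block[j]
                score = SequenceMatcher(None, na.lower(), nb.lower()).ratio()
                if score > 0.8:
                    pairs.append((a, b))
    return pairs

records = [(1, "Jon Smith"), (2, "John Smith"), (3, "Mary Jones")]
print(candidate_pairs(records))
```

Blocking is what makes matching at this scale tractable: instead of comparing every record against every other record, each record is only compared within its own block.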

Use Cases

In principle, any customer who needs to match large volumes of customer data can use BDRM. A cloud-based approach may take a few weeks (or a month) to match millions of records; BDRM, by contrast, can match hundreds of millions of customer records in a few hours. In fact, during one POC, BDRM matched more than 700 million records in around 18 hours. Now, that could be a life saver!

Indeed, there are scenarios where a customer asks the systems integrator (SI) to onboard millions of records over a weekend, because they are not willing to sacrifice profits by bringing the system down during the week. Fair enough. And believe me, it is difficult for any matching tool to match records on HDFS clusters that fast. As for the MapReduce jobs themselves, it is extremely difficult to find resources who can write that code and keep maintaining it. Guess what: BDRM does that for you. It generates the MapReduce jobs and loads the indexed and linked data into a repository for further analysis and visualization.
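
For a sense of what hand-written matching code involves, here is a toy Hadoop Streaming-style mapper and reducer in Python. The tab-separated field layout and the surname-prefix blocking key are assumptions for illustration; BDRM generates the real jobs for you.

```python
def mapper(lines):
    # Emit "blocking_key \t id \t name" lines; Hadoop Streaming sorts
    # the mapper output by key before it reaches the reducer.
    for line in lines:
        rec_id, name = line.rstrip("\n").split("\t")
        key = name.split()[-1].upper()[:3]  # toy blocking key
        yield f"{key}\t{rec_id}\t{name}"

def reducer(sorted_lines):
    # Group consecutive lines that share a key into one match group.
    group_key, group = None, []
    for line in sorted_lines:
        key, rec_id, _ = line.split("\t")
        if key != group_key and group:
            yield group_key, group
            group = []
        group_key = key
        group.append(rec_id)
    if group:
        yield group_key, group

# Simulate the map -> shuffle/sort -> reduce pipeline locally.
raw = ["1\tJon Smith", "2\tJohn Smith", "3\tMary Jones"]
mapped = sorted(mapper(raw))
for key, ids in reducer(mapped):
    print(key, ids)
```

Even this toy version shows why maintenance is hard: the shuffle-and-sort contract between mapper and reducer is implicit in the line format, and any schema change ripples through both sides.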

Consider a use case: you work for a large insurance organization that wants to enrich its existing customer database with data from third-party data provider services in a Hadoop environment. The organization wants to compare the existing data with the third-party data to identify potential business prospects, and it wants a 360-degree view of its customers to understand the relationships between them and build targeted marketing programs. This is an ideal use case for BDRM: match the data, then link the matched records to identify potential business prospects.
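
The prospect-identification step in this use case can be sketched very simply: third-party records that match no existing customer are prospects, while matched ones enrich the existing profiles. The naive whitespace-and-case normalization here is a placeholder for real fuzzy matching.

```python
def normalize(name):
    # Naive normalization so trivially different spellings compare equal;
    # a real implementation would use fuzzy matching instead.
    return " ".join(name.lower().split())

def find_prospects(existing, third_party):
    # Third-party records that match no existing customer are prospects;
    # matched ones enrich the existing profile instead.
    known = {normalize(n) for n in existing}
    prospects, enrichments = [], []
    for name in third_party:
        (enrichments if normalize(name) in known else prospects).append(name)
    return prospects, enrichments

existing = ["Jane Doe", "Raj Patel"]
third_party = ["jane  doe", "Li Wei"]
print(find_prospects(existing, third_party))  # → (['Li Wei'], ['jane  doe'])
```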

Key Features of BDRM

  • Single view of party
  • 360-degree view
  • Appends social data
  • Real-time search

Indisputably, BDRM is a strong choice for improving big data analytics, inferring non-obvious relationships, viewing social relationships, and achieving rapid results.



Before concluding, I would like to share with you our Customer Success Stories for BDRM.


  • shikarishambu

    Can you please provide some details of insurance companies adding terabytes of data in short span – what kind of insurance companies, what kind of data/ datasources. Specifically, I would be interested in learning about data in the context of Life/ Annuities/Health insurance (LAH).

    • Andrew Joss

      One area I’ve seen Life/Health Insurance companies taking on new data is around fitness trackers. These little devices have the potential to generate significant volumes of data. I wear one myself and have found that a single 60-minute period of exercise generates a file around 1MB in size, assuming it’s capturing location, step and heart rate data. This could equate to 24MB of data per day if I assume I’m not capturing location (as it won’t change when I’m sleeping) but do include sleep pattern data. That’s 8,760MB of data from one device and one person per annum. On top of this could be other health data points such as weight, blood pressure, height, BMI, waist size, cholesterol levels etc.

      I wrote a blog on Health data in Insurance at the following link and I hope it helps:
