Category Archives: Identity Resolution

Garbage In, Treasure Out: Real-World Use Cases

Last week I described how Informatica Identity Resolution (IIR) can be used to match data from different lists or databases even when the data includes typos, translation mistakes, transcription errors, invalid abbreviations, and other errors.  IIR has a wide range of use cases.  Here are a few. (more…)

Posted in Big Data, Data Governance, Data Integration, Enterprise Data Management, Identity Resolution, Integration Competency Centers | Tagged , , | Leave a comment

Garbage In, Treasure Out

Even in “good” data there is a lot of garbage. Take a person’s name, for example: John could also be spelled Jon or Von (I have a high school sports trophy to prove it), and Schmidt could become Schmitt or Smith. In Hungarian my name is Janos Kovacs. Human beings entering data make errors in spelling, phonetics, and keying. We also have to deal with variations in compound and account names, abbreviations, nicknames, prefixes and suffixes, foreign names, and missing elements. As long as humans are involved in entering data, there will be a significant amount of garbage in any database. So how do we turn this gibberish into gems of information?
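To make the problem concrete, here is a toy sketch of string-similarity scoring over the name variants above. This is only an illustration using Python’s standard-library `difflib`; it is not how IIR itself works, which relies on far more sophisticated probabilistic matching.

```python
# Toy fuzzy name comparison using the standard library (illustrative only).
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] based on matching subsequences."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

pairs = [("Schmidt", "Schmitt"), ("Schmidt", "Smith"), ("John", "Jon")]
for a, b in pairs:
    print(f"{a} vs {b}: {similarity(a, b):.2f}")
```

A naive exact comparison would treat all three pairs as simply “different”; a similarity score at least lets us see that Schmidt/Schmitt are far closer than Schmidt/Smith.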


Posted in Data Integration, Data Transformation, Identity Resolution, Integration Competency Centers | Tagged , , , | Leave a comment

Social MDM and Future Competitive Intelligence

As if Master Data Management (MDM) as we know it today isn’t hard enough, we may have new challenges (and opportunities) ahead related to the dramatic growth of social networks and organizations’ growing appetite for digging into big data.

In traditional MDM we aim to optimize the identification and descriptions of the who, what and where in traditional systems of record. Basically we handle our own products, our present suppliers, our current customers and known prospects, and the related locations. When moving on to Social MDM we aim to link those entities to the who, what and where in systems of engagement, so we may better handle descriptions of our own products, collaborate with suppliers, and follow our customers’ and known prospects’ footprints in the digital world. (more…)

Posted in Identity Resolution, Master Data Management | Tagged , | 1 Comment

Key Data Challenges to Overcome for FATCA Compliance

While Dodd-Frank received most of the media attention after the great financial crisis, during that period the U.S. government also signed the Foreign Account Tax Compliance Act (FATCA) into law, back in March 2010. FATCA requires Foreign Financial Institutions (FFIs) to report the names of U.S. persons and owners of companies who hold accounts at foreign banks, for tax reporting and withholding purposes.

The law was set to go into effect on January 1, 2013. However, on October 24, 2012, the U.S. Internal Revenue Service (IRS) announced a one-year extension to January 1, 2014, to give FFIs more time to implement procedures for meeting the FATCA reporting requirements. Banks that elect not to comply, or that fail to meet these deadlines, will be tagged as ‘non-participating FFIs’ and subject to a 30% withholding tax on all U.S.-sourced income paid to them by U.S. financial institutions. Ouch!!

The reasons for FATCA are fairly straightforward: the IRS wants to collect its share of tax revenue from individuals who hold financial accounts and assets in overseas banks. According to industry studies, of the estimated seven million U.S. citizens and green card holders who live or work outside the U.S., less than seven percent file tax returns. Officially, the intention of FATCA is not to raise additional tax revenue but to trace these missing, non-compliant taxpayers and return them to the U.S. tax system. Once FATCA goes into effect, the IRS expects to collect an additional $8.7 billion in tax revenue.

Satisfying FATCA reporting requirements will require banks to identify:

  • Any customer who may have an existing U.S. tax status.
  • Customers who hold a U.S. citizenship or green card.
  • Customers’ country of birth and residency.
  • U.S.-based addresses associated with accounts, including incoming and outgoing payments.
  • Customers who have recurring payments to the U.S., including electronic transfers and recipient banks located in the U.S.
  • Customers who have payments coming from the U.S. to banks abroad.
  • Customers with high balances across retail banking, wealth management, asset management, investment banking, and commercial banking business lines.

Although these requirements sound simple enough, there are many data challenges to overcome including:

  • Accessing account information from core banking systems, customer management and relationship systems, payment systems, databases, and desktops across multiple lines of business, which can range into the hundreds, if not thousands, of individual data sources.
  • Data in varying formats and structures, including unstructured documents such as scanned images, PDFs, etc.
  • Data quality errors, including:
      • Incomplete records: Data that is missing or unusable in the source system or file yet required for FATCA identification.
      • Non-conforming record types: Data in a non-standard format that does not integrate with data from other systems.
      • Inconsistent values: Data values that give conflicting information or carry different definitions for similar values.
      • Inaccuracy: Data that is incorrect or out of date.
      • Duplicates: Data records or attributes that are repeated.
      • Lack of integrity: Data that is missing or not referenced in any system.
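The first two error categories above (incomplete records and duplicates) lend themselves to a simple profiling pass. Here is a minimal sketch; the record layout and field names are illustrative assumptions, not any real banking schema:

```python
# Hypothetical account records; field names are illustrative assumptions.
records = [
    {"id": 1, "name": "John Smith", "country_of_birth": "US", "address": "12 Main St"},
    {"id": 2, "name": "John Smith", "country_of_birth": "US", "address": "12 Main St"},
    {"id": 3, "name": "Maria Kovacs", "country_of_birth": None, "address": "Budapest"},
]

def profile(records, required=("name", "country_of_birth", "address")):
    """Flag incomplete records and exact duplicates on (name, address)."""
    incomplete = [r["id"] for r in records
                  if any(not r.get(f) for f in required)]
    seen, duplicates = {}, []
    for r in records:
        key = (r["name"].lower(), r["address"].lower())
        if key in seen:
            duplicates.append((seen[key], r["id"]))
        else:
            seen[key] = r["id"]
    return {"incomplete": incomplete, "duplicates": duplicates}

print(profile(records))
```

Real profiling tools apply checks like these across hundreds of sources at scale, but the logic per rule is often this straightforward.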

Most modern core banking systems have built-in data validation checks to ensure that the right values are entered. Unfortunately, many banks continue to operate 20- to 30-year-old systems, many of which were custom-built and lack upstream validation capabilities. In many cases, data errors arise when combining ‘like’ data and information from multiple systems. Given the number of data sources and the volume of data that banks deal with, it will be important for FFIs to have technology capable of quickly and accurately profiling FATCA source data, identifying errors both at the source and as data is combined and transformed for reporting purposes.

Another data quality challenge facing FFIs will be identifying unique account holders while dealing with the following data anomalies:

  • Deciphering names across different languages (山田太郎 vs. Taro Yamada)
  • Use of nicknames (e.g. John, Jonathan, Johnny)
  • Concatenation (e.g. Mary Anne vs. Maryanne)
  • Prefix / Suffix (e.g. MacDonald vs. McDonald)
  • Spelling error (e.g. Potter vs. Porter)
  • Typographical error (e.g. Beth vs. Beht)
  • Transcription error (e.g. Hannah vs. Hamah)
  • Localization (e.g. Stanislav Milosovich vs. Stan Milo)
  • Phonetic variations (e.g. Edinburgh – Edinborough)
  • Transliteration (e.g. Kang vs. Kwang)

Performing these intricate data validations and matching processes requires technology purpose-built for the task: identity matching and resolution technology that leverages proven probabilistic, deterministic, and fuzzy matching algorithms against data in any language, can process large data sets in a timely manner, and is designed to be used by business analysts rather than IT developers. Most important is the ability to deliver the end results into the bank’s FATCA reporting systems and applications, where the business needs them most.
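To illustrate the deterministic-plus-fuzzy combination described above, here is a heavily simplified sketch: an exact match on a normalized key (with a tiny, assumed nickname table) is tried first, and a fuzzy comparison serves as the fallback. Production identity resolution engines use much richer algorithms and reference data.

```python
# Sketch: deterministic match on a normalized key, then a fuzzy fallback.
from difflib import SequenceMatcher

# Illustrative nickname table; real systems use large curated reference lists.
NICKNAMES = {"johnny": "john", "jonathan": "john"}

def normalize(name: str) -> str:
    """Lowercase, split, and map known nicknames to a canonical form."""
    parts = name.lower().replace("-", " ").split()
    return " ".join(NICKNAMES.get(p, p) for p in parts)

def match(a: str, b: str, threshold: float = 0.85) -> str:
    na, nb = normalize(a), normalize(b)
    if na == nb:
        return "deterministic"
    if SequenceMatcher(None, na, nb).ratio() >= threshold:
        return "fuzzy"
    return "no match"

print(match("Johnny Smith", "John Smith"))  # nickname mapping, exact key
print(match("MacDonald", "McDonald"))       # prefix variation, fuzzy
print(match("Potter", "Baker"))             # genuinely different
```

Note how the two stages divide the work: cheap deterministic rules resolve the bulk of records, and the costlier fuzzy comparison is reserved for the remainder, which matters when the data sets run into the millions of accounts.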

As I stated earlier, FATCA impacts both U.S. and non-U.S. banks and is as important to U.S. tax collectors as it is to the health of the global financial and economic markets. Even with the extended deadlines, those who lack capable data quality management processes, policies, standards, and enabling technologies to deal with these data quality issues must act now or face the penalties defined by Uncle Sam.

Posted in Data Governance, Data Quality, Financial Services, Governance, Risk and Compliance, Identity Resolution | Tagged , , , , | Leave a comment

Accountable Government – Stopping Improper Payments

As the federal government reported an estimated $115 billion in improper payments in Fiscal Year 2011, the impetus to eliminate and recover these funds continues to mount. State governments also struggle with mounting, and often embarrassing, improper payments, with estimated totals approaching $125 billion.  (more…)

Posted in Complex Event Processing, Identity Resolution, Master Data Management, Public Sector | Tagged , , | Leave a comment

Reading The Tea Leaves: Predictions For Data Quality In 2012

Following up from my previous post on 2011 reflections, it’s now time to take a look at the year ahead and consider what key trends will likely impact the world of data quality as we know it. As I mentioned in my previous post, we saw continued interest in data quality across all industries and I expect that trend to only continue to pick up steam in 2012. Here are three areas in particular that I foresee will rise to the surface: (more…)

Posted in Data Governance, Data Quality, Identity Resolution, Master Data Management, Pervasive Data Quality, Profiling, Scorecarding, Uncategorized, Vertical | Tagged , , , , , , , | 3 Comments

Making Your Data Work for You

Yesterday, CIOs from Informatica, Qualcomm and UMASS Memorial Healthcare participated in a panel to discuss how to deliver business value from applications while managing “data deluge” – the ever increasing growth and fragmentation of data across the application portfolio. Having worked in the IT Applications area for 15 years, I know firsthand how big a challenge this can be for organizations.

We are experiencing an unprecedented growth in the sheer amount of data that can be made available. Sites like Facebook and Twitter provide exciting new insights into user preferences and habits and the move to electronic systems for utility companies and healthcare organizations means that an even larger set of information can be stored electronically for reference and used to gain new business insights. Even internal systems such as sales automation, marketing and support applications contribute to this overwhelming tide of data that can be extremely valuable but hard to unlock. (more…)

Posted in Big Data, Customer Acquisition & Retention, Customer Services, Customers, Data Integration, Data Quality, Identity Resolution, Master Data Management | Tagged , , , , , , , | Leave a comment

Why MDM and Data Quality is Such a Big Deal for Big Data

Big Data is the confluence of three major technology trends hitting the industry right now: Big Transaction Data (describing the enormous growing volumes of transactional data within the enterprise), Big Interaction Data (describing new types of data such as Social Media data that are impacting the enterprise), and Big Data Processing (describing new ways of processing data such as Hadoop). If you can imagine companies having problems with business-critical master data such as customers, products, accounts, and locations at current data volumes, now that problem is compounded many-fold with the growth into Big Data. That’s where MDM and Data Quality come in as the fundamental solutions. So, why is MDM and Data Quality such a big deal for Big Data? (more…)

Posted in Big Data, Customer Acquisition & Retention, Data Aggregation, Data Governance, Data Integration, Data Quality, Enterprise Data Management, Identity Resolution, Informatica 9.1, Informatica Events, Master Data Management, Profiling, Scorecarding | Tagged , , , , , , , , , , , , , , , , | Leave a comment

Customer Data Forum Off To A Great Start Featuring MDM

We launched a coast-to-coast Customer Data Forum road show with visits to Atlanta and Washington, D.C., that attracted business and IT professionals interested in using master data management (MDM) to attract and retain customers.

From the business side, our guests consisted of analysts, sales operations personnel, and business liaisons to IT, while the IT side was represented by enterprise and data architects, IT directors, and business intelligence and data warehousing professionals. In Washington, about half the audience was from the public sector and government agencies. (more…)

Posted in Data Governance, Data Integration, Data Integration Platform, Data Quality, Data Services, Data Warehousing, Enterprise Data Management, Identity Resolution, Informatica Events, Master Data Management, Partners, Pervasive Data Quality, Profiling, Public Sector, Scorecarding | Tagged , , , , , , , , , , , , , , , , , | Leave a comment