Data Prep 1, 2, 3: Alleviating HR Data Growing Pains

Mergers bring the unknown, including unforeseen data challenges.

Picture this. Yesterday you had 100,000 employees. Today you have 175,000 employees. The merger is official.

Now comes the fun part.

If you’ve ever gone through a merger, you know firsthand that it’s not necessarily all roses and cotton candy for everyone.

For some, it can be one of the most turbulent, uncertain, and stressful — or perhaps, exciting — times of your career. Waves of unpredictable layoffs. Restructuring. Reorganization. Change in benefits. Change in offices. Change in technology. Change in processes. Change management. Change, change, change…

Along with the inherent impact on employee morale and productivity, a huge, messy byproduct of mergers is … data. And tons of it.

What do you do with all those employees? All those different applications, systems, and processes? All that data?

And who’s responsible for making sure the employees are accounted for and taken care of during this transitional period? Human resources.


The HR Data Dilemma of Turning Two Companies into One


Combining employee records from two companies introduces some major data management growing pains for the HR department.

The Human Resources department is summoned front and center to begin the critical (and painstaking) task of consolidating and reorganizing employee records. Which is further complicated if both companies have different naming conventions for job titles with the same roles. (And, let’s face it, most companies have their own way of labeling jobs and defining responsibilities.)

The unassuming task of pulling thousands of employee records and making all the job titles, descriptions, and ID codes consistent is like telling HR teams to run across a tightrope, over a fire pit, with their eyes shut.

There’s no time for missteps.


For HR, mergers often involve intense hours of manually consolidating, standardizing, and moving employee data from both companies to create new master records.

How Can HR Stay on Top of This Massive Data Influx?

Dealing with the aftermath of a big merger isn’t exactly what your HR department signed up for. We’re talking countless hours of data profiling, reformatting, editing, filtering, sorting, cleansing, enriching, combining, and standardizing values. (Beyond tedious.)

When two companies become one, these detail-intense data consolidation and data migration projects become urgent, time-sensitive priorities for HR analysts and technology teams.

And while some teams might default to Excel to do this, smart HR teams use automated data preparation tools to ensure data is properly cleansed and enriched before creating new master records.


Automated data prep tools dramatically help:

  1. Expedite the process of standardizing and blending records together
  2. Reduce the number human errors associated with manual data manipulation
  3. Profile, re-categorize, and update employee records accurately and efficiently

By using an automated data preparation tool like REV, you can save mountains of time and minimize the manual effort of combining employee records from both companies. Plus, you don’t have to rely on error-prone techniques like custom scripting, creating formulas, or cutting and pasting to get this project done.


10 Common HR Data Prep Challenges (And How to Plow through Them with REV)

Using REV, HR teams can breeze through some of the tedious, repetitive, and necessary data preparation tasks for consolidating and standardizing employee records from both companies to create new master employee records.

Let’s go over 10 common issues HR runs into during these projects and specific ways you can tackle them with REV.

  1. Standardizing. HR is expected to create new master employee records with consistent job titles, job descriptions, and salaries, etc. But each company has different naming conventions for job titles, departments, or divisions.
Standardize text fields in REV with mass find and replace edits.
Yikes! There are multiple naming conventions for the same job: “Sales Representative” is the same job as “Sales Rep.”


Standardize values in Informatica REV
Let’s fix this! Double-click “Sales Rep” and type “Sale Representative” to update all the corresponding fields in the sheet.


Informatica REV lets you standardize text values in a column with mass edits.
All updated! REV readjusted the value frequencies for “Sales Representatives” to the new total: 23 (i.e., 13 Sales Representatives + 10 Sales Reps changed to 23 Sales Representatives). Neat, huh?


2. Incomplete and incorrect data. Each company has its share of missing data and each has their own employee ID numbers, job location ID codes, addresses, benefit elections, and employment status (full-time, part-time, or contractor).


Informatica REV's data profiling lets you spot outliers and inconsistent values
Instantly spot blank and inconsistent values. REV shows you the numeric and text values so you can fix these issues in the entire column, at one time.


Informatica REV lets you fix values with mass edits.
To change “Ireland” to a numeric code, double-click it, type the correct code, and push Enter to do a mass edit (i.e., a find-and-replace to update that column.)


3. Partial duplicate data. Finding and fixing partial duplicate data entries is like going on an egg hunt without knowing what you’re looking for. Each company might have employees with multiple records due to job title changes, department shifts, or relocating to another job site. But how can you tell? And how can you find those “fuzzy” matches?


Find partical duplicate records with Informatica REV
Hmmm… Looks like Cersei Lannister has two records with two different Employee ID numbers. Now that we’ve isolated these partial duplicate records, we can cross-reference other HR records and update this sheet with the valid and most recent employee ID number. Otherwise, we might’ve missed this and Cersai would’ve had two records in the new HR system.


4. Spotting and removing exact duplicate entries. Finding pesky duplicate data has always been a pain in the you-know-what, especially when you don’t know where they are and how many there are.


Informatica REV resolve duplicates
REV lets you zap duplicate data. Just right-click the column you want to dedupe and select “Resolve duplicates.”


Delete duplicate rows with Informatica REV
REV shows you a preview of the duplicate rows and asks if you want to delete the perfect duplicates or skip to the next set.


5. Reformatting dates. “January 29, 2015”, “Jan/29/2015”, “01/29/2015”, “1-29-15”, or “29-Jan”? Same date, different formats. Missing information. It happens. But that really needs to be uniform and accurate (not to get all Type A, but dates should be consistent).


Change date formats in Informatica REV
Change the date format by right-clicking the date column. Select Format. Choose the new format you want. If there are missing dates, you can do some cross-checking and fill in the appropriate dates as you find them.


6. Splitting columns. Employee names might be recorded in one column by one company and two columns by the other. Split name columns into two to separate first and last names, so you can sort by alphabetical order according to last name.


Informatica REV split column
Create two columns by right-clicking the Name column and selecting “Split.”


Informatica REV split column by whitespace
Select the number of columns you want: 2. Choose “Whitespace” since we want to split the column based on the space between the first and last name. REV shows you a preview of the two new columns that’ll replace the original Name column. Looks good, click done!


7. Pinpointing outliers or anomalies. Finding mistakes in your datasets or exceptions that don’t belong can be an overwhelming task. You never know what’s in there. Or how many data entry mistakes there are. (You might not even know where to start or how to find abnormal data hiding in your spreadsheets!)


Informatica REV shows you anomalies and numeric distribution summaries
REV reveals all the quirky data buried in your spreadsheet. For this example, I clicked the Employee ID column to see the Numeric distribution summary. We should probably investigate the anomalies hiding in the red zone.


Find mistakes and errors in your spreadsheet with Informatica REV
Eeek! Looks like we have some missing employee ID numbers and one ID with a typo (that “t” sure doesn’t belong there!).


8. Data profiling. HR needs to understand what kind of data is in each spreadsheet in order to determine the best way to combine it with the other company’s employee information. Using REV’s data profiling capabilities, HR can quickly see the type of data in each column as well as the number of times those values show up.

For instance, if you want to know more about the current reporting structures and team sizes, you can click the column header for managers and review the information in the Value frequencies panel to see how many people report to each person. This gives you immediate insights for creating new org charts and identifying redundant positions.


Want to find out how many people report to Jim Jones? Click on the Manager column and REV tells you that Jim Jones manages 14 people in the sales division. You can also see a summary of the other managers in the Value frequencies panel. Slick and quick!


9. Blending data. Consolidating spreadsheets with thousands of records can take hours and hours. And for some, it may be a frustrating nightmare in Excel. (It’s tolerable, at best.)


Blend data in Informatica REV
REV gives you two options to combine sheets together: “Join” and “Union.” Click the “Blend” icon and select “Join sheets” if you want to add columns to existing rows. For example, if you had Ana Lucia’s record with her employment information, address, and email address and you wanted to add her benefits information, you would need to join the sheets together in a side-by-side manner. Basically, this lets you bring in her benefit information to complete her original record.


Informatica REV Union lets you blend two sheets together.
You can also blend two sheets together with the “Union” function.


Informatica REV union
Think of “Union” like you’re adding on to the the existing list. You’re continuing the list vertically, adding more employees to your list, essentially adding more rows of data. You could use “Union” to add the first company list of employees to the second list of employees. Awesome, right?


  1. Collaborating with your team. Teamwork is great. But, you know the feeling… You just spent hours reformatting and cleaning up a messy spreadsheet. You check your email only to discover that your co-worker sent over more data that needs to be included. Now what?


Informatica REV data refresh
With REV, you can collaborate with your co-workers without messing up your dataset or redoing your steps. You can easily add new data to your project, no worries. The same data prep steps will be applied automatically with the Refresh function.


Informatica REV Refresh Data
If you make any changes to the original sheet, you can hit “Refresh From Parent Sheet.” Those updates will be incorporated into the blended dataset with the same data prep steps you used before. Pretty cool!

Keep Your HR Department Happy, Master Your Employee Records


Master your HR records, Keep your HR department happy
After your employee records are updated and standardized, you can minimize future data chaos by keeping your HR records mastered.

Mergers bring change. That’s a given. And some HR teams may experience job-related stress as they struggle to manage new employee records, systems, and policies.

While part of HR’s job is to look out for employees, we need to remember to look out for HR teams during this transitional time as well. They are, after all, employees too… And they need the right resources so they can get back to the jobs they were hired to do.

Even though there’s no one-size-fits-all prescription for how to handle these types of data consolidation and migration projects, HR teams can use automated data prep tools like REV to sprint to the finish line … ahead of schedule. Without losing sleep. Without scrambling for resources. Without pulling out their hair.

As your company continues to change, you can alleviate future data pains by properly maintaining Employee Master records with MDM and applying regular data quality checks.

That way, the next time your organization merges with another company or decides to downsize, you’ll be prepared for any kind of data challenge.

Ever wonder how data analysts spend their day?  


Microsoft Word - RT-A0213-The Case for Dedicated Data Prep_Bench

Blue Hill Research was curious too. So they surveyed 186 data analysts to find out how they spend their time and what tools they use the most to prepare data for analysis.

What did Blue Hill analyst James Haight find out?

Analysts spend a disproportionate amount of their day curating, preparing, enriching, validating, and manipulating data.

How much time?

Anywhere from 2-8 hours a day! (Unfathomable…)


You’ll be surprised when you see the rest of the results in the Blue Hill benchmark report, “Quantifying the Case for Enhanced Data Preparation.”

By reading Blue Hill’s Benchmark report, you’ll learn:

  • The limitations, risks, and expense of using traditional methods of preparing data
  • Macro trends affecting data preparation challenges
  • Time and cost savings of using dedicated data preparation solutions
  • The most popular data sources, as well as emerging data sources
  • Specific recommendations for increasing data analysts’ productivity

Get your free copy today.