
Quit Futzing Around and Clean Up Your Data Already!

Unless you’ve been living under a rock, you know by now that the cost of dirty data is huge. A simple search for “the cost of dirty data” will yield more than two billion results. We’re not going to cover that topic here, but you don’t have to take our word for it: Let Me Google That For You. Also, in the spirit of managing scope for this article, we’re going to focus only on your company database: your accounts and your prospects.

There are two basic questions you need to be able to answer before you can consider your data “clean”:

  • How many customers do you have?
  • Who are your customers?

In the overused metaphor of Maslow’s hierarchy of needs, being able to answer these questions is akin to having food and shelter.

Let’s dig in.

How Many Customers Do You Have?

The list of problems caused by duplicate data is long, but the biggest issues are inaccurate financial reporting when transactions aren’t tied to a single account, and the loss of credibility when you introduce yourself for the fifth time to a company that has been your customer for 10 years. Or worse, your CEO asks you the simple question, “How many customers do we have?” and you can’t give a straight answer.

Those who have tried to deduplicate at scale appreciate how difficult it is. Consider these records. Are any of them duplicates of each other?

Excel says there are no duplicates in that list.

SQL says there are no duplicates in that list.

All fuzzy matchers say there are no duplicates in that list.

The human says, “I don’t know, but let me check Google Maps.” Two minutes later, the human has an answer. (If you only have a few thousand rows of data to deduplicate, you might be able to go this route.)

So, here’s your “Step 1” in cleaning your data: find a matching tool or matching service that understands there are two business locations in the data above, because it knows that Loop 360 and N Capital of Texas Hwy are the same road and that North Cat Mountain is a neighborhood in Austin.

Match and Append’s match technology knows these things, and we scoff at so-called “matchers” that don’t.
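To make the idea concrete, here is a minimal sketch of alias-aware address matching. It is not Match and Append’s technology, and the alias tables, function names, and sample records are all hypothetical. The point it illustrates is that exact comparison fails on these records, while normalizing each address against reference knowledge (road aliases, neighborhood-to-city lookups) lets records for one physical location share a key.

```python
import re
from collections import defaultdict

# Hypothetical alias table: every known name for a road maps to one canonical name.
# Production matchers draw on far larger postal and road reference data.
ROAD_ALIASES = {
    "loop 360": "capital of texas hwy",
    "n capital of texas hwy": "capital of texas hwy",
}

# Hypothetical lookup that resolves neighborhood names to their city.
NEIGHBORHOOD_TO_CITY = {
    "north cat mountain": "austin",
}


def normalize_street(street: str) -> str:
    """Lowercase, drop punctuation, and replace a road alias with its canonical name."""
    cleaned = " ".join(re.sub(r"[^\w\s]", " ", street.lower()).split())
    first, _, rest = cleaned.partition(" ")
    # Treat a leading all-digit token as the house number; keep it separate from the road.
    number, road = (first, rest) if first.isdigit() else ("", cleaned)
    road = ROAD_ALIASES.get(road, road)
    return f"{number} {road}".strip()


def normalize_city(city: str) -> str:
    """Lowercase the city and resolve neighborhood names to the city that contains them."""
    cleaned = " ".join(city.lower().split())
    return NEIGHBORHOOD_TO_CITY.get(cleaned, cleaned)


def group_by_location(records: list[dict]) -> dict[tuple[str, str], list[int]]:
    """Group record IDs by normalized (street, city); each key is one physical location."""
    groups: dict[tuple[str, str], list[int]] = defaultdict(list)
    for rec in records:
        key = (normalize_street(rec["street"]), normalize_city(rec["city"]))
        groups[key].append(rec["id"])
    return dict(groups)


# Hypothetical records, not the ones from the example above.
records = [
    {"id": 1, "name": "Acme Corp", "street": "1234 Loop 360", "city": "Austin"},
    {"id": 2, "name": "ACME Corporation", "street": "1234 N. Capital of Texas Hwy.", "city": "North Cat Mountain"},
    {"id": 3, "name": "Acme Corp", "street": "500 Congress Ave", "city": "Austin"},
]

locations = group_by_location(records)
print(len(locations))  # 2 locations, although all three raw (street, city) pairs differ
```

Keying on address alone is deliberately simplistic. A real matcher would also compare business names and suite numbers, and would score candidate pairs rather than require an identical normalized key.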

In the end, you want to know which accounts are the same business entity:

You have options once you’ve identified the duplicates. For example, you can report on the financial transactions belonging to each customer, create golden records, or deprecate duplicate records. You have options.
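Here is a sketch of what happens after matching, assuming the matcher hands back pairs of account IDs it considers the same business. A union-find (disjoint-set) structure merges those pairs transitively into entities, which answers “how many customers?” and lets transactions roll up to one customer each. The golden-record rule used here (keep the record with the most populated fields) is a hypothetical survivorship rule chosen for brevity; all data and names are illustrative.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure that merges account IDs into business entities."""

    def __init__(self) -> None:
        self.parent: dict[int, int] = {}

    def find(self, x: int) -> int:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving keeps trees shallow
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_accounts(account_ids, matched_pairs):
    """Map each account ID to an entity ID; matches are transitive (1~2 and 2~4 imply 1~4)."""
    uf = UnionFind()
    for acct_id in account_ids:
        uf.find(acct_id)  # register singletons so unmatched accounts get their own entity
    for a, b in matched_pairs:
        uf.union(a, b)
    return {acct_id: uf.find(acct_id) for acct_id in account_ids}


def revenue_by_entity(transactions, account_to_entity):
    """Roll (account_id, amount) transactions up to the entity each account belongs to."""
    totals: dict[int, float] = defaultdict(float)
    for acct_id, amount in transactions:
        totals[account_to_entity[acct_id]] += amount
    return dict(totals)


def golden_records(accounts, account_to_entity):
    """Pick one surviving record per entity using a hypothetical rule: most populated fields wins."""
    members: dict[int, list[dict]] = defaultdict(list)
    for acct in accounts:
        members[account_to_entity[acct["id"]]].append(acct)

    def populated(rec: dict) -> int:
        return sum(1 for value in rec.values() if value not in (None, ""))

    return {entity: max(recs, key=populated) for entity, recs in members.items()}


# Hypothetical accounts, match results, and transactions.
accounts = [
    {"id": 1, "name": "Acme Corp", "phone": "", "website": "acme.example"},
    {"id": 2, "name": "ACME Corporation", "phone": "512-555-0100", "website": "acme.example"},
    {"id": 3, "name": "Globex", "phone": "", "website": ""},
    {"id": 4, "name": "Acme", "phone": None, "website": ""},
]
matched_pairs = [(1, 2), (2, 4)]
transactions = [(1, 100.0), (2, 50.0), (3, 75.0), (4, 25.0)]

account_to_entity = cluster_accounts([a["id"] for a in accounts], matched_pairs)
print(len(set(account_to_entity.values())))  # 2 customers, not 4 accounts
print(revenue_by_entity(transactions, account_to_entity))  # {1: 175.0, 3: 75.0}
print({e: r["name"] for e, r in golden_records(accounts, account_to_entity).items()})  # {1: 'ACME Corporation', 3: 'Globex'}
```

Deprecating duplicates then amounts to flagging every account in an entity other than its golden record.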

Who Are Your Customers?

If the most embarrassing question your CEO can ask is “How many customers do we have?” and you can’t answer it, then the second most embarrassing is “Who are our customers?” You had better be able to answer in terms of industries, company sizes, and geographies (basic firmographics) at a minimum!

Companies like Match and Append have an excellent set of firmographic data. They also give you the added benefit of contacts for your accounts.

So, here’s your “Step 2” in cleaning your data: contact Match and Append and ask us to append firmographics to your list. Make sure you get address-level match precision.
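Once firmographics come back, answering “Who are our customers?” is a join followed by a count. The sketch below assumes the appended data arrives as a simple mapping from your account ID to attributes; that shape, the field names (`industry`, `employees`, `state`), and the sample values are hypothetical, not a description of any vendor’s file or API format.

```python
from collections import Counter


def append_firmographics(accounts, firmographics):
    """Left-join appended attributes onto accounts by account ID.

    `firmographics` is a hypothetical dict of account_id -> attributes standing in for
    whatever file or API response an append service returns; unmatched accounts pass
    through unchanged.
    """
    return [{**acct, **firmographics.get(acct["id"], {})} for acct in accounts]


def profile(accounts, field):
    """Count accounts per value of one firmographic field; missing values count as 'unknown'."""
    return Counter(acct.get(field) or "unknown" for acct in accounts)


# Hypothetical accounts and appended attributes (industry, employee band, state).
accounts = [
    {"id": 1, "name": "Acme"},
    {"id": 2, "name": "Globex"},
    {"id": 3, "name": "Initech"},
]
firmographics = {
    1: {"industry": "Manufacturing", "employees": "50-199", "state": "TX"},
    2: {"industry": "Software", "employees": "200-999", "state": "CA"},
}

enriched = append_firmographics(accounts, firmographics)
for field in ("industry", "employees", "state"):
    print(field, dict(profile(enriched, field)))
```

The “unknown” bucket is worth watching: it counts accounts that received no value for that field, which is a quick read on how complete the append was.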

We also recommend going beyond record-by-record data and into Corporate Linkage, so that the relationships between records are known. It’s incredibly valuable to know that multiple companies are related to one another through a common parent or ultimate parent. Be sure
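Corporate linkage is essentially a tree of parent pointers, and rolling companies up to their ultimate parent means walking that tree to the top. The sketch below assumes linkage arrives as a plain child-to-parent mapping; that structure, the function names, and the company IDs are hypothetical, not the format of any particular linkage product. It also guards against cycles, which bad linkage data can contain.

```python
from collections import defaultdict


def ultimate_parent(company_id, parent_of):
    """Follow parent links upward until reaching a company with no recorded parent."""
    seen = set()
    current = company_id
    while parent_of.get(current) is not None:
        if current in seen:
            raise ValueError(f"cycle in corporate linkage at {current!r}")
        seen.add(current)
        current = parent_of[current]
    return current


def group_by_ultimate_parent(company_ids, parent_of):
    """Group companies into corporate families keyed by their ultimate parent."""
    families = defaultdict(list)
    for company_id in company_ids:
        families[ultimate_parent(company_id, parent_of)].append(company_id)
    return dict(families)


# Hypothetical linkage: child -> immediate parent (None means no parent).
parent_of = {
    "acme_west": "acme_us",
    "acme_us": "acme_global",
    "acme_europe": "acme_global",
    "globex": None,
}

families = group_by_ultimate_parent(["acme_west", "acme_europe", "globex"], parent_of)
print(families)  # {'acme_global': ['acme_west', 'acme_europe'], 'globex': ['globex']}
```

Note that `acme_west` and `acme_europe` never reference each other directly; they only become one family once the walk reaches their shared ultimate parent.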

We have technology, in the form of integrated applications and APIs, that will keep your new firmographics up to date. Ask about that, too.

See, It’s Not That Hard

Getting your data clean requires that you compartmentalize the effort. Deduplicate your data and add basic firmographics. Period. Be leery of the scope creep demanded by those who aren’t responsible for cleaning the data: data to define your Ideal Customer Profile (“ICP”), to fill in whitespace, to determine purchase intent, or to reveal other insights. All of these are valuable, but none of them has anything to do with cleaning your data. Once you’ve compartmentalized, focus maniacally on cleaning your data so that you can stop the bleeding of costs associated with bad data. Then move on to increasing the strategic value of your data by addressing ICP and the other attributes your company needs to grow.

So, quit futzing around and clean up your data already!


Ray Renteria