How to Remove False Identifiers From Your Consumer Data

Bad consumer data has taken over big data, and it’s costly and ubiquitous. But there’s still hope for marketers. Here are 4 actionable steps to find and remove false identifiers.

Key Takeaways

  • False identifiers create bad data, which is expensive and wastes a lot of time
  • 4 steps to eliminating false identifiers:
    1. Be cautious of data sources (and the top sources of false information)
    2. Watch for excessive MD5s on one IP address
    3. Beware of using proxy servers
    4. Identify characteristics of spoofed devices

Advertising fraud is extremely costly on a global level. One report from the Association of National Advertisers (ANA) showed that economic losses total $5.8 billion because of this issue. 

However, knowledge is the first step in finding a solution. The report also revealed that this figure had decreased by 11% since 2017, and that the amount of fraud getting past enhanced detection systems continues to fall. The ANA therefore believes that the war on fraud is working and that the problem will continue to improve.

No matter how satisfied the ANA is with this progress, it’s still up to you to take all possible steps to keep false identifiers out of your company’s data. Here’s a look at the challenges that false identifiers create and four steps you can take to reduce them.

What problems do false identifiers create?

BDEX conducted an analysis that found that 25% of more than a billion device identifiers from a large sample of top providers were invalid. Additionally:

  • 20% of MAIDs had errors
  • 2% of email MD5s in the U.S. data market had errors
  • 21% of email MD5s had links to over 10 separate MAIDs (a sign of invalid or fraudulent identifiers)

Sometimes fraud is intentional and other times it’s inadvertently created by user error. In either case, once this data is sold and resold on the market, the invalid information quickly proliferates. 

Bad data is expensive. On an organizational level, marketers can increase return on ad spend by 43% if they take steps to eliminate bad data from their systems. Bad data also continues to get in the way of marketers making the connections they’re trying to make. They may send messages to the wrong people or miss an opportunity to reach a consumer at a crucial moment in their buying journey. 

You waste both time and money as long as bad data is being stored in your systems and used in your marketing, but that’s not all. Bad data also gets in the way of making effective data-based business decisions, both for now and into the future. Without the right information, it’s nearly impossible to take advantage of all the opportunities big data offers marketers.

Next, let’s look at ways you can remove false identifiers and thus avoid the problems that bad data creates.

4 ways to remove false identifiers

1. Be extra aware of data sources

One of the first ways you can fight against ad fraud is to only use data sources that prioritize data quality. Data quality requires advanced use of automation tools so false identifiers can be detected and removed from the ecosystem. 

But it’s also important to know the birthplaces of false identifiers. Where does all that ad fraud come from? Primarily, fraudsters are trying to improve social media rankings, increase website traffic, or increase ad revenue with more clicks or views of ads. Here are a few sources to be most aware of:

  • User error (e.g., data is incorrectly entered)
  • Fraud:
    • Spoofing (e.g., a device is posing as a different device)
    • Bot fraud (e.g., fraudsters create fake devices)
    • Geomasking (e.g., a legitimate location of a device is disguised)
    • Click farms (organizations that employ people to click on ads to increase traffic)
  • Invalid MD5s and MAIDs
  • Unactionable identifiers
  • Invalid ID links

All of these sources produce invalid data that is then sold and resold in the data marketplace. Knowing where bad data comes from can help you recognize it and guard against it.
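
As a rough illustration of screening for invalid MD5s and MAIDs, the Python sketch below checks only the format of each identifier. It assumes that an email MD5 is a 32-character hexadecimal string and that a MAID uses the hyphenated UUID layout common to mobile advertising IDs. The function names are hypothetical, and treating the all-zero MAID and the hash of an empty string as invalid is an assumption of this sketch, not a rule taken from the BDEX analysis.

```python
import hashlib
import re
import uuid

# Assumption: an email MD5 is exactly 32 hexadecimal characters.
MD5_PATTERN = re.compile(r"[0-9a-f]{32}")

# Assumption: the MD5 of an empty string signals that a blank email was hashed.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()

# Assumption: an all-zero MAID carries no usable device identity.
ZERO_MAID = "00000000-0000-0000-0000-000000000000"


def is_valid_md5(value: str) -> bool:
    """Return True if value looks like a usable email MD5 hash."""
    candidate = value.strip().lower()
    if MD5_PATTERN.fullmatch(candidate) is None:
        return False
    return candidate != EMPTY_MD5


def is_valid_maid(value: str) -> bool:
    """Return True if value is a canonical, non-zero UUID string."""
    candidate = value.strip().lower()
    try:
        parsed = uuid.UUID(candidate)
    except ValueError:
        return False
    # uuid.UUID also accepts braces and unhyphenated input,
    # so require the canonical hyphenated form explicitly.
    return str(parsed) == candidate and candidate != ZERO_MAID
```

A format check like this only removes identifiers that cannot be real. An identifier that passes may still be fraudulent, which is why the linkage checks in the next step matter.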

2. Watch for excessive MD5s on one IP address

A major indicator of fraudulent identifiers is if email MD5s are linked to many different MAIDs. There should never be an excessive number of linkages with legitimate information since the goal is to connect these IDs to specific, individual consumers. 

For example, most IP addresses have fewer than seven email addresses associated with them, and most email addresses have fewer than five IP addresses linked to them. Anything more than that may indicate fraudulent information. 
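
The linkage check described above can be sketched in a few lines of Python. This is a minimal example, assuming your data is available as (email MD5, IP address) pairs. The function name and the default limits are hypothetical; the limits simply encode the "fewer than seven" and "fewer than five" rules of thumb and should be tuned to your own data.

```python
from collections import defaultdict

# Assumed limits derived from the rules of thumb above.
MAX_EMAILS_PER_IP = 6   # "fewer than seven" emails per IP address
MAX_IPS_PER_EMAIL = 4   # "fewer than five" IP addresses per email


def flag_excessive_links(pairs,
                         max_emails_per_ip=MAX_EMAILS_PER_IP,
                         max_ips_per_email=MAX_IPS_PER_EMAIL):
    """Return (suspicious_ips, suspicious_emails) from (email_md5, ip) pairs."""
    emails_by_ip = defaultdict(set)
    ips_by_email = defaultdict(set)

    # Sets count each distinct link once, even if a pair repeats.
    for email_md5, ip in pairs:
        emails_by_ip[ip].add(email_md5)
        ips_by_email[email_md5].add(ip)

    suspicious_ips = {
        ip for ip, emails in emails_by_ip.items()
        if len(emails) > max_emails_per_ip
    }
    suspicious_emails = {
        email for email, ips in ips_by_email.items()
        if len(ips) > max_ips_per_email
    }
    return suspicious_ips, suspicious_emails
```

The same counting approach could be applied to email MD5-to-MAID links, for instance by flagging any MD5 tied to more than 10 separate MAIDs. Flagged records are candidates for review rather than proof of fraud.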

3. Beware of using proxy servers

A proxy server is a resource that sits between you and the original data source to deliver information, and proxy servers are common in the world of big data. While this setup is not fraudulent in and of itself, be cautious when using proxy servers. 

Direct relationships when recording data are generally more valid and reliable. Find data sources that are verified across channels and have processes in place to filter out bad data.

4. Identify characteristics of spoofed devices

Spoofed devices often give themselves away through warning signs. For example, if one account likes or follows thousands of social media pages, or if a website suddenly receives a burst of traffic within moments that is out of line with its usual trends, that activity may well be coming from spoofed devices.
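
One simple way to put the traffic warning sign into practice is to compare the latest visit count with a recent baseline. The Python sketch below is only an illustration: the function name is hypothetical, and both the median baseline and the tenfold factor are assumptions that you would need to adjust for your own site.

```python
from statistics import median


def is_traffic_spike(recent_counts, current_count, factor=10.0):
    """Return True if current_count far exceeds the recent typical level.

    recent_counts: visit counts from earlier, comparable time windows.
    factor: assumed multiplier above the median that counts as a spike.
    """
    if not recent_counts:
        # Without history there is no baseline to compare against.
        return False
    # The median is less distorted by a single earlier outlier than the mean.
    baseline = median(recent_counts)
    # Use a floor of 1 so an all-zero history does not flag every visit.
    return current_count > factor * max(baseline, 1)
```

For example, with recent counts of 100, 120, 90, and 110 visits per minute, a sudden reading of 5,000 would be flagged, while 130 would not.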

Sometimes it just takes common sense to guard against bad data. If you think that a behavior or identifier appears fraudulent, err on the side of caution to avoid costly mistakes.

Why work with BDEX for data quality?

BDEX prioritizes data quality across all our data solutions. Because our research has revealed just how much bad data is out there, we continually take steps to combat this trend. Our ID graph is the highest quality graph in the industry and provides over 800 million email MD5-MAID-IP matches from reliable sources. 

Our data solutions empower you to make more meaningful connections with your customers at the exact right moment with real-time targeting. We have over 6 billion unique IDs, more than 5,500 data categories to choose from, and over a trillion data signals in total. 

To learn more about our commitment to data quality or to find out how our Data Exchange Platform provides more user data than any other platform, contact the BDEX team today.