Understanding Data Quality and How to Measure It

Wikipedia describes data quality as “the condition of a set of values of qualitative or quantitative variables”. So what does this really mean?

The quality of data can be subjective, because it depends on how well the data “fits” its intended uses. People often disagree about the quality of the same dataset, even when they agree on the purpose of the data.

It is next to impossible to collect error-free data. To ensure that quality data is used for your organization’s mailing lists, a standard set of data cleaning requirements should be agreed upon.

Many organizations have a data administrator or “keeper” of the database. This person is often in the IT department and has a major role in making sure the data meets the standards set by the organization.

With data coming in at light speed, parameters must be in place to keep everything organized. You can use specific methods to do this. The first is to set up your data profiling protocols.

What is Data Profiling?

Data profiling involves a review of all the information in your database to determine whether the data is accurate and complete. Steps need to be taken with every data entry that doesn’t meet your criteria. For in-house data, you may require certain fields or details to ensure you have exactly the information you need. On the flip side, when using outsourced data, you may only need a specific set of attributes that allow for marketing.
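One way to picture a completeness check is the short Python sketch below. It is only an illustration: the function name profile_completeness, the field names, and the sample records are all hypothetical, and it treats a field as "complete" simply when it holds a non-blank value. Your own criteria may be stricter.

```python
from collections import Counter


def profile_completeness(records, required_fields):
    """Return, for each required field, the share of records with a non-blank value."""
    filled = Counter()
    for record in records:
        for field in required_fields:
            value = record.get(field)
            # Count the field only if it is present and not just whitespace.
            if value is not None and str(value).strip():
                filled[field] += 1
    total = len(records)
    return {
        field: (filled[field] / total if total else 0.0)
        for field in required_fields
    }


# Hypothetical sample records, made up for illustration only.
sample_records = [
    {"name": "Ana Ruiz", "email": "ana@example.com", "zip": "30301"},
    {"name": "Bo Chen", "email": "", "zip": "94105"},
    {"name": "", "email": "cy@example.com", "zip": None},
    {"name": "Di Patel", "email": "di@example.com", "zip": "10001"},
]

report = profile_completeness(sample_records, ["name", "email", "zip"])
print(report)
```

A report like this shows which fields fall short of your requirements, so you know where the entries that need attention are concentrated.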

Data profiling also takes into account how your data is formatted. Do you want date fields or address fields in a specific format so you can ensure there are no duplicates? For example, is your date field formatted as May 1st, 2018 or as 5/01/2018? This type of data profiling structure helps ensure your data is consistent and easier to maintain.
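As a rough illustration of why one agreed format helps with duplicates, the Python sketch below converts both date styles mentioned above into a single form. The function name normalize_date, the list of accepted formats, and the choice of YYYY-MM-DD as the target are assumptions made for this example. It also assumes month-first slash dates and English month names.

```python
import re
from datetime import datetime

# Assumed input formats: month-first slash dates, "Month day, year", and ISO dates.
KNOWN_FORMATS = ("%m/%d/%Y", "%B %d, %Y", "%Y-%m-%d")


def normalize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    # Drop ordinal suffixes so "May 1st, 2018" becomes "May 1, 2018".
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", text.strip())
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # Try the next format.
    return None


# Both spellings of the same day collapse to one value, so duplicates are easy to spot.
print(normalize_date("May 1st, 2018"))
print(normalize_date("5/01/2018"))
```

Values that come back as None match none of the listed formats and can be set aside for review.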

How to Determine the Quality of Your Data

With your data profiling structure set, the next step is deciding what to do when you find errors. There are four basic protocols you can follow:

Accept the Error – Let’s say a street address is usable for postal mailing in either of two formats (e.g. Oak Street instead of Oak St.). Based on your circumstances, you can decide whether the entry needs attention.

Reject the Error – Many times data imports can really damage your database. This usually happens when you have data that is completely inapplicable to specific fields. You may have seen examples of this where a CRM database has text information in the phone field, or too much information in a name field. This is a crossroads where you should decide whether it is better to delete the field information, spend time fixing it, or import it as is.

Correct the Error – Common errors that are easily corrected include customer name misspellings, capitalization, and number formatting. Decide on the format to use across all your datasets first, then follow your protocol.

Use Default Values – Blank fields, especially in CRMs, can lead to unanswered questions. Sometimes it’s better to use a default value such as n/a than to leave the field blank. That way the user can surmise that the information wasn’t left out by error.
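The four protocols above can be sketched as a small Python routine. This is a simplified illustration rather than a recommended implementation: the function name clean_record, the field names, the 10-digit phone rule, and the "ok" label for values that are already in the agreed format are all assumptions made for this example.

```python
import re

DEFAULT_VALUE = "n/a"  # Placeholder used for blank fields.


def clean_record(record):
    """Return (cleaned, actions); actions maps each field to the protocol applied."""
    cleaned, actions = {}, {}
    for field, raw in record.items():
        value = (raw or "").strip()  # Assumes values are strings or None.
        if not value:
            # Use Default Values: mark the blank so it is not mistaken for an omission.
            cleaned[field], actions[field] = DEFAULT_VALUE, "default"
        elif field == "phone":
            digits = re.sub(r"\D", "", value)
            if len(digits) == 10:
                # Correct the Error: rewrite the number in one agreed format.
                cleaned[field] = f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
                actions[field] = "ok" if cleaned[field] == value else "correct"
            else:
                # Reject the Error: text in a phone field is not usable.
                cleaned[field], actions[field] = None, "reject"
        elif field == "name":
            # Correct the Error: simple capitalization fix (would mangle names like McDonald).
            fixed = " ".join(part.capitalize() for part in value.split())
            cleaned[field] = fixed
            actions[field] = "ok" if fixed == value else "correct"
        else:
            # Accept the Error: e.g. "Oak Street" and "Oak St." are both mailable.
            cleaned[field], actions[field] = value, "accept"
    return cleaned, actions


# Hypothetical record, made up for illustration only.
record = {"name": "jane DOE", "street": "12 Oak Street", "phone": "call after 5pm", "email": ""}
cleaned, actions = clean_record(record)
print(cleaned)
print(actions)
```

Recording which protocol was applied to each field, not just the cleaned value, makes it easier to review rejected entries later and to confirm that everyone is following the same rules.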

Data Quality End Result

Data quality is in the eye of the beholder. Each organization must decide on its own definition of accuracy and completeness. As Big Data grows every year, having a standard foundation in place will help ensure better conversion rates, customer service, ROI, etc.

To learn how M1 Data & Analytics can assist you with quality data, contact us at 877-776-1195. Also check out our Definitive Guide To Mailing Lists to learn more.