Back to packs

CRM & Customer Data Quality

free

Clean customer data — validate emails, phones, addresses, deduplication, and completeness for CRM systems like Salesforce and HubSpot.

18 rules 530 downloads4.6 avg (38)
crmcustomersalesforcehubspotcontactsdeduplicationmarketing
4.6(38 ratings)

Sign in to rate this pack

Test this pack with your data

Download the template, fill in your data, and see quality results instantly.

Test This Pack

Download & Install

Choose your tool — get a ready-to-run file

Run this on your data? Upload your CSV — we'll auto-map the columns, validate, and report the bad rows.Test my data
Or use the CLI
$ npx dqhub install crm-customer-data --format soda --table YOUR_TABLE

About this pack

Data quality rules for customer/contact data in CRM systems. Covers: - Contact validation: email format, phone format (US and international) - Address: ZIP code, country code validation - Deduplication: exact and near-duplicate detection across customer records - Completeness: required customer fields, conditional requirements - Consistency: status field validation, cross-column checks - Freshness: stale customer records detection Works with data from Salesforce, HubSpot, Dynamics, or any CRM.

Sources & References

GDPR — Article 44-49

Cross-border transfer validation requires valid country identification

What's included

6format rules
5completeness rules
3uniqueness rules
2consistency rules
1referential integrity rules
1freshness rules

Checks included (18)

Valid Email Format(email)

Validates that values conform to a simplified RFC 5322 email address format. Checks for a local part containing alphanumeric characters and common special characters, an @ symbol, and a domain with at least one dot-separated label.

Valid US Phone Number Format(phone)

Validates that values conform to a US phone number format. Accepts 10-digit numbers in common formats: (XXX) XXX-XXXX, XXX-XXX-XXXX, XXX.XXX.XXXX, XXX XXX XXXX, XXXXXXXXXX, and optional +1 or 1 country code prefix.

Valid International Phone Number (E.164)(phone)

Validates that values conform to the E.164 international phone number format. Requires a + prefix followed by the country code and subscriber number, with a total length between 8 and 15 digits. Optionally allows spaces, hyphens, or dots as visual separators.

Valid US ZIP Code Format(zip_code)

Validates that values conform to a US ZIP code format. Accepts the standard 5-digit format (e.g., 90210) and the extended ZIP+4 format with a hyphen (e.g., 90210-1234). Does not validate that the ZIP code corresponds to an actual geographic area.

ISO Country Code Validation(country_code)

Validates that values are valid ISO 3166-1 alpha-2 country codes (e.g., US, GB, DE, FR)

Valid URL Format(url)

Validates that values conform to a well-formed URL format with http or https scheme. Checks for valid scheme, domain (including subdomains and TLD), optional port, path, query parameters, and fragment identifiers.

Column Not Null

Asserts that a specified column contains no null values. This is the most fundamental completeness check — every row must have a value present in the target column.

Column Completeness Threshold

Asserts that a column meets a minimum completeness threshold, measured as the percentage of non-null values. Useful when some nulls are acceptable but the overall population rate must stay above a defined level (e.g., 95%).

String Not Empty

Asserts that a string column contains no empty strings. This is distinct from a null check — a value can be non-null but still empty ('') or whitespace-only. Catches cases where upstream systems insert blank strings instead of proper nulls.

Conditional Not Null

Asserts that a target column is not null whenever a condition column has a specific value. For example, 'shipping_date must not be null when order_status is shipped'. Enforces business rules where field population depends on another field's state.

Required Fields for Status

Asserts that when a status column has a specific value (e.g., 'active'), a set of required fields must all be populated (non-null). Enforces lifecycle-based data completeness rules where later stages demand richer data.

Column Unique

Validates that all non-null values in a specified column are unique. Useful for natural keys, email addresses, identifiers, and any column where duplicates indicate a data quality issue.

Duplicate Detection

Detects and counts duplicate rows based on specified columns. Returns the number of duplicates found and identifies the offending rows. Supports threshold-based alerting for acceptable duplicate rates.

Near-Duplicate Detection

Detects near-duplicate records using string similarity measures. Identifies rows where key columns have similar but not identical values, such as name variations ("Jon Smith" vs "John Smith"), address typos, or inconsistent formatting.

Enum Value Valid

Asserts that all values in a column belong to a predefined set of allowed values. Catches typos, unexpected category values, or upstream system changes that introduce new enum variants without coordination.

Case Consistency

Asserts that a string column uses consistent casing throughout all rows. Validates that values follow a specified pattern such as all uppercase, all lowercase, title case, or a custom regex. Catches data entry inconsistencies and ETL transformation issues.

Foreign Key Valid

Validates that all non-null values in a foreign key column exist in the referenced parent table's primary key column. Detects orphaned references that break referential integrity.

Stale Data Detection

Asserts that no individual record in the table is older than the specified number of days without an update. Identifies records that may have been missed by incremental update pipelines or are stuck in a stale state.