CRM & Customer Data Quality
freeClean customer data — validate emails, phones, addresses, deduplication, and completeness for CRM systems like Salesforce and HubSpot.
Checks included (18)
Valid Email Format(email)
Validates that values conform to a simplified RFC 5322 email address format. Checks for a local part containing alphanumeric characters and common special characters, an @ symbol, and a domain with at least one dot-separated label.
Valid US Phone Number Format(phone)
Validates that values conform to a US phone number format. Accepts 10-digit numbers in common formats: (XXX) XXX-XXXX, XXX-XXX-XXXX, XXX.XXX.XXXX, XXX XXX XXXX, XXXXXXXXXX, and optional +1 or 1 country code prefix.
Valid International Phone Number (E.164)(phone)
Validates that values conform to the E.164 international phone number format. Requires a + prefix followed by the country code and subscriber number, with a total length between 8 and 15 digits. Optionally allows spaces, hyphens, or dots as visual separators.
Valid US ZIP Code Format(zip_code)
Validates that values conform to a US ZIP code format. Accepts the standard 5-digit format (e.g., 90210) and the extended ZIP+4 format with a hyphen (e.g., 90210-1234). Does not validate that the ZIP code corresponds to an actual geographic area.
ISO Country Code Validation(country_code)
Validates that values are valid ISO 3166-1 alpha-2 country codes (e.g., US, GB, DE, FR)
Valid URL Format(url)
Validates that values conform to a well-formed URL format with http or https scheme. Checks for valid scheme, domain (including subdomains and TLD), optional port, path, query parameters, and fragment identifiers.
Column Not Null
Asserts that a specified column contains no null values. This is the most fundamental completeness check — every row must have a value present in the target column.
Column Completeness Threshold
Asserts that a column meets a minimum completeness threshold, measured as the percentage of non-null values. Useful when some nulls are acceptable but the overall population rate must stay above a defined level (e.g., 95%).
String Not Empty
Asserts that a string column contains no empty strings. This is distinct from a null check — a value can be non-null but still empty ('') or whitespace-only. Catches cases where upstream systems insert blank strings instead of proper nulls.
Conditional Not Null
Asserts that a target column is not null whenever a condition column has a specific value. For example, 'shipping_date must not be null when order_status is shipped'. Enforces business rules where field population depends on another field's state.
Required Fields for Status
Asserts that when a status column has a specific value (e.g., 'active'), a set of required fields must all be populated (non-null). Enforces lifecycle-based data completeness rules where later stages demand richer data.
Column Unique
Validates that all non-null values in a specified column are unique. Useful for natural keys, email addresses, identifiers, and any column where duplicates indicate a data quality issue.
Duplicate Detection
Detects and counts duplicate rows based on specified columns. Returns the number of duplicates found and identifies the offending rows. Supports threshold-based alerting for acceptable duplicate rates.
Near-Duplicate Detection
Detects near-duplicate records using string similarity measures. Identifies rows where key columns have similar but not identical values, such as name variations ("Jon Smith" vs "John Smith"), address typos, or inconsistent formatting.
Enum Value Valid
Asserts that all values in a column belong to a predefined set of allowed values. Catches typos, unexpected category values, or upstream system changes that introduce new enum variants without coordination.
Case Consistency
Asserts that a string column uses consistent casing throughout all rows. Validates that values follow a specified pattern such as all uppercase, all lowercase, title case, or a custom regex. Catches data entry inconsistencies and ETL transformation issues.
Foreign Key Valid
Validates that all non-null values in a foreign key column exist in the referenced parent table's primary key column. Detects orphaned references that break referential integrity.
Stale Data Detection
Asserts that no individual record in the table is older than the specified number of days without an update. Identifies records that may have been missed by incremental update pipelines or are stuck in a stale state.