GDPR Data Accuracy & Privacy
Data quality rules for the GDPR Article 5(1)(d) accuracy principle: validate personal data, consent records, and retention policies.
Checks included (22)
Column Not Null
Asserts that a specified column contains no null values. This is the most fundamental completeness check — every row must have a value present in the target column.
Column Completeness Threshold
Asserts that a column meets a minimum completeness threshold, measured as the percentage of non-null values. Useful when some nulls are acceptable but the overall population rate must stay above a defined level (e.g., 95%).
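Below is a minimal Python sketch of the completeness calculation. It assumes rows arrive as a list of dictionaries and that a missing key counts as null. The function names and the 95% default are illustrative and are not part of any particular tool's API.

```python
from typing import Any, Iterable, Mapping


def completeness_pct(rows: Iterable[Mapping[str, Any]], column: str) -> float:
    """Percentage of rows whose `column` value is not None (missing keys count as null)."""
    rows = list(rows)
    if not rows:
        return 100.0  # assumption: an empty table is treated as fully complete
    non_null = sum(1 for row in rows if row.get(column) is not None)
    return 100.0 * non_null / len(rows)


def check_completeness(rows: Iterable[Mapping[str, Any]], column: str, min_pct: float = 95.0) -> bool:
    """Pass when the non-null rate meets or exceeds `min_pct`."""
    return completeness_pct(rows, column) >= min_pct


rows = [{"email": f"user{i}@example.com"} for i in range(19)] + [{"email": None}]
print(completeness_pct(rows, "email"))    # 95.0
print(check_completeness(rows, "email"))  # True
```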
String Not Empty
Asserts that a string column contains no empty or whitespace-only strings. This is distinct from a null check: a value can be non-null but still be empty ('') or contain only whitespace. Catches cases where upstream systems insert blank strings instead of proper nulls.
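Here is one way to express the distinction in Python. This sketch assumes list-of-dict rows and a hypothetical function name. Only string values are inspected, so nulls are left to a separate not-null check. `str.strip()` also treats tabs and newlines as whitespace.

```python
from typing import Any, Mapping, Sequence


def blank_string_rows(rows: Sequence[Mapping[str, Any]], column: str) -> list[int]:
    """Indexes of rows whose value is '' or whitespace-only; None is deliberately ignored."""
    return [
        i
        for i, row in enumerate(rows)
        if isinstance(row.get(column), str) and row[column].strip() == ""
    ]


rows = [{"name": "Ada"}, {"name": ""}, {"name": "   "}, {"name": None}, {"name": "\t"}]
print(blank_string_rows(rows, "name"))  # [1, 2, 4]
```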
Conditional Not Null
Asserts that a target column is not null whenever a condition column has a specific value. For example, 'shipping_date must not be null when order_status is shipped'. Enforces business rules where field population depends on another field's state.
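The shipping example can be sketched as a row-level predicate. This assumes list-of-dict rows. The column names come from the example in the description, and the function name is hypothetical.

```python
from typing import Any, Mapping, Sequence


def conditional_not_null_violations(
    rows: Sequence[Mapping[str, Any]],
    target: str,
    condition_column: str,
    condition_value: Any,
) -> list[int]:
    """Indexes of rows where condition_column == condition_value but target is null."""
    return [
        i
        for i, row in enumerate(rows)
        if row.get(condition_column) == condition_value and row.get(target) is None
    ]


orders = [
    {"order_status": "shipped", "shipping_date": "2024-05-02"},
    {"order_status": "shipped", "shipping_date": None},   # violation
    {"order_status": "pending", "shipping_date": None},   # allowed: not shipped yet
]
print(conditional_not_null_violations(orders, "shipping_date", "order_status", "shipped"))  # [1]
```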
Required Fields for Status
Asserts that when a status column has a specific value (e.g., 'active'), a set of required fields must all be populated (non-null). Enforces lifecycle-based data completeness rules where later stages demand richer data.
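This sketch of the lifecycle rule reports which required fields are missing in each row. That is usually more actionable than a single pass/fail flag. The status value, field names, and function name are illustrative assumptions.

```python
from typing import Any, Mapping, Sequence


def missing_required_fields(
    rows: Sequence[Mapping[str, Any]],
    status_column: str,
    status_value: Any,
    required: Sequence[str],
) -> dict[int, list[str]]:
    """Map row index -> required fields that are null, for rows in the given status only."""
    violations: dict[int, list[str]] = {}
    for i, row in enumerate(rows):
        if row.get(status_column) != status_value:
            continue  # the rule only applies to rows in the target status
        missing = [field for field in required if row.get(field) is None]
        if missing:
            violations[i] = missing
    return violations


customers = [
    {"status": "active", "email": "ada@example.com", "phone": None},
    {"status": "pending", "email": None, "phone": None},
    {"status": "active", "email": "alan@example.com", "phone": "+441234567890"},
]
print(missing_required_fields(customers, "status", "active", ["email", "phone"]))  # {0: ['phone']}
```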
GDPR Right to Erasure Completion Check
Validates that records flagged for deletion have been processed. When a deletion request exists (deletion_requested = true), the record must have a deleted_at timestamp confirming erasure. Requests left unfulfilled risk violating GDPR Article 17, which requires erasure without undue delay.
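Expressed as SQL, the check is a simple filter. The sketch below runs it against an in-memory SQLite database through Python's built-in sqlite3 module. The customers table and its id column are hypothetical; deletion_requested and deleted_at follow the description.

```python
import sqlite3

# SQLite stores booleans as 0/1, so "deletion_requested = true" becomes "= 1".
UNFULFILLED_ERASURES_SQL = """
SELECT id
FROM customers
WHERE deletion_requested = 1
  AND deleted_at IS NULL
ORDER BY id
"""


def unfulfilled_erasures(conn: sqlite3.Connection) -> list[int]:
    """Ids of customers with a pending deletion request and no erasure timestamp."""
    return [row[0] for row in conn.execute(UNFULFILLED_ERASURES_SQL)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, deletion_requested INTEGER, deleted_at TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 1, None), (2, 1, "2024-05-01T10:00:00Z"), (3, 0, None)],
)
print(unfulfilled_erasures(conn))  # [1]
```

Article 17 requires erasure without undue delay rather than instantly. A production variant might therefore tolerate requests younger than an agreed response window. That refinement would need a request timestamp, which the description does not mention.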
GDPR Data Minimization Check (sensitive_field)
Validates that optional sensitive fields are NULL when the stated processing purpose does not require them. Supports compliance with the data minimization principle (GDPR Article 5(1)(c)): only data necessary for the specific purpose should be collected.
Valid Email Format (email)
Validates that values conform to a simplified RFC 5322 email address format. Checks for a local part containing alphanumeric characters and common special characters, an @ symbol, and a domain made of at least two dot-separated labels (for example, example.com).
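The pattern below is a simplified sketch in that spirit. It is deliberately stricter than full RFC 5322: it allows no quoted local parts and no IP-literal domains. The exact character set is an assumption, not this rule's published definition.

```python
import re

# Local part: letters, digits and . _ % + -; then "@"; then two or more
# dot-separated domain labels (letters, digits, hyphens).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def is_valid_email(value: str) -> bool:
    """True if the whole string matches the simplified email pattern."""
    return EMAIL_RE.fullmatch(value) is not None


for candidate in ["ada@example.com", "ada.lovelace+news@mail.example.co.uk", "ada@localhost", "ada@@example.com"]:
    print(candidate, is_valid_email(candidate))
```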
Valid International Phone Number (E.164) (phone)
Validates that values conform to the E.164 international phone number format. Requires a + prefix followed by the country code and subscriber number, with a total length between 8 and 15 digits. Optionally allows spaces, hyphens, or dots as visual separators.
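This regex sketch encodes the rule as follows:

- The first digit after + is 1-9, since E.164 country codes never start with 0.
- Each following digit may be preceded by one optional space, hyphen, or dot.
- The total is 8 to 15 digits.

The pattern is an assumption about how the check might be implemented. It validates shape only, not whether the number is actually assigned.

```python
import re

# "+", a non-zero leading digit, then 7-14 more digits, each optionally preceded
# by a single separator (space, hyphen or dot): 8-15 digits in total.
E164_RE = re.compile(r"\+[1-9](?:[ .-]?\d){7,14}")


def is_valid_e164(value: str) -> bool:
    return E164_RE.fullmatch(value) is not None


print(is_valid_e164("+442071838750"))     # True
print(is_valid_e164("+44 20 7183 8750"))  # True
print(is_valid_e164("442071838750"))      # False: missing "+"
print(is_valid_e164("+1234567"))          # False: only 7 digits
```

For validation against real numbering plans, the phonenumbers package (a Python port of Google's libphonenumber) is a common choice.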
Valid US Phone Number Format (phone)
Validates that values conform to a US phone number format. Accepts 10-digit numbers in common formats: (XXX) XXX-XXXX, XXX-XXX-XXXX, XXX.XXX.XXXX, XXX XXX XXXX, and XXXXXXXXXX, each with an optional +1 or 1 country code prefix.
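This sketch uses a pattern covering exactly the listed layouts. The backreference \1 forces the same separator (or none) in both positions, so a mixed form such as 202-555.0143 is rejected. That strictness is a design assumption. So is the decision not to enforce North American Numbering Plan rules, such as area codes never starting with 0 or 1.

```python
import re

US_PHONE_RE = re.compile(
    r"(?:\+?1[ .-]?)?"                 # optional +1 or 1 country code, optional separator
    r"(?:\(\d{3}\) \d{3}-\d{4}"        # (XXX) XXX-XXXX
    r"|\d{3}([ .-]?)\d{3}\1\d{4})"     # XXX-XXX-XXXX, XXX.XXX.XXXX, XXX XXX XXXX, XXXXXXXXXX
)


def is_valid_us_phone(value: str) -> bool:
    return US_PHONE_RE.fullmatch(value) is not None


for candidate in ["(202) 555-0143", "202.555.0143", "+1 202-555-0143", "12025550143", "202-555.0143"]:
    print(candidate, is_valid_us_phone(candidate))
```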
ISO Country Code Validation (country_code)
Validates that values are valid ISO 3166-1 alpha-2 country codes (e.g., US, GB, DE, FR).
Valid Date String Format (event_date)
Validates that date string values match the expected format. Supports configurable formats including YYYY-MM-DD (ISO 8601), MM/DD/YYYY, DD/MM/YYYY, YYYY/MM/DD, and DD-Mon-YYYY. Validates month (01-12), day (01-31), and reasonable year ranges.
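One way to sketch the configurable formats in Python is to map each format name to a datetime.strptime pattern. The sketch then requires a round trip through strftime, which rejects unpadded values such as 2024-5-1. Parsing also rejects impossible dates like 2023-02-30, which is stricter than a plain 01-31 day range.

Two assumptions apply:

- The 1900-2100 defaults stand in for the unspecified reasonable year range.
- %b expects English month abbreviations under Python's default C locale for time formatting.

```python
from datetime import datetime

# Format names from the rule description, mapped to strptime/strftime patterns.
FORMATS = {
    "YYYY-MM-DD": "%Y-%m-%d",
    "MM/DD/YYYY": "%m/%d/%Y",
    "DD/MM/YYYY": "%d/%m/%Y",
    "YYYY/MM/DD": "%Y/%m/%d",
    "DD-Mon-YYYY": "%d-%b-%Y",
}


def is_valid_date_string(value: str, fmt: str = "YYYY-MM-DD",
                         min_year: int = 1900, max_year: int = 2100) -> bool:
    pattern = FORMATS[fmt]  # KeyError for an unsupported format name
    try:
        parsed = datetime.strptime(value, pattern)
    except ValueError:
        return False  # wrong layout, month outside 01-12, or day invalid for that month
    if parsed.strftime(pattern) != value:
        return False  # parsed, but not zero-padded / canonical (e.g. "2024-5-1")
    return min_year <= parsed.year <= max_year


print(is_valid_date_string("2024-02-29"))                  # True (leap year)
print(is_valid_date_string("2023-02-29"))                  # False
print(is_valid_date_string("31/12/2024", "DD/MM/YYYY"))    # True
print(is_valid_date_string("15-Mar-2024", "DD-Mon-YYYY"))  # True
```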
Date Not In Future
Validates that a date or timestamp column contains no values in the future. Catches data entry errors, timezone issues, and ETL bugs that produce future-dated records for columns like birth_date, transaction_date, or created_at.
Date Not Too Old
Validates that date values are not older than a configurable threshold. Catches default dates (1900-01-01, 1970-01-01), data migration artifacts, and stale records that may indicate data loading or conversion errors.
GDPR Consent Expiry Validation (consent_expiry_date)
Validates that consent records have not expired. Consent expiry dates must be in the future or NULL (no expiry). Once consent has expired, processing that relies on it as its lawful basis is no longer lawful under GDPR.
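Below is a sketch of the expiry rule. It assumes consent_expiry_date holds datetime.date values; mixing in datetime values would make the comparison raise TypeError. It also assumes an expiry equal to today already counts as expired, since the description requires dates strictly in the future. The today parameter is a hypothetical hook that makes the check deterministic in tests.

```python
from datetime import date
from typing import Any, Mapping, Optional, Sequence


def expired_consents(
    rows: Sequence[Mapping[str, Any]],
    column: str = "consent_expiry_date",
    today: Optional[date] = None,
) -> list[int]:
    """Indexes of rows whose consent expiry is today or earlier; NULL means no expiry."""
    today = today or date.today()
    return [
        i
        for i, row in enumerate(rows)
        if row.get(column) is not None and row[column] <= today
    ]


consents = [
    {"consent_expiry_date": date(2024, 5, 31)},  # expired
    {"consent_expiry_date": date(2024, 6, 1)},   # expires today: treated as expired
    {"consent_expiry_date": date(2025, 1, 1)},   # still valid
    {"consent_expiry_date": None},               # no expiry
]
print(expired_consents(consents, today=date(2024, 6, 1)))  # [0, 1]
```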
GDPR Data Retention Period Check (created_at)
Validates that personal data records do not exceed their retention period. Records older than the configured retention limit should be flagged for deletion to comply with the GDPR storage limitation principle (Article 5(1)(e)).
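Below is a sketch of the retention cutoff. It assumes timezone-aware created_at timestamps, since Python refuses to compare naive and aware datetimes. It also assumes a retention limit expressed in days.

The 730-day figure in the usage lines is an arbitrary placeholder, not a GDPR requirement. The regulation sets no fixed period, so choosing and justifying one is left to the controller.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Mapping, Optional, Sequence


def past_retention(
    rows: Sequence[Mapping[str, Any]],
    retention_days: int,
    column: str = "created_at",
    now: Optional[datetime] = None,
) -> list[int]:
    """Indexes of rows created before now - retention_days (candidates for deletion)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [
        i
        for i, row in enumerate(rows)
        if row.get(column) is not None and row[column] < cutoff
    ]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    {"created_at": datetime(2021, 3, 15, tzinfo=timezone.utc)},  # older than 730 days
    {"created_at": datetime(2023, 9, 1, tzinfo=timezone.utc)},   # within the limit
]
print(past_retention(records, retention_days=730, now=now))  # [0]
```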
Column Unique
Validates that all non-null values in a specified column are unique. Useful for natural keys, email addresses, identifiers, and any column where duplicates indicate a data quality issue.
Duplicate Detection
Detects and counts duplicate rows based on specified columns. Returns the number of duplicates found and identifies the offending rows. Supports threshold-based alerting for acceptable duplicate rates.
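Below is a sketch of key-based duplicate detection. The description leaves three details open, so each is an assumption here:

- The duplicate count means the extra copies beyond the first occurrence of each key.
- Rows with null key parts group together.
- The max_rate default is a placeholder.

```python
from collections import defaultdict
from typing import Any, Mapping, Sequence


def find_duplicates(
    rows: Sequence[Mapping[str, Any]], key_columns: Sequence[str]
) -> tuple[int, dict[tuple, list[int]]]:
    """Return (extra-copy count, {key: row indexes}) for keys seen more than once."""
    groups: dict[tuple, list[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row.get(c) for c in key_columns)].append(i)
    dupes = {key: idx for key, idx in groups.items() if len(idx) > 1}
    extra_copies = sum(len(idx) - 1 for idx in dupes.values())
    return extra_copies, dupes


def duplicate_rate_ok(rows, key_columns, max_rate: float = 0.01) -> bool:
    """Threshold alert: pass when extra copies / total rows is at most max_rate."""
    if not rows:
        return True
    extra_copies, _ = find_duplicates(rows, key_columns)
    return extra_copies / len(rows) <= max_rate


people = [
    {"email": "ada@example.com", "country": "GB"},
    {"email": "ada@example.com", "country": "GB"},
    {"email": "alan@example.com", "country": "GB"},
    {"email": "ada@example.com", "country": "GB"},
]
print(find_duplicates(people, ["email", "country"]))  # (2, {('ada@example.com', 'GB'): [0, 1, 3]})
```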
Table Freshness
Asserts that a table has been updated within the specified number of hours. Uses the table's metadata (last modified timestamp) or a designated timestamp column to verify data is fresh and pipelines are running on schedule.
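For the timestamp-column variant, the check reduces to comparing the newest timestamp with the current time. This sketch assumes timezone-aware datetimes and treats an empty table as stale. Reading the table's last-modified metadata instead is warehouse-specific and not shown.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def is_fresh(
    timestamps: Iterable[Optional[datetime]],
    max_age_hours: float,
    now: Optional[datetime] = None,
) -> bool:
    """True if the most recent non-null timestamp is at most max_age_hours old."""
    now = now or datetime.now(timezone.utc)
    values = [t for t in timestamps if t is not None]
    if not values:
        return False  # assumption: no data at all counts as stale
    return now - max(values) <= timedelta(hours=max_age_hours)


now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
loaded_at = [datetime(2024, 6, 1, 3, 0, tzinfo=timezone.utc), None]
print(is_fresh(loaded_at, max_age_hours=24, now=now))  # True
print(is_fresh(loaded_at, max_age_hours=6, now=now))   # False (9 hours old)
```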
Stale Data Detection
Asserts that no individual record in the table has gone without an update for longer than the specified number of days. Identifies records that incremental update pipelines may have missed or that are stuck in a stale state.
Foreign Key Valid
Validates that all non-null values in a foreign key column exist in the referenced parent table's primary key column. Detects orphaned references that break referential integrity.
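Referential checks are usually written as an anti-join. The sketch below uses Python's built-in sqlite3 module with hypothetical orders and customers tables. Null foreign keys are skipped, matching the description.

```python
import sqlite3

# LEFT JOIN keeps every child row; a NULL parent id means no match was found.
ORPHANS_SQL = """
SELECT o.id
FROM orders AS o
LEFT JOIN customers AS c ON c.id = o.customer_id
WHERE o.customer_id IS NOT NULL
  AND c.id IS NULL
ORDER BY o.id
"""


def orphaned_order_ids(conn: sqlite3.Connection) -> list[int]:
    return [row[0] for row in conn.execute(ORPHANS_SQL)]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders VALUES (10, 1), (11, 3), (12, NULL);
    """
)
print(orphaned_order_ids(conn))  # [11]
```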
Enum Value Valid
Asserts that all values in a column belong to a predefined set of allowed values. Catches typos, unexpected category values, or upstream system changes that introduce new enum variants without coordination.
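This sketch groups unexpected values, so a new variant (or a casing typo such as Active) surfaces once with all its rows. Comparison is exact and case-sensitive. Skipping nulls by default is an assumption, and the function name is hypothetical.

```python
from typing import Any, Iterable, Mapping, Sequence


def invalid_enum_values(
    rows: Sequence[Mapping[str, Any]],
    column: str,
    allowed: Iterable[Any],
    ignore_null: bool = True,
) -> dict[Any, list[int]]:
    """Map each unexpected value to the row indexes where it appears."""
    allowed_set = frozenset(allowed)
    unexpected: dict[Any, list[int]] = {}
    for i, row in enumerate(rows):
        value = row.get(column)
        if value is None and ignore_null:
            continue  # nulls are left to the not-null checks
        if value not in allowed_set:
            unexpected.setdefault(value, []).append(i)
    return unexpected


statuses = [{"status": s} for s in ["active", "inactive", "Active", "pending", None]]
print(invalid_enum_values(statuses, "status", {"active", "inactive", "pending"}))  # {'Active': [2]}
```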