GDPR Data Accuracy & Privacy

GDPR · free

Data quality rules for the GDPR Article 5(1)(d) accuracy principle: validate personal data, consent records, and retention policies.

22 rules · 2230 downloads · 4.0 avg (152 ratings)
Tags: gdpr, privacy, personal-data, consent, retention, pii, data-protection

Test this pack with your data

Download the template, fill in your data, and see quality results instantly.

Download & Install

Choose your tool — get a ready-to-run file

Run this on your data? Upload your CSV and we'll auto-map the columns, validate, and report the bad rows.
Or use the CLI
$ npx dqhub install gdpr-data-accuracy --format soda --table YOUR_TABLE

About this pack

Rule pack for organizations processing personal data under GDPR. Covers:
- Article 5(1)(d) Accuracy: Contact data validation (email, phone), format checks
- Article 5(1)(e) Storage Limitation: Data retention/age checks
- Article 7 Consent: Completeness of consent records, date validity
- Article 25 Data Protection by Design: PII field completeness, standardization
- Article 30 Records of Processing: Required field validation
- Articles 44-49 Cross-border Transfers: Country code validation

Designed for Data Protection Officers and compliance teams.

Sources & References

GDPR — Article 44-49

Cross-border transfer validation requires valid country identification

Consent must be current and valid; processing based on expired consent is unlawful

GDPR — Recital 32 — Conditions for Consent

Consent should have a clear indication of agreement and be withdrawable

Personal data must be kept for no longer than necessary for the purposes for which it is processed

Data subjects have the right to obtain erasure of personal data without undue delay

Personal data must be adequate, relevant and limited to what is necessary for the purposes of processing

What's included

7 completeness rules
5 format rules
4 range rules
2 uniqueness rules
2 freshness rules
1 referential integrity rule
1 consistency rule

Checks included (22)

Column Not Null

Asserts that a specified column contains no null values. This is the most fundamental completeness check — every row must have a value present in the target column.

Column Completeness Threshold

Asserts that a column meets a minimum completeness threshold, measured as the percentage of non-null values. Useful when some nulls are acceptable but the overall population rate must stay above a defined level (e.g., 95%).
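
The two completeness checks above can be sketched in a few lines of Python. This is only an illustration, assuming a column is available as a list in which None stands for null; the function names and the 0.95 default are hypothetical and not taken from the pack.

```python
from typing import Optional, Sequence

def completeness_ratio(values: Sequence[Optional[object]]) -> float:
    """Return the fraction of non-null values; an empty column counts as complete."""
    if not values:
        return 1.0
    non_null = sum(1 for v in values if v is not None)
    return non_null / len(values)

def meets_threshold(values: Sequence[Optional[object]], threshold: float = 0.95) -> bool:
    """True when the non-null share is at or above the threshold (assumed default)."""
    return completeness_ratio(values) >= threshold

emails = ["a@example.com", None, "b@example.com", "c@example.com"]
print(completeness_ratio(emails))    # 0.75
print(meets_threshold(emails))       # False
print(meets_threshold(emails, 0.7))  # True
```

A strict Column Not Null check corresponds to a threshold of 1.0 in this sketch.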

String Not Empty

Asserts that a string column contains no empty strings. This is distinct from a null check — a value can be non-null but still empty ('') or whitespace-only. Catches cases where upstream systems insert blank strings instead of proper nulls.
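
To make the difference between null and empty concrete, here is a minimal Python sketch. It assumes None represents null and leaves nulls to the null check; the helper names are hypothetical.

```python
def is_blank(value):
    """True for '' or whitespace-only strings; None is not treated as blank here."""
    return isinstance(value, str) and value.strip() == ""

def blank_row_indices(values):
    """Return the positions of empty or whitespace-only strings."""
    return [i for i, v in enumerate(values) if is_blank(v)]

names = ["Ada", "", "   ", None, "Grace"]
print(blank_row_indices(names))  # [1, 2]
```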

Conditional Not Null

Asserts that a target column is not null whenever a condition column has a specific value. For example, 'shipping_date must not be null when order_status is shipped'. Enforces business rules where field population depends on another field's state.

Required Fields for Status

Asserts that when a status column has a specific value (e.g., 'active'), a set of required fields must all be populated (non-null). Enforces lifecycle-based data completeness rules where later stages demand richer data.
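
Both conditional checks above follow the same pattern: filter rows by a condition value, then look for nulls in the dependent fields. A rough Python sketch, assuming rows are dictionaries and None stands for null (the function and column names are illustrative assumptions):

```python
def missing_required(rows, status_column, status_value, required_fields):
    """Return (row_index, missing_fields) for rows in the given status with null required fields."""
    violations = []
    for i, row in enumerate(rows):
        if row.get(status_column) != status_value:
            continue  # the rule only applies to rows in the target status
        missing = [f for f in required_fields if row.get(f) is None]
        if missing:
            violations.append((i, missing))
    return violations

rows = [
    {"status": "active", "email": "a@example.com", "consent_date": "2024-01-05"},
    {"status": "active", "email": None, "consent_date": None},
    {"status": "pending", "email": None, "consent_date": None},
]
print(missing_required(rows, "status", "active", ["email", "consent_date"]))
# [(1, ['email', 'consent_date'])]
```

Passing a single field in required_fields gives the Conditional Not Null behavior.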

GDPR Right to Erasure Completion Check

Validates that records flagged for deletion have been processed. When a deletion request exists (deletion_requested = true), the record must have a deleted_at timestamp confirming erasure. Unfulfilled deletion requests violate GDPR Article 17.
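
A minimal sketch of this check in Python, using the deletion_requested and deleted_at columns named above. The id column and the function name are assumptions added for the example.

```python
def unfulfilled_erasures(rows):
    """Return ids of rows with a deletion request but no deleted_at timestamp."""
    return [
        row["id"]
        for row in rows
        if row.get("deletion_requested") is True and row.get("deleted_at") is None
    ]

rows = [
    {"id": 1, "deletion_requested": True, "deleted_at": "2024-05-01T10:00:00Z"},
    {"id": 2, "deletion_requested": True, "deleted_at": None},
    {"id": 3, "deletion_requested": False, "deleted_at": None},
]
print(unfulfilled_erasures(rows))  # [2]
```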

GDPR Data Minimization Check (sensitive_field)

Validates that optional sensitive fields are NULL when not required by the stated processing purpose. Ensures compliance with the data minimization principle — only data necessary for the specific purpose should be collected.
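
One possible way to express this rule is a mapping from processing purpose to the sensitive fields that purpose justifies. The sketch below is hypothetical throughout: the purpose column, the purposes, and the field names are invented for illustration and are not part of the pack.

```python
# Hypothetical mapping: which sensitive fields each processing purpose may hold.
ALLOWED_SENSITIVE = {
    "newsletter": set(),
    "age_verification": {"birth_date"},
}

def minimization_violations(row, sensitive_fields, purpose_column="purpose"):
    """Return sensitive fields that are populated although the purpose does not require them."""
    allowed = ALLOWED_SENSITIVE.get(row.get(purpose_column), set())
    return [f for f in sensitive_fields if f not in allowed and row.get(f) is not None]

row = {"purpose": "newsletter", "birth_date": "1990-01-01", "health_note": None}
print(minimization_violations(row, ["birth_date", "health_note"]))  # ['birth_date']
```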

Valid Email Format (email)

Validates that values conform to a simplified RFC 5322 email address format. Checks for a local part containing alphanumeric characters and common special characters, an @ symbol, and a domain with at least one dot-separated label.
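
The pack's exact pattern is not shown here, so the regular expression below is an assumption that follows the description: a local part of alphanumerics and common special characters, an @ symbol, and a domain with at least one dot.

```python
import re

# Assumed simplified pattern; full RFC 5322 allows far more than this.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def is_valid_email(value):
    """True when the whole string matches the simplified email pattern."""
    return isinstance(value, str) and EMAIL_PATTERN.fullmatch(value) is not None

print(is_valid_email("user.name+tag@example.co.uk"))  # True
print(is_valid_email("user@localhost"))               # False
print(is_valid_email("a b@example.com"))              # False
```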

Valid International Phone Number (E.164) (phone)

Validates that values conform to the E.164 international phone number format. Requires a + prefix followed by the country code and subscriber number, with a total length between 8 and 15 digits. Optionally allows spaces, hyphens, or dots as visual separators.
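
A sketch of the described rule in Python: strip the permitted visual separators, then require a + followed by 8 to 15 digits. The pattern is an assumption based on the description, and it additionally rejects a leading zero because E.164 country codes do not start with 0.

```python
import re

SEPARATORS = re.compile(r"[ .-]")
E164_PATTERN = re.compile(r"\+[1-9][0-9]{7,14}")  # 8 to 15 digits in total

def is_e164(value):
    """True when the value is a + prefix and 8 to 15 digits once separators are removed."""
    if not isinstance(value, str):
        return False
    compact = SEPARATORS.sub("", value)
    return E164_PATTERN.fullmatch(compact) is not None

print(is_e164("+14155552671"))      # True
print(is_e164("+44 20 7946 0958"))  # True
print(is_e164("14155552671"))       # False, missing + prefix
print(is_e164("+1234567"))          # False, only 7 digits
```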

Valid US Phone Number Format (phone)

Validates that values conform to a US phone number format. Accepts 10-digit numbers in common formats: (XXX) XXX-XXXX, XXX-XXX-XXXX, XXX.XXX.XXXX, XXX XXX XXXX, XXXXXXXXXX, and optional +1 or 1 country code prefix.
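
The listed formats can be covered by a single regular expression. The pattern below is an assumed reading of the description, not the pack's own rule, and it does not validate area codes.

```python
import re

# Optional +1 or 1 prefix, then 3-3-4 digits with optional space, dot, or hyphen separators.
US_PHONE_PATTERN = re.compile(
    r"(?:\+?1[ .-]?)?(?:\(\d{3}\) ?|\d{3}[ .-]?)\d{3}[ .-]?\d{4}"
)

def is_us_phone(value):
    """True when the whole string matches one of the common US phone layouts."""
    return isinstance(value, str) and US_PHONE_PATTERN.fullmatch(value) is not None

for sample in ["(415) 555-2671", "415.555.2671", "+1 415 555 2671", "555-2671"]:
    print(sample, is_us_phone(sample))
# (415) 555-2671 True
# 415.555.2671 True
# +1 415 555 2671 True
# 555-2671 False
```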

ISO Country Code Validation (country_code)

Validates that values are valid ISO 3166-1 alpha-2 country codes (e.g., US, GB, DE, FR).
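
In practice this is a set-membership test against the ISO 3166-1 alpha-2 list. The sketch below uses a deliberately small sample of codes to stay short; a real check would load the complete list. The names are hypothetical, and the case-sensitive comparison is an assumption.

```python
# Illustrative subset only; the full ISO 3166-1 alpha-2 list has about 250 entries.
ISO_ALPHA2_SAMPLE = frozenset({"US", "GB", "DE", "FR", "IE", "NL", "ES", "IT"})

def is_valid_country_code(value, known_codes=ISO_ALPHA2_SAMPLE):
    """True when the value is an exact, case-sensitive member of the code set."""
    return isinstance(value, str) and value in known_codes

print(is_valid_country_code("GB"))  # True
print(is_valid_country_code("UK"))  # False, the assigned code for the United Kingdom is GB
print(is_valid_country_code("de"))  # False, lowercase is rejected in this sketch
```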

Valid Date String Format (event_date)

Validates that date string values match the expected format. Supports configurable formats including YYYY-MM-DD (ISO 8601), MM/DD/YYYY, DD/MM/YYYY, YYYY/MM/DD, and DD-Mon-YYYY. Validates month (01-12), day (01-31), and reasonable year ranges.
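
One way to approximate this check in Python is datetime.strptime with a format chosen per configuration. This is a sketch under assumptions: the year bounds are invented defaults, and strptime behaves differently from a pure pattern check. It rejects impossible calendar dates such as 2024-02-30, accepts unpadded values such as 2024-1-5, and reads month abbreviations according to the current locale.

```python
from datetime import datetime

# Map the format names from the description to strptime directives.
FORMATS = {
    "YYYY-MM-DD": "%Y-%m-%d",
    "MM/DD/YYYY": "%m/%d/%Y",
    "DD/MM/YYYY": "%d/%m/%Y",
    "YYYY/MM/DD": "%Y/%m/%d",
    "DD-Mon-YYYY": "%d-%b-%Y",
}

def matches_date_format(value, fmt="YYYY-MM-DD", min_year=1900, max_year=2100):
    """True when the string parses in the given format and the year is in range (assumed bounds)."""
    try:
        parsed = datetime.strptime(value, FORMATS[fmt])
    except (TypeError, ValueError):
        return False
    return min_year <= parsed.year <= max_year

print(matches_date_format("2024-02-29"))                # True
print(matches_date_format("2024-02-30"))                # False
print(matches_date_format("31/12/2023", "DD/MM/YYYY"))  # True
print(matches_date_format("12/31/2023", "DD/MM/YYYY"))  # False
```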

Date Not In Future

Validates that a date or timestamp column contains no values in the future. Catches data entry errors, timezone issues, and ETL bugs that produce future-dated records for columns like birth_date, transaction_date, or created_at.

Date Not Too Old

Validates that date values are not older than a configurable threshold. Catches default dates (1900-01-01, 1970-01-01), data migration artifacts, and stale records that may indicate data loading or conversion errors.
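
The two range checks above can be combined into a single classification step. The following Python sketch assumes date values rather than strings, takes the reference date as a parameter so the result is reproducible, and uses a hypothetical 50-year threshold.

```python
from datetime import date, timedelta

def date_range_issue(value, today, max_age_days):
    """Return 'future', 'too_old', or None for a date compared against today."""
    if value > today:
        return "future"
    if value < today - timedelta(days=max_age_days):
        return "too_old"
    return None

today = date(2024, 6, 1)
fifty_years = 50 * 365  # assumed threshold for the example
print(date_range_issue(date(2024, 6, 2), today, fifty_years))  # future
print(date_range_issue(date(1970, 1, 1), today, fifty_years))  # too_old
print(date_range_issue(date(2000, 1, 1), today, fifty_years))  # None
```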

GDPR Consent Expiry Validation (consent_expiry_date)

Validates that consent records have not expired. Consent expiry dates must be in the future or NULL (no expiry). Expired consent means processing is no longer lawful under GDPR.
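
Expressed in Python, the rule is a one-line predicate. The sketch assumes None represents NULL and treats an expiry date equal to today as already expired, which is an assumption about the boundary rather than something the description specifies.

```python
from datetime import date

def consent_is_current(expiry, today):
    """True when there is no expiry date or the expiry lies strictly in the future."""
    return expiry is None or expiry > today

today = date(2024, 6, 1)
print(consent_is_current(None, today))               # True
print(consent_is_current(date(2025, 1, 1), today))   # True
print(consent_is_current(date(2023, 12, 31), today)) # False
```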

GDPR Data Retention Period Check (created_at)

Validates that personal data records do not exceed their retention period. Records older than the configured retention limit should be flagged for deletion to comply with the GDPR storage limitation principle.
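
A short sketch of the retention comparison, assuming created_at is available as a datetime and the retention limit is configured in days. The 730-day limit and the function name are example values, not defaults of the pack.

```python
from datetime import datetime, timedelta

def past_retention(created_at, now, retention_days):
    """True when the record is older than the configured retention limit."""
    return now - created_at > timedelta(days=retention_days)

now = datetime(2024, 6, 1)
print(past_retention(datetime(2021, 1, 15), now, retention_days=730))  # True
print(past_retention(datetime(2023, 9, 1), now, retention_days=730))   # False
```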

Column Unique

Validates that all non-null values in a specified column are unique. Useful for natural keys, email addresses, identifiers, and any column where duplicates indicate a data quality issue.

Duplicate Detection

Detects and counts duplicate rows based on specified columns. Returns the number of duplicates found and identifies the offending rows. Supports threshold-based alerting for acceptable duplicate rates.
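
Both uniqueness checks reduce to counting occurrences of a key. A minimal Python sketch, assuming rows are dictionaries and skipping keys that contain nulls so that it matches the non-null wording of Column Unique (the function name is hypothetical):

```python
from collections import Counter

def duplicate_keys(rows, key_columns):
    """Return {key_tuple: count} for keys that occur more than once, ignoring keys with nulls."""
    keys = (tuple(row.get(c) for c in key_columns) for row in rows)
    counts = Counter(k for k in keys if None not in k)
    return {key: n for key, n in counts.items() if n > 1}

rows = [
    {"email": "a@example.com"},
    {"email": "b@example.com"},
    {"email": "a@example.com"},
    {"email": None},
    {"email": None},
]
print(duplicate_keys(rows, ["email"]))  # {('a@example.com',): 2}
```

A threshold-based alert could then compare the number of duplicated rows against the table size.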

Table Freshness

Asserts that a table has been updated within the specified number of hours. Uses the table's metadata (last modified timestamp) or a designated timestamp column to verify data is fresh and pipelines are running on schedule.

Stale Data Detection

Asserts that no individual record in the table is older than the specified number of days without an update. Identifies records that may have been missed by incremental update pipelines or are stuck in a stale state.
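
The two freshness checks differ in scope: one looks at the newest timestamp in the table, the other at every row. The sketch below shows both on a list of timestamps from a designated column; reading table metadata is not covered, and the names and thresholds are assumptions.

```python
from datetime import datetime, timedelta

def table_is_fresh(timestamps, now, max_age_hours):
    """True when the newest timestamp is within max_age_hours of now."""
    if not timestamps:
        return False  # an empty table is treated as not fresh in this sketch
    return now - max(timestamps) <= timedelta(hours=max_age_hours)

def stale_row_indices(timestamps, now, max_age_days):
    """Return positions of rows last updated more than max_age_days ago."""
    cutoff = now - timedelta(days=max_age_days)
    return [i for i, ts in enumerate(timestamps) if ts < cutoff]

now = datetime(2024, 6, 1, 12, 0)
updated_at = [
    datetime(2024, 6, 1, 9, 0),
    datetime(2024, 3, 1, 0, 0),
    datetime(2024, 5, 20, 0, 0),
]
print(table_is_fresh(updated_at, now, max_age_hours=24))    # True
print(stale_row_indices(updated_at, now, max_age_days=30))  # [1]
```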

Foreign Key Valid

Validates that all non-null values in a foreign key column exist in the referenced parent table's primary key column. Detects orphaned references that break referential integrity.
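
As an illustration, orphaned references can be found with a set lookup. The sketch assumes both columns fit in memory and that None stands for null; the function name is hypothetical.

```python
def orphaned_references(child_values, parent_keys):
    """Return the distinct non-null child values that have no matching parent key."""
    parents = set(parent_keys)
    return sorted({v for v in child_values if v is not None and v not in parents})

customer_ids = [1, 2, 3]            # parent table primary keys
consent_customer_ids = [1, 3, 4, None, 4]  # foreign key column in the child table
print(orphaned_references(consent_customer_ids, customer_ids))  # [4]
```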

Enum Value Valid

Asserts that all values in a column belong to a predefined set of allowed values. Catches typos, unexpected category values, or upstream system changes that introduce new enum variants without coordination.
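
A small sketch of the allowed-values test in Python. The status values are invented for the example, and nulls are skipped on the assumption that they are handled by the completeness checks.

```python
def invalid_enum_values(values, allowed):
    """Return the distinct non-null values that are not in the allowed set."""
    allowed_set = frozenset(allowed)
    return sorted({v for v in values if v is not None and v not in allowed_set})

allowed_statuses = ["granted", "withdrawn", "expired"]  # hypothetical enum
observed = ["granted", "Granted", "withdrawn", "grnated", None]
print(invalid_enum_values(observed, allowed_statuses))  # ['Granted', 'grnated']
```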