
PII Annotation and Anonymization Flag

completeness · critical

Validates that records containing personal data have pii_flag set to true and anonymization_method populated. Under EU AI Act Article 10, training data containing personal data requires appropriate data governance measures including privacy-preserving techniques. Proper PII annotation ensures transparency and supports GDPR compliance alongside AI Act requirements.
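
The check described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the rule's shipped implementation. The `pii_flag` and `anonymization_method` field names come from the description. The `value` field, the email-based `contains_personal_data` detector, and the `find_unannotated_pii` helper are hypothetical.

```python
import re

# Assumption: personal data is detected by an email-shaped value. A real
# deployment would likely rely on a broader PII detector.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def contains_personal_data(record: dict) -> bool:
    """Hypothetical detector: True when the 'value' field looks like an email."""
    value = record.get("value")
    return isinstance(value, str) and EMAIL_RE.match(value) is not None


def find_unannotated_pii(records: list) -> list:
    """Return records that hold personal data but lack the required annotation."""
    violations = []
    for record in records:
        if not contains_personal_data(record):
            continue  # records without personal data are out of scope
        flagged = record.get("pii_flag") is True
        method = record.get("anonymization_method")
        has_method = isinstance(method, str) and method.strip() != ""
        if not (flagged and has_method):
            violations.append(record)
    return violations


records = [
    {"id": 1, "value": "alice@example.com", "pii_flag": True,
     "anonymization_method": "pseudonymization"},
    {"id": 2, "value": "bob.smith@company.co.uk", "pii_flag": False,
     "anonymization_method": None},
    {"id": 3, "value": "no personal data here", "pii_flag": False,
     "anonymization_method": None},
]
violating_ids = [r["id"] for r in find_unannotated_pii(records)]
print(violating_ids)
```

In this sketch a record fails only when it contains personal data and is missing either the flag or the method. Records without personal data are ignored.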

v1.0.0 · by dqhub · 939 downloads · 4.4 (80)
Tags: ai-act, privacy, gdpr, data-governance, pii, anonymization, personal-data

Parameters

column_name (string, required)

The column containing email addresses

threshold (float, default: 0.99)

Minimum fraction of valid emails (0.0 to 1.0)
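
As a rough illustration of how the two parameters interact, the sketch below computes the fraction of valid emails in a column and compares it with `threshold`. The helper names `valid_fraction` and `passes` are hypothetical, and two behaviors are assumptions rather than documented semantics: null values are excluded from the denominator, and the threshold is treated as an inclusive minimum.

```python
import re

# Same pattern the rule's templates use for email validation.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def valid_fraction(values):
    """Fraction of non-null values that match the email pattern."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 1.0  # assumption: a column with no values passes trivially
    valid = sum(1 for v in non_null if EMAIL_RE.match(v))
    return valid / len(non_null)


def passes(values, threshold=0.99):
    """True when the valid fraction reaches the threshold (inclusive, by assumption)."""
    return valid_fraction(values) >= threshold


column = ["alice@example.com", "not-an-email", None, "bob.smith@company.co.uk"]
print(valid_fraction(column))          # 2 of 3 non-null values are valid
print(passes(column))                  # default threshold of 0.99
print(passes(column, threshold=0.6))   # looser threshold
```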

Compliance Mapping

EU AI Act: Article 10 (Data and Data Governance)

NIST AI RMF: GOVERN 1.5 / MAP 5.1

ISO/IEC 5259: ISO/IEC 5259-1:2024 (Data Quality for AI)

Install

soda
checks for {{table_name}}:
  - invalid_percent({{column_name}}) < {{(1 - threshold) * 100}}:
      valid regex: '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
dbt
{% test valid_email(model, column_name) %}
select {{ column_name }}
from {{ model }}
where {{ column_name }} not regexp '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'
{% endtest %}
sql
SELECT COUNT(*) as total,
  SUM(CASE WHEN {{column_name}} REGEXP
    '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'
    THEN 1 ELSE 0 END) as valid
FROM {{table_name}}
Great Expectations
{
  "expectation_type": "expect_column_values_to_match_regex",
  "kwargs": {
    "column": "{{column_name}}",
    "regex": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
    "mostly": {{threshold}}
  }
}
spark
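from pyspark.sql.functions import col
# df is the DataFrame under test; rows where the column is NULL are not counted as invalid
pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
# Count the rows whose value does not match the email pattern
invalid = df.filter(~col("{{column_name}}").rlike(pattern)).count()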

Test Data

Passing Examples

id  value
1   alice@example.com
2   bob.smith@company.co.uk
3   charlie+tag@domain.org

Failing Examples

id  value
1   not-an-email
2   @missing-local.com
3   spaces in@email.com
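
The passing and failing examples can be checked against the email pattern from the Install snippets with a few lines of Python. This is a small sanity-check sketch using the standard `re` module. The variable names are illustrative and not part of the rule.

```python
import re

# Pattern copied from the rule's Install snippets.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

passing = ["alice@example.com", "bob.smith@company.co.uk", "charlie+tag@domain.org"]
failing = ["not-an-email", "@missing-local.com", "spaces in@email.com"]

# Map each example to whether the pattern accepts it.
results = {value: bool(EMAIL_RE.match(value)) for value in passing + failing}
for value, ok in results.items():
    print(f"{value!r}: {'valid' if ok else 'invalid'}")
```

The failing rows are rejected for three different reasons: no `@` at all, an empty local part, and a space, which the local-part character class does not allow.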

CLI

Terminal
npx dqhub install pii-annotation-flag --format soda --table YOUR_TABLE
npx dqhub install pii-annotation-flag --format dbt --model YOUR_MODEL
npx dqhub install pii-annotation-flag --format sql --dialect snowflake