
Protected Attribute Representation Coverage

statistical, critical

Validates that protected attributes (such as gender, age_group, ethnicity) are represented with a minimum coverage percentage per group. Under EU AI Act Article 10, training data must be examined for possible biases that are likely to affect the health, safety, or fundamental rights of persons. Insufficient representation of protected groups can lead to discriminatory AI outcomes.
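The page does not spell out how per-group coverage is computed, so the following is only a minimal sketch of one plausible reading: each group's share of the non-missing values in a protected-attribute column is compared against a minimum fraction. The function names, the `min_fraction` parameter, and the sample data are hypothetical and are not part of the published rule.

```python
from collections import Counter


def group_coverage(values):
    """Return each group's share of the non-missing values as a fraction."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: n / total for group, n in counts.items()}


def underrepresented_groups(values, min_fraction):
    """Return the groups whose share is below min_fraction, with their share."""
    return {
        group: share
        for group, share in group_coverage(values).items()
        if share < min_fraction
    }


# Hypothetical sample: age_group values for ten records.
sample = ["18-29"] * 6 + ["30-49"] * 3 + ["50+"] * 1
low = underrepresented_groups(sample, min_fraction=0.2)
```

A check of this shape only sees groups that occur at least once. A group that is entirely absent from the column would need to be supplied separately, for example as a list of expected groups.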

v1.0.0, by dqhub, 142 downloads, rating 4.4 (11)
bias, protected-attributes, data-governance, ai-act, fairness, representation, discrimination

Parameters

column_name (string, required)

The column containing email addresses

threshold (float, default: 0.99)

Minimum fraction of valid emails (0.0 to 1.0)

Compliance Mapping

ISO/IEC 5259: ISO/IEC 5259-1:2024, Data Quality for AI

EU AI Act: Article 10, Data and Data Governance

NIST AI RMF: MAP 2.3 / MEASURE 2.11

Install

soda
checks for {{table_name}}:
  - invalid_percent({{column_name}}) < {{(1 - threshold) * 100}}:
      valid regex: '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
dbt
{% test valid_email(model, column_name) %}
select {{ column_name }}
from {{ model }}
where {{ column_name }} not regexp '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'
{% endtest %}
sql
SELECT COUNT(*) as total,
  SUM(CASE WHEN {{column_name}} REGEXP
    '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'
    THEN 1 ELSE 0 END) as valid
FROM {{table_name}}
Great Expectations
{
  "expectation_type": "expect_column_values_to_match_regex",
  "kwargs": {
    "column": "{{column_name}}",
    "regex": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
    "mostly": {{threshold}}
  }
}
spark
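from pyspark.sql.functions import col
# `df` is the DataFrame under test; the pattern matches the other formats.
pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
# Rows where the column is NULL are not counted: rlike on NULL yields NULL.
invalid = df.filter(~col("{{column_name}}").rlike(pattern)).count()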

Test Data

Passing Examples

id  value
1   alice@example.com
2   bob.smith@company.co.uk
3   charlie+tag@domain.org

Failing Examples

id  value
1   not-an-email
2   @missing-local.com
3   spaces in@email.com
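As a quick sanity check, the test rows above can be run against the regular expression used in the Install snippets. This is a minimal sketch in plain Python rather than any of the listed tools. The `valid_fraction` helper is hypothetical and only mirrors the idea of comparing the share of matching values with `threshold`.

```python
import re

# Same pattern as in the Install snippets.
EMAIL_PATTERN = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')

passing = ["alice@example.com", "bob.smith@company.co.uk", "charlie+tag@domain.org"]
failing = ["not-an-email", "@missing-local.com", "spaces in@email.com"]


def valid_fraction(values):
    """Return the fraction of values that match EMAIL_PATTERN (1.0 if empty)."""
    if not values:
        return 1.0
    matched = sum(1 for v in values if EMAIL_PATTERN.match(v))
    return matched / len(values)


threshold = 0.99  # default from the Parameters section
passing_ok = valid_fraction(passing) >= threshold
failing_ok = valid_fraction(failing) >= threshold
```

With these rows, every passing example matches and no failing example does, so `passing_ok` is true and `failing_ok` is false.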

CLI

Terminal
npx dqhub install bias-attribute-coverage --format soda --table YOUR_TABLE
npx dqhub install bias-attribute-coverage --format dbt --model YOUR_MODEL
npx dqhub install bias-attribute-coverage --format sql --dialect snowflake