
No Biometric Identifier in De-Identified Data

completeness · critical

Validates that de-identified datasets do not contain biometric identifiers including fingerprint hashes, retinal scan data, voiceprint signatures, facial photograph references, or DNA sequence data. Checks for common biometric data patterns, file path references to biometric media, and encoded biometric payloads. Biometric identifiers are among the 18 HIPAA Safe Harbor identifiers that must be removed.
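The description names three kinds of checks (biometric data patterns, file path references to biometric media, encoded payloads) without giving the patterns themselves. The following is only a rough sketch of what such checks might look like. Every regex, keyword, file extension, length cutoff, and the `biometric_reasons` helper is a hypothetical assumption for illustration, not the rule's actual logic, and heuristics like these will produce both false positives and false negatives.

```python
import re

# Hypothetical heuristics; none of these patterns come from the rule itself.
BIOMETRIC_KEYWORDS = re.compile(
    r"(fingerprint|retina|voiceprint|facial|face)", re.IGNORECASE
)
# A path-like string ending in a media or sequence file extension (assumed list).
MEDIA_PATH = re.compile(
    r"[\w./\\-]+\.(wav|mp3|jpe?g|png|bmp|tiff?|wsq|fasta|fastq|vcf)$",
    re.IGNORECASE,
)
# A long run of nucleotide letters (the 30-character cutoff is arbitrary).
DNA_SEQUENCE = re.compile(r"^[ACGTN]{30,}$", re.IGNORECASE)
# A long base64-looking string (the 200-character cutoff is arbitrary).
BASE64_PAYLOAD = re.compile(r"^[A-Za-z0-9+/]{200,}={0,2}$")


def biometric_reasons(value):
    """Return the heuristic reasons a string value looks biometric."""
    text = value.strip()
    reasons = []
    if MEDIA_PATH.search(text) and BIOMETRIC_KEYWORDS.search(text):
        reasons.append("biometric media path")
    if DNA_SEQUENCE.match(text):
        reasons.append("DNA-like sequence")
    if BASE64_PAYLOAD.match(text):
        reasons.append("large encoded payload")
    return reasons


if __name__ == "__main__":
    for sample in ["/data/scans/fingerprint_0012.wsq", "ACGT" * 10, "alice"]:
        print(sample[:40], biometric_reasons(sample))
```

A de-identified column would pass under this sketch only if `biometric_reasons` returns an empty list for every value.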

v1.0.0 · by dqhub · 698 downloads · 3.8 (44)
Tags: phi, safe-harbor, fingerprint, voiceprint, healthcare, hipaa, de-identification, biometric, retinal, facial-photo, privacy

Parameters

column_name (string, required)

The column containing email addresses

threshold (float, default: 0.99)

Minimum fraction of valid emails (0.0 to 1.0)
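As a minimal sketch of how `threshold` might be applied, the snippet below computes the fraction of values matching the regex used in the templates on this page and compares it to the threshold. It also shows the `(1 - threshold) * 100` conversion that the soda template uses. The helper names and the choice to ignore nulls are assumptions, not documented behavior of the rule.

```python
import re

# Same pattern as the install templates on this page.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def valid_fraction(values):
    """Fraction of non-null values matching EMAIL_RE (1.0 for empty input)."""
    non_null = [v for v in values if v is not None]  # assumption: nulls are skipped
    if not non_null:
        return 1.0
    matches = sum(1 for v in non_null if EMAIL_RE.match(v))
    return matches / len(non_null)


def passes(values, threshold=0.99):
    """True when the valid fraction is at least the threshold."""
    return valid_fraction(values) >= threshold


def max_invalid_percent(threshold=0.99):
    """Threshold expressed as an invalid-percent bound, as in the soda template."""
    return (1 - threshold) * 100


if __name__ == "__main__":
    column = ["a@b.co"] * 98 + ["bad", "also bad"]
    print(valid_fraction(column), passes(column), max_invalid_percent())
```

With the default of 0.99, a column fails once more than about 1% of its non-null values do not match.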

Compliance Mapping

HIPAA: 45 CFR §164.514(b) Safe Harbor

Install

soda
checks for {{table_name}}:
  - invalid_percent({{column_name}}) < {{(1 - threshold) * 100}}:
      valid regex: '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
dbt
{% test valid_email(model, column_name) %}
select {{ column_name }}
from {{ model }}
where {{ column_name }} not regexp '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'
{% endtest %}
sql
SELECT COUNT(*) as total,
  SUM(CASE WHEN {{column_name}} REGEXP
    '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'
    THEN 1 ELSE 0 END) as valid
FROM {{table_name}}
Great Expectations
{
  "expectation_type": "expect_column_values_to_match_regex",
  "kwargs": {
    "column": "{{column_name}}",
    "regex": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
    "mostly": {{threshold}}
  }
}
spark
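from pyspark.sql.functions import col
pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
total = df.count()
invalid = df.filter(~col("{{column_name}}").rlike(pattern)).count()
# Pass when the valid fraction meets the threshold (null values are not counted as invalid)
passed = total == 0 or (total - invalid) / total >= {{threshold}}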

Test Data

Passing Examples

id  value
1   alice@example.com
2   bob.smith@company.co.uk
3   charlie+tag@domain.org

Failing Examples

id  value
1   not-an-email
2   @missing-local.com
3   spaces in@email.com
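The examples above can be checked against the regex from the install templates with a few lines of Python. This is a sketch for local experimentation. The `is_valid` helper is a hypothetical name, and Python's `re` engine may differ in small ways from the regex engine of a given warehouse.

```python
import re

# Pattern copied from the install templates on this page.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

PASSING = ["alice@example.com", "bob.smith@company.co.uk", "charlie+tag@domain.org"]
FAILING = ["not-an-email", "@missing-local.com", "spaces in@email.com"]


def is_valid(value):
    """True when the whole value matches the pattern."""
    return EMAIL_RE.fullmatch(value) is not None


# Map each example to whether it matches.
results = {value: is_valid(value) for value in PASSING + FAILING}

if __name__ == "__main__":
    for value, ok in results.items():
        print(f"{value!r}: {'valid' if ok else 'invalid'}")
```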

CLI

Terminal
npx dqhub install no-biometric-identifier --format soda --table YOUR_TABLE
npx dqhub install no-biometric-identifier --format dbt --model YOUR_MODEL
npx dqhub install no-biometric-identifier --format sql --dialect snowflake