Back to rules

Unique Entity Identifier (UEI) Format

formatcritical

Validates that Unique Entity Identifier (UEI) values are exactly 12 uppercase alphanumeric characters. The UEI replaced the DUNS number in April 2022 as the primary identifier for entities doing business with the federal government. UEIs are assigned through SAM.gov registration.

v1.0.0by dqhub230 downloads5 (7)
ueisam-govfederaldata-act
Try This Rule

Parameters

column_namestringrequired

The column containing email addresses

thresholdfloatdefault: 0.99

Minimum fraction of valid emails (0.0 to 1.0)

Compliance Mapping

DATA ActSAM.gov Unique Entity Identifier (https://sam.gov)

OMB2 CFR 25.110 (Central Contractor Registration)

Install

soda
checks for {{table_name}}:
  - invalid_percent({{column_name}}) < {{(1 - threshold) * 100}}:
      valid regex: '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
dbt
{% test valid_email(model, column_name) %}
select {{ column_name }}
from {{ model }}
where {{ column_name }} not regexp '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'
{% endtest %}
sql
SELECT COUNT(*) as total,
  SUM(CASE WHEN {{column_name}} REGEXP
    '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'
    THEN 1 ELSE 0 END) as valid
FROM {{table_name}}
Great Expectations
{
  "expectation_type": "expect_column_values_to_match_regex",
  "kwargs": {
    "column": "{{column_name}}",
    "regex": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
    "mostly": {{threshold}}
  }
}
spark
from pyspark.sql.functions import col
pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
invalid = df.filter(~col("{{column_name}}").rlike(pattern)).count()

Test Data

Passing Examples

idvalue
1alice@example.com
2bob.smith@company.co.uk
3charlie+tag@domain.org

Failing Examples

idvalue
1not-an-email
2@missing-local.com
3spaces in@email.com

CLI

Terminal
npx dqhub install uei-format --format soda --table YOUR_TABLE
npx dqhub install uei-format --format dbt --model YOUR_MODEL
npx dqhub install uei-format --format sql --dialect snowflake