Missing Value Trend

completeness · medium

Asserts that the missing value rate for a column has not increased beyond an acceptable threshold compared to a prior period. Detects gradual data quality degradation by comparing the current null rate against a baseline or rolling window.
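The comparison described above can be sketched in Python. This is a minimal sketch, not the rule's shipped implementation. It assumes the baseline is the mean null rate of the last `window` prior periods, and that the check fails when the current rate exceeds that baseline by more than `max_increase`. Both `window` and `max_increase` are hypothetical names and are not among the parameters documented on this page.

```python
from statistics import mean
from typing import Optional, Sequence


def null_rate(values: Sequence[Optional[object]]) -> float:
    """Fraction of entries that are None (0.0 for an empty batch)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def missing_rate_within_trend(
    history: Sequence[float],
    current: float,
    window: int = 7,          # hypothetical: number of prior periods in the baseline
    max_increase: float = 0.05,  # hypothetical: allowed absolute rise in null rate
) -> bool:
    """Pass if `current` exceeds the rolling-window baseline by at most `max_increase`.

    `history` holds prior null rates, oldest first; only the last `window`
    entries form the baseline. With no history there is nothing to compare
    against, so the check passes (an assumption, not documented behavior).
    """
    recent = list(history)[-window:]
    if not recent:
        return True
    baseline = mean(recent)
    return current - baseline <= max_increase


# Usage: a batch with 1 null in 4 rows against a history of roughly 1-2% nulls.
current = null_rate([1, None, 3, 4])
print(missing_rate_within_trend([0.01, 0.02, 0.01], current))
```

Using an absolute increase keeps the check meaningful when the baseline is near zero. A relative threshold (for example, "no more than double the baseline") would be a reasonable alternative design.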

v1.0.0 by dqhub · 589 downloads · 4.3 (27)

trend · time-series · degradation · monitoring · missing-rate · drift


Parameters

column_name (string, required)

The column containing email addresses

threshold (float, default: 0.99)

Minimum fraction of valid emails (0.0 to 1.0)

Install

soda

checks for {{table_name}}:
  - invalid_percent({{column_name}}) < {{(1 - threshold) * 100}}:
      valid regex: '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'

dbt

{% test valid_email(model, column_name) %}
select {{ column_name }}
from {{ model }}
where {{ column_name }} not regexp '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'
{% endtest %}

sql

SELECT COUNT(*) as total,
  SUM(CASE WHEN {{column_name}} REGEXP
    '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'
    THEN 1 ELSE 0 END) as valid
FROM {{table_name}}
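The SQL query returns raw `total` and `valid` counts rather than a pass/fail result. The hypothetical helpers below sketch one way to turn those counts into a verdict. They assume the rule passes when the valid fraction is at least `threshold`, which is the same reading as the Great Expectations `mostly` argument. The Soda check expresses a similar bound as `invalid_percent < (1 - threshold) * 100`, but with a strict inequality. Treating an empty table as a pass is also an assumption.

```python
def valid_fraction(total: int, valid: int) -> float:
    """Share of rows that matched the pattern; 1.0 when there are no rows."""
    if total == 0:
        return 1.0  # assumption: an empty table passes vacuously
    return valid / total


def passes_threshold(total: int, valid: int, threshold: float = 0.99) -> bool:
    """Pass when the valid fraction is at least `threshold` (inclusive bound)."""
    return valid_fraction(total, valid) >= threshold


# Usage: feed in the two columns returned by the SQL query.
print(passes_threshold(total=100, valid=99))
print(passes_threshold(total=100, valid=98))
```

In this query, `CASE WHEN ... THEN 1 ELSE 0 END` counts NULL values as not valid, because `total` includes every row.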

Great Expectations

{
  "expectation_type": "expect_column_values_to_match_regex",
  "kwargs": {
    "column": "{{column_name}}",
    "regex": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
    "mostly": {{threshold}}
  }
}

spark

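from pyspark.sql.functions import col
pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
total = df.filter(col("{{column_name}}").isNotNull()).count()
valid = df.filter(col("{{column_name}}").rlike(pattern)).count()
# Fail when the fraction of non-null values matching the pattern drops below threshold
assert total == 0 or valid / total >= {{threshold}}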

Test Data

Passing Examples

id	value
1	alice@example.com
2	bob.smith@company.co.uk
3	charlie+tag@domain.org

Failing Examples

id	value
1	not-an-email
2	@missing-local.com
3	spaces in@email.com
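The passing and failing rows can be checked locally against the email pattern used in the Spark and Great Expectations snippets. This is a small sketch using Python's standard `re` module. The helper name `is_valid_email` is illustrative only.

```python
import re

# Same pattern as the Spark snippet; the JSON in the Great Expectations
# version escapes the backslash as "\\." but describes the same regex.
EMAIL_PATTERN = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')

PASSING = ["alice@example.com", "bob.smith@company.co.uk", "charlie+tag@domain.org"]
FAILING = ["not-an-email", "@missing-local.com", "spaces in@email.com"]


def is_valid_email(value: str) -> bool:
    """True when the whole string matches the email pattern."""
    return EMAIL_PATTERN.match(value) is not None


# Each passing row must match and each failing row must not.
for value in PASSING:
    assert is_valid_email(value), value
for value in FAILING:
    assert not is_valid_email(value), value
```

The failing rows cover three distinct cases: no `@` at all, an empty local part, and a space, which the character classes do not allow.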

CLI


npx dqhub install missing-value-trend --format soda --table YOUR_TABLE
npx dqhub install missing-value-trend --format dbt --model YOUR_MODEL
npx dqhub install missing-value-trend --format sql --dialect snowflake