Back to rules

Date Order Valid

consistencyhigh

Asserts that a start date column is always before or equal to an end date column for every row. Catches data entry errors, timezone conversion bugs, or ETL transformation issues that invert temporal ordering.

v1.0.0by dqhub622 downloads5 (27)
consistencydate-ordertemporalvalidationbusiness-rule
Try This Rule

Parameters

column_namestringrequired

The column containing email addresses

thresholdfloatdefault: 0.99

Minimum fraction of valid emails (0.0 to 1.0)

Install

soda
checks for {{table_name}}:
  - invalid_percent({{column_name}}) < {{(1 - threshold) * 100}}:
      valid regex: '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
dbt
{% test valid_email(model, column_name) %}
select {{ column_name }}
from {{ model }}
where {{ column_name }} not regexp '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'
{% endtest %}
sql
SELECT COUNT(*) as total,
  SUM(CASE WHEN {{column_name}} REGEXP
    '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'
    THEN 1 ELSE 0 END) as valid
FROM {{table_name}}
Great Expectations
{
  "expectation_type": "expect_column_values_to_match_regex",
  "kwargs": {
    "column": "{{column_name}}",
    "regex": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
    "mostly": {{threshold}}
  }
}
spark
from pyspark.sql.functions import col
pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
invalid = df.filter(~col("{{column_name}}").rlike(pattern)).count()

Test Data

Passing Examples

idvalue
1alice@example.com
2bob.smith@company.co.uk
3charlie+tag@domain.org

Failing Examples

idvalue
1not-an-email
2@missing-local.com
3spaces in@email.com

CLI

Terminal
npx dqhub install date-order-valid --format soda --table YOUR_TABLE
npx dqhub install date-order-valid --format dbt --model YOUR_MODEL
npx dqhub install date-order-valid --format sql --dialect snowflake