Claim Date Sequence Validation

consistency, critical

Validates the temporal ordering of claim dates: the loss date must be on or before the report date, and the report date must be on or before the current date. Catches backdated claims, data entry errors, and ETL issues that break the expected chronological sequence of insurance claim events.
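The ordering described above can be sketched in plain Python. This is a minimal illustration, not the rule's actual implementation: the function names `check_claim_date_sequence` and `valid_fraction`, the violation labels, and the `(loss_date, report_date)` row shape are all hypothetical, and the sketch assumes both dates are non-null `datetime.date` values.

```python
from datetime import date

def check_claim_date_sequence(loss_date, report_date, today=None):
    """Return violation labels for one claim; an empty list means it passes.

    Hypothetical helper: assumes both arguments are non-null date objects.
    """
    today = today or date.today()
    violations = []
    # The loss must happen on or before the day it was reported.
    if loss_date > report_date:
        violations.append("loss_after_report")
    # A claim cannot be reported in the future.
    if report_date > today:
        violations.append("report_after_today")
    return violations

def valid_fraction(rows, today=None):
    """Fraction of (loss_date, report_date) pairs with no violations."""
    rows = list(rows)
    if not rows:
        return 1.0  # assumption: an empty table is treated as passing
    passing = sum(
        1 for loss, report in rows
        if not check_claim_date_sequence(loss, report, today)
    )
    return passing / len(rows)
```

The fraction returned by `valid_fraction` could then be compared against a minimum such as the `threshold` parameter, for example `valid_fraction(rows) >= 0.99`.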

v1.0.0 by dqhub | 632 downloads | 4 (38)

claim, date, sequence

Parameters

column_name (string, required)

The column containing email addresses

threshold (float, default: 0.99)

Minimum fraction of valid emails (0.0 to 1.0)

Install

soda

checks for {{table_name}}:
  - invalid_percent({{column_name}}) < {{(1 - threshold) * 100}}:
      valid regex: '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'

dbt

{% test valid_email(model, column_name) %}
select {{ column_name }}
from {{ model }}
where {{ column_name }} not regexp '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'
{% endtest %}

sql

SELECT COUNT(*) as total,
  SUM(CASE WHEN {{column_name}} REGEXP
    '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'
    THEN 1 ELSE 0 END) as valid
FROM {{table_name}}

Great Expectations

{
  "expectation_type": "expect_column_values_to_match_regex",
  "kwargs": {
    "column": "{{column_name}}",
    "regex": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
    "mostly": {{threshold}}
  }
}

spark

from pyspark.sql.functions import col
pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
invalid = df.filter(~col("{{column_name}}").rlike(pattern)).count()

Test Data

Passing Examples

id	value
1	alice@example.com
2	bob.smith@company.co.uk
3	charlie+tag@domain.org

Failing Examples

id	value
1	not-an-email
2	@missing-local.com
3	spaces in@email.com

CLI

Terminal

npx dqhub install claim-date-sequence --format soda --table YOUR_TABLE
npx dqhub install claim-date-sequence --format dbt --model YOUR_MODEL
npx dqhub install claim-date-sequence --format sql --dialect snowflake