Data Quality Starter Pack

free

Essential data quality checks every team should have — null checks, format validation, uniqueness, freshness, and basic statistics.

19 rules, 2,216 downloads, 4.3 average rating (123 ratings)
Tags: starter, essential, beginner, getting-started, foundation, best-practices

Download & Install

Install with the CLI:
$ npx dqhub install data-quality-starter --format soda --table YOUR_TABLE

About this pack

The foundational DQ rule pack for any data team getting started with data quality. Includes the most commonly used checks across all categories:

- Completeness: null checks, empty string detection, required fields
- Format: email, phone, date, URL validation
- Uniqueness: primary key validation, duplicate detection
- Range: non-negative values, reasonable date ranges
- Freshness: table staleness detection
- Volume: row count checks, empty table detection
- Statistical: mean stability, cardinality monitoring

Perfect for teams implementing data quality for the first time.

What's included

3 completeness rules
3 format rules
3 uniqueness rules
3 range rules
2 volume rules
2 statistical rules
1 referential integrity rule
1 freshness rule
1 consistency rule

Checks included (19)

Column Not Null

Asserts that a specified column contains no null values. This is the most fundamental completeness check — every row must have a value present in the target column.

Column Completeness Threshold

Asserts that a column meets a minimum completeness threshold, measured as the percentage of non-null values. Useful when some nulls are acceptable but the overall population rate must stay above a defined level (e.g., 95%).
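
As a rough illustration of the threshold idea, the sketch below computes the share of non-null values in a column and compares it to a minimum. The function names and the choice to treat an empty column as 0% complete are assumptions made for this example, not the pack's actual implementation.

```python
from typing import Optional, Sequence


def completeness_ratio(values: Sequence[Optional[object]]) -> float:
    """Return the fraction of values that are not None.

    Assumption: an empty column is reported as 0.0 rather than raising.
    """
    if len(values) == 0:
        return 0.0
    non_null = sum(1 for v in values if v is not None)
    return non_null / len(values)


def meets_completeness(values: Sequence[Optional[object]], threshold: float = 0.95) -> bool:
    """True when the non-null fraction is at or above the threshold."""
    return completeness_ratio(values) >= threshold
```

For example, `meets_completeness([1, None, 3, 4], threshold=0.75)` passes because three of four values are present, while the default 95% threshold fails the same column.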

String Not Empty

Asserts that a string column contains no empty strings. This is distinct from a null check — a value can be non-null but still empty ('') or whitespace-only. Catches cases where upstream systems insert blank strings instead of proper nulls.
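
The difference from a null check can be sketched as follows. The helper names are hypothetical; nulls are deliberately left to the separate not-null rule.

```python
from typing import List, Optional, Sequence


def is_blank(value: Optional[object]) -> bool:
    """True for '' and whitespace-only strings; None is not considered blank here."""
    return isinstance(value, str) and value.strip() == ""


def find_blank_rows(values: Sequence[Optional[object]]) -> List[int]:
    """Return the positions of empty or whitespace-only strings."""
    return [index for index, value in enumerate(values) if is_blank(value)]
```

With the input `["a", "", "  ", None, "b"]`, only positions 1 and 2 are reported; the `None` at position 3 is ignored by this check.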

Valid Email Format (email)

Validates that values conform to a simplified RFC 5322 email address format. Checks for a local part containing alphanumeric characters and common special characters, an @ symbol, and a domain with at least one dot-separated label.
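
One possible reading of that simplified pattern is sketched below. The exact character classes are an assumption; the pack's own regular expression may differ, and no simplified pattern covers everything RFC 5322 permits.

```python
import re

# Assumed simplification: local part of letters, digits, and . _ % + -,
# then '@', then a domain with at least two dot-separated labels.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def is_valid_email(value: object) -> bool:
    """True when the whole string matches the simplified pattern."""
    return isinstance(value, str) and EMAIL_RE.fullmatch(value) is not None
```

Under this sketch, `user.name+tag@example.co.uk` is accepted, while `user@localhost` is rejected because the domain has no dot.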

Valid Date String Format (event_date)

Validates that date string values match the expected format. Supports configurable formats including YYYY-MM-DD (ISO 8601), MM/DD/YYYY, DD/MM/YYYY, YYYY/MM/DD, and DD-Mon-YYYY. Validates month (01-12), day (01-31), and reasonable year ranges.
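
A minimal sketch of a configurable format check follows. It covers only the four numeric formats (DD-Mon-YYYY is left out because month-name parsing depends on locale), and the `min_year`/`max_year` defaults are illustrative assumptions. Note that `datetime.strptime` is stricter than a plain 01-31 day check: it also rejects dates that do not exist in the calendar, such as 30 February.

```python
from datetime import datetime

# Mapping from the format labels above to strptime directives (assumed subset).
FORMATS = {
    "YYYY-MM-DD": "%Y-%m-%d",
    "MM/DD/YYYY": "%m/%d/%Y",
    "DD/MM/YYYY": "%d/%m/%Y",
    "YYYY/MM/DD": "%Y/%m/%d",
}


def is_valid_date_string(value, fmt="YYYY-MM-DD", min_year=1900, max_year=2100):
    """Check shape (digits and separators), calendar validity, and year range."""
    if not isinstance(value, str) or len(value) != len(fmt):
        return False
    # Every letter position must hold an ASCII digit; separators must match exactly.
    for expected, actual in zip(fmt, value):
        if expected.isalpha():
            if actual not in "0123456789":
                return False
        elif actual != expected:
            return False
    try:
        parsed = datetime.strptime(value, FORMATS[fmt])
    except ValueError:
        return False
    return min_year <= parsed.year <= max_year
```

The shape check runs first so that non-padded input such as `2024-1-05` is rejected before `strptime`, which would otherwise tolerate some unpadded fields.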

Valid UUID Format (record_id)

Validates that values conform to a UUID format (versions 1-5). Checks for the standard 8-4-4-4-12 hexadecimal format with hyphens (e.g., 550e8400-e29b-41d4-a716-446655440000). The version digit (first character of the third group) must be 1-5, and the variant bits (first character of the fourth group) must be 8, 9, a, or b per RFC 4122.
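
The version and variant constraints described above can be expressed in a single pattern. This is a sketch; accepting uppercase hexadecimal digits is an assumption, and the pack may be stricter.

```python
import re

# 8-4-4-4-12 hex groups; third group starts with the version digit (1-5),
# fourth group starts with the variant character (8, 9, a, or b).
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}",
    re.IGNORECASE,  # assumption: uppercase hex is accepted
)


def is_valid_uuid(value: object) -> bool:
    """True when the whole string is a version 1-5 UUID with an RFC 4122 variant."""
    return isinstance(value, str) and UUID_RE.fullmatch(value) is not None
```

The example value `550e8400-e29b-41d4-a716-446655440000` passes: its third group begins with `4` and its fourth group begins with `a`.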

Column Unique

Validates that all non-null values in a specified column are unique. Useful for natural keys, email addresses, identifiers, and any column where duplicates indicate a data quality issue.

Primary Key Valid

Validates that a column qualifies as a valid primary key by ensuring all values are both unique and not null. Combines uniqueness and completeness checks into a single rule.
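
To show how the two conditions combine, here is a small sketch that reports both kinds of violation in one pass. The function name and the shape of the returned dictionary are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Optional, Sequence


def primary_key_violations(values: Sequence[Optional[Hashable]]) -> Dict[str, object]:
    """Count nulls and list values that occur more than once."""
    null_count = sum(1 for v in values if v is None)
    counts = Counter(v for v in values if v is not None)
    # Counter keeps first-seen order, so duplicates are listed in order of appearance.
    duplicated = [value for value, n in counts.items() if n > 1]
    return {"null_count": null_count, "duplicated_values": duplicated}


def is_valid_primary_key(values: Sequence[Optional[Hashable]]) -> bool:
    """True only when there are no nulls and no repeated values."""
    report = primary_key_violations(values)
    return report["null_count"] == 0 and not report["duplicated_values"]
```

A column such as `[1, 2, 2, None, 3]` fails on both counts: one null and one duplicated value (`2`).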

Duplicate Detection

Detects and counts duplicate rows based on specified columns. Returns the number of duplicates found and identifies the offending rows. Supports threshold-based alerting for acceptable duplicate rates.
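
A sketch of key-based duplicate detection with a tolerated rate is shown below. Rows are modeled as dictionaries, the first occurrence of each key is treated as the original, and `max_rate` is a hypothetical parameter name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def find_duplicates(rows: Sequence[Dict[str, object]], key_columns: Sequence[str]) -> List[int]:
    """Return positions of rows whose key columns repeat an earlier row."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        key = tuple(row[column] for column in key_columns)
        groups[key].append(index)
    # Keep the first row of each group; everything after it is a duplicate.
    return sorted(i for indices in groups.values() for i in indices[1:])


def duplicate_rate_ok(rows, key_columns, max_rate: float = 0.0) -> bool:
    """True when duplicates / total rows does not exceed max_rate."""
    if len(rows) == 0:
        return True
    return len(find_duplicates(rows, key_columns)) / len(rows) <= max_rate
```

With three rows where the first and third share an `email`, `find_duplicates(rows, ["email"])` returns `[2]`, which fails the default zero-tolerance threshold but passes with `max_rate=0.5`.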

Non-Negative Values

Validates that a numeric column contains no negative values. Common for quantities, counts, amounts, durations, and other measures that should never be negative.

Date Not In Future

Validates that a date or timestamp column contains no values in the future. Catches data entry errors, timezone issues, and ETL bugs that produce future-dated records for columns like birth_date, transaction_date, or created_at.
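
Because timezone handling is one of the failure modes named above, this sketch assumes timezone-aware timestamps and compares them against the current time in UTC. The `now` parameter is an assumption added so the check can be tested deterministically.

```python
from datetime import datetime, timezone
from typing import List, Optional, Sequence


def future_rows(timestamps: Sequence[Optional[datetime]], now: Optional[datetime] = None) -> List[int]:
    """Return positions of timestamps later than `now` (defaults to current UTC time).

    Assumption: all timestamps are timezone-aware; nulls are skipped.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return [i for i, ts in enumerate(timestamps) if ts is not None and ts > now]
```

Mixing naive and aware datetimes raises a `TypeError` in Python, so naive values would need to be normalized to a known timezone before this comparison.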

Value In Range

Validates that all values in a numeric column fall within a specified minimum and maximum range (inclusive). Catches data entry errors, unit conversion issues, and out-of-bounds values.

Table Not Empty

Asserts that a table contains at least one row. This is the most fundamental volume check: it confirms that a table has not been accidentally truncated or dropped, and that a load did not silently produce no data.

Row Count Minimum

Asserts that a table contains at least the specified minimum number of rows. Useful for tables with known baseline volumes where dropping below a threshold indicates a data load issue.

Mean In Range

Asserts that the arithmetic mean of a numeric column falls within an expected range. Detects data drift, calculation errors, or upstream changes that shift the central tendency of key metrics.
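
A minimal sketch of the mean check is given below. Skipping nulls and failing a column that has no non-null values are assumptions made for this example.

```python
from statistics import fmean
from typing import Optional, Sequence


def mean_in_range(values: Sequence[Optional[float]], lower: float, upper: float) -> bool:
    """True when the mean of the non-null values lies in [lower, upper]."""
    present = [v for v in values if v is not None]
    if not present:
        # Assumption: a column with no usable values fails the check.
        return False
    return lower <= fmean(present) <= upper
```

For instance, `mean_in_range([10, 20, None, 30], 15, 25)` passes because the mean of the three present values is 20.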

Cardinality Check

Asserts that the number of distinct values in a column falls within an expected range. Detects issues such as collapsed categories (too few distinct values), data explosion (too many), or enum drift from upstream changes.
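
The distinct-count comparison can be sketched as follows. Excluding nulls from the distinct count is an assumption; some engines count null as its own value.

```python
from typing import Hashable, Optional, Sequence


def cardinality_in_range(
    values: Sequence[Optional[Hashable]], min_distinct: int, max_distinct: int
) -> bool:
    """True when the number of distinct non-null values is within the bounds."""
    distinct = {v for v in values if v is not None}
    return min_distinct <= len(distinct) <= max_distinct
```

A status column that has collapsed to a single value, such as `["a", "a"]`, fails a check that expects between 2 and 5 distinct values.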

Foreign Key Valid

Validates that all non-null values in a foreign key column exist in the referenced parent table's primary key column. Detects orphaned references that break referential integrity.
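
As an illustration, orphaned references can be found with a set lookup against the parent keys. The function name is hypothetical, and the sketch assumes both columns fit in memory and hold mutually comparable values.

```python
from typing import Hashable, Iterable, List, Optional


def orphaned_keys(
    child_values: Iterable[Optional[Hashable]], parent_keys: Iterable[Hashable]
) -> List[Hashable]:
    """Return the distinct non-null child values that have no matching parent key."""
    parents = set(parent_keys)
    orphans = {v for v in child_values if v is not None and v not in parents}
    return sorted(orphans)
```

Given child values `[1, 2, 5, None, 5]` and parent keys `[1, 2, 3]`, the only orphan is `5`; the null is skipped, matching the non-null wording of the rule.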

Table Freshness

Asserts that a table has been updated within the specified number of hours. Uses the table's metadata (last modified timestamp) or a designated timestamp column to verify data is fresh and pipelines are running on schedule.
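
Whichever source supplies the last-update time, the comparison itself is simple. This sketch assumes a timezone-aware `last_updated` value has already been obtained; the `now` parameter is an assumption added for testability.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_fresh(last_updated: datetime, max_age_hours: float, now: Optional[datetime] = None) -> bool:
    """True when last_updated is no older than max_age_hours.

    Assumption: last_updated is timezone-aware (for example, stored in UTC).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_updated <= timedelta(hours=max_age_hours)
```

A table last updated 12 hours ago passes a 24-hour freshness window and fails a 6-hour one.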

Enum Value Valid

Asserts that all values in a column belong to a predefined set of allowed values. Catches typos, unexpected category values, or upstream system changes that introduce new enum variants without coordination.
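
A sketch of the allowed-set check follows. It is case-sensitive and skips nulls; both behaviors are assumptions, and the function name is hypothetical.

```python
from typing import Iterable, List, Optional


def invalid_enum_values(values: Iterable[Optional[str]], allowed: Iterable[str]) -> List[str]:
    """Return the distinct non-null values that are not in the allowed set."""
    allowed_set = frozenset(allowed)
    unexpected = {v for v in values if v is not None and v not in allowed_set}
    return sorted(unexpected)
```

With allowed values `{"active", "closed"}`, the input `["active", "ACTIVE", None, "closed", "pending"]` reports `ACTIVE` (a casing mismatch) and `pending` (a new variant).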