BCBS 239 Risk Data Aggregation
BCBS_239freeData quality rules aligned to the 14 principles of BCBS 239 for risk data aggregation and reporting in banking.
Checks included (23)
ISO Currency Code Validation(currency_code)
Validates that values are valid ISO 4217 currency codes (e.g., USD, EUR, GBP, JPY)
ISO Country Code Validation(country_code)
Validates that values are valid ISO 3166-1 alpha-2 country codes (e.g., US, GB, DE, FR)
IBAN Format Validation(iban)
Validates International Bank Account Number format — 2-letter country code, 2 check digits, and up to 30 alphanumeric characters
Valid Email Format(email)
Validates that values conform to a simplified RFC 5322 email address format. Checks for a local part containing alphanumeric characters and common special characters, an @ symbol, and a domain with at least one dot-separated label.
Mean In Range
Asserts that the arithmetic mean of a numeric column falls within an expected range. Detects data drift, calculation errors, or upstream changes that shift the central tendency of key metrics.
Standard Deviation Stable
Asserts that the standard deviation of a numeric column has not changed more than the specified percentage from a known baseline. Detects changes in data variability that may indicate corrupted data, changed source systems, or process failures.
Cardinality Check
Asserts that the number of distinct values in a column falls within an expected range. Detects issues such as collapsed categories (too few distinct values), data explosion (too many), or enum drift from upstream changes.
Distribution Check
Asserts that the value distribution of a categorical column matches expected proportions within a given tolerance. For example, verifying that 60% of records are type A, 30% type B, and 10% type C. Detects upstream labeling changes, data skew, or ingestion bias.
Column Not Null
Asserts that a specified column contains no null values. This is the most fundamental completeness check — every row must have a value present in the target column.
Column Completeness Threshold
Asserts that a column meets a minimum completeness threshold, measured as the percentage of non-null values. Useful when some nulls are acceptable but the overall population rate must stay above a defined level (e.g., 95%).
Required Fields for Status
Asserts that when a status column has a specific value (e.g., 'active'), a set of required fields must all be populated (non-null). Enforces lifecycle-based data completeness rules where later stages demand richer data.
Non-Negative Values
Validates that a numeric column contains no negative values. Common for quantities, counts, amounts, durations, and other measures that should never be negative.
Value In Range
Validates that all values in a numeric column fall within a specified minimum and maximum range (inclusive). Catches data entry errors, unit conversion issues, and out-of-bounds values.
Amount Limit
Validates that financial amounts do not exceed a configurable limit. Catches data entry errors, currency conversion issues, and fraudulent transactions. Supports both upper and optional lower bounds.
Column Unique
Validates that all non-null values in a specified column are unique. Useful for natural keys, email addresses, identifiers, and any column where duplicates indicate a data quality issue.
Composite Unique
Validates that the combination of specified columns forms a unique key. Ensures no two rows share the same values across all listed columns, enforcing composite key integrity.
Primary Key Valid
Validates that a column qualifies as a valid primary key by ensuring all values are both unique and not null. Combines uniqueness and completeness checks into a single rule.
Table Freshness
Asserts that a table has been updated within the specified number of hours. Uses the table's metadata (last modified timestamp) or a designated timestamp column to verify data is fresh and pipelines are running on schedule.
Column Max Age
Asserts that the most recent value in a date/timestamp column is within the specified number of hours from the current time. Useful for verifying that new data is arriving as expected in date-partitioned or event-driven tables.
Ingestion Delay
Asserts that the time difference between the source event timestamp and the load/ingestion timestamp is within the defined SLA. Detects pipeline lag, backpressure, or ingestion failures that cause data to arrive late.
Foreign Key Valid
Validates that all non-null values in a foreign key column exist in the referenced parent table's primary key column. Detects orphaned references that break referential integrity.
Cross-Table Count Match
Validates that row counts or aggregated values match between related tables. For example, the number of order line items should match the item_count on the order header, or the sum of transaction amounts should match the account balance.
Schema Column Count
Asserts that a table has the expected number of columns. Detects unintended schema changes such as dropped columns, added columns from upstream migrations, or schema drift between environments.