
W-2 Payroll Data Quality

free

Validate W-2 tax form data: wages, Social Security caps, tax withholding consistency, and EIN and SSN formats.

8 rules · 1033 downloads · 4.9 avg (57 ratings)
Tags: w-2, payroll, tax, irs, wages, withholding

Download & Install

Install with the CLI:
$ npx dqhub install w2-payroll --format soda --table YOUR_TABLE

About this pack

Data quality rules for W-2 and payroll data compliance. Covers:
- Federal wages non-negative (Box 1)
- Social Security wages within the annual cap ($168,600 for tax year 2024; $176,100 for tax year 2025)
- Tax withholding cannot exceed wages
- EIN format validation
- SSN format validation (masked)

Based on IRS Publication 15 and SSA EFW2 specifications.

What's included

2 range rules
2 format rules
2 completeness rules
1 consistency rule
1 uniqueness rule

Checks included (8)

W-2 Federal Wages Non-Negative (Box 1) (federal_wages)

Validates that W-2 Box 1 (Wages, tips, other compensation) is a numeric value greater than or equal to zero. Federal wages reported on Form W-2 must be non-negative. A negative value indicates a data entry error or system issue that would cause IRS e-file rejection.
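
As a rough illustration of the check described above, the following Python sketch tests a single Box 1 value. It is not the pack's own implementation: the function name `is_valid_federal_wages` is hypothetical, and treating non-numeric or non-finite input as a failure is an assumption.

```python
from decimal import Decimal, InvalidOperation


def is_valid_federal_wages(value) -> bool:
    """Return True if value parses as a finite number >= 0 (hypothetical helper)."""
    try:
        amount = Decimal(str(value).strip())
    except InvalidOperation:
        # Not numeric at all, e.g. "abc", "" or None.
        return False
    # Reject NaN and Infinity before comparing (assumption: these count as failures).
    return amount.is_finite() and amount >= 0


print(is_valid_federal_wages("52000.00"))  # True
print(is_valid_federal_wages("-1"))        # False
```

`Decimal` is used instead of `float` so that currency amounts are compared exactly.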

W-2 Social Security Wage Cap (ss_wages)

Validates that W-2 Box 3 (Social Security wages) is non-negative and does not exceed the annual Social Security wage base. The wage base changes each year: $168,600 is the limit for tax year 2024, and the limit for tax year 2025 is $176,100. Wages above the cap are not subject to Social Security tax and should not be reported in Box 3.
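
The rule above amounts to a range test between zero and the wage base. The sketch below is one way to express it in Python, not the pack's actual code. The function name `within_ss_wage_cap` is hypothetical, and the wage base is passed in as a parameter (an assumption) because the limit changes every tax year.

```python
from decimal import Decimal, InvalidOperation


def within_ss_wage_cap(value, wage_base) -> bool:
    """Return True if 0 <= value <= wage_base (hypothetical helper)."""
    try:
        amount = Decimal(str(value).strip())
        cap = Decimal(str(wage_base))
    except InvalidOperation:
        # Non-numeric input is treated as a failure (assumption).
        return False
    if not (amount.is_finite() and cap.is_finite()):
        return False
    return 0 <= amount <= cap


# The caller supplies the wage base for the tax year being validated.
print(within_ss_wage_cap("168600.00", 168600))  # True
print(within_ss_wage_cap("168600.01", 168600))  # False
```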

EIN (Employer Identification Number) Format (ein)

Validates Employer Identification Number (EIN) format as defined by the IRS. An EIN is a 9-digit number assigned to businesses for tax identification purposes. The format is NN-NNNNNNN (2 digits, hyphen, 7 digits). The first two digits (prefix) indicate the IRS campus that assigned the number and must fall within valid ranges. Prefixes 00, 07, 08, 09, 17, 18, 19, 28, 29, 49, 69, 70, 78, 79, 89, 96, and 97 are not currently assigned.
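
A minimal Python sketch of the format and prefix check described above. The names `is_valid_ein` and `UNASSIGNED_PREFIXES` are hypothetical, and the set of prefixes is copied from the description rather than from the pack's source.

```python
import re

# Prefixes listed in the rule description as not currently assigned.
UNASSIGNED_PREFIXES = frozenset({
    "00", "07", "08", "09", "17", "18", "19", "28", "29",
    "49", "69", "70", "78", "79", "89", "96", "97",
})

# NN-NNNNNNN: two ASCII digits, a hyphen, seven ASCII digits.
EIN_PATTERN = re.compile(r"[0-9]{2}-[0-9]{7}")


def is_valid_ein(value) -> bool:
    """Return True if value matches NN-NNNNNNN with an assigned prefix."""
    if not isinstance(value, str) or not EIN_PATTERN.fullmatch(value):
        return False
    return value[:2] not in UNASSIGNED_PREFIXES


print(is_valid_ein("12-3456789"))  # True
print(is_valid_ein("07-1234567"))  # False: prefix 07 is unassigned
```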

SSN Format Validation (Masked) (ssn_masked)

Validates masked Social Security Number format where the first five digits are masked with X characters and the last four digits are visible. Expected format: XXX-XX-NNNN. Also enforces SSA rules: the last four digits cannot be 0000 (no SSN ends in 0000). This rule is designed for systems that store only the last four digits of SSN for privacy compliance while still requiring format validation.
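
The masked format lends itself to a single regular expression plus one extra test on the visible digits. The following is a sketch only: `is_valid_masked_ssn` is a hypothetical name, and accepting only an uppercase `X` as the mask character is an assumption.

```python
import re

# XXX-XX-NNNN: five masked positions, then four visible ASCII digits.
MASKED_SSN_PATTERN = re.compile(r"XXX-XX-([0-9]{4})")


def is_valid_masked_ssn(value) -> bool:
    """Return True for XXX-XX-NNNN where NNNN is not 0000."""
    if not isinstance(value, str):
        return False
    match = MASKED_SSN_PATTERN.fullmatch(value)
    # The captured group holds the last four digits; 0000 is never issued.
    return match is not None and match.group(1) != "0000"


print(is_valid_masked_ssn("XXX-XX-1234"))  # True
print(is_valid_masked_ssn("XXX-XX-0000"))  # False
```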

Column Not Null

Asserts that a specified column contains no null values. This is the most fundamental completeness check — every row must have a value present in the target column.

Column Completeness Threshold

Asserts that a column meets a minimum completeness threshold, measured as the percentage of non-null values. Useful when some nulls are acceptable but the overall population rate must stay above a defined level (e.g., 95%).
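
Both completeness checks can be sketched with one ratio: the not-null rule corresponds to a threshold of 1.0. The Python below is illustrative rather than the pack's implementation. The names `completeness` and `meets_completeness` are hypothetical, only `None` is counted as null, and an empty column is treated as fully complete; both of those choices are assumptions.

```python
def completeness(values) -> float:
    """Return the fraction of non-null entries in a column."""
    values = list(values)
    if not values:
        # Assumption: an empty column has nothing missing.
        return 1.0
    non_null = sum(1 for v in values if v is not None)
    return non_null / len(values)


def meets_completeness(values, threshold: float = 0.95) -> bool:
    """Return True if the non-null fraction is at least threshold."""
    return completeness(values) >= threshold


column = ["12-3456789", None, "98-7654321", "45-1234567"]
print(completeness(column))             # 0.75
print(meets_completeness(column))       # False at the default 95% threshold
print(meets_completeness(column, 0.75)) # True
```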

W-2 Federal Tax Withholding Cannot Exceed Wages

Validates that W-2 Box 2 (Federal income tax withheld) does not exceed Box 1 (Wages, tips, other compensation). Federal tax withheld cannot logically be greater than the total wages from which it was deducted. A violation indicates a payroll calculation error or data entry mistake that would trigger IRS scrutiny.
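
This cross-field rule compares two values from the same row. A minimal sketch in Python follows. The function name `withholding_within_wages` and the parameter name `federal_tax_withheld` are hypothetical, and rows with non-numeric values are treated as failures by assumption.

```python
from decimal import Decimal, InvalidOperation


def withholding_within_wages(federal_wages, federal_tax_withheld) -> bool:
    """Return True if Box 2 (withheld) does not exceed Box 1 (wages)."""
    try:
        wages = Decimal(str(federal_wages).strip())
        withheld = Decimal(str(federal_tax_withheld).strip())
    except InvalidOperation:
        # Either value is not numeric (assumption: counts as a failure).
        return False
    if not (wages.is_finite() and withheld.is_finite()):
        return False
    return withheld <= wages


print(withholding_within_wages("52000.00", "6100.00"))  # True
print(withholding_within_wages("1000.00", "1500.00"))   # False
```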

Column Unique

Validates that all non-null values in a specified column are unique. Useful for natural keys, email addresses, identifiers, and any column where duplicates indicate a data quality issue.
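
One way to sketch the uniqueness check in Python is to count the non-null values and report any that occur more than once. The function name `duplicate_values` is hypothetical, and treating only `None` as null is an assumption.

```python
from collections import Counter


def duplicate_values(values) -> list:
    """Return the non-null values that appear more than once, in first-seen order."""
    counts = Counter(v for v in values if v is not None)
    return [v for v, n in counts.items() if n > 1]


column = ["12-3456789", "98-7654321", "12-3456789", None, None]
print(duplicate_values(column))  # ['12-3456789']
```

The column passes when the returned list is empty. Nulls are ignored here, so repeated nulls do not count as duplicates.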