HIPAA PHI De-Identification (Safe Harbor)
Detect and validate the removal of HIPAA Safe Harbor identifiers from de-identified healthcare datasets, including SSNs, MRNs, email addresses, phone numbers, IP addresses, dates, ZIP codes, and biometric data.
Checks included (10)
No SSN in De-Identified Data (ssn)
Validates that de-identified datasets do not contain Social Security Number patterns. Checks for the standard XXX-XX-XXXX format as well as 9 consecutive digits that match SSN structure. SSNs are one of the 18 HIPAA Safe Harbor identifiers that must be removed for de-identification.
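As a rough illustration of this kind of check, the Python sketch below scans free text for dashed and bare 9-digit SSN candidates. The names SSN_DASHED, SSN_BARE, has_valid_ssn_structure, and find_ssn_candidates are hypothetical and are not taken from the check itself; the structural filter assumes the Social Security Administration rules that the area is never 000, 666, or 900-999, the group is never 00, and the serial is never 0000.

```python
import re

# Hypothetical patterns: dashed XXX-XX-XXXX, or 9 bare digits that are
# not part of a longer digit run.
SSN_DASHED = re.compile(r"(?<![0-9])[0-9]{3}-[0-9]{2}-[0-9]{4}(?![0-9])")
SSN_BARE = re.compile(r"(?<![0-9])[0-9]{9}(?![0-9])")


def has_valid_ssn_structure(digits: str) -> bool:
    """Return True if 9 digits satisfy the SSA area/group/serial rules."""
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    return (
        area not in ("000", "666")
        and area[0] != "9"
        and group != "00"
        and serial != "0000"
    )


def find_ssn_candidates(text: str) -> list[str]:
    """Return every substring that looks like a structurally valid SSN."""
    hits = []
    for pattern in (SSN_DASHED, SSN_BARE):
        for match in pattern.finditer(text):
            if has_valid_ssn_structure(match.group().replace("-", "")):
                hits.append(match.group())
    return hits
```

A bare 9-digit match is only a candidate: other identifiers can have the same shape, so a production check would likely combine this with column names or context.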
Dates Generalized to Year Only (date_of_birth)
Validates that all date fields in de-identified data are generalized to year only (YYYY format) with no month or day components. Under HIPAA Safe Harbor, all elements of dates directly related to an individual that are more granular than the year must be removed, including birth date, admission date, discharge date, and date of death. Additionally, ages over 89, and any date elements (including the year) that reveal such an age, must be aggregated into a single category of 90 or older.
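The two rules in this check can be sketched in a few lines of Python. This is a minimal illustration, not the check's actual implementation: the function names and the "90+" label for the aggregated age category are assumptions.

```python
import re

YEAR_ONLY = re.compile(r"[0-9]{4}")
MAX_REPORTABLE_AGE = 89
TOP_CODED_LABEL = "90+"  # hypothetical label for the aggregated category


def is_year_only(value: str) -> bool:
    """True only for a bare 4-digit year with no month or day component."""
    return YEAR_ONLY.fullmatch(value.strip()) is not None


def generalize_age(age: int) -> str:
    """Collapse every age above 89 into one top-coded category."""
    return TOP_CODED_LABEL if age > MAX_REPORTABLE_AGE else str(age)
```

Note that a year-only value can still reveal an age over 89 (for example, a very early birth year), so a fuller check would also compare year fields against the dataset's reference date.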
ZIP Code Truncated to 3 Digits (zip_code)
Validates that ZIP codes in de-identified data are truncated to the first 3 digits only. Under HIPAA Safe Harbor, geographic subdivisions smaller than a state must be removed, except that the first 3 digits of a ZIP code may be kept when the area formed by all ZIP codes sharing those 3 digits contains more than 20,000 people. Prefixes covering 20,000 or fewer people must be replaced with '000'. Valid values are therefore exactly 3 digits, with '000' standing in for any restricted prefix.
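The truncation and validation rules might look like the following Python sketch. The function names are hypothetical, and the set of restricted prefixes is passed in as a parameter because it has to be derived from Census population data; the value used in the usage note is a placeholder for illustration only.

```python
import re

THREE_DIGITS = re.compile(r"[0-9]{3}")


def truncate_zip(zip_code: str, restricted_prefixes: set[str]) -> str:
    """Return the first 3 digits of a ZIP code, or '000' if the prefix is restricted."""
    digits = re.sub(r"[^0-9]", "", zip_code)  # drop hyphens and spaces (ZIP+4)
    if len(digits) < 3:
        raise ValueError(f"not a ZIP code: {zip_code!r}")
    prefix = digits[:3]
    return "000" if prefix in restricted_prefixes else prefix


def is_valid_deidentified_zip(value: str, restricted_prefixes: set[str]) -> bool:
    """True if value is exactly 3 digits and is not a restricted prefix left in place."""
    return (
        THREE_DIGITS.fullmatch(value) is not None
        and value not in restricted_prefixes
    )
```

For example, with a placeholder set such as restricted = {"036"}, truncate_zip("90210-1234", restricted) keeps the prefix "902", while a ZIP code starting with 036 is replaced by "000".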
No Email Address in De-Identified Data (email)
Validates that de-identified datasets do not contain email address patterns. Checks for standard email formats (user@domain.tld) across all text fields. Email addresses are one of the 18 HIPAA Safe Harbor identifiers that must be removed for de-identification.
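A scan across all text fields could be sketched as below. The EMAIL pattern is a deliberately simple approximation of user@domain.tld rather than a full address grammar, and find_emails and the record layout are assumptions made for the example.

```python
import re

# Simplified user@domain.tld pattern; it does not cover every valid address form.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(record: dict[str, object]) -> dict[str, list[str]]:
    """Map each text field of a record to the email-like strings found in it."""
    hits: dict[str, list[str]] = {}
    for field, value in record.items():
        if isinstance(value, str):  # skip numeric and other non-text fields
            found = EMAIL.findall(value)
            if found:
                hits[field] = found
    return hits
```

Returning the field name alongside each match makes it easier to see where in a record the leak occurred.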
No Phone Number in De-Identified Data (phone)
Validates that de-identified datasets do not contain telephone number patterns. Checks for common US phone formats, including XXX-XXX-XXXX, (XXX) XXX-XXXX, 10-digit sequences, and numbers with a country code prefix. Telephone numbers are one of the 18 HIPAA Safe Harbor identifiers that must be removed.
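The formats listed above can be covered by one verbose regular expression, as in this Python sketch. The pattern and the find_phones helper are illustrative assumptions; in particular, treating "1" or "+1" as the only country code prefix is a simplification for US numbers.

```python
import re

PHONE = re.compile(
    r"""
    (?<![0-9])                            # not glued to a preceding digit
    (?:\+?1[\s.-]?)?                      # optional US country code: 1 or +1
    (?:\([0-9]{3}\)\s?|[0-9]{3}[\s.-]?)   # area code, with or without parentheses
    [0-9]{3}[\s.-]?                       # exchange
    [0-9]{4}                              # line number
    (?![0-9])                             # not glued to a following digit
    """,
    re.VERBOSE,
)


def find_phones(text: str) -> list[str]:
    """Return every substring that matches a common US phone format."""
    return [match.group() for match in PHONE.finditer(text)]
```

The digit lookarounds keep the pattern from firing inside longer numeric identifiers, at the cost of missing phone numbers that are concatenated with other digits.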
No Fax Number in De-Identified Data (fax)
Validates that de-identified datasets do not contain fax number patterns. Checks for standard phone/fax number formats (XXX-XXX-XXXX, 10-digit sequences) in fax-designated columns. Fax numbers are one of the 18 HIPAA Safe Harbor identifiers that must be removed for de-identification.
No IP Address in De-Identified Data (ip_address)
Validates that de-identified datasets do not contain Internet Protocol (IP) address patterns. Checks for IPv4 dotted-quad notation (e.g., 192.168.1.1) with each octet in the valid 0-255 range. IP addresses are one of the 18 HIPAA Safe Harbor identifiers that must be removed for de-identification.
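One way to express the dotted-quad check with the 0-255 octet range is sketched below in Python. DOTTED_QUAD and find_ipv4 are hypothetical names, and the sketch covers IPv4 only, as the description does.

```python
import re

# Four groups of 1-3 digits separated by dots, not embedded in a longer
# dotted number such as a 5-part version string.
DOTTED_QUAD = re.compile(
    r"(?<![0-9.])(?:[0-9]{1,3}\.){3}[0-9]{1,3}(?!\.?[0-9])"
)


def find_ipv4(text: str) -> list[str]:
    """Return dotted-quad substrings whose octets all fall in 0-255."""
    return [
        match.group()
        for match in DOTTED_QUAD.finditer(text)
        if all(int(octet) <= 255 for octet in match.group().split("."))
    ]
```

The trailing lookahead still allows a sentence-ending period after the address, so "login from 192.168.1.1." is detected.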
No Device Identifier in De-Identified Data (device_id)
Validates that de-identified datasets do not contain device identifiers or serial numbers. Checks for Unique Device Identifier (UDI) patterns following GS1 or HIBCC standards, as well as common serial number formats. Device identifiers and serial numbers are among the 18 HIPAA Safe Harbor identifiers that must be removed.
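For the GS1 case, a detector could look for the "(01)" application identifier followed by a 14-digit GTIN and confirm the GTIN check digit, as in this Python sketch. It is only an assumption about how such a check might work: the names are hypothetical, only the human-readable GS1 form is handled, and HIBCC and generic serial-number formats are left out.

```python
import re

# GS1 human-readable device identifier: application identifier (01) + GTIN-14.
GS1_DEVICE_ID = re.compile(r"\(01\)([0-9]{14})")


def gtin14_is_valid(gtin: str) -> bool:
    """Verify the GTIN-14 check digit (weights 3,1,3,... from the left)."""
    body, check = gtin[:13], int(gtin[13])
    total = sum(
        int(digit) * (3 if index % 2 == 0 else 1)
        for index, digit in enumerate(body)
    )
    return (10 - total % 10) % 10 == check


def find_gs1_device_ids(text: str) -> list[str]:
    """Return GTIN-14 values from GS1-style UDI strings with a valid check digit."""
    return [
        match.group(1)
        for match in GS1_DEVICE_ID.finditer(text)
        if gtin14_is_valid(match.group(1))
    ]
```

Validating the check digit reduces false positives from arbitrary 14-digit numbers that happen to follow the text "(01)".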
No Medical Record Number in De-Identified Data (mrn)
Validates that de-identified datasets do not contain Medical Record Numbers (MRNs). Checks for columns with known MRN naming patterns and validates that values do not match common MRN formats (6-10 digit numeric identifiers or alphanumeric hospital record IDs). MRNs are one of the 18 HIPAA Safe Harbor identifiers that must be removed.
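The two-step logic described here (first match the column name, then test the values) can be sketched in Python as follows. The column-name hints, the helper names, and the row-of-dicts layout are all assumptions for illustration, and only the 6-10 digit numeric format is tested.

```python
import re

# Hypothetical substrings that suggest a column holds medical record numbers.
MRN_COLUMN_HINTS = ("mrn", "medical_record", "med_rec")
MRN_VALUE = re.compile(r"[0-9]{6,10}")


def looks_like_mrn_column(name: str) -> bool:
    """Normalize a column name to snake_case and look for an MRN hint."""
    normalized = re.sub(r"[^a-z0-9]+", "_", name.lower())
    return any(hint in normalized for hint in MRN_COLUMN_HINTS)


def flag_mrn_columns(rows: list[dict[str, str]]) -> set[str]:
    """Return MRN-named columns that contain at least one 6-10 digit value."""
    flagged = set()
    for row in rows:
        for column, value in row.items():
            if looks_like_mrn_column(column) and MRN_VALUE.fullmatch(str(value).strip()):
                flagged.add(column)
    return flagged
```

Restricting the value test to MRN-named columns avoids flagging every 6-10 digit number in the dataset, though it will miss MRNs stored under unrelated column names.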
No Biometric Identifier in De-Identified Data (biometric_data)
Validates that de-identified datasets do not contain biometric identifiers including fingerprint hashes, retinal scan data, voiceprint signatures, facial photograph references, or DNA sequence data. Checks for common biometric data patterns, file path references to biometric media, and encoded biometric payloads. Biometric identifiers are among the 18 HIPAA Safe Harbor identifiers that must be removed.