Fix incorrect PESEL checksum validation in PlPeselRecognizer #1520

BlaiseCz · 2025-01-31T11:29:43Z

Bug Description

The PESEL checksum validation in PlPeselRecognizer.validate_result() is incorrect. The current implementation does not correctly compute the control digit, leading to false negatives, where valid PESEL numbers are incorrectly rejected.

This affects Presidio's ability to correctly recognize and validate PESEL numbers, impacting anonymization and sensitive data detection.

To Reproduce

Run the following test:

from presidio_analyzer.predefined_recognizers import PlPeselRecognizer

pesel_recognizer = PlPeselRecognizer()

valid_pesel = "44051401359"  # This is a valid PESEL
print(pesel_recognizer.validate_result(valid_pesel))  # Expected: True, Actual: False

Note: if unsure, verify the number with this checker: https://kalkulatory.gofin.pl/kalkulatory/sprawdzanie-pesel-weryfikacja-pesel

Observed Behavior

The function returns False for a valid PESEL due to incorrect checksum computation.

Expected Behavior

A valid PESEL (with the correct checksum) should return True.

Root Cause: Incorrect Checksum Calculation

The issue lies in the final checksum validation step. The existing code:

checksum = sum(digit * weight for digit, weight in zip(digits[:10], weights))
checksum %= 10

return checksum == digits[10]  # ❌ Incorrect final checksum check!

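This compares the weighted sum modulo 10 directly to the last digit of the PESEL. The control digit is (10 - checksum) % 10, not the checksum itself, so the two values only coincide when the checksum is 0 or 5. Every other valid PESEL is rejected.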
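As a worked check of the arithmetic, the sketch below runs both comparisons on the PESEL from the reproduction step. The helper name `weighted_sum_mod10` is made up for this illustration and is not part of Presidio.

```python
# Illustrative only: weighted_sum_mod10 is a hypothetical helper, not Presidio API.
WEIGHTS = [1, 3, 7, 9, 1, 3, 7, 9, 1, 3]


def weighted_sum_mod10(pesel: str) -> int:
    """Return the weighted sum of the first ten digits, modulo 10."""
    digits = [int(ch) for ch in pesel]
    return sum(d * w for d, w in zip(digits[:10], WEIGHTS)) % 10


pesel = "44051401359"
last_digit = int(pesel[10])

checksum = weighted_sum_mod10(pesel)   # weighted sum is 101, so this is 1
control_digit = (10 - checksum) % 10   # 9

old_check = checksum == last_digit       # 1 == 9 -> False (the reported bug)
new_check = control_digit == last_digit  # 9 == 9 -> True

print(old_check, new_check)  # False True
```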

Proposed Fix

The corrected implementation derives the control digit from the weighted sum before comparing it with the last digit:

def validate_result(self, pattern_text: str) -> bool:  # noqa D102
    if len(pattern_text) != 11 or not pattern_text.isdigit():
        return False  # Ensure the input is a valid 11-digit number

    digits = [int(digit) for digit in pattern_text]
    weights = [1, 3, 7, 9, 1, 3, 7, 9, 1, 3]  # Correct weights

    checksum = sum(digit * weight for digit, weight in zip(digits[:10], weights)) % 10
    check_digit = (10 - checksum) % 10  # ✅ Corrected final checksum computation

    return check_digit == digits[10]  # ✅ Now correctly compares with the last digit

Why This Fix Works

Applies the final (10 - checksum) % 10 step, so the computed control digit follows the PESEL checksum rule.
Accepts only PESELs whose last digit matches the computed control digit.
Fixes the false negatives without introducing false positives.
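
The false-positive claim can be spot-checked. This is a minimal sketch, assuming a standalone copy of the proposed logic (`is_valid_pesel` and `single_digit_mutations` are hypothetical names, not Presidio methods). Because every weight is coprime with 10, changing any single digit of a valid PESEL should break the control-digit match, so every such mutation should be rejected.

```python
# Hypothetical standalone copy of the proposed check, for illustration only.
WEIGHTS = [1, 3, 7, 9, 1, 3, 7, 9, 1, 3]


def is_valid_pesel(text: str) -> bool:
    """Checksum-only validation: 11 digits and a matching control digit."""
    if len(text) != 11 or not text.isdigit():
        return False
    digits = [int(ch) for ch in text]
    checksum = sum(d * w for d, w in zip(digits[:10], WEIGHTS)) % 10
    return (10 - checksum) % 10 == digits[10]


def single_digit_mutations(text: str):
    """Yield every string that differs from text in exactly one digit."""
    for pos, original in enumerate(text):
        for replacement in "0123456789":
            if replacement != original:
                yield text[:pos] + replacement + text[pos + 1:]


valid = "44051401359"
mutations = list(single_digit_mutations(valid))
accepted = [m for m in mutations if is_valid_pesel(m)]

print(is_valid_pesel(valid))  # True
print(len(mutations))         # 99
print(accepted)               # []
```

The sketch covers checksum consistency only. It does not check whether the digits encode a real birth date.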

Additional Context

This issue impacts Polish users relying on PESEL validation in Presidio.
The bug affects data masking and validation accuracy.
Fixing this brings the validation in line with the official PESEL checksum rule.

Fix incorrect PESEL checksum validation in PlPeselRecognizer

b05d826
