Measuring is difficult, but the correlation between the address attributes will be strong. Correlation with home phone number shouldn't be so bad, so I would suggest two comparisons, one for address and the other for phone. I would suggest something like: Address comparison, made up of levels

Then a standard LevenshteinAtThresholds should probably do for the phone number.

This is a fairly simple approach; more generally, address matching is notoriously hard. There's a more sophisticated example of address matching (where address is the entity type) in the following repo: https://github.com/RobinL/uk_address_matcher

Worth noting that in most of our pipelines, because of the complexity of address matching, we tend to just match on postcode.
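To make the phone suggestion concrete, here is a rough plain-Python sketch of the idea behind a Levenshtein-at-thresholds comparison: each pair of values is assigned to the first level whose edit-distance threshold it satisfies. This is only an illustration of the concept, not Splink's implementation; the function names, the level labels, and the thresholds of 1 and 2 are assumptions made up for this example.

```python
from typing import Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def phone_comparison_level(
    left: Optional[str],
    right: Optional[str],
    thresholds: Sequence[int] = (1, 2),  # assumed thresholds, tune for your data
) -> str:
    """Return a (hypothetical) label for the comparison level of two phone numbers."""
    if not left or not right:
        return "null"  # a missing value on either side gets its own level
    if left == right:
        return "exact_match"
    distance = levenshtein(left, right)
    for threshold in sorted(thresholds):
        if distance <= threshold:
            return f"levenshtein_lte_{threshold}"
    return "else"


if __name__ == "__main__":
    print(phone_comparison_level("02079460000", "02079460001"))
    print(phone_comparison_level("02079460000", None))
```

Records with a missing phone number fall into the null level rather than counting as a mismatch, which matters here since some attributes may be missing.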
-
In our case, records have correlated attributes related to address:
Some records may be missing some of these attributes.
What would be good comparison levels (ideally with frequencies),
or are there any recommendations on how to choose between attributes?
How can I measure the correlation between attributes?