Using Splink for address matching #1962
-
Hello, I'm currently working on my final-year school project. At my company I was confronted with matching records that have no true keys, so to speak: matching data that comes from outside. One of my core tasks is to match the addresses my company holds against BAN ("Base Adresse Nationale") addresses, which is planned to become a major reference for addresses in the near future. The issue is that it is the mayors who write the addresses. So I ran into cases like the street name "rue du 18 mai 1948", which is written in more than 8 different ways across cities. Even more annoying, some cities have both "impasse pasteur" and "rue pasteur", which are different streets, yet in a lot of cases the road type can be switched. So I implemented something in Python that uses different metrics: LCS ratio, ratio, Jaro-Winkler (which wasn't of much use), Levenshtein, and Damerau-Levenshtein. Sorry for the long message, but I felt this wasn't trivial and that I needed to explain my case further.
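To illustrate the kind of comparison described above, here is a minimal sketch using only the Python standard library. It is not the original implementation: the `ROAD_TYPES` table, the function names, and the choice to score the road type separately from the rest of the street name are all assumptions made for the example, and `difflib`'s ratio stands in for the "ratio" metric.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical abbreviation table, for illustration only (not a complete list).
ROAD_TYPES = {"r": "rue", "rue": "rue", "imp": "impasse", "impasse": "impasse",
              "av": "avenue", "avenue": "avenue", "bd": "boulevard",
              "boulevard": "boulevard"}


def normalize(street: str) -> str:
    """Lowercase, strip accents, and replace punctuation with spaces."""
    text = unicodedata.normalize("NFKD", street.lower())
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = "".join(c if c.isalnum() else " " for c in text)
    return " ".join(text.split())


def split_road_type(street: str) -> tuple[str, str]:
    """Return (road type, remaining name); road type is "" if not recognised."""
    tokens = normalize(street).split()
    if tokens and tokens[0] in ROAD_TYPES:
        return ROAD_TYPES[tokens[0]], " ".join(tokens[1:])
    return "", " ".join(tokens)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def compare(a: str, b: str) -> dict:
    """Score the road type and the street name separately."""
    type_a, name_a = split_road_type(a)
    type_b, name_b = split_road_type(b)
    return {
        "same_road_type": type_a == type_b,
        "name_ratio": SequenceMatcher(None, name_a, name_b).ratio(),
        "name_levenshtein": levenshtein_ratio(name_a, name_b),
    }
```

Keeping the road type as its own signal means "impasse pasteur" and "rue pasteur" get a perfect name score but are still flagged as differing in type, so the decision of whether a switched road type is acceptable can be made explicitly rather than being buried in a single string distance.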
Replies: 2 comments 2 replies
-
If you only have an address and no other column, Splink is unlikely to work terribly well for you because it violates the conditional independence assumption of the Fellegi-Sunter model. It's essentially like having a single column of data - see here
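For readers unfamiliar with the assumption, the Fellegi-Sunter match weight is commonly written as a sum of per-comparison terms. The notation below is the standard textbook form, not something taken from this thread: $\gamma_i$ is the outcome of comparison $i$, $M$ and $U$ denote matching and non-matching pairs, and $\lambda$ is the prior probability that a pair matches.

$$
\log_2 \frac{P(M \mid \gamma)}{P(U \mid \gamma)} = \log_2 \frac{\lambda}{1-\lambda} + \sum_{i=1}^{k} \log_2 \frac{m_i}{u_i}, \qquad m_i = P(\gamma_i \mid M), \quad u_i = P(\gamma_i \mid U)
$$

The sum is only valid if the $k$ comparisons are conditionally independent given match status. Pieces of one address (road type, street name, number) are strongly correlated, so splitting the address into several comparisons would likely count the same evidence more than once, while treating it as one comparison leaves $k = 1$.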
-
For any future readers, there's an example here of high-performance, accurate address matching using Splink:
https://github.com/RobinL/address_matching_example/
It's designed with UK addresses in mind but could be used for any addresses with a few tweaks.