Hello, I'm trying to use Splink for record linkage across two datasets of individuals that have coordinates (the CRS is British National Grid). I've been trying to use the coordinates for blocking, finding all records within 20 meters of each other, using …
Thanks for this. I've been thinking that another way of checking this would be to create two grids of squares of x meters, one starting at coordinates …
Blocking rules need to be equality conditions for good performance; see here:
https://moj-analytical-services.github.io/splink/topic_guides/blocking/performance.html?h=equality#equi-join-conditions
With a continuous numeric variable, you probably want to block on the rounded values.
Thinking about it, you actually need two blocking rules on the rounded values. For example, 0.9 and 1.1 are close, but rounding up sends them to different values (1 and 2). So you want to block on (say) the value rounded up, and also on the value + 0.5 rounded up, which puts both 0.9 and 1.1 in the same block (2).
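To make the rounding idea concrete for two-dimensional coordinates, here is a minimal sketch rather than a tested Splink configuration. It assumes hypothetical column names `easting` and `northing` (in metres) and a cell size of 40 m, i.e. twice the 20 m radius from the question. Because the x and y axes can each straddle a cell boundary independently, the sketch uses four offset grids (every combination of a half-cell shift in x and in y) instead of two. The generated SQL strings follow the usual `l.` / `r.` form of a blocking rule, but whether your backend treats them as equi-joins should be checked; precomputing the cell numbers as columns and blocking on those is an alternative.

```python
import math

CELL = 40.0  # cell size in metres; assumed to be twice the 20 m search radius


def cell_keys(easting, northing, cell=CELL):
    """Return the point's cell in each of the four offset grids."""
    keys = []
    for dx in (0.0, cell / 2):
        for dy in (0.0, cell / 2):
            keys.append((
                math.ceil((easting + dx) / cell),
                math.ceil((northing + dy) / cell),
            ))
    return keys


def share_a_block(p, q, cell=CELL):
    """True if p and q fall in the same cell of at least one offset grid."""
    return any(a == b for a, b in zip(cell_keys(*p, cell), cell_keys(*q, cell)))


# One equality-only SQL condition per offset grid.
# `easting` and `northing` are hypothetical column names.
blocking_rules = [
    f"ceil((l.easting + {dx}) / {CELL}) = ceil((r.easting + {dx}) / {CELL}) "
    f"and ceil((l.northing + {dy}) / {CELL}) = ceil((r.northing + {dy}) / {CELL})"
    for dx in (0.0, CELL / 2)
    for dy in (0.0, CELL / 2)
]

if __name__ == "__main__":
    a = (530000.0, 180000.0)
    b = (530010.0, 180010.0)  # about 14 m from a
    c = (530100.0, 180000.0)  # 100 m from a
    print(share_a_block(a, b))
    print(share_a_block(a, c))
    for rule in blocking_rules:
        print(rule)
```

With this layout, any two points that differ by at most half a cell (20 m here) along each axis share a cell in at least one of the four grids, so every pair within 20 m is generated by some rule. The blocks also admit pairs that are further apart, up to the cell diagonal, so an exact distance check is still needed afterwards.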