How to create blocking rules across differently named columns? #1960
I haven't seen this addressed in the documentation, so apologies if it's there and I've just missed it! I have two datasets, each with four columns that contain names and four corresponding address/city/postal code columns. I'd like to add blocking rules. Is there a way, when setting up blocking rules, to use columns that have different names in each dataset? All the examples I see in the documentation cover cases where the columns in the different dataframes have the same names. This will eventually extend to the comparisons, as I'll need to compare multiple addresses and names against each other, but at the moment I'm just trying to figure out the blocking. Thanks! Joseph
Replies: 1 comment 1 reply
Hiya,
Yes, this is possible, here's a runnable example:
(Note that when you provide a blocking rule as a string, such as "l.city = r.city2", under the hood it becomes the condition of a SQL join expression: INNER JOIN ... ON l.city = r.city2.)
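To make the join idea concrete, here is a minimal sketch using only Python's built-in sqlite3 module. It does not call Splink and is not the SQL Splink generates; the table names (df_left, df_right), the columns (first_name, fname, city, city2) and the rows are all hypothetical. It only illustrates how a rule string that refers to differently named columns through the l and r aliases can act as the ON condition of an inner join.

```python
import sqlite3

# Hypothetical toy data: the two tables use different column names
# for the same kind of information (first_name vs. fname, city vs. city2).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE df_left  (unique_id INTEGER, first_name TEXT, city  TEXT);
    CREATE TABLE df_right (unique_id INTEGER, fname      TEXT, city2 TEXT);
    INSERT INTO df_left  VALUES (1, 'Ann', 'Leeds'), (2, 'Bob', 'York'), (3, 'Cat', 'Bath');
    INSERT INTO df_right VALUES (10, 'Ann', 'Leeds'), (11, 'Rob', 'York'), (12, 'Dan', 'Hull');
""")


def candidate_pairs(blocking_rule):
    """Return (left id, right id) pairs that satisfy the blocking rule.

    The rule string is used as the ON condition of an inner join,
    with the two tables aliased as l and r.
    """
    sql = f"""
        SELECT l.unique_id, r.unique_id
        FROM df_left AS l
        INNER JOIN df_right AS r
        ON {blocking_rule}
        ORDER BY l.unique_id, r.unique_id
    """
    return conn.execute(sql).fetchall()


# Block on city only: column names differ between the two sides.
pairs_city = candidate_pairs("l.city = r.city2")

# A stricter rule combining two differently named column pairs.
pairs_name_and_city = candidate_pairs("l.first_name = r.fname AND l.city = r.city2")

print(pairs_city)
print(pairs_name_and_city)
```

Because the rule is ordinary SQL, the same pattern should extend to the other name and address columns, for example by writing one rule per column pairing. How Splink itself prepares and aliases the input tables may differ from this sketch, so it is worth checking against the Splink version in use.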