
Trying to understand the behaviour for clustering on results of find_matches_to_new_records #2282

Answered by ADBond
bpandey-CS asked this question in Q&A

This looks like something of a bug. Looking into it, the cause seems to be that clustering uses the input dataset (in this case, the golden records) as its starting point, and along the way assumes that these make up the full set of nodes. We are planning some behind-the-scenes changes to clustering, as well as an option to cluster without a linker, and will keep this in mind so that the issue gets fixed.
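To make the "full set of nodes" assumption concrete, here is a minimal pure-Python sketch of threshold-based connected-components clustering (the general technique behind Splink's `cluster_pairwise_predictions_at_threshold`). It is not Splink's implementation. The record IDs, scores, and the rule that edges touching unknown nodes are skipped are hypothetical simplifications chosen to show the effect; the exact symptom inside Splink may differ. When the node set is only the golden records, the new records never enter any cluster. Once the node set also includes the new records, they join the golden records they match.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Union-find clustering over an explicit node set."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        # Hypothetical simplification: an edge whose endpoint is not in
        # the node set is ignored, i.e. `nodes` is assumed to be complete.
        if a not in parent or b not in parent:
            continue
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    clusters = defaultdict(set)
    for n in nodes:
        clusters[find(n)].add(n)
    return sorted(sorted(c) for c in clusters.values())


def cluster_at_threshold(nodes, scored_pairs, threshold):
    # Keep only pairs whose match probability reaches the threshold.
    edges = [(l, r) for l, r, p in scored_pairs if p >= threshold]
    return connected_components(nodes, edges)


# Hypothetical data: golden records g*, new records n*.
golden = ["g1", "g2", "g3"]
new_records = ["n1", "n2"]
pairs = [("n1", "g1", 0.98), ("n2", "g2", 0.97), ("n2", "g3", 0.40)]

# Nodes taken from the golden records only: new records are lost.
print(cluster_at_threshold(golden, pairs, 0.9))
# Nodes taken from golden + new records: new records join their matches.
print(cluster_at_threshold(golden + new_records, pairs, 0.9))
```

The design point is that connected components can only return nodes it was told about, so whichever dataset seeds the node set decides which records appear in the output.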

In the meantime, as a workaround, you should be able to get around this by running the clustering with a new linker (Splink 4 users reading this: set it up with a new DatabaseAPI as well), using your new data df_new as input:

df_inc =

Replies: 4 comments 2 replies

Answer selected by bpandey-CS
1 reply
@ADBond
1 reply
@nabebaye

Category: Q&A
Labels: None yet
4 participants