Edit: This is now possible in [...]
Previous answer:
Yes, it's reasonably common to override the ELSE part of the model. Disagreement penalties are driven by the [...]
At the moment there isn't a great way to manually set the [...]
Here's an alternative that will work for now, but is not part of the public API:
import splink.comparison_library as cl
from splink import DuckDBAPI, Linker, SettingsCreator, block_on, splink_datasets
from splink.datasets import splink_dataset_labels

db_api = DuckDBAPI()

settings = SettingsCreator(
    link_type="dedupe_only",
    comparisons=[
        cl.ExactMatch("first_name"),
        cl.ExactMatch("surname"),
        cl.ExactMatch("dob"),
        cl.ExactMatch("city"),
        cl.ExactMatch("email"),
    ],
    blocking_rules_to_generate_predictions=[
        block_on("first_name"),
        block_on("surname"),
    ],
    retain_matching_columns=True,
    retain_intermediate_calculation_columns=True,
)

linker = Linker(splink_datasets.fake_1000, settings, db_api)

# Train the model as usual.
linker.training.estimate_probability_two_random_records_match(
    [block_on("first_name", "surname")],
    recall=0.7,
)
linker.training.estimate_u_using_random_sampling(max_pairs=1e6)
linker.training.estimate_parameters_using_expectation_maximisation(
    block_on("first_name", "surname")
)
linker.training.estimate_parameters_using_expectation_maximisation(block_on("dob"))

# Private API (leading underscores): may change without notice.
# Comparison vector value 0 is the ELSE level of the comparison.
surname_comparison = linker._settings_obj._get_comparison_by_output_column_name("surname")
else_comparison_level = surname_comparison._get_comparison_level_by_comparison_vector_value(0)

# The level's match weight is log2(m / u), so a smaller m gives a larger
# penalty; raise m towards u to soften the penalty for disagreement.
else_comparison_level._m_probability = 0.00001

# Check the effect of the override on the match weights.
linker.visualisations.match_weights_chart()
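To see how the overridden value feeds into the penalty, here is a minimal sketch in plain Python (no Splink required). It relies on the standard Fellegi-Sunter relationship that a level's match weight is log2(m / u). The helper name `match_weight` and the `u_else` and `m_else` values are made up for illustration and are not taken from the trained model above.

```python
import math


def match_weight(m: float, u: float) -> float:
    """Match weight of a comparison level: log2 of the Bayes factor m / u."""
    return math.log2(m / u)


# Hypothetical ELSE-level values, chosen only for illustration.
u_else = 0.99

for m_else in (0.00001, 0.05, 0.5):
    weight = match_weight(m_else, u_else)
    print(f"m={m_else:<8} u={u_else} match weight={weight:.2f}")
```

The weight moves towards zero as m approaches u, so for an attribute where disagreement should carry little penalty, the ELSE level's m would be set closer to its u rather than close to zero.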
-
In our use case, records have temporal attributes (which are valid for a relatively short period).
Disagreement on these attributes should not introduce a big penalty.
Is it common to manually reduce the ELSE penalty in the model?
If so, what is the best way to do it? Edit the model JSON and load it again?
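For concreteness, the "edit the JSON" idea could look roughly like the sketch below. It assumes the saved model is a dictionary with a `comparisons` list, where each comparison has an `output_column_name` and a list of `comparison_levels` carrying `sql_condition` and `m_probability` keys. That layout is an assumption about the saved settings format and should be checked against the actual file. `set_else_m_probability` and `example_settings` are hypothetical names used only for this illustration.

```python
import copy


def set_else_m_probability(settings: dict, output_column_name: str, m: float) -> dict:
    """Return a copy of `settings` with the ELSE level's m_probability replaced."""
    updated = copy.deepcopy(settings)  # leave the caller's dict untouched
    for comparison in updated["comparisons"]:
        if comparison["output_column_name"] != output_column_name:
            continue
        for level in comparison["comparison_levels"]:
            if level["sql_condition"].strip().upper() == "ELSE":
                level["m_probability"] = m
                return updated
    raise KeyError(f"No ELSE level found for comparison {output_column_name!r}")


# Hand-written stand-in for a saved model; the values are illustrative only.
example_settings = {
    "link_type": "dedupe_only",
    "comparisons": [
        {
            "output_column_name": "city",
            "comparison_levels": [
                {"sql_condition": '"city_l" = "city_r"', "m_probability": 0.9, "u_probability": 0.05},
                {"sql_condition": "ELSE", "m_probability": 0.1, "u_probability": 0.95},
            ],
        }
    ],
}

softened = set_else_m_probability(example_settings, "city", 0.6)
print(softened["comparisons"][0]["comparison_levels"][-1])
```

The edited dictionary would then be written back out with `json.dump` and used as the settings for a new linker, assuming the Splink version in use accepts a saved settings file or dictionary there.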