Is there a way to parallelise estimate_parameters_using_expectation_maximisation with DuckDB? #1269

Answered by RobinL
Mahora65 asked this question in Q&A

DuckDB should, at least in theory, parallelise the queries within each individual training session, although I recognise that in practice it doesn't always use all cores. I expect DuckDB to get better at parallelising queries as it matures.
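
If you want to check how many threads DuckDB is allowed to use, something like the following may help. It is only a minimal sketch: `configure_duckdb_threads` is a hypothetical helper name, and it assumes you create the DuckDB connection yourself and then hand that same connection to Splink's DuckDB backend (check the docs for your Splink version for how to pass a connection in). `SET threads` and `current_setting('threads')` are standard DuckDB settings calls.

```python
import os


def configure_duckdb_threads(con, threads=None):
    """Ask DuckDB to use `threads` worker threads and return the value in effect.

    `con` is a DuckDB connection (e.g. from `duckdb.connect()`).
    If `threads` is None, fall back to the number of CPUs Python can see.
    This helper is illustrative only; it is not part of Splink or DuckDB.
    """
    n = threads if threads is not None else (os.cpu_count() or 1)
    n = int(n)
    if n < 1:
        raise ValueError("threads must be a positive integer")

    # Set the thread limit for this DuckDB instance.
    con.execute(f"SET threads TO {n}")

    # Read the setting back so the caller can confirm what DuckDB will use.
    return con.execute("SELECT current_setting('threads')").fetchone()[0]


# Usage (requires `pip install duckdb`):
#   import duckdb
#   con = duckdb.connect()
#   configure_duckdb_threads(con, 8)
```

Raising the thread limit only tells DuckDB how many cores it may use; it does not guarantee that every query in a training session will keep all of them busy.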

Unfortunately there isn't a straightforward way to run multiple training sessions in parallel, because Splink updates the parameter estimates after each training session.

From a performance point of view, it's also worth noting that the blocking rules used for predictions and the blocking rules used for EM training may need to be different. See https://moj-analytical-services.github.io/splink/topic_guides/blocking_rules.html
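
To make the last two points concrete, here is a rough sketch of keeping the two sets of blocking rules separate and running the EM sessions one after another. The column names (`first_name`, `surname`, `dob`, `postcode`), the specific rules, and the `run_em_sessions` helper are all made up for illustration; the settings dictionary is a fragment (a real one also needs comparisons), and `linker` is assumed to be an already constructed Splink linker.

```python
# Rules used when generating predictions (hypothetical columns).
PREDICTION_BLOCKING_RULES = [
    "l.surname = r.surname",
    "l.postcode = r.postcode",
]

# Separate rules used only for EM training sessions (hypothetical columns).
EM_TRAINING_BLOCKING_RULES = [
    "l.first_name = r.first_name and l.surname = r.surname",
    "l.dob = r.dob",
]

# Fragment of a Splink settings dictionary; comparisons are omitted here.
settings = {
    "link_type": "dedupe_only",
    "blocking_rules_to_generate_predictions": PREDICTION_BLOCKING_RULES,
}


def run_em_sessions(linker, training_rules):
    """Run one EM training session per rule, strictly one after another.

    The linker's parameter estimates are updated after each session,
    so the sessions are not dispatched concurrently.
    """
    for rule in training_rules:
        linker.estimate_parameters_using_expectation_maximisation(rule)


# Usage, given a linker built from `settings`:
#   run_em_sessions(linker, EM_TRAINING_BLOCKING_RULES)
```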

Finally, it's worth noting that Spar…

Answer selected by Mahora65