predict() fails with threshold probability 0 #2420
Comments
Hi, I was wondering if I might be able to help with this. I'm a beginner with open source contributions, but I have had a look at why this math error may be occurring. It seems that where the predict method is defined, it calls predict_from_comparison_vectors_sqls_using_settings() and passes in threshold_match_probability. If this is zero, it gets passed through a few other functions until it reaches predict_from_comparison_vectors_sqls(), which takes a log2, so a 0 here raises a math error. I may be misunderstanding the code, but I was thinking of adding a check for threshold_match_probability > 0 inside predict_from_comparison_vectors_sqls_using_settings() so that it's validated every time the function is called. Alternatively, a simpler option would be a check for 0 just before predict_from_comparison_vectors_sqls_using_settings() is called, ensuring only valid values get passed in. I would imagine the check inside predict_from_comparison_vectors_sqls_using_settings() is the better design, as it ensures every call is validated. Please let me know if this approach seems useful, or if I'm on the right track. I'd love to try to code the solution and submit a PR if you think it makes sense. Thanks
Hello! Thanks, yeah, you're along the right lines. The simplest solution is to explicitly treat a threshold match probability of 0 as a null match probability. If you look here: splink/splink/internals/predict.py Lines 101 to 114 in 8b44ab5
It's probably just a case of refactoring this logic so that if the threshold is 0, it's treated as if it's None.
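To make the suggestion above concrete, here is a minimal sketch of the "treat 0 like None" idea. It is not the code in predict.py: the function name `threshold_to_match_weight` and its return convention (None meaning "apply no threshold filter") are assumptions made purely for illustration.

```python
import math
from typing import Optional


def threshold_to_match_weight(
    threshold_match_probability: Optional[float],
) -> Optional[float]:
    """Hypothetical helper: convert a probability threshold to a match weight.

    Returns None when no filter should be applied. A threshold of 0 is
    treated exactly like None, because every pair has probability >= 0.
    """
    if threshold_match_probability is None or threshold_match_probability == 0:
        return None
    if not 0 < threshold_match_probability <= 1:
        raise ValueError("threshold_match_probability must be between 0 and 1")
    if threshold_match_probability == 1:
        # The Bayes factor p / (1 - p) is unbounded at p == 1.
        return math.inf
    bayes_factor = threshold_match_probability / (1 - threshold_match_probability)
    return math.log2(bayes_factor)
```

With this shape, the caller only has to check for None before building a filter, and a threshold of 0 never reaches the log2 call.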
Hi RobinL, Thanks for getting back. Yeah, that makes a lot of sense actually, and it's a clean way of doing it. I'll look into refactoring the logic around threshold_match_probability so that if the value is 0, it's handled the same way as None. More specifically, I'll adjust that part of predict.py so that 0 values are treated as null. Once it's ready, I'll submit a PR for review. Cheers
Handle threshold_match_probablity 0 in predict() #2420
Closed by #2425
probability -> Bayes factor -> log2 -> ValueError: math domain error
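The chain above can be reproduced outside Splink with plain Python. This is only an illustrative sketch: `probability_to_bayes_factor` and `probability_to_match_weight` are stand-in names written for this example, not the library's own helpers.

```python
import math


def probability_to_bayes_factor(prob: float) -> float:
    # Odds form of a probability: p / (1 - p). Unbounded at p == 1.
    return prob / (1 - prob) if prob != 1 else math.inf


def probability_to_match_weight(prob: float) -> float:
    # Match weight is log2 of the Bayes factor.
    return math.log2(probability_to_bayes_factor(prob))


# A probability of 0 gives a Bayes factor of 0, and log2(0) is undefined,
# so math.log2 raises ValueError.
try:
    probability_to_match_weight(0.0)
except ValueError as exc:
    error = exc
```

Any threshold strictly between 0 and 1 goes through without error; only the 0 endpoint hits the undefined logarithm.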
Loosely related: #1716.
Similar issues: #2333, #2334.