prediction_type (or something like that): e.g. yield_prediction, only_mapped_reaction, condition_prediction
If the user only wants the mapped reaction strings, we should bypass the sanity checks for the reaction conditions, which ultimately gives a larger dataset to work with. Likewise for yield prediction (we remove reactions without yields) etc.
Data_set: only_uspto, all_available
For benchmarking purposes, it would be great to have an option that always generates the same dataset (e.g. only USPTO data), and another option that just includes all data currently stored in USPTO
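A rough sketch of how the two proposed args could be exposed, assuming an argparse-based CLI. The flag names, their spelling, the defaults and the `build_parser` helper are all placeholders for illustration, not existing code in the repo:

```python
import argparse

# Hypothetical choices, taken from the values suggested in this issue.
PREDICTION_TYPES = ("yield_prediction", "only_mapped_reaction", "condition_prediction")
DATA_SETS = ("only_uspto", "all_available")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the two proposed (hypothetical) args."""
    parser = argparse.ArgumentParser(description="Dataset extraction options (sketch)")
    parser.add_argument(
        "--prediction_type",
        choices=PREDICTION_TYPES,
        default="condition_prediction",  # assumed default, not decided in the issue
        help="Controls which sanity checks are applied during cleaning.",
    )
    parser.add_argument(
        "--data_set",
        choices=DATA_SETS,
        default="only_uspto",  # assumed default: the reproducible benchmark option
        help="only_uspto always yields the same dataset; all_available uses everything.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.prediction_type, args.data_set)
```

Using `choices` means a mistyped value is rejected at parse time instead of silently falling through to a default cleaning path.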
Instead of having a 'prediction type', let's create two flat file benchmarks, both just extracting USPTO data, but one with default settings that removes/handles reactions with uncommon molecules, and another with all the arg settings set to 0.
When creating flat files for benchmarking, we should create train/val/test splits (80/10/10), splitting the data in 3 different ways: random, temporal (by grant date), and by rxn class (both by super class (very hard) and by sub-class (medium difficulty)).
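A minimal sketch of the three splitting strategies, assuming each reaction is a dict with a `grant_date` and an `rxn_class` field. Those field names and all function names here are placeholders, not the actual schema or code:

```python
import random


def split_bounds(n, train_pct=80, val_pct=10):
    """Return the end indices of the train and val slices for n rows."""
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return n_train, n_train + n_val


def random_split(rows, seed=0):
    """Shuffle with a fixed seed so the benchmark is reproducible."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    a, b = split_bounds(len(shuffled))
    return shuffled[:a], shuffled[a:b], shuffled[b:]


def temporal_split(rows, key="grant_date"):
    """Oldest reactions go to train, newest to test."""
    ordered = sorted(rows, key=lambda r: r[key])
    a, b = split_bounds(len(ordered))
    return ordered[:a], ordered[a:b], ordered[b:]


def class_split(rows, key="rxn_class", seed=0):
    """Assign whole reaction classes to one split, so no class is shared.

    The resulting ratios are only approximately 80/10/10, because
    classes are never broken up across splits.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    labels = sorted(groups)
    random.Random(seed).shuffle(labels)
    a, b = split_bounds(len(rows))
    train, val, test = [], [], []
    for label in labels:
        if len(train) < a:
            train.extend(groups[label])
        elif len(train) + len(val) < b:
            val.extend(groups[label])
        else:
            test.extend(groups[label])
    return train, val, test
```

The same `class_split` could cover both difficulty levels by pointing `key` at either the super class or the sub-class column. With few, large super classes the split sizes can drift far from 80/10/10, so it is probably worth logging the actual sizes.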
Add 2 new args: