Issue with Transfer script #3

Open
Arthfael opened this issue Nov 28, 2022 · 1 comment
Comments

@Arthfael

(My apologies, it is me again.)
I am trying to transfer the model in Pepper-main\trained_models\2019_guo_nci60\2019_guo_nci60_Coefficient_Predictor_Model.h5 to correct peptide intensities in one of my datasets *. However, the transfer script consistently fails at the same stage:

>>> model = define_model(2438, 28)
-ValueError: Exception encountered when calling layer "custom_loss_layer_9" (type CustomLossLayer).
-
-Dimensions must be equal, but are 28 and 2 for '{{node custom_loss_layer_9/sub}} = Sub[T=DT_FLOAT]
-(custom_loss_layer_9/strided_slice, custom_loss_layer_9/mul)' with input shapes: [?,28], [?,2].
-
-Call arguments received by layer "custom_loss_layer_9" (type CustomLossLayer):
-  • y_true=tf.Tensor(shape=(None, 29), dtype=float32)
-  • y_pred=tf.Tensor(shape=(None, 1), dtype=float32)

Using run_count = 2 works. But then the next 2 lines break it further:

>>> target_model = define_model(5668, 612)
...
>>> tarmod = 'Pepper-main\trained_models\2019_guo_nci60\2019_guo_nci60_Coefficient_Predictor_Model.h5'
>>> target_model.load_weights(tarmod)
-ValueError: Cannot assign value to variable ' conv2d_14/kernel:0': Shape mismatch.
-The variable shape (7, 20, 1, 5), and the assigned value shape (10, 1, 3, 20) are incompatible.

(Looking into define_model, I can see that it behaves differently if run_count = 612. Why this specific value? Is this the number of MS runs in the example model provided?)

Running the function row by row, I can identify that it fails at
>>> my_custom_layer = CustomLossLayer()(inputs_label, output) # here can also initialize those var1, var2

So from this, I have a few questions:

  • Is run_count the number of MS runs, or something different, i.e. some neural network related run count?
  • Does the behavior I have described mean that to transfer a model, one should know the number of proteins and MS runs said model was trained on? Is there not a way to detect it from the h5 file?
  • Why are the numbers of proteins and runs hard coded into the define_model function? Should I rewrite it to include a more explicit switch between its behavior for initializing the pre-trained model vs the target_model?
    Any help would be much appreciated.

Kind regards,

Armel

  * Please note too that I could not locate the file referenced in the script by line:
    target_model.load_weights('trained_models/all_TMT11_lumos_datasets/all_TMT11_lumos_datasets_Coefficient_Predictor_Model_run' + str(random_run) + '.h5')
    but that is ok, I was actually interested in testing any model, and my data was not TMT-labelled.
@BercesteDincer
Collaborator

BercesteDincer commented Dec 5, 2022

Hi Armel,

Thank you very much for your interest in our package; we hope that you will find the model useful! Below are some details to answer your questions. I hope they are helpful, and we are happy to follow up if you have more questions:

The folder transfer_experiments contains the joined full dataset, which can be helpful to train on, and some example calls.

1. The trained model is not available in the repo since it takes up quite a lot of space. Hopefully, retraining the model on a dataset from the list of datasets will give you a trained model.

2. The function define_model(protein_count, run_count) is defined such that it takes the number of unique proteins in the dataset and the number of runs in the dataset. The current script defines two models:
model = define_model(2438, 28)
target_model = define_model(5668, 612)

The variable model denotes the model we want to use for the dataset we are transferring to, and target_model denotes the pre-trained model (from which we read the weights). The numbers above are defined based on the dataset we are transferring to and the model we are loading weights from. You can modify these numbers based on which dataset you are pretraining on and which you are transferring to.
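
To avoid editing the two numbers by hand for every dataset pair, the calls above could be driven from a small lookup table. This is only a minimal sketch, not part of the package: DatasetDims, DATASET_DIMS, build_models and the labels "transfer_to" and "pretrained" are hypothetical names, and define_model is passed in as an argument so the sketch does not assume anything about its internals. The only numbers used are the ones from the example calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class DatasetDims:
    protein_count: int  # number of unique proteins in the dataset
    run_count: int      # number of MS runs in the dataset


# Hypothetical registry. The labels are illustrative; the numbers are the
# ones used in the example calls of the transfer script.
DATASET_DIMS: Dict[str, DatasetDims] = {
    "transfer_to": DatasetDims(protein_count=2438, run_count=28),
    "pretrained": DatasetDims(protein_count=5668, run_count=612),
}


def build_models(
    define_model: Callable[[int, int], object],
    transfer_to: str,
    pretrained: str,
    dims: Dict[str, DatasetDims] = DATASET_DIMS,
) -> Tuple[object, object]:
    """Return (model, target_model), following the naming in the script."""
    new = dims[transfer_to]  # dataset we are transferring to
    old = dims[pretrained]   # dataset the weights were trained on
    model = define_model(new.protein_count, new.run_count)
    target_model = define_model(old.protein_count, old.run_count)
    return model, target_model
```

With the package's define_model in scope, build_models(define_model, "transfer_to", "pretrained") would make the same two calls as the script. For your own data, add an entry with your protein and run counts.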

  1. Is run_count the number of MS runs, or something different, i.e. some neural network related run count?
    The run_count corresponds to the number of MS runs.

  2. Does the behavior I have described mean that to transfer a model, one should know the number of proteins and MS runs said model was trained on? Is there not a way to detect it from the h5 file?
    Yes, exactly, we need to pass the number of proteins and MS runs. If you are using our example calls, the given numbers should work. Otherwise, they would depend on which datasets you use.

  3. Why are the numbers of proteins and runs hard coded into the define_model function? Should I rewrite it to include a more explicit switch between its behavior for initializing the pre-trained model vs the target_model?
    It might be helpful to implement a switch if you plan to run transfer experiments for multiple datasets.
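
On question 2, one way to at least check what an .h5 weights file contains is to list the shapes of the arrays stored in it and compare them with the shapes in a "Shape mismatch" error. The following is a rough sketch under assumptions: it relies on the third-party h5py package, collect_shapes and weight_shapes_from_file are hypothetical helper names, and nothing in this thread confirms that protein_count or run_count can be read directly off these shapes.

```python
def collect_shapes(h5_group):
    """Map each dataset path under an h5py File or Group to its shape."""
    shapes = {}

    def visit(name, obj):
        # h5py Dataset objects expose .shape; Group objects do not.
        shape = getattr(obj, "shape", None)
        if shape is not None:
            shapes[name] = tuple(shape)

    # visititems walks the whole hierarchy and calls visit(name, obj).
    h5_group.visititems(visit)
    return shapes


def weight_shapes_from_file(path):
    """Open an HDF5 file read-only and return {dataset path: shape}."""
    import h5py  # third-party dependency, imported lazily

    with h5py.File(path, "r") as f:
        return collect_shapes(f)
```

Calling weight_shapes_from_file on the weights file and printing the result should show which layer shapes were saved, which may help in choosing arguments for define_model that match the file.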

I hope these are helpful. Again, let us know if any questions arise!

Best,
Ayse
