Replicate scores for major papers using Frechet Audio Distance #3
I also encountered the same issue recently.
Doesn't seem like there has been a re-train, judging from their last modified date: https://storage.googleapis.com/audioset
Let me do a further diff check on both models.
Did a check. My results are pretty close to the originals:
@yoyololicon do you have a failing case that you could share? I could probably do more digging to see if there are any failure modes.
@gudgud96

```python
import numpy as np
import torch
import tensorflow_hub as hub

# PyTorch port of VGGish
model_torch = torch.hub.load('harritaylor/torchvggish', 'vggish')
model_torch.postprocess = False
model_torch.eval()

# Original TF-Hub VGGish
model_tf = hub.load("https://tfhub.dev/google/vggish/1")

# 5 seconds of random audio at 16 kHz
sample = np.random.uniform(-1, 1, size=16000 * 5)

with torch.no_grad():
    torch_embeddings = model_torch(sample, 16000).cpu().numpy()
tf_embeddings = model_tf(sample).numpy()

# Mean L2 distance between the two sets of embeddings
print(np.linalg.norm(torch_embeddings - tf_embeddings, axis=1).mean())
```
@yoyololicon Ran your script; the issue is the final ReLU layer in torchvggish. As suggested by @brentspell (credits to Brent once again!), we can disable that final ReLU layer. I have used this in my implementation as well:

```python
import torch
from torch import nn

model_torch = torch.hub.load('harritaylor/torchvggish', 'vggish')
model_torch.postprocess = False
# Drop the trailing ReLU so the embeddings match the TF-Hub output
model_torch.embeddings = nn.Sequential(*list(model_torch.embeddings.children())[:-1])
model_torch.eval()
```

You should be able to see that the difference is very minimal in this case. Hope it helps!
@gudgud96 Oh, I completely missed this. Thanks for pointing it out!
Closing the issue for now, as the basic test on sine tones passes. If there are questions about accuracy or interest in replicating the exact numbers from papers, we shall re-open this issue.
As mentioned in fcaspe/ddx7#1, I am not yet able to replicate the FAD score reported in the paper to a satisfactory level.

Further investigation is needed on whether the diff is due to inherent implementation differences compared to the Google version, or to differences outside the FAD calculation. Hence I have decided to look into some major works and do a more detailed benchmark of the FAD scores reported vs. those calculated here. Candidates to be listed (I will start with DDX7); paper suggestions are welcome.
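For context on where calculation diffs could creep in: after the embeddings are extracted, FAD is the Fréchet distance between two Gaussians fitted to the embedding sets. Below is a minimal sketch of that standard formulation (the function name and array shapes are my own, assuming NumPy/SciPy), not this repo's exact implementation:

```python
import numpy as np
from scipy import linalg

def frechet_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (n_samples, dim) embedding sets."""
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)
    # Matrix square root of the covariance product; discard tiny imaginary parts
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```

Numerical handling of `sqrtm` (e.g. the epsilon added to near-singular covariances in some implementations) is one plausible source of small score diffs between codebases.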