
Replicate scores for major papers using Frechet Audio Distance #3

Closed
gudgud96 opened this issue Oct 26, 2022 · 7 comments

gudgud96 commented Oct 26, 2022

As mentioned in fcaspe/ddx7#1, I am not yet able to replicate the FAD score reported in the paper to a satisfactory level.

This needs further investigation to determine whether the gap comes from inherent implementation differences relative to the Google version, or from factors outside the FAD calculation itself. I have therefore decided to benchmark the FAD scores reported in several major works against the scores calculated here. Candidates to be listed (I will start with DDX7); paper suggestions are welcome.
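For context, FAD fits a multivariate Gaussian to the embeddings of each audio set and takes the Frechet distance between the two Gaussians. A minimal numpy/scipy sketch of that final step (function and variable names are mine, not taken from either implementation):

```python
import numpy as np
from scipy import linalg

def frechet_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fit to two embedding sets.

    FAD = ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2))
    where each row of emb_a / emb_b is one embedding frame.
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    sigma_a = np.cov(emb_a, rowvar=False)
    sigma_b = np.cov(emb_b, rowvar=False)
    covmean = linalg.sqrtm(sigma_a @ sigma_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from sqrtm noise
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(sigma_a + sigma_b - 2.0 * covmean))
```

Any embedding-level discrepancy between the two VGGish ports feeds straight into the mean/covariance estimates here, which is why small model diffs can move the final score.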

gudgud96 self-assigned this Oct 26, 2022
@yoyolicoris

I also ran into the same issue recently.
Maybe Google has re-trained their VGGish model, so the outputs differ?
torchvggish hasn't been updated in a long time.

@gudgud96
Owner Author

It doesn't look like there was a re-train, judging from the last-modified dates at https://storage.googleapis.com/audioset:

[screenshot: last-modified dates of the VGGish checkpoint files]

Let me do a further diff check on both models.

@gudgud96
Owner Author

Did a check based on the test_audio/ files provided in google-research/frechet-audio-distance, which are distorted sine waves:

https://github.com/google-research/google-research/blob/master/frechet_audio_distance/gen_test_files.py#L86

My results are pretty close to the originals:

                         baseline vs test1   baseline vs test2
google-research          12.4375             4.7680
frechet_audio_distance   12.7398             4.9815

@yoyololicon do you have a failing case that you could share? I could do more digging to see if there are any failure modes.
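For anyone who wants a quick local stand-in for those test files without cloning the Google repo, here is a hypothetical clipped-sine generator. This is NOT the actual gen_test_files.py logic (its distortion model may differ); the function name, hard-clipping scheme, and parameters below are illustrative assumptions only:

```python
import numpy as np

def distorted_sine(freq_hz=440.0, dur_s=1.0, sr=16000, clip_level=1.0):
    """Sine tone hard-clipped at clip_level, then renormalized.

    clip_level=1.0 leaves the tone clean; lower values flatten the
    peaks, adding harmonic distortion (hypothetical distortion model).
    """
    t = np.arange(int(dur_s * sr)) / sr
    x = np.sin(2.0 * np.pi * freq_hz * t)
    return np.clip(x, -clip_level, clip_level) / clip_level

baseline = distorted_sine(clip_level=1.0)  # clean reference tone
test1 = distorted_sine(clip_level=0.2)     # heavily clipped variant
```

Signals like these make a convenient smoke test: the more distorted variant should land measurably farther from the baseline under any correct FAD implementation.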

@yoyolicoris

@gudgud96
Here is a test script I got from a colleague.
It compares the two models' embeddings directly, and we think the difference is significant.

import numpy as np
import torch
import tensorflow as tf
import tensorflow_hub as hub

# PyTorch port of VGGish
model_torch = torch.hub.load('harritaylor/torchvggish', 'vggish')
model_torch.postprocess = False  # compare raw embeddings, skip PCA/quantization
model_torch.eval()

# Reference TF-Hub VGGish
model_tf = hub.load("https://tfhub.dev/google/vggish/1")

# 5 seconds of random noise at 16 kHz (float32 for the TF-Hub model)
sample = np.random.uniform(-1, 1, size=16000 * 5).astype(np.float32)

with torch.no_grad():
    torch_embeddings = model_torch(sample, 16000).cpu().numpy()
tf_embeddings = model_tf(sample).numpy()

# Mean L2 distance between corresponding embedding frames
print(np.linalg.norm(torch_embeddings - tf_embeddings, axis=1).mean())

@gudgud96
Owner Author

@yoyololicon I ran your script; the issue is that torchvggish has an extra final ReLU layer compared to the original google-research implementation, see harritaylor/torchvggish#24.

As suggested by @brentspell (credits to Brent once again!), we can disable the final ReLU layer in torchvggish. I use this in my implementation as well:

import torch
from torch import nn

model_torch = torch.hub.load('harritaylor/torchvggish', 'vggish')
model_torch.postprocess = False
# Drop the trailing ReLU from the embedding head
model_torch.embeddings = nn.Sequential(*list(model_torch.embeddings.children())[:-1])
model_torch.eval()

You should be able to see that the difference is very minimal in this case. Hope it helps!
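To see why that stray ReLU matters for FAD specifically, here is a small numpy-only illustration (no model download needed). Zeroing the negative half of each embedding dimension shifts the mean of the fitted Gaussian, which feeds directly into the Frechet distance; the array here is a synthetic stand-in for embeddings, not real VGGish output:

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(10_000, 4))   # synthetic stand-in for raw embeddings
emb_relu = np.maximum(emb, 0.0)      # what the extra final ReLU does

print(emb.mean(axis=0))       # ~0 per dimension
print(emb_relu.mean(axis=0))  # ~0.4 per dimension (E[max(Z,0)] = 1/sqrt(2*pi))
```

Since FAD compares the means and covariances of embedding distributions, only one side having this extra ReLU systematically biases the score.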

@yoyolicoris

@gudgud96 Oh, I completely missed this. Thanks for pointing it out!

@gudgud96
Owner Author

Closing this issue for now, as the basic test on sine tones passes. If there are questions about accuracy or interest in replicating the exact numbers from papers, we can re-open it.
