
Pre-activation as output of VGGish #24

eatsleepraverepeat opened this issue Nov 2, 2021 · 1 comment

eatsleepraverepeat commented Nov 2, 2021

Hello there,

While comparing this code to the reference implementation in tensorflow/models, I found that the two implementations use different layers as the output of the VGGish model (if the activation is counted as a separate layer):

yours ends the embedding head with:

nn.ReLU(True))

Google's: https://github.com/tensorflow/models/blob/f32dea32e3e9d3de7ed13c9b16dc7a8fea3bd73d/research/audioset/vggish/vggish_slim.py#L104-L106 (the final fully connected layer is built with activation_fn=None)

This is also mentioned in the README:

Note that the embedding layer does not include a final non-linear activation, so the embedding value is pre-activation

Changing the output layer of VGGish in your implementation to the pre-activation one (i.e. without the final ReLU) makes the embeddings (almost) equal in both implementations, for both the raw and the PCA'ed ones.
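
As a quick sanity check, here is a minimal sketch (not part of either implementation) showing that the stock output is exactly the ReLU of the pre-activation embedding. It assumes the embedding head starts with a Linear layer and ends with the ReLU quoted above, and it feeds the head random features as a stand-in for real conv output:

import torch

vggish = torch.hub.load("harritaylor/torchvggish", "vggish")
vggish.eval()

# Random stand-in for the flattened conv features that feed the embedding head;
# the input size is read from the head's first Linear layer, so the values are arbitrary.
feats = torch.randn(4, vggish.embeddings[0].in_features)
feats = feats.to(next(vggish.parameters()).device)

with torch.no_grad():
    post = vggish.embeddings(feats)  # stock head, ends in nn.ReLU
    pre_head = torch.nn.Sequential(*list(vggish.embeddings.children())[:-1])
    pre = pre_head(feats)            # same head with the final ReLU stripped

# The stock output is just the ReLU of the pre-activation embedding.
assert torch.allclose(torch.relu(pre), post)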

Thanks for the port though, great work!

@brentspell

First, I would like to echo the kudos for publishing this port of VGGish. I am implementing a Fréchet Audio Distance (FAD) library and will definitely make use of it.

For anyone else who arrives here looking for a workaround, the final ReLU can be removed from the pretrained VGGish model with the following snippet:

import torch as pt

vggish = pt.hub.load("harritaylor/torchvggish", "vggish")
vggish.embeddings = pt.nn.Sequential(*list(vggish.embeddings.children())[:-1])
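
Continuing from that snippet, a minimal usage sketch; per the repository's README the default preprocessing accepts a path to a wav file, and "example.wav" below is just a placeholder:

vggish.eval()
with pt.no_grad():
    # One 128-dimensional embedding is produced per ~0.96 s patch of audio.
    embeddings = vggish.forward("example.wav")
print(embeddings.shape)  # (number of patches, 128)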
