Why can the model only train on a size of [64, 64]? #99

Open
ohhh-yang opened this issue Aug 16, 2024 · 2 comments

Comments

@ohhh-yang

When I try a size other than [64, 64], I get the following error:
File "/home/PyTorch-VAE/models/vanilla_vae.py", line 122, in forward
mu, log_var = self.encode(input)
File "/home/PyTorch-VAE/models/vanilla_vae.py", line 91, in encode
mu = self.fc_mu(result)
File "/home/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 94, in forward
return F.linear(input, self.weight, self.bias)
File "/home/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/functional.py", line 1753, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: mat1 dim 1 must match mat2 dim 0

@unistdJRZ

See the models folder: all the models' latent-space layers are sized to match the feature map that a 64x64 image produces.

@KbKuuhaku

KbKuuhaku commented Jan 16, 2025

self.fc_mu and self.fc_var are two linear layers that take the flattened feature map as input.

if hidden_dims is None:
    hidden_dims = [32, 64, 128, 256, 512]

...

self.fc_mu = nn.Linear(hidden_dims[-1]*4, latent_dim)
self.fc_var = nn.Linear(hidden_dims[-1]*4, latent_dim)


...

result = self.encoder(input)
result = torch.flatten(result, start_dim=1)

# Split the result into mu and var components
# of the latent Gaussian distribution
mu = self.fc_mu(result)
log_var = self.fc_var(result)

return [mu, log_var]

The encoder reduces the size of input images from (B, 3, W, W) to (B, C, W / 32, W / 32).

Sequential(
  (0): Sequential(
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): LeakyReLU(negative_slope=0.01)
  )
  (1): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): LeakyReLU(negative_slope=0.01)
  )
  (2): Sequential(
    (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): LeakyReLU(negative_slope=0.01)
  )
  (3): Sequential(
    (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): LeakyReLU(negative_slope=0.01)
  )
  (4): Sequential(
    (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): LeakyReLU(negative_slope=0.01)
  )
)
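
As a quick sanity check (a standalone sketch I wrote for this comment, not code from the repo), you can rebuild the stack printed above and watch the spatial size shrink:

import torch
import torch.nn as nn

# Rebuild the encoder printed above: five stride-2 conv blocks.
hidden_dims = [32, 64, 128, 256, 512]
blocks, in_ch = [], 3
for h_dim in hidden_dims:
    blocks.append(nn.Sequential(
        nn.Conv2d(in_ch, h_dim, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(h_dim),
        nn.LeakyReLU()))
    in_ch = h_dim
encoder = nn.Sequential(*blocks)

# Each stride-2 conv halves H and W, so five of them divide W by 32.
print(encoder(torch.randn(1, 3, 64, 64)).shape)    # torch.Size([1, 512, 2, 2])  -> 2048 after flatten
print(encoder(torch.randn(1, 3, 128, 128)).shape)  # torch.Size([1, 512, 4, 4])  -> 8192, which breaks fc_mu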

Since hidden_dims is [32, 64, 128, 256, 512] by default, C will be 512, and hidden_dims[-1] * 4 will be 2048.

Setting 512 * (W / 32) * (W / 32) = 2048 gives (W / 32) * (W / 32) = 4, i.e. W / 32 = 2, so the input size W must be 64. Any other W produces a flattened vector whose length doesn't match the 2048 input features of fc_mu, which is exactly the "mat1 dim 1 must match mat2 dim 0" error above.
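
If you want other resolutions, one workaround (a sketch of mine, not something PyTorch-VAE provides; the helper name and img_size argument are hypothetical) is to measure the flattened size with a dummy forward pass and size the linear layers from that. The decoder side (decoder_input and the view in decode()) hard-codes the same assumption and would need the matching adjustment.

import torch
import torch.nn as nn

def flattened_size(encoder: nn.Module, img_size: int, in_channels: int = 3) -> int:
    # Push one dummy image through the encoder and count the flattened features.
    with torch.no_grad():
        dummy = torch.zeros(1, in_channels, img_size, img_size)
        return encoder(dummy).flatten(start_dim=1).shape[1]

# Hypothetical usage inside VanillaVAE.__init__, after self.encoder is built:
# flat_dim = flattened_size(self.encoder, img_size)  # 2048 when img_size == 64
# self.fc_mu  = nn.Linear(flat_dim, latent_dim)
# self.fc_var = nn.Linear(flat_dim, latent_dim)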
