Why can the model only train on a size of [64, 64]? #99

Open
ohhh-yang opened this issue Aug 16, 2024 · 2 comments

Comments

@ohhh-yang

When I try a size other than [64, 64], I get the following error:
File "/home/PyTorch-VAE/models/vanilla_vae.py", line 122, in forward
mu, log_var = self.encode(input)
File "/home/PyTorch-VAE/models/vanilla_vae.py", line 91, in encode
mu = self.fc_mu(result)
File "/home/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 94, in forward
return F.linear(input, self.weight, self.bias)
File "/home/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/functional.py", line 1753, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: mat1 dim 1 must match mat2 dim 0

@unistdJRZ

See the models folder: all the models' latent-space layers are sized to match the feature map that a 64x64 image produces.

@KbKuuhaku

KbKuuhaku commented Jan 16, 2025

self.fc_mu and self.fc_var are two linear layers that take the flattened feature map as input.

if hidden_dims is None:
    hidden_dims = [32, 64, 128, 256, 512]

...

self.fc_mu = nn.Linear(hidden_dims[-1]*4, latent_dim)
self.fc_var = nn.Linear(hidden_dims[-1]*4, latent_dim)


...

result = self.encoder(input)
result = torch.flatten(result, start_dim=1)

# Split the result into mu and var components
# of the latent Gaussian distribution
mu = self.fc_mu(result)
log_var = self.fc_var(result)

return [mu, log_var]

The encoder reduces the size of input images from (B, 3, W, W) to (B, C, W / 32, W / 32).

Sequential(
  (0): Sequential(
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): LeakyReLU(negative_slope=0.01)
  )
  (1): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): LeakyReLU(negative_slope=0.01)
  )
  (2): Sequential(
    (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): LeakyReLU(negative_slope=0.01)
  )
  (3): Sequential(
    (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): LeakyReLU(negative_slope=0.01)
  )
  (4): Sequential(
    (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): LeakyReLU(negative_slope=0.01)
  )
)
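
As a quick sanity check (a standalone sketch I wrote for this comment, not code from the repo), you can rebuild the stack printed above and watch the spatial size shrink:

import torch
import torch.nn as nn

# Rebuild the encoder printed above: five stride-2 conv blocks.
hidden_dims = [32, 64, 128, 256, 512]
blocks, in_ch = [], 3
for h_dim in hidden_dims:
    blocks.append(nn.Sequential(
        nn.Conv2d(in_ch, h_dim, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(h_dim),
        nn.LeakyReLU()))
    in_ch = h_dim
encoder = nn.Sequential(*blocks)

# Each stride-2 conv halves H and W, so five of them divide W by 32.
print(encoder(torch.randn(1, 3, 64, 64)).shape)    # torch.Size([1, 512, 2, 2])  -> 2048 after flatten
print(encoder(torch.randn(1, 3, 128, 128)).shape)  # torch.Size([1, 512, 4, 4])  -> 8192, which breaks fc_mu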

Since hidden_dims is [32, 64, 128, 256, 512] by default, C will be 512, and hidden_dims[-1] * 4 will be 2048.

Setting 512 * (W / 32) * (W / 32) = 2048 gives (W / 32) * (W / 32) = 4, i.e. W / 32 = 2, so the input size W must be 64. Any other W produces a flattened vector whose length doesn't match the 2048 input features of fc_mu, which is exactly the "mat1 dim 1 must match mat2 dim 0" error above.
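
If you want other resolutions, one workaround (a sketch of mine, not something PyTorch-VAE provides; the helper name and img_size argument are hypothetical) is to measure the flattened size with a dummy forward pass and size the linear layers from that. The decoder side (decoder_input and the view in decode()) hard-codes the same assumption and would need the matching adjustment.

import torch
import torch.nn as nn

def flattened_size(encoder: nn.Module, img_size: int, in_channels: int = 3) -> int:
    # Push one dummy image through the encoder and count the flattened features.
    with torch.no_grad():
        dummy = torch.zeros(1, in_channels, img_size, img_size)
        return encoder(dummy).flatten(start_dim=1).shape[1]

# Hypothetical usage inside VanillaVAE.__init__, after self.encoder is built:
# flat_dim = flattened_size(self.encoder, img_size)  # 2048 when img_size == 64
# self.fc_mu  = nn.Linear(flat_dim, latent_dim)
# self.fc_var = nn.Linear(flat_dim, latent_dim)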
