
Loss calculation should not permanently change shapes of logits and targets #10

Open · wants to merge 1 commit into master
Conversation

Andrei-Aksionov
Contributor

I want to discuss the loss calculation in the forward method:

else:
    B, T, C = logits.shape
    # flatten (B, T, C) -> (B*T, C) and (B, T) -> (B*T,) for F.cross_entropy
    logits = logits.view(B*T, C)
    targets = targets.view(B*T)
    loss = F.cross_entropy(logits, targets)

return logits, loss

The logic behind the change of shapes was explained in the video, but I don't understand why we change the shapes permanently, since the reshaping is needed only for the loss calculation.

In the training loop:

    # sample a batch of data
    xb, yb = get_batch('train')

    # evaluate the loss
    logits, loss = model(xb, yb)

xb has shape (B, T) and yb has shape (B, T), so one might expect logits to come back with shape (B, T, ...), when in fact it is (B*T, ...).

The code works flawlessly simply because we never do anything else with logits, yet I think it's not the most desirable behavior, especially considering that this is essentially educational code (or, to be precise, the accompanying code for an educational video).
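For comparison, a minimal sketch of a non-mutating variant (presumably along the lines of what this PR proposes; variable names follow the snippet above). The flattened views exist only as arguments to F.cross_entropy, so the logits returned to the caller keep their (B, T, C) shape:

else:
    B, T, C = logits.shape
    # flatten only inside the cross-entropy call; logits keeps its (B, T, C) shape
    loss = F.cross_entropy(logits.view(B*T, C), targets.view(B*T))

return logits, loss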

Additional notes

The same could be done with a transpose, since F.cross_entropy also accepts multi-dimensional input as long as the class dimension comes second ((B, C, T) here); because we have a batch dimension, we need Tensor.mT rather than .T:

else:
    loss = F.cross_entropy(logits.mT, targets)
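A quick self-contained check that the two formulations produce the same loss (sizes here are illustrative, not from the PR; C = 65 matches the lecture's character vocabulary):

import torch
import torch.nn.functional as F

B, T, C = 4, 8, 65  # illustrative sizes
logits = torch.randn(B, T, C)
targets = torch.randint(0, C, (B, T))

# 2-D path: flatten batch and time, classes last
loss_view = F.cross_entropy(logits.view(B*T, C), targets.view(B*T))
# 3-D path: move classes to dim 1, as F.cross_entropy expects for K-dim input
loss_mt = F.cross_entropy(logits.mT, targets)

assert torch.allclose(loss_view, loss_mt)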

Based on my benchmarks (on an Intel CPU), .mT is faster than .view, but F.cross_entropy runs significantly faster on a 2-D tensor than on a 3-D one, so in this case the view + F.cross_entropy combination is preferable.
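These aren't the exact benchmarks behind that claim, but a sketch along these lines (using torch.utils.benchmark; the sizes are assumptions, roughly matching the lecture's batch/block/vocab config) reproduces the comparison:

import torch
import torch.nn.functional as F
from torch.utils import benchmark

B, T, C = 64, 256, 65  # assumed sizes
logits = torch.randn(B, T, C)
targets = torch.randint(0, C, (B, T))
env = {"F": F, "logits": logits, "targets": targets, "B": B, "T": T, "C": C}

# time the 2-D path (view + cross_entropy)
t_view = benchmark.Timer(
    stmt="F.cross_entropy(logits.view(B*T, C), targets.view(B*T))",
    globals=env,
)
# time the 3-D path (mT + cross_entropy)
t_mt = benchmark.Timer(
    stmt="F.cross_entropy(logits.mT, targets)",
    globals=env,
)
print(t_view.timeit(1000))
print(t_mt.timeit(1000))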

This PR also includes one small renaming:
FeedFoward -> FeedForward

I guess Foward is like Forward, but only if you have a thick British accent 😃
