Loss calculation should not permanently change shapes of logits and targets #10
I want to talk about the loss calculation in the `forward` method. The logic behind the change of shapes was explained in the video, but I don't understand why we change the shapes permanently, since the reshaping is only needed for the loss calculation.
In the training loop, `xb` has shape (B, T), `yb` has shape (B, T), and one might expect `logits` to have shape (B, T, ...), when in fact it is (B*T, ...). The code works flawlessly simply because we don't do anything with `logits`, yet I think this is not the most desirable behavior, especially considering that this is basically educational code (or, to be precise, code accompanying an educational video).
Additional notes
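The same could be done with transposing instead of reshaping, but since we have batches we need to use `Tensor.mT`, which swaps only the last two dimensions and turns logits of shape (B, T, C) into (B, C, T), the layout `F.cross_entropy` expects when the input has extra dimensions.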
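A small sketch comparing the two ways of feeding the loss, under assumed sizes (B=4, T=8, C=65) and random data: flattening with `.view` versus keeping the batch dimension and transposing with `.mT`. With the default `reduction="mean"`, both average over all B*T positions, so they should give the same value up to floating-point rounding.

```python
import torch
from torch.nn import functional as F

torch.manual_seed(0)
B, T, C = 4, 8, 65  # assumed sizes, for illustration only
logits = torch.randn(B, T, C)
targets = torch.randint(0, C, (B, T))

# Option 1: merge batch and time -> input (B*T, C), target (B*T,)
loss_view = F.cross_entropy(logits.view(B * T, C), targets.view(B * T))

# Option 2: keep the batch dimension and move classes to dim 1 with .mT
# -> input (B, C, T), target (B, T)
loss_mT = F.cross_entropy(logits.mT, targets)

print(torch.allclose(loss_view, loss_mT))  # True
```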
Based on my benchmarks (on an Intel CPU), `.mT` is faster than `.view`, but `F.cross_entropy` runs significantly faster on a 2-D tensor than on a 3-D one, so in this case the combination of `.view` + `F.cross_entropy` is preferable.

This PR also includes one small renaming:
`FeedFoward` -> `FeedForward`
I guess Foward is like Forward, but only if you have a thick British accent 😃