Fix #62 #70
base: master
Note to self: I need to think about this one. Some of this complication was working around things that are now fixed in CRC.jl, if I remember right.
Yeah, admittedly this line took some trial and error and is a little bit above my pay grade. I managed to convince myself, but perhaps there's something cleaner.
Ok, I think I finally understand what's going on. Sorry it took a while.

`re` constructs another `Skip` containing the gradient, and `backing` turns that into a `NamedTuple` with the same field names, which is what `Tangent` wants.

The only way I can see this failing is this: if the primal type's constructor is fussy about what types it can accept, then it may not be happy to accept something which is valid as its gradient. E.g. if there is only `Skip(::AbstractLayer)`, and `re` tries to make one with a `Tangent`.
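For intuition, here is a minimal dependency-free sketch of that round trip. The names `skip_children`, `skip_re`, and `backing_like` are stand-ins of my own, not the PR's code: the real reconstructor comes from `@functor`, and the real `backing` from ChainRulesCore.

```julia
# Toy stand-ins, not the PR's actual code.
struct Skip
    layer
end

# Roughly what `@functor Skip` provides: children plus a reconstructor `re`.
skip_children(s::Skip) = (layer = s.layer,)
skip_re(nt) = Skip(nt.layer)  # a fussy `Skip(::AbstractLayer)` would reject a gradient here

# What `backing` does conceptually: struct -> NamedTuple with the same field names.
backing_like(s::Skip) = (layer = s.layer,)

grad = (layer = [1.0, 2.0],)       # pretend gradient for the wrapped layer
nt = backing_like(skip_re(grad))   # rebuild a Skip holding the gradient, then strip to a NamedTuple
```

The failure mode described above corresponds to the `skip_re` line: if `Skip`'s constructor type-restricts its field, `skip_re(grad)` throws before `backing` ever runs.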
No worries! Yes, I struggled with that edge case too. Unfortunately I think it's quite tricky to work around. For example, suppose you have a user-defined `functor(m::MyModel) = (m.w,), w -> ...`. Then:

- You cannot build a gradient for `MyModel` (or even a `NamedTuple` of fields/values) without `re`, as you do not know the corresponding field name given only `(m.w,)`, but
- a fussy constructor won't accept `Tangent`/`Nothing`/etc. values in its fields and will error before `backing` can unpack it again.

Avoiding `re` would be ideal, but I think that would require `functor` to always return `NamedTuple`s on custom structs. I noticed that this is the default in `@functor`, though, so maybe it's not such a painful requirement? In the meantime I can at least add a branch that would avoid `re` for structs that are `functor`ed to `NamedTuple`s.
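A quick sketch of why a `Tuple`-returning `functor` forces the use of `re`. Here `functor_like` is a hypothetical stand-in for a user's `functor` method:

```julia
struct MyModel
    w
    b
end

# A user-defined functor that returns a plain Tuple: the field name :w is lost.
functor_like(m::MyModel) = (m.w,), ws -> MyModel(ws[1], m.b)

m = MyModel([1.0], [2.0])
children, re = functor_like(m)
# `children == ([1.0],)` carries no field names, so a NamedTuple/Tangent
# cannot be assembled from it directly; only `re` knows where things go.
m2 = re(([9.0],))
```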
In fact there's another problem I didn't spot before, what a mess:
Oh yikes. That's a good example, hits all the pain points at once. If I'm understanding correctly, the gradient should be `((a = [0.0, 4.0], b = nothing, c = nothing),)`, right?

I think the problem is the `_trainmap` above; it populates the `nothing` values from `_trainable` (non-trainable fields) with the primal values, when they should be `NoT`. That's how the `b` and/or `c` values get back in there.
Yes, I think `_trainmap` needs to do something like `isnothing(t) ? NoT : f(t, a)` here. That's where `c = [4.0, 5.0]` is coming from.

But `b = [3.0]` is coming from this PR's trick of calling the reconstructor made by `@functor`:
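A dependency-free sketch of that `_trainmap` change. The name `_trainmap_like` and the `NoT` symbol here are stand-ins of my own (in Optimisers.jl, `NoT` is `ChainRulesCore.NoTangent()`):

```julia
const NoT = :NoTangent  # stand-in for ChainRulesCore.NoTangent()

# `tr` holds `nothing` for non-trainable fields; `ch` holds the corresponding children.
_trainmap_like(f, tr, ch) = map(tr, ch) do t, a
    isnothing(t) ? NoT : f(t, a)  # non-trainable slots become NoT, not the primal value
end

tr = ([1.0, 2.0], nothing, nothing)   # only the first field is trainable
ch = ([0.0, 4.0], [3.0], [4.0, 5.0])  # children, as returned by `functor`
grad = _trainmap_like((t, a) -> a, tr, ch)
```

With the `isnothing` branch, the second and third slots come out as `NoT` rather than the primal `[3.0]` and `[4.0, 5.0]`.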
Gotcha. So on top of the modified `_trainmap` to fix `c`, one would still have to filter `backing(re(y))` to replace repopulated primal values which aren't `functor`-ed with `NoT`, in order to fix `b`.

EDIT: But, based on the output of `Tangent{typeof(x), typeof(y)}(y)`, maybe the modified `_trainmap` alone would be enough and `backing(re(y))` isn't needed after all, as `Tangent` will assign `NoT` to omitted fields in `y` automatically.

EDIT 2: Never mind, that would still fail for children which aren't fields, like `Skip`.
Alright, pushed something that works for both `Skip` and your `TwoThirds` example (modified `_trainmap` + filtering `backing(re(y))`). But since it uses `re`, it would still fail for fussy constructors.