Layers with special behavior on dynamic spatial axes
This is a list of layers with special behavior on dynamic spatial axes, i.e. axes with dynamic sequence lengths, where considering the padding or sequence lengths is important for correct behavior. More generally, it covers any layer where the output tensor (placeholder) depends on the sequence lengths.
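To make the terminology concrete, here is a minimal NumPy sketch (not RETURNN code; shapes and names are just for illustration) of a batch with a dynamic spatial axis: sequences of different lengths padded to a common maximum, plus the per-sequence lengths from which a mask over the valid frames can be derived.

```python
import numpy as np

n_batch, max_time, n_dim = 2, 4, 3
seq_lens = np.array([4, 2])  # actual length of each sequence in the batch

x = np.random.randn(n_batch, max_time, n_dim).astype("float32")
# mask[b, t] is True for valid frames and False for padded frames
mask = np.arange(max_time)[None, :] < seq_lens[:, None]  # shape (n_batch, max_time)
x = np.where(mask[:, :, None], x, 0.0)  # the content of padded frames is arbitrary
```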
- `SoftmaxOverSpatialLayer` will make sure that the padded frames are masked away (see the sketch after this list).
- `BatchSoftmaxLayer`
- `ReduceLayer`: `sum`, `max` etc. will ignore the padded frames (also sketched below).
- `MathNormLayer`: shares code with `ReduceLayer` internally.
- `DotLayer`, when reducing a dynamic spatial axis
- `BatchNormLayer` (and batch norm in general on any layer)
- (`NormLayer` actually should have special behavior and ignore padded frames, but currently it incorrectly does not (#575).)
- `SliceNdLayer`
- `SeqLenMaskLayer`
- `FlattenBatchLayer`
- `PostfixInTimeLayer`
- (`CumsumLayer` with `reverse=True` should ignore padded frames, but currently it does not (#574).)
- `LossLayer` (deprecated); see below for losses.
- `RecLayer` with `direction=-1`
- `SelfAttentionLayer` (deprecated)
- (`LengthLayer` would return the sequence lengths.)
(This list is currently incomplete.)
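As an illustration of the first entries, here is a minimal NumPy sketch (not the actual RETURNN implementation) of the masking behavior described for `SoftmaxOverSpatialLayer` and `ReduceLayer`: padded frames get zero probability in the softmax, and are skipped in the reduction.

```python
import numpy as np

n_batch, max_time, n_dim = 2, 4, 3
seq_lens = np.array([4, 2])
mask = np.arange(max_time)[None, :] < seq_lens[:, None]  # True for valid frames

logits = np.random.randn(n_batch, max_time).astype("float32")
x = np.random.randn(n_batch, max_time, n_dim).astype("float32")

# Like SoftmaxOverSpatialLayer: padded frames get probability exactly 0.
masked_logits = np.where(mask, logits, -np.inf)
e = np.exp(masked_logits - masked_logits.max(axis=1, keepdims=True))
probs = e / e.sum(axis=1, keepdims=True)

# Like ReduceLayer with mode="sum" / mode="max": padded frames are ignored.
sum_over_time = np.where(mask[:, :, None], x, 0.0).sum(axis=1)
max_over_time = np.where(mask[:, :, None], x, -np.inf).max(axis=1)
```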
Sequence lengths also matter for the losses. For the framewise losses, they matter for the accumulation (padded frames are ignored), and obviously they also matter for all the sequence losses such as CTC.
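A minimal NumPy sketch of the framewise case (again illustrative pseudocode, not the RETURNN loss code): the cross-entropy of padded frames is excluded from both the accumulated sum and the normalization.

```python
import numpy as np

n_batch, max_time, n_classes = 2, 4, 5
seq_lens = np.array([4, 2])
mask = np.arange(max_time)[None, :] < seq_lens[:, None]  # True for valid frames

logits = np.random.randn(n_batch, max_time, n_classes)
targets = np.random.randint(n_classes, size=(n_batch, max_time))

log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
ce = -np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]

# Accumulate only over valid frames, and normalize by their count.
loss = (ce * mask).sum() / mask.sum()
```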
Somewhat related is the option `recurrent` on each layer class (or loss). `recurrent=False` implies that neither the sequence lengths nor the ordering of frames matter. But this is not exactly the same: e.g. `ConvLayer` has `recurrent=True` (the ordering of frames matters), but `ConvLayer` does not make use of the sequence lengths (see the sketch below).
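Here is a minimal NumPy sketch of why convolution is order-sensitive yet padding-agnostic: a centered filter applied over the time axis reads padded frames near the end of a short sequence, so even the last valid output frame depends on the padding content.

```python
import numpy as np

max_time = 6
seq_len = 3  # actual length; frames 3..5 are padding (zeros here)
x = np.zeros(max_time)
x[:seq_len] = [1.0, 2.0, 3.0]

kernel = np.ones(3) / 3.0  # centered 3-tap averaging filter over time
y = np.convolve(x, kernel, mode="same")
# y[2], the last valid frame, averages x[1], x[2] and the padded x[3]:
# the result depends on the padding, even though frame 2 itself is valid.
```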
The obvious example of a layer where the dynamic spatial axes do not matter is `LinearLayer`.
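A minimal sketch of why (illustrative NumPy, not the RETURNN implementation): a pointwise transformation is applied to each frame independently, so the valid frames of the output are unaffected by whatever sits in the padding.

```python
import numpy as np

w = np.random.randn(3, 4)  # weights of a frame-wise linear transformation
x_short = np.random.randn(2, 3)                         # sequence of length 2
x_padded = np.concatenate([x_short, np.zeros((2, 3))])  # padded to length 4

# The valid frames of the padded result match the unpadded result exactly.
assert np.allclose(x_short @ w, (x_padded @ w)[:2])
```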
(Partly related is the list of layers with special behavior for recurrent automatic optimization.)