Recurrent
```python
keras.layers.recurrent.Recurrent(return_sequences=False, return_state=False, go_backwards=False, stateful=False, unroll=False, implementation=0)
```
Abstract base class for recurrent layers.
Do not use in a model -- it's not a valid layer! Use its child classes LSTM, GRU and SimpleRNN instead.
All recurrent layers (LSTM, GRU, SimpleRNN) also follow the specifications of this class and accept the keyword arguments listed below.
Example
```python
# as the first layer in a Sequential model
model = Sequential()
model.add(LSTM(32, input_shape=(10, 64)))
# now model.output_shape == (None, 32)
# note: `None` is the batch dimension.

# for subsequent layers, no need to specify the input size:
model.add(LSTM(16))

# to stack recurrent layers, you must use return_sequences=True
# on any recurrent layer that feeds into another recurrent layer.
# note that you only need to specify the input size on the first layer.
model = Sequential()
model.add(LSTM(64, input_dim=64, input_length=10, return_sequences=True))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(10))
```
Arguments
- weights: list of Numpy arrays to set as initial weights. The list should have 3 elements, of shapes: [(input_dim, output_dim), (output_dim, output_dim), (output_dim,)].
- return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.
- return_state: Boolean. Whether to return the last state in addition to the output.
- go_backwards: Boolean (default False). If True, process the input sequence backwards and return the reversed sequence.
- stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.
- unroll: Boolean (default False). If True, the network will be unrolled, else a symbolic loop will be used. Unrolling can speed up an RNN, although it tends to be more memory-intensive. Unrolling is only suitable for short sequences.
- implementation: one of {0, 1, or 2}. If set to 0, the RNN will use an implementation that uses fewer, larger matrix products, thus running faster on CPU but consuming more memory. If set to 1, the RNN will use more matrix products, but smaller ones, thus running slower (may actually be faster on GPU) while consuming less memory. If set to 2 (LSTM/GRU only), the RNN will combine the input gate, the forget gate and the output gate into a single matrix, enabling more time-efficient parallelization on the GPU (see the sketch after this list).
  - Note: RNN dropout must be shared for all gates, resulting in a slightly reduced regularization.
- input_dim: dimensionality of the input (integer). This argument (or alternatively, the keyword argument input_shape) is required when using this layer as the first layer in a model.
- input_length: Length of input sequences, to be specified when it is constant. This argument is required if you are going to connect Flatten then Dense layers upstream (without it, the shape of the dense outputs cannot be computed). Note that if the recurrent layer is not the first layer in your model, you would need to specify the input length at the level of the first layer (e.g. via the input_shape argument).
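For instance, the following minimal sketch (the input and output sizes are assumed for illustration) combines several of these keyword arguments on an LSTM layer; the same arguments apply to GRU and SimpleRNN:

```python
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
# return_sequences=True -> one output per timestep, shape (None, 10, 32)
# go_backwards=True     -> the input sequence is processed in reverse
# unroll=True           -> the loop is unrolled (only advisable for short sequences)
# implementation=2      -> gates are fused into a single matrix product (LSTM/GRU only)
model.add(LSTM(32, input_shape=(10, 64),
               return_sequences=True,
               go_backwards=True,
               unroll=True,
               implementation=2))
```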
Input shapes
3D tensor with shape (batch_size, timesteps, input_dim).
(Optional) 2D tensors with shape (batch_size, output_dim).
Output shape
- if return_state: a list of tensors. The first tensor is the output. The remaining tensors are the last states, each with shape (batch_size, units).
- if return_sequences: 3D tensor with shape (batch_size, timesteps, units).
- else, 2D tensor with shape (batch_size, units).
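As an illustration (shapes assumed), the sketch below shows how return_sequences and return_state change what an LSTM call returns; for an LSTM there are two state tensors (the hidden state and the cell state):

```python
from keras.layers import Input, LSTM

inputs = Input(shape=(10, 64))                             # (batch_size, timesteps, input_dim)

last_output = LSTM(32)(inputs)                             # 2D tensor, shape (None, 32)
full_sequence = LSTM(32, return_sequences=True)(inputs)    # 3D tensor, shape (None, 10, 32)

# With return_state=True the layer returns a list:
# [output, state_h, state_c], each state with shape (None, 32)
output, state_h, state_c = LSTM(32, return_state=True)(inputs)
```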
Masking
This layer supports masking for input data with a variable number of timesteps. To introduce masks to your data, use an Embedding layer with the mask_zero parameter set to True.
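A minimal sketch of this pattern (vocabulary size and dimensions are assumed): sequences are zero-padded to a common length, and timesteps equal to 0 are masked out and skipped by the recurrent layer.

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
# mask_zero=True: the input value 0 is treated as padding and masked
model.add(Embedding(input_dim=10000, output_dim=64, mask_zero=True))
model.add(LSTM(32))                               # masked timesteps are skipped
model.add(Dense(1, activation='sigmoid'))
```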
Note on using statefulness in RNNs
You can set RNN layers to be 'stateful', which means that the states computed for the samples in one batch will be reused as initial states for the samples in the next batch. This assumes a one-to-one mapping between samples in different successive batches.
To enable statefulness:
- specify stateful=True in the layer constructor.
- specify a fixed batch size for your model, by passing
  - if sequential model: batch_input_shape=(...) to the first layer in your model.
  - else for functional model with 1 or more Input layers: batch_shape=(...) to all the first layers in your model.
  This is the expected shape of your inputs including the batch size. It should be a tuple of integers, e.g. (32, 10, 100).
- specify shuffle=False when calling fit().
To reset the states of your model, call .reset_states()
on either
a specific layer, or on your entire model.
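Putting the three steps together, here is a minimal sketch (shapes and data are assumed for illustration) of a stateful LSTM trained with a fixed batch size, no shuffling, and a manual state reset between epochs:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size, timesteps, input_dim = 32, 10, 100
# the number of samples must be a multiple of the batch size
x = np.random.random((batch_size * 4, timesteps, input_dim))
y = np.random.random((batch_size * 4, 1))

model = Sequential()
model.add(LSTM(32, batch_input_shape=(batch_size, timesteps, input_dim),
               stateful=True))
model.add(Dense(1))
model.compile(optimizer='rmsprop', loss='mse')

for epoch in range(5):
    # shuffle=False preserves the one-to-one mapping between samples
    # in successive batches
    model.fit(x, y, batch_size=batch_size, epochs=1, shuffle=False)
    model.reset_states()           # clear the states between epochs
```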
Note on specifying the initial state of RNNs
You can specify the initial state of RNN layers symbolically by calling them with the keyword argument initial_state. The value of initial_state should be a tensor or list of tensors representing the initial state of the RNN layer.
You can specify the initial state of RNN layers numerically by calling reset_states with the keyword argument states. The value of states should be a numpy array or list of numpy arrays representing the initial state of the RNN layer.
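The sketch below illustrates both mechanisms under assumed shapes; the encoder/decoder names are purely illustrative, and the numeric case follows the states keyword described above, applied to a stateful layer whose batch size is fixed.

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, LSTM

# 1) Symbolically, at call time, via the initial_state keyword argument.
encoder_inputs = Input(shape=(10, 64))
_, state_h, state_c = LSTM(32, return_state=True)(encoder_inputs)

decoder_inputs = Input(shape=(10, 64))
decoder_outputs = LSTM(32)(decoder_inputs, initial_state=[state_h, state_c])
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# 2) Numerically, via reset_states with the states keyword argument.
stateful_inputs = Input(batch_shape=(16, 10, 64))
stateful_lstm = LSTM(32, stateful=True)
stateful_outputs = stateful_lstm(stateful_inputs)
stateful_model = Model(stateful_inputs, stateful_outputs)
stateful_lstm.reset_states(states=[np.zeros((16, 32)), np.zeros((16, 32))])
```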
SimpleRNN
```python
keras.layers.recurrent.SimpleRNN(units, activation='tanh', use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', bias_initializer='zeros', kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, bias_constraint=None, dropout=0.0, recurrent_dropout=0.0)
```
Fully-connected RNN where the output is to be fed back to input.
Arguments
- units: Positive integer, dimensionality of the output space.
- activation: Activation function to use (see activations). If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
- use_bias: Boolean, whether the layer uses a bias vector.
- kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs (see initializers).
- recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state (see initializers).
- bias_initializer: Initializer for the bias vector (see initializers).
- kernel_regularizer: Regularizer function applied to the kernel weights matrix (see regularizer).
- recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights matrix (see regularizer).
- bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
- activity_regularizer: Regularizer function applied to the output of the layer (its "activation") (see regularizer).
- kernel_constraint: Constraint function applied to the kernel weights matrix (see constraints).
- recurrent_constraint: Constraint function applied to the recurrent_kernel weights matrix (see constraints).
- bias_constraint: Constraint function applied to the bias vector (see constraints).
- dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs.
- recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state.
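For example, a minimal sketch (shapes assumed) of a SimpleRNN that applies dropout to the input transformation and to the recurrent transformation:

```python
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

model = Sequential()
model.add(SimpleRNN(32, input_shape=(10, 64),
                    dropout=0.2,              # drop 20% of the input units
                    recurrent_dropout=0.2))   # drop 20% of the recurrent units
model.add(Dense(1))
```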
GRU
```python
keras.layers.recurrent.GRU(units, activation='tanh', recurrent_activation='hard_sigmoid', use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', bias_initializer='zeros', kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, bias_constraint=None, dropout=0.0, recurrent_dropout=0.0)
```
Gated Recurrent Unit - Cho et al. 2014.
Arguments
- units: Positive integer, dimensionality of the output space.
- activation: Activation function to use (see activations). If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
- recurrent_activation: Activation function to use for the recurrent step (see activations).
- use_bias: Boolean, whether the layer uses a bias vector.
- kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs (see initializers).
- recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state (see initializers).
- bias_initializer: Initializer for the bias vector (see initializers).
- kernel_regularizer: Regularizer function applied to the kernel weights matrix (see regularizer).
- recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights matrix (see regularizer).
- bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
- activity_regularizer: Regularizer function applied to the output of the layer (its "activation") (see regularizer).
- kernel_constraint: Constraint function applied to the kernel weights matrix (see constraints).
- recurrent_constraint: Constraint function applied to the recurrent_kernel weights matrix (see constraints).
- bias_constraint: Constraint function applied to the bias vector (see constraints).
- dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs.
- recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state.
References
- On the Properties of Neural Machine Translation: Encoder-Decoder Approaches
- Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
- A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
LSTM
```python
keras.layers.recurrent.LSTM(units, activation='tanh', recurrent_activation='hard_sigmoid', use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', bias_initializer='zeros', unit_forget_bias=True, kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, bias_constraint=None, dropout=0.0, recurrent_dropout=0.0)
```
Long Short-Term Memory unit - Hochreiter 1997.
For a step-by-step description of the algorithm, see this tutorial.
Arguments
- units: Positive integer, dimensionality of the output space.
- activation: Activation function to use (see activations). If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
- recurrent_activation: Activation function to use for the recurrent step (see activations).
- use_bias: Boolean, whether the layer uses a bias vector.
- kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs (see initializers).
- recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state (see initializers).
- bias_initializer: Initializer for the bias vector (see initializers).
- unit_forget_bias: Boolean. If True, add 1 to the bias of the forget gate at initialization. Setting it to True will also force bias_initializer="zeros". This is recommended in Jozefowicz et al.
- kernel_regularizer: Regularizer function applied to the kernel weights matrix (see regularizer).
- recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights matrix (see regularizer).
- bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
- activity_regularizer: Regularizer function applied to the output of the layer (its "activation") (see regularizer).
- kernel_constraint: Constraint function applied to the kernel weights matrix (see constraints).
- recurrent_constraint: Constraint function applied to the recurrent_kernel weights matrix (see constraints).
- bias_constraint: Constraint function applied to the bias vector (see constraints).
- dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs.
- recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state.
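For example, a minimal sketch (shapes and hyperparameter values assumed) of an LSTM that keeps the forget-gate bias initialization at 1 and adds L2 regularization to the input and recurrent kernels:

```python
from keras.models import Sequential
from keras.layers import LSTM
from keras.regularizers import l2

model = Sequential()
model.add(LSTM(64, input_shape=(10, 64),
               unit_forget_bias=True,           # also forces bias_initializer="zeros"
               kernel_regularizer=l2(1e-4),
               recurrent_regularizer=l2(1e-4),
               dropout=0.2,
               recurrent_dropout=0.2))
```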
References