Recurrent
keras.layers.recurrent.Recurrent(weights=None, return_sequences=False, go_backwards=False, stateful=False, input_dim=None, input_length=None)
Abstract base class for recurrent layers. Do not use in a model -- it's not a functional layer!
All recurrent layers (GRU, LSTM, SimpleRNN) also follow the specifications of this class and accept the keyword arguments listed below.
Input shape
3D tensor with shape (nb_samples, timesteps, input_dim).
Output shape
- if return_sequences: 3D tensor with shape (nb_samples, timesteps, output_dim).
- else: 2D tensor with shape (nb_samples, output_dim).
Arguments
- weights: list of numpy arrays to set as initial weights. The list should have 3 elements, of shapes: [(input_dim, output_dim), (output_dim, output_dim), (output_dim,)].
- return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.
- go_backwards: Boolean (default False). If True, process the input sequence backwards.
- stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.
- input_dim: dimensionality of the input (integer). This argument (or alternatively, the keyword argument input_shape) is required when using this layer as the first layer in a model.
- input_length: length of input sequences, to be specified when it is constant. This argument is required if you are going to connect Flatten then Dense layers upstream (without it, the shape of the dense outputs cannot be computed). Note that if the recurrent layer is not the first layer in your model, you would need to specify the input length at the level of the first layer (e.g. via the input_shape argument). See the example below.
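A minimal sketch of how these arguments fit together (the layer and input sizes here are illustrative, not prescribed by the API):

```python
from keras.models import Sequential
from keras.layers.recurrent import LSTM

model = Sequential()
# First layer in the model, so the input shape must be given:
# 10 timesteps of 64-dimensional vectors.
model.add(LSTM(32, input_shape=(10, 64)))
# Output shape is (nb_samples, 32), since return_sequences defaults to False.
# With return_sequences=True the output would be (nb_samples, 10, 32).
```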
Masking
This layer supports masking for input data with a variable number of timesteps. To introduce masks to your data, use an Embedding layer with the mask_zero parameter set to True.
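For instance, a short sketch of a masked input pipeline (vocabulary size, sequence length, and layer sizes are arbitrary examples):

```python
from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM

# Sequences padded with 0 up to length 20; index 0 is reserved for padding.
model = Sequential()
model.add(Embedding(1000, 64, input_length=20, mask_zero=True))
# The LSTM receives the mask and skips the padded (zero) timesteps.
model.add(LSTM(32))
```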
TensorFlow warning
For the time being, when using the TensorFlow backend, the number of timesteps used must be specified in your model. Make sure to pass an input_length int argument to your recurrent layer (if it comes first in your model), or to pass a complete input_shape argument to the first layer in your model otherwise.
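For example (sizes are illustrative), either form below fixes the number of timesteps:

```python
from keras.models import Sequential
from keras.layers.recurrent import GRU

model = Sequential()
# Timesteps fixed at 10, either via input_length ...
model.add(GRU(32, input_dim=64, input_length=10))
# ... or equivalently via a complete input_shape:
# model.add(GRU(32, input_shape=(10, 64)))
```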
Note on using statefulness in RNNs
You can set RNN layers to be 'stateful', which means that the states computed for the samples in one batch will be reused as initial states for the samples in the next batch. This assumes a one-to-one mapping between samples in different successive batches.
To enable statefulness:
- specify stateful=True in the layer constructor.
- specify a fixed batch size for your model, by passing a batch_input_shape=(...) argument to the first layer in your model. This is the expected shape of your inputs, including the batch size. It should be a tuple of integers, e.g. (32, 10, 100).
To reset the states of your model, call .reset_states() on either a specific layer, or on your entire model.
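A minimal sketch of a stateful layer, assuming a fixed batch size of 32, 10 timesteps, and 8-dimensional inputs (all values illustrative):

```python
from keras.models import Sequential
from keras.layers.recurrent import LSTM

# The batch size is fixed via batch_input_shape, as required for stateful layers.
model = Sequential()
model.add(LSTM(16, batch_input_shape=(32, 10, 8), stateful=True))

# States carry over from one batch to the next during training/prediction.
# To clear them (e.g. between epochs or between independent sequences):
model.reset_states()            # resets all stateful layers in the model
model.layers[0].reset_states()  # or reset a single layer
```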
Note on using dropout with TensorFlow
When using the TensorFlow backend, specify a fixed batch size for your model, following the notes on statefulness in RNNs above.
SimpleRNN
keras.layers.recurrent.SimpleRNN(output_dim, init='glorot_uniform', inner_init='orthogonal', activation='tanh', W_regularizer=None, U_regularizer=None, b_regularizer=None, dropout_W=0.0, dropout_U=0.0)
Fully-connected RNN where the output is to be fed back to the input.
Arguments
- output_dim: dimension of the internal projections and the final output.
- init: weight initialization function. Can be the name of an existing function (str), or a Theano function (see: initializations).
- inner_init: initialization function of the inner cells.
- activation: activation function. Can be the name of an existing function (str), or a Theano function (see: activations).
- W_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the input weights matrices.
- U_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the recurrent weights matrices.
- b_regularizer: instance of WeightRegularizer, applied to the bias.
- dropout_W: float between 0 and 1. Fraction of the input units to drop for input gates.
- dropout_U: float between 0 and 1. Fraction of the input units to drop for recurrent connections.
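An illustrative instantiation (the sizes, penalty strength, and dropout rates are arbitrary examples, not defaults):

```python
from keras.models import Sequential
from keras.layers.recurrent import SimpleRNN
from keras.regularizers import l2

# 32 output units, L2 penalties on the input and recurrent weight matrices,
# and 20% dropout on both the input and recurrent connections.
model = Sequential()
model.add(SimpleRNN(32, input_shape=(10, 64),
                    W_regularizer=l2(0.01), U_regularizer=l2(0.01),
                    dropout_W=0.2, dropout_U=0.2))
```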
GRU
keras.layers.recurrent.GRU(output_dim, init='glorot_uniform', inner_init='orthogonal', activation='tanh', inner_activation='hard_sigmoid', W_regularizer=None, U_regularizer=None, b_regularizer=None, dropout_W=0.0, dropout_U=0.0)
Gated Recurrent Unit - Cho et al. 2014.
Arguments
- output_dim: dimension of the internal projections and the final output.
- init: weight initialization function. Can be the name of an existing function (str), or a Theano function (see: initializations).
- inner_init: initialization function of the inner cells.
- activation: activation function. Can be the name of an existing function (str), or a Theano function (see: activations).
- inner_activation: activation function for the inner cells.
- W_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the input weights matrices.
- U_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the recurrent weights matrices.
- b_regularizer: instance of WeightRegularizer, applied to the bias.
- dropout_W: float between 0 and 1. Fraction of the input units to drop for input gates.
- dropout_U: float between 0 and 1. Fraction of the input units to drop for recurrent connections.
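A short sketch of a GRU used as a sequence classifier (all sizes and the loss/optimizer choices are illustrative):

```python
from keras.models import Sequential
from keras.layers.recurrent import GRU
from keras.layers.core import Dense

model = Sequential()
# 20 timesteps of 128-dimensional input, encoded into a 64-dimensional vector.
model.add(GRU(64, input_shape=(20, 128), inner_activation='hard_sigmoid'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy')
```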
References
- On the Properties of Neural Machine Translation: Encoder–Decoder Approaches
- Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
- A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
LSTM
keras.layers.recurrent.LSTM(output_dim, init='glorot_uniform', inner_init='orthogonal', forget_bias_init='one', activation='tanh', inner_activation='hard_sigmoid', W_regularizer=None, U_regularizer=None, b_regularizer=None, dropout_W=0.0, dropout_U=0.0)
Long Short-Term Memory unit - Hochreiter & Schmidhuber 1997.
For a step-by-step description of the algorithm, see this tutorial.
Arguments
- output_dim: dimension of the internal projections and the final output.
- init: weight initialization function. Can be the name of an existing function (str), or a Theano function (see: initializations).
- inner_init: initialization function of the inner cells.
- forget_bias_init: initialization function for the bias of the forget gate. Jozefowicz et al. recommend initializing with ones.
- activation: activation function. Can be the name of an existing function (str), or a Theano function (see: activations).
- inner_activation: activation function for the inner cells.
- W_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the input weights matrices.
- U_regularizer: instance of WeightRegularizer (eg. L1 or L2 regularization), applied to the recurrent weights matrices.
- b_regularizer: instance of WeightRegularizer, applied to the bias.
- dropout_W: float between 0 and 1. Fraction of the input units to drop for input gates.
- dropout_U: float between 0 and 1. Fraction of the input units to drop for recurrent connections.
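An illustrative stacked-LSTM classifier (all sizes, dropout rates, and the loss/optimizer are arbitrary examples): the first LSTM returns the full sequence so that the second LSTM receives 3D input.

```python
from keras.models import Sequential
from keras.layers.recurrent import LSTM
from keras.layers.core import Dense

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(10, 32)))
model.add(LSTM(32, dropout_W=0.2, dropout_U=0.2))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
```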