Recurrent
```python
keras.layers.recurrent.Recurrent(return_sequences=False, return_state=False, go_backwards=False, stateful=False, unroll=False, implementation=0)
```
Abstract base class for recurrent layers.
Do not use in a model -- it's not a valid layer! Use its child classes LSTM, GRU and SimpleRNN instead.
All recurrent layers (LSTM, GRU, SimpleRNN) also follow the specifications of this class and accept the keyword arguments listed below.
Example
```python
# as the first layer in a Sequential model
model = Sequential()
model.add(LSTM(32, input_shape=(10, 64)))
# now model.output_shape == (None, 32)
# note: `None` is the batch dimension.

# for subsequent layers, no need to specify the input size:
model.add(LSTM(16))

# to stack recurrent layers, you must use return_sequences=True
# on any recurrent layer that feeds into another recurrent layer.
# note that you only need to specify the input size on the first layer.
model = Sequential()
model.add(LSTM(64, input_dim=64, input_length=10, return_sequences=True))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(10))
```
Arguments
- weights: list of Numpy arrays to set as initial weights. The list should have 3 elements, of shapes: [(input_dim, output_dim), (output_dim, output_dim), (output_dim,)].
- return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.
- return_state: Boolean. Whether to return the last state in addition to the output.
- go_backwards: Boolean (default False). If True, process the input sequence backwards and return the reversed sequence.
- stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.
- unroll: Boolean (default False). If True, the network will be unrolled, else a symbolic loop will be used. Unrolling can speed up an RNN, although it tends to be more memory-intensive. Unrolling is only suitable for short sequences.
- implementation: one of {0, 1, or 2}. If set to 0, the RNN will use an implementation that uses fewer, larger matrix products, thus running faster on CPU but consuming more memory. If set to 1, the RNN will use more matrix products, but smaller ones, thus running slower (may actually be faster on GPU) while consuming less memory. If set to 2 (LSTM/GRU only), the RNN will combine the input gate, the forget gate and the output gate into a single matrix, enabling more time-efficient parallelization on the GPU (see the sketch after this list).
  - Note: RNN dropout must be shared for all gates, resulting in a slightly reduced regularization.
- input_dim: dimensionality of the input (integer). This argument (or alternatively, the keyword argument input_shape) is required when using this layer as the first layer in a model.
- input_length: Length of input sequences, to be specified when it is constant. This argument is required if you are going to connect Flatten then Dense layers upstream (without it, the shape of the dense outputs cannot be computed). Note that if the recurrent layer is not the first layer in your model, you would need to specify the input length at the level of the first layer (e.g. via the input_shape argument).
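For instance, the following minimal sketch (the input and output sizes are assumed for illustration) combines several of these keyword arguments on an LSTM layer; the same arguments apply to GRU and SimpleRNN:

```python
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
# return_sequences=True -> one output per timestep, shape (None, 10, 32)
# go_backwards=True     -> the input sequence is processed in reverse
# unroll=True           -> the loop is unrolled (only advisable for short sequences)
# implementation=2      -> gates are fused into a single matrix product (LSTM/GRU only)
model.add(LSTM(32, input_shape=(10, 64),
               return_sequences=True,
               go_backwards=True,
               unroll=True,
               implementation=2))
```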
Input shapes
3D tensor with shape (batch_size, timesteps, input_dim).
(Optional) 2D tensors with shape (batch_size, output_dim).
Output shape
- if return_state: a list of tensors. The first tensor is the output. The remaining tensors are the last states, each with shape (batch_size, units).
- if return_sequences: 3D tensor with shape (batch_size, timesteps, units).
- else, 2D tensor with shape (batch_size, units).
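As an illustration (shapes assumed), the sketch below shows how return_sequences and return_state change what an LSTM call returns; for an LSTM there are two state tensors (the hidden state and the cell state):

```python
from keras.layers import Input, LSTM

inputs = Input(shape=(10, 64))                             # (batch_size, timesteps, input_dim)

last_output = LSTM(32)(inputs)                             # 2D tensor, shape (None, 32)
full_sequence = LSTM(32, return_sequences=True)(inputs)    # 3D tensor, shape (None, 10, 32)

# With return_state=True the layer returns a list:
# [output, state_h, state_c], each state with shape (None, 32)
output, state_h, state_c = LSTM(32, return_state=True)(inputs)
```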
Masking
This layer supports masking for input data with a variable number of timesteps. To introduce masks to your data, use an Embedding layer with the mask_zero parameter set to True.
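A minimal sketch of this pattern (vocabulary size and dimensions are assumed): sequences are zero-padded to a common length, and timesteps equal to 0 are masked out and skipped by the recurrent layer.

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
# mask_zero=True: the input value 0 is treated as padding and masked
model.add(Embedding(input_dim=10000, output_dim=64, mask_zero=True))
model.add(LSTM(32))                               # masked timesteps are skipped
model.add(Dense(1, activation='sigmoid'))
```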
Note on using statefulness in RNNs
You can set RNN layers to be 'stateful', which means that the states computed for the samples in one batch will be reused as initial states for the samples in the next batch. This assumes a one-to-one mapping between samples in different successive batches.
To enable statefulness:
- specify stateful=True in the layer constructor.
- specify a fixed batch size for your model, by passing
  - if sequential model: batch_input_shape=(...) to the first layer in your model.
  - else for functional model with 1 or more Input layers: batch_shape=(...) to all the first layers in your model.
  This is the expected shape of your inputs including the batch size. It should be a tuple of integers, e.g. (32, 10, 100).
- specify shuffle=False when calling fit().
To reset the states of your model, call .reset_states()
on either
a specific layer, or on your entire model.
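Putting the three steps together, here is a minimal sketch (shapes and data are assumed for illustration) of a stateful LSTM trained with a fixed batch size, no shuffling, and a manual state reset between epochs:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size, timesteps, input_dim = 32, 10, 100
# the number of samples must be a multiple of the batch size
x = np.random.random((batch_size * 4, timesteps, input_dim))
y = np.random.random((batch_size * 4, 1))

model = Sequential()
model.add(LSTM(32, batch_input_shape=(batch_size, timesteps, input_dim),
               stateful=True))
model.add(Dense(1))
model.compile(optimizer='rmsprop', loss='mse')

for epoch in range(5):
    # shuffle=False preserves the one-to-one mapping between samples
    # in successive batches
    model.fit(x, y, batch_size=batch_size, epochs=1, shuffle=False)
    model.reset_states()           # clear the states between epochs
```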
Note on specifying the initial state of RNNs
You can specify the initial state of RNN layers symbolically by calling them with the keyword argument initial_state. The value of initial_state should be a tensor or list of tensors representing the initial state of the RNN layer.
You can specify the initial state of RNN layers numerically by calling reset_states with the keyword argument states. The value of states should be a numpy array or list of numpy arrays representing the initial state of the RNN layer.
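The sketch below illustrates both mechanisms under assumed shapes; the encoder/decoder names are purely illustrative, and the numeric case follows the states keyword described above, applied to a stateful layer whose batch size is fixed.

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, LSTM

# 1) Symbolically, at call time, via the initial_state keyword argument.
encoder_inputs = Input(shape=(10, 64))
_, state_h, state_c = LSTM(32, return_state=True)(encoder_inputs)

decoder_inputs = Input(shape=(10, 64))
decoder_outputs = LSTM(32)(decoder_inputs, initial_state=[state_h, state_c])
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# 2) Numerically, via reset_states with the states keyword argument.
stateful_inputs = Input(batch_shape=(16, 10, 64))
stateful_lstm = LSTM(32, stateful=True)
stateful_outputs = stateful_lstm(stateful_inputs)
stateful_model = Model(stateful_inputs, stateful_outputs)
stateful_lstm.reset_states(states=[np.zeros((16, 32)), np.zeros((16, 32))])
```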
SimpleRNN
```python
keras.layers.recurrent.SimpleRNN(units, activation='tanh', use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', bias_initializer='zeros', kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, bias_constraint=None, dropout=0.0, recurrent_dropout=0.0)
```
Fully-connected RNN where the output is to be fed back to input.
Arguments
- units: Positive integer, dimensionality of the output space.
- activation: Activation function to use (see activations). If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
- use_bias: Boolean, whether the layer uses a bias vector.
- kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs (see initializers).
- recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state (see initializers).
- bias_initializer: Initializer for the bias vector (see initializers).
- kernel_regularizer: Regularizer function applied to the kernel weights matrix (see regularizer).
- recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights matrix (see regularizer).
- bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
- activity_regularizer: Regularizer function applied to the output of the layer (its "activation") (see regularizer).
- kernel_constraint: Constraint function applied to the kernel weights matrix (see constraints).
- recurrent_constraint: Constraint function applied to the recurrent_kernel weights matrix (see constraints).
- bias_constraint: Constraint function applied to the bias vector (see constraints).
- dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs.
- recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state.
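For example, a minimal sketch (shapes assumed) of a SimpleRNN that applies dropout to the input transformation and to the recurrent transformation:

```python
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

model = Sequential()
model.add(SimpleRNN(32, input_shape=(10, 64),
                    dropout=0.2,              # drop 20% of the input units
                    recurrent_dropout=0.2))   # drop 20% of the recurrent units
model.add(Dense(1))
```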
GRU
```python
keras.layers.recurrent.GRU(units, activation='tanh', recurrent_activation='hard_sigmoid', use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', bias_initializer='zeros', kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, bias_constraint=None, dropout=0.0, recurrent_dropout=0.0)
```
Gated Recurrent Unit - Cho et al. 2014.
Arguments
- units: Positive integer, dimensionality of the output space.
- activation: Activation function to use (see activations). If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
- recurrent_activation: Activation function to use for the recurrent step (see activations).
- use_bias: Boolean, whether the layer uses a bias vector.
- kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs (see initializers).
- recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state (see initializers).
- bias_initializer: Initializer for the bias vector (see initializers).
- kernel_regularizer: Regularizer function applied to the kernel weights matrix (see regularizer).
- recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights matrix (see regularizer).
- bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
- activity_regularizer: Regularizer function applied to the output of the layer (its "activation") (see regularizer).
- kernel_constraint: Constraint function applied to the kernel weights matrix (see constraints).
- recurrent_constraint: Constraint function applied to the recurrent_kernel weights matrix (see constraints).
- bias_constraint: Constraint function applied to the bias vector (see constraints).
- dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs.
- recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state.
References
- On the Properties of Neural Machine Translation: Encoder-Decoder Approaches
- Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
- A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
LSTM
```python
keras.layers.recurrent.LSTM(units, activation='tanh', recurrent_activation='hard_sigmoid', use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', bias_initializer='zeros', unit_forget_bias=True, kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, bias_constraint=None, dropout=0.0, recurrent_dropout=0.0)
```
Long Short-Term Memory unit - Hochreiter 1997.
For a step-by-step description of the algorithm, see this tutorial.
Arguments
- units: Positive integer, dimensionality of the output space.
- activation: Activation function to use (see activations). If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
- recurrent_activation: Activation function to use for the recurrent step (see activations).
- use_bias: Boolean, whether the layer uses a bias vector.
- kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs (see initializers).
- recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state (see initializers).
- bias_initializer: Initializer for the bias vector (see initializers).
- unit_forget_bias: Boolean. If True, add 1 to the bias of the forget gate at initialization. Setting it to True will also force bias_initializer="zeros". This is recommended in Jozefowicz et al.
- kernel_regularizer: Regularizer function applied to the kernel weights matrix (see regularizer).
- recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights matrix (see regularizer).
- bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
- activity_regularizer: Regularizer function applied to the output of the layer (its "activation") (see regularizer).
- kernel_constraint: Constraint function applied to the kernel weights matrix (see constraints).
- recurrent_constraint: Constraint function applied to the recurrent_kernel weights matrix (see constraints).
- bias_constraint: Constraint function applied to the bias vector (see constraints).
- dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs.
- recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state.
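For example, a minimal sketch (shapes and hyperparameter values assumed) of an LSTM that keeps the forget-gate bias initialization at 1 and adds L2 regularization to the input and recurrent kernels:

```python
from keras.models import Sequential
from keras.layers import LSTM
from keras.regularizers import l2

model = Sequential()
model.add(LSTM(64, input_shape=(10, 64),
               unit_forget_bias=True,           # also forces bias_initializer="zeros"
               kernel_regularizer=l2(1e-4),
               recurrent_regularizer=l2(1e-4),
               dropout=0.2,
               recurrent_dropout=0.2))
```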
References