LSTM

class torch.nn.LSTM(*args, **kwargs)
Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.

For each element in the input sequence, each layer computes the following function:

$$
\begin{array}{ll}
i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\
f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\
g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\
o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\
c_t = f_t \odot c_{t-1} + i_t \odot g_t \\
h_t = o_t \odot \tanh(c_t)
\end{array}
$$

where $h_t$ is the hidden state at time t, $c_t$ is the cell state at time t, $x_t$ is the input at time t, $h_{t-1}$ is the hidden state of the layer at time t-1 or the initial hidden state at time 0, and $i_t$, $f_t$, $g_t$, $o_t$ are the input, forget, cell, and output gates, respectively. $\sigma$ is the sigmoid function, and $\odot$ is the Hadamard product. (A minimal numeric sketch of one cell step, checked against torch.nn.LSTMCell, follows the parameter list below.)

In a multilayer LSTM, the input $x^{(l)}_t$ of the $l$-th layer ($l \ge 2$) is the hidden state $h^{(l-1)}_t$ of the previous layer multiplied by dropout $\delta^{(l-1)}_t$, where each $\delta^{(l-1)}_t$ is a Bernoulli random variable which is $0$ with probability dropout.

If proj_size > 0 is specified, LSTM with projections will be used. This changes the LSTM cell in the following way. First, the dimension of $h_t$ will be changed from hidden_size to proj_size (dimensions of $W_{hi}$ will be changed accordingly). Second, the output hidden state of each layer will be multiplied by a learnable projection matrix: $h_t = W_{hr} h_t$. Note that as a consequence of this, the output of LSTM network will be of different shape as well. See Inputs/Outputs sections below for exact dimensions of all variables. You can find more details in https://arxiv.org/abs/1402.1128.

Parameters
- input_size – The number of expected features in the input x 
- hidden_size – The number of features in the hidden state h 
- num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1
- bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True
- batch_first – If True, then the input and output tensors are provided as (batch, seq, feature). Default: False (a sketch using batch_first follows the Examples at the end of this page)
- dropout – If non-zero, introduces a Dropout layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to dropout. Default: 0
- bidirectional – If True, becomes a bidirectional LSTM. Default: False
- proj_size – If > 0, will use LSTM with projections of corresponding size. Default: 0
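
The following is a minimal sketch, not part of the original documentation, that recomputes a single LSTM step directly from the gate equations above and compares the result with torch.nn.LSTMCell; the gate chunks follow the (i, f, g, o) ordering described under Variables below, and all sizes are arbitrary.

>>> import torch
>>> import torch.nn as nn
>>> cell = nn.LSTMCell(10, 20)                       # input_size=10, hidden_size=20
>>> x = torch.randn(3, 10)                           # batch of 3 inputs
>>> h0 = torch.zeros(3, 20)
>>> c0 = torch.zeros(3, 20)
>>> # all four gate pre-activations at once; chunks are ordered (i, f, g, o)
>>> gates = x @ cell.weight_ih.t() + cell.bias_ih + h0 @ cell.weight_hh.t() + cell.bias_hh
>>> i, f, g, o = gates.chunk(4, dim=1)
>>> c1 = torch.sigmoid(f) * c0 + torch.sigmoid(i) * torch.tanh(g)
>>> h1 = torch.sigmoid(o) * torch.tanh(c1)
>>> h_ref, c_ref = cell(x, (h0, c0))
>>> torch.allclose(h1, h_ref, atol=1e-6) and torch.allclose(c1, c_ref, atol=1e-6)
True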
 
Inputs: input, (h_0, c_0)
- input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See torch.nn.utils.rnn.pack_padded_sequence() or torch.nn.utils.rnn.pack_sequence() for details (a packed-sequence sketch follows this list).
- h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. If the LSTM is bidirectional, num_directions should be 2, else it should be 1. If proj_size > 0 was specified, the shape has to be (num_layers * num_directions, batch, proj_size).
- c_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial cell state for each element in the batch.

If (h_0, c_0) is not provided, both h_0 and c_0 default to zero.
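
As a minimal sketch, not part of the original documentation, the snippet below runs a batch of variable-length sequences through the LSTM with pack_padded_sequence and unpacks the result; the sequence lengths are made up for illustration.

>>> import torch
>>> import torch.nn as nn
>>> from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
>>> rnn = nn.LSTM(10, 20, 2)
>>> padded = torch.randn(5, 3, 10)                   # (seq_len, batch, input_size)
>>> lengths = torch.tensor([5, 3, 2])                # true length of each sequence, sorted descending
>>> packed = pack_padded_sequence(padded, lengths)
>>> packed_out, (hn, cn) = rnn(packed)               # (h_0, c_0) default to zero
>>> output, out_lengths = pad_packed_sequence(packed_out)
>>> output.shape                                     # (seq_len, batch, num_directions * hidden_size)
torch.Size([5, 3, 20])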
 
Outputs: output, (h_n, c_n)
- output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the LSTM, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence. If proj_size > 0 was specified, output shape will be (seq_len, batch, num_directions * proj_size). For the unpacked case, the directions can be separated using output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case (see the sketch after this list).
- h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. If proj_size > 0 was specified, h_n shape will be (num_layers * num_directions, batch, proj_size). Like output, the layers can be separated using h_n.view(num_layers, num_directions, batch, hidden_size) and similarly for c_n.
- c_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the cell state for t = seq_len. 
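
A minimal sketch, not part of the original documentation, of the shapes above: it runs a bidirectional LSTM and separates the two directions with view as described; all sizes are arbitrary.

>>> import torch
>>> import torch.nn as nn
>>> rnn = nn.LSTM(10, 20, num_layers=2, bidirectional=True)
>>> input = torch.randn(5, 3, 10)                    # (seq_len, batch, input_size)
>>> output, (hn, cn) = rnn(input)                    # zero initial states
>>> output.shape                                     # (seq_len, batch, num_directions * hidden_size)
torch.Size([5, 3, 40])
>>> directions = output.view(5, 3, 2, 20)            # forward = index 0, backward = index 1
>>> hn.shape                                         # (num_layers * num_directions, batch, hidden_size)
torch.Size([4, 3, 20])
>>> hn.view(2, 2, 3, 20).shape                       # (num_layers, num_directions, batch, hidden_size)
torch.Size([2, 2, 3, 20])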
 
Variables
- ~LSTM.weight_ih_l[k] – the learnable input-hidden weights of the k-th layer (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size, input_size) for k = 0. Otherwise, the shape is (4*hidden_size, num_directions * hidden_size)
- ~LSTM.weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer (W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size, hidden_size). If proj_size > 0 was specified, the shape will be (4*hidden_size, proj_size).
- ~LSTM.bias_ih_l[k] – the learnable input-hidden bias of the k-th layer (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)
- ~LSTM.bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size)
- ~LSTM.weight_hr_l[k] – the learnable projection weights of the k-th layer, of shape (proj_size, hidden_size). Only present when proj_size > 0 was specified. (A shape-inspection sketch follows this list.)
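
A minimal sketch, not part of the original documentation, inspecting these parameters, including the projection case; the sizes are arbitrary (note that proj_size must be smaller than hidden_size).

>>> import torch
>>> import torch.nn as nn
>>> rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, proj_size=5)
>>> rnn.weight_ih_l0.shape                           # (4*hidden_size, input_size)
torch.Size([80, 10])
>>> rnn.weight_hh_l0.shape                           # (4*hidden_size, proj_size) since proj_size > 0
torch.Size([80, 5])
>>> rnn.weight_hr_l0.shape                           # (proj_size, hidden_size)
torch.Size([5, 20])
>>> rnn.weight_ih_l1.shape                           # upper layers take the projected hidden state as input
torch.Size([80, 5])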
 
Note

All the weights and biases are initialized from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{1}{\text{hidden\_size}}$.

Warning

There are known non-determinism issues for RNN functions on some versions of cuDNN and CUDA. You can enforce deterministic behavior by setting the following environment variables:

- On CUDA 10.1, set environment variable CUDA_LAUNCH_BLOCKING=1. This may affect performance.
- On CUDA 10.2 or later, set environment variable (note the leading colon symbol) CUBLAS_WORKSPACE_CONFIG=:16:8 or CUBLAS_WORKSPACE_CONFIG=:4096:2.

See the cuDNN 8 Release Notes for more information.
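
As a hedged sketch, not from the original documentation, one way to set such an environment variable from Python is:

>>> import os
>>> os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:2"   # set before any CUDA work so it takes effect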
Note

If the following conditions are satisfied: 1) cudnn is enabled, 2) input data is on the GPU, 3) input data has dtype torch.float16, 4) V100 GPU is used, 5) input data is not in PackedSequence format, persistent algorithm can be selected to improve performance.

Examples:

>>> rnn = nn.LSTM(10, 20, 2)
>>> input = torch.randn(5, 3, 10)
>>> h0 = torch.randn(2, 3, 20)
>>> c0 = torch.randn(2, 3, 20)
>>> output, (hn, cn) = rnn(input, (h0, c0))
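
A minimal follow-on sketch, not part of the original example, showing the same call with batch_first=True, where input and output use the (batch, seq, feature) layout; note that h_n and c_n are not affected by batch_first.

>>> import torch
>>> import torch.nn as nn
>>> rnn = nn.LSTM(10, 20, 2, batch_first=True)
>>> input = torch.randn(3, 5, 10)                    # (batch, seq, input_size)
>>> output, (hn, cn) = rnn(input)                    # (h_0, c_0) default to zero
>>> output.shape                                     # (batch, seq, num_directions * hidden_size)
torch.Size([3, 5, 20])
>>> hn.shape                                         # (num_layers * num_directions, batch, hidden_size)
torch.Size([2, 3, 20])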