GRU
class torch.nn.GRU(*args, **kwargs)
Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.

For each element in the input sequence, each layer computes the following function:

$$
\begin{array}{ll}
r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\
z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{(t-1)} + b_{hz}) \\
n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)} + b_{hn})) \\
h_t = (1 - z_t) * n_t + z_t * h_{(t-1)}
\end{array}
$$

where $h_t$ is the hidden state at time t, $x_t$ is the input at time t, $h_{(t-1)}$ is the hidden state of the layer at time t-1 or the initial hidden state at time 0, and $r_t$, $z_t$, $n_t$ are the reset, update, and new gates, respectively. $\sigma$ is the sigmoid function, and $*$ is the Hadamard product.

In a multilayer GRU, the input $x^{(l)}_t$ of the $l$-th layer ($l \ge 2$) is the hidden state $h^{(l-1)}_t$ of the previous layer multiplied by dropout $\delta^{(l-1)}_t$, where each $\delta^{(l-1)}_t$ is a Bernoulli random variable which is 0 with probability dropout.
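To connect the formulas above to code, here is a minimal single-time-step sketch in plain tensor operations. The function name gru_cell_step and the explicit weight arguments are illustrative, not part of the module's API; the stacked weight layout mirrors the Variables section below.

import torch

def gru_cell_step(x_t, h_prev, W_ih, W_hh, b_ih, b_hh):
    # Split the stacked matrices (W_ir|W_iz|W_in) and (W_hr|W_hz|W_hn),
    # matching the layout of weight_ih_l[k] and weight_hh_l[k].
    W_ir, W_iz, W_in = W_ih.chunk(3, dim=0)
    W_hr, W_hz, W_hn = W_hh.chunk(3, dim=0)
    b_ir, b_iz, b_in = b_ih.chunk(3)
    b_hr, b_hz, b_hn = b_hh.chunk(3)
    r_t = torch.sigmoid(x_t @ W_ir.T + b_ir + h_prev @ W_hr.T + b_hr)       # reset gate
    z_t = torch.sigmoid(x_t @ W_iz.T + b_iz + h_prev @ W_hz.T + b_hz)       # update gate
    n_t = torch.tanh(x_t @ W_in.T + b_in + r_t * (h_prev @ W_hn.T + b_hn))  # new gate
    return (1 - z_t) * n_t + z_t * h_prev                                   # h_t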
Parameters
- input_size – The number of expected features in the input x
- hidden_size – The number of features in the hidden state h 
- num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two GRUs together to form a stacked GRU, with the second GRU taking in outputs of the first GRU and computing the final results. Default: 1
- bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True
- batch_first – If True, then the input and output tensors are provided as (batch, seq, feature). Default: False
- dropout – If non-zero, introduces a Dropout layer on the outputs of each GRU layer except the last layer, with dropout probability equal to dropout. Default: 0
- bidirectional – If True, becomes a bidirectional GRU. Default: False
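As an illustration of how these arguments interact (all values below are arbitrary), a 2-layer bidirectional GRU with batch-first tensors and inter-layer dropout:

>>> rnn = nn.GRU(input_size=10, hidden_size=20, num_layers=2,
...              batch_first=True, dropout=0.2, bidirectional=True)
>>> input = torch.randn(3, 5, 10)    # (batch, seq, feature), since batch_first=True
>>> h0 = torch.randn(2 * 2, 3, 20)   # (num_layers * num_directions, batch, hidden_size)
>>> output, hn = rnn(input, h0)
>>> output.shape                     # (batch, seq, num_directions * hidden_size)
torch.Size([3, 5, 40])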
 
Inputs: input, h_0
- input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence; see torch.nn.utils.rnn.pack_padded_sequence() for details (a sketch follows this list).
- h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1. 
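A sketch of the packed-input path mentioned above. The sequence lengths are arbitrary and must be in decreasing order unless enforce_sorted=False is passed:

>>> from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
>>> rnn = nn.GRU(10, 20, 2)
>>> padded = torch.randn(5, 3, 10)          # (seq_len, batch, input_size)
>>> lengths = torch.tensor([5, 3, 2])       # true length of each sequence in the batch
>>> packed = pack_padded_sequence(padded, lengths)
>>> packed_output, hn = rnn(packed)         # h_0 defaults to zeros
>>> output, output_lengths = pad_packed_sequence(packed_output)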
 
Outputs: output, h_n
- output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features h_t from the last layer of the GRU, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
- h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n.view(num_layers, num_directions, batch, hidden_size).
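A sketch of separating directions in the unpacked bidirectional case, following the views described above:

>>> rnn = nn.GRU(10, 20, num_layers=2, bidirectional=True)
>>> output, h_n = rnn(torch.randn(5, 3, 10))   # h_0 defaults to zeros
>>> directions = output.view(5, 3, 2, 20)      # (seq_len, batch, num_directions, hidden_size)
>>> forward_out, backward_out = directions[..., 0, :], directions[..., 1, :]
>>> layers = h_n.view(2, 2, 3, 20)             # (num_layers, num_directions, batch, hidden_size)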
 
Shape:
- Input1: $(L, N, H_{in})$ tensor containing input features, where $H_{in} = \text{input\_size}$ and $L$ represents a sequence length.
- Input2: $(S, N, H_{out})$ tensor containing the initial hidden state for each element in the batch, where $S = \text{num\_layers} * \text{num\_directions}$ and $H_{out} = \text{hidden\_size}$. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
- Output1: $(L, N, H_{all})$ where $H_{all} = \text{num\_directions} * \text{hidden\_size}$
- Output2: $(S, N, H_{out})$ tensor containing the next hidden state for each element in the batch
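These shapes can be verified directly; a small sanity check using the symbols above ($L$ = seq_len, $N$ = batch, unidirectional so num_directions = 1):

>>> L, N = 7, 4
>>> rnn = nn.GRU(input_size=10, hidden_size=20, num_layers=3)
>>> output, h_n = rnn(torch.randn(L, N, 10))
>>> output.shape      # (L, N, num_directions * hidden_size)
torch.Size([7, 4, 20])
>>> h_n.shape         # (num_layers * num_directions, N, hidden_size)
torch.Size([3, 4, 20])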
 
Variables
- weight_ih_l[k] – the learnable input-hidden weights of the $k$-th layer (W_ir|W_iz|W_in), of shape (3*hidden_size, input_size) for k = 0. Otherwise, the shape is (3*hidden_size, num_directions * hidden_size)
- weight_hh_l[k] – the learnable hidden-hidden weights of the $k$-th layer (W_hr|W_hz|W_hn), of shape (3*hidden_size, hidden_size)
- bias_ih_l[k] – the learnable input-hidden bias of the $k$-th layer (b_ir|b_iz|b_in), of shape (3*hidden_size)
- bias_hh_l[k] – the learnable hidden-hidden bias of the $k$-th layer (b_hr|b_hz|b_hn), of shape (3*hidden_size)
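The per-layer parameters are exposed as attributes with these names; for example, for a unidirectional 2-layer GRU:

>>> rnn = nn.GRU(10, 20, 2)
>>> rnn.weight_ih_l0.shape   # (3 * hidden_size, input_size) for k = 0
torch.Size([60, 10])
>>> rnn.weight_ih_l1.shape   # (3 * hidden_size, num_directions * hidden_size) for k = 1
torch.Size([60, 20])
>>> rnn.bias_hh_l0.shape     # (3 * hidden_size)
torch.Size([60])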
 
Note
All the weights and biases are initialized from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{1}{\text{hidden\_size}}$.
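If a different initialization is desired, the parameters can simply be re-initialized after construction; a sketch reproducing the same uniform bound with torch.nn.init:

>>> import math
>>> rnn = nn.GRU(10, 20)
>>> bound = math.sqrt(1.0 / rnn.hidden_size)
>>> for param in rnn.parameters():
...     nn.init.uniform_(param, -bound, bound)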
Note
If the following conditions are satisfied: 1) cudnn is enabled, 2) input data is on the GPU, 3) input data has dtype torch.float16, 4) V100 GPU is used, 5) input data is not in PackedSequence format, persistent algorithm can be selected to improve performance.

Examples:

>>> rnn = nn.GRU(10, 20, 2)
>>> input = torch.randn(5, 3, 10)
>>> h0 = torch.randn(2, 3, 20)
>>> output, hn = rnn(input, h0)
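Under the conditions in the note above (cuDNN, GPU input, float16, unpacked data), a half-precision run would look like the following; this sketch assumes a CUDA device is available:

>>> rnn = nn.GRU(10, 20, 2).cuda().half()
>>> input = torch.randn(5, 3, 10, dtype=torch.float16, device='cuda')
>>> h0 = torch.randn(2, 3, 20, dtype=torch.float16, device='cuda')
>>> output, hn = rnn(input, h0)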