Having explored the Sigmoid function, we now turn our attention to another popular S-shaped activation function: the Hyperbolic Tangent, commonly known as Tanh. Like Sigmoid, Tanh introduces non-linearity, allowing networks to learn complex patterns beyond simple linear relationships. However, Tanh offers a distinct advantage in certain scenarios due to its output range.
Mathematically, the Tanh function is defined as:
$$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$

Interestingly, Tanh is closely related to the Sigmoid function. It can be expressed as a scaled and shifted version of the Sigmoid:

$$\tanh(x) = 2 \cdot \text{sigmoid}(2x) - 1$$

Let's examine the key properties of the Tanh function:
Here's a visualization of the Tanh function and its derivative:
The Tanh function maps inputs to the range (-1, 1) and is zero-centered. Its derivative peaks at 1 when the input is 0 and approaches 0 as the input magnitude increases.
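As a quick numerical check of the definition, the Sigmoid identity, and the (-1, 1) range described above, here is a short sketch; the sample inputs are arbitrary and chosen only for illustration.

import torch

x = torch.linspace(-4.0, 4.0, steps=9)

# Tanh from its definition, from torch.tanh, and via the Sigmoid identity
tanh_def = (torch.exp(x) - torch.exp(-x)) / (torch.exp(x) + torch.exp(-x))
tanh_builtin = torch.tanh(x)
tanh_via_sigmoid = 2 * torch.sigmoid(2 * x) - 1

print(torch.allclose(tanh_def, tanh_builtin))           # True
print(torch.allclose(tanh_builtin, tanh_via_sigmoid))   # True
print(tanh_builtin.min().item(), tanh_builtin.max().item())  # outputs stay within (-1, 1)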
Advantages:
- Zero-centered output: values fall in (-1, 1) and are centered around 0, which tends to produce better-behaved gradients in subsequent layers than Sigmoid's (0, 1) output.
- Stronger gradients near zero: the derivative peaks at 1 (compared to 0.25 for Sigmoid), so learning signals are less attenuated around the origin.
Disadvantages:
- Saturation: for inputs of large magnitude, the output flattens toward -1 or 1 and the gradient approaches 0, causing the vanishing gradient problem (illustrated in the sketch below).
- Computational cost: like Sigmoid, it relies on exponentials, making it more expensive to evaluate than simpler functions such as ReLU.
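To make the saturation point concrete, the following short sketch (input values chosen arbitrarily for illustration) uses autograd to inspect the gradient of Tanh at increasingly large inputs:

import torch

# Gradient of tanh at a few points; analytically it equals 1 - tanh(x)^2
x = torch.tensor([0.0, 2.0, 5.0, 10.0], requires_grad=True)
torch.tanh(x).sum().backward()
print(x.grad)  # roughly [1.0000, 0.0707, 0.0002, 0.0000]: the gradient vanishes as |x| grows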
Historically, Tanh was often preferred over Sigmoid for hidden layers precisely because of its zero-centered output. It was commonly used in feedforward networks and particularly in certain types of Recurrent Neural Networks (RNNs).
However, with the rise of ReLU and its variants (which we will discuss next), Tanh is used less frequently as the default choice for hidden layers in standard feedforward networks and Convolutional Neural Networks (CNNs). It still finds application in specific contexts, particularly within the gating mechanisms of LSTMs and GRUs (advanced types of RNNs), where its (-1, 1) range is beneficial.
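As a sketch of where Tanh fits into such gating mechanisms, here is a simplified single LSTM cell step (the sizes and variable names are illustrative, not from the original text): Sigmoid produces the 0-to-1 gates, while Tanh squashes the candidate cell values and the final output into (-1, 1).

import torch
import torch.nn as nn

input_size, hidden_size = 4, 8
x_t = torch.randn(1, input_size)       # input at one time step
h_prev = torch.zeros(1, hidden_size)   # previous hidden state
c_prev = torch.zeros(1, hidden_size)   # previous cell state

# One linear map producing all four gate pre-activations at once
gates = nn.Linear(input_size + hidden_size, 4 * hidden_size)
i, f, o, g = gates(torch.cat([x_t, h_prev], dim=1)).chunk(4, dim=1)

i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)  # gates in (0, 1)
g = torch.tanh(g)                      # candidate values in (-1, 1)

c_t = f * c_prev + i * g               # updated cell state
h_t = o * torch.tanh(c_t)              # hidden state, squashed again by Tanh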
Here's how you might use Tanh in a PyTorch layer:
import torch
import torch.nn as nn
# Example input tensor
input_tensor = torch.randn(5, 10) # Batch of 5, 10 features each
# Define a layer with Tanh activation
# Option 1: Using nn.Tanh() as a module
tanh_activation = nn.Tanh()
output_tensor = tanh_activation(input_tensor)
# Option 2: Using torch.tanh directly (functional approach)
output_tensor_functional = torch.tanh(input_tensor)
# Define a linear layer followed by Tanh
linear_layer = nn.Linear(in_features=10, out_features=20)
activated_output = torch.tanh(linear_layer(input_tensor))
print("Input shape:", input_tensor.shape)
print("Output shape (Module):", output_tensor.shape)
print("Output shape (Functional):", output_tensor_functional.shape)
print("Output shape (Linear + Tanh):", activated_output.shape)
print("\nSample Output (first element, first 5 features):\n", activated_output[0, :5])
While Tanh addresses the non-zero-centered issue of Sigmoid, it doesn't solve the vanishing gradient problem inherent in saturating functions. This limitation paved the way for the development and widespread adoption of ReLU and its variants, which we will explore in the next section.