Activation functions are the non-linear gateways in your neural networks, enabling them to learn complex patterns. If you're familiar with using activations in Keras, you'll find that PyTorch offers similar capabilities, though with its own conventions for application. This section explores how activation functions are implemented and used in PyTorch, drawing comparisons to their Keras counterparts.
In Keras, you typically specify activation functions in one of two ways:

- As a string argument when defining a layer, e.g. Dense(units=64, activation='relu')
- As a standalone layer, e.g. tf.keras.layers.ReLU()

PyTorch provides two main ways to apply activation functions, both of which are commonly used:

- Functional form (torch.nn.functional): Most activation functions are available as simple functions within the torch.nn.functional module (often imported as F). These functions take a tensor as input and return the activated tensor. This approach is common within the forward method of custom nn.Module classes because activations are generally stateless operations.
- Module form (torch.nn): Many activation functions also have corresponding nn.Module classes (e.g., nn.ReLU(), nn.Sigmoid()). These can be instantiated and added to your network like any other layer, which is particularly useful when constructing models with nn.Sequential or when an activation function has learnable parameters (like nn.PReLU).

Let's look at some common activation functions and how their usage compares.
Here's a comparative overview of widely used activation functions:
The ReLU function is a popular choice due to its simplicity and effectiveness in combating the vanishing gradient problem. It's defined as:
$\text{ReLU}(x) = \max(0, x)$

- Keras: activation='relu' in a layer, or tf.keras.layers.ReLU()
- PyTorch: torch.nn.functional.relu(input_tensor), or nn.ReLU()
import torch
import torch.nn as nn
import torch.nn.functional as F
# Example data
x = torch.randn(2, 3) # Batch of 2, 3 features each
# Functional ReLU
output_functional = F.relu(x)
# Module ReLU
relu_module = nn.ReLU()
output_module = relu_module(x)
print("Input:\n", x)
print("Functional ReLU output:\n", output_functional)
print("Module ReLU output:\n", output_module)
The Sigmoid function squashes values to a range between 0 and 1. It's often used in the output layer for binary classification problems. Its formula is:
$\text{Sigmoid}(x) = \sigma(x) = \frac{1}{1 + e^{-x}}$

- Keras: activation='sigmoid', tf.keras.layers.Activation('sigmoid'), or tf.keras.activations.sigmoid
- PyTorch: torch.sigmoid(input_tensor) or F.sigmoid(input_tensor), with nn.Sigmoid() as the module form
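To make the comparison concrete, here is a minimal sketch (the tensor x is illustrative) applying Sigmoid with both the functional and module forms:

import torch
import torch.nn as nn

x = torch.randn(2, 3)  # illustrative batch of 2 samples, 3 values each

# Functional form: every element is squashed into the range (0, 1)
out_functional = torch.sigmoid(x)

# Module form: convenient inside nn.Sequential
sigmoid_module = nn.Sigmoid()
out_module = sigmoid_module(x)

print(torch.allclose(out_functional, out_module))  # True: same result either way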
Tanh squashes values to a range between -1 and 1. It's often preferred over Sigmoid for hidden layers as its outputs are zero-centered. The formula is:
$\text{Tanh}(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$

- Keras: activation='tanh', tf.keras.layers.Activation('tanh'), or tf.keras.activations.tanh
- PyTorch: torch.tanh(input_tensor) or F.tanh(input_tensor), with nn.Tanh() as the module form
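The same pattern applies to Tanh; this small sketch (tensor name illustrative) also checks the zero-centered (-1, 1) output range:

import torch
import torch.nn as nn

x = torch.randn(2, 3)

out_functional = torch.tanh(x)   # functional form, outputs lie in (-1, 1)
tanh_module = nn.Tanh()          # module form
out_module = tanh_module(x)

print(out_functional.abs().max().item() < 1.0)     # True: bounded by 1 in magnitude
print(torch.allclose(out_functional, out_module))  # True: identical results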
The Softmax function is typically used in the output layer for multi-class classification problems. It converts a vector of raw scores (logits) into a probability distribution. For a vector x=(x1,x2,…,xJ), the Softmax of xi is:
$\text{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{J} e^{x_j}}$

- Keras: activation='softmax', tf.keras.layers.Softmax(), or tf.keras.activations.softmax
- PyTorch: F.softmax(input_tensor, dim=...) or nn.Softmax(dim=...)

Note that PyTorch requires you to specify the dim argument for softmax. For typical classification tasks where your input is (batch_size, num_classes), you'll use dim=1 to apply Softmax across the num_classes dimension.

# Example: Softmax in PyTorch
logits = torch.randn(2, 5) # Batch of 2, 5 classes
# Functional Softmax
probs_functional = F.softmax(logits, dim=1)
# Module Softmax
softmax_module = nn.Softmax(dim=1)
probs_module = softmax_module(logits)
print("Logits:\n", logits)
print("Probabilities (Functional):\n", probs_functional)
print("Probabilities (Module):\n", probs_module)
print("Sum of probabilities per sample (should be 1):\n", probs_module.sum(dim=1))
LeakyReLU is a variant of ReLU that allows a small, non-zero gradient when the unit is not active, helping to mitigate the "dying ReLU" problem. It's defined as:
$\text{LeakyReLU}(x) = \begin{cases} x & \text{if } x > 0 \\ \text{negative\_slope} \times x & \text{if } x \le 0 \end{cases}$

The negative_slope is a small constant, typically 0.01 by default.

- Keras: tf.keras.layers.LeakyReLU(alpha=0.01) (where alpha is the negative slope)
- PyTorch: F.leaky_relu(input_tensor, negative_slope=0.01) or nn.LeakyReLU(negative_slope=0.01)
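Both forms again produce the same result; the sketch below (tensor name illustrative) uses a negative slope of 0.01:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(2, 3)

out_functional = F.leaky_relu(x, negative_slope=0.01)   # functional form
leaky_module = nn.LeakyReLU(negative_slope=0.01)        # module form
out_module = leaky_module(x)

print(torch.allclose(out_functional, out_module))  # True: identical outputs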
The chart below visualizes some of these common activation functions: ReLU, Sigmoid, Tanh, and LeakyReLU (with a negative slope of 0.1 for better visibility).
Visualization of ReLU, Sigmoid, Tanh, and LeakyReLU activation functions.
Let's see how these are integrated when building a simple model, comparing Keras and PyTorch approaches.
Keras Sequential Model:
# TensorFlow/Keras
import tensorflow as tf
keras_model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
keras_model.summary()
PyTorch nn.Module:

When defining a custom nn.Module, you'll typically use the functional versions of activations within the forward method.
# PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F
class PyTorchCustomModel(nn.Module):
    def __init__(self, input_features, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_features, 128)
        self.dropout = nn.Dropout(0.2)
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.fc1(x)
        x = F.relu(x)            # Functional ReLU
        x = self.dropout(x)
        x = self.fc2(x)
        x = F.softmax(x, dim=1)  # Functional Softmax with dimension specified
        return x
# Instantiate the model
pytorch_model_custom = PyTorchCustomModel(input_features=784, num_classes=10)
print(pytorch_model_custom)
PyTorch nn.Sequential:

If you're using nn.Sequential, you'll use the module versions of activation functions.
# PyTorch using nn.Sequential
pytorch_model_sequential = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),          # Module ReLU
    nn.Dropout(0.2),
    nn.Linear(128, 10),
    nn.Softmax(dim=1)   # Module Softmax with dimension specified
)
print(pytorch_model_sequential)
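Whichever style you choose, the two PyTorch models are called the same way. A quick sketch (the random batch below is illustrative) confirms that both produce a (batch_size, num_classes) output whose rows sum to one:

# Illustrative forward pass through the two PyTorch models defined above
batch = torch.randn(4, 784)  # hypothetical batch of 4 flattened 28x28 inputs

out_custom = pytorch_model_custom(batch)
out_sequential = pytorch_model_sequential(batch)

print(out_custom.shape)           # torch.Size([4, 10])
print(out_sequential.shape)       # torch.Size([4, 10])
print(out_sequential.sum(dim=1))  # each row sums to ~1 because of the Softmax output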
The choice between torch.nn.functional and nn.Module for activations often comes down to coding style and specific needs.

torch.nn.functional (e.g., F.relu):

- Well suited to stateless activations without learnable parameters (unlike, say, PReLU, which has learnable parameters). Using the functional form can be slightly more concise inside the forward method of a custom nn.Module.
- The activation is applied exactly where it is needed in the forward pass, without declaring an extra attribute.

nn.Module (e.g., nn.ReLU()):

- If you define all of your layers as attributes in the __init__ method, this provides a uniform structure.
- Required when building models with nn.Sequential, as it expects nn.Module instances.
- If an activation has learnable parameters (e.g., nn.PReLU), you must use the module version.

Many developers prefer F.relu and similar functional calls for common, stateless activations within their forward methods, as it keeps __init__ cleaner by not having to define an attribute for every activation. However, using nn.ReLU() is perfectly valid and sometimes preferred for clarity or when using nn.Sequential.
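To illustrate the learnable-parameter case, here is a small sketch (the class name PReLUBlock is hypothetical) showing that nn.PReLU must be registered as a module attribute so its slope shows up among the model's parameters:

import torch
import torch.nn as nn

class PReLUBlock(nn.Module):  # hypothetical example module
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 4)
        self.act = nn.PReLU()  # learnable negative slope, so it must live in __init__

    def forward(self, x):
        return self.act(self.fc(x))

block = PReLUBlock()
# The PReLU slope is registered as 'act.weight' and will be updated by the optimizer
print([name for name, _ in block.named_parameters()])
# ['fc.weight', 'fc.bias', 'act.weight']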
Some PyTorch activation function modules (like nn.ReLU) offer an inplace=True option:
relu_inplace = nn.ReLU(inplace=True)
# or, for functional (though less common for direct functional use)
# x = F.relu_(x) # Note the underscore for in-place functional versions
Using inplace=True modifies the input tensor directly, which can save memory by avoiding the allocation of a new tensor for the output. However, this should be used with caution: overwriting a tensor that autograd still needs for the backward pass will cause an error, and in-place modification can make code harder to reason about.

Generally, unless memory is extremely constrained and you understand the implications, it's safer to use out-of-place operations (the default inplace=False).
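A brief sketch (tensor name illustrative) shows what inplace=True actually does to its input:

import torch
import torch.nn as nn

x = torch.randn(4)
relu_inplace = nn.ReLU(inplace=True)

y = relu_inplace(x)

print(y.data_ptr() == x.data_ptr())  # True: no new tensor was allocated
print(x)                             # x itself now has its negative entries zeroed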
Transitioning from Keras, you'll find that PyTorch offers all the activation functions you're used to. The primary adjustment is understanding where and how to apply them, whether through torch.nn.functional for direct application or as nn.Module instances within your network architecture, always remembering to specify dim for functions like softmax.