Activation functions are the non-linear gateways in your neural networks, enabling them to learn complex patterns. If you are familiar with using activations in Keras, you will find that PyTorch offers similar capabilities, though with its own conventions for application. This section looks at how activation functions are implemented and used in PyTorch, drawing comparisons to their Keras counterparts.
In Keras, you typically specify activation functions in one of two ways:
- As a string argument to a layer: Dense(units=64, activation='relu').
- As a standalone layer: tf.keras.layers.ReLU().

PyTorch provides two main ways to apply activation functions, both of which are commonly used:
- As functions (torch.nn.functional): Most activation functions are available as simple functions within the torch.nn.functional module (often imported as F). These functions take a tensor as input and return the activated tensor. This approach is common within the forward method of custom nn.Module classes because activations are generally stateless operations.
- As modules (torch.nn): Many activation functions also have corresponding nn.Module classes (e.g., nn.ReLU(), nn.Sigmoid()). These can be instantiated and added to your network like any other layer, which is particularly useful when constructing models with nn.Sequential or when an activation function might have learnable parameters (like nn.PReLU).

Let's look at some common activation functions and how their usage compares.
Here's a comparative overview of widely used activation functions:
The ReLU function is a popular choice due to its simplicity and effectiveness in combating the vanishing gradient problem. It's defined as:

ReLU(x) = max(0, x)
Keras:
- activation='relu' in a layer.
- tf.keras.layers.ReLU()

PyTorch:

- torch.nn.functional.relu(input_tensor)
- nn.ReLU()

import torch
import torch.nn as nn
import torch.nn.functional as F
# Example data
x = torch.randn(2, 3) # Batch of 2, 3 features each
# Functional ReLU
output_functional = F.relu(x)
# Module ReLU
relu_module = nn.ReLU()
output_module = relu_module(x)
print("Input:\n", x)
print("Functional ReLU output:\n", output_functional)
print("Module ReLU output:\n", output_module)
The Sigmoid function squashes values to a range between 0 and 1. It's often used in the output layer for binary classification problems. Its formula is:

sigmoid(x) = 1 / (1 + e^(-x))
Keras:

- activation='sigmoid'
- tf.keras.layers.Activation('sigmoid') or tf.keras.activations.sigmoid

PyTorch:

- torch.sigmoid(input_tensor) or F.sigmoid(input_tensor)
- nn.Sigmoid()

Tanh squashes values to a range between -1 and 1. It's often preferred over Sigmoid for hidden layers as its outputs are zero-centered. The formula is:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
Keras:

- activation='tanh'
- tf.keras.layers.Activation('tanh') or tf.keras.activations.tanh

PyTorch:

- torch.tanh(input_tensor) or F.tanh(input_tensor)
- nn.Tanh()
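Neither function needs extra arguments, so the functional and module forms are interchangeable. A minimal sketch, reusing the torch and nn imports from the ReLU example above:

# Example: Sigmoid and Tanh in PyTorch
x = torch.randn(2, 3)  # Batch of 2, 3 features each

# Functional forms operate directly on the tensor
sigmoid_out = torch.sigmoid(x)  # values squashed into (0, 1)
tanh_out = torch.tanh(x)        # values squashed into (-1, 1)

# Module forms can be stored and reused like layers
sigmoid_module = nn.Sigmoid()
tanh_module = nn.Tanh()

print("Sigmoid output:\n", sigmoid_out)
print("Tanh output:\n", tanh_out)
print("Module forms agree:",
      torch.allclose(sigmoid_out, sigmoid_module(x)),
      torch.allclose(tanh_out, tanh_module(x)))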
The Softmax function is typically used in the output layer for multi-class classification problems. It converts a vector of raw scores (logits) into a probability distribution. For a vector z of raw scores, the Softmax of component z_i is:

softmax(z_i) = e^(z_i) / Σ_j e^(z_j)

Keras:
- activation='softmax'
- tf.keras.layers.Softmax() or tf.keras.activations.softmax

PyTorch:

- F.softmax(input_tensor, dim=...)
- nn.Softmax(dim=...)

Pay attention to the dim argument for softmax. For typical classification tasks where your input is (batch_size, num_classes), you'll use dim=1 to apply Softmax across the num_classes dimension.

# Example: Softmax in PyTorch
logits = torch.randn(2, 5) # Batch of 2, 5 classes
# Functional Softmax
probs_functional = F.softmax(logits, dim=1)
# Module Softmax
softmax_module = nn.Softmax(dim=1)
probs_module = softmax_module(logits)
print("Logits:\n", logits)
print("Probabilities (Functional):\n", probs_functional)
print("Probabilities (Module):\n", probs_module)
print("Sum of probabilities per sample (should be 1):\n", probs_module.sum(dim=1))
LeakyReLU is a variant of ReLU that allows a small, non-zero gradient when the unit is not active, helping to mitigate the "dying ReLU" problem. It's defined as:

LeakyReLU(x) = x if x > 0, otherwise negative_slope * x
The negative_slope is a small constant, typically 0.01 by default.
Keras:

- tf.keras.layers.LeakyReLU(alpha=0.01) (where alpha is the negative slope)

PyTorch:

- F.leaky_relu(input_tensor, negative_slope=0.01)
- nn.LeakyReLU(negative_slope=0.01)
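A minimal sketch, again reusing the earlier torch, nn, and F imports; both forms take the slope as an argument and produce the same result:

# Example: LeakyReLU in PyTorch
x = torch.randn(2, 3)

# Functional form with an explicit negative slope
leaky_functional = F.leaky_relu(x, negative_slope=0.01)

# Module form, convenient inside nn.Sequential
leaky_module = nn.LeakyReLU(negative_slope=0.01)
leaky_out = leaky_module(x)

print("Input:\n", x)
print("LeakyReLU output:\n", leaky_functional)
print("Forms agree:", torch.allclose(leaky_functional, leaky_out))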
The chart below visualizes some of these common activation functions: ReLU, Sigmoid, Tanh, and LeakyReLU (with a negative slope of 0.1 for better visibility).

Visualization of ReLU, Sigmoid, Tanh, and LeakyReLU activation functions.
Let's see how these are integrated when building a simple model, comparing Keras and PyTorch approaches.
Keras Sequential Model:
# TensorFlow/Keras
import tensorflow as tf
keras_model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
keras_model.summary()
PyTorch nn.Module:
When defining a custom nn.Module, you'll typically use the functional versions of activations within the forward method.
# PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F
class PyTorchCustomModel(nn.Module):
    def __init__(self, input_features, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_features, 128)
        self.dropout = nn.Dropout(0.2)
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.fc1(x)
        x = F.relu(x)  # Functional ReLU
        x = self.dropout(x)
        x = self.fc2(x)
        x = F.softmax(x, dim=1)  # Functional Softmax with dimension specified
        return x
# Instantiate the model
pytorch_model_custom = PyTorchCustomModel(input_features=784, num_classes=10)
print(pytorch_model_custom)
PyTorch nn.Sequential:
If you're using nn.Sequential, you'll use the module versions of activation functions.
# PyTorch using nn.Sequential
pytorch_model_sequential = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),          # Module ReLU
    nn.Dropout(0.2),
    nn.Linear(128, 10),
    nn.Softmax(dim=1)   # Module Softmax with dimension specified
)
print(pytorch_model_sequential)
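As a quick sanity check (with arbitrary dummy data matching the 784-feature, 10-class setup above), both PyTorch models produce a (batch_size, num_classes) tensor of probabilities, just like the Keras model:

# Hypothetical dummy batch: 4 flattened 28x28 images
dummy_input = torch.randn(4, 784)
print(pytorch_model_custom(dummy_input).shape)      # torch.Size([4, 10])
print(pytorch_model_sequential(dummy_input).shape)  # torch.Size([4, 10])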
The choice between torch.nn.functional and nn.Module for activations often comes down to coding style and specific needs:
torch.nn.functional (e.g., F.relu):
- Statelessness: suited to activations with no learnable parameters or internal state (an exception such as PReLU has learnable parameters).
- Conciseness: using the functional form can be slightly more concise inside the forward method of a custom nn.Module.
- Explicitness: the operation appears exactly where it is applied in the forward pass.

nn.Module (e.g., nn.ReLU()):
- Consistency: if you define all layers and activations as attributes in the __init__ method, this provides a uniform structure.
- nn.Sequential: required when building models with nn.Sequential, as it expects nn.Module instances.
- Learnable parameters: if an activation has learnable parameters (e.g., nn.PReLU), you must use the module version (see the sketch after this list).

Many developers prefer F.relu and similar functional calls for common, stateless activations within their forward methods, as it keeps the __init__ cleaner by not having to define an attribute for every activation. However, using nn.ReLU() is perfectly valid and sometimes preferred for clarity or when using nn.Sequential.
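The learnable-parameter case is worth seeing concretely. The sketch below uses a hypothetical PReLUBlock with arbitrary layer sizes; because nn.PReLU stores a trainable negative slope, registering it as a module attribute lets the optimizer discover that parameter:

# nn.PReLU has a learnable negative slope, so it must be a registered module
class PReLUBlock(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)
        self.act = nn.PReLU()  # a single learnable slope by default

    def forward(self, x):
        return self.act(self.fc(x))

block = PReLUBlock(16, 8)
# The PReLU slope appears alongside the linear layer's weight and bias
for name, param in block.named_parameters():
    print(name, tuple(param.shape))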
Some PyTorch activation function modules (like nn.ReLU) offer an inplace=True option:
relu_inplace = nn.ReLU(inplace=True)
# or, for functional (though less common for direct functional use)
# x = F.relu_(x) # Note the underscore for in-place functional versions
Using inplace=True modifies the input tensor directly, which can save memory by avoiding the allocation of a new tensor for the output. However, this should be used with caution: the original values are overwritten, and if autograd saved that tensor for the backward pass, PyTorch will raise a runtime error reporting that a variable needed for gradient computation has been modified by an inplace operation.
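A small illustration of the failure mode, assuming the torch and F imports from earlier: sigmoid saves its output for the backward pass, so modifying that output in place leaves autograd unable to compute the gradient.

# In-place modification of a tensor that autograd still needs
x = torch.randn(3, requires_grad=True)
y = torch.sigmoid(x)   # autograd saves y to compute sigmoid's gradient
F.relu_(y)             # in-place ReLU bumps y's version counter
try:
    y.sum().backward()
except RuntimeError as err:
    print("Backward failed:", err)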
Generally, unless memory is extremely constrained and you understand the implications, it's safer to use out-of-place operations (the default inplace=False).
Transitioning from Keras, you'll find that PyTorch offers all the activation functions you're used to. The primary adjustment is understanding where and how to apply them, whether through torch.nn.functional for direct application or as nn.Module instances within your network architecture, always remembering to specify dim for functions like softmax.