In the previous sections, you learned how to stack layers using Keras's Sequential and Functional APIs. However, simply stacking layers like Dense often isn't enough. If we only used layers that perform linear operations (like matrix multiplication followed by adding a bias), stacking them would still result in a linear function overall. A composition of linear functions is just another linear function. To model complex, real-world patterns, neural networks need to introduce non-linearity. This is where activation functions come into play.
An activation function is applied element-wise to the output of a layer (often referred to as the pre-activation or logits), transforming it before it's passed to the next layer. This non-linear transformation allows the network to learn much more complex mappings between inputs and outputs.
Consider a simple network with only linear layers. Each layer computes $\text{output} = W \cdot \text{input} + b$, where $W$ is the weight matrix and $b$ is the bias vector. If you stack two such layers, the output becomes:

$$\text{output}_2 = W_2 \cdot (W_1 \cdot \text{input} + b_1) + b_2 = (W_2 W_1) \cdot \text{input} + (W_2 b_1 + b_2)$$

This is still in the form $W' \cdot \text{input} + b'$, which is a linear transformation. No matter how many linear layers you stack, the network can only represent linear relationships. Activation functions break this linearity, enabling networks to approximate arbitrarily complex functions.
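To make this concrete, here is a minimal NumPy sketch (the matrix shapes and random values are arbitrary, chosen only for illustration) showing that two stacked linear layers compute exactly the same function as a single linear layer with combined weights:

import numpy as np

# Two "linear layers": y = W x + b (shapes and values are arbitrary)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)
two_layers = W2 @ (W1 @ x + b1) + b2          # output of layer 2 applied to layer 1's output

# The equivalent single linear layer: W' = W2 W1, b' = W2 b1 + b2
W_prime = W2 @ W1
b_prime = W2 @ b1 + b2
one_layer = W_prime @ x + b_prime

print(np.allclose(two_layers, one_layer))     # True: stacking added no expressive power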
Keras provides several built-in activation functions. Let's look at some of the most frequently used ones.
The Rectified Linear Unit, or ReLU, is one of the most popular activation functions in deep learning, especially for hidden layers. It's computationally efficient and generally performs well.
Its definition is simple: it returns the input directly if the input is positive, and returns zero otherwise.
$$f(x) = \max(0, x)$$

ReLU is often the default choice for hidden layers in feedforward and convolutional neural networks.
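As a quick check (the input values below are arbitrary), you can apply ReLU directly to a tensor and see negative entries clipped to zero:

import tensorflow as tf

x = tf.constant([-3.0, -1.0, 0.0, 2.0, 5.0])
print(tf.nn.relu(x).numpy())   # [0. 0. 0. 2. 5.] -- negatives become zero, positives pass through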
The sigmoid function squashes its input into the range (0, 1).
$$f(x) = \sigma(x) = \frac{1}{1 + e^{-x}}$$

Sigmoid is primarily used in the output layer of a binary classification model, where the output needs to be interpreted as a probability. It's less common in hidden layers nowadays due to the prevalence of ReLU and its variants.
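For example (using arbitrary logit values), sigmoid maps any real number into (0, 1), which is why it suits producing a single probability:

import tensorflow as tf

logits = tf.constant([-4.0, 0.0, 4.0])
print(tf.nn.sigmoid(logits).numpy())   # approx [0.018, 0.5, 0.982] -- all values lie in (0, 1)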
The softmax function is a generalization of the sigmoid function used for multi-class classification problems. It takes a vector of arbitrary real-valued scores (logits) as input and transforms them into a vector of values between 0 and 1 that sum to 1. These outputs can be interpreted as probabilities for each class.
For an input vector $x = [x_1, x_2, \dots, x_N]$, the softmax output for the $i$-th element is:

$$f(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}}$$

Softmax is the standard activation function for the final layer in a multi-class classification network.
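A small sketch (with made-up logits) shows the two defining properties: each output lies between 0 and 1, and the outputs sum to 1:

import tensorflow as tf

logits = tf.constant([2.0, 1.0, 0.1])
probs = tf.nn.softmax(logits)
print(probs.numpy())                    # approx [0.659, 0.242, 0.099]
print(tf.reduce_sum(probs).numpy())     # ~1.0 -- a valid probability distribution over classes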
The hyperbolic tangent, or tanh, function squashes its input into the range (-1, 1).
$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

Tanh was previously popular for hidden layers but has largely been replaced by ReLU and its variants. It is still sometimes used, particularly in recurrent neural networks (RNNs), though modern RNN architectures often use other gating mechanisms.
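As with the others, you can evaluate tanh directly (the inputs here are illustrative); note the zero-centered output bounded in (-1, 1):

import tensorflow as tf

x = tf.constant([-3.0, 0.0, 3.0])
print(tf.nn.tanh(x).numpy())   # approx [-0.995, 0.0, 0.995] -- zero-centered and bounded in (-1, 1)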
Visualization of ReLU, Sigmoid, and Tanh activation functions. Note their different output ranges and shapes, especially around x=0.
The choice of activation function depends on the layer's position and the specific task: ReLU is the usual default for hidden layers, sigmoid suits the output layer of a binary classifier, softmax suits the output layer of a multi-class classifier, and a linear (identity) activation is typical for regression outputs.
You can specify the activation function for most Keras layers using the activation argument:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# Using the activation argument within a Dense layer
model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(784,)),  # ReLU for hidden layer
    layers.Dense(10, activation='softmax')                    # Softmax for output layer (multi-class)
])

# Equivalent using an Activation layer explicitly
model_explicit = keras.Sequential([
    layers.Dense(64, input_shape=(784,)),
    layers.Activation('relu'),      # Apply ReLU separately
    layers.Dense(10),
    layers.Activation('softmax')    # Apply Softmax separately
])

# You can also pass the function object directly
model_object = keras.Sequential([
    layers.Dense(64, activation=tf.nn.relu, input_shape=(784,)),
    layers.Dense(10, activation=tf.nn.softmax)
])

model.summary()
Keras recognizes activation functions by their string names (e.g., 'relu', 'sigmoid', 'softmax', 'tanh', 'linear'). Using the activation argument is the most common and concise way. The separate layers.Activation layer provides flexibility, especially when using the Functional API or custom architectures where you might want to apply an activation independently.
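As a sketch of that flexibility (the layer sizes mirror the Sequential example above and are otherwise arbitrary), here is how an activation can be applied as its own step in the Functional API:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,))
x = layers.Dense(64)(inputs)               # linear pre-activation (logits) from the hidden layer
x = layers.Activation('relu')(x)           # non-linearity applied as a separate layer
outputs = layers.Dense(10, activation='softmax')(x)

functional_model = keras.Model(inputs=inputs, outputs=outputs)
functional_model.summary()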
Understanding activation functions is fundamental to building effective neural networks. They are the key components that introduce the necessary non-linearity, allowing models to learn complex patterns beyond simple linear relationships. As you build more sophisticated models using the Functional API or custom layers, you'll see how strategically placing these non-linear transformations enables powerful computations.