Having discussed the theoretical aspects of activation functions such as Sigmoid, Tanh, ReLU, and the ReLU variants, let's solidify our understanding by implementing them in Python and visualizing their behavior. Plotting these functions helps us grasp properties such as output ranges, saturation points, and non-linearity, which are significant considerations when designing neural network layers.
We'll use NumPy for the mathematical computations and Plotly for creating an interactive visualization.
First, let's import the necessary library:
import numpy as np
Now, we define the activation functions based on their mathematical formulas:
# Sigmoid function
def sigmoid(x):
    """Computes the Sigmoid activation."""
    return 1 / (1 + np.exp(-x))

# Hyperbolic Tangent (Tanh) function
def tanh(x):
    """Computes the Tanh activation."""
    return np.tanh(x)

# Rectified Linear Unit (ReLU) function
def relu(x):
    """Computes the ReLU activation."""
    return np.maximum(0, x)

# Leaky Rectified Linear Unit (Leaky ReLU) function
def leaky_relu(x, alpha=0.01):
    """Computes the Leaky ReLU activation."""
    return np.maximum(alpha * x, x)
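To verify the definitions, we can evaluate each function at a few sample inputs; the values noted in the comments are approximate and serve only as a quick sanity check:

# Quick sanity check of the four functions at a few sample inputs
sample = np.array([-2.0, 0.0, 2.0])
print(sigmoid(sample))      # approx [0.119, 0.5, 0.881]
print(tanh(sample))         # approx [-0.964, 0.0, 0.964]
print(relu(sample))         # [0.0, 0.0, 2.0]
print(leaky_relu(sample))   # [-0.02, 0.0, 2.0]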
Next, we'll generate a range of input values, typically centered around zero, to see how the functions behave across different inputs.
# Generate input values from -5 to 5
x = np.linspace(-5, 5, 100)
# Calculate the output of each activation function
y_sigmoid = sigmoid(x)
y_tanh = tanh(x)
y_relu = relu(x)
y_leaky_relu = leaky_relu(x)
With the input values and corresponding outputs for each function calculated, we can now plot them. This visualization allows for a direct comparison of their shapes and characteristics.
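The figure below can be produced with Plotly's graph_objects API. The following is a minimal sketch, with the trace names and layout titles chosen for illustration:

import plotly.graph_objects as go

# Create a figure and add one line trace per activation function
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y_sigmoid, mode='lines', name='Sigmoid'))
fig.add_trace(go.Scatter(x=x, y=y_tanh, mode='lines', name='Tanh'))
fig.add_trace(go.Scatter(x=x, y=y_relu, mode='lines', name='ReLU'))
fig.add_trace(go.Scatter(x=x, y=y_leaky_relu, mode='lines', name='Leaky ReLU'))

# Label the axes and add a title
fig.update_layout(
    title='Comparison of Activation Functions',
    xaxis_title='Input (x)',
    yaxis_title='Activation output'
)
fig.show()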
Comparison of Sigmoid, Tanh, ReLU, and Leaky ReLU activation functions. Note the different output ranges: Sigmoid (0 to 1), Tanh (-1 to 1), ReLU (0 to infinity), and Leaky ReLU (-infinity to infinity, with a small slope for negative inputs).
Observing the plot, we can clearly see the characteristics discussed earlier: Sigmoid squashes inputs into the (0, 1) range and saturates for large positive or negative values; Tanh is zero-centered and saturates at -1 and 1; ReLU outputs zero for negative inputs and grows linearly for positive ones; and Leaky ReLU behaves like ReLU but retains a small, non-zero slope for negative inputs.
This practical implementation helps connect the mathematical definitions of activation functions to their actual behavior. Understanding these differences is fundamental when deciding which activation function to use in the hidden layers and output layer of your neural network, depending on the task (e.g., binary classification often uses Sigmoid in the output, multi-class uses Softmax, regression uses linear, and hidden layers frequently use ReLU or its variants).
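Since Softmax is mentioned for multi-class outputs but was not implemented above, here is a minimal NumPy sketch for comparison; subtracting the maximum score is a standard numerical-stability trick assumed in this sketch rather than something covered earlier:

def softmax(z):
    """Computes Softmax over a vector of scores (illustrative sketch)."""
    shifted = z - np.max(z)   # subtract the max for numerical stability
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

scores = np.array([1.0, 2.0, 3.0])
print(softmax(scores))        # approx [0.09, 0.245, 0.665]; outputs sum to 1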