In the previous chapter, we established the concept of Multi-Layer Perceptrons (MLPs) as a way to overcome the limitations of single-layer models. This chapter focuses on the internal mechanisms and structural design choices that define how these networks operate and learn.
You will learn about activation functions, the non-linear components within neurons that enable MLPs to model complex relationships. We will cover standard functions such as Sigmoid, Tanh, and ReLU (f(x)=max(0,x)), examining their mathematical properties, benefits, and drawbacks. Understanding these functions is key to controlling the flow of information and gradients within the network.
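As a short preview of the hands-on material in Section 2.9, the sketch below shows one possible NumPy implementation of the three standard activations named above. The function names and the NumPy dependency are assumptions made for illustration here, not the chapter's reference code.

```python
import numpy as np

def sigmoid(x):
    # Squashes inputs into (0, 1); saturates for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centered output in (-1, 1); also saturates for large |x|.
    return np.tanh(x)

def relu(x):
    # f(x) = max(0, x): passes positive inputs unchanged, zeros out negatives.
    return np.maximum(0.0, x)

if __name__ == "__main__":
    z = np.linspace(-3.0, 3.0, 7)
    for name, fn in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu)]:
        print(name, np.round(fn(z), 3))
```

Evaluating each function over the same range of inputs, as above, is a simple way to compare how they behave for large negative and positive values.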
Additionally, we will examine the structure of feedforward networks, clarifying the roles of input, hidden, and output layers. We will discuss considerations for designing network architectures, such as selecting the number of layers and units per layer, providing a foundation for building effective models. Practical examples will illustrate how to implement and compare different activation functions.
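To make the layer terminology concrete before the detailed sections, the rough sketch below runs a single forward pass through a small feedforward network with one ReLU hidden layer. The layer sizes, random initialization, and NumPy usage are assumptions chosen for this preview, not the chapter's reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# 3 input units -> 4 hidden units (ReLU) -> 2 output units.
layer_sizes = [3, 4, 2]
weights = [rng.standard_normal((m, n)) * 0.1
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    # Hidden layer applies ReLU; the output layer is left linear here.
    h = np.maximum(0.0, x @ weights[0] + biases[0])
    return h @ weights[1] + biases[1]

x = rng.standard_normal(3)   # one example with 3 input features
print(forward(x))            # 2 output values
```

Changing the entries of `layer_sizes` is all it takes to experiment with deeper or wider architectures, which is the kind of design decision examined in Section 2.8.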
2.1 The Role of Activation Functions
2.2 Sigmoid Activation
2.3 Hyperbolic Tangent (Tanh) Activation
2.4 Rectified Linear Unit (ReLU)
2.5 Variants of ReLU (Leaky ReLU, PReLU, ELU)
2.6 Choosing the Right Activation Function
2.7 Understanding Network Layers: Input, Hidden, Output
2.8 Designing Feedforward Network Architectures
2.9 Hands-on Practical: Implementing Different Activations