Multi-Layer Networks

[!NOTE] A single neuron is limited to linear decision boundaries. But when we connect neurons together, we get a Multi-Layer Perceptron (MLP) capable of approximating a vast class of functions. This is the birth of Deep Learning.

1. From Perceptron to MLP

To solve the XOR problem, we need to combine multiple decision boundaries. We can do this by adding a hidden layer between the input and output.

  • Input Layer: Receives raw data.
  • Hidden Layer(s): Transforms the input into a new representation.
  • Output Layer: Makes the final prediction.
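To make "combining decision boundaries" concrete, here is a minimal sketch with hand-picked weights (chosen for illustration, not learned). Two step-activated hidden neurons implement OR and AND, and the output fires only when OR is true and AND is false, which is exactly XOR:

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden layer: neuron 1 fires for OR(x, y), neuron 2 fires for AND(x, y)
W1 = np.array([[1, 1], [1, 1]])   # both hidden neurons sum the inputs
b1 = np.array([-0.5, -1.5])       # thresholds: OR at 0.5, AND at 1.5
h = step(X @ W1 + b1)

# Output layer: fires for "OR but not AND" -> XOR
W2 = np.array([1, -2])            # reward OR, heavily punish AND
b2 = -0.5
out = step(h @ W2 + b2)
print(out)  # [0 1 1 0]
```

Each hidden neuron draws one straight line; the output neuron combines the two half-planes into a region no single line could produce.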

2. Universal Approximation Theorem

This theorem states that a feedforward network with a single hidden layer containing a finite number of neurons can approximate any continuous function on compact subsets of R<sup>n</sup> to arbitrary accuracy, under mild assumptions on the activation function (e.g., that it is non-polynomial).

In simple terms: Neural Networks are universal function approximators.
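As a tiny concrete instance: a single hidden layer of just two ReLU neurons can represent the non-linear function |x| exactly, because |x| = relu(x) + relu(-x). This sketch builds that network by hand:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Two hidden ReLU units: h1 = relu(x), h2 = relu(-x).
# Summing them at the output reproduces |x| exactly.
x = np.linspace(-3, 3, 7).reshape(-1, 1)
W1 = np.array([[1.0, -1.0]])    # 1 input -> 2 hidden
W2 = np.array([[1.0], [1.0]])   # 2 hidden -> 1 output
y = relu(x @ W1) @ W2
print(np.allclose(y, np.abs(x)))  # True
```

For more complicated functions the theorem guarantees only approximation, and the required number of hidden neurons can grow quickly.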

3. Interactive MLP Visualizer (XOR Solver)

This network tries to solve the XOR problem.

  • Input: x, y coordinates.
  • Hidden Layer: 2 Neurons (h1, h2) with Tanh activation.
  • Output Layer: 1 Neuron with Sigmoid activation.

Click “Randomize Weights” to see different decision boundaries. Can you find a configuration that separates the corners (XOR pattern)?


4. Forward Propagation

The process of calculating the network's output from an input is called Forward Propagation.

  1. Input Layer: x
  2. Hidden Layer Calculation:
    • z<sup>[1]</sup> = W<sup>[1]</sup>x + b<sup>[1]</sup>
    • a<sup>[1]</sup> = g<sup>[1]</sup>(z<sup>[1]</sup>) (where g<sup>[1]</sup> is a non-linear activation, e.g., tanh or ReLU)
  3. Output Layer Calculation:
    • z<sup>[2]</sup> = W<sup>[2]</sup>a<sup>[1]</sup> + b<sup>[2]</sup>
    • ŷ = a<sup>[2]</sup> = g<sup>[2]</sup>(z<sup>[2]</sup>) (e.g., Sigmoid)
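The steps above can be sketched as a quick shape check. This follows the column-vector convention of the formulas (x has shape (n_in, 1), and each layer computes Wx + b); tanh and sigmoid are assumed as activations to match the visualizer above:

```python
import numpy as np

n_in, n_hidden, n_out = 2, 2, 1

x = np.array([[1.0], [0.0]])             # input, shape (2, 1)
W1 = np.random.randn(n_hidden, n_in)     # W[1], shape (2, 2)
b1 = np.zeros((n_hidden, 1))             # b[1], shape (2, 1)
a1 = np.tanh(W1 @ x + b1)                # a[1] = g[1](z[1]), shape (2, 1)

W2 = np.random.randn(n_out, n_hidden)    # W[2], shape (1, 2)
b2 = np.zeros((n_out, 1))                # b[2], shape (1, 1)
z2 = W2 @ a1 + b2                        # z[2], shape (1, 1)
y_hat = 1 / (1 + np.exp(-z2))            # ŷ = sigmoid(z[2])
print(y_hat.shape)  # (1, 1)
```

Note that each weight matrix W<sup>[l]</sup> has shape (neurons in layer l, neurons in layer l−1), so the matrix products line up.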

5. Implementation in Python

Here is a 2-layer neural network in NumPy performing one forward pass. Note one difference from the formulas above: NumPy code conventionally stores each input example as a row of X, so the layers compute XW + b rather than Wx + b.

import numpy as np

class NeuralNetwork:
  def __init__(self):
    # Weights (random init)
    self.W1 = np.random.randn(2, 2) # 2 inputs -> 2 hidden
    self.b1 = np.zeros((1, 2))
    self.W2 = np.random.randn(2, 1) # 2 hidden -> 1 output
    self.b2 = np.zeros((1, 1))

  def sigmoid(self, x):
    return 1 / (1 + np.exp(-x))

  def forward(self, X):
    # Layer 1
    self.z1 = np.dot(X, self.W1) + self.b1
    self.a1 = self.sigmoid(self.z1)

    # Layer 2
    self.z2 = np.dot(self.a1, self.W2) + self.b2
    self.a2 = self.sigmoid(self.z2)
    return self.a2

# XOR Inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
nn = NeuralNetwork()
output = nn.forward(X)
print("Predictions:\n", output)
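With random weights, the predictions above are arbitrary. But a working configuration does exist for this exact sigmoid architecture. In the sketch below the weights are hand-chosen for illustration (not learned): large magnitudes saturate the sigmoids, so hidden unit 1 approximates OR, hidden unit 2 approximates NAND, and the output ANDs them, giving XOR:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hand-chosen weights (illustrative, not learned):
# hidden unit 1 ~ OR, hidden unit 2 ~ NAND; the output ANDs them.
W1 = np.array([[20.0, -20.0],
               [20.0, -20.0]])
b1 = np.array([[-10.0, 30.0]])
W2 = np.array([[20.0], [20.0]])
b2 = np.array([[-30.0]])

out = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print(np.round(out).ravel())  # [0. 1. 1. 0.]
```

Finding such weights automatically, rather than by hand, is exactly what training is for.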

6. What’s Next?

We know how to calculate predictions (Forward Prop). But how do we find the correct weights? That’s where Backpropagation comes in, which we will cover in the next module.