The Perceptron

[!NOTE] The Perceptron is the “Hello World” of Deep Learning—a single neuron that can learn to classify linearly separable data.

1. Introduction

The Perceptron was invented in 1957 by Frank Rosenblatt. It is a linear binary classifier, meaning it makes decisions by drawing a straight line (or hyperplane) to separate two classes of data.

Conceptually, it mimics a biological neuron:

  1. Dendrites receive input signals.
  2. Cell Body sums the inputs.
  3. Axon transmits the output signal if the sum exceeds a threshold.

2. Anatomy of a Perceptron

Mathematically, a Perceptron consists of:

  • Inputs (x): The features of the data (e.g., pixel intensity, house size).
  • Weights (w): The importance of each input.
  • Bias (b): An offset that shifts the decision boundary.
  • Weighted Sum (z): The linear combination of inputs and weights:
    z = w<sub>1</sub>x<sub>1</sub> + w<sub>2</sub>x<sub>2</sub> + ... + w<sub>n</sub>x<sub>n</sub> + b, or in vector notation, z = w · x + b.
  • Activation Function: A step function that determines the output: output = 1 if z > 0, and 0 otherwise.
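The pieces above can be wired together in a few lines of NumPy. The input values, weights, and bias below are made-up numbers chosen purely for illustration:

```python
import numpy as np

# Hypothetical inputs, weights, and bias (illustrative values only)
x = np.array([2.0, 3.0])    # inputs
w = np.array([0.4, -0.2])   # weights
b = 0.1                     # bias

z = np.dot(w, x) + b        # weighted sum: 0.8 - 0.6 + 0.1 = 0.3
output = 1 if z > 0 else 0  # step activation fires, so output = 1
```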

Interactive Perceptron Visualizer

Adjust the weights (w<sub>1</sub>, w<sub>2</sub>) and bias (b) to see how the decision boundary (the red line) changes. Try to separate the blue dots (Class 1) from the orange dots (Class 0).


3. The Perceptron Learning Algorithm

How does a Perceptron “learn”? It iteratively adjusts its weights to minimize classification errors.

The update rule is:

w ← w + α(y - ŷ)x

b ← b + α(y - ŷ)

Where:

  • α (alpha) is the learning rate (e.g., 0.01).
  • y is the true label (0 or 1).
  • ŷ (y-hat) is the predicted label (0 or 1).
  • (y - ŷ) is the error term.

If the prediction is correct (y = ŷ), the error is 0, and weights don’t change. If y=1 and ŷ=0, weights are increased to make z larger. If y=0 and ŷ=1, weights are decreased to make z smaller.
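To make the rule concrete, here is one hand-traced update step. The starting weights, input, and learning rate are hypothetical values, not taken from any particular dataset:

```python
import numpy as np

# Hypothetical state before the update
alpha = 0.1                # learning rate
w = np.array([0.5, -0.5])  # current weights
b = 0.0                    # current bias
x = np.array([1.0, 1.0])   # input sample
y = 1                      # true label

z = np.dot(w, x) + b       # 0.0, so the step function outputs 0
y_hat = 1 if z > 0 else 0  # prediction: 0 (a misclassification)
error = y - y_hat          # +1: weights must grow to make z larger

w = w + alpha * error * x  # [0.5, -0.5] -> [0.6, -0.4]
b = b + alpha * error      # 0.0 -> 0.1
```

Since the error is +1, every weight whose input was active moves up by α, nudging z toward the positive side for this sample.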

4. Implementation in Python

Here is a clean implementation of the algorithm above using NumPy.

```python
import numpy as np

class Perceptron:
    def __init__(self, learning_rate=0.01, n_iters=1000):
        self.lr = learning_rate
        self.n_iters = n_iters
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape

        # Initialize weights and bias
        self.weights = np.zeros(n_features)
        self.bias = 0.0

        for _ in range(self.n_iters):
            for idx, x_i in enumerate(X):
                # Linear combination
                linear_output = np.dot(x_i, self.weights) + self.bias

                # Step function activation
                y_predicted = 1 if linear_output > 0 else 0

                # Perceptron update rule
                update = self.lr * (y[idx] - y_predicted)
                self.weights += update * x_i
                self.bias += update

    def predict(self, X):
        linear_output = np.dot(X, self.weights) + self.bias
        return np.where(linear_output > 0, 1, 0)

# Usage
if __name__ == "__main__":
    X = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
    y = np.array([1, 1, 1, 0])  # OR gate logic

    p = Perceptron()
    p.fit(X, y)
    print(p.predict(X))  # Output: [1 1 1 0]
```

5. Limitations: The XOR Problem

In 1969, Marvin Minsky and Seymour Papert published the book Perceptrons, where they proved a devastating limitation: A single Perceptron can only solve linearly separable problems.

It cannot solve the XOR (Exclusive OR) problem because there is no single straight line that can separate the classes (0,0) → 0 and (1,1) → 0 from (0,1) → 1 and (1,0) → 1.
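This is easy to verify empirically. The sketch below condenses the same training loop from Section 4 into a function and shows that no matter how long it trains, a single perceptron never classifies all four XOR points correctly:

```python
import numpy as np

def train_perceptron(X, y, lr=0.01, n_iters=1000):
    """Plain perceptron training loop (same update rule as above)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iters):
        for x_i, y_i in zip(X, y):
            pred = 1 if np.dot(x_i, w) + b > 0 else 0
            update = lr * (y_i - pred)
            w += update * x_i
            b += update
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR truth table

w, b = train_perceptron(X, y)
preds = (X @ w + b > 0).astype(int)
acc = (preds == y).mean()
print(acc)  # at most 0.75: one of the four points is always misclassified
```

Because no line separates the two classes, at least one of the four points is on the wrong side of any decision boundary the perceptron can draw, capping accuracy at 75%.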

[!IMPORTANT] This limitation led to the first “AI Winter,” where funding for neural network research dried up for years. The solution was to stack multiple perceptrons together, creating Multi-Layer Perceptrons (MLPs), and adding non-linear activation functions.

6. Summary

  • Perceptron: A single-layer binary linear classifier.
  • Learning: Adjusts weights based on error direction.
  • Limitation: Cannot solve non-linear problems like XOR.
  • Solution: Deep Learning (Multi-Layer Networks).