Matrix Multiplication: The Engine of Neural Nets
[!NOTE] This module covers matrix multiplication, the core operation behind neural network layers, building up from the dot product to full matrix-matrix products.
1. Introduction
If you profile any deep learning library (PyTorch, TensorFlow), you will find that the vast majority of compute time is spent on one operation: Matrix Multiplication (GEMM, General Matrix Multiply).
Why? Because a neural network layer is just a matrix multiplication followed by an activation function: y = σ(Wx + b)
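As a minimal sketch, a single layer's forward pass looks like this in NumPy (the weight values, bias, and the choice of ReLU as activation are illustrative, not a specific library's API):

```python
import numpy as np

def layer_forward(x, W, b):
    # One layer: matrix multiply, add bias, then apply the activation (ReLU)
    return np.maximum(0, W @ x + b)

W = np.array([[1.0, -2.0], [0.5, 1.0]])  # weight matrix (2x2)
b = np.array([0.1, -0.2])                # bias vector
x = np.array([1.0, 2.0])                 # input vector
print(layer_forward(x, W, b))
```

The matrix multiply `W @ x` does the heavy lifting; the activation is a cheap element-wise operation by comparison.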
2. The Dot Product
The fundamental building block is the Dot Product (or Scalar Product) of two vectors. It returns a single number.
a · b = ∑ᵢ aᵢbᵢ = a₁b₁ + a₂b₂ + … + aₙbₙ
Geometric Interpretation
a · b = ||a|| ||b|| cos(θ)
- If vectors point in the same direction, dot product is Positive (High Similarity).
- If vectors are perpendicular (90°), dot product is Zero (Orthogonal/Unrelated).
- If vectors point in opposite directions, dot product is Negative.
[!TIP] ML Application: In Recommendation Systems, if User Vector u and Movie Vector m have a high dot product, the user will likely enjoy the movie. This is closely related to Cosine Similarity, which is the dot product of unit-normalized vectors.
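A quick sketch of that idea: cosine similarity is just the dot product divided by the vector norms. The user and movie vectors below are made-up illustrative values, not real embeddings:

```python
import numpy as np

def cosine_similarity(u, v):
    # Dot product of unit-normalized vectors: ranges from -1 to 1
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

user = np.array([0.9, 0.1, 0.8])   # hypothetical user taste vector
movie = np.array([0.8, 0.2, 0.9])  # hypothetical movie feature vector
print(cosine_similarity(user, movie))  # close to 1.0: likely a good match
```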
Python Implementation
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Dot Product
dot_prod = np.dot(a, b) # 1*4 + 2*5 + 3*6 = 32
# or
dot_prod = a @ b
print(dot_prod)
3. Matrix-Vector Multiplication (Ax)
When we multiply a matrix A by a vector x, we are transforming the vector x. Ax = b
The matrix A acts as a function f(x). It can rotate, scale, or skew the vector space.
Example:

| 1  0 |   | 1 |   | 1 |
| 0  2 | · | 1 | = | 2 |

(This matrix stretched the y-axis by 2.)
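The same stretch can be checked directly in NumPy:

```python
import numpy as np

A = np.array([[1, 0], [0, 2]])  # scales the y-axis by 2
x = np.array([1, 1])
print(A @ x)  # [1 2]
```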
4. Matrix-Matrix Multiplication (AB)
Multiplying two matrices is just applying two transformations in sequence (Composition). C = AB
Calculating Cij involves taking the dot product of Row i of A and Column j of B.
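For instance, a single entry of C can be computed by hand as a dot product (using small example matrices):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# C[0, 0] is the dot product of row 0 of A with column 0 of B
c00 = np.dot(A[0, :], B[:, 0])  # 1*5 + 2*7 = 19
print(c00)
```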
Rule: Inner dimensions must match! (m × n) · (n × p) → (m × p)
[!WARNING] Order Matters! Unlike scalar multiplication (2 × 3 = 3 × 2), Matrix Multiplication is not commutative. AB ≠ BA. Applying a Rotation then a Shear is different from a Shear then a Rotation.
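A quick sketch of the warning above, using a rotation and a shear (the specific matrices are illustrative choices):

```python
import numpy as np

R = np.array([[0, -1], [1, 0]])  # 90-degree counter-clockwise rotation
S = np.array([[1, 1], [0, 1]])   # horizontal shear

# Rotation-then-shear differs from shear-then-rotation
print(np.allclose(R @ S, S @ R))  # False: order matters
```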
Python Implementation
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Matrix Multiplication
C = np.matmul(A, B)
# or
C = A @ B
print(C)
# [[19 22]
# [43 50]]
5. Interactive Visualizer: The Linear Transformer
Modify the 2×2 Matrix M to see how it transforms the grid space. The basis vectors i (Red) and j (Green) show where the x and y axes land.
6. Summary
- Dot Product: Measures similarity between vectors.
- Matrix-Vector: Transforms a vector (Scale, Rotate, Skew).
- Matrix-Matrix: Combines multiple transformations.