Matrix Multiplication: The Engine of Neural Nets

1. Introduction

If you profile a typical deep learning workload in a library such as PyTorch or TensorFlow, roughly 90% of the compute time is spent on one operation: matrix multiplication, also known as GEMM (General Matrix Multiply).

Why? Because a fully connected neural network layer is just a matrix multiplication plus a bias, followed by an activation function:

output = σ(W · x + b)
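To make the formula concrete, here is a minimal sketch of one layer in plain Python. The function names `sigmoid` and `dense_layer`, the choice of the logistic function for σ, and all numeric values are assumptions made for illustration; real libraries use optimized GEMM kernels rather than Python loops.

```python
import math

def sigmoid(z):
    """Logistic activation, used here as one possible choice of σ."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(W, x, b):
    """Compute σ(W · x + b) for a weight matrix W given as a list of rows."""
    output = []
    for row, bias in zip(W, b):
        # Each output unit is the dot product of one row of W with x, plus a bias.
        z = sum(w * x_i for w, x_i in zip(row, x)) + bias
        output.append(sigmoid(z))
    return output

# Hypothetical example values: 2 output units, 3 inputs.
W = [[0.5, -1.0, 0.0],
     [1.0,  1.0, 1.0]]
x = [2.0, 1.0, -3.0]
b = [0.0, 0.0]

layer_output = dense_layer(W, x, b)
print(layer_output)  # both pre-activations are 0, so both outputs are 0.5
```

Each row of W produces one output value, so a layer with many units is many dot products computed against the same input.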

2. The Dot Product

The fundamental building block is the Dot Product (or Scalar Product) of two vectors. It returns a single number.

a · b = ∑ᵢ aᵢbᵢ = a₁b₁ + a₂b₂ + … + aₙbₙ
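The sum above translates almost directly into code. This is a small sketch in plain Python; the helper name `dot` and the example vectors are illustrative choices.

```python
def dot(a, b):
    """Return the dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Multiply matching components, then add the products.
    return sum(a_i * b_i for a_i, b_i in zip(a, b))

a = [1, 2, 3]
b = [4, 5, 6]
print(dot(a, b))  # 1*4 + 2*5 + 3*6 = 32
```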

Geometric Interpretation

a · b = ||a|| ||b|| cos(θ)

If the vectors point in roughly the same direction (θ < 90°), the dot product is positive (high similarity).
If the vectors are perpendicular (θ = 90°), the dot product is zero (orthogonal, unrelated).
If the vectors point in roughly opposite directions (θ > 90°), the dot product is negative.
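The three cases can be checked numerically by solving the geometric formula for θ. The sketch below uses hand-picked 2D vectors; the helper names `norm` and `angle_degrees` are illustrative.

```python
import math

def dot(a, b):
    return sum(a_i * b_i for a_i, b_i in zip(a, b))

def norm(a):
    """Length ||a|| of a vector."""
    return math.sqrt(dot(a, a))

def angle_degrees(a, b):
    """Recover θ from a · b = ||a|| ||b|| cos(θ)."""
    cos_theta = dot(a, b) / (norm(a) * norm(b))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

a = [1.0, 0.0]
same = [2.0, 0.0]            # same direction as a
perpendicular = [0.0, 3.0]   # at a right angle to a
opposite = [-1.0, 0.0]       # opposite direction to a

for name, v in [("same", same), ("perpendicular", perpendicular), ("opposite", opposite)]:
    print(name, dot(a, v), angle_degrees(a, v))
```

The sign of the dot product depends only on the angle, while its size also scales with the lengths of both vectors.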

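[!TIP] ML Application: In recommendation systems, if a user vector u and a movie vector m have a high dot product, the user will likely enjoy the movie. Cosine similarity builds on the same idea: it is the dot product divided by the product of the two vectors' lengths, so only direction matters.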
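As a toy illustration of the recommendation idea, the sketch below ranks a few movies for one user by dot product. The three-dimensional "taste" vectors, the movie titles, and the helper names are all invented for this example; real systems learn such vectors from data.

```python
import math

def dot(a, b):
    return sum(a_i * b_i for a_i, b_i in zip(a, b))

def cosine_similarity(a, b):
    """Dot product normalized by both vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical taste dimensions: (action, comedy, romance).
user = [0.9, 0.1, 0.0]
movies = {
    "action_film":   [1.0, 0.0, 0.0],
    "romcom":        [0.0, 0.6, 0.8],
    "action_comedy": [0.7, 0.7, 0.0],
}

# Rank movies from highest to lowest dot product with the user vector.
ranked = sorted(movies, key=lambda title: dot(user, movies[title]), reverse=True)
print(ranked)

for title in ranked:
    print(title, dot(user, movies[title]), cosine_similarity(user, movies[title]))
```

The dot product rewards both alignment and magnitude, whereas cosine similarity ignores magnitude, so the two scores can rank items differently when vector lengths vary a lot.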

3. Matrix-Vector Multiplication (Ax)

When we multiply a matrix A by a vector x, we transform x into a new vector b: Ax = b

The matrix A acts as a function f(x). It can rotate, scale, or skew the vector space.

For example, a matrix such as A = [[1, 0], [0, 2]] (written row by row) leaves the x-axis unchanged and stretches the y-axis by 2.
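A y-axis stretch can be sketched in a few lines of plain Python. The diagonal matrix below is an assumed example of a matrix that stretches the y-axis by 2, and `matvec` is an illustrative helper name.

```python
def matvec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    # Each output component is the dot product of one row of A with x.
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

# Assumed example: keep x as it is, double y.
A = [[1, 0],
     [0, 2]]
x = [3, 4]

b = matvec(A, x)
print(b)  # [3, 8]: the x-component is unchanged, the y-component is doubled
```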

4. Matrix-Matrix Multiplication (AB)

Multiplying two matrices is just applying two transformations in sequence (composition): C = AB. Applied to a vector x, the product AB applies B first and then A, since (AB)x = A(Bx).

Calculating C_ij involves taking the dot product of Row i of A and Column j of B.

Rule: Inner dimensions must match! (m × n) · (n × p) → (m × p)
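The row-times-column recipe and the dimension rule can be sketched together as follows. The function name `matmul` and the example matrices are illustrative; this triple loop shows the definition only and is far slower than an optimized GEMM routine.

```python
def matmul(A, B):
    """Multiply an (m x n) matrix A by an (n x p) matrix B, both lists of rows."""
    m, n = len(A), len(A[0])
    n_b, p = len(B), len(B[0])
    if n != n_b:
        raise ValueError(f"inner dimensions differ: ({m} x {n}) times ({n_b} x {p})")
    # C[i][j] is the dot product of row i of A and column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2, 3],
     [4, 5, 6]]      # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]       # 3 x 2

C = matmul(A, B)     # 2 x 2
print(C)  # [[58, 64], [139, 154]]
```

Calling `matmul(A, A)` raises a `ValueError`, because a 2 × 3 matrix cannot be multiplied by another 2 × 3 matrix.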

[!WARNING] Order matters! Unlike scalar multiplication (2 × 3 = 3 × 2), matrix multiplication is not commutative: in general, AB ≠ BA. Applying a rotation then a shear is different from applying a shear then a rotation.
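The rotation and shear example can be verified directly. In this sketch, R is a 90° counterclockwise rotation and S is a horizontal shear; both matrices and the helper names are chosen for illustration.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    n, p = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(n)) for j in range(p)] for row in A]

def matvec(A, x):
    """Multiply matrix A by vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

R = [[0, -1],
     [1,  0]]   # rotate 90 degrees counterclockwise
S = [[1, 1],
     [0, 1]]    # shear along the x-axis

RS = matmul(R, S)   # acts on a vector as: shear first, then rotate
SR = matmul(S, R)   # acts on a vector as: rotate first, then shear

print(RS)  # [[0, -1], [1, 1]]
print(SR)  # [[1, -1], [1, 0]]

# Composition check: (RS)x gives the same result as R(Sx).
x = [1, 2]
print(matvec(RS, x), matvec(R, matvec(S, x)))  # [-2, 3] [-2, 3]
```

The two products differ, and the last line confirms that the product RS matches applying S and then R to the same vector.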

5. Interactive Visualizer: The Linear Transformer

Modify the 2x2 Matrix M to see how it transforms the grid space. The basis vectors i (Red) and j (Green) show where the x and y axes land.

The columns of M tell us where i and j land.
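Without the interactive view, the same fact can be checked in code: multiplying M by each basis vector picks out one column of M. The matrix values below are arbitrary, chosen only for illustration.

```python
def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

# Arbitrary example matrix.
M = [[2, 1],
     [0, 3]]

i_hat = [1, 0]   # basis vector along the x-axis
j_hat = [0, 1]   # basis vector along the y-axis

i_lands = matvec(M, i_hat)   # first column of M
j_lands = matvec(M, j_hat)   # second column of M
print(i_lands, j_lands)  # [2, 0] [1, 3]

# Any vector [x, y] lands at x * (first column) + y * (second column).
v = [4, 5]
combined = [v[0] * i_lands[k] + v[1] * j_lands[k] for k in range(2)]
print(matvec(M, v), combined)  # [13, 15] [13, 15]
```

Knowing where the two basis vectors land is therefore enough to know where every point of the grid lands.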

6. Summary

Dot Product: Measures similarity between vectors.
Matrix-Vector: Transforms a vector (Scale, Rotate, Skew).
Matrix-Matrix: Combines multiple transformations.
