Matrix Multiplication: The Engine of Neural Nets
1. Introduction
If you profile a model in any deep learning library (PyTorch, TensorFlow), most of the compute time, often quoted as around 90%, is spent on one operation: matrix multiplication (GEMM, General Matrix Multiply).
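Why? Because a fully connected neural network layer is just a matrix multiplication followed by an activation function:
$$y = \sigma(Wx)$$
Here W is the layer's weight matrix, x is the input vector, and σ is the activation function (the bias term is omitted for simplicity).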
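To make the layer idea concrete, here is a minimal sketch in plain Python. The names `dense_layer` and `sigmoid`, the choice of the logistic activation, and the weights are all assumptions made for illustration; real libraries run the same arithmetic through optimized GEMM kernels.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(W, x, activation=sigmoid):
    """Compute activation(W x) for a weight matrix W (list of rows) and vector x."""
    # Each output unit is the dot product of one row of W with the input x.
    pre_activations = [sum(w * x_j for w, x_j in zip(row, x)) for row in W]
    return [activation(z) for z in pre_activations]

# Hypothetical 2-unit layer acting on a 3-dimensional input.
W = [[0.5, -1.0, 0.0],
     [1.0,  1.0, 1.0]]
x = [2.0, 1.0, 1.0]

y = dense_layer(W, x)  # one activation per row of W
```

The first row gives a pre-activation of 0, so its output is exactly 0.5; the second row sums to 4, which the sigmoid pushes close to 1.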
2. The Dot Product
The fundamental building block is the Dot Product (or Scalar Product) of two vectors. It returns a single number.
$$a \cdot b = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \dots + a_n b_n$$
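The sum translates almost word for word into code. This is a small sketch; the function name `dot` and the length check are illustrative choices, not taken from any particular library.

```python
def dot(a, b):
    """Dot product of two equal-length vectors given as lists of numbers."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Multiply matching components, then add everything up.
    return sum(a_i * b_i for a_i, b_i in zip(a, b))

result = dot([1, 2, 3], [4, 5, 6])  # 1*4 + 2*5 + 3*6 = 32
```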
Geometric Interpretation
$$a \cdot b = \|a\| \, \|b\| \cos(\theta)$$
- If the vectors point in broadly the same direction (θ < 90°), the dot product is positive (high similarity).
- If the vectors are perpendicular (θ = 90°), the dot product is zero (orthogonal, unrelated).
- If the vectors point in broadly opposite directions (θ > 90°), the dot product is negative.
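The three cases can be checked numerically by solving the geometric formula for cos(θ). The helper names `norm` and `cos_angle` below are assumptions for this sketch.

```python
import math

def dot(a, b):
    return sum(a_i * b_i for a_i, b_i in zip(a, b))

def norm(a):
    """Euclidean length ||a||."""
    return math.sqrt(dot(a, a))

def cos_angle(a, b):
    """cos(theta) between two non-zero vectors, from a . b = ||a|| ||b|| cos(theta)."""
    return dot(a, b) / (norm(a) * norm(b))

same = cos_angle([1, 0], [2, 0])            # same direction
perpendicular = cos_angle([1, 0], [0, 3])   # 90 degrees apart
opposite = cos_angle([1, 0], [-4, 0])       # opposite directions
```

The values come out as 1, 0, and -1, whatever the lengths of the vectors, because dividing by the norms removes the effect of magnitude.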
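[!TIP] ML Application: In recommendation systems, a high dot product between a user vector u and a movie vector m suggests the user is likely to enjoy the movie. Cosine similarity builds on the same idea: it is the dot product divided by ||u|| ||m||, so it compares direction only and ignores vector length.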
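As a rough illustration of the recommendation idea, the sketch below ranks a few movies for one user by raw dot product and by cosine similarity. The two-dimensional vectors and the movie names are invented for this example and do not come from any real dataset.

```python
import math

def dot(a, b):
    return sum(a_i * b_i for a_i, b_i in zip(a, b))

def cosine_similarity(a, b):
    """Dot product normalized by both vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical embeddings: (taste for action, taste for romance).
user = [0.9, 0.1]
movies = {
    "action_film": [0.8, 0.2],
    "romance_film": [0.1, 0.9],
    "epic_film": [4.0, 4.0],  # large magnitude, mixed direction
}

ranked_by_dot = sorted(movies, key=lambda name: dot(user, movies[name]), reverse=True)
ranked_by_cosine = sorted(
    movies, key=lambda name: cosine_similarity(user, movies[name]), reverse=True
)
```

With these numbers the raw dot product puts `epic_film` first simply because its vector is long, while cosine similarity puts `action_film` first because its direction is closest to the user's. Which behavior is preferable depends on whether vector length carries meaning in the model.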
3. Matrix-Vector Multiplication (Ax)
When we multiply a matrix A by a vector x, we are transforming the vector x. Ax = b
The matrix A acts as a function f(x). It can rotate, scale, or skew the vector space.
$$\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$
(This matrix stretched the y-axis by 2).
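The same example can be reproduced in a few lines of plain Python. The function name `matvec` and the list-of-rows representation are choices made for this sketch.

```python
def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have as many entries as x")
    # Entry i of the result is the dot product of row i of A with x.
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

A = [[1, 0],
     [0, 2]]
x = [1, 1]

b = matvec(A, x)  # [1, 2]: x-component unchanged, y-component doubled
```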
4. Matrix-Matrix Multiplication (AB)
Multiplying two matrices composes two transformations into one: C = AB. Applying C to a vector x gives A(Bx), so B acts first and A second.
Calculating $C_{ij}$ involves taking the dot product of Row i of A and Column j of B.
Rule: Inner dimensions must match! $(m \times n) \cdot (n \times p) \to (m \times p)$
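The row-times-column rule and the dimension check can be sketched as a naive triple loop. The name `matmul` is an assumption for this example, and production code would call an optimized routine rather than loop in Python.

```python
def matmul(A, B):
    """Multiply an (m x n) matrix A by an (n x p) matrix B, both lists of rows."""
    m, n = len(A), len(A[0])
    if len(B) != n:
        raise ValueError(f"inner dimensions differ: {n} vs {len(B)}")
    p = len(B[0])
    # C[i][j] is the dot product of row i of A with column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2, 3],
     [4, 5, 6]]      # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]       # 3 x 2

C = matmul(A, B)     # 2 x 2 result: [[58, 64], [139, 154]]
```

Swapping the arguments here is also valid, since a (3 × 2) times a (2 × 3) product has matching inner dimensions, but it produces a 3 × 3 matrix instead.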
[!WARNING] Order matters! Unlike scalar multiplication (2 × 3 = 3 × 2), matrix multiplication is not commutative: in general, AB ≠ BA. Applying a rotation and then a shear is different from applying a shear and then a rotation.
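A quick numerical check makes the point. The sketch below uses a 90° counter-clockwise rotation R and a horizontal shear S; both matrices and the helper names are chosen for illustration.

```python
def matmul(A, B):
    """Multiply matrices given as lists of rows."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("inner dimensions must match")
    return [[sum(row[k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for row in A]

def matvec(A, x):
    return [sum(a * x_j for a, x_j in zip(row, x)) for row in A]

R = [[0, -1],
     [1,  0]]   # rotate 90 degrees counter-clockwise
S = [[1, 1],
     [0, 1]]    # shear along the x-axis

RS = matmul(R, S)   # shear first, then rotate
SR = matmul(S, R)   # rotate first, then shear

x = [1, 2]
sheared_then_rotated = matvec(R, matvec(S, x))  # same as matvec(RS, x)
rotated_then_sheared = matvec(S, matvec(R, x))  # same as matvec(SR, x)
```

Here RS is [[0, -1], [1, 1]] while SR is [[1, -1], [1, 0]], so the same vector lands at [-2, 3] under one ordering and at [-1, 1] under the other.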
5. Interactive Visualizer: The Linear Transformer
Modify the 2x2 Matrix M to see how it transforms the grid space. The basis vectors i (Red) and j (Green) show where the x and y axes land.
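Without the interactive grid, the same observation can be checked in code: the images of the basis vectors i and j are exactly the columns of M. The matrix values below are arbitrary and chosen only for illustration.

```python
def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * x_j for a, x_j in zip(row, x)) for row in A]

M = [[2, 1],
     [0, 1]]      # hypothetical transformation

i_hat = [1, 0]    # basis vector along the x-axis
j_hat = [0, 1]    # basis vector along the y-axis

i_lands = matvec(M, i_hat)   # [2, 0], the first column of M
j_lands = matvec(M, j_hat)   # [1, 1], the second column of M

columns = [list(col) for col in zip(*M)]  # columns of M as lists
```

Reading a matrix column by column is therefore a quick way to picture what it does to the grid.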
6. Summary
- Dot Product: Measures similarity between vectors.
- Matrix-Vector: Transforms a vector (Scale, Rotate, Skew).
- Matrix-Matrix: Combines multiple transformations.