Recommendation Engines
> [!NOTE]
> This module explores the core principles of Recommendation Engines, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
1. The Hook: The Discovery Problem
We have 10 million movies. You can only watch 5. How do we rank the 10 million for you? This is the Reference vs. Discovery problem. Search is finding what you look for. Recommendation is finding what you didn’t know you wanted.
2. Techniques
A. Content-Based Filtering
“You liked Terminator 1. Terminator 2 has the same tags (Action, Sci-Fi, Arnold). You will like it.”
- Pros: No need for other user data.
- Cons: Can’t recommend “novel” things across genres.
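Content-based matching can be sketched as set overlap on item tags. A minimal version using Jaccard similarity (intersection over union of the tag sets; the tags and the Jaccard choice are illustrative assumptions, not a prescribed metric):

```java
import java.util.HashSet;
import java.util.Set;

public class ContentBased {
    // Jaccard similarity: |intersection| / |union| of the two tag sets.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> terminator1 = Set.of("Action", "Sci-Fi", "Arnold");
        Set<String> terminator2 = Set.of("Action", "Sci-Fi", "Arnold");
        Set<String> romance = Set.of("Romance", "Drama");
        System.out.println(jaccard(terminator1, terminator2)); // 1.0 -> strong match
        System.out.println(jaccard(terminator1, romance));     // 0.0 -> never surfaced
    }
}
```

The second print illustrates the "Cons" above: with zero tag overlap, a cross-genre item can never be recommended, no matter how much the user might enjoy it.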
B. Collaborative Filtering (User-User)
“You are like Bob. Bob liked Titanic. You might like Titanic.”
- Core Algo: Find K-Nearest Neighbors to User A based on history vectors.
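A minimal sketch of the user-user approach, with K = 1 for brevity: compare rating vectors with cosine similarity, pick the most similar other user, and surface items they rated highly that the target user hasn't rated (the toy rating matrix is made up; 0 stands for "not rated"):

```java
public class UserUserCF {
    // Cosine similarity between two users' rating vectors (0 = unrated).
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Index of the single most similar other user (K-NN with K = 1).
    static int nearestNeighbor(double[][] ratings, int user) {
        int best = -1;
        double bestSim = -1;
        for (int u = 0; u < ratings.length; u++) {
            if (u == user) continue;
            double sim = cosine(ratings[user], ratings[u]);
            if (sim > bestSim) { bestSim = sim; best = u; }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] ratings = {
            {5, 4, 0, 0},  // you
            {5, 5, 4, 0},  // Bob: similar taste, also rated item 2
            {0, 1, 0, 5},  // Carol: very different taste
        };
        int neighbor = nearestNeighbor(ratings, 0);
        System.out.println("Nearest neighbor: user " + neighbor); // user 1 (Bob)
        // Recommend items the neighbor rated highly that "you" haven't rated.
        for (int i = 0; i < ratings[0].length; i++)
            if (ratings[0][i] == 0 && ratings[neighbor][i] >= 4)
                System.out.println("Recommend item " + i);       // item 2
    }
}
```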
C. Matrix Factorization (The Netflix Prize)
Represent Users and Movies as vectors in a “Latent Space” (e.g., 50 dimensions).
Rating(User, Movie) ≈ DotProduct(UserVector, MovieVector).
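The prediction rule above, plus one common way the latent vectors are learned (a plain stochastic-gradient-descent step on the squared error; the K = 2 toy vectors, learning rate, and target rating are all made-up values for illustration):

```java
public class MatrixFactorization {
    // Rating(User, Movie) ≈ DotProduct(UserVector, MovieVector).
    static double predict(double[] userVec, double[] movieVec) {
        double r = 0;
        for (int k = 0; k < userVec.length; k++) r += userVec[k] * movieVec[k];
        return r;
    }

    // One SGD step on one observed rating: nudge both vectors to shrink the error.
    static void sgdStep(double[] userVec, double[] movieVec, double rating, double lr) {
        double err = rating - predict(userVec, movieVec);
        for (int k = 0; k < userVec.length; k++) {
            double u = userVec[k];
            userVec[k] += lr * err * movieVec[k];
            movieVec[k] += lr * err * u;
        }
    }

    public static void main(String[] args) {
        // Toy latent space with K = 2 (say, "action-ness" and "romance-ness").
        double[] user = {0.9, 0.1};
        double[] movie = {1.0, 0.2};
        System.out.printf("before: %.3f%n", predict(user, movie));
        for (int i = 0; i < 100; i++) sgdStep(user, movie, 4.5, 0.05); // observed rating 4.5
        System.out.printf("after:  %.3f%n", predict(user, movie));     // converges toward 4.5
    }
}
```

In a real system this step runs over millions of observed (user, movie, rating) triples, and the learned vectors generalize to the unobserved cells of the matrix.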
3. Vector Search (Modern Era)
Represent everything (Users, Items, Queries) as Embeddings (High-dimensional vectors). Recommendation = Nearest Neighbor Search in vector space.
- Metric: Cosine Similarity (Angle between vectors). \cos(\theta) = \frac{A \cdot B}{||A|| ||B||}
4. Implementation: Cosine Similarity (Java)
```java
public double cosineSimilarity(double[] vectorA, double[] vectorB) {
    double dotProduct = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (int i = 0; i < vectorA.length; i++) {
        dotProduct += vectorA[i] * vectorB[i];
        normA += vectorA[i] * vectorA[i]; // accumulate squared magnitudes
        normB += vectorB[i] * vectorB[i];
    }
    // Guard against division by zero when either vector is all zeros.
    if (normA == 0 || normB == 0) return 0;
    return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}
```
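Putting Sections 3 and 4 together: recommendation as nearest-neighbor search is a brute-force scan that scores every item against the user vector and sorts best-first. The 3-d embeddings below are made up for illustration; real systems replace the O(N) scan with an approximate-nearest-neighbor index, but the ranking idea is the same:

```java
import java.util.Arrays;

public class VectorSearch {
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Score every item against the query vector; return item indices best-first.
    static Integer[] rank(double[] query, double[][] items) {
        Integer[] order = new Integer[items.length];
        for (int i = 0; i < items.length; i++) order[i] = i;
        Arrays.sort(order, (i, j) ->
            Double.compare(cosineSimilarity(query, items[j]), cosineSimilarity(query, items[i])));
        return order;
    }

    public static void main(String[] args) {
        double[] user = {1.0, 0.2, 0.0};
        double[][] movies = {
            {0.0, 0.1, 1.0}, // 0: far from the user's taste
            {0.9, 0.3, 0.1}, // 1: closest match
            {0.5, 0.5, 0.5}, // 2: middling
        };
        System.out.println(Arrays.toString(rank(user, movies))); // [1, 2, 0]
    }
}
```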
5. Deep Dive Strategy Lab: Recommendations
Intuition Through Analogy
The Record Store Clerk. The clerk has a mental map of music. “Oh, you like The Beatles? They are ‘over here’ on the map. Close to them are The Beach Boys.” Embeddings literally create this geometric map inside the computer’s memory.
Mathematical Anchor: Sparsity
The User-Item Matrix is 99.99% empty (sparse). Most users haven’t rated most items. Naive dense storage (N \times M) is infeasible at scale. Matrix Factorization compresses this into (N \times K) and (M \times K) where K \ll N, M. This is Dimensionality Reduction.
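The compression argument can be made concrete with back-of-the-envelope arithmetic (the N, M, K values below are assumed for illustration, not from any real dataset):

```java
public class SparsityMath {
    public static void main(String[] args) {
        long users = 1_000_000L; // N
        long items = 100_000L;   // M
        long k = 50;             // K latent dimensions

        long dense = users * items;          // full user-item matrix: N * M cells
        long factored = (users + items) * k; // two factor matrices: (N + M) * K cells

        System.out.println("Dense cells:    " + dense);    // 100000000000
        System.out.println("Factored cells: " + factored); // 55000000
        System.out.printf("Compression: %.0fx%n", (double) dense / factored);
    }
}
```

At these sizes the factored form is over three orders of magnitude smaller, which is what makes the latent-space representation fit in memory at all.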