Case Study: Data as Vectors (Embeddings)
1. Introduction: “Everything is a Vector”
The most powerful idea in Deep Learning is that any data—images, text, audio, user behavior—can be represented as a Vector of numbers.
Once data is a vector, we can use Linear Algebra to solve problems:
- Is this email spam? → Angle between the Email Vector and the Spam Vector.
- What movie should I watch? → Dot product of the User Vector and the Movie Vector (see the sketch below).
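As a toy illustration of that dot-product scoring, the sketch below ranks a few movies for one user. The titles and vector values are made-up placeholders, not output from any real recommender.

```python
import numpy as np

# Hypothetical 4-dimensional embeddings (illustrative values only).
user = np.array([0.8, 0.1, -0.3, 0.5])            # learned user preference vector
movies = {
    "Space Opera":    np.array([0.9, 0.0, -0.2, 0.4]),
    "Romantic Drama": np.array([-0.5, 0.7, 0.3, 0.1]),
    "Documentary":    np.array([0.1, 0.2, 0.9, -0.4]),
}

# Rank movies by dot product with the user vector: higher score = better match.
scores = {title: float(user @ vec) for title, vec in movies.items()}
for title, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{title}: {score:.2f}")
```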
2. Words as Vectors (Word Embeddings)
How do you represent the word “King” to a computer?
- Old Way (One-Hot): [0, 0, ..., 1, ..., 0] (a huge vector with a single 1). No semantic meaning.
- New Way (Embedding): A dense vector like [0.9, -1.2, 0.4, ...].
These vectors are learned such that similar words appear close together in space.
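A minimal sketch of the contrast, assuming a tiny made-up vocabulary and hand-picked embedding values (a real embedding table would be learned during training):

```python
import numpy as np

vocab = ["queen", "king", "apple", "orange", "car"]   # tiny made-up vocabulary

# One-hot: a vector as long as the vocabulary with a single 1.
# "king" is at index 1, so its one-hot vector is [0, 1, 0, 0, 0].
one_hot_king = np.zeros(len(vocab))
one_hot_king[vocab.index("king")] = 1.0

# Embedding: each word maps to a short dense vector (values are illustrative,
# not from a trained model). Geometry, not position, carries the meaning.
embedding_table = {
    "king":  np.array([0.9, -1.2, 0.4]),
    "queen": np.array([0.8, -1.1, 0.5]),
    "apple": np.array([-0.3, 0.7, 1.1]),
}

print(one_hot_king)               # sparse, no notion of similarity
print(embedding_table["king"])    # dense, nearby vectors mean related words
```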
The Magic Arithmetic
In a well-trained embedding space (like Word2Vec), algebra works on concepts:
King - Man + Woman ≈ Queen
This works because the vector King - Man represents the concept of “Royalty” or “Power” stripped of gender, and adding Woman adds the “Female” direction.
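The toy sketch below checks this analogy with hand-picked 3-dimensional vectors. Real Word2Vec vectors are typically 100 to 300 dimensions; the numbers here are chosen only so the arithmetic lands near "queen".

```python
import numpy as np

# Tiny made-up embedding table (illustrative values only).
emb = {
    "king":  np.array([0.90, 0.75, 0.10]),
    "queen": np.array([0.88, 0.20, 0.12]),
    "man":   np.array([0.15, 0.80, 0.05]),
    "woman": np.array([0.12, 0.25, 0.07]),
    "apple": np.array([-0.50, 0.10, 0.90]),
}

# king - man + woman ...
target = emb["king"] - emb["man"] + emb["woman"]

# ... should land closest to "queen" among the remaining words.
candidates = {w: v for w, v in emb.items() if w not in ("king", "man", "woman")}
best = min(candidates, key=lambda w: np.linalg.norm(target - candidates[w]))
print(best)   # expected: "queen" with these toy numbers
```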
3. Measuring Similarity
How do we know if two vectors are similar? We use Cosine Similarity. It measures the cosine of the angle θ between them.
Similarity = cos(θ) = (A · B) / (||A|| ||B||)
- 1.0: Identical direction (Synonyms).
- 0.0: Orthogonal (Unrelated).
- -1.0: Opposite direction (rare in practice; antonyms often appear in similar contexts and can still score high).
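In code, the formula is a one-liner with NumPy. The vectors below are arbitrary examples chosen only to hit the three landmark values:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """cos(θ) = (A · B) / (||A|| ||B||)"""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(a, a))                           # 1.0  (identical direction)
print(cosine_similarity(a, np.array([-2.0, 1.0, 0.0])))  # 0.0  (orthogonal)
print(cosine_similarity(a, -a))                          # -1.0 (opposite direction)
```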
4. Interactive Visualizer: The Embedding Space
Explore a 2D projection of a word embedding space.
- Click two points to measure their similarity.
- Observe how related words (King/Queen, Apple/Orange) cluster together.
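The interactive visualizer cannot be embedded here, but a static stand-in is easy to sketch: project high-dimensional vectors down to 2D (PCA via scikit-learn is one common choice) and scatter-plot them. The vectors below are random placeholders; with trained embeddings, related words would cluster as described above.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Placeholder 50-dimensional word vectors (random, for layout only);
# a real demo would load trained embeddings instead.
rng = np.random.default_rng(0)
words = ["king", "queen", "man", "woman", "apple", "orange"]
vectors = rng.normal(size=(len(words), 50))

# Project 50 dimensions down to 2 for plotting.
coords = PCA(n_components=2).fit_transform(vectors)

plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), word in zip(coords, words):
    plt.annotate(word, (x, y))
plt.title("2D projection of a word embedding space")
plt.show()
```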
5. Summary
- Embeddings turn real-world objects into dense vectors.
- Vector Algebra captures semantic relationships (King - Man + Woman).
- Cosine Similarity is the standard ruler for semantic distance.