Module Review: RAG
Key Takeaways
- RAG = Retrieval + Generation: It mitigates LLM hallucinations and knowledge cutoffs by grounding answers in retrieved external context.
- Embeddings: Vectors that represent semantic meaning; similar concepts sit close together in vector space.
- Vector Databases: Specialized stores (Pinecone, ChromaDB) optimized for high-dimensional similarity search via approximate nearest neighbor (ANN) algorithms such as HNSW.
- Chunking Matters: How you split text affects retrieval quality; recursive chunking generally beats fixed-size splitting.
- Hybrid Search: Combining keyword search (BM25) with vector search typically improves recall over either alone.
- Re-ranking: A second pass with a cross-encoder markedly improves precision, at the cost of extra latency.
- Production RAG: Not a linear pipeline but a system with query expansion, routing, and self-correction.
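The retrieve-then-generate loop above can be sketched in a few lines. This is a toy illustration: `embed` here is a hypothetical bag-of-words stand-in for a real embedding model, and the final prompt would normally be sent to an LLM.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: word counts over a tiny fixed vocabulary.
    # A real pipeline would call an embedding model here instead.
    vocab = ["rag", "retrieval", "vector", "chunk", "llm", "search"]
    words = Counter(text.lower().split())
    return [words[w] for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, top_k=3):
    # Rank documents by cosine similarity to the query embedding.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]

docs = [
    "rag combines retrieval and generation",
    "vector search finds similar chunk embeddings",
    "llm temperature controls randomness",
]
context = retrieve("how does vector retrieval work", docs, top_k=2)
prompt = "Answer using only this context:\n" + "\n".join(context)
```

The key idea is that generation only ever sees the retrieved context, which is what grounds the answer.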
Interactive Flashcards
Test yourself: try to answer each question before reading the answer.
What are the two main problems RAG solves?
1. Hallucinations (making up facts)
2. Knowledge Cutoffs (outdated data)
What is an Embedding?
A vector (list of numbers) representing the semantic meaning of text.
Which distance metric is most common for text similarity?
Cosine Similarity (measures the angle between vectors).
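A quick worked example of why cosine similarity is preferred for text: it depends only on the angle between vectors, so a vector and a scaled copy of it score a perfect 1.0 (illustrative numbers, not real embeddings).

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: dot product over norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction, twice the magnitude -> similarity 1.0
c = [0.0, 1.0]       # orthogonal to [1.0, 0.0] -> similarity 0.0
```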
What is the trade-off of Re-ranking?
It improves accuracy (precision) but adds latency and cost.
What does HNSW stand for?
Hierarchical Navigable Small World (an algorithm for fast approximate nearest neighbor search).
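For context, here is the exact brute-force search that HNSW approximates: O(N) distance computations per query (toy 2-D points; real indexes hold high-dimensional embeddings). HNSW builds a layered proximity graph to answer the same question in roughly logarithmic time, at the cost of occasionally missing a true nearest neighbor.

```python
import heapq
import math

def exact_knn(query, vectors, k=2):
    # Exact k-nearest-neighbor search by Euclidean distance.
    # Scans every vector: this is the O(N) baseline ANN indexes avoid.
    def dist(v):
        return math.sqrt(sum((q - x) ** 2 for q, x in zip(query, v)))
    return heapq.nsmallest(k, vectors, key=dist)

points = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.0]]
nearest = exact_knn([0.0, 0.1], points, k=2)
```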
RAG Cheat Sheet
Common Hyperparameters
| Parameter | Recommended Start | Description |
|---|---|---|
| Chunk Size | 512 - 1024 tokens | Size of each text block. |
| Chunk Overlap | 10% - 20% | Portion of each chunk repeated in the next to preserve context. |
| Top K | 3 - 5 | Number of documents to retrieve. |
| Temperature | 0.0 - 0.3 | Lower temperature reduces hallucinations in RAG. |
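Chunk size and overlap interact as a sliding window: each new chunk starts `chunk_size - overlap` units after the previous one. A minimal sketch, counting words for simplicity (production splitters count tokens and split recursively by paragraph, then sentence, then word):

```python
def chunk_text(text, chunk_size=20, overlap=4):
    # Fixed-size chunking with overlap, measured in words.
    # Each chunk starts (chunk_size - overlap) words after the last,
    # so consecutive chunks share `overlap` words of context.
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

text = " ".join(f"w{i}" for i in range(50))
chunks = chunk_text(text)
# 50 words with chunk_size=20, overlap=4 -> chunks start every 16 words
```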
RAG Components
| Component | Popular Tools |
|---|---|
| Orchestration | LangChain, LlamaIndex |
| Vector DB | Pinecone, ChromaDB, Weaviate, pgvector |
| Embeddings | OpenAI text-embedding-3, HuggingFace all-MiniLM-L6-v2 |
| Evaluation | RAGAS, TruLens |
Next Steps
Now that you understand how to augment LLMs with external data, let's learn how to permanently teach them new skills.
Module 04: Fine-Tuning (Coming Soon)