Module Review: RAG
This chapter reviews the key concepts, architectures, and implementation details of Retrieval-Augmented Generation (RAG).
Key Takeaways
- RAG = Retrieval + Generation: It mitigates LLM hallucinations and knowledge cutoffs by grounding answers in external context.
- Embeddings: Vectors that represent semantic meaning. Similar concepts are close in vector space.
- Vector Databases: Specialized stores (Pinecone, ChromaDB) optimized for high-dimensional similarity search using Approximate Nearest Neighbor (ANN) algorithms such as HNSW.
- Chunking Matters: How you split text affects retrieval quality. Recursive chunking is generally better than fixed-size.
- Hybrid Search: Combining keyword search (BM25) with vector search typically improves recall over either method alone.
- Re-ranking: A second pass using a Cross-Encoder drastically improves precision.
- Production RAG: Not a linear pipeline but a complex system with query expansion, routing, and self-correction.
Flashcards
Test your knowledge with the questions and answers below.
What are the two main problems RAG solves?
1. Hallucinations (making up facts)
2. Knowledge Cutoffs (outdated data)
What is an Embedding?
A vector (list of numbers) representing the semantic meaning of text.
Which distance metric is most common for text similarity?
Cosine Similarity (measures the angle between vectors).
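As a concrete check, cosine similarity can be computed directly from the dot product and the vector norms. A minimal sketch using only the standard library:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))            # same direction -> 1.0
print(round(cosine_similarity([1.0, 0.0], [0.0, 1.0]), 4))  # orthogonal -> 0.0
```

Because the metric depends only on the angle, two documents of very different lengths can still score as highly similar if they point in the same semantic direction.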
What is the trade-off of Re-ranking?
It improves accuracy (precision) but increases latency (slower) and cost.
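The re-ranking step itself is just a second scoring pass over the already-retrieved candidates. In the sketch below, `cross_encoder_score` is a stand-in lexical-overlap scorer so the example stays self-contained; a real system would call a cross-encoder model (which reads the query and document jointly, hence the extra latency and cost):

```python
def cross_encoder_score(query, doc):
    # Stand-in scorer: fraction of query words that appear in the document.
    # A real cross-encoder would jointly encode (query, doc) and output a relevance score.
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words)

def rerank(query, candidates, top_n=2):
    # Second pass over the retriever's candidates: re-score and keep the best.
    scored = sorted(candidates, key=lambda d: cross_encoder_score(query, d), reverse=True)
    return scored[:top_n]

candidates = [
    "vector databases store embeddings",
    "rag reduces hallucinations by grounding answers",
    "hallucinations happen when the model invents facts",
]
print(rerank("why does rag reduce hallucinations", candidates, top_n=1))
```

The latency cost comes from scoring every (query, candidate) pair with a full model forward pass, which is why re-ranking is applied only to the top-k retrieved documents, not the whole corpus.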
What does HNSW stand for?
Hierarchical Navigable Small World (an algorithm for fast approximate nearest neighbor search).
RAG Cheat Sheet
Common Hyperparameters
| Parameter | Recommended Start | Description |
|---|---|---|
| Chunk Size | 512 - 1024 tokens | Size of each text block. |
| Chunk Overlap | 10% - 20% | Tokens shared between adjacent chunks to preserve context. |
| Top K | 3 - 5 | Number of documents to retrieve. |
| Temperature | 0.0 - 0.3 | Lower temperature reduces hallucinations in RAG. |
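The chunk-size and overlap settings above can be illustrated with a minimal fixed-size chunker. It splits on words as a rough stand-in for tokens; a real pipeline would use the embedding model's tokenizer:

```python
def chunk_text(words, chunk_size=8, overlap=2):
    """Split a word list into fixed-size chunks that share `overlap` words with their neighbor."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

words = "retrieval augmented generation grounds the model in external documents at query time".split()
for chunk in chunk_text(words, chunk_size=8, overlap=2):
    print(chunk)
```

Note how the last two words of each chunk reappear at the start of the next one: that shared window is what keeps a sentence spanning a chunk boundary from being cut off in both halves.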
RAG Components
| Component | Popular Tools |
|---|---|
| Orchestration | LangChain, LlamaIndex |
| Vector DB | Pinecone, ChromaDB, Weaviate, pgvector |
| Embeddings | OpenAI text-embedding-3, HuggingFace all-MiniLM-L6-v2 |
| Evaluation | RAGAS, TruLens |
Quick Revision
- RAG pipeline: Retrieve, Augment, Generate.
- Vector Search: Uses cosine similarity to find semantically related documents in an N-dimensional space.
- Chunking: Breaking documents into optimal sizes. Recursive is preferred over fixed size.
- Hybrid Search: Combining keyword search (BM25) with vector search and fusing results (RRF).
- Re-ranking: An essential step to increase precision by scoring top-k results with a cross-encoder model.
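Reciprocal Rank Fusion (RRF), mentioned above for hybrid search, merges ranked lists by giving each document a score of 1 / (k + rank) per list and summing, with k = 60 as the commonly cited default. A minimal sketch:

```python
def rrf_fuse(rankings, k=60):
    """Fuse multiple ranked lists of doc IDs; documents ranked well in any list rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_results   = ["doc_a", "doc_b", "doc_c"]  # keyword (BM25) ranking
vector_results = ["doc_b", "doc_c", "doc_a"]  # vector-search ranking
print(rrf_fuse([bm25_results, vector_results]))
```

Because RRF only uses ranks, it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.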
Next Steps
Now that you understand how to augment LLMs with external data, let’s learn how to permanently teach them new skills.
Module 04: Fine-Tuning (Coming Soon)