Week 04: RAG Pipelines — Basics to Agentic

Build: Enterprise document Q&A system with hybrid retrieval
What You'll Learn

RAG is table stakes. Agentic RAG — where retrieval triggers actions, multi-hop reasoning, and re-ranking — is what production systems actually need.

Session Schedule

Day        Time                  Focus
Saturday   8:00 - 11:00 PM WAT   RAG Architecture & Retrieval
Sunday     8:00 - 11:00 PM WAT   Hybrid Search & Evaluation Build

Pre-Requisites

  • Weeks 01-03 completed
  • PGVector running in Docker
  • Understanding of embeddings and vector search

Topics Covered

Chunking Strategies & Embedding Models

Fixed-size, semantic, and recursive chunking; OpenAI vs. HuggingFace embedding models. How you chunk determines how well you retrieve — get this wrong and nothing downstream matters.
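A minimal sketch of fixed-size vs. recursive chunking in plain Python (chunk sizes, overlap, and separators are illustrative defaults; in the build you'd likely reach for a library splitter):

```python
def chunk_fixed(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Fixed-size chunking with overlap: simple and fast, but splits mid-sentence."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def chunk_recursive(text: str, size: int = 200,
                    seps: tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Recursive chunking: split on the coarsest separator first (paragraphs,
    then lines, then sentences, then words) so chunks respect structure."""
    if len(text) <= size:
        return [text] if text.strip() else []
    for sep in seps:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks, buf = [], ""
            for part in parts:
                candidate = buf + sep + part if buf else part
                if len(candidate) <= size:
                    buf = candidate
                else:
                    if buf:
                        chunks.append(buf)
                    buf = part
            if buf:
                chunks.append(buf)
            # Recurse into any piece that is still too large for a finer split
            return [c for chunk in chunks for c in chunk_recursive(chunk, size, seps)]
    # No separator found: fall back to a hard fixed-size split
    return [text[i:i + size] for i in range(0, len(text), size)]
```

The recursive variant keeps paragraphs and sentences intact where it can, which is exactly why it usually retrieves better than a blind fixed-size cut.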

Hybrid Search: Dense + Sparse

BM25 for keyword matching, vector search for semantic similarity, Reciprocal Rank Fusion to merge the two ranked lists. Why neither approach works alone and how to combine them for production-grade retrieval.
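RRF itself is only a few lines: each ranker contributes 1/(k + rank) per document, and the sums are sorted. A sketch (doc IDs are placeholders; k = 60 is the commonly used constant from the original RRF paper):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankers of 1 / (k + rank).
    Documents that rank well in *either* list float to the top; the constant
    k damps the advantage of a single #1 placement."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF only consumes ranks, not raw scores, it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.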

Re-ranking & MMR

Cross-encoder re-ranking, Maximal Marginal Relevance, diversity vs relevance. The post-retrieval step that separates good RAG from great RAG.
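A greedy MMR sketch over precomputed similarities (the names, the dict-based similarity lookup, and the 0.7 default for lam are illustrative assumptions, not a library API):

```python
def mmr_select(query_sim: dict[str, float],
               pair_sim: dict[tuple[str, str], float],
               lam: float = 0.7, top_k: int = 3) -> list[str]:
    """Maximal Marginal Relevance: greedily pick the document that maximizes
    lam * sim(query, d) - (1 - lam) * max sim(d, already-selected).
    lam=1.0 is pure relevance; lower values force diversity."""
    selected: list[str] = []
    candidates = set(query_sim)
    while candidates and len(selected) < top_k:
        def score(d: str) -> float:
            redundancy = max((pair_sim.get((d, s), pair_sim.get((s, d), 0.0))
                              for s in selected), default=0.0)
            return lam * query_sim[d] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

Note how a highly relevant but near-duplicate chunk loses to a less relevant, more novel one — that is the diversity-vs-relevance trade-off in one line of arithmetic.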

Multi-hop Retrieval

Query decomposition, iterative retrieval, chain-of-retrieval patterns. When one retrieval pass isn't enough to answer complex questions.
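The chain-of-retrieval loop can be sketched with injected `retrieve` and `llm` callables. The ANSWER/NEXT protocol below is an assumption for illustration, not a standard API:

```python
def multi_hop_answer(question: str, retrieve, llm, max_hops: int = 3) -> str:
    """Iterative retrieval: after each hop, ask the LLM either to answer from
    the accumulated context or to emit the next sub-query (decomposition)."""
    context, query = [], question
    for _ in range(max_hops):
        context.extend(retrieve(query))
        verdict = llm(
            f"Context: {context}\nQuestion: {question}\n"
            "Reply 'ANSWER: <answer>' if the context suffices, "
            "else 'NEXT: <follow-up query>'."
        )
        if verdict.startswith("ANSWER:"):
            return verdict.removeprefix("ANSWER:").strip()
        query = verdict.removeprefix("NEXT:").strip()
    # Hop budget exhausted: answer with whatever context was gathered
    return llm(f"Context: {context}\nAnswer: {question}")
```

The `max_hops` cap matters in production: without it, a confused decomposition loop can burn retrieval and token budget indefinitely.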

Evaluation with RAGAS

Faithfulness, answer relevancy, context precision, context recall metrics. You can't improve what you can't measure — build evaluation into your pipeline from day one.
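RAGAS computes these metrics with LLM judges, but the rank-weighted intuition behind context precision can be shown with labeled chunks (a toy approximation of the metric, not the RAGAS implementation):

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """P@k: fraction of the top-k retrieved chunks that are actually relevant."""
    top = retrieved[:k]
    return sum(1 for d in top if d in relevant) / len(top) if top else 0.0

def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Rank-weighted context precision: average P@i over each rank i that
    holds a relevant chunk, so relevant chunks buried low in the list hurt."""
    hits = [i for i, d in enumerate(retrieved, 1) if d in relevant]
    if not hits:
        return 0.0
    return sum(precision_at_k(retrieved, relevant, i) for i in hits) / len(hits)
```

The same P@5 number also feeds the final layer of the weekly build, which is why it is worth understanding before wiring in the full RAGAS suite.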

Weekly Build: Enterprise Document Q&A

Build a full RAG pipeline with 7 layers: ingestion, chunking, embedding, storage, query understanding, access control, and hybrid search.

Architecture

Documents (PDF/DOCX/TXT)
    |
    v
Layer 1: Ingestion (document_loader.py)
    |
    v
Layer 2: Chunking (semantic + recursive)
    |
    v
Layer 3: Embedding (OpenAI text-embedding-3-small)
    |
    v
Layer 4: Storage (PGVector)
    |
    v
User Query → Layer 5: Query Understanding (expansion + intent)
    |
    v
Layer 6: Access Control (RBAC tier filter)
    |
    v
Layer 7: Hybrid Search (BM25 + Vector + RRF)
    |
    v
Grounded LLM Answer + P@5 Score

Key Files

File                         Purpose
rag/pipeline.py              Main pipeline orchestration
rag/ingestion/               Document loaders (PDF, DOCX, TXT)
rag/chunking.py              Semantic & recursive chunking
rag/embeddings.py            Embedding generation
rag/vector_store.py          PGVector CRUD operations
rag/query_understanding.py   Query expansion & intent detection
rag/access_control.py        RBAC tier filtering
rag/hybrid_search.py         BM25 + Vector + RRF fusion
rag/evaluation.py            RAGAS metrics & scoring
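The glue in rag/pipeline.py for the query-time layers (5-7) might look like the following sketch. Every signature here is an assumption; the layers are injected as callables so each one can be swapped or unit-tested alone:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RAGPipeline:
    """Query-time orchestration sketch: layers 5-7 of the architecture plus
    answer generation, wired together by dependency injection."""
    expand_query: Callable[[str], str]             # Layer 5: expansion + intent
    allowed_tiers: Callable[[str], set[str]]       # Layer 6: RBAC tier lookup
    hybrid_search: Callable[[str, set[str]], list[dict]]  # Layer 7: BM25 + vector + RRF
    generate: Callable[[str, list[dict]], str]     # grounded LLM answer

    def answer(self, user: str, question: str) -> str:
        query = self.expand_query(question)
        tiers = self.allowed_tiers(user)
        chunks = self.hybrid_search(query, tiers)
        return self.generate(question, chunks)
```

Passing the RBAC tiers *into* hybrid search (rather than filtering afterwards) is the important design choice: post-filtering can silently return fewer than k chunks, or worse, leak restricted text into intermediate logs.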

Resources

Required Reading

  • LangChain RAG documentation
  • RAGAS evaluation framework docs
  • "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" paper

Code Repository

Switch to the week-04 branch:

git checkout week-04

Session Recording

Recording will be available within 24 hours after the live session. Check the WhatsApp group for the link.

Homework

Due before CORTEX project kickoff session.

  1. Complete the enterprise Q&A build — push your code to the bootcamp repo
  2. Run RAGAS evaluation — generate faithfulness, relevancy, and precision scores on at least 20 test questions
  3. Compare chunking strategies — benchmark fixed-size vs semantic vs recursive on the same document set
  4. Write a 1-page reflection on retrieval quality — what surprised you about how chunking affects answers? Share in the WhatsApp group