Week 08: Observability, Evals & Capstone
What You'll Learn
The final differentiator. You'll instrument your system with Langfuse and LangSmith, write evals with pytest, set up alerting, and ship your capstone project — a complete production AI system that demonstrates real engineering judgment.
Session Schedule
| Day | Time | Focus |
|---|---|---|
| Saturday | 8:00 - 11:00 PM WAT | Observability & Evaluation |
| Sunday | 8:00 - 11:00 PM WAT | Capstone Review & Presentations |
Prerequisites
- ALL weeks completed
- CORTEX project at M2+
- Deployed API from Week 07
Topics Covered
Langfuse & LangSmith Tracing
Trace setup, span tracking, cost attribution, latency analysis. See exactly what your agents are doing and how much it costs.
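The Langfuse SDK handles tracing for you; as a mental model of what a trace captures, here is a minimal hand-rolled sketch (all class and field names are hypothetical, not Langfuse's API) that records spans with latency, token counts, and cost:

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field

@dataclass
class Span:
    name: str
    duration_ms: float = 0.0
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0

@dataclass
class Trace:
    spans: list = field(default_factory=list)

    @contextmanager
    def span(self, name, input_tokens=0, output_tokens=0, cost_usd=0.0):
        """Time a unit of work and attach token/cost metadata to it."""
        s = Span(name, 0.0, input_tokens, output_tokens, cost_usd)
        start = time.perf_counter()
        try:
            yield s
        finally:
            s.duration_ms = (time.perf_counter() - start) * 1000
            self.spans.append(s)

    def total_cost(self) -> float:
        return sum(s.cost_usd for s in self.spans)

trace = Trace()
with trace.span("retrieve", input_tokens=120):
    pass  # retrieval call would go here
with trace.span("generate", input_tokens=900, output_tokens=250, cost_usd=0.0031):
    pass  # LLM call would go here
```

Real Langfuse spans nest into a tree and ship to a dashboard; the point here is just which fields per-span cost attribution needs.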
LLM Evaluation Frameworks
LLM-as-judge, pairwise comparison, rubric-based scoring. Measure quality systematically instead of relying on vibes.
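For rubric-based scoring, the judge model is usually prompted to return one score per criterion as JSON, which your eval code then parses and collapses into a single number. A sketch, with a hypothetical rubric and weights:

```python
import json

# Hypothetical rubric: per-criterion weights, summing to 1.
RUBRIC = {"groundedness": 0.5, "completeness": 0.3, "clarity": 0.2}

def parse_judge_response(raw: str) -> dict:
    """The judge is prompted to return JSON with one 1-5 score per criterion."""
    scores = json.loads(raw)
    missing = set(RUBRIC) - set(scores)
    if missing:
        raise ValueError(f"judge omitted criteria: {missing}")
    return scores

def weighted_score(scores: dict) -> float:
    """Collapse per-criterion 1-5 scores into a single 0-1 quality score."""
    raw = sum(RUBRIC[c] * scores[c] for c in RUBRIC)
    return (raw - 1) / 4  # map the [1, 5] scale onto [0, 1]

judge_output = '{"groundedness": 5, "completeness": 4, "clarity": 4}'
score = weighted_score(parse_judge_response(judge_output))  # 0.875
```

Validating the judge's JSON before scoring matters in practice: judges drift off-format, and a silent `KeyError` in an eval run is worse than a loud one.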
pytest for AI Systems
Deterministic tests, golden dataset tests, flaky test handling, fixtures. Build a test suite that catches regressions before users do.
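A golden dataset test pins known-good queries to expected properties of the answer, so regressions surface in CI. A minimal sketch; the dataset entries and the `answer` stub are hypothetical stand-ins for `tests/evaluation/golden_dataset.json` and your real pipeline:

```python
import pytest

# In the real suite these cases load from golden_dataset.json.
GOLDEN = [
    {"query": "What is our refund window?",
     "expected_keywords": ["30 days", "receipt"]},
    {"query": "How do I reset my password?",
     "expected_keywords": ["reset link", "email"]},
]

def answer(query: str) -> str:
    """Stand-in for the real RAG pipeline call."""
    canned = {
        "What is our refund window?":
            "Refunds are accepted within 30 days with a receipt.",
        "How do I reset my password?":
            "We email you a reset link right away.",
    }
    return canned[query]

@pytest.mark.parametrize("case", GOLDEN, ids=lambda c: c["query"][:30])
def test_golden(case):
    # Keyword checks are deterministic, so the test is not flaky even
    # though the underlying LLM output is not.
    response = answer(case["query"]).lower()
    for kw in case["expected_keywords"]:
        assert kw.lower() in response, f"missing {kw!r}"
```

Checking for required keywords (or citations, or JSON shape) keeps these tests deterministic; fuzzier quality judgments belong in the LLM-judge suite, not in CI gates.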
Cost Monitoring & Optimization
Token tracking, cost per query, budget alerts, model routing for cost. Keep your AI system profitable, not just functional.
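Cost per query is just token counts times per-token prices, and routing is a policy over that. A sketch with made-up model names and prices (real provider prices vary and change often, so load them from config, not constants):

```python
# Hypothetical per-million-token prices in USD; real prices differ by provider.
PRICES = {
    "small-model": {"input": 0.15, "output": 0.60},
    "large-model": {"input": 3.00, "output": 15.00},
}

def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call, from token counts and per-million prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def route(query: str, word_threshold: int = 50) -> str:
    """Naive cost router: short queries go to the cheap model.
    Production routers use classifiers or heuristics on query intent."""
    return "small-model" if len(query.split()) < word_threshold else "large-model"

cost = query_cost("large-model", input_tokens=1200, output_tokens=400)  # 0.0096
```

With per-query cost in hand, a budget alert is a threshold check over a rolling sum; the interesting engineering is in attribution (which feature, which user) rather than the arithmetic.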
Capstone Project Review
Architecture review, code review, demo preparation, portfolio packaging. Polish your capstone into something you're proud to show employers.
Weekly Build: Full Production System
Ship your capstone: a complete production AI system with observability, evaluation dashboard, and portfolio-ready documentation.
Architecture
CAPSTONE SYSTEM
|
├── Agent Layer (LangGraph supervisor + specialists)
├── RAG Layer (7-layer pipeline + hybrid search)
├── Memory Layer (Redis + PostgreSQL)
├── API Layer (FastAPI + auth + rate limiting)
├── Async Layer (Celery + webhooks)
|
v
OBSERVABILITY
├── Langfuse: trace every query
├── Cost Dashboard: $/query tracking
├── Eval Suite: P@5, MRR, LLM judge
└── Alerting: latency + error rate
|
v
PORTFOLIO
├── README with architecture diagram
├── Benchmark numbers (P@5 ≥ 0.70)
├── Docker: one-command deploy
└── 3-min video demo (optional)
Key Files
| File | Purpose |
|---|---|
| observability/langfuse_setup.py | Langfuse trace configuration |
| observability/cost_dashboard.py | Cost tracking dashboard |
| tests/evaluation/test_llm_judge.py | LLM-as-judge evaluation tests |
| tests/evaluation/golden_dataset.json | Golden dataset for regression testing |
| README.md | Portfolio writeup |
Resources
Required Reading
- Langfuse Documentation — Tracing & Evaluation
- LangSmith Documentation — Testing & Monitoring
- Hamel Husain — "Your AI Product Needs Evals"
Code Repository
Clone the bootcamp repo and switch to the week-08 branch:
git clone https://github.com/softbricks-academy/agentic-ai-bootcamp.git
cd agentic-ai-bootcamp
git checkout week-08
Session Recording
Recording will be available within 24 hours after the live session. Check the WhatsApp group for the link.
Homework
Final submissions — due by end of bootcamp.
- Complete capstone and push final code — all layers working, tests passing, deployed to cloud
- Record 3-minute demo video — use Loom to walk through your system architecture and live demo
- Write README with architecture diagram — include benchmark numbers (P@5, MRR, latency)
- Submit capstone for review — share repo link and deployed URL in the WhatsApp group