Why This Playbook?
RAG is often presented as “retrieve then generate,” but production reliability depends on strict grounding behavior. This playbook focuses on that engineering gap: not just answering from docs, but also refusing unsupported claims in a predictable way.
Repo:
- RAG playbook: github.com/amiya-pattnaik/rag-engineering-playbook

Other related repos:
- Generative playbook: github.com/amiya-pattnaik/generativeAI-engineering-playbook
- Agentic playbook: github.com/amiya-pattnaik/agentic-engineering-playbook
What It Demonstrates
- Local knowledge retrieval: `.md`/`.txt` files are chunked, embedded, and indexed in memory (see the chunking sketch after this list).
- Grounding-first responses: answers must include valid citations from retrieved chunks.
- Guarded abstention: low-confidence retrieval or unsupported questions return safe fallback instead of fabricated answers.
- Coverage-aware behavior: multi-part questions can return grounded known facts while explicitly marking unsupported parts.
- UI grounding indicator: clear status for grounded vs blocked outcomes.
- Scenario-based evals: answerable, unanswerable, and partial cases with pass/fail scoring.
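As a rough illustration of the ingestion side, the sketch below shows fixed-size chunking with overlap. The function name, parameters, and chunk-ID scheme are hypothetical, not the repo's actual API:

```js
// Minimal fixed-size chunker with overlap (hypothetical names, not the repo's API).
// Each chunk carries a stable ID so later citation validation can refer back to it.
function chunkText(docId, text, chunkSize = 800, overlap = 100) {
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0, i = 0; start < text.length; start += step, i++) {
    chunks.push({
      id: `${docId}#${i}`,                        // chunk ID cited by grounded answers
      text: text.slice(start, start + chunkSize), // overlapping window of source text
    });
  }
  return chunks;
}
```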
Architecture (Simple and Practical)
- Node.js + Express backend with static HTML UI.
- Ingestion pipeline: chunking -> embedding -> vector store.
- Retrieval pipeline: cosine similarity ranking + top-k selection (sketched after this list).
- RAG route validates citations and normalizes unsafe outputs.
- Scenario runner generates JSON/Markdown reports for repeatable checks.
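Assuming the in-memory store is an array of `{ id, text, embedding }` objects, the retrieval pipeline can be sketched like this; names are illustrative:

```js
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every chunk against the query embedding, sort, and keep the top k.
function retrieveTopK(queryEmbedding, store, k = 4) {
  return store
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```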
Key controls:
- `MIN_RETRIEVAL_SCORE`: blocks generation when top retrieval quality is weak.
- `MIN_QUESTION_COVERAGE`: blocks questions not sufficiently covered by retrieved docs.
- Citation validation: only retrieved chunk IDs are accepted as grounded references.
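A guard built from the first two controls can be as small as the sketch below; the function and field names are hypothetical, and the real thresholds come from `.env`:

```js
// Hypothetical pre-generation guard. `results` is assumed sorted by score
// (best first); `coverage` is a 0..1 estimate of how much of the question
// the retrieved chunks address.
function checkGuards(results, coverage, env) {
  if (results.length === 0 || results[0].score < Number(env.MIN_RETRIEVAL_SCORE)) {
    return { blocked: true, reason: 'retrieval_below_threshold' };
  }
  if (coverage < Number(env.MIN_QUESTION_COVERAGE)) {
    return { blocked: true, reason: 'question_not_covered' };
  }
  return { blocked: false };
}
```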
Grounding Contract in UI
- Show `Grounding status: grounded` only when the answer has valid citations from retrieved chunks.
- For unsupported questions, show a blocked/ungrounded status, not `grounded`.

Blocked reasons include:
- `retrieval_below_threshold`
- `question_not_covered`
- `invalid_citations`
- `no_grounded_citations`
- `non_json_model_output`
- `empty_answer`
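In the UI, a blocked outcome could be surfaced as a small structured payload along these lines; the field names here are hypothetical, not the app's exact response shape:

```json
{
  "status": "blocked",
  "reason": "retrieval_below_threshold",
  "answer": "I can't answer this from the provided documents.",
  "citations": []
}
```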
Quickstart
```bash
git clone https://github.com/amiya-pattnaik/rag-engineering-playbook.git
cd rag-engineering-playbook/demo-app
cp .env.example .env
npm install
npm run dev
# open http://localhost:3000
```
Mock mode works offline by default.
To use OpenAI provider mode:
- Set `OPENAI_API_KEY` in `.env`.
- Keep `OPENAI_TEMPERATURE=0` for deterministic outputs.
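Putting the documented settings together, a provider-mode `.env` might look like the example below. The threshold values are illustrative only; check `.env.example` for the actual defaults:

```bash
# Threshold values below are illustrative; see .env.example for defaults
OPENAI_API_KEY=sk-your-key-here
OPENAI_TEMPERATURE=0
TOP_K=4
MIN_RETRIEVAL_SCORE=0.35
MIN_QUESTION_COVERAGE=0.6
```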
Scenario Runner and Anti-Hallucination Eval
```bash
# run all default scenarios
npm run demo:scenarios

# run dedicated anti-hallucination suite
npm run demo:anti-hallucination
```
The anti-hallucination suite validates:
- Answerable: must be grounded and include expected facts.
- Unanswerable: must abstain and avoid grounded factual claims.
- Partial: must provide grounded known facts and explicitly abstain on unknown parts.
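The real schema is whatever `demo-app/scenarios/anti-hallucination-eval.json` defines; as a purely hypothetical illustration, entries for the three case types might be shaped like this:

```json
[
  {
    "type": "answerable",
    "question": "What does MIN_RETRIEVAL_SCORE control?",
    "expectGrounded": true
  },
  {
    "type": "unanswerable",
    "question": "Who won the 1998 FIFA World Cup?",
    "expectGrounded": false
  },
  {
    "type": "partial",
    "question": "What is TOP_K, and who is the project CEO?",
    "expectGrounded": true,
    "expectAbstainOn": ["project CEO"]
  }
]
```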
Latest run in this repo: 10/10 scenarios passed.
Why This Matters for Engineering and QA
- Behavioral confidence: you can test grounding policy, not just response style.
- Safer demos and pilots: unsupported asks are blocked with explicit reasons.
- CI-friendly quality gate: eval exits non-zero on failures.
- Extensible foundation: add knowledge docs, providers, or workflows without rewriting the core contract.
Extending
- Add domain docs under `demo-app/data/knowledge/` and run `npm run ingest`.
- Tune retrieval/grounding with `TOP_K`, `MIN_RETRIEVAL_SCORE`, and `MIN_QUESTION_COVERAGE`.
- Expand `demo-app/scenarios/anti-hallucination-eval.json` with domain-specific answerable/unanswerable/partial cases.
- Add providers in `demo-app/src/providers/` and keep strict JSON + citation validation in `demo-app/src/services/rag.js` (a citation-check sketch follows below).
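For the citation side of that contract, here is a sketch of the kind of check involved, with hypothetical names rather than the repo's exact code:

```js
// Accept only citations whose chunk IDs were actually retrieved; block on
// any invalid citation, and block again if nothing grounded remains.
function validateCitations(answer, retrievedChunks) {
  const retrievedIds = new Set(retrievedChunks.map((c) => c.id));
  const citations = answer.citations || [];
  const grounded = citations.filter((id) => retrievedIds.has(id));
  if (grounded.length !== citations.length) {
    return { blocked: true, reason: 'invalid_citations' };
  }
  if (grounded.length === 0) {
    return { blocked: true, reason: 'no_grounded_citations' };
  }
  return { blocked: false, citations: grounded };
}
```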
Notes
- The vector store is in-memory for demo simplicity; a server restart rebuilds state from the knowledge files.
- `grounded` means citations are valid against retrieved chunks, not global factual correctness outside the provided docs.
- Mock mode is deterministic for offline demos; provider-mode quality depends on the model and prompt adherence.
Closing Thought
Reliable RAG is less about elegant prompts and more about explicit engineering contracts: retrieval thresholds, coverage checks, citation checks, and measurable evals. This playbook keeps those contracts visible and runnable.