RAG Is Not Just Chunking → Embedding → Retrieval → Generation

If I had a dollar for every time someone explained RAG as exactly four boxes with an arrow between each, I’d have enough to fine-tune a small LLM by now.

Here’s the thing — those four boxes aren’t wrong. They’re just the skeleton. And a skeleton without organs, blood flow, and a nervous system doesn’t walk anywhere. It just lies there looking like it should work.

So before you nod along to the “it’s simple” version, sit with these for a second:

That’s not pedantry. That’s the entire difference between a RAG demo that wows your manager once and a RAG system that survives contact with real users and real documents.


The Real Flow (Bird’s-Eye View)

Think of it less like a pipe and more like a relay race with judges at every handoff:

| Stage | What’s actually happening | The question nobody asks |
| --- | --- | --- |
| Parsing | Documents → clean structured text | Did tables/images survive, or vanish? |
| Chunking | Splitting text into digestible pieces | Why this size? Why this overlap? |
| Embedding | Turning chunks into vectors | Does this model “get” your domain? |
| Storage | Vectors land in a DB | Picked for hype, or for your scale/latency needs? |
| Hybrid Search | Keyword (BM25) + semantic search | Are you only doing vector search and missing exact matches? |
| Metadata Filtering | Narrowing by source/date/dept | Or is everything just dumped into one giant pile? |
| Reranking | Cross-encoder re-scores top candidates | Or are you trusting raw similarity scores blindly? |
| Context Selection | Picking the final Top-K chunks | Too few = missing info. Too many = confused LLM. |
| Generation | LLM writes the answer | Grounded in your docs, or politely hallucinating? |
| Answer Relevancy | Did it actually answer the question? | Anyone checking, or just shipping it? |



Every single row above has its own failure modes, its own trade-offs, and honestly — its own rabbit hole worth a blog post of its own.
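
To make the chunking row's "why this size, why this overlap" question concrete, here is a minimal sketch of a sliding-window chunker. The function name `chunk_text`, the word-based splitting, and the default sizes are all assumptions for illustration; a real pipeline would more likely split on tokens, sentences, or document structure.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words.

    Hypothetical helper: whitespace "words" stand in for real tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window has reached the end of the text,
        # so we don't emit a trailing chunk that is pure overlap.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Both knobs are trade-offs: a larger `chunk_size` keeps more context per chunk but dilutes the embedding, and a larger `overlap` protects sentences that straddle a boundary at the cost of storing more near-duplicate text.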


Why This Actually Matters

A “simple” RAG pipeline fails silently. It doesn’t crash — it just gives you a confidently wrong answer, citing a chunk that’s 70% irrelevant, built from a table your parser butchered, retrieved because it was vector-similar rather than actually-useful. And nobody notices until a user does.
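
One common guard against "vector-similar but not actually useful" is to fuse the keyword ranking with the vector ranking instead of trusting either alone. The sketch below uses reciprocal rank fusion (RRF), which is one fusion technique among several and not necessarily what any given vector DB does internally. The document IDs and the two input rankings are made up, and `k = 60` is just the conventional default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one scored ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by multiple retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical outputs of a BM25 search and a vector search for one query.
bm25_ranking = ["doc_b", "doc_a", "doc_d"]
vector_ranking = ["doc_a", "doc_c", "doc_b"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Here `doc_a` and `doc_b` both appear in the two lists, so they outrank `doc_c` and `doc_d`, which only one retriever found. A reranker would then re-score the fused candidates before anything reaches the LLM.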

Good RAG isn’t about stacking the four boxes. It’s about making every junction in that relay race accountable — parsing accountable for fidelity, chunking accountable for context, retrieval accountable for relevance, generation accountable for grounding.
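
As a taste of what "generation accountable for grounding" could look like in code, here is a deliberately crude sketch: the share of answer words that also appear in the retrieved chunks. The name `grounding_score` and the lexical-overlap approach are assumptions for illustration only. Real evaluation setups typically lean on an LLM judge or an entailment model, but even a blunt check like this can flag answers that drift away from the context.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; a rough stand-in for real tokenization."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def grounding_score(answer: str, context_chunks: list[str]) -> float:
    """Fraction of distinct answer tokens found anywhere in the retrieved context.

    Hypothetical heuristic: 1.0 means every answer word appears in the context,
    0.0 means none do. It cannot tell paraphrase from hallucination.
    """
    answer_tokens = _tokens(answer)
    if not answer_tokens:
        return 0.0
    context_tokens: set[str] = set()
    for chunk in context_chunks:
        context_tokens |= _tokens(chunk)
    return len(answer_tokens & context_tokens) / len(answer_tokens)


# Made-up example: one retrieved chunk and two candidate answers.
context = ["Refunds take five business days."]
print(grounding_score("Refunds take five days", context))
print(grounding_score("Refunds take two weeks", context))
```

The first answer scores higher than the second because "two" and "weeks" never appear in the retrieved chunk. Logging a number like this per response is one cheap way to notice silent failures before a user does.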

What’s Next

This was the 30,000-ft view — intentionally not deep, just enough to make you go “oh, there’s way more going on here.” Up next, I’ll deep-dive each stage one by one, starting with the most underrated villain of every RAG pipeline: document parsing (yes, before you even think about chunking).

Stay tuned. 🧠


Inspired by my own hurdles 🙂

