STRATEGY · April 15, 2026 · 6 min read

RAG for small companies: do you really need it?

When Retrieval Augmented Generation is the right answer, and when it is just another expensive way to say "we searched through your documents".

Written by Andrea Droghetti

Every month I hear the same line in discovery: "we want to do RAG on our documents". CFOs say it, CTOs say it, operations teams say it. Even companies that do not have the infrastructure to manage a shared Excel sheet say it.

RAG — Retrieval Augmented Generation — is a serious technique. When you need it, you really need it. When you do not, it becomes the most expensive and fragile way to do full-text search on the market today.

Here are some practical rules.

When RAG makes sense

Three conditions, all three required:

1. You have a large, roughly static corpus. Large means too many documents for a human to read in a useful time frame. Roughly static means it changes week by week, not minute by minute.
2. The reader's question is in natural language, not a pointed lookup. If all you need is "find invoice 12345", you do not need RAG, you need grep. RAG is for queries like "what have we written in the past about contracts that include non-compete clauses extending to 24 months": a question about text, not a lookup by key.
3. The output you expect is an argued synthesis, not a single fact. If you need the exact answer to a closed question, a SQL query on a structured DB does it much better. RAG shines when you have to pull together passages that live in different places.

All three. If even one is missing, you are almost always over-engineering.

When it does NOT make sense

Ten companies in a hundred actually have a real RAG problem. The other 90:

  • Have 200 documents. A GPT-class model with the whole corpus in context does perfectly well, costs less than setting up a vector DB, and gives better answers. Indicative threshold: under 50,000 total tokens you stay in-context, above that you need retrieval.
  • Have a corpus that changes every hour. Continuous reindexing is painful. For those use cases, almost always the right answer is an agent that calls an API/SQL tool in real time, not a vector store.
  • Need exact, auditable citations. RAG does not guarantee that the model actually cites what it found; it tends to paraphrase. For compliance you need tools that give you the source document, the page, the line — a classic search engine with highlighting works better.
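
The 50,000-token threshold in the first bullet can be checked before any architecture discussion. A minimal sketch, assuming a crude four-characters-per-token heuristic (an assumption, not a real tokenizer; counts for Italian text or a specific model will differ) and hypothetical helper names:

```python
# Threshold taken from the article; everything else here is an assumption.
IN_CONTEXT_TOKEN_LIMIT = 50_000

# Rough heuristic: about 4 characters per token.
# A real tokenizer for your model will give different counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the token count from character length (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents: list[str], limit: int = IN_CONTEXT_TOKEN_LIMIT) -> bool:
    """Return True when the whole corpus is estimated to stay under the limit."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total < limit
```

If `fits_in_context` returns True for the full document set, retrieval is probably unnecessary. For a borderline corpus, replace the heuristic with the tokenizer of the model you actually plan to use.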

What it really costs

Setting up a decent RAG for a mid-size SME client (10-50 thousand documents, 3-5 concurrent users, Italian-language domain):

  • Extraction + chunking + embedding: 2-3 weeks of work the first time, then 1-2 days for each large reindex.
  • Infrastructure: a managed vector DB (Pinecone, Supabase pgvector) starts at around €30/month and climbs quickly with volume and QPS.
  • Cost per query: depends on the model, but roughly €0.005-€0.02 per search with synthesis.
  • Maintenance: 0.5-1 day per month of monitoring and re-tuning. It is not zero.
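
The recurring items above can be combined into a rough monthly running cost. A minimal sketch: the infra starting price, the per-query range, and the maintenance days come from the list, while `queries_per_month` and `day_rate_eur` are hypothetical inputs you would supply yourself. Setup and reindexing effort are deliberately left out.

```python
# Figures from the list above.
INFRA_FLOOR_EUR = 30.0               # "starts at around €30/month"; real bills may be higher
COST_PER_QUERY_EUR = (0.005, 0.02)   # search with synthesis, low and high
MAINTENANCE_DAYS = (0.5, 1.0)        # monitoring and re-tuning per month


def monthly_cost_range(
    queries_per_month: int,
    day_rate_eur: float,
    infra_eur: float = INFRA_FLOOR_EUR,
) -> tuple[float, float]:
    """Return (low, high) estimated monthly running cost in euros.

    day_rate_eur is a hypothetical cost of one person-day of maintenance.
    """
    low = infra_eur + queries_per_month * COST_PER_QUERY_EUR[0] + MAINTENANCE_DAYS[0] * day_rate_eur
    high = infra_eur + queries_per_month * COST_PER_QUERY_EUR[1] + MAINTENANCE_DAYS[1] * day_rate_eur
    return (low, high)
```

With purely illustrative inputs of 2,000 queries a month and a €400 day rate, `monthly_cost_range(2000, 400.0)` comes out at roughly €240 to €470 per month. Most of that is maintenance time, not infrastructure.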

For many scenarios, a "no-RAG" solution (an agent running SQL on Postgres, with a search server such as Meilisearch in front) costs half as much and works better.
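
To make the SQL half of that concrete, here is a minimal sketch of the kind of lookup tool an agent could call for a pointed question such as "find invoice 12345". It uses an in-memory SQLite database as a stand-in for Postgres so that it runs anywhere; the `invoices` table, its columns, and the sample rows are all hypothetical. The search-server half is omitted.

```python
import sqlite3
from typing import Optional


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices (number TEXT PRIMARY KEY, customer TEXT, total_eur REAL)"
    )
    conn.executemany(
        "INSERT INTO invoices VALUES (?, ?, ?)",
        [("12345", "Acme Srl", 1200.0), ("12346", "Beta SpA", 560.5)],
    )
    conn.commit()
    return conn


def find_invoice(conn: sqlite3.Connection, number: str) -> Optional[dict]:
    """Exact lookup by key; the placeholder keeps user input out of the SQL text."""
    row = conn.execute(
        "SELECT number, customer, total_eur FROM invoices WHERE number = ?",
        (number,),
    ).fetchone()
    if row is None:
        return None
    return {"number": row[0], "customer": row[1], "total_eur": row[2]}
```

The agent exposes `find_invoice` as a tool and passes the result to the model verbatim. The answer is exact, always current, and traceable to a row, which is what a pointed question needs.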

The real case that convinced us

A Milan-based law firm, 18 thousand historical case files, with one constraint: "clients pay us for our institutional memory, but that memory sits in the heads of the two senior partners who are retiring in 18 months". There, RAG genuinely makes sense. We indexed the anonymised PDF archives and built an agent that answers questions like "have we ever seen an arbitration clause in sector X with outcome Y" and cites the case files. A junior associate now retrieves in 5 minutes the insight that used to require walking down the hall to the partner's office.

Setup: 3 weeks of work. Monthly infra cost: ~€90. ROI, calculated on partner time saved: payback in 2 months.

The question to ask yourself

When a client tells us "we would like to do RAG", the first thing we do is ask them to describe three typical queries they would want to run. If the queries are pointed facts, we propose SQL + search. If they are syntheses and the corpus is large, then RAG. If they are syntheses and the corpus is small, GPT in-context.
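
That triage fits in a few lines. A minimal sketch, assuming "large" means at or above the 50,000-token indicative threshold mentioned earlier (mapping "large" to that number is my assumption, and the function and enum names are hypothetical):

```python
from enum import Enum


class Approach(Enum):
    SQL_PLUS_SEARCH = "SQL + search"
    RAG = "RAG"
    IN_CONTEXT = "GPT in-context"


def recommend(
    queries_are_syntheses: bool,
    corpus_tokens: int,
    in_context_limit: int = 50_000,
) -> Approach:
    """Map the two triage questions to one of the three options."""
    if not queries_are_syntheses:
        # Pointed facts: exact lookups beat retrieval.
        return Approach.SQL_PLUS_SEARCH
    if corpus_tokens >= in_context_limit:
        # Syntheses over a corpus too large to fit in the prompt.
        return Approach.RAG
    # Syntheses over a small corpus: put everything in context.
    return Approach.IN_CONTEXT
```

In practice the boolean is a judgment made from the three sample queries, not something to compute. The sketch only shows that the decision has two inputs and three outcomes.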

70% of the time the right answer is not RAG. The difference between us and a company selling RAG as the only hammer: we carry more than one tool.
