
Why We Built RAG Into Document Intelligence — And What It Actually Changes 

Mithran Bala

Strev

Every operations team I’ve spoken to has the same problem. 

The information they need is sitting inside documents — vendor contracts, maintenance agreements, service schedules, compliance records. Payment terms. Expiry dates. Liability caps. Auto-renewal clauses. It’s all there, in writing, somewhere inside a stack of PDFs that nobody has time to fully read. 

Ctrl+F isn’t an answer. Waiting on legal isn’t an answer. That’s the gap we built Document Intelligence to close. 

Keyword search finds words. It doesn’t find answers. 

Search for “payment” in a 40-page vendor agreement and you’ll get 30 results. None of them answers the question you actually asked.

RAG (retrieval-augmented generation) works differently. When a document is uploaded, it’s broken into chunks that preserve meaning: not arbitrary splits, but cuts at actual semantic boundaries. Each chunk is converted into an embedding, a numeric vector that represents what the chunk means, not just what it says. When a question comes in, the system embeds the question the same way, finds the chunks closest in meaning to it, and hands them to the language model as context.
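To make the retrieval step concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the Document Intelligence pipeline: `embed` is a toy stand-in that counts words, so it matches on shared vocabulary rather than meaning, and a real system would call a learned embedding model in its place. The sample chunks and the function names are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts.
    A real system would return a dense vector from a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(question, chunks, k=2):
    """Return the k chunks whose vectors are closest to the question's."""
    q = embed(question)
    scored = [(cosine(q, embed(c["text"])), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]

# Hypothetical chunks, each tagged with the page it came from.
chunks = [
    {"page": 2, "text": "This agreement renews automatically for successive one-year terms."},
    {"page": 4, "text": "Payment terms: invoices are due within sixty days of receipt."},
    {"page": 9, "text": "Total liability is capped at the fees paid in the prior year."},
]

top = retrieve("What are the payment terms for invoices?", chunks, k=1)
print(top[0]["page"], top[0]["text"])
```

Only the chunks returned by `retrieve` would be passed to the language model, which is what keeps the answer tied to the document.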

The model answers from those chunks. Not from memory. Not from training data. From the document in front of it. 

That’s why citations are possible. We know exactly which chunk the answer came from because we gave it to the model. Page 4 isn’t decoration — it’s proof. 
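One way to keep that link between answer and source is to label every chunk with its page before it reaches the model. The sketch below is an assumption about how such a prompt could be assembled; the prompt wording, the `build_prompt` name, and the sample chunk are hypothetical.

```python
def build_prompt(question, retrieved):
    """Assemble a prompt that labels every chunk with its source page.

    Returns the prompt text plus the list of pages that were supplied,
    so a citation in the answer can be checked against what was sent.
    """
    context_lines = [f"[page {c['page']}] {c['text']}" for c in retrieved]
    prompt = (
        "Answer using only the excerpts below. "
        "Cite the page label of every excerpt you rely on. "
        "If the excerpts do not contain the answer, say so.\n\n"
        + "\n".join(context_lines)
        + f"\n\nQuestion: {question}"
    )
    source_pages = sorted({c["page"] for c in retrieved})
    return prompt, source_pages

# Hypothetical retrieved chunk.
retrieved = [
    {"page": 4, "text": "Payment terms: invoices are due within sixty days of receipt."},
]
prompt, source_pages = build_prompt("What are the payment terms?", retrieved)
print(prompt)
print("Pages supplied:", source_pages)
```

Because the caller knows exactly which pages went in, a cited page that is not in `source_pages` can be rejected as unsupported.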

What this looks like in practice 

A procurement manager uploads the Q3 vendor agreements. Before they’ve typed a single question, Document Intelligence surfaces a summary: three vendors with overdue invoices, two auto-renewal clauses triggering within 30 days, two liability caps flagged below standard.

Then the questions start. 

“What are the payment terms for our Tier 2 suppliers?” 

Answer in seconds: Net-60 terms, two early-payment discounts negotiated. Cited to page 4. 

“Which of those have auto-renewal clauses triggering before Q2?” 

The system retains context from the previous turn, finds the relevant sections, and returns the specific contracts at risk, with page references for each.

The conversation builds on itself. Follow-up questions don’t start from scratch — they build on what came before, the way a conversation with a colleague would. Except this colleague has read every page. 
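A follow-up like “Which of those…” cannot be retrieved against on its own, because “those” only makes sense in light of the previous turn. One simple way to carry context forward, sketched here as an assumption rather than a description of how Document Intelligence does it, is to fold recent turns into the retrieval query. The `build_retrieval_query` name and the `max_turns` parameter are hypothetical; many systems instead ask a language model to rewrite the follow-up as a standalone question.

```python
def build_retrieval_query(history, follow_up, max_turns=2):
    """Combine the most recent turns with the follow-up question so the
    retriever sees what words like "those" refer to."""
    parts = []
    for turn in history[-max_turns:]:
        parts.append(turn["question"])
        parts.append(turn["answer"])
    parts.append(follow_up)
    return " ".join(parts)

history = [
    {
        "question": "What are the payment terms for our Tier 2 suppliers?",
        "answer": "Net-60 terms, two early-payment discounts negotiated.",
    },
]
follow_up = "Which of those have auto-renewal clauses triggering before Q2?"
query = build_retrieval_query(history, follow_up)
print(query)
```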

— 

What we got wrong the first time 

Chunking. Almost every RAG quality problem we hit traced back to chunking strategy. 

Split a document at arbitrary character counts and you break the context that makes individual sentences meaningful. You retrieve one half of an argument without the other. The model either hedges or fills the gap confidently, and a confidently wrong answer in a contract context is worse than no answer.

Getting this right — respecting document structure, preserving meaning across boundaries, handling tables and structured data differently from prose — took longer than the retrieval and generation work combined. It’s the part of RAG that gets the least attention and causes the most production failures. 
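The difference is easy to see on a small example. The sketch below compares a fixed-size character split with a splitter that only cuts between paragraphs. It is a simplified illustration, not the chunker described above: it assumes paragraphs are separated by blank lines, ignores tables entirely, and uses made-up clause text and hypothetical function names.

```python
def split_fixed(text, size):
    """Cut every `size` characters, regardless of sentence or clause."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def split_on_paragraphs(text, max_chars):
    """Pack whole paragraphs into chunks without cutting inside one.
    A paragraph longer than max_chars is kept whole as its own chunk."""
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks

# Hypothetical contract text: three short clauses.
text = (
    "4.1 Payment. Invoices are due within sixty days of receipt.\n\n"
    "4.2 Discount. A two percent discount applies if paid within ten days.\n\n"
    "5.1 Renewal. This agreement renews automatically unless cancelled."
)

fixed = split_fixed(text, 50)
structured = split_on_paragraphs(text, 140)

print(repr(fixed[0]))       # stops partway through clause 4.1
print(repr(structured[0]))  # clauses 4.1 and 4.2, both intact
```

The fixed split ends its first chunk in the middle of the payment clause, so a retriever could return the start of the sentence without the part that completes it. The paragraph-aware split keeps each clause whole.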

The other thing: RAG doesn’t fix a bad document. If a contract is ambiguous, the answer will reflect that ambiguity. We treat this as a feature. If the system can’t answer confidently, it says so. That’s more useful than false precision. 
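One common way to implement “it says so” is to refuse to answer when nothing retrieved is similar enough to the question. The sketch below assumes that approach; the source does not say how Document Intelligence measures confidence, and the `answer_or_abstain` name and the `min_score` value of 0.35 are arbitrary placeholders.

```python
def answer_or_abstain(scored_chunks, min_score=0.35):
    """scored_chunks: list of (similarity, chunk) pairs, in any order.

    Returns the pages worth sending to the model, or an explicit
    "no_answer" status when no chunk clears the threshold.
    """
    confident = [(s, c) for s, c in scored_chunks if s >= min_score]
    if not confident:
        return {"status": "no_answer", "pages": []}
    confident.sort(key=lambda pair: pair[0], reverse=True)
    return {"status": "answer", "pages": [c["page"] for _, c in confident]}

# Hypothetical similarity scores.
result = answer_or_abstain([(0.12, {"page": 2}), (0.48, {"page": 4})])
weak = answer_or_abstain([(0.10, {"page": 7})])
print(result)
print(weak)
```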

Why the citation is the feature 

An accurate answer you can’t verify is still a black box. 

A cited answer is different. You can go to page 4 and read the clause yourself. You can decide whether the summary captured it correctly or missed a nuance. The citation makes the answer auditable — which matters more than accuracy in a context where a wrong read of a renewal clause or a liability cap has real financial consequences. 

The constraint RAG puts on the model — answer from the document, not from memory — isn’t a limitation. It’s the design decision that makes the output trustworthy enough to act on. 
