Off-the-shelf chatbots are useful, until the question they're asked is specific to your business. Then they hallucinate or hedge — and rightly so, because they have no idea what your product is, who your customers are, or how your team uses certain words to mean very specific things. The fix isn't a better model; it's a model that has access to your knowledge. That's what people mean by "a private LLM" in 2025, and the way you get there is a stack of two pieces: a good model, plus retrieval (RAG).
Retrieval-augmented generation, simplified: when somebody asks the assistant a question, the system searches your internal documents (Notion, Drive, knowledge base, etc.) for relevant chunks, stuffs those chunks into the prompt alongside the question, and asks the model to answer using that context. The model is doing what it always does — generating plausible text — but now it's grounded in your actual material, with citations back to where the information came from. Done well, this is the difference between a chatbot that knows trivia and an assistant that knows your business.
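That query-time loop can be sketched in a few lines. This is a toy, not a reference implementation: the bag-of-words `embed` stands in for a real embedding model, the in-memory list stands in for a vector database, and the assembled prompt would be sent to your chat model rather than just returned. The function names (`retrieve`, `build_prompt`) are illustrative, not any library's API.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model (OpenAI, Cohere, open-source):
    # a bag-of-words vector so the retrieval logic runs offline.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, index: list, k: int = 2) -> list:
    # Rank indexed chunks by similarity to the question, keep the top k.
    q = embed(question)
    ranked = sorted(index, key=lambda c: cosine(q, c["vec"]), reverse=True)
    return ranked[:k]

def build_prompt(question: str, chunks: list) -> str:
    # Stuff retrieved chunks into the prompt, tagged with their sources
    # so the model can cite where each fact came from.
    context = "\n".join(f'[{c["source"]}] {c["text"]}' for c in chunks)
    return (
        "Answer using only the context below. Cite sources in brackets.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Toy index; in production these chunks come from the ingestion pipeline.
docs = [
    {"source": "handbook.md",
     "text": "Refunds are approved by the support lead within 14 days."},
    {"source": "pricing.md",
     "text": "The Pro plan costs $49 per seat per month."},
]
index = [{**d, "vec": embed(d["text"])} for d in docs]

top = retrieve("How do refunds get approved?", index)
prompt = build_prompt("How do refunds get approved?", top)
```

The structure is the whole idea: the model never sees your document store, only whatever `retrieve` puts in front of it, which is why retrieval quality matters more than model choice.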
The pieces are mostly commodity now. A vector database (Pinecone, pgvector, others). An embedding model (OpenAI, Cohere, open-source). A chat model (GPT, Claude, Gemini, or self-hosted). An ingestion pipeline that pulls your docs in, chunks them, embeds them, indexes them. And a UI somewhere. The total cost for a 50-person business to run this internally is in the low four figures per year for tooling, plus the engineering time to wire it up and the ongoing care to keep the index fresh. Nothing about it requires custom ML expertise.
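The ingestion side is similarly mechanical. A minimal sketch, assuming fixed-size word chunks with overlap and a plain list standing in for the vector database; real pipelines usually split on headings or paragraphs instead, and write to pgvector or Pinecone rather than memory. `ingest` and the lambda embedding are illustrative stand-ins.

```python
def chunk(text: str, size: int = 200, overlap: int = 40) -> list:
    # Fixed-size word windows with overlap, so a fact straddling a
    # boundary still lands whole in at least one chunk.
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

def ingest(docs: dict, embed_fn, size: int = 200, overlap: int = 40) -> list:
    # Pull each doc in, chunk it, embed each chunk, record its origin.
    index = []
    for source, text in docs.items():
        for i, piece in enumerate(chunk(text, size, overlap)):
            index.append({"source": source, "chunk": i,
                          "text": piece, "vec": embed_fn(piece)})
    return index

# Demo with a length-based stand-in for the embedding model:
# a 500-word doc yields 4 overlapping chunks at the default sizes.
docs = {"handbook.md": " ".join(f"word{i}" for i in range(500))}
index = ingest(docs, lambda text: [float(len(text))])
```

The overlap parameter is the one worth tuning early: too little and answers get cut off mid-thought, too much and you pay to embed and store the same text twice.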
What does require care is the boring part — keeping the index up to date as your docs change, handling permissions so different users only see what they should, monitoring the system for embarrassing failures, evaluating new models without regressing answers that used to work. This is operational work, not ML work, and it's where most internal AI projects falter. The model isn't the problem. The plumbing around the model is. Build that plumbing first, with the cheapest model you can find, and only then start optimizing for quality. You'll know what to optimize for once you have something in production that real people are using.
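One piece of that plumbing, keeping the index fresh, can be sketched as an incremental refresh keyed on content hashes: only documents whose text actually changed get re-embedded, and deleted documents drop out of the index. `refresh` and `fake_embed` are hypothetical names; a real system would run this from a cron job or a source-system webhook against the vector store.

```python
import hashlib

def refresh(index: list, docs: dict, embed_fn) -> list:
    # Rebuild the index, reusing stored vectors for unchanged docs.
    by_source = {rec["source"]: rec for rec in index}
    fresh = []
    for source, text in docs.items():
        digest = hashlib.sha256(text.encode()).hexdigest()
        old = by_source.get(source)
        if old and old["hash"] == digest:
            fresh.append(old)  # unchanged: skip the embedding call
        else:
            fresh.append({"source": source, "hash": digest,
                          "text": text, "vec": embed_fn(text)})
    return fresh  # docs removed from the source simply fall out

# Demo: count embedding calls with a stand-in model. Two docs on the
# first run, then only the changed one on the second.
embed_calls = []
def fake_embed(text):
    embed_calls.append(text)
    return [float(len(text))]

index = refresh([], {"faq.md": "v1", "pricing.md": "v1"}, fake_embed)
index = refresh(index, {"faq.md": "v2", "pricing.md": "v1"}, fake_embed)
```

The same shape works for permissions: attach an ACL to each record at ingestion time and filter retrieved chunks against the user's groups before anything reaches the prompt, never after.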