pocket llm
Local-first AI knowledge base

A CLI that turns a folder of documents and shell commands into a queryable AI knowledge base. Runs fully offline on CPU. Ship the result as one zip — anyone can chat against it on a plane.


Where we stand

📦

One binary, no Docker

Single Go binary with llama.cpp statically linked via cgo. The web dashboard is embedded too. No subprocess at runtime, no compose file, no port juggling. Download, run.


📁

Project folder is the unit

Every knowledge base is a directory. data/ holds raw sources, wiki/ holds LLM-generated articles, skills/ holds tool integrations. Portable. Git-able. Shareable.


🔍

Wiki + RAG, picked by the LLM

At chat time the model decides whether to call search_wiki (structured, deep) or search_rag (fast, fuzzy) via tool calls — both retrieval modes co-exist instead of competing.


✈️

Offline share is the hero use case

build --self-contained packages the model + RAG index + wiki + skills into a zip. Recipient runs import, then chat or serve. No network. The plane test.


The flow in one paragraph

local-agents init my-handbook scaffolds a folder and lets you pick a CPU-tested chat model from a curated short list. You drop PDFs into data/, edit wiki-instructions.md (the LLM prompt template), and run local-agents wiki generate to turn each source into a structured wiki article. local-agents build packages everything (wiki, RAG chunks, and skills) into a zip. Send the zip. The recipient runs local-agents import; after that, local-agents serve opens the dashboard and local-agents chat drops into a terminal REPL. Throughout, no telemetry, no accounts, no cloud.

What this is not

Not a SaaS. No login, no OAuth, no multi-tenancy.
Not a Docker app. Single binary; if you want a container, you can write your own Dockerfile.
Not a model zoo. One curated chat model per project, picked from a top-N list of CPU-tested options that all pass a tool-calling benchmark floor.
Not online at runtime. chat and serve make zero outbound calls. eval is the dev-time exception and needs an API key.

Where it is right now

The scaffold, init, and doctor commands work end-to-end. wiki generate, chat, build, serve, import, and eval are stubs that return "not implemented". See "Where we stand" for the milestone-by-milestone status.