What pocket llm is
pocket llm is a CLI named `local-agents` that turns a folder of documents and shell commands into a queryable AI knowledge base. The output is a portable zip that anyone can chat against with zero internet access.
Why it exists
Most teams have scattered knowledge: PDFs in Drive, runbooks in Notion, tickets in Jira, code in GitHub. Existing RAG tools either force you online (OpenAI, Anthropic) or are too heavy to deploy (vector DBs, orchestration frameworks). Karpathy's LLM-wiki idea, an LLM-maintained markdown wiki that compounds over time, is more powerful than RAG alone. pocket llm combines that wiki with classic RAG for fuzzy lookup, and the chat LLM picks the right tool per question.
A team should be able to ship a knowledge base to a stakeholder over Slack and have them ask questions on a plane. That's the product.
Headline moves
- Project folder is the unit. `local-agents init my-handbook` creates a folder, like `cargo new` or `rails new`. The folder is the source, the build artifact, and the share unit.
- Three folders, one source of truth. `data/` (raw), `wiki/` (articles, both generated and human-edited), `skills/`. No second ownership folder, no shadow rules.
- Wiki generation is its own step. `wiki generate` turns each `data/<name>` into a `wiki/<name>.md` article using `wiki-instructions.md` as the prompt template. `build` is a cheap packaging step that doesn't run the LLM.
- The instructions file is portable. The same `wiki-instructions.md` feeds the local model OR claude-code by hand. Drop the result into `wiki/<name>.md`; `build` doesn't care which path produced it.
- Wiki + RAG, picked by the LLM via tool calls. At chat time the model calls `search_wiki` (structured) or `search_rag` (fuzzy) per question.
- SKILL.md is the industry standard. Same skill format Claude Code / OpenAI Codex / Microsoft Agent Framework / Google ADK use. Four-stage progressive disclosure for context efficiency.
- Offline share is non-negotiable. `build --self-contained` bundles the model too. The recipient never goes online.
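
The chat-time choice between `search_wiki` and `search_rag` can be pictured as a small dispatcher. The sketch below is illustrative only: the `ToolCall` and `KnowledgeBase` types, the `Dispatch` method, and the in-memory data are hypothetical names rather than the project's actual API, and the fuzzy path uses naive word matching as a stand-in for real retrieval.

```go
package main

import (
	"fmt"
	"strings"
)

// ToolCall is a hypothetical shape for what the chat model emits when it
// picks a tool. The real wire format is not specified in this document.
type ToolCall struct {
	Name  string // "search_wiki" or "search_rag"
	Query string
}

// KnowledgeBase holds toy in-memory data standing in for wiki/ articles
// and indexed chunks of data/ (both assumptions for illustration).
type KnowledgeBase struct {
	Wiki   map[string]string // article name -> markdown body
	Chunks []string          // raw text chunks for fuzzy lookup
}

// searchWiki is the structured path: an exact, case-insensitive match on
// the article name.
func (kb KnowledgeBase) searchWiki(query string) (string, bool) {
	body, ok := kb.Wiki[strings.ToLower(strings.TrimSpace(query))]
	return body, ok
}

// searchRAG is the fuzzy path. As a placeholder for real retrieval it
// returns the chunk containing the most query words as substrings.
func (kb KnowledgeBase) searchRAG(query string) (string, bool) {
	words := strings.Fields(strings.ToLower(query))
	best, bestScore := "", 0
	for _, chunk := range kb.Chunks {
		lower := strings.ToLower(chunk)
		score := 0
		for _, w := range words {
			if strings.Contains(lower, w) {
				score++
			}
		}
		if score > bestScore {
			best, bestScore = chunk, score
		}
	}
	return best, bestScore > 0
}

// Dispatch routes a tool call to the matching search function and rejects
// tool names it does not know.
func (kb KnowledgeBase) Dispatch(call ToolCall) (string, error) {
	switch call.Name {
	case "search_wiki":
		if body, ok := kb.searchWiki(call.Query); ok {
			return body, nil
		}
		return "", fmt.Errorf("no wiki article named %q", call.Query)
	case "search_rag":
		if chunk, ok := kb.searchRAG(call.Query); ok {
			return chunk, nil
		}
		return "", fmt.Errorf("no chunk matched %q", call.Query)
	default:
		return "", fmt.Errorf("unknown tool %q", call.Name)
	}
}

func main() {
	kb := KnowledgeBase{
		Wiki:   map[string]string{"oncall": "# On-call\nPage the primary first."},
		Chunks: []string{"Restart the ingest worker with systemctl.", "VPN certificates expire yearly."},
	}
	calls := []ToolCall{
		{Name: "search_wiki", Query: "oncall"},
		{Name: "search_rag", Query: "restart worker"},
	}
	for _, call := range calls {
		result, err := kb.Dispatch(call)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", call.Name, result)
	}
}
```

Keeping both tools behind one dispatcher means the model only has to name a tool and supply a query; whether the answer comes from a curated article or a raw chunk is decided per question.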
Hard requirements
| # | Requirement | Why |
|---|---|---|
| R1 | All inference offline on CPU after install. | "Works on a plane" is the product. |
| R2 | Single Go binary with llama.cpp statically linked. | One file, fewer moving parts. |
| R3 | Every command works interactively (TTY) and non-interactively (flags). | Terminal-friendly, scriptable. |
| R4 | Project state lives entirely in the project folder. | Portable. Git-able. Shareable. |
| R5 | No auth, no accounts, no telemetry, no call-home. | Personal-use product. |