What pocket llm is

pocket llm is a CLI, invoked as local-agents, that turns a folder of documents and shell commands into a queryable AI knowledge base. The output is a portable zip that anyone can chat against with no internet access.

Why it exists

Most teams have scattered knowledge: PDFs in Drive, runbooks in Notion, tickets in Jira, code in GitHub. Existing RAG tools either force you online (OpenAI, Anthropic) or are too heavy to deploy (vector DBs, orchestration frameworks). Karpathy's LLM-wiki idea, an LLM-maintained markdown wiki that compounds over time, is more powerful than RAG alone. pocket llm combines that wiki with classic RAG for fuzzy lookup, and the chat LLM picks the right tool per question.

A team should be able to ship a knowledge base to a stakeholder over Slack and have them ask questions on a plane. That's the product.

Headline moves

  1. Project folder is the unit. local-agents init my-handbook creates a folder. Like cargo new or rails new. The folder is the source, the build artifact, and the share unit.
  2. Three folders, one source of truth. data/ (raw), wiki/ (articles — both generated and human-edited), skills/. No second ownership folder, no shadow rules.
  3. Wiki generation is its own step. wiki generate turns each data/<name> into a wiki/<name>.md article using wiki-instructions.md as the prompt template. build is a cheap packaging step that doesn't run the LLM.
  4. The instructions file is portable. The same wiki-instructions.md can drive the local model or be run by hand through claude-code. Drop the result into wiki/<name>.md; build doesn't care which path produced it.
  5. Wiki + RAG, picked by the LLM via tool calls. At chat time the model calls search_wiki (structured) or search_rag (fuzzy) per question.
  6. SKILL.md is the industry standard. Same skill format Claude Code / OpenAI Codex / Microsoft Agent Framework / Google ADK use. Four-stage progressive disclosure for context efficiency.
  7. Offline share is non-negotiable. build --self-contained bundles the model too. The recipient never goes online.
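
The tool-call routing in move 5 can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the tool names search_wiki and search_rag come from the list above, but the in-memory WIKI and CHUNKS data, the word-overlap scoring (a stand-in for real embedding similarity), and the dispatch function with its tool-call dictionary shape are all assumptions made for the example.

```python
import re
from typing import Callable, Dict, List, Set

# Hypothetical stand-ins for wiki/ articles and chunks of raw data/ files.
WIKI: Dict[str, str] = {
    "onboarding": "# Onboarding\nNew hires get laptop access on day one.",
    "deploys": "# Deploys\nProduction deploys run every Tuesday.",
}
CHUNKS: List[str] = [
    "Rollback procedure: revert the release tag and redeploy.",
    "Laptop requests go through the IT portal.",
]


def tokens(text: str) -> Set[str]:
    """Lowercase word set used by both toy search functions."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_wiki(query: str) -> List[str]:
    """Structured lookup: return articles whose name appears as a word in the query."""
    words = tokens(query)
    return [body for name, body in WIKI.items() if name in words]


def search_rag(query: str, k: int = 1) -> List[str]:
    """Fuzzy lookup: rank chunks by word overlap (assumed stand-in for embeddings)."""
    words = tokens(query)
    ranked = sorted(CHUNKS, key=lambda c: len(words & tokens(c)), reverse=True)
    return ranked[:k]


TOOLS: Dict[str, Callable[..., List[str]]] = {
    "search_wiki": search_wiki,
    "search_rag": search_rag,
}


def dispatch(tool_call: dict) -> List[str]:
    """Run the tool the model asked for; the call shape here is hypothetical."""
    name = tool_call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**tool_call["arguments"])
```

In this sketch the model would emit something like {"name": "search_wiki", "arguments": {"query": "How do deploys work?"}} for a question that maps to a known article, and a search_rag call when the wording is loose and no article name matches.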

Hard requirements

| # | Requirement | Why |
| --- | --- | --- |
| R1 | All inference offline on CPU after install. | "Works on a plane" is the product. |
| R2 | Single Go binary with llama.cpp statically linked. | One file, fewer moving parts. |
| R3 | Every command works interactively (TTY) and non-interactively (flags). | Terminal-friendly, scriptable. |
| R4 | Project state lives entirely in the project folder. | Portable. Git-able. Shareable. |
| R5 | No auth, no accounts, no telemetry, no call-home. | Personal-use product. |
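
One way to read R3 is that each input has a flag, and the command only prompts when no flag was given and a terminal is attached. The sketch below illustrates that pattern under stated assumptions: the --name flag, the init-sketch program name, and the resolve_name helper are hypothetical and are not taken from the local-agents CLI, which is a Go binary.

```python
import argparse
import sys
from typing import Callable, List, Optional


def resolve_name(
    argv: List[str],
    is_tty: Optional[bool] = None,
    ask: Callable[[str], str] = input,
) -> str:
    """Return the project name from a flag, else prompt if a TTY is attached."""
    parser = argparse.ArgumentParser(prog="init-sketch")
    # Hypothetical flag, used only to illustrate the non-interactive path.
    parser.add_argument("--name", help="project folder name")
    args = parser.parse_args(argv)

    if args.name:
        # Non-interactive path: the flag wins and nothing is prompted.
        return args.name

    if is_tty is None:
        is_tty = sys.stdin.isatty()
    if is_tty:
        # Interactive path: ask the user at the terminal.
        return ask("Project name: ").strip()

    # No flag and no terminal: fail loudly rather than block on input.
    raise SystemExit("error: --name is required when stdin is not a TTY")
```

Failing fast in the last branch keeps scripts and CI jobs from hanging on a prompt nobody can answer, which is the usual reason to pair every interactive question with an equivalent flag.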

Full TRD on GitHub →

pocket llm — local-first, offline, no telemetry. MIT licensed.