Architecture overview

   user shell                                       browser
        │                                              │
        ▼                                              ▼
  ┌───────────┐                               ┌─────────────┐
  │ cobra CLI │                               │  dashboard  │  (static, embedded via embed.FS)
  └─────┬─────┘                               └──────┬──────┘
        │                                              │  HTTP (localhost, no auth)
        ▼                                              ▼
  ┌───────────────────────────────────────────────────────────────┐
  │         apps/cli/internal/service/   ←  FUNCTION LAYER          │
  │  (CLI- and HTTP-agnostic Go funcs: Init, Doctor, Build, …)        │
  └───┬──────────────────┬───────────────┬──────────────────────┘
        ▼                  ▼                 ▼
  ┌─────────┐         ┌──────────┐      ┌─────────────┐
  │ wiki +  │         │  skills  │      │ apps/llama  │  (cgo, in-process)
  │ RAG     │         └──────────┘      │   │         │
  └─────────┘                          │   ▼         │
                                       │ libllama.a  │  (statically linked)
                                       └─────────────┘

One binary, one process. Inference needs no subprocess, no port allocation, and no HTTP round-trip per token: llama.cpp is linked statically into the Go binary via cgo. The dashboard assets are baked in via embed.FS.

Layered design

apps/cli/internal/cli/ (cobra commands) — thin presentation layer. Parses flags, builds option structs, calls into service/.
apps/cli/internal/tui/ (Bubble Tea) — interactive prompts (model picker so far). Also a thin layer above service/.
apps/cli/internal/http/ (HTTP handlers) — used by serve. Pure presentation; routes to service/.
apps/cli/internal/service/ — the function layer. All business logic. CLI-agnostic, HTTP-agnostic. Returns errors and values. Never calls os.Exit, never prints to stdout.
Foundation packages — manifest/, project/, wiki/, wikigen/, rag/, skills/, packzip/. Each owns one slice of pure-Go logic.

Why function-layer matters →

Three apps, one binary

apps/cli — the Go binary local-agents. Hosts the embedded HTTP server when serve is run.
apps/llama — cgo bindings to llama.cpp (git submodule, statically linked).
apps/dashboard — Vite + React 18 + TS SPA. Built to dist/ and embedded via embed.FS into the CLI binary.

Deep dive →

Two retrieval modes

At chat time the LLM can call two tools:

search_wiki — structured navigation over articles in wiki/. Deep, deterministic, citation-friendly.
search_rag — vector search over chunks of data/. Fast, fuzzy, fall-through for queries the wiki doesn't cover yet.

The model decides per question. Both modes co-exist instead of competing. Why hybrid →
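
The sketch below shows one way two such tools could sit behind a single dispatch point. Everything in it is hypothetical: the `Hit` and `Tool` types, the `dispatch` and `demoTools` helpers, and especially the matching logic. `searchWiki` here is a plain title lookup and `searchRAG` is a toy word-overlap ranking standing in for real vector search, so treat it as an illustration of the routing, not of the retrieval itself.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Hit is a hypothetical result type shared by both tools.
type Hit struct {
	Source string // article path or chunk ID, usable as a citation
	Text   string
}

// Tool is a hypothetical signature for anything the model can call by name.
type Tool func(query string) []Hit

// searchWiki returns articles whose title appears in the query, in a
// deterministic (sorted) order.
func searchWiki(articles map[string]string) Tool {
	return func(query string) []Hit {
		q := strings.ToLower(query)
		var hits []Hit
		for title, body := range articles {
			if strings.Contains(q, strings.ToLower(title)) {
				hits = append(hits, Hit{Source: "wiki/" + title, Text: body})
			}
		}
		sort.Slice(hits, func(i, j int) bool { return hits[i].Source < hits[j].Source })
		return hits
	}
}

// searchRAG ranks chunks by how many query words they contain. This is a toy
// stand-in for embedding similarity, not real vector search.
func searchRAG(chunks map[string]string) Tool {
	return func(query string) []Hit {
		words := strings.Fields(strings.ToLower(query))
		type scored struct {
			hit   Hit
			score int
		}
		var ranked []scored
		for id, text := range chunks {
			lower := strings.ToLower(text)
			score := 0
			for _, w := range words {
				if strings.Contains(lower, w) {
					score++
				}
			}
			if score > 0 {
				ranked = append(ranked, scored{Hit{Source: "data/" + id, Text: text}, score})
			}
		}
		sort.Slice(ranked, func(i, j int) bool {
			if ranked[i].score != ranked[j].score {
				return ranked[i].score > ranked[j].score
			}
			return ranked[i].hit.Source < ranked[j].hit.Source
		})
		hits := make([]Hit, 0, len(ranked))
		for _, r := range ranked {
			hits = append(hits, r.hit)
		}
		return hits
	}
}

// dispatch routes a tool call chosen by the model to the matching function.
func dispatch(tools map[string]Tool, name, query string) ([]Hit, error) {
	tool, ok := tools[name]
	if !ok {
		return nil, fmt.Errorf("unknown tool %q", name)
	}
	return tool(query), nil
}

// demoTools builds a tiny in-memory registry for the example.
func demoTools() map[string]Tool {
	return map[string]Tool{
		"search_wiki": searchWiki(map[string]string{"billing": "How invoices are generated."}),
		"search_rag": searchRAG(map[string]string{
			"notes-1": "refund policy for annual plans",
			"notes-2": "annual report draft",
		}),
	}
}

func main() {
	tools := demoTools()
	for _, call := range [][2]string{
		{"search_wiki", "how does billing work"},
		{"search_rag", "refund annual"},
	} {
		hits, err := dispatch(tools, call[0], call[1])
		fmt.Println(call[0], hits, err)
	}
}
```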

Where the model lives

Platform	Default path	Env override
linux	`$XDG_CONFIG_HOME/local-agents/` (default `~/.config/local-agents/`)	`$LOCAL_AGENTS_HOME`
darwin	`~/.config/local-agents/`	`$LOCAL_AGENTS_HOME`
windows	`%USERPROFILE%\.config\local-agents\`	`%LOCAL_AGENTS_HOME%`

Layout:

.config/local-agents/
├── model/<id>.gguf       # the chat model
└── embed/<id>.onnx       # the embedding model (bge-small by default)

Machine-wide cache. Same blob serves every project on the host.

Why cgo, not subprocess

We considered running a prebuilt llama-server as a child process. The trade-offs came out in favor of cgo because:

Subprocess management is non-trivial: ports, health checks, crash recovery, version compatibility.
HTTP round-trip per token isn't free at our volumes.
share --self-contained would have to bundle a llama-server binary per OS/arch — multiplying zip size.
One static binary is a strictly better UX: "download one file, run it."

The cost: a cgo build step, a C toolchain on dev machines, and a CI build per OS/arch. All accepted.

Read the specs

The full design lives under docs/specs/ in the repo — browse on GitHub →
