Search endpoint

The integration injects one on-demand route tree (default /api/ask). The base route serves the overlay: keyword mode returns JSON; agentic mode streams a grounded answer as Server-Sent Events (text/event-stream). Keyless sub-routes expose the committed knowledge graph for CLIs, MCP servers, and generated clients.

The OpenAPI 3.1 contract is published at /openapi.yaml.

Suggested questions (GET)

GET /api/ask returns the knowledge graph’s baked-in suggestions and the loop model — no query, no model call:

{
  "suggestions": ["How does the knowledge graph stay fresh?"],
  "model": "claude-haiku-4-5"
}

The overlay fetches this once on first open (when AI is on) to populate its suggested questions. An empty suggestions array — a graph without them, or no graph at all — just means the overlay shows none.
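A client can handle "no suggestions" and "no graph" the same way. The following is a minimal sketch of that normalization, assuming the response shape shown above; `readSuggestions` is a hypothetical helper name, not part of the integration.

```typescript
// Shape of the GET /api/ask body as documented above.
interface SuggestionsResponse {
  suggestions: string[];
  model: string;
}

// Hypothetical helper: returns the suggestions to display, or [] when the
// body carries none (a graph without suggestions, or no graph at all).
function readSuggestions(body: unknown): string[] {
  if (typeof body !== "object" || body === null) return [];
  const suggestions = (body as Partial<SuggestionsResponse>).suggestions;
  if (!Array.isArray(suggestions)) return [];
  // Keep only string entries so a malformed body cannot break rendering.
  return suggestions.filter((s): s is string => typeof s === "string");
}
```

An empty array from this helper maps directly to "show no suggested questions".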

Knowledge graph reads (GET)

These routes read virtual:hev-ask/kg, never call a model, and never require an API key:

| Route | Response |
| --- | --- |
| GET /api/ask/glossary | { "terms": GlossaryEntry[] } |
| GET /api/ask/glossary/{term} | one GlossaryEntry, matched by term or alias |
| GET /api/ask/sections | { "sections": SectionSummary[] } |
| GET /api/ask/sections?group=API | section summaries filtered by group |
| GET /api/ask/sections/{id} | one full KnowledgeNode |
| GET /api/ask/overview | { "overview": string, "context": string } |

A SectionSummary is the lightweight shape { id, title, heading, group, url }. For section IDs that contain / or #, URL-encode the ID when placing it in the path, for example /api/ask/sections/api%2Fcli%23flags.
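The encoding rule above can be sketched with the standard `encodeURIComponent` function. `sectionPath` is a hypothetical helper, and the `/api/ask` default is the documented default route, which your configuration may override.

```typescript
// Hypothetical helper: builds the read route for one section.
// encodeURIComponent escapes "/" as %2F and "#" as %23, so the ID
// stays a single path segment.
function sectionPath(id: string, base: string = "/api/ask"): string {
  return `${base}/sections/${encodeURIComponent(id)}`;
}

// Example from the text above:
// sectionPath("api/cli#flags") yields "/api/ask/sections/api%2Fcli%23flags"
```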

Missing glossary terms, section IDs, or unknown read routes return 404 with a JSON error:

{ "error": "Not found." }

Request

POST with a JSON body:

{
  "query": "how does autoscaling work",
  "mode": "agentic"
}

| Field | Type | Description |
| --- | --- | --- |
| query | string | The search query. Empty or whitespace returns an empty result set. |
| mode | 'keyword' \| 'agentic' | Optional. keyword forces the instant path; agentic requests the loop. Omitted behaves like agentic when a key is present. |
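As an illustration, the request could be assembled as below. This is a sketch under assumptions: `buildAskRequest` and `AskRequestInit` are hypothetical names, and the `content-type` header is the conventional choice for a JSON body rather than something this page specifies.

```typescript
type AskMode = "keyword" | "agentic";

// Minimal request description, shaped so it can be passed to fetch().
interface AskRequestInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: serializes the documented fields. `mode` is left
// out of the body when omitted so the endpoint applies its own default.
function buildAskRequest(query: string, mode?: AskMode): AskRequestInit {
  const payload: { query: string; mode?: AskMode } = { query };
  if (mode !== undefined) payload.mode = mode;
  return {
    method: "POST",
    // Assumption: the usual JSON content type is expected.
    headers: { "content-type": "application/json" },
    body: JSON.stringify(payload),
  };
}
```

In a browser or a recent Node.js runtime, the result can be passed as the second argument to `fetch("/api/ask", ...)`.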

Keyword response (JSON)

Keyword mode returns a 200 JSON envelope:

{
  "results": [
    {
      "title": "Concepts",
      "heading": "The agentic search loop",
      "url": "/docs/concepts#the-agentic-search-loop",
      "group": "Overview",
      "snippet": "When the reader presses Enter, the query goes to a bounded loop…"
    }
  ],
  "query": "how does agentic search work",
  "model": "claude-haiku-4-5",
  "mode": "keyword"
}

| Field | Type | Description |
| --- | --- | --- |
| results | Result[] | Ranked keyword matches (title, heading?, url, group?, snippet). |
| query | string | Echoed back. |
| model | string | The configured loop model. |
| mode | 'keyword' | The mode that ran. |
| warning | string? | Present when agentic was requested but no key is configured (downgrade). |

The url field carries the deep link: the page URL with a #anchor appended for a section. Only a document’s intro chunk has no anchor.
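The envelope and the deep-link rule can be written down as types plus a small helper. This is a sketch based on the fields listed above; `splitDeepLink` is a hypothetical name, and treating "no # in the URL" as the intro chunk is my reading of the rule, not a documented API.

```typescript
// One ranked match, per the field list above.
interface Result {
  title: string;
  heading?: string;
  url: string;
  group?: string;
  snippet: string;
}

// The keyword-mode JSON envelope.
interface KeywordResponse {
  results: Result[];
  query: string;
  model: string;
  mode: "keyword";
  warning?: string;
}

// Hypothetical helper: separates the page URL from the section anchor.
// A URL without "#" is taken to be a document's intro chunk.
function splitDeepLink(url: string): { page: string; anchor?: string } {
  const hash = url.indexOf("#");
  if (hash === -1) return { page: url };
  return { page: url.slice(0, hash), anchor: url.slice(hash + 1) };
}
```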

Agentic response (SSE)

When a key is present and mode is agentic, the endpoint responds with content-type: text/event-stream and streams the answer as it is generated. Each event is a named SSE frame:

event: search
data: {"query":"autoscaling"}

event: sources
data: {"sources":[{"title":"Core Concepts","heading":"Kubernetes autoscaling","url":"/docs/concepts#kubernetes-autoscaling","group":"Overview"}],"model":"claude-haiku-4-5","mode":"agentic"}

event: token
data: {"text":"Autoscaling scales workers based on "}

event: token
data: {"text":"lag signals. See [autoscaling](/docs/concepts#kubernetes-autoscaling)."}

event: done
data: {}

| Event | Data | Meaning |
| --- | --- | --- |
| search | { query } | Context the model gathered: a search sub-query, or the heading of a section it opened. May fire several times. |
| sources | { sources: Source[], model, mode } | The grounding source set, sent once before any token. Clients validate answer links against it. |
| token | { text } | One delta of the streamed Markdown answer. |
| done | {} | The stream is complete. |
| error | { error } | A failure that occurred after streaming began (HTTP status is already 200). |
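Because a POST response cannot be consumed with the browser's `EventSource`, a client typically reads the body as text and splits the frames itself. The sketch below shows one way to do that for frames shaped like the example above (one `event:` line, JSON in `data:`). `parseFrames` and `AskEvent` are hypothetical names, and this is not a complete SSE parser: it ignores `id:`, `retry:`, and comment lines.

```typescript
interface AskEvent {
  event: string;
  data: unknown;
}

// Hypothetical helper: parses every complete frame in `buffer` and returns
// the unparsed remainder, which the caller prepends to the next chunk.
function parseFrames(buffer: string): { events: AskEvent[]; rest: string } {
  const events: AskEvent[] = [];
  // Frames are separated by a blank line; the last piece may be incomplete.
  const parts = buffer.split(/\r?\n\r?\n/);
  const rest = parts.pop() ?? "";
  for (const part of parts) {
    let event = "message"; // SSE default when no event name is given
    const dataLines: string[] = [];
    for (const line of part.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
    }
    if (dataLines.length > 0) {
      events.push({ event, data: JSON.parse(dataLines.join("\n")) });
    }
  }
  return { events, rest };
}
```

A consumer would append `token` text to the visible answer, stop on `done`, and surface `error` events as failures even though the HTTP status was 200.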

A Source is { title, heading?, url, group? } — note there is no snippet; the answer prose carries the substance, and links point at url.
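One way a client might validate answer links against the source set is sketched below. It assumes links appear as inline Markdown `[text](url)` and that a link is valid only when its URL exactly equals a source `url`; both are assumptions of this sketch, and `unlistedLinks` is a hypothetical name.

```typescript
interface Source {
  title: string;
  heading?: string;
  url: string;
  group?: string;
}

// Hypothetical helper: returns every link target in the Markdown answer
// that does not appear in the grounding source set.
function unlistedLinks(answer: string, sources: Source[]): string[] {
  const allowed = new Set(sources.map((s) => s.url));
  // Matches inline Markdown links and captures the URL part.
  const linkPattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
  const missing: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(answer)) !== null) {
    const url = match[1];
    if (url !== undefined && !allowed.has(url)) missing.push(url);
  }
  return missing;
}
```

A renderer could then drop or de-link any URL this function returns.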

Mode selection

The endpoint decides what to run:

  • Empty query → { results: [], query: "", model, mode: "keyword" } (JSON).
  • mode: "keyword", or no API key → keyword JSON, mode: "keyword".
  • mode: "agentic" but no key → keyword JSON plus a warning, and mode: "keyword".
  • otherwise → the agentic SSE stream.
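The decision list above can be expressed as a small pure function. This is an illustrative sketch, not the endpoint's actual code; `selectMode` and `Decision` are hypothetical names. The downgrade case is checked before the general "no key" case so that the warning is not lost.

```typescript
type Mode = "keyword" | "agentic";

interface Decision {
  transport: "json" | "sse";
  mode: Mode;
  warning: boolean;
}

// Hypothetical helper mirroring the documented rules.
function selectMode(query: string, requested: Mode | undefined, hasKey: boolean): Decision {
  // Empty or whitespace-only query: empty keyword envelope.
  if (query.trim() === "") return { transport: "json", mode: "keyword", warning: false };
  // Agentic requested without a key: keyword JSON plus a warning.
  if (requested === "agentic" && !hasKey) return { transport: "json", mode: "keyword", warning: true };
  // Keyword forced, or no key available: plain keyword JSON.
  if (requested === "keyword" || !hasKey) return { transport: "json", mode: "keyword", warning: false };
  // Otherwise: the agentic SSE stream.
  return { transport: "sse", mode: "agentic", warning: false };
}
```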

Errors

| Status | Body | Cause |
| --- | --- | --- |
| 400 | { "error": "Invalid JSON body." } | The request body wasn’t valid JSON. |
| 404 | { "error": "…" } | A knowledge-graph read route, glossary term, or section ID wasn’t found. |
| 500 | { "error": "…" } | The chunk index failed to build (e.g. a misconfigured collection). |
| | event: error | A failure during the agentic stream. The HTTP status is already 200, so errors arrive as a final SSE error event rather than a status code. |

The API key

The endpoint resolves ANTHROPIC_API_KEY from, in order: the adapter runtime env (locals.runtime.env, e.g. Cloudflare), process.env, then import.meta.env. Set it wherever your host injects server secrets; it is never sent to the browser.
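The lookup order can be pictured as a first-match scan over the three environments. The sketch below is a model of that order only: `resolveApiKey` is a hypothetical name, the environments are passed in as plain objects, and treating an empty string as "not set" is an assumption.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the first non-empty ANTHROPIC_API_KEY found,
// checking the adapter runtime env, then process.env, then import.meta.env.
function resolveApiKey(
  runtimeEnv: Env | undefined,
  processEnv: Env | undefined,
  importMetaEnv: Env | undefined,
): string | undefined {
  for (const env of [runtimeEnv, processEnv, importMetaEnv]) {
    const value = env?.["ANTHROPIC_API_KEY"];
    if (value) return value; // assumption: empty strings count as unset
  }
  return undefined;
}
```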

LLM tracing

Set POSTHOG_KEY (or POSTHOG_API_KEY) in the same environment and every agentic answer emits a PostHog $ai_generation trace — model, tokens, latency, and the loop’s tool calls. POSTHOG_HOST overrides the US-cloud ingestion host, and POSTHOG_CAPTURE_CONTENT (off | redacted | full, default full) controls how much prompt and answer text ships with each event. No key → no-op; the answer path never depends on telemetry.
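The variables above could be read into a single config object along these lines. This is a sketch with stated assumptions: `resolveTracing` and `TracingConfig` are hypothetical names, POSTHOG_KEY is assumed to win when both key variables are set, and an unrecognized POSTHOG_CAPTURE_CONTENT value is assumed to fall back to the default. The default host is left unset here rather than guessed.

```typescript
type Env = Record<string, string | undefined>;
type CaptureContent = "off" | "redacted" | "full";

interface TracingConfig {
  key: string;
  host?: string; // only set when POSTHOG_HOST overrides the default
  captureContent: CaptureContent;
}

// Hypothetical helper: returns undefined when no key is set, which the
// caller treats as "tracing is a no-op".
function resolveTracing(env: Env): TracingConfig | undefined {
  // Assumption: POSTHOG_KEY takes precedence over POSTHOG_API_KEY.
  const key = env["POSTHOG_KEY"] || env["POSTHOG_API_KEY"];
  if (!key) return undefined;
  const raw = env["POSTHOG_CAPTURE_CONTENT"];
  // Assumption: anything other than "off" or "redacted" means the default, "full".
  const captureContent: CaptureContent = raw === "off" || raw === "redacted" ? raw : "full";
  const config: TracingConfig = { key, captureContent };
  const host = env["POSTHOG_HOST"];
  if (host) config.host = host;
  return config;
}
```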

Index lifecycle

The chunk index is built once per server instance on the first request and cached for the process lifetime. On the first request the endpoint also compares the live content hash against the knowledge graph’s hash and logs a one-time warning if they differ — your cue to run ask kg build.
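The build-once, warn-once behavior follows a common lazy-initialization pattern, sketched below. It illustrates the pattern and is not the integration's code: `createIndexCache` and its parameters are hypothetical, and the warning text is invented for the example.

```typescript
// Hypothetical factory: returns a getter that builds the index on first use,
// caches it for the life of the process, and compares hashes exactly once.
function createIndexCache<T>(
  build: () => T,
  liveHash: () => string,
  graphHash: string,
  warn: (message: string) => void,
): () => T {
  let cached: { index: T } | undefined;
  return () => {
    if (cached === undefined) {
      cached = { index: build() };
      // One-time staleness check, performed only on the first request.
      if (liveHash() !== graphHash) {
        warn("Content hash differs from the knowledge graph; run ask kg build.");
      }
    }
    return cached.index;
  };
}
```

Because the cache lives in a closure, each server instance builds its own index, and a restart triggers a fresh build and a fresh hash check.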
