Mirrors kimi-cli's refresh_managed_models behavior: at login (and on every
token refresh) the plugin now GETs /coding/v1/models with the user's JWT,
caches the first returned {id, context_length, display_name} in auth.json,
and rewrites the wire 'model' body field to the cached id inside
loader.fetch. K2.5 accounts (server returns e.g. 'k2p5') and K2.6 accounts
(server returns 'kimi-for-coding') now share identical opencode config.
After successful login, the authorize callback prints a ready-to-paste
provider config block with the discovered values filled in.
- src/oauth.ts: added listModels() (GET /coding/v1/models).
- src/index.ts: OAuthAuth extended with model_id/context_length/model_display;
ensureFresh() refetches on refresh; loader.fetch rewrites JSON body
'model' when cached id differs from MODEL_ID; authorize callback runs
listModels() + console.logs a ready-to-paste config block.
- README.md: drop hardcoded context/output limits and K2.6-specific name;
explain auto-discovery and the K2.5/K2.6 alias mechanism.
- AGENTS.md: rule 6 rewritten to describe the wire-rewrite contract.
- test/plugin.test.ts: +3 tests (discovery on refresh persists metadata,
graceful /models failure, body.model rewrite K2.5 + K2.6 no-op);
existing 401-retry and refresh tests updated for the extra /models call.
Tested live against a real K2.6 account — listModels() returns
[{id: 'kimi-for-coding', context_length: 262144, ...}].
AGENTS.md — working notes for coding agents (and humans)
This file is the single source of truth for any AI agent (or human) modifying this repo. Read it top-to-bottom before touching code. If something you learn here contradicts what you see in the code, the code wins — update this file in the same commit.
User-facing install / usage documentation lives in README.md. Do not duplicate it here.
Purpose
One plugin, one job: make opencode talk to Kimi's kimi-for-coding endpoint exactly the way the official kimi-cli does. Everything in this repo exists to minimize drift from upstream kimi-cli.
The one rule that matters
Moonshot's backend picks the model (K2.5 vs K2.6) from the auth token type, not the model-name string.
- Static `sk-kimi-...` API key → K2.5.
- OAuth JWT with `scope: kimi-code` → K2.6.
Every design decision here follows from that: we do device-flow OAuth, we do not accept API keys, we do not let the upstream SDK attach its own Authorization header.
Non-goals
- No support for K2.5 or any non-`kimi-for-coding` model. opencode already handles those via Moonshot / Baseten / Alibaba-CN / etc.
- No support for static API keys. Users who want that can use a different opencode provider entry.
- No custom SSE parser, tool-call normalizer, or message rewriter. `@ai-sdk/openai-compatible` already does SSE/`reasoning_content` correctly.
Architecture
Four files, one job each. Do not add a fifth unless the existing four genuinely can't hold a new concern.
| File | Responsibility |
|---|---|
| `src/constants.ts` | Pinned strings that must mirror upstream kimi-cli (version, endpoints, client id, scope). |
| `src/headers.ts` | The seven `X-Msh-*` / UA headers + the persistent `~/.kimi/device_id` file. |
| `src/oauth.ts` | Device-code start, device-code poll, refresh-token exchange, and `GET /coding/v1/models` discovery. |
| `src/index.ts` | Plugin entry. Wires `auth` hook (login + loader) and `chat.params` hook. |
Data flow on a chat request:
1. opencode asks the `@ai-sdk/openai-compatible` provider for a language model.
2. Before instantiating it, opencode calls our `auth.loader`. We return `{ apiKey, fetch }`.
3. The SDK uses our `fetch` for every HTTP call (models, chat, whatever).
4. Our `fetch` calls `ensureFresh()` → maybe refreshes → sets `Authorization` + the seven `X-Msh-*` headers → on 401 refreshes once and retries.
5. Separately, opencode runs the `chat.params` hook and writes `thinking`, `reasoning_effort`, `prompt_cache_key` into `output.options`. opencode wraps those as `{ [providerID]: options }` and the openai-compatible SDK forwards them as top-level body fields. That is why those keys must use exactly the wire names (`prompt_cache_key`, `reasoning_effort`, `thinking`).
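The fetch path described in the data flow above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the plugin's actual code: `makeAuthedFetch`, `TokenSource`, `forceRefresh`, and the simplified `Init` / `Res` shapes are hypothetical names, `identityHeaders` stands in for the seven `X-Msh-*` / UA headers, and the `Bearer` scheme is assumed.

```typescript
// Hypothetical minimal shapes; the real plugin works with the platform fetch types.
type HeaderMap = Record<string, string>;
type Init = { method?: string; headers?: HeaderMap; body?: string };
type Res = { status: number };
type FetchLike = (url: string, init: Init) => Promise<Res>;

interface TokenSource {
  // Returns a usable access token, refreshing first if it is near expiry.
  ensureFresh(): Promise<string>;
  // Refreshes regardless of expiry and returns the new access token.
  forceRefresh(): Promise<string>;
}

function makeAuthedFetch(
  inner: FetchLike,
  tokens: TokenSource,
  identityHeaders: HeaderMap, // assumption: stands in for the X-Msh-* / UA set
): FetchLike {
  const withAuth = (init: Init, token: string): Init => {
    const headers: HeaderMap = {};
    // Drop any Authorization header set by the SDK or opencode (any casing).
    for (const [key, value] of Object.entries(init.headers ?? {})) {
      if (key.toLowerCase() !== "authorization") headers[key] = value;
    }
    Object.assign(headers, identityHeaders);
    headers["Authorization"] = `Bearer ${token}`; // assumption: bearer scheme
    return { ...init, headers };
  };

  return async (url, init) => {
    const first = await inner(url, withAuth(init, await tokens.ensureFresh()));
    if (first.status !== 401) return first;
    // One refresh, one retry; a second 401 goes back to the caller as-is.
    return inner(url, withAuth(init, await tokens.forceRefresh()));
  };
}
```

Keeping the retry to a single attempt means a genuinely revoked token surfaces as a 401 to the caller instead of looping.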
Contracts to keep intact
These are the invariants that, if broken, silently degrade K2.6 → K2.5 or produce fingerprint-based throttling. Do not "clean them up" without reading the linked upstream.
1. `X-Msh-Version` and `User-Agent` must track `kimi-cli`. Bumping involves exactly one line in `src/constants.ts`. See upstream `research/kimi-cli/src/kimi_cli/constant.py`. The UA prefix is `KimiCLI/` (not `KimiCodeCLI/`): Moonshot's `kimi-for-coding` backend 403s with `access_terminated_error: only available for Coding Agents such as Kimi CLI, Claude Code, Roo Code…` on any other prefix. Likewise, `X-Msh-Device-Model` is `"{system} {release} {machine}"` (e.g. `Linux 7.0.0 x86_64`), NOT just `{arch}`, and `X-Msh-Os-Version` is the kernel build string from `os.version()`, NOT `"{type} {release}"`. Tested live against `api.kimi.com/coding/v1` on 2026-04-17: any of those three fields off-spec → 403.
2. `X-Msh-Device-Id` must be stable across runs. Never regenerate a fresh UUID at import time. `getDeviceId()` reads/writes `~/.kimi/device_id`; that path is shared with `kimi-cli` on purpose.
3. `Authorization` header is owned by `loader.fetch`. Anything else (opencode core, the SDK, future hooks) must be overridden. Our `loader` deletes both `authorization` and `Authorization` before setting its own.
4. Effort ↔ fields mapping (kimi-cli `llm.py` / `kosong/chat_provider/kimi.py`):

   | Effort | `reasoning_effort` | `thinking` |
   |---|---|---|
   | `auto` | (omitted) | (omitted) |
   | `off` | (omitted) | `{type:"disabled"}` |
   | `low` | `"low"` | `{type:"enabled"}` |
   | `medium` | `"medium"` | `{type:"enabled"}` |
   | `high` | `"high"` | `{type:"enabled"}` |

   `auto` is the "let the server decide dynamically" variant: neither field is sent, matching kimi-cli's "nothing passed" default. When no effort is set at all, the plugin still emits `thinking: {type: "enabled"}` because the model is a reasoner.

   Gate the hook on `input.model.providerID`, NOT `input.provider.info.id`. The `@opencode-ai/plugin` `ProviderContext` type claims `.info.id` exists, but the runtime shape opencode passes (see `research/opencode/packages/opencode/src/session/llm.ts::stream`, ~line 168, `provider: item`) is the flat `ProviderConfig` (`.id`). `input.model.providerID` is what every first-party plugin uses (`cloudflare.ts`, `codex.ts`, `github-copilot/copilot.ts`) and it avoids the runtime crash "undefined is not an object (evaluating 'input.provider.info.id')". Tested live 2026-04-17.
5. `prompt_cache_key` only for `kimi-for-coding`. Never attach it to unrelated models. The check is `input.model.id === MODEL_ID` in `chat.params`.
6. Wire model id comes from `/coding/v1/models`, not from user config. The opencode-side model id is a stable alias (`MODEL_ID = "kimi-for-coding"`); the plugin calls `GET /coding/v1/models` at login and on every token refresh (mirroring kimi-cli's `refresh_managed_models` in `research/kimi-cli/src/kimi_cli/auth/platforms.py`), caches the first returned `{id, context_length, display_name}` in `auth.json` as `model_id` / `context_length` / `model_display`, and rewrites the JSON body `model` field inside `loader.fetch` whenever the cached id differs from `MODEL_ID` (the K2.5 case, where the server may return `k2p5` instead). K2.6 accounts see `id: "kimi-for-coding"` and the rewrite is a no-op. Do not strip the `kimi-` prefix; send whatever the server returned. Discovery failures are non-fatal (stale cached id still works; 401 retry flushes broken tokens).
7. Auth store is opencode's, not kimi-cli's. We use `client.auth.get/set` against the `kimi-for-coding-oauth` provider id. Do not read/write `~/.kimi/credentials/kimi-code.json`; that's kimi-cli's file and sharing it across independent apps causes token-race bugs.
8. Provider id must not collide with any id in the models.dev catalog. models.dev publishes `kimi-for-coding` (static `KIMI_API_KEY` → `@ai-sdk/anthropic` → K2.5). If we registered under that same id, `opencode auth login kimi-for-coding` would surface two methods under one entry and users picking the API-key one would silently land on K2.5. We deliberately use `kimi-for-coding-oauth` instead; `MODEL_ID` on the wire stays `kimi-for-coding` (rule 6).
9. `src/index.ts` must have exactly one export: the default plugin function. opencode's plugin loader (`research/opencode/packages/opencode/src/plugin/index.ts` → `getLegacyPlugins`) iterates every export of the plugin module and throws `Plugin export is not a function` if any named export is not callable. The failure mode is silent in the CLI (the provider just doesn't appear in `opencode auth login`); the error only surfaces in `~/.local/share/opencode/log/*.log`. Keep constants in `src/constants.ts` and import them in `src/index.ts` rather than re-exporting. `test/exports.test.ts` guards this.
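Two of the contracts above (the effort mapping in rule 4 and the wire-model rewrite in rule 6) reduce to small pure functions. The sketch below is illustrative only: `effortToFields` and `rewriteModel` are hypothetical names, and the real plugin may structure this differently. In particular, treating a non-JSON body as pass-through is an assumption.

```typescript
type Effort = "auto" | "off" | "low" | "medium" | "high";
type Thinking = { type: "enabled" | "disabled" };
type EffortFields = {
  reasoning_effort?: "low" | "medium" | "high";
  thinking?: Thinking;
};

// Rule 4: translate an effort setting into the wire fields from the table.
// `undefined` means "no effort configured", which still enables thinking.
function effortToFields(effort: Effort | undefined): EffortFields {
  switch (effort) {
    case undefined:
      return { thinking: { type: "enabled" } };
    case "auto":
      return {}; // neither field is sent; the server decides
    case "off":
      return { thinking: { type: "disabled" } };
    default:
      return { reasoning_effort: effort, thinking: { type: "enabled" } };
  }
}

const MODEL_ID = "kimi-for-coding";

// Rule 6: replace the body's `model` with the server-discovered id.
// Assumption: bodies that are not JSON objects pass through untouched.
function rewriteModel(body: string, cachedId: string | undefined): string {
  if (!cachedId || cachedId === MODEL_ID) return body; // K2.6 case: no-op
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return body;
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return body;
  }
  const obj = parsed as Record<string, unknown>;
  if (typeof obj["model"] !== "string") return body;
  // Send the discovered id verbatim; no prefix stripping.
  return JSON.stringify({ ...obj, model: cachedId });
}
```

For a K2.5 account whose cached id is `k2p5`, a body of `{"model":"kimi-for-coding","stream":true}` would go out as `{"model":"k2p5","stream":true}`; with no cached id, or a cached id equal to `MODEL_ID`, the body is returned unchanged.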
Working on this repo
- Code style: see `tsconfig.json` (strict, `noUncheckedIndexedAccess`, ES2022). Prefer small pure functions, avoid `try/catch` except where we genuinely convert one error shape to another.
- Comments: match the existing density and only explain non-obvious upstream-parity reasoning. Do not narrate the obvious ("// refresh the token"); instead reference upstream files when the reasoning is "because kimi-cli does it that way".
- Dependencies: runtime deps stay at zero. The only dev/peer dep is `@opencode-ai/plugin` for types.
- Git commits: small, logical, imperative subject ("Add oauth device flow"). Do not add a `Co-authored-by` trailer.
- Upstream research: the `research/` directory is a read-only git-ignored pair of shallow clones (opencode + kimi-cli) for grep. Never edit files there; re-clone if you suspect drift. When citing upstream in a comment, use the `research/…` path so the reference is resolvable.
- Version bumps: when kimi-cli bumps, (1) pull a fresh `research/kimi-cli`, (2) update `KIMI_CLI_VERSION` in `src/constants.ts`, (3) re-diff `_kimi_default_headers()` / `oauth.py` against `src/headers.ts` and `src/oauth.ts`, (4) smoke-test with `opencode auth login kimi-for-coding-oauth` and a one-turn chat, (5) tag release.
- Tests: `test/` holds one file per source file plus `test/exports.test.ts` (the rule-9 guard). Tests mock `fetch` via `test/_util/fetchMock.ts`; no real credentials or network. They use the real `~/.kimi/device_id` on purpose: it is shared with kimi-cli by design and `getDeviceId` is idempotent, so tests don't clobber state. When adding a new contract to the list above, add the matching offline check to the corresponding test file rather than creating new ones.
What not to do
- ❌ Don't add heuristics that look at the model id outside of `chat.params`. The `auth.loader` fetch is already scoped to this provider; the only place that needs to match on `kimi-for-coding` is the params hook.
- ❌ Don't rename the provider id back to `kimi-for-coding` or to anything else listed in models.dev. See rule 8.
- ❌ Don't add new header values that kimi-cli doesn't send. The fingerprint matters.
- ❌ Don't call out to other files to "share" the kimi-cli credentials. Different OAuth consumers must have independent refresh-token chains or one will invalidate the other.
- ❌ Don't introduce a build step. The plugin ships as `.ts` and opencode's bun-based loader handles it.
- ❌ Don't add tests that require real Kimi credentials and check them in. If you add offline unit tests, put them under `test/` and mock `fetch`.
- ❌ Don't add named exports to `src/index.ts`. See rule 9.
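The single-export rule lends itself to a mechanical check. The helper below is a hypothetical sketch of such a guard, not the contents of `test/exports.test.ts`: `checkPluginModule` is an invented name, and it inspects a module namespace object passed in by the caller.

```typescript
// Returns a list of problems; an empty list means the module shape is safe
// for a loader that iterates every export and expects callables.
function checkPluginModule(mod: Record<string, unknown>): string[] {
  const problems: string[] = [];
  if (typeof mod["default"] !== "function") {
    problems.push("default export is not a function");
  }
  for (const name of Object.keys(mod)) {
    if (name !== "default") problems.push(`unexpected named export: ${name}`);
  }
  return problems;
}
```

In a test this would typically be fed the result of `await import("../src/index.ts")` and asserted to return an empty array.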
How to verify a change
Offline:
bunx tsc --noEmit # type-check
bun build --target=node --no-bundle src/index.ts # syntax check
bun test # offline unit tests
Online (requires a real Kimi-for-coding account):
1. `cd ~/.opencode && bun add /path/to/this/repo`
2. Paste the provider block from `README.md` into your opencode config.
3. `opencode auth login kimi-for-coding-oauth`: confirm a token lands in opencode's `auth.json` with `type: "oauth"`, a JWT `access`, and `expires` ~15 min in the future.
4. Start opencode, select `kimi-for-coding-oauth/kimi-for-coding`, and ask the model to self-identify. It should claim to be K2.6 / `kimi-for-coding`.
5. Confirm `reasoning_content` deltas render as thinking content (not assistant text).
6. In a second turn of the same session, confirm the response comes back faster (cache hit via `prompt_cache_key`).
If any of 3–6 fails, diff research/kimi-cli against the contracts above.
House rules for AI agents
- Read this file first. Every time.
- Don't grow the dependency footprint to "simplify" something; this plugin's value is being small and audit-able.
- When in doubt, mirror kimi-cli exactly, then comment the upstream reference. "We used to deviate, it broke" — document it here.
- Keep `README.md` user-focused and this file contributor-focused. If you catch yourself duplicating, move content here and link from the README.
- Any new rule you add here must have a real incident or a grep-verified upstream source behind it. No speculative "best practices".