Backfill Kimi image input capability

2026-06-02 06:14:16 +02:00 · 2026-04-23 01:28:44 -04:00
parent a40a340dd2
commit 5efc6c39a3
5 changed files with 195 additions and 22 deletions
@@ -69,6 +69,7 @@ These are the invariants that, if broken, silently route requests onto the wrong
 9. **`src/index.ts` must have exactly one export — the default `PluginModule` object `{ id, server }`.** opencode's plugin loader (`research/opencode/packages/opencode/src/plugin/index.ts`) first tries `readV1Plugin` (detect mode) on the default export. If it finds an object with `server` (and optional `id`), it uses the v1 path directly. The older legacy path (`getLegacyPlugins`) iterates every export and throws `Plugin export is not a function` on any non-callable value — a problem that surfaced on Windows where Bun's standalone-binary dynamic imports can produce module namespace objects with unexpected non-function metadata. The v1 format bypasses `getLegacyPlugins` entirely. Keep constants in `src/constants.ts` and import them in `src/index.ts` rather than re-exporting. `test/exports.test.ts` guards this. The failure mode of a broken export is silent in the CLI (the provider just doesn't appear in `opencode auth login`); the error only surfaces in `~/.local/share/opencode/log/*.log`.
 10. **The post-login config hint must not emit a partial `limit` object.** opencode's live config schema at `https://opencode.ai/config.json` requires both `limit.context` and `limit.output` whenever `limit` is present, while Kimi's `GET /coding/v1/models` only gives us `context_length`. Therefore `buildConfigBlock()` omits `limit` entirely and leaves `provider.models` to backfill `limit.context` at runtime. Do not invent `output` or set `input` heuristically; opencode's overflow logic treats `limit.input` as authoritative (`research/opencode/packages/opencode/src/session/overflow.ts`).
 11. **Concurrent refreshes must collapse to one in-flight OAuth exchange, even across plugin instances.** `provider.models` and `auth.loader` can both notice an expiring token at about the same time, and separate opencode workspace/plugin instances can inherit stale auth snapshots. `refreshAuth()` in `src/index.ts` therefore shares one promise across overlapping callers, takes a provider-scoped auth-store lock before refreshing, re-reads opencode's live auth-store entry under that lock, and treats a changed on-disk token chain as authoritative. `test/plugin.test.ts` covers loader-vs-loader, provider.models-vs-loader, cross-instance lock reuse, and the `invalid_grant` self-heal path where another process already rotated the refresh token.
+12. **Image-input capability must be backfilled from `/coding/v1/models`.** `supports_image_in` from Kimi discovery is not cosmetic metadata: opencode's provider transform (`research/opencode/packages/opencode/src/provider/transform.ts::unsupportedParts`) rewrites every image part into local `ERROR: Cannot read ... (this model does not support image input)` text before the request reaches our loader when `capabilities.input.image` is false. Therefore `provider.models` must patch runtime model metadata for `kimi-for-coding`, and `buildConfigBlock()` must include `attachment: true` plus `modalities.input = ["text","image"]` / `modalities.output = ["text"]` when discovery says images are supported. `test/plugin.test.ts` covers both paths.

 ### Working on this repo

@@ -9,7 +9,7 @@ Compared with stock opencode Kimi setups, this plugin:
 - sends the same `User-Agent` / `X-Msh-*` fingerprint headers as `kimi-cli`
 - reuses `~/.kimi/device_id` for `X-Msh-Device-Id`
 - adds `prompt_cache_key`, `thinking`, and `reasoning_effort` for `kimi-for-coding` requests
- discovers the authoritative wire model slug, API display name, and context length from `/coding/v1/models`
+- discovers the authoritative wire model slug, API display name, context length, and image-input capability from `/coding/v1/models`
 - keeps tokens in opencode's auth store while mirroring `kimi-cli`'s refresh / retry behavior

 That is the value of using this plugin instead of a plain opencode provider entry: it preserves the Kimi-only OAuth path, fingerprint, and request extensions that the generic route does not.
@@ -123,7 +123,7 @@ During login the plugin:

 - shows a verification URL and user code
 - stores the OAuth token in opencode's auth store
- discovers the exact model slug, display name, and context length your account should send to Kimi
+- discovers the exact model slug, display name, context length, and image-input capability your account should send to Kimi
 - prints a config hint that uses the discovered display name and leaves context backfill to runtime metadata discovery

 Access tokens refresh automatically while you use the model.
@@ -152,6 +152,7 @@ Fastest fix:
 <summary><strong>Login and refresh details</strong></summary>

 - The plugin queries `/coding/v1/models` during login so it can discover the current wire model id and context length for your account.
+- The plugin also uses that discovery response to backfill image-input support into opencode's runtime model metadata, so pasted or dropped images reach Kimi instead of being downgraded into local error text.
 - The printed config hint intentionally omits `limit`, because opencode requires both `limit.context` and `limit.output`, while Kimi's models endpoint only exposes `context_length`.
 - Model discovery runs again on every token refresh, and a fresh loader instance can re-query `/coding/v1/models` on first use if it needs the current wire model id.
 - On a `401`, the loader refreshes the access token once and retries the request once.
@@ -1,6 +1,6 @@
 {
  "name": "opencode-kimi-full",
-  "version": "1.2.8",
+  "version": "1.2.9",
  "description": "OpenCode plugin that brings the official Kimi OAuth device flow and Kimi-specific coding request fields to opencode, matching upstream kimi-cli.",
  "license": "MIT",
  "repository": {
@@ -26,6 +26,7 @@ type ModelDiscovery = {
  model_id?: string
  context_length?: number
  model_display?: string
+  supports_image_in?: boolean
 }

 type ThinkingType = "enabled" | "disabled"
@@ -38,9 +39,20 @@ type KimiBodyFields = {

 type ModelWithDiscoveryMetadata = {
  name?: string
+  attachment?: boolean
  limit?: {
    context?: number
  }
+  modalities?: {
+    input?: string[]
+    output?: string[]
+  }
+  capabilities?: {
+    attachment?: boolean
+    input?: {
+      image?: boolean
+    }
+  }
 }

 type KimiHookInput = {
@@ -252,6 +264,7 @@ function pickModelInfo(models: KimiModelInfo[]): ModelDiscovery {
    model_id: picked.id,
    context_length: picked.context_length,
    model_display: picked.display_name,
+    supports_image_in: picked.supports_image_in,
  }
 }

@@ -275,10 +288,89 @@ function withDiscoveredDisplayName<T extends ModelWithDiscoveryMetadata>(model:
  }
 }

+function sameStrings(left: string[] | undefined, right: string[] | undefined) {
+  if (left === right) return true
+  if (!left || !right) return false
+  if (left.length !== right.length) return false
+  return left.every((value, index) => value === right[index])
+}
+
+function uniqueStrings(values: string[]) {
+  return [...new Set(values)]
+}
+
+function withDiscoveredImageInput<T extends ModelWithDiscoveryMetadata>(model: T, supportsImageIn: boolean | undefined): T {
+  if (supportsImageIn === undefined) return model
+
+  let changed = false
+  let nextAttachment = model.attachment
+  let nextModalities = model.modalities
+  let nextCapabilities = model.capabilities
+
+  if (supportsImageIn && model.attachment !== true) {
+    nextAttachment = true
+    changed = true
+  }
+
+  const currentInputModalities = model.modalities?.input
+  const currentOutputModalities = model.modalities?.output
+  const shouldPatchModalities = supportsImageIn || currentInputModalities?.includes("image") === true
+  if (shouldPatchModalities) {
+    const nextInputModalities = uniqueStrings([
+      "text",
+      ...(currentInputModalities ?? []),
+      ...(supportsImageIn ? ["image"] : []),
+    ]).filter((value) => value !== "image" || supportsImageIn)
+    const nextOutputModalities = uniqueStrings(["text", ...(currentOutputModalities ?? [])])
+    if (
+      !sameStrings(currentInputModalities, nextInputModalities) ||
+      !sameStrings(currentOutputModalities, nextOutputModalities)
+    ) {
+      nextModalities = {
+        ...model.modalities,
+        input: nextInputModalities,
+        output: nextOutputModalities,
+      }
+      changed = true
+    }
+  }
+
+  const currentCapabilityImage = model.capabilities?.input?.image
+  const currentCapabilityAttachment = model.capabilities?.attachment
+  if (currentCapabilityImage !== undefined && currentCapabilityImage !== supportsImageIn) {
+    nextCapabilities = {
+      ...nextCapabilities,
+      input: {
+        ...nextCapabilities?.input,
+        image: supportsImageIn,
+      },
+    }
+    changed = true
+  }
+  if (supportsImageIn && currentCapabilityAttachment !== undefined && currentCapabilityAttachment !== true) {
+    nextCapabilities = {
+      ...nextCapabilities,
+      attachment: true,
+    }
+    changed = true
+  }
+
+  if (!changed) return model
+  return {
+    ...model,
+    ...(nextAttachment === undefined ? {} : { attachment: nextAttachment }),
+    ...(nextModalities ? { modalities: nextModalities } : {}),
+    ...(nextCapabilities ? { capabilities: nextCapabilities } : {}),
+  }
+}
+
 function applyDiscoveryToModels<T extends Record<string, ModelWithDiscoveryMetadata>>(models: T, discovery: ModelDiscovery): T {
  const current = models[MODEL_ID]
  if (!current) return models
-  const next = withDiscoveredContext(withDiscoveredDisplayName(current, discovery.model_display), discovery.context_length)
+  const next = withDiscoveredImageInput(
+    withDiscoveredContext(withDiscoveredDisplayName(current, discovery.model_display), discovery.context_length),
+    discovery.supports_image_in,
+  )
  if (next === current) return models
  return {
    ...models,
@@ -286,7 +378,7 @@ function applyDiscoveryToModels<T extends Record<string, ModelWithDiscoveryMetad
  }
 }

-function buildConfigBlock(info: { model_id: string; display?: string }) {
+function buildConfigBlock(info: { model_id: string; display?: string; supports_image_in?: boolean }) {
  const name = info.display ?? "Kimi For Coding"
  // The opencode-side model key is always MODEL_ID ("kimi-for-coding"); the
  // plugin rewrites the wire `model` body field to `info.model_id` inside
@@ -297,6 +389,29 @@ function buildConfigBlock(info: { model_id: string; display?: string }) {
  // `limit.output` whenever a `limit` object is present, but Kimi's
  // `/coding/v1/models` discovery only tells us `context_length`. The
  // provider.models hook backfills `limit.context` at runtime.
+  const modelConfig: Record<string, unknown> = {
+    name,
+    reasoning: true,
+    options: {},
+    variants: {
+      off: { reasoning_effort: "off" },
+      auto: { reasoning_effort: "auto" },
+      low: { reasoning_effort: "low" },
+      medium: { reasoning_effort: "medium" },
+      high: { reasoning_effort: "high" },
+    },
+  }
+  if (info.supports_image_in) {
+    // opencode's provider transform gates image parts on model metadata
+    // before the request reaches our loader. Mirror Kimi's discovered
+    // capability here so pasted images survive into the upstream SDK.
+    modelConfig.attachment = true
+    modelConfig.modalities = {
+      input: ["text", "image"],
+      output: ["text"],
+    }
+  }
+
  return JSON.stringify(
    {
      provider: {
@@ -305,18 +420,7 @@ function buildConfigBlock(info: { model_id: string; display?: string }) {
          name: "Kimi For Coding (OAuth)",
          options: { baseURL: API_BASE_URL },
          models: {
-            [MODEL_ID]: {
-              name,
-              reasoning: true,
-              options: {},
-              variants: {
-                off: { reasoning_effort: "off" },
-                auto: { reasoning_effort: "auto" },
-                low: { reasoning_effort: "low" },
-                medium: { reasoning_effort: "medium" },
-                high: { reasoning_effort: "high" },
-              },
-            },
+            [MODEL_ID]: modelConfig,
          },
        },
      },
@@ -635,6 +739,7 @@ const plugin: Plugin = async ({ client }) => {
                      const block = buildConfigBlock({
                        model_id: discovered.model_id,
                        display: discovered.model_display,
+                        supports_image_in: discovered.supports_image_in,
                      })
                      console.log(
                        `\n✓ Authorized for Kimi For Coding (model: ${discovered.model_id}${
@@ -292,19 +292,55 @@ function makeProviderState(context = 0) {
    id: PROVIDER_ID,
    models: {
      [MODEL_ID]: {
+        id: MODEL_ID,
+        providerID: PROVIDER_ID,
+        api: {
+          id: MODEL_ID,
+          npm: "@ai-sdk/openai-compatible",
+          url: "https://api.kimi.com/coding/v1",
+        },
+        status: "active",
+        headers: {},
        name: "Kimi For Coding",
-        reasoning: true,
        options: {},
-        limit: { context },
+        cost: { input: 0, output: 0, cache: { read: 0, write: 0 } },
+        limit: { context, output: 8192 },
+        capabilities: {
+          temperature: false,
+          reasoning: true,
+          attachment: false,
+          toolcall: true,
+          input: { text: true, audio: false, image: false, video: false, pdf: false },
+          output: { text: true, audio: false, image: false, video: false, pdf: false },
+          interleaved: false,
+        },
        variants: {
          auto: { reasoning_effort: "auto" },
        },
      },
      "some-other-model": {
+        id: "some-other-model",
+        providerID: PROVIDER_ID,
+        api: {
+          id: "some-other-model",
+          npm: "@ai-sdk/openai-compatible",
+          url: "https://api.kimi.com/coding/v1",
+        },
+        status: "active",
+        headers: {},
        name: "Other",
-        reasoning: false,
        options: {},
-        limit: { context: 1234 },
+        cost: { input: 0, output: 0, cache: { read: 0, write: 0 } },
+        limit: { context: 1234, output: 4096 },
+        capabilities: {
+          temperature: false,
+          reasoning: false,
+          attachment: false,
+          toolcall: true,
+          input: { text: true, audio: false, image: false, video: false, pdf: false },
+          output: { text: true, audio: false, image: false, video: false, pdf: false },
+          interleaved: false,
+        },
      },
    },
  }
@@ -343,6 +379,25 @@ test("provider.models: surfaces discovered display_name in runtime model metadat
  expect(provider.models[MODEL_ID]!.name).toBe("Kimi For Coding")
 })

+test("provider.models: surfaces discovered image input capability so opencode does not strip images", async () => {
+  mock = installFetchMock((call) => {
+    if (call.url.endsWith("/coding/v1/models")) {
+      return {
+        body: {
+          data: [{ id: MODEL_ID, display_name: "Kimi Code", context_length: 262144, supports_image_in: true }],
+        },
+      }
+    }
+    return { body: { ok: true } }
+  })
+  const { hooks } = await getHooks()
+  const provider = makeProviderState()
+  const next = await hooks.provider!.models!(provider as any, { auth: validAuth() } as any)
+  expect(next[MODEL_ID]!.capabilities.input.image).toBe(true)
+  expect(next[MODEL_ID]!.capabilities.attachment).toBe(true)
+  expect(provider.models[MODEL_ID]!.capabilities.input.image).toBe(false)
+})
+
 test("provider.models: preserves an explicit user context limit", async () => {
  mock = installFetchMock((call) => {
    if (call.url.endsWith("/coding/v1/models")) {
@@ -946,7 +1001,11 @@ test("auth callback prints a schema-valid config snippet with top-level model va
      }
    }
    if (call.url.endsWith("/coding/v1/models")) {
-      return { body: { data: [{ id: "kimi-for-coding", display_name: "Kimi Code", context_length: 262144 }] } }
+      return {
+        body: {
+          data: [{ id: "kimi-for-coding", display_name: "Kimi Code", context_length: 262144, supports_image_in: true }],
+        },
+      }
    }
    return { body: { access_token: "A", refresh_token: "R", token_type: "Bearer", expires_in: 900 } }
  })
@@ -974,7 +1033,9 @@ test("auth callback prints a schema-valid config snippet with top-level model va
      [key: string]: {
        models: {
          [key: string]: {
+            attachment?: boolean
            limit?: { context?: number }
+            modalities?: { input?: string[]; output?: string[] }
            options?: Record<string, unknown>
            variants?: Record<string, { reasoning_effort?: string }>
          }
@@ -984,7 +1045,12 @@ test("auth callback prints a schema-valid config snippet with top-level model va
  }
  const model = parsed.provider[PROVIDER_ID]!.models[MODEL_ID]!
  expect(text).toContain("context 262144")
+  expect(model.attachment).toBe(true)
  expect(model.limit).toBeUndefined()
+  expect(model.modalities).toEqual({
+    input: ["text", "image"],
+    output: ["text"],
+  })
  expect(model.options).toEqual({})
  expect(model.variants?.off).toEqual({ reasoning_effort: "off" })
  expect(model.variants?.auto).toEqual({ reasoning_effort: "auto" })
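
Reviewer note: the copy-on-write contract that `withDiscoveredImageInput` relies on (return the same reference when nothing changes, so `applyDiscoveryToModels` can short-circuit on `next === current`) can be sketched in isolation. This is a simplified illustration, not the code from this commit. `SketchModel` and `patchImageInput` are hypothetical names, the model shape is cut down to the fields that matter here, and the sketch skips the `modalities` and `capabilities.attachment` handling that the real function performs.

```typescript
// Hypothetical, trimmed-down model shape; the real opencode model has many more fields.
type SketchModel = {
  attachment?: boolean
  capabilities?: { attachment?: boolean; input?: { image?: boolean } }
}

// Copy-on-write patch: returns the SAME reference when nothing needs to change,
// so a caller can detect a no-op with `next === current`.
function patchImageInput<T extends SketchModel>(model: T, supportsImageIn: boolean | undefined): T {
  // Discovery said nothing about images: leave the model untouched.
  if (supportsImageIn === undefined) return model

  const currentImage = model.capabilities?.input?.image
  // Only overwrite the capability flag when it is already present and disagrees.
  const needsImage = currentImage !== undefined && currentImage !== supportsImageIn
  // Only ever turn the top-level attachment flag on, never off.
  const needsAttachment = supportsImageIn && model.attachment !== true
  if (!needsImage && !needsAttachment) return model

  return {
    ...model,
    ...(needsAttachment ? { attachment: true } : {}),
    ...(needsImage
      ? {
          capabilities: {
            ...model.capabilities,
            input: { ...model.capabilities?.input, image: supportsImageIn },
          },
        }
      : {}),
  }
}

// Usage: the input object is never mutated, and a second pass is a no-op.
const before: SketchModel = { capabilities: { attachment: false, input: { image: false } } }
const after = patchImageInput(before, true)
const again = patchImageInput(after, true)
```

Returning the original reference on a no-op is what lets the provider hook hand back the untouched `models` map when discovery agrees with the existing metadata.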