Commit Graph

4644 Commits

Author SHA1 Message Date
Michael Neale b3be0935f7 fix(model): recognize version-first Claude 4+ ids in max_output_tokens floor
The Claude 4+ fallback floor in max_output_tokens() only matched
family-first ids (claude-opus-4-8). Version-first ids used by
Snowflake/Databricks (claude-4-sonnet, claude-4-opus, claude-4-5-sonnet,
goose-claude-4-sonnet-bedrock) did not match, so a version-first Claude 4
model not yet in the canonical table would still fall back to 4096 and
truncate tool calls.

Extend the regex to match the major version in either ordering. Older
3.x ids in either ordering keep the conservative 4096 default. Added
version-first cases to both the positive and negative tests.

Signed-off-by: Michael Neale <michael.neale@gmail.com>
2026-05-31 14:46:00 +10:00
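A minimal sketch of the either-ordering match described in b3be0935f7, using only the Rust standard library. The commit itself changes a regex inside max_output_tokens(). This sketch tokenizes the id by hand instead, and the helper names `claude_major_version` and `is_claude_4_plus`, plus the `FAMILIES` list, are hypothetical rather than goose's actual code.

```rust
/// Model families that may precede the version number (an assumption for this sketch).
const FAMILIES: [&str; 3] = ["opus", "sonnet", "haiku"];

/// Hypothetical helper: return the Claude major version from ids such as
/// "claude-opus-4-8" (family-first) or "claude-4-sonnet" (version-first).
fn claude_major_version(model_id: &str) -> Option<u32> {
    let lower = model_id.to_ascii_lowercase();
    // Look only at the text after the first "claude", e.g. "-4-sonnet-bedrock".
    let start = lower.find("claude")? + "claude".len();
    let mut tokens = lower[start..]
        .split(|c: char| c == '-' || c == '.' || c == '/' || c == '_')
        .filter(|t| !t.is_empty());

    match tokens.next()? {
        // Version-first: the major version follows "claude" directly.
        t if t.chars().all(|c| c.is_ascii_digit()) => t.parse().ok(),
        // Family-first: "claude-<family>-<major>".
        t if FAMILIES.contains(&t) => tokens.next()?.parse().ok(),
        _ => None,
    }
}

/// Hypothetical predicate for the Claude 4+ floor; 3.x and non-Claude ids return false.
fn is_claude_4_plus(model_id: &str) -> bool {
    matches!(claude_major_version(model_id), Some(v) if v >= 4)
}
```

Tokenizing instead of using a regex keeps the example dependency-free. Either way, the key point is that the major version may appear directly after `claude` or after the family name.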
Michael Neale eca5943865 fix(canonical): pick newest snapshot when model ids collapse to one key
The generator strips date suffixes when building canonical ids, so
models.dev entries like gpt-4o, gpt-4o-2024-05-13, gpt-4o-2024-08-06
and gpt-4o-2024-11-20 all collapse onto openai/gpt-4o. Registration
was last-write-wins over models.dev's (unstable) listing order, so a
regeneration could land on the 2024-05-13 snapshot and drop
limit.output from 16384 to 4096 — capping gpt-4o max_tokens and
truncating output/tool calls.

The builder now keeps the entry with the newest release_date on a
canonical-key collision, so the result is deterministic regardless of
upstream ordering. Regenerated the registry: openai/gpt-4o is back to
16384 (2024-11-20) and azure/gpt-3.5-turbo to 16384, with no native
provider output-limit downgrades versus main.

This supersedes the manual data edit in the previous commit with a
fix in the generator itself, so future release regenerations stay
correct.

Signed-off-by: Michael Neale <michael.neale@gmail.com>
2026-05-31 10:56:55 +10:00
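The collision rule from eca5943865 can be sketched as follows. `ModelEntry`, `canonical_key`, and `build_registry` are hypothetical stand-ins for the generator's internals, and the date-suffix check assumes an exact `-YYYY-MM-DD` tail. Breaking ties on equal release dates by id is an extra assumption added here so the result stays independent of input order even in that case.

```rust
use std::collections::HashMap;

/// Minimal stand-in for a models.dev entry (fields are assumptions for this sketch).
#[derive(Debug, Clone)]
struct ModelEntry {
    id: String,
    /// ISO "YYYY-MM-DD", so lexicographic order matches chronological order.
    release_date: String,
    output_limit: u32,
}

/// Hypothetical: strip a trailing "-YYYY-MM-DD", e.g. "gpt-4o-2024-05-13" -> "gpt-4o".
fn canonical_key(id: &str) -> &str {
    let b = id.as_bytes();
    let n = b.len();
    if n > 11 {
        let t = &b[n - 11..];
        let digit_positions = [1, 2, 3, 4, 6, 7, 9, 10];
        let is_date = t[0] == b'-'
            && t[5] == b'-'
            && t[8] == b'-'
            && digit_positions.iter().all(|&i| t[i].is_ascii_digit());
        if is_date {
            // The suffix is pure ASCII, so n - 11 is a valid char boundary.
            return &id[..n - 11];
        }
    }
    id
}

/// Register entries by canonical key. On a collision keep the newest release_date;
/// equal dates fall back to comparing ids (an assumption) so the result never
/// depends on upstream listing order.
fn build_registry(entries: &[ModelEntry]) -> HashMap<String, ModelEntry> {
    let mut registry: HashMap<String, ModelEntry> = HashMap::new();
    for entry in entries {
        let key = canonical_key(&entry.id).to_string();
        let keep_existing = registry.get(&key).map_or(false, |existing| {
            (existing.release_date.as_str(), existing.id.as_str())
                >= (entry.release_date.as_str(), entry.id.as_str())
        });
        if !keep_existing {
            registry.insert(key, entry.clone());
        }
    }
    registry
}
```

Because the winner is chosen by comparing entries rather than by insertion order, shuffling the input produces the same registry.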
Michael Neale 8a568aeacc fix(canonical): restore GPT-4o and Azure GPT-3.5-turbo output limits
Address codex review feedback on #9484 (P2).

The bulk canonical regeneration in this branch flipped the bare
`openai/gpt-4o` alias from the 2024-08-06 snapshot to the older
2024-05-13 snapshot, dropping limit.output from 16384 to 4096 (and
reverting pricing). Since ModelConfig::with_canonical_limits copies
limit.output into max_tokens when GOOSE_MAX_TOKENS is unset, ordinary
gpt-4o responses would be capped at 4096, truncating output and tool calls.

Restore openai/gpt-4o to the current 2024-08-06 values (output 16384,
input 2.5 / output 10.0 / cache_read 1.25). Apply the same fix to
azure/gpt-3.5-turbo, which the regeneration regressed the same way
(0613 -> 0301 snapshot, context/output 16384 -> 4096).

These were the only native-provider (openai/anthropic/google/azure)
output-limit downgrades introduced by the regeneration; remaining
changes are upstream drift on third-party router prefixes.

Signed-off-by: Michael Neale <michael.neale@gmail.com>
2026-05-31 10:30:04 +10:00
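To show why a stale canonical entry caps output, here is a sketch of the precedence the 8a568aeacc message describes for ModelConfig::with_canonical_limits: an explicit GOOSE_MAX_TOKENS wins; otherwise limit.output is copied into max_tokens. Both function names are hypothetical, and parsing the variable as a plain `u32` is an assumption about its format.

```rust
use std::env;

/// Hypothetical: read GOOSE_MAX_TOKENS, treating a missing or unparseable value
/// as unset (the exact parsing rules are an assumption).
fn goose_max_tokens_from_env() -> Option<u32> {
    env::var("GOOSE_MAX_TOKENS").ok()?.trim().parse().ok()
}

/// Hypothetical analogue of the copy in ModelConfig::with_canonical_limits:
/// an explicit value wins, otherwise the canonical limit.output becomes max_tokens.
fn resolve_max_tokens(explicit: Option<u32>, canonical_output_limit: Option<u32>) -> Option<u32> {
    explicit.or(canonical_output_limit)
}
```

With no explicit override, a registry entry that regresses from 16384 to 4096 flows straight into max_tokens, which is why the data fix alone was enough to restore full-length gpt-4o output.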
Michael Neale 168d45d2ab fix(anthropic): legacy budget maps adaptive-only models to adaptive
Address codex review (new P1) on #9484:

thinking_type() returned Enabled whenever only a legacy
ANTHROPIC_THINKING_BUDGET / CLAUDE_THINKING_BUDGET was set, before
checking is_adaptive_model. For adaptive-only Opus 4.7/4.8 that
produced thinking: {"type":"enabled","budget_tokens":...}, which
Anthropic rejects with HTTP 400. Now the legacy-budget path maps
adaptive models to the adaptive payload while non-adaptive models
keep the enabled/budget behavior.

Adds a test covering the legacy-budget-without-effort path.

Signed-off-by: Michael Neale <michael.neale@gmail.com>
2026-05-30 16:12:42 +10:00
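A sketch of the decision order fixed in 168d45d2ab, covering only the legacy-budget-without-effort path (the GOOSE_THINKING_EFFORT path is omitted). `ThinkingType`, `thinking_type`, and `thinking_payload` are simplified, hypothetical stand-ins. The `{"type":"adaptive"}` body is an assumption about the adaptive payload's shape, while the enabled body matches the one quoted in the commit message.

```rust
/// Simplified thinking modes (hypothetical; not goose's actual enum).
#[derive(Debug, PartialEq)]
enum ThinkingType {
    Disabled,
    Enabled { budget_tokens: u32 },
    Adaptive,
}

/// Legacy-budget path only: `legacy_budget` stands for ANTHROPIC_THINKING_BUDGET /
/// CLAUDE_THINKING_BUDGET, and `adaptive_only` for the is_adaptive_model check.
fn thinking_type(adaptive_only: bool, legacy_budget: Option<u32>) -> ThinkingType {
    match legacy_budget {
        None => ThinkingType::Disabled,
        // Check the model before honoring the budget: adaptive-only models
        // reject budget_tokens with HTTP 400.
        Some(_) if adaptive_only => ThinkingType::Adaptive,
        Some(budget) => ThinkingType::Enabled { budget_tokens: budget },
    }
}

/// Render the `thinking` field as JSON text; the adaptive shape is an assumption.
fn thinking_payload(thinking: &ThinkingType) -> Option<String> {
    match thinking {
        ThinkingType::Disabled => None,
        ThinkingType::Enabled { budget_tokens } => Some(format!(
            r#"{{"type":"enabled","budget_tokens":{budget_tokens}}}"#
        )),
        ThinkingType::Adaptive => Some(r#"{"type":"adaptive"}"#.to_string()),
    }
}
```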
Michael Neale 2dab4645e3 fix(anthropic): adaptive thinking + omit sampling params for Opus 4.7/4.8
Address codex review feedback on #9484:

- P1: Extend supports_adaptive_thinking() to match Opus 4.7 and 4.8.
  These models accept adaptive thinking only and reject manual
  budget_tokens thinking with HTTP 400, so reasoning requests
  (GOOSE_THINKING_EFFORT / legacy CLAUDE_THINKING_TYPE=adaptive)
  now send the adaptive payload.

- P2: Add rejects_sampling_params() and omit temperature for Opus
  4.7/4.8, whose Messages API rejects non-default sampling params
  (temperature/top_p/top_k) with HTTP 400.

Adds tests covering adaptive thinking and temperature suppression
for the new models.

Signed-off-by: Michael Neale <michael.neale@gmail.com>
2026-05-30 15:25:47 +10:00
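The two model checks in 2dab4645e3 and their effect on the request body might look like this. The function names follow the commit, but the signatures and bodies here are assumptions: the real supports_adaptive_thinking() may also match other models, and `MessagesRequest` with its hand-rolled JSON is a hypothetical simplification of the Messages API payload.

```rust
/// Hypothetical matcher for Opus 4.7 / 4.8, accepting '-' or '.' between the
/// major and minor version ("claude-opus-4-8", "anthropic/claude-opus-4.7").
fn is_opus_4_7_or_4_8(model: &str) -> bool {
    let m = model.to_ascii_lowercase();
    ["opus-4-7", "opus-4.7", "opus-4-8", "opus-4.8"]
        .iter()
        .any(|pattern| m.contains(*pattern))
}

/// Named after the commit's functions; these bodies are simplified assumptions.
fn supports_adaptive_thinking(model: &str) -> bool {
    is_opus_4_7_or_4_8(model)
}

fn rejects_sampling_params(model: &str) -> bool {
    is_opus_4_7_or_4_8(model)
}

/// Hypothetical, trimmed-down Messages API request.
struct MessagesRequest {
    model: String,
    max_tokens: u32,
    temperature: Option<f32>,
}

impl MessagesRequest {
    fn new(model: &str, max_tokens: u32, temperature: Option<f32>) -> Self {
        // Omit temperature entirely for models that reject sampling params.
        let temperature = if rejects_sampling_params(model) { None } else { temperature };
        MessagesRequest { model: model.to_string(), max_tokens, temperature }
    }

    /// Hand-rolled JSON (no string escaping) so the sketch needs no crates.
    fn to_json(&self) -> String {
        let mut fields = vec![
            format!(r#""model":"{}""#, self.model),
            format!(r#""max_tokens":{}"#, self.max_tokens),
        ];
        if let Some(t) = self.temperature {
            fields.push(format!(r#""temperature":{t}"#));
        }
        format!("{{{}}}", fields.join(","))
    }
}
```

Dropping the field, rather than sending a default value, keeps the body free of any sampling parameter for models that reject them.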
Michael Neale 23aa9b09b4 fix(model): raise default max_output_tokens floor for Claude 4+
Two related fixes so newly released Claude models don't silently truncate
responses at 4096 output tokens (which manifests as the model 'stopping
mid tool-call' from the user's perspective):

1. Refresh the bundled canonical model registry from models.dev. This
   adds claude-opus-4-7 and claude-opus-4-8, bumping the registry from
   4740 to 4965 models. Generated by:

       cargo run --bin build_canonical_models

   With the refreshed registry, claude-opus-4-8 now resolves through
   strip_version_suffix's existing '-N-M' -> '-N.M' normalization to the
   canonical 'anthropic/claude-opus-4.8' entry (1M context, 128k output).

2. Add a Claude-4+ family floor to ModelConfig::max_output_tokens(). When
   GOOSE_MAX_TOKENS is unset and the canonical lookup doesn't populate
   max_tokens (e.g. a brand-new model id one release ahead of the bundled
   registry), we previously fell back to a hard 4096. That's a cliff:
   every Claude 4.x model supports at least 32k output, and capping at
   4096 cuts off tool_use blocks mid-stream, which agents experience as
   the model 'not making tool calls'.

   The new floor is 32k for Claude 4+ family ids (matches the lowest
   real 4.x output cap). Claude 3.x and non-Claude models keep the
   existing 4096 default since their real caps are genuinely small.
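The `-N-M` -> `-N.M` rewrite mentioned in item 1 can be approximated as below. `dash_to_dot_version` and `is_short_number` are hypothetical helpers, not the body of strip_version_suffix. Limiting both numbers to one or two digits is an assumption so that trailing date stamps such as `-20250805` are left alone.

```rust
/// True for a 1- or 2-digit number; longer runs (e.g. dates) are excluded by assumption.
fn is_short_number(s: &str) -> bool {
    (1..=2).contains(&s.len()) && s.chars().all(|c| c.is_ascii_digit())
}

/// Hypothetical helper: rewrite a trailing "-N-M" as "-N.M",
/// e.g. "claude-opus-4-8" -> "claude-opus-4.8". Other ids are returned unchanged.
fn dash_to_dot_version(id: &str) -> String {
    // rsplitn yields pieces from the right: [minor, major, rest].
    let parts: Vec<&str> = id.rsplitn(3, '-').collect();
    if let &[minor, major, rest] = parts.as_slice() {
        if is_short_number(major) && is_short_number(minor) {
            return format!("{rest}-{major}.{minor}");
        }
    }
    id.to_string()
}
```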

Also adds claude-opus-4-7 and claude-opus-4-8 to ANTHROPIC_KNOWN_MODELS
for UI/discovery.

Tests:
- canonical_lookup_resolves_dash_to_dot_for_claude_4_x: verifies
  claude-opus-4-{1,5,6,7,8} all resolve to the canonical entry and
  get a real (>4096) output cap.
- claude_4_family_falls_back_to_32k_not_4096: verifies the family
  floor for unknown future ids (claude-opus-4-9, anthropic/claude-opus-4.8,
  databricks-claude-opus-4.6, etc.).
- claude_3_and_non_claude_keep_4096_default: regression guard.
- explicit_max_tokens_always_wins: GOOSE_MAX_TOKENS still overrides.

Also updates sets_limits_from_canonical_model to assert shape rather
than a hardcoded gpt-4o output number (16384 -> 4096 in models.dev),
so the test no longer breaks on every registry refresh.
2026-05-29 17:02:07 +10:00
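Item 2's fallback order in 23aa9b09b4 reduces to a small match. This sketch assumes `max_tokens` has already been populated (or not) by GOOSE_MAX_TOKENS and the canonical lookup, and takes the Claude 4+ family check as a boolean input. Writing the "32k" floor as 32_000 is an assumption about the exact constant, and the constant names are hypothetical.

```rust
/// Conservative default for Claude 3.x and non-Claude ids.
const DEFAULT_MAX_OUTPUT_TOKENS: u32 = 4_096;
/// The "32k" Claude 4+ floor; the exact value 32_000 is an assumption.
const CLAUDE_4_PLUS_FLOOR: u32 = 32_000;

/// Hypothetical sketch of the fallback in max_output_tokens().
fn max_output_tokens(max_tokens: Option<u32>, is_claude_4_plus: bool) -> u32 {
    match max_tokens {
        // An explicit or canonical value always wins.
        Some(n) => n,
        // Unknown Claude 4+ id: use the family floor instead of the 4096 cliff.
        None if is_claude_4_plus => CLAUDE_4_PLUS_FLOOR,
        None => DEFAULT_MAX_OUTPUT_TOKENS,
    }
}
```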
Bradley Axen 25ff547487 Expose raw provider supported models over ACP (#9475)
Signed-off-by: Bradley Axen <baxen@squareup.com>
Signed-off-by: Matt Toohey <contact@matttoohey.com>
Co-authored-by: Matt Toohey <contact@matttoohey.com>
2026-05-29 02:18:42 +00:00
Matt Toohey a3bdb918e7 fix(acp): forward ACP server context window size to clients (#9455)
Signed-off-by: Matt Toohey <contact@matttoohey.com>
2026-05-29 01:05:45 +00:00
Bradley Axen 104cc17758 Add ACP session system prompt setter (#9478)
Signed-off-by: Bradley Axen <baxen@squareup.com>
2026-05-29 00:35:12 +00:00
Mark Lavercombe 1cb5cb06a3 Add Scholar Sidekick MCP extension (#9433) 2026-05-28 14:15:41 +00:00
Rodolfo Olivieri 2116f88908 feat: add tui feature flag to gate the tui command (#9428)
Signed-off-by: Rodolfo Olivieri <rolivier@redhat.com>
2026-05-28 14:12:31 +00:00
Jack Amadeo d10d009b97 CLI to list skills with token counts (#9326) 2026-05-28 14:09:11 +00:00
Quentin Champenois 27d68ba636 doc: Add Scaleway provider (#9423)
Co-authored-by: Quentin Champenois <qchampenois@scaleway.com>
2026-05-28 10:07:08 -04:00
Alex Hancock 9c403b1560 refactor: convert desktop v1 and goose-server extensions to ACP+ (#9448) 2026-05-27 20:59:48 +00:00
dependabot[bot] c9945bca5d chore(deps): bump sha2 from 0.10.9 to 0.11.0 (#8963)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jack Amadeo <jackamadeo@squareup.com>
2026-05-27 19:08:54 +00:00
James Liounis e9b0d9247b feat(providers): add Perplexity as a declarative OpenAI-compatible provider (#9324) 2026-05-27 19:07:03 +00:00
Jeremy Dawes 4c88f4b91c feat(providers): add Alibaba (Qwen via DashScope) declarative provider (#9443)
Signed-off-by: Jeremy Dawes <jeremy@jezweb.net>
2026-05-27 18:45:48 +00:00
Asish Kumar a18b92e62d fix(desktop): refresh provider list in Switch Models picker (#9408)
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
2026-05-27 18:38:14 +00:00
Michael Neale 35d1fc7c51 fix(desktop): start new chat in current window from recipe param modal (#9422)
Signed-off-by: Michael Neale <micn@block.xyz>
2026-05-27 18:35:17 +00:00
Douwe Osinga 1125e8dd54 fix: make azure api-version query param optional (#9221)
Signed-off-by: Douwe Osinga <douwe@squareup.com>
Co-authored-by: Douwe Osinga <douwe@squareup.com>
2026-05-27 18:32:19 +00:00
Douwe Osinga 10ac6b18c9 feat: make tool output size limit configurable via GOOSE_MAX_TOOL_RESPONSE_SIZE (#9256)
Signed-off-by: Douwe Osinga <douwe@squareup.com>
Co-authored-by: Douwe Osinga <douwe@squareup.com>
2026-05-27 18:31:22 +00:00
UGBOMEH OGOCHUKWU WILLIAMS 4f43ae4cd0 fix(ui): preserve pending env vars in Add Extension form (#9285)
Signed-off-by: UGBOMEH OGOCHUKWU WILLIAMS <williamsugbomeh@gmail.com>
Signed-off-by: Douwe Osinga <douwe@squareup.com>
Co-authored-by: Douwe Osinga <douwe@squareup.com>
2026-05-27 18:26:35 +00:00
Angie Jones d017295f32 fix: tolerate missing responses output (#9449)
Signed-off-by: Angie Jones <jones.angie@gmail.com>
2026-05-27 18:05:00 +00:00
jh-block 27b41d93f5 local inference: stricter GGUF requirements, auto detection of tool calling support, fixed thinking output parsing (#9442)
Signed-off-by: jh-block <jhugo@block.xyz>
2026-05-27 18:00:30 +00:00
Bradley Axen d90b349a69 feat: add /model slash command to CLI for session model switching (#8747)
Signed-off-by: Bradley Axen <baxen@squareup.com>
Signed-off-by: Douwe Osinga <douwe@squareup.com>
Co-authored-by: Douwe Osinga <douwe@squareup.com>
2026-05-27 13:38:59 +00:00
88plug 794402d932 fix(ci): build linux x86_64 standard inside manylinux_2_28 for glibc 2.28+ compat (#9415)
Signed-off-by: Andrew Mello <andrew@88plug.com>
Co-authored-by: Alex Hancock <alex@alexhancock.com>
Co-authored-by: jh-block <255854896+jh-block@users.noreply.github.com>
2026-05-27 12:58:46 +00:00
Bradley Axen 17493540e1 Prefer goose aliases for Databricks v2 inventory (#9430)
Signed-off-by: Bradley Axen <baxen@squareup.com>
2026-05-27 01:32:32 +00:00
Bradley Axen 7dc904e1e2 add databricks ai gateway provider (#9274)
Signed-off-by: Bradley Axen <baxen@squareup.com>
2026-05-26 22:45:01 +00:00
github-actions[bot] b0cd61aa42 chore(release): bump version to 1.36.0 (minor) (#9417)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-05-26 18:02:17 +00:00
Dmitry Beskov b332f509b2 Russian language support (#9406)
Co-authored-by: Jack Amadeo <jackamadeo@squareup.com>
2026-05-26 17:46:23 +00:00
Asish Kumar dcdc7f645b fix(desktop): stop the main window growing taller on every launch (#9409)
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
2026-05-26 16:19:22 +00:00
jh-block 27dc0d5f83 Improve dependency hygiene (#9360)
Signed-off-by: jh-block <jhugo@block.xyz>
2026-05-26 16:09:03 +00:00
seneroner77-cmd bf0da953d5 Add Turkish desktop locale (#9392)
Signed-off-by: dejavu <dejavu@Mac.home>
Co-authored-by: dejavu <dejavu@Mac.home>
2026-05-26 15:53:19 +00:00
dependabot[bot] 6d544e7b55 chore(deps): bump qs from 6.14.2 to 6.15.2 in /evals/open-model-gym/mcp-harness (#9395)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-26 15:52:38 +00:00
Douwe Osinga a11843a1e8 Simplify UI customization (#9353)
Signed-off-by: Douwe Osinga <douwe@squareup.com>
Co-authored-by: Douwe Osinga <douwe@squareup.com>
2026-05-26 13:49:18 +00:00
Angie Jones ba16de9738 docs: stats update (#9410) 2026-05-25 02:38:39 +00:00
Douwe Osinga e1cc44f7ec Build summon instructions per turn (#9329)
Signed-off-by: Douwe Osinga <douwe@squareup.com>
Co-authored-by: Douwe Osinga <douwe@squareup.com>
2026-05-25 00:55:58 +00:00
Angie Jones c4d64d1a83 Fix desktop chat search session limiting (#9366)
Signed-off-by: Angie Jones <jones.angie@gmail.com>
2026-05-24 23:33:44 +00:00
fre$h ce004f7475 fix(agents): serialize per-session agent creation to stop duplicate MCP init (#9357)
Signed-off-by: fresh3nough <anonwurcod@proton.me>
2026-05-23 18:42:32 +00:00
dependabot[bot] 8689fdf33f chore(deps): bump image from 0.24.9 to 0.25.10 (#9383)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-23 18:40:20 +00:00
dependabot[bot] d625e58215 chore(deps): bump agent-client-protocol from 0.11.1 to 0.12.1 (#9381)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jack Amadeo <jackamadeo@squareup.com>
2026-05-23 18:40:13 +00:00
dependabot[bot] 728d72a79f chore(deps): bump ctor from 0.2.9 to 1.0.6 (#9380)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jack Amadeo <jackamadeo@squareup.com>
2026-05-22 22:13:15 +00:00
dependabot[bot] cbe5d93caa chore(deps): bump strum from 0.27.2 to 0.28.0 (#9384)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-22 21:13:03 +00:00
dependabot[bot] 141e350ab8 chore(deps): bump lru from 0.16.3 to 0.18.0 (#9382)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-22 21:12:35 +00:00
dependabot[bot] 656ae04d97 chore(deps): bump shlex from 1.3.0 to 2.0.1 (#9379)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-22 21:11:59 +00:00
dependabot[bot] a7d46251a0 chore(deps): bump sigstore-verify from 0.6.6 to 0.8.0 (#9378)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-22 21:11:10 +00:00
dependabot[bot] 9185be5f94 chore(deps): bump clap_mangen from 0.2.33 to 0.3.0 (#9377)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-22 21:10:58 +00:00
dependabot[bot] 61252fc4fd chore(deps): bump the cargo-minor-and-patch group with 12 updates (#9376)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-22 21:09:13 +00:00
dependabot[bot] 221ccfff14 chore(deps): bump qs and express in /documentation (#9375)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-22 21:08:10 +00:00
dependabot[bot] 87d165d574 chore(deps): bump docker/build-push-action from 6.18.0 to 7.2.0 (#9374)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-22 21:07:44 +00:00