The Claude 4+ fallback floor in max_output_tokens() only matched
family-first ids (claude-opus-4-8). Version-first ids used by
Snowflake/Databricks (claude-4-sonnet, claude-4-opus, claude-4-5-sonnet,
goose-claude-4-sonnet-bedrock) did not match, so a version-first Claude 4
model not yet in the canonical table would still fall back to 4096 and
truncate tool calls.
Extend the regex to match the major version in either ordering. Older
3.x ids in either ordering keep the conservative 4096 default. Added
version-first cases to both the positive and negative tests.
Signed-off-by: Michael Neale <michael.neale@gmail.com>
The generator strips date suffixes when building canonical ids, so
models.dev entries like gpt-4o, gpt-4o-2024-05-13, gpt-4o-2024-08-06
and gpt-4o-2024-11-20 all collapse onto openai/gpt-4o. Registration
was last-write-wins over models.dev's (unstable) listing order, so a
regeneration could land on the 2024-05-13 snapshot and drop
limit.output from 16384 to 4096 — capping gpt-4o max_tokens and
truncating output/tool calls.
The builder now keeps the entry with the newest release_date on a
canonical-key collision, so the result is deterministic regardless of
upstream ordering. Regenerated the registry: openai/gpt-4o is back to
16384 (2024-11-20) and azure/gpt-3.5-turbo to 16384, with no native
provider output-limit downgrades versus main.
This supersedes the manual data edit in the previous commit with a
fix in the generator itself, so future release regenerations stay
correct.
Signed-off-by: Michael Neale <michael.neale@gmail.com>
Address codex review feedback on #9484 (P2).
The bulk canonical regeneration in this branch flipped the bare
`openai/gpt-4o` alias from the 2024-08-06 snapshot to the older
2024-05-13 snapshot, dropping limit.output from 16384 to 4096 (and
reverting pricing). Since ModelConfig::with_canonical_limits copies
limit.output into max_tokens when GOOSE_MAX_TOKENS is unset, ordinary
gpt-4o responses would be capped at 4096 and truncate output/tool calls.
Restore openai/gpt-4o to the current 2024-08-06 values (output 16384,
input 2.5 / output 10.0 / cache_read 1.25). Apply the same fix to
azure/gpt-3.5-turbo, which the regeneration regressed the same way
(0613 -> 0301 snapshot, context/output 16384 -> 4096).
These were the only native-provider (openai/anthropic/google/azure)
output-limit downgrades introduced by the regeneration; remaining
changes are upstream drift on third-party router prefixes.
Signed-off-by: Michael Neale <michael.neale@gmail.com>
Address codex review (new P1) on #9484:
thinking_type() returned Enabled whenever only a legacy
ANTHROPIC_THINKING_BUDGET / CLAUDE_THINKING_BUDGET was set, before
checking is_adaptive_model. For adaptive-only Opus 4.7/4.8 that
produced thinking: {"type":"enabled","budget_tokens":...}, which
Anthropic rejects with HTTP 400. Now the legacy-budget path maps
adaptive models to the adaptive payload while non-adaptive models
keep the enabled/budget behavior.
Adds a test covering the legacy-budget-without-effort path.
Signed-off-by: Michael Neale <michael.neale@gmail.com>
Address codex review feedback on #9484:
- P1: Extend supports_adaptive_thinking() to match Opus 4.7 and 4.8.
These models accept adaptive thinking only and reject manual
budget_tokens thinking with HTTP 400, so reasoning requests
(GOOSE_THINKING_EFFORT / legacy CLAUDE_THINKING_TYPE=adaptive)
now send the adaptive payload.
- P2: Add rejects_sampling_params() and omit temperature for Opus
4.7/4.8, whose Messages API rejects non-default sampling params
(temperature/top_p/top_k) with HTTP 400.
Adds tests covering adaptive thinking and temperature suppression
for the new models.
Signed-off-by: Michael Neale <michael.neale@gmail.com>
Two related fixes so newly-released Claude models don't silently truncate
responses at 4096 output tokens (which manifests as the model 'stopping
mid tool-call' from the user's perspective):
1. Refresh the bundled canonical model registry from models.dev. This
adds claude-opus-4-7, claude-opus-4-8, and bumps the registry from
4740 to 4965 models. Generated by:
cargo run --bin build_canonical_models
With the refreshed registry, claude-opus-4-8 now resolves through
strip_version_suffix's existing '-N-M' -> '-N.M' normalization to the
canonical 'anthropic/claude-opus-4.8' entry (1M context, 128k output).
2. Add a Claude-4+ family floor to ModelConfig::max_output_tokens(). When
GOOSE_MAX_TOKENS is unset and the canonical lookup doesn't populate
max_tokens (e.g. a brand-new model id one release ahead of the bundled
registry), we previously fell back to a hard 4096. That's a cliff:
every Claude 4.x model supports at least 32k output, and capping at
4096 cuts off tool_use blocks mid-stream, which agents experience as
the model 'not making tool calls'.
The new floor is 32k for Claude 4+ family ids (matches the lowest
real 4.x output cap). Claude 3.x and non-Claude models keep the
existing 4096 default since their real caps are genuinely small.
Also adds claude-opus-4-7 and claude-opus-4-8 to ANTHROPIC_KNOWN_MODELS
for UI/discovery.
Tests:
- canonical_lookup_resolves_dash_to_dot_for_claude_4_x: verifies
claude-opus-4-{1,5,6,7,8} all resolve to the canonical entry and
get a real (>4096) output cap.
- claude_4_family_falls_back_to_32k_not_4096: verifies the family
floor for unknown future ids (claude-opus-4-9, anthropic/claude-opus-4.8,
databricks-claude-opus-4.6, etc).
- claude_3_and_non_claude_keep_4096_default: regression guard.
- explicit_max_tokens_always_wins: GOOSE_MAX_TOKENS still overrides.
Also updates sets_limits_from_canonical_model to assert shape rather
than a hardcoded gpt-4o output number (16384 -> 4096 in models.dev),
so the test stops breaking every registry refresh.