Commit Graph

80 Commits

Author SHA1 Message Date
Alishahryar1 07497c7ed8 Add NVIDIA NIM CLI smoke matrix and tool schema aliasing 2026-05-09 14:25:50 -07:00
Alishahryar1 5294661aa4 feat: add Wafer provider 2026-05-08 23:43:16 -07:00
Alishahryar1 3bde98abaf Raise default timeouts for write and connect 2026-05-05 21:24:47 -07:00
alex c521589817 feat: add Kimi (Moonshot) provider (#335) 2026-05-04 18:28:54 -07:00
Ali Khokhar 1dfa54a04d Update .env.example 2026-04-30 02:20:26 -07:00
Ali Khokhar 0294e04f43 Update .env.example 2026-04-28 18:59:41 -07:00
Alishahryar1 6297b48f81 feat(deepseek): use native Anthropic Messages transport
- Point DeepSeek at api.deepseek.com/anthropic with x-api-key headers
- Native request builder, DeepSeek-specific thinking/block sanitization
- Drop deepseek from OpenAI-chat server-tool preflight; update tests and docs
- Default smoke model deepseek-v4-pro; re-export dump_raw_messages_request
2026-04-26 12:03:21 -07:00
Alishahryar1 2d2bf3de70 fix: replay reasoning_content for DeepSeek/NIM and expand provider smoke
- Add ReasoningReplayMode and top-level reasoning replay in OpenAI conversion
- DeepSeek/NIM request bodies use reasoning_content when thinking is enabled
- NIM retries without reasoning_content on 400 from upstream
- Per-provider smoke models (FCC_SMOKE_MODEL_*) independent of MODEL mapping
- Fix smoke model override parsing for owner/model names with slashes
- Live smoke: reasoning tool continuation uses synthetic thinking+tool history
- Tests and docs updated
2026-04-26 11:02:18 -07:00
Alishahryar1 f3a7528d49 Major refactor: API, providers, messaging, and Anthropic protocol
Consolidates the incremental refactor work into a single change set: modular web tools (api/web_tools), native Anthropic request building and SSE block policy, OpenAI conversion and error handling, provider transports and rate limiting, messaging handler and tree queue, safe logging, smoke tests, and broad test coverage.
2026-04-26 03:01:14 -07:00
Wang Ji b525217633 [feat] ollama method support (#129)
Support using Ollama in the same way as LM Studio

---------

Co-authored-by: Alishahryar1 <alishahryar2@gmail.com>
Co-authored-by: u011436427 <u011436427@noreply.gitcode.com>
2026-04-25 22:06:36 -07:00
Alishahryar1 f29e693dc5 Add per-model thinking toggles 2026-04-25 20:51:07 -07:00
Alishahryar1 b926f60f64 feat: Anthropic web server tools, provider metadata, messaging hardening
- Add local web_search/web_fetch SSE handling and optional tool schemas
- Extend HeuristicToolParser for JSON-style WebFetch/WebSearch text
- Consolidate provider defaults, ids, and exception typing; stream contracts
- Messaging: typed options, voice config injection, platform contract cleanup
- Tests for web server tools, converters, parsers, contracts; ignore debug-*.log
2026-04-24 23:01:14 -07:00
Alishahryar1 4b89183ba0 Raise default http connect timeout to 10s 2026-04-24 21:19:33 -07:00
Alishahryar1 0e3b2c24b4 refactor: remove OpenRouter rollback, shims, and redundant layers
- OpenRouter: native Anthropic only; remove chat_request and OPENROUTER_TRANSPORT
- Drop OpenAICompatibleProvider alias, api.request_utils, voice_pipeline facade
- Simplify OpenRouter SSE, generic reasoning in conversion, messaging dispatch
- Shared markdown table helpers; API optimization response helper; contract guards
- Restore PLAN.md; update docs and tests
2026-04-24 21:08:38 -07:00
Alishahryar1 26b8a29537 Architecture refactor: core anthropic, runtime, smoke tiers, remove providers.common 2026-04-24 20:03:14 -07:00
Alishahryar1 d2db1bd689 Treat empty model overrides as fallback 2026-04-24 13:58:25 -07:00
Alishahryar1 6f3d762a4f Revert "Add per-model thinking toggles"
This reverts commit 1f12a33dd7.
2026-04-24 00:26:15 -07:00
Alishahryar1 1f12a33dd7 Add per-model thinking toggles 2026-04-24 00:14:49 -07:00
arssing 2fe15bd2cd feat: add proxy support for httpx clients (#125)
Add proxy support for providers based on
[doc](https://www.python-httpx.org/advanced/proxies/):

- Add per-provider proxy support (HTTP and SOCKS5) for all 4 providers:
nvidia_nim, open_router, lmstudio, llamacpp
- Each provider gets its own env var (NVIDIA_NIM_PROXY,
OPENROUTER_PROXY, LMSTUDIO_PROXY, LLAMACPP_PROXY) for independent proxy
configuration

---------

Co-authored-by: Alishahryar1 <alishahryar2@gmail.com>
2026-04-22 17:06:16 -07:00
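
The per-provider proxy lookup described in commit 2fe15bd2cd can be sketched roughly as follows. The provider ids and environment variable names come from the commit message; the `proxy_for` helper and the `PROXY_ENV_VARS` table are hypothetical names for illustration, not the repository's actual code.

```python
import os
from typing import Mapping, Optional

# Provider id -> env var, as listed in the commit message.
PROXY_ENV_VARS = {
    "nvidia_nim": "NVIDIA_NIM_PROXY",
    "open_router": "OPENROUTER_PROXY",
    "lmstudio": "LMSTUDIO_PROXY",
    "llamacpp": "LLAMACPP_PROXY",
}


def proxy_for(provider: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the proxy URL configured for one provider, or None if unset.

    Hypothetical helper: an empty or whitespace-only value counts as "no proxy".
    """
    env = os.environ if env is None else env
    value = env.get(PROXY_ENV_VARS[provider], "").strip()
    return value or None


# Each provider is resolved independently, so one can use SOCKS5
# while another connects directly.
example_env = {"LMSTUDIO_PROXY": "socks5://127.0.0.1:1080"}
lmstudio_proxy = proxy_for("lmstudio", example_env)
llamacpp_proxy = proxy_for("llamacpp", example_env)
```

The resulting URL would then be handed to the provider's httpx client (recent httpx releases accept a `proxy=` argument; older ones used `proxies=`), which is an assumption about the wiring rather than something the commit message spells out.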
Pavel Yurchenko e719e4aed2 feat: deepseek api support (#118)
## Summary

* add native DeepSeek provider support via the shared OpenAI-compatible
provider base
* allow `deepseek/...` model prefixes in config validation
* add `DEEPSEEK_API_KEY` and `DEEPSEEK_BASE_URL` settings
* add DeepSeek entries to `.env.example` and `config/env.example`
* implement `DeepSeekProvider` and register it in provider dependencies
* add a DeepSeek request builder with DeepSeek-specific thinking payload
handling
* preserve Anthropic thinking blocks as `reasoning_content` for
DeepSeek-compatible continuation flows
* update `claude-pick` to discover DeepSeek models from the DeepSeek API
* document DeepSeek usage in `README.md`
* add tests for config validation, provider dependency wiring, request
building, and streaming behavior

## Motivation

DeepSeek exposes an OpenAI-compatible API and can be used directly
without routing through OpenRouter. This lets users spend their existing
DeepSeek balance through the proxy while keeping the same Claude Code
workflow and per-model provider mapping.

## Example

```dotenv
DEEPSEEK_API_KEY="sk-..."
DEEPSEEK_BASE_URL="https://api.deepseek.com"

MODEL_OPUS="deepseek/deepseek-reasoner"
MODEL_SONNET="deepseek/deepseek-chat"
MODEL_HAIKU="deepseek/deepseek-chat"
MODEL="deepseek/deepseek-chat"
```

---------

Co-authored-by: Alishahryar1 <alishahryar2@gmail.com>
2026-04-22 17:06:01 -07:00
Alishahryar1 835d0454e8 Fixes for issues 113 and 116 2026-04-18 16:32:31 -07:00
Alishahryar1 b75f47b62d Gate NIM thinking params behind NIM_ENABLE_THINKING env var
Mistral models reject chat_template_kwargs, causing 400 errors. Make
thinking params (chat_template_kwargs, reasoning_budget) opt-in via
NIM_ENABLE_THINKING env var (default false) so only models that need it
(kimi, nemotron) receive them.
2026-03-27 21:44:36 -07:00
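
The opt-in gate from commit b75f47b62d might look like the sketch below. Only `NIM_ENABLE_THINKING`, `chat_template_kwargs`, and `reasoning_budget` are taken from the commit message; the function names, the accepted truthy strings, and the concrete payload values are assumptions.

```python
import os
from typing import Any, Dict, Mapping, Optional


def thinking_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read NIM_ENABLE_THINKING, defaulting to false when it is unset."""
    env = os.environ if env is None else env
    raw = env.get("NIM_ENABLE_THINKING", "false")
    # Assumed set of truthy spellings; the real parser may differ.
    return raw.strip().lower() in {"1", "true", "yes"}


def build_request_body(base: Mapping[str, Any], enable_thinking: bool) -> Dict[str, Any]:
    """Copy the base body and add thinking params only when opted in."""
    body = dict(base)
    if enable_thinking:
        # Placeholder values: the actual payload shape is not given in the commit.
        body["chat_template_kwargs"] = {"thinking": True}
        body["reasoning_budget"] = 1024
    return body


base_body = {"model": "example-model", "messages": []}
plain = build_request_body(base_body, enable_thinking=False)
with_thinking = build_request_body(base_body, enable_thinking=True)
```

With the flag off, the body carries neither parameter, which is what keeps models that reject `chat_template_kwargs` from returning 400 errors.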
th-ch f703a0e403 Implement optional authentication (Anthropic style) (#80) 2026-03-27 11:11:47 -07:00
Alishahryar1 5a36a32836 feat: add llama.cpp provider for local anthropic messages API 2026-03-08 10:38:25 -07:00
Alishahryar1 87d8ce1196 feat(lmstudio): route natively to Anthropic /v1/messages endpoint
- Rewrites LMStudioProvider to inherit from BaseProvider
- Passes requests natively to /v1/messages using httpx instead of AsyncOpenAI
- Auto-translates internal ThinkingConfig to Anthropic schema
- Updates .env.example with model routing instructions
- Adjusts test suite for new native integration
2026-03-08 08:17:05 -07:00
Alishahryar1 49075b7fa5 Fixed default models 2026-03-01 21:34:01 -08:00
Alishahryar1 ac499cf585 Increased read timeout 2026-03-01 21:33:32 -08:00
Alishahryar1 feba0d456a Updated .env.example 2026-03-01 21:33:17 -08:00
Ali Khokhar 0b324e0421 Per claude model mapping (#66) 2026-03-01 21:32:23 -08:00
Mauro Druwel de70700dde feat: Use NVIDIA NIM ASR for audio transcription (#53)
## Summary
Added NVIDIA NIM as a second transcription option (alongside local
Whisper). This lets you transcribe voice notes using NVIDIA's cloud API
instead of running Whisper locally.

## What changed

- **Transcription**: Now supports two backends:

  - Local Whisper: Free, runs on your GPU/CPU (existing)
  - NVIDIA NIM: Cloud API via Riva gRPC (new)

- **Supported models**: 8 NVIDIA NIM models added (Parakeet variants for
different languages, Whisper Large V3)

---------

Co-authored-by: Alishahryar1 <alishahryar2@gmail.com>
2026-02-28 08:48:59 -08:00
Ali Khokhar c4d8681000 Backup/before cleanup 20260222 230402 (#58) 2026-02-27 19:50:21 -08:00
Alishahryar1 d6a0e1a401 Provider inferred from model name using prefix 2026-02-19 20:53:02 -08:00
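
A minimal sketch of prefix-based provider inference, as named in commit d6a0e1a401. The `split_provider` helper is hypothetical. It assumes the provider is everything before the first slash, so model names that themselves contain slashes stay intact.

```python
from typing import Tuple


def split_provider(model: str) -> Tuple[str, str]:
    """Split 'provider/model-name' on the first slash only."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name


# The remainder may itself contain slashes (owner/model style names).
simple = split_provider("deepseek/deepseek-chat")
nested = split_provider("open_router/owner/model")
```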
Alishahryar1 2ad64cc97a quoted string vars in env example 2026-02-19 20:27:28 -08:00
Claude 45b7e4cafd Make PROVIDER_MAX_CONCURRENCY required with default of 5
- `max_concurrency` is now always an `int` (default 5) — `None`/unlimited
  is no longer a valid state; omitting the env var uses the default
- `GlobalRateLimiter`: semaphore is always created; `concurrency_slot()`
  no longer has None guards; log message always includes concurrency
- `ProviderConfig.max_concurrency`: `int = 5` (was `int | None = None`)
- `Settings.provider_max_concurrency`: `int = Field(default=5, ...)` —
  setting env var to an invalid value (e.g. empty string) raises
- `.env.example`: uncommented `PROVIDER_MAX_CONCURRENCY=5`
- README: updated config table default from `—` to `5`
- Tests: removed `test_concurrency_slot_noop_when_not_configured`;
  updated mock settings to use `5` instead of `None`

https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
2026-02-19 14:39:42 +00:00
Claude 99f99fce90 Remove max_cli_sessions — CLI session pool is now unbounded
The max_sessions cap in CLISessionManager was the only thing enforcing
a limit on concurrent CLI processes. Now that provider concurrency is
controlled at the streaming layer (PROVIDER_MAX_CONCURRENCY semaphore),
the CLI session pool cap is redundant and removed entirely.

Changes:
- cli/manager.py: remove max_sessions param, cap check, _cleanup_idle_sessions_unlocked, max_sessions from get_stats()
- config/settings.py: remove max_cli_sessions field
- api/app.py: remove max_sessions=settings.max_cli_sessions from CLISessionManager constructor
- messaging/handler.py: remove "Waiting for slot" status check; stats display no longer shows Max CLI
- .env.example: remove MAX_CLI_SESSIONS line
- tests/cli/test_cli.py: remove max_sessions args and assertion from manager tests
- tests/cli/test_cli_manager_edge_cases.py: remove two tests for cap/cleanup behavior
- tests/api/test_app_lifespan_and_errors.py: remove max_cli_sessions from all SimpleNamespace settings
- tests/config/test_config.py: remove max_cli_sessions isinstance assertion
- tests/conftest.py: remove max_sessions from mock stats
- tests/messaging/test_handler.py: merge slot/capacity tests into single new-conversation test; remove Max CLI assertion from stats test
- tests/messaging/test_handler_markdown_and_status_edges.py: remove "Waiting for slot" assertion; drop max_sessions from all stats mocks

https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
2026-02-19 14:31:47 +00:00
Claude afaf50a972 Add queue-level concurrency limit to provider streaming
Adds max_concurrency cap to GlobalRateLimiter using asyncio.Semaphore.
A request now waits for a concurrency slot before the sliding window rate
limit check, so at most N streams are open to the provider simultaneously,
even when the rate window would allow more.

Changes:
- providers/rate_limit.py: max_concurrency param, _concurrency_sem, concurrency_slot() asynccontextmanager
- providers/openai_compat.py: pass max_concurrency to limiter; wrap execute_with_retry + stream iteration in concurrency_slot()
- providers/base.py: max_concurrency field on ProviderConfig
- config/settings.py: provider_max_concurrency setting (PROVIDER_MAX_CONCURRENCY env var, default None = unlimited)
- api/dependencies.py: pass provider_max_concurrency into all three provider ProviderConfig instantiations
- .env.example: document PROVIDER_MAX_CONCURRENCY (commented out)
- tests/providers/test_provider_rate_limit.py: 5 new tests covering concurrency limit enforcement, slot release on exception, noop when unconfigured
- tests/api/test_dependencies.py: add provider_max_concurrency=None to mock settings helper

https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
2026-02-19 14:23:21 +00:00
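
The concurrency cap from commit afaf50a972 can be illustrated with a small `asyncio.Semaphore` sketch. `concurrency_slot()` is named in the commit; the `ConcurrencyLimiter` class, the `active`/`peak` counters, and the fake stream are assumptions added so the behaviour is observable. The sliding-window rate check that follows slot acquisition in the real limiter is omitted here.

```python
import asyncio
from contextlib import asynccontextmanager


class ConcurrencyLimiter:
    """Hypothetical stand-in that models only the semaphore part of the limiter."""

    def __init__(self, max_concurrency: int) -> None:
        self._sem = asyncio.Semaphore(max_concurrency)
        self.active = 0  # streams currently holding a slot
        self.peak = 0    # highest value `active` has reached

    @asynccontextmanager
    async def concurrency_slot(self):
        # Wait for a free slot; it is released even if the body raises.
        async with self._sem:
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                yield
            finally:
                self.active -= 1


async def fake_stream(limiter: ConcurrencyLimiter, request_id: int) -> int:
    """Simulate one provider stream that holds its slot briefly."""
    async with limiter.concurrency_slot():
        await asyncio.sleep(0.01)
        return request_id


async def run_demo(n_requests: int = 6, max_concurrency: int = 2):
    limiter = ConcurrencyLimiter(max_concurrency)
    results = await asyncio.gather(
        *(fake_stream(limiter, i) for i in range(n_requests))
    )
    return results, limiter.peak


results, peak = asyncio.run(run_demo())
```

Here six requests are launched at once, but `peak` never exceeds the configured cap of two, which mirrors the "at most N streams open" guarantee described in the commit.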
Alishahryar1 4aff0b910f provider type quoted 2026-02-18 19:54:30 -08:00
Alishahryar1 cf1284b784 default voice note enabled set to false 2026-02-18 19:54:13 -08:00
Alishahryar1 416f259c8b Reordered env example 2026-02-18 19:53:30 -08:00
Alishahryar1 c35ecba9d8 Update Whisper model configuration to use 'base' as the default model ID 2026-02-18 19:36:58 -08:00
Alishahryar1 8807f58267 decreased default message rate limit 2026-02-18 13:35:09 -08:00
Alishahryar1 75e066f17f Refactor voice note transcription to use Hugging Face transformers Whisper pipeline
- Updated transcription logic to utilize Hugging Face's Whisper models instead of faster-whisper.
- Introduced new model mapping and pipeline loading functions.
- Adjusted tests to reflect changes in the transcription process.
- Updated documentation in README, .env.example, and settings to align with the new implementation.
- Ensured compatibility with CUDA 13 and removed unnecessary dependencies.
2026-02-18 06:18:28 -08:00
Cursor Agent db646ef2db Remove auto support for whisper_device; only cpu and cuda allowed
- Validate whisper_device in Settings and _get_local_model
- Reject 'auto' with clear ValueError/ValidationError
- Update docs in config, .env.example, README
- Add tests for invalid device and valid cpu/cuda

Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
2026-02-18 13:38:59 +00:00
Cursor Agent eabe8db2e8 Remove CPU fallbacks for voice note transcribe; auto/cuda/cpu fail fast
- Remove _cuda_failed_models and inference-time CPU fallback
- auto: try CUDA only, fail fast on RuntimeError (no CPU fallback)
- cpu/cuda: use device directly, fail fast on errors
- Update docs in config, .env.example, README

Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
2026-02-18 13:37:23 +00:00
Alishahryar1 d668f6e476 Add voice note transcription feature
- Introduced voice note handling for Discord and Telegram platforms.
- Added configuration options for voice note functionality in settings.py and .env.example.
- Updated README to include voice note instructions and configuration details.
- Implemented audio attachment processing and transcription using faster-whisper.
- Enabled voice note support through message handlers in both platforms.
2026-02-16 20:14:59 -08:00
Alishahryar1 0eb999a0c1 lint 2026-02-16 02:23:57 -08:00
Alishahryar1 01852e1638 Add configurable HTTP timeouts for provider API requests
Updated the README to include new timeout settings. Implemented these timeouts in the provider classes and added corresponding tests to ensure they are correctly passed to the client. Also included environment variable support for the new settings.
2026-02-16 01:40:15 -08:00
Alishahryar1 6511542bfe Implement Discord bot support and update README for messaging platform changes 2026-02-16 00:08:09 -08:00
Alishahryar1 b53a1b20c5 lint 2026-02-15 23:55:57 -08:00
Alishahryar1 b6f870ffab Added space in example env 2026-02-15 23:53:42 -08:00