Commit Graph

159 Commits

Author SHA1 Message Date
Giancarlo Erra bca259805e chore: release v1.8.12 v1.8.12 2026-05-22 18:05:32 +01:00
Giancarlo Erra 30200b3ec5 Merge pull request #61 from giancarloerra/fix/windows-path-separator-normalization
fix(graph): normalize Windows backslash paths to forward slashes
2026-05-22 18:02:51 +01:00
Giancarlo Erra 4526ea58ae fix(graph): normalize stored node keys during lookup for legacy cache compat
Address CodeRabbit review: also normalize stored graph node keys when
comparing, not just the query input. Handles pre-fix Windows caches
where node keys still contain backslashes until the graph is rebuilt.
2026-05-22 16:07:14 +01:00
Giancarlo Erra e9ee3ea116 fix(graph): normalize Windows backslash paths to forward slashes
On Windows, path.relative() and path.join() return backslash separators.
Graph node keys were stored with native separators, but query inputs use
forward slashes, causing silent lookup failures on Windows.

Add toForwardSlash() utility and apply it at build time (file walker,
resolution functions) and query time (getFileDependencies,
getSymbolContext, listSymbols) for defense-in-depth.

No-op on macOS/Linux where path.relative() already returns forward
slashes. Existing Windows symbol graph caches require one rebuild.

Fixes #60
2026-05-22 15:59:13 +01:00
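A minimal sketch of the normalization described in e9ee3ea116 and 4526ea58ae. Only the name toForwardSlash comes from the commit message; findNode and the shape of the node map are hypothetical stand-ins for illustration, not the project's actual code.

```typescript
// Replace every backslash with a forward slash. Paths that already use
// forward slashes (macOS/Linux) come back unchanged.
function toForwardSlash(p: string): string {
  return p.replace(/\\/g, "/");
}

// Hypothetical lookup: normalize BOTH the query and the stored keys, so a
// cache written before the fix (backslash keys) still matches.
function findNode<T>(nodes: Record<string, T>, queryPath: string): T | undefined {
  const wanted = toForwardSlash(queryPath);
  for (const key of Object.keys(nodes)) {
    if (toForwardSlash(key) === wanted) return nodes[key];
  }
  return undefined;
}

// A legacy Windows cache entry keyed with native separators.
const legacyCache: Record<string, string[]> = {
  "src\\services\\qdrant.ts": ["src\\config.ts"],
};
const legacyDeps = findNode(legacyCache, "src/services/qdrant.ts");
```

Normalizing on both sides of the comparison is what lets old caches keep working until the graph is rebuilt.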
Giancarlo Erra c126c525fa chore: release v1.8.11 v1.8.11 2026-05-12 14:39:43 +01:00
Giancarlo Erra fc63d147e4 Merge pull request #59 from shaitourchin/fix/refuse-node-26-until-qdrant-undici-compat
fix: refuse Node 26+ until qdrant-js gains undici 7 compat
2026-05-12 13:36:57 +01:00
Shai Tourchin 69a6b74b8a fix(index): use fs.writeSync for synchronous flush + sync exit
Third-pass review caught that the callback shape from 5cd9db0
(while correctly fixing the stderr-truncation concern CodeRabbit
raised) introduced an async-exit window. With process.exit(1)
inside the write callback, on Node 26+ the rest of the file's
top-level code runs before termination: imports' top-level
evaluation, the McpServer/tool registrations, and the start of
main()'s connect — the MCP host can briefly see a handshake
begin before the process dies.

fs.writeSync(2, msg) is the canonical Node pattern for "print
fatal error then die" — blocking (no truncation when stderr is
piped) AND synchronous (so process.exit(1) runs before any
further top-level code). Strictly better than the callback shape
on both axes.

Also soften comment phrasing to reduce rot risk:
- "Candidate fixes already in flight" -> "Upstream PRs under discussion"
- "Once one lands" -> "If either lands -- or any other fix supersedes them"

Verified: full 4-line stderr message survives piping to a file
on Node 26.0.0; exit code 1 preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 11:19:20 +03:00
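The "print fatal error, then die" pattern from 69a6b74b8a can be sketched as below. The function names, the message text, and the injectable exit parameter are assumptions made so the sketch can be exercised without killing the process; the actual guard in index.ts may look different.

```typescript
import { writeSync } from "node:fs";

// Extract the major version from strings such as "v26.0.0" or "26.0.0".
function nodeMajor(version: string): number {
  const match = /^v?(\d+)/.exec(version);
  return match ? Number(match[1]) : Number.NaN;
}

// Hypothetical guard: refuse to continue on Node 26 or newer.
// `exit` is injectable for testing; it defaults to process.exit.
function refuseNode26Plus(
  version: string,
  exit: (code: number) => void = (code) => process.exit(code),
): void {
  if (!(nodeMajor(version) >= 26)) return;
  const msg =
    `This server does not support Node ${version} yet.\n` +
    "Please run it with a Node version below 26.\n";
  // writeSync on fd 2 (stderr) blocks until the bytes are written, so the
  // message is not truncated when stderr is piped, and exit runs right after.
  writeSync(2, msg);
  exit(1);
}

// Typical placement (assumption): first top-level statement of the entry file.
// refuseNode26Plus(process.version);
```

Compared with exiting inside a process.stderr.write callback, nothing else gets a chance to run between the write and the exit.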
Shai Tourchin 5cd9db07e6 fix(index): flush stderr before exit on Node 26+ guard
Per CodeRabbit review on #59: process.stderr.write() is async when
stderr is piped (every MCP host captures stderr to surface server
logs), so a bare `process.exit(1)` immediately after the write
terminates synchronously without draining I/O — risking truncation
of the compatibility warning that this guard exists to surface.

Move the exit into the write callback so the message is guaranteed
to flush before the process terminates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 11:06:51 +03:00
Shai Tourchin c23120e6c0 fix: refuse to start on Node 26+ until qdrant-js gains undici 7 compat
Under Node 26+, the very first qdrant request crashes with
`UND_ERR_INVALID_ARG: invalid onError method`. Root cause is a version
mismatch: @qdrant/js-client-rest constructs an undici.Agent from its
pinned undici ^6 and passes it as the dispatcher to Node's built-in
fetch(), which under Node 26 uses a newer undici with stricter
dispatcher-hook validation.

The bug surfaces on the first real codebase_search / codebase_index
call — the MCP handshake succeeds, then everything fails. The error
message gives no hint about Node version, so users on Node 26+ lose
significant time debugging.

This change:
- Adds a runtime pre-flight check at index.ts entry that prints a
  clear actionable error and exits 1. Per ESM the imports below
  evaluate first, but qdrant-js's module init is side-effect-light,
  so exiting at the first top-level statement is enough.
- Tightens engines.node to `>=18.0.0 <26.0.0` so npm/npx warns at
  install time.

Both can be reverted once one of qdrant/qdrant-js#123 (undici major
upgrade) or qdrant/qdrant-js#128 (inject fetch) lands.

Refs: qdrant/qdrant-js#134

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 10:14:38 +03:00
Giancarlo Erra c842ff61f5 chore: release v1.8.10 v1.8.10 2026-05-08 11:01:04 +01:00
Giancarlo Erra f87f5297ef Merge pull request #57 from airmonitor/litellm
Brilliant stuff, merging now, thank you!
2026-05-08 10:53:10 +01:00
AirMonitor 6c67965628 fix(litellm): iterate paginated /v1/models in readiness checks
The OpenAI SDK's `client.models.list()` returns a `PagePromise` that
implements `AsyncIterable<Model>` and auto-paginates on demand. The
previous implementation read `modelList.data` directly, which only
contains the first page. Today's LiteLLM proxy returns the entire
`model_list` from `config.yaml` in a single response so the bug is
latent, but a future LiteLLM build (or an upstream proxy in front of
it) that paginates `/v1/models` would cause `ensureReady` and
`healthCheck` to throw a spurious "alias not registered" error for
any alias landing on a non-first page.

Switch both checks to `for await (const m of client.models.list())`
and accumulate ids into a single array. Equivalent to the SDK's
documented async-iteration pattern; `PagePromise` is itself the
iterable, so no extra `await` is needed before the loop. Inline
comment explains why the iteration matters even though today's
LiteLLM doesn't paginate, so the pattern survives future drive-by
"simplifications".

Surfaced by CodeRabbit on PR review of 1708510.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 10:12:16 +02:00
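To illustrate why 6c67965628 iterates instead of reading the first page, here is a self-contained sketch. It does not use the OpenAI SDK: paginatedModels is a hypothetical stand-in for an auto-paginating list, and collectModelIds and isAliasRegistered are illustrative names.

```typescript
interface ModelLike {
  id: string;
}

// Stand-in for an auto-paginating result: yields every model, page by page.
async function* paginatedModels(pages: ModelLike[][]): AsyncGenerator<ModelLike> {
  for (const page of pages) {
    for (const model of page) yield model;
  }
}

// Accumulate ids across ALL pages by async iteration, rather than reading
// only the first page's data array.
async function collectModelIds(models: AsyncIterable<ModelLike>): Promise<string[]> {
  const ids: string[] = [];
  for await (const m of models) ids.push(m.id);
  return ids;
}

// Readiness-style check: is the configured alias registered anywhere?
async function isAliasRegistered(
  models: AsyncIterable<ModelLike>,
  alias: string,
): Promise<boolean> {
  const ids = await collectModelIds(models);
  return ids.indexOf(alias) !== -1;
}
```

With two pages, an alias that lives only on the second page is still found, which is the case a first-page-only read would miss.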
Tomasz Szuster 1708510c16 feat(embeddings): add LiteLLM as a first-class embedding provider
LiteLLM Proxy Server (https://docs.litellm.ai/docs/simple_proxy) exposes an
OpenAI-compatible /v1/embeddings endpoint and fans out to 100+ underlying
providers (OpenAI, Anthropic, Cohere, Voyage, HuggingFace, Bedrock, Vertex AI,
Ollama, ...). Mirroring the lmstudio strategy (PR #42 + commit bb141a0) but
with three meaningful differences that justify a dedicated provider rather
than a flag on provider-openai:

- Authentication is mandatory. LiteLLM gates /v1/models with the master key
  or a virtual key, unlike LM Studio (no auth by default) and OpenAI (cloud
  key). LITELLM_API_KEY is checked at config-load time; the provider also
  duck-types 401/403 in ensureReady/healthCheck via err.status to surface a
  distinct "auth rejected" message vs. "proxy unreachable".
- Model aliases come from the proxy's config.yaml, so EMBEDDING_MODEL and
  EMBEDDING_DIMENSIONS have no sensible defaults. Fail-fast in
  loadEmbeddingConfig with provider-specific error messages pointing at
  litellm_params.model in the proxy config and at the underlying alias's
  output dim.
- Whether dimensions can be forwarded depends on the underlying provider:
  Matryoshka-aware models (text-embedding-3-*, voyage-3) accept it,
  non-Matryoshka backends (BGE, nomic, Cohere v3) reject it. Made opt-in via
  LITELLM_SEND_DIMENSIONS=true rather than hardcoded like provider-openai
  does for text-embedding-3-*, since LiteLLM aliases are user-defined.

Encoding-format=float fix from bb141a0 ports verbatim — the OpenAI SDK 6.x
base64-decode path corrupts any backend that returns plain JSON float arrays
(many LiteLLM aliases do, including Ollama-routed and tei-wrapped ones).

Files:

- src/services/provider-litellm.ts: new LiteLLMEmbeddingProvider with the
  same OpenAI-SDK + custom baseURL pattern. Default baseURL
  http://localhost:4000/v1 (LiteLLM's default port, /v1 prefix required).
  Batch size 256 — between OpenAI's 512 and LM Studio's 64, since the
  practical ceiling depends on whichever provider the alias resolves to.
  ensureReady distinguishes proxy-unreachable / auth-rejected /
  alias-not-registered. Lists up to 10 currently-registered models in the
  alias-missing error so the operator can sanity-check their config.yaml
  without leaving the log.
- src/services/embedding-config.ts: extends EmbeddingProvider union with
  "litellm", adds litellmUrl to EmbeddingConfig, fail-fast validation for
  LITELLM_API_KEY + EMBEDDING_MODEL + EMBEDDING_DIMENSIONS (key first so a
  virtual-key user fixes the easy problem before touching the proxy
  config), updates Invalid EMBEDDING_PROVIDER message and hasApiKey log
  expression.
- src/services/embedding-provider.ts: factory case for litellm with dynamic
  import to avoid loading the OpenAI SDK at startup for non-litellm users.
- README.md: dedicated LiteLLM section, MCP host config example, env-var
  table entries for EMBEDDING_PROVIDER / EMBEDDING_MODEL /
  EMBEDDING_DIMENSIONS / EMBEDDING_CONTEXT_LENGTH (clarifying which require
  manual values for litellm), new LiteLLM Configuration table.
- tests/unit/embedding-config.test.ts: 9 new cases (model + dim + key
  required, error-ordering, URL default + override, dimensions parsing,
  EMBEDDING_CONTEXT_LENGTH override for unknown aliases, auto-detection
  when alias matches a known model name) plus updated "full external
  config" expected object and updated invalid-provider error message.
- tests/unit/embedding-provider.test.ts: factory test for litellm, plus 4
  cases against a deliberately-closed port (config rejects construction
  without API_KEY, ensureReady unreachable error format, healthCheck
  short-circuits on missing key without a network call, healthCheck
  reaches "Not reachable" path without throwing).

Backward compatible. The litellm provider is opt-in via
EMBEDDING_PROVIDER=litellm. Existing ollama, openai, google, and lmstudio
paths are untouched.

Verified: 64/64 unit tests pass on the touched suites; biome lint clean;
tsc --noEmit clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-08 08:58:51 +02:00
Giancarlo Erra aedfa8543d chore: release v1.8.9 v1.8.9 2026-05-07 21:53:41 +01:00
Giancarlo Erra 92fec9e582 Merge pull request #56 from iritbrener-blip/fix/wrap-qdrant-errors-with-context
fix(qdrant): wrap propagated client errors with operation context (#55)
2026-05-07 21:50:05 +01:00
IritBrenerShalem f22b4d128f fix(qdrant): wrap propagated client errors with operation context (#55)
The deliberate catch-and-rethrow sites in `loadProjectHashes` and
`getCollectionInfo` were re-throwing the raw Qdrant client error. With
no wrapping, the original message ("Internal Server Error") landed
verbatim in the MCP response and `codebase_status` `lastCompleted.error`
field, leaving consumers no way to distinguish which operation failed,
which collection it targeted, or whether the underlying error carried
an HTTP status code.

Add a `wrapQdrantError(operation, context, err)` helper and apply it at
both rethrow sites. The new error message looks like:

    loadProjectHashes(collName=codebase_xxx) failed [status 500]: Internal Server Error

The original error is preserved via `cause`, so any consumer that walks
the cause chain still has full access. Also pick up `statusCode` as a
fallback for clients that use that field name.

Behaviour preserved:
- 404 / not-found still returns null from `getCollectionInfo` (no change).
- Re-throw is still the path for transient/unknown errors (no behaviour
  change for the deliberate hardening that protects against destructive
  clean-start cascades).

Adds `tests/unit/qdrant-error-wrapping.test.ts` covering the 404 pass-
through, wrapped-message format, status-code inclusion, missing-status
case, and `cause` preservation.

Closes #55.
2026-05-07 23:38:11 +03:00
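A rough sketch of a helper with the wrapQdrantError(operation, context, err) signature named in f22b4d128f. The context parameter type, the readStatus helper, and the message format when no status is present are assumptions; only the example message with a status comes from the commit text.

```typescript
// Pull an HTTP status off an unknown error, trying `status` first and
// `statusCode` as the fallback field name.
function readStatus(err: unknown): number | undefined {
  if (typeof err !== "object" || err === null) return undefined;
  const { status, statusCode } = err as { status?: unknown; statusCode?: unknown };
  if (typeof status === "number") return status;
  if (typeof statusCode === "number") return statusCode;
  return undefined;
}

// Wrap a raw client error with the operation name and its context, keeping
// the original error reachable through `cause`.
function wrapQdrantError(
  operation: string,
  context: Record<string, string>, // assumed shape, e.g. { collName: "..." }
  err: unknown,
): Error {
  const ctx = Object.keys(context)
    .map((key) => `${key}=${context[key]}`)
    .join(", ");
  const status = readStatus(err);
  const statusPart = status === undefined ? "" : ` [status ${status}]`;
  const message = err instanceof Error ? err.message : String(err);
  const wrapped = new Error(`${operation}(${ctx}) failed${statusPart}: ${message}`);
  (wrapped as Error & { cause?: unknown }).cause = err;
  return wrapped;
}
```

A consumer that only reads the message learns which operation and collection failed; one that walks the cause chain still reaches the original error object.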
Tomasz Szuster a0814005f8 Merge pull request #3 from giancarloerra/main
merging with upstream
2026-05-06 17:39:09 +02:00
Giancarlo Erra f014c05e03 chore: release v1.8.8 v1.8.8 2026-05-06 15:16:55 +01:00
Tomasz Szuster 2c4d55ca50 feat(config): support projectId in .socraticode.json for team-shared indexes (#53)
Adds an optional `projectId` field to `.socraticode.json` so teams can
commit a stable project identifier to the repo. Without this field the
project ID is derived from the SHA-256 of the absolute checkout path,
which means the same project resolves to a different Qdrant collection
on every machine, OS user, filesystem layout, or worktree. With it,
every checkout addresses the same `codebase_*`, `codegraph_*`, and
`context_*` collections regardless of where the working tree lives on
disk.

This is the path-independent, multi-project complement to the existing
`SOCRATICODE_PROJECT_ID` env var. The env var is process-scoped and
global to all projects in a host, so it does not scale to a developer
who works on several projects on one laptop. The file is per-project
and shared across teammates via git.

Resolution precedence (highest first):

  1. `SOCRATICODE_PROJECT_ID` env var (per-machine override)
  2. `projectId` in .socraticode.json (committed, team-wide)
  3. SHA-256 prefix of the absolute path (existing default)

Both override paths trim whitespace, validate against `[a-zA-Z0-9_-]+`,
and throw on invalid characters so a misconfigured value cannot
silently route a project to the wrong (or empty) collection. Malformed
JSON, missing fields, wrong types, and empty/whitespace-only values
fall through to the next precedence level so the MCP server stays
resilient against hand-edited config files. Branch-aware mode is
suppressed for either explicit override since explicit identifiers
are stable by intent.

Also fixes a pre-existing bug in `resolveLinkedCollections`: linked
projects were resolved via `coreProjectId(linkedPath)` (path hash
only), so a linked project that pinned its own `projectId` in
`.socraticode.json` would silently miss its actual data during
cross-project search. Linked-project resolution now goes through a
new `effectiveBaseProjectId` helper that honors the committed value,
preserving symmetry: a project addresses the same Qdrant collection
whether it is the current root or a linked dependency. Dedup is
tightened to use the same effective base ID, so two paths pinning the
same shared identifier collapse to a single result.

The env var deliberately does not leak into linked-project collection
names. It is process-scoped and applying it as a single value to every
linked path would collapse them onto the env-var collection, silently
losing per-project isolation.

Tests: 16 new cases in tests/unit/config.test.ts, written TDD-style
(RED to GREEN). Coverage:

  - `projectIdFromPath` (13): file resolution, ignores path
    differences when file projectId is set, whitespace trimming,
    throws on invalid characters, falls back to hash on
    empty/whitespace/wrong-type/null/missing-file/malformed-JSON,
    env-var precedence over file, branch-suffix suppression, and
    coexistence with `linkedProjects` in the same file.
  - `resolveLinkedCollections` (3): linked project's committed
    projectId honored, dedup on shared committed projectId, env var
    does not leak into linked-project collection names.

The new branch-aware-suppression test explicitly disables git
`commit.gpgsign` and `tag.gpgsign` in its throwaway-repo fixture so
the test is robust against the developer's global git config.

Backwards compatible: zero behaviour change for users who do not adopt
the new field. The `SocratiCodeConfig` interface gains an optional
field; existing `linkedProjects` parsing is functionally identical
(routed through the new shared `loadSocratiCodeConfig` helper).
Composes cleanly with the recently-added `QDRANT_COLLECTION_PREFIX`:
prefix + projectId combine into `<prefix>codebase_<projectId>` as
expected.

README and DEVELOPER documentation updated: new "Team-Shared Index
(committed `projectId`)" section in README between Git Worktrees and
Cross-Project Search, and the env-var table notes the new precedence.
DEVELOPER's "Project ID & Collection Naming" section now documents
the three-level precedence and explains why both override paths
suppress the branch-aware suffix.

Co-authored-by: airmonitor <tomasz.szuster@gmail.com>
2026-05-06 15:05:43 +01:00
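The three-level precedence from 2c4d55ca50 could look roughly like this. The function names, the error wording, the 12-character hash prefix, and the choice to let an empty env var fall through are all assumptions; the commit states only the precedence order, the allowed character set, and the trim/validate/fall-through behaviour.

```typescript
import { createHash } from "node:crypto";

const VALID_PROJECT_ID = /^[a-zA-Z0-9_-]+$/;

// Normalize one override source. Wrong types and empty or whitespace-only
// values yield undefined (fall through); invalid characters throw.
function cleanOverride(value: unknown, source: string): string | undefined {
  if (typeof value !== "string") return undefined;
  const trimmed = value.trim();
  if (trimmed === "") return undefined;
  if (!VALID_PROJECT_ID.test(trimmed)) {
    throw new Error(`Invalid project ID from ${source}: "${trimmed}"`);
  }
  return trimmed;
}

// Precedence, highest first: env var, committed file value, path hash.
function resolveProjectId(
  envValue: string | undefined,
  fileValue: unknown,
  absolutePath: string,
): string {
  return (
    cleanOverride(envValue, "SOCRATICODE_PROJECT_ID") ??
    cleanOverride(fileValue, ".socraticode.json projectId") ??
    // Prefix length of 12 is an assumption for illustration.
    createHash("sha256").update(absolutePath).digest("hex").slice(0, 12)
  );
}
```

Throwing on invalid characters, rather than falling back to the hash, avoids silently routing a misconfigured project to a different collection.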
Giancarlo Erra 9a80a80654 chore: release v1.8.7 v1.8.7 2026-05-06 10:29:05 +01:00
Aleksey Chugarev 2007a18865 fix(context): checkpoint artifact metadata after each successful index (#52)
indexAllArtifacts and ensureArtifactsIndexed previously called saveContextMetadata only once, after the entire indexing pass completed. When the underlying loop took longer than the MCP client's tool-call timeout, completed artifacts appeared unindexed because their state was never persisted, and partial progress was lost.

This patch saves the metadata snapshot after every successfully indexed artifact, so each artifact's success is durable as soon as the indexing for it returns. It also seeds the in-flight stateMap from the previously-loaded existingStates so that interrupted runs can preserve completed work for artifacts already finished, and uses that same original snapshot to identify orphan artifacts that need cleanup when the config has changed.

Backwards compatible: a successful full run produces exactly the same final on-disk state as before. The only behavioural difference is in the interrupted-mid-run case, where the new code retains more state instead of losing everything since the last full pass.

Tests: 3 new cases in tests/unit/context-artifacts-checkpoint.test.ts covering the checkpointing path during full indexing, preservation of earlier successes when a later artifact fails, and preservation of up-to-date states while re-indexing stale ones. Existing unit tests continue to pass unchanged.

Co-authored-by: jackblackjack <chugarev@gmail.com>
2026-05-06 10:27:00 +01:00
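The checkpointing idea in 2007a18865 can be sketched as follows. This is a simplified, synchronous illustration: the real indexing and metadata calls are presumably asynchronous, and ArtifactState, indexWithCheckpoints, indexOne, and save are hypothetical names. What happens to the failed artifact's own state is also an assumption here.

```typescript
interface ArtifactState {
  contentHash: string;
}

// Index artifacts one by one, persisting a snapshot after EACH success so an
// interrupted run keeps the work that already finished.
function indexWithCheckpoints(
  artifacts: string[],
  existingStates: Record<string, ArtifactState>,
  indexOne: (name: string) => ArtifactState,
  save: (snapshot: Record<string, ArtifactState>) => void,
): Record<string, ArtifactState> {
  // Seed the in-flight map from previously persisted state.
  const stateMap: Record<string, ArtifactState> = { ...existingStates };
  for (const name of artifacts) {
    try {
      stateMap[name] = indexOne(name);
    } catch {
      // Assumption: a failure skips this artifact; earlier checkpoints stay intact.
      continue;
    }
    save({ ...stateMap }); // checkpoint after every successful artifact
  }
  return stateMap;
}
```

If the caller times out after the second of five artifacts, the first two are already on disk instead of being lost with the rest of the pass.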
Giancarlo Erra bde9eb7efb chore: release v1.8.6 v1.8.6 2026-05-05 11:44:01 +01:00
Giancarlo Erra f54263f1b6 Merge pull request #50 from giancarloerra/feat/qdrant-collection-prefix
feat(qdrant): add QDRANT_COLLECTION_PREFIX env var for shared instances
2026-05-05 11:29:28 +01:00
Giancarlo Erra 70db002796 feat(qdrant): add QDRANT_COLLECTION_PREFIX env var for shared instances
Resolves #49. Reported by @awbait.

When sharing a single Qdrant server across multiple applications
(SocratiCode + Open-WebUI + custom RAG, etc.) or across multiple
SocratiCode instances (per-project, per-environment, per-user), the
fixed `codebase_<id>` / `codegraph_<id>` / `context_<id>` /
`<id>_symgraph_*` / `socraticode_metadata` collection names risk
colliding with other apps and prevent isolation between SocratiCode
instances.

This patch adds an optional QDRANT_COLLECTION_PREFIX env var that, when
set, is prepended verbatim to every Qdrant collection name SocratiCode
creates, queries, lists, or deletes. Default empty string preserves the
existing collection names exactly: fully backwards compatible.

Touchpoints (mechanical, no logic changes):

- src/constants.ts: new QDRANT_COLLECTION_PREFIX export with eager
  validation. Qdrant accepts only [a-zA-Z0-9_-] in collection names; an
  invalid prefix throws at module load with a message naming the
  offending value, before any Qdrant call is attempted.
- src/config.ts: all six collection-name generators
  (collectionName, graphCollectionName, contextCollectionName,
  symgraphMetaCollectionName, symgraphFileCollectionName,
  symgraphIndexCollectionName) prepend the prefix. Generator semantics
  are otherwise unchanged.
- src/services/qdrant.ts: METADATA_COLLECTION (the global
  socraticode_metadata collection used for cross-project state) also
  honours the prefix, so two SocratiCode instances on one Qdrant keep
  their metadata isolated as well as their per-project collections.
  The two startsWith() filters in listCodebaseCollections — used by
  codebase_list_projects to discover this instance's collections —
  build the match prefix from QDRANT_COLLECTION_PREFIX so a prefixed
  instance only sees its own collections, not those of co-tenants.
- src/tools/manage-tools.ts: codebase_list_projects similarly uses the
  prefix in its filters. The projectId extraction (formerly
  c.replace("codebase_", "")) now slices the full
  ${prefix}codebase_ token so the recovered id is correct under any
  prefix; the codegraph cross-reference uses the same prefixed name.

Tests: 20 new test cases in tests/unit/qdrant-collection-prefix.test.ts
covering:

- Default empty prefix preserves the legacy collection-name forms for
  all six generators (regression guard against backward-compat break).
- Empty-string env var is treated identically to unset.
- Non-empty prefix prepends correctly to all six generators, including
  the suffix-style symgraph names.
- Two different prefixes produce disjoint collection-name sets for the
  same projectId (the multi-instance isolation property).
- Validation rejects whitespace, slash, colon, and unicode characters.
- The error message includes the offending value for discoverability.
- Validation accepts the full set of legal characters.

Existing 752 unit tests continue to pass unchanged. Total: 772.

typecheck, biome, and CodeRabbit local review all clean. README
updated to document the new env var alongside the other QDRANT_*
settings, including the user-side responsibility to remove old
collections when changing prefix between runs.

Co-authored-by: awbait <awbait@users.noreply.github.com>
2026-05-05 11:20:33 +01:00
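A small sketch of the prefix handling described in 70db002796. The names readPrefix, codebaseCollectionName, and projectIdFromCollection are illustrative, and only the codebase_ generator is shown; the error wording is an assumption.

```typescript
// Qdrant collection names accept only these characters; empty means "no prefix".
const LEGAL_PREFIX = /^[a-zA-Z0-9_-]*$/;

// Validate eagerly so a bad prefix fails before any Qdrant call is made.
function readPrefix(raw: string | undefined): string {
  const prefix = raw ?? "";
  if (!LEGAL_PREFIX.test(prefix)) {
    throw new Error(
      `Invalid QDRANT_COLLECTION_PREFIX "${prefix}": only [a-zA-Z0-9_-] is allowed`,
    );
  }
  return prefix;
}

// The prefix is prepended verbatim to the collection name.
function codebaseCollectionName(prefix: string, projectId: string): string {
  return `${prefix}codebase_${projectId}`;
}

// Recover the project ID by slicing off the full "<prefix>codebase_" token.
// Collections that do not carry this instance's prefix are ignored.
function projectIdFromCollection(prefix: string, name: string): string | undefined {
  const token = `${prefix}codebase_`;
  return name.startsWith(token) ? name.slice(token.length) : undefined;
}
```

Slicing the whole token matters: removing only "codebase_" from "team_codebase_abc" would leave "team_abc" rather than "abc".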
Giancarlo Erra 9431e144d2 chore: release v1.8.5 v1.8.5 2026-05-05 02:06:44 +01:00
Giancarlo Erra 2ce536dc11 Merge pull request #48 from giancarloerra/fix/go-module-resolution
fix(graph): resolve Go imports via go.mod module path
2026-05-05 02:05:24 +01:00
Giancarlo Erra 8c26ed8b49 fix(graph): allow Go resolution for projects with golang.org/* module paths
Address CodeRabbit review on PR #48. The early `isExternalModule` check
in resolveImport was filtering out any import starting with `golang.org/`
before the Go case had a chance to match it against the local module
path. This blocked legitimate local imports for any project whose own
module path starts with `golang.org/` (the Go team's own packages like
golang.org/x/sync, golang.org/x/net, etc., where each one's go.mod
declares `module golang.org/x/<name>`).

Skip the early external check for Go specifically. The Go case in
resolveImport already does its own module-path-aware classification
and returns null for everything outside the local module, including
stdlib and third-party deps. No regression in those cases.

New regression test asserts that
`module golang.org/x/custom` + `import "golang.org/x/custom/internal"`
resolves to the local internal/ package. Confirmed the test fails
without the fix and passes with it. Total: 752 unit tests pass.

Co-authored-by: mrsuit92 <mrsuit92@users.noreply.github.com>
2026-05-05 01:57:07 +01:00
Giancarlo Erra c156da1688 fix(graph): resolve Go imports via go.mod module path
Resolves #45. Reported by @mrsuit92.

Go projects produced 0 dependency edges in codebase_graph_query and
codebase_graph_stats even though import extraction worked correctly.
The Go case in resolveImport returned null unconditionally, with a
comment that resolution required go.mod analysis. This patch adds
that analysis and wires it into the resolver, mirroring the existing
buildJvmSuffixMap and buildCsNamespaceMap patterns.

Mechanism:

- buildGoModuleInfo reads <projectPath>/go.mod once at graph-build
  time, parses the `module <path>` directive, and walks the file set
  to build a directory-to-representative-file map for every Go
  package. _test.go files are excluded from representative selection
  because Go does not allow them to be imported from non-test code in
  other packages. Files are sorted lexicographically for
  deterministic representative selection across machines and runs.
  Returns null when go.mod is missing or has no parseable module
  directive; the resolver treats null as "no Go resolution available"
  and behaves exactly as before this patch in those cases.

- The Go case in resolveImport now strips the module path prefix
  from the import (handling the bare-module-path root case as well
  as subpackage paths) and looks up the resulting directory in the
  package map. Imports outside the module path return null and are
  treated as external dependencies (or stdlib already filtered
  upstream by isExternalModule).

- Map keys are forward-slash paths, not OS-native, so resolution
  works on Windows: Go imports are always forward-slash regardless
  of host OS, but path.dirname produces backslashes on Windows for
  nested directories. Normalising the key to forward slashes at
  build time keeps the lookup correct across platforms.

Limitations (deferred to follow-up issues if any user reports them):

- The parenthesised module ( path ) form in go.mod is not parsed.
  Not used by any mainstream Go project (verified against cobra,
  gin-gonic/gin, uber-go/zap real-world go.mod files).
- vendor/ directory shadowing of external imports is not honoured.
- replace directives in go.mod are not honoured.
- go.work multi-module workspaces are not handled (each workspace
  module would need its own go.mod read and prefix matching).

These are real Go features but each one widens the patch and
narrowly affects specific user populations. They can be added as
separate small PRs if a real user hits them.

Tests: 16 new cases in tests/unit/graph-resolution.test.ts covering
the new buildGoModuleInfo function (parses simple go.mod, handles
leading whitespace and trailing content, returns null on missing or
malformed go.mod, excludes _test.go from representative selection,
omits test-only directories, uses forward-slash keys for nested
packages) and the Go resolveImport case (back-compat null without
goModuleInfo, subpackage import resolves to lex-smallest non-test
.go file, root-package import resolves to a project-root .go file,
external imports return null, missing or malformed go.mod returns
null, _test.go excluded from representative selection, lexically
smallest file picked deterministically, similar-prefix imports do
not falsely resolve, nested-package imports work cross-platform).

Existing 735 unit tests continue to pass unchanged. Total: 751.

typecheck, biome, and CodeRabbit local review all clean. CodeRabbit
caught a real Windows path-separator bug in the first iteration; the
fix and a regression test for it are included.

Co-authored-by: mrsuit92 <mrsuit92@users.noreply.github.com>
2026-05-05 01:48:22 +01:00
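The mechanism in c156da1688 might be sketched like this. buildGoModuleInfo is named in the commit, but its signature here (go.mod text plus a file list, instead of reading from disk) and the resolveGoImport helper are simplifications assumed for illustration.

```typescript
interface GoModuleInfo {
  modulePath: string;
  packages: Map<string, string>; // forward-slash directory -> representative file
}

// Parse the single-line `module <path>` directive. The parenthesised
// `module ( path )` form is deliberately not handled and yields null.
function parseModulePath(goMod: string): string | null {
  const match = /^\s*module\s+([^\s()"]+)/m.exec(goMod);
  return match?.[1] ?? null;
}

function buildGoModuleInfo(goMod: string, files: string[]): GoModuleInfo | null {
  const modulePath = parseModulePath(goMod);
  if (modulePath === null) return null;
  const packages = new Map<string, string>();
  const candidates = files
    .map((f) => f.replace(/\\/g, "/")) // forward-slash keys on every OS
    .filter((f) => f.endsWith(".go") && !f.endsWith("_test.go"))
    .sort(); // lexicographic, so the representative is deterministic
  for (const file of candidates) {
    const slash = file.lastIndexOf("/");
    const dir = slash === -1 ? "" : file.slice(0, slash);
    if (!packages.has(dir)) packages.set(dir, file);
  }
  return { modulePath, packages };
}

// Strip the module path prefix and look the directory up in the package map.
function resolveGoImport(importPath: string, info: GoModuleInfo | null): string | null {
  if (info === null) return null;
  let dir: string;
  if (importPath === info.modulePath) {
    dir = ""; // bare module path means the root package
  } else if (importPath.startsWith(info.modulePath + "/")) {
    dir = importPath.slice(info.modulePath.length + 1);
  } else {
    return null; // stdlib, third-party, or a similar-prefix module
  }
  return info.packages.get(dir) ?? null;
}
```

Matching on the module path plus a trailing slash is what keeps a similar-prefix import such as example.com/application from resolving against example.com/app.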
Giancarlo Erra 4e2ccaa6d2 Merge pull request #47 from giancarloerra/fix/python-sibling-imports
fix(graph): resolve Python sibling-flat imports in service-style monorepos
2026-05-05 01:17:01 +01:00
Giancarlo Erra 8921690d72 fix(graph): resolve Python sibling-flat imports in service-style monorepos
Resolves #46. Reported by @mrsuit92.

Python projects where each top-level directory is a runnable application
root (a common service-style monorepo layout) had `import config` from
`service-a/main.py` produce 0 dependency edges, even when
`service-a/config.py` sits next to the importer. At runtime Python
resolves this correctly because the importer's directory is sys.path[0]
when the file is run as `python main.py` from inside its own directory.
The static resolver did not check that path.

The Python case in graph-resolution.ts only tried:

  <projectPath>/<module>.py
  <projectPath>/src/<module>.py
  <projectPath>/lib/<module>.py

It did not try `<sourceDir>/<module>.py`, so non-relative sibling
imports never resolved. Relative imports (`from .config import ...`)
already used sourceDir and worked.

Fix: add `<sourceDir>/<module>.py` as the LAST fallback, after the
existing project-root and src/lib checks. Tried last to preserve
project-root precedence, so any layout that resolved before this PR
continues to resolve to the same file. resolveRelativePath also handles
the `<sourceDir>/<module>/__init__.py` package case via its built-in
Python init fallback, so package-style sibling imports work too.

Tests: 5 new cases in tests/unit/graph-resolution.test.ts covering
sibling-flat resolution, dotted module paths, package via __init__.py,
project-root precedence preservation, and the negative case (no match
anywhere). Existing 730 tests continue to pass; total now 735.

typecheck, biome, and CodeRabbit local review all clean.

Co-authored-by: mrsuit92 <mrsuit92@users.noreply.github.com>
2026-05-05 00:54:22 +01:00
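The candidate order after 8921690d72 can be illustrated with a short sketch. pythonCandidates and resolvePythonImport are hypothetical names, the existence check is injected, and the `__init__.py` package fallback mentioned in the commit is left out for brevity.

```typescript
// Candidate files for a non-relative Python import, in lookup order.
// Dotted module names map to nested paths (a.b -> a/b.py).
function pythonCandidates(moduleName: string, projectPath: string, sourceDir: string): string[] {
  const rel = moduleName.split(".").join("/") + ".py";
  return [
    `${projectPath}/${rel}`,
    `${projectPath}/src/${rel}`,
    `${projectPath}/lib/${rel}`,
    `${sourceDir}/${rel}`, // sibling of the importer, tried LAST
  ];
}

// Return the first candidate that exists, or null when nothing matches.
function resolvePythonImport(
  moduleName: string,
  projectPath: string,
  sourceDir: string,
  exists: (path: string) => boolean,
): string | null {
  for (const candidate of pythonCandidates(moduleName, projectPath, sourceDir)) {
    if (exists(candidate)) return candidate;
  }
  return null;
}
```

Because the sibling path comes last, any import that already resolved through the project root, src/, or lib/ still resolves to the same file.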
Giancarlo Erra bf36c0c1b7 docs: add note about MCP governance and JanuScope 2026-05-05 00:36:41 +01:00
Giancarlo Erra ccebb8d980 chore: release v1.8.4 v1.8.4 2026-05-04 19:16:25 +01:00
Giancarlo Erra e6ce32710a fix(graph): pre-validate ast-grep grammar libraryPath to survive missing prebuilds (#44)
Resolves #43.

On Linux/Node combinations where one ast-grep grammar package's prebuilt
parser binary is missing for the host architecture, the v1.8.3 loader
silently failed to register every dynamic grammar in the batch, not just
the broken one. registerDynamicLanguage iterates and accesses each
module's lazy libraryPath getter; one throwing getter aborts the call
atomically and zero grammars end up registered.

Fix: pre-validate each grammar's libraryPath getter inside the per-
grammar try/catch so a missing prebuild is contained to that grammar.
Build the batch object with only the survivors and make ONE atomic
registerDynamicLanguage call. Standard environments are unaffected
because all grammars pass pre-validation. Affected environments lose
only the unloadable grammar, the rest register cleanly.

Also captures the actual error reason (the previous empty `catch {}`
discarded it), bumps symbol- and import-extraction failure logs from
debug to warn with one-shot dedupe per language, exposes loaded/failed
grammars via a new getDynamicLanguageStatus() API, and renders an "AST
grammars" block in codebase_graph_status output so users see loader
state without enabling debug logging.

Empirically verified against @ast-grep/napi@0.40.5 in a clean Node
environment. Two probes confirmed the napi semantics: sequential
register({A}); register({B}) calls are REPLACING (so per-language
registration is broken), and batch register with one bad getter is
ATOMIC (so pre-validation before the batch call is the only correct
pattern). All 721 existing unit tests continue to pass unchanged. Adds
9 new tests for the loader status API; total 730 pass. typecheck and
biome clean. CodeRabbit returned no findings on this diff.

Co-authored-by: X-Adam <X-Adam@users.noreply.github.com>
2026-05-04 19:14:27 +01:00
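The pre-validation pattern from e6ce32710a, sketched without depending on @ast-grep/napi. The register callback stands in for the single batch registration call; GrammarModule, registerSurvivors, and the returned status shape are assumptions for illustration.

```typescript
interface GrammarModule {
  readonly libraryPath: string; // lazy getter; may throw when the prebuild is missing
}

interface LoaderStatus {
  loaded: string[];
  failed: Record<string, string>; // grammar name -> captured error reason
}

// Touch each grammar's libraryPath inside its own try/catch, then register
// only the survivors in ONE batch call.
function registerSurvivors(
  grammars: Array<{ name: string; module: GrammarModule }>,
  register: (batch: Record<string, GrammarModule>) => void,
): LoaderStatus {
  const batch: Record<string, GrammarModule> = {};
  const failed: Record<string, string> = {};
  for (const grammar of grammars) {
    try {
      void grammar.module.libraryPath; // a throwing getter is contained to this grammar
      batch[grammar.name] = grammar.module;
    } catch (err) {
      failed[grammar.name] = err instanceof Error ? err.message : String(err);
    }
  }
  register(batch); // single atomic call with validated grammars only
  return { loaded: Object.keys(batch), failed };
}
```

Registering per grammar is not an option if each call replaces the previous one, and an unvalidated batch fails as a whole, so validating first and batching once is the remaining choice.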
Tomasz Szuster 7a42ea67ed Merge pull request #2 from giancarloerra/main
merging with upstream
2026-05-04 15:11:15 +02:00
Giancarlo Erra 852ae5d6d5 chore: release v1.8.3 v1.8.3 2026-05-04 12:48:54 +01:00
Tomasz Szuster 332ee800a8 feat(embeddings): add LM Studio as a first-class embedding provider (#42)
LM Studio's Local Server speaks the OpenAI-compatible /v1/embeddings
protocol, so users running it as their model host (chat plus embedding in
one desktop app, GGUF model management) had no clean integration path.

Changes:

- src/services/provider-lmstudio.ts: new LMStudioEmbeddingProvider wrapping
  the OpenAI SDK with a custom baseURL (default http://localhost:1234/v1).
  Sends a placeholder API key to satisfy the OpenAI SDK while LM Studio's
  Local Server runs without auth by default. Skips the dimensions parameter
  because LM Studio models have no Matryoshka projection. Forces
  encoding_format=float to defeat the OpenAI SDK 6.x base64 default, which
  would otherwise mangle LM Studio's plain-array responses into 1024 zeros.
- src/services/embedding-config.ts: extends the EmbeddingProvider union,
  reads LMSTUDIO_URL and LMSTUDIO_API_KEY, and fails fast when
  EMBEDDING_PROVIDER=lmstudio is set without EMBEDDING_MODEL or EMBEDDING_DIMENSIONS.
- src/services/embedding-provider.ts: factory case for lmstudio with a
  dynamic import to avoid loading the OpenAI SDK at startup for ollama users.
- ensureReady distinguishes "LM Studio unreachable" from "reachable but
  embedding model not loaded" so the operator knows whether to start the
  Local Server or load the configured model.
- src/services/qdrant.ts: minor refactor to extract the hybrid-search query
  payload to a local const for readability.
- README.md: dedicated LM Studio section, MCP host config example, env-var
  table entries.
- tests/unit/embedding-config.test.ts: 8 new cases (required-env validation,
  URL default and override, optional API key, context-length override).
- tests/unit/embedding-provider.test.ts: 3 new cases (factory wiring,
  ensureReady error format against a closed port, healthCheck unreachable
  output).
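
The configuration and request-shape changes listed above can be illustrated with a minimal sketch. It assumes a simplified config shape and does not reproduce the actual `embedding-config.ts` or `provider-lmstudio.ts`: `LmStudioConfig`, `loadLmStudioConfig`, `buildEmbeddingRequest`, the error wording, and the placeholder key value are all hypothetical. Only the environment variable names, the default URL, and the `encoding_format=float` choice come from the commit message.

```typescript
// Hypothetical config shape; the real module may differ.
interface LmStudioConfig {
  baseURL: string;
  apiKey: string;
  model: string;
  dimensions: number;
}

const DEFAULT_LMSTUDIO_URL = "http://localhost:1234/v1"; // default named in the commit
const PLACEHOLDER_API_KEY = "lm-studio"; // assumed value; any non-empty string would do

// Fail fast: the lmstudio provider needs an explicit model and vector size.
function loadLmStudioConfig(env: Record<string, string | undefined>): LmStudioConfig {
  const model = env.EMBEDDING_MODEL?.trim();
  const dimensions = Number(env.EMBEDDING_DIMENSIONS);
  if (!model) {
    throw new Error("EMBEDDING_PROVIDER=lmstudio requires EMBEDDING_MODEL");
  }
  if (!Number.isInteger(dimensions) || dimensions <= 0) {
    throw new Error("EMBEDDING_PROVIDER=lmstudio requires a positive integer EMBEDDING_DIMENSIONS");
  }
  return {
    baseURL: env.LMSTUDIO_URL?.trim() || DEFAULT_LMSTUDIO_URL,
    apiKey: env.LMSTUDIO_API_KEY?.trim() || PLACEHOLDER_API_KEY,
    model,
    dimensions,
  };
}

// Body for an OpenAI-compatible POST {baseURL}/embeddings call. The
// `dimensions` field is deliberately left out, and `encoding_format` is set
// explicitly so the response carries plain float arrays.
function buildEmbeddingRequest(cfg: LmStudioConfig, input: string[]) {
  return { model: cfg.model, input, encoding_format: "float" as const };
}
```

In use, `loadLmStudioConfig(process.env)` would run once at startup, so a missing model or dimension surfaces before any indexing begins.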

Backward compatible. The lmstudio provider is opt-in via
EMBEDDING_PROVIDER=lmstudio. Existing ollama, openai, and google paths are
untouched.
2026-05-04 12:47:26 +01:00
Tomasz Szuster dbbeda3e7a Merge pull request #1 from giancarloerra/main
merging with upstream
2026-05-04 12:05:05 +02:00
Giancarlo Erra f991c33c3c chore: release v1.8.2 v1.8.2 2026-05-04 10:47:13 +01:00
Giancarlo Erra baf0fe92ea Merge pull request #40 from a5345534/fix/jvm-symbol-extraction
fix: extract JVM symbol names from declarations
2026-05-04 10:43:50 +01:00
Shawn 1dbc1eb398 test: cover JVM annotations with parameters 2026-05-04 16:32:51 +08:00
Shawn 6a76ad4782 fix: cover JVM annotation and Scala callable edge cases 2026-05-04 16:07:01 +08:00
Shawn 019eba0583 fix: extract JVM symbol names from declarations 2026-05-04 15:51:34 +08:00
Giancarlo Erra aefc8ae1bc chore: release v1.8.1 v1.8.1 2026-05-04 02:09:44 +01:00
Giancarlo Erra 8d6cb86b27 fix(docs): replace broken Marketplace badges and surface listings in main README
shields.io's `visual-studio-marketplace/*` endpoints currently return
"404: badge not found", so the version and installs badges on the
extension's marketplace listing render as broken images. Switch to
`vsmarketplacebadges.dev`, which is the de facto third-party
replacement most VS Code extensions fall back to.

Also surface the marketplace listings in the root README's top badge
row, alongside the existing MCP-install deep-link badges. The two
serve different purposes (the deep links register the MCP config
without an extension, the marketplace badges link to the extension
itself), so both stay.
2026-05-04 02:06:01 +01:00
Giancarlo Erra fa1755dccc chore: release v1.8.0 v1.8.0 2026-05-04 01:00:16 +01:00
Giancarlo Erra e30e8bfbd0 Merge pull request #39 from giancarloerra/feat/extensions
feat(extension): VS Code and Open VSX extension
2026-05-04 00:57:55 +01:00
Giancarlo Erra c2d012fe4c fix(extension): tighten graphPanel path and line-number bounds
Two follow-up review fixes on the interactive-graph webview surface:

- `loadGraphHtml(projectId)` now resolves the projectId-derived path
  and verifies the result stays inside `GRAPH_DIR`. The
  `socraticode.openInteractiveGraph` command accepts an arbitrary
  argument from any caller (palette, sidebar, other extensions), so a
  value like `../../etc/passwd` would otherwise escape the cache
  directory via `path.join`. Suspicious projectIds are now rejected
  with a log entry and the function returns `undefined`.
- `handleWebviewMessage` now opens the document first, then clamps
  the requested line number against the document's actual `lineCount`
  before constructing a `Range`. The previous `m.line > 0` check
  prevented negatives but allowed absurdly large values (e.g.
  `Number.MAX_SAFE_INTEGER`) that would build a Range past the end of
  the file. Selection is set on the resolved editor.
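
Both bounds checks above can be sketched with plain Node APIs. This is a simplified illustration, not the extension's code: `resolveInsideDir` and `clampLineIndex` are hypothetical helpers, the sketch assumes incoming line numbers are 1-based, and the VS Code document and `Range` handling are omitted.

```typescript
import * as path from "node:path";

// Resolve `fileName` under `baseDir` and return the absolute path only if it
// stays strictly inside `baseDir`; otherwise return undefined.
function resolveInsideDir(baseDir: string, fileName: string): string | undefined {
  const base = path.resolve(baseDir);
  const candidate = path.resolve(base, fileName);
  const rel = path.relative(base, candidate);
  const escapes =
    rel === "" ||
    rel === ".." ||
    rel.startsWith(`..${path.sep}`) ||
    path.isAbsolute(rel); // e.g. a different drive on Windows
  return escapes ? undefined : candidate;
}

// Convert a requested 1-based line number into a 0-based index that is
// guaranteed to exist in a document with `lineCount` lines.
function clampLineIndex(requestedLine: number, lineCount: number): number {
  const lastIndex = Math.max(lineCount, 1) - 1;
  if (!Number.isFinite(requestedLine)) return 0;
  return Math.min(Math.max(Math.trunc(requestedLine) - 1, 0), lastIndex);
}
```

Checking the relative path after resolution, rather than scanning the input for `..`, also covers inputs such as `a/../../b` that only escape once normalized.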

Lint, typecheck, manifest tests and build all clean.
2026-05-04 00:52:32 +01:00
Giancarlo Erra 562a946053 fix(extension): harden review-flagged paths
A pass over the extension surface to address review feedback:

Safety / hardening:

- `graphPanel.ts`: validate `m.path` from the webview before opening
  files. Reject absolute paths and any path that escapes the workspace
  root (`..`, `/foo`, `C:/...`). Validate the line number is a positive
  integer before constructing a `Range`. Surface failures via the output
  channel rather than letting the rejection bubble up.
- `mcpProvider.ts`: defensively check that
  `vscode.lm.registerMcpServerDefinitionProvider` exists before calling
  it. The `engines.vscode: ^1.99.0` field already enforces this on
  install, but some VS Code-derived editors mis-report their engine
  version. The extension now degrades gracefully (sidebar, commands,
  status bar still work) instead of failing activation.
- `commands.ts` and `graphPanel.ts`: wrap `workbench.action.chat.open`
  in try/catch. Not every VS Code-compatible editor exposes that
  command; falling back to the output channel avoids unhandled
  rejections after the user clicked "Open chat".
- `extension.ts`: persist the first-run walkthrough flag only after the
  walkthrough command resolves successfully, so a transient failure
  doesn't silently skip the onboarding forever.
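
The defensive registration described for `mcpProvider.ts` follows a common feature-detection pattern, sketched here without importing the real `vscode` module. `LmApiLike`, `DisposableLike`, `tryRegisterMcpProvider`, and the log wording are hypothetical stand-ins; only the method name `registerMcpServerDefinitionProvider` comes from the commit message.

```typescript
// Hypothetical minimal shapes standing in for the host editor's API surface.
interface DisposableLike {
  dispose(): void;
}
interface LmApiLike {
  registerMcpServerDefinitionProvider?: (id: string, provider: unknown) => DisposableLike;
}

// Register the provider only when the host actually exposes the method.
// Returning undefined lets the caller keep activating the remaining features.
function tryRegisterMcpProvider(
  lm: LmApiLike | undefined,
  id: string,
  provider: unknown,
  log: (msg: string) => void,
): DisposableLike | undefined {
  const register = lm?.registerMcpServerDefinitionProvider;
  if (typeof register !== "function") {
    log(`MCP provider API unavailable; skipping registration of "${id}"`);
    return undefined;
  }
  return register.call(lm, id, provider);
}
```

A caller would push the returned disposable onto its subscriptions when present and continue with the sidebar, commands, and status bar either way.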

CI gates:

- `extension-ci.yml` and `extension-release.yml`: run `npm test` between
  typecheck and build, so manifest-level smoke regressions can't slip
  through to either the PR artefact or the marketplace publishes.

Settings copy:

- `socraticode.env` description: explicitly call out that the setting
  is for non-secret config only. Recommend OS environment variables /
  local `.env` files for API keys, since workspace settings can sync
  via Settings Sync and end up in committed `.vscode/settings.json`.

Quality of life:

- `sidebar.ts` `formatRelative`: clamp the computed seconds to zero so
  a file mtime slightly ahead of the local clock doesn't render
  "-5s ago".
- `walkthroughs/first-index.md`: corrected the embedding model name
  (`nomic-embed-text`, not `mxbai-embed-large`) to match the engine
  default in `src/constants.ts`.
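
The clock-skew clamp in `formatRelative` amounts to flooring the elapsed seconds at zero before formatting. A minimal sketch, assuming millisecond timestamps and a simple `s`/`m`/`h`/`d` output format (only the `Ns ago` form appears in the commit message; the other units are assumptions):

```typescript
// Format how long ago `mtimeMs` was, relative to `nowMs`. A timestamp that is
// slightly in the future is treated as "0s ago" instead of a negative value.
function formatRelative(mtimeMs: number, nowMs: number = Date.now()): string {
  const seconds = Math.max(0, Math.floor((nowMs - mtimeMs) / 1000));
  if (seconds < 60) return `${seconds}s ago`;
  const minutes = Math.floor(seconds / 60);
  if (minutes < 60) return `${minutes}m ago`;
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `${hours}h ago`;
  return `${Math.floor(hours / 24)}d ago`;
}
```

Passing `nowMs` explicitly keeps the function deterministic in tests; production callers can rely on the `Date.now()` default.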

Lint / docs:

- `extension/README.md`: hyphenate "Eclipse Theia-based editors".
- `DEVELOPER.md`: add `text` language hint to the directory-tree code
  fence (markdownlint MD040). Updated the inline comment for
  `settings.ts` to reflect its current shape.
- `README.md`: reflow the "extension vs plugin" callout into a single
  blockquote (markdownlint MD028).

Lint, typecheck, manifest tests and build all clean. Engine unit tests
unaffected (706/706 still pass).
2026-05-04 00:41:51 +01:00
Giancarlo Erra a27b8ceb67 chore(extension): set publisher to giancarloerra
Marketplace URL becomes giancarloerra.socraticode (VS Code Marketplace)
and giancarloerra/socraticode (Open VSX), matching the registered
publisher account. The Altaire sponsor link and license contact email
stay as they are; only the publisher slug and the corresponding
shields.io / marketplace URLs in the README change.
2026-05-03 21:45:14 +01:00
Giancarlo Erra 9a197b3421 docs(extension): add Discord badge and hosted-edition pointer
Two small additions to the marketplace README:

- Discord badge in the header row, next to GitHub stars. Static
  shields.io badge linking to the community Discord. The auto-updating
  member-count variant can replace it later if we publish the server
  ID.
- New short "SocratiCode Cloud (private beta)" section between
  Compatibility and Privacy. Mirrors the wording on socraticode.cloud:
  managed infrastructure, webhook-driven indexing, shared team
  indexes, SSO/SAML, audit logs, SOC 2 / ISO 27001-aligned controls,
  private beta. Request-access link only; no in-extension Cloud
  feature is implied.

Marketplace listing now points readers at both the community channel
and the hosted edition without overstating either.
2026-05-03 21:15:59 +01:00