# Developer Guide
This document covers the internals of **SocratiCode** — architecture, data flow, configuration, and how to build, extend, and debug.
## Table of Contents
- [Architecture Overview](#architecture-overview)
- [Prerequisites for Development](#prerequisites-for-development)
- [Building and Running](#building-and-running)
- [Project Structure](#project-structure)
- [Configuration & Constants](#configuration--constants)
- [Data Flow: Indexing](#data-flow-indexing)
- [Data Flow: Search](#data-flow-search)
- [Data Flow: Incremental Update](#data-flow-incremental-update)
- [Data Flow: Code Graph](#data-flow-code-graph)
- [Testing](#testing)
- [Services Reference](#services-reference)
- [MCP Tools Reference](#mcp-tools-reference)
- [Data Structures](#data-structures)
- [Docker & Infrastructure](#docker--infrastructure)
- [Extending the Indexer](#extending-the-indexer)
- [VS Code / Open VSX Extension](#vs-code--open-vsx-extension)
- [Troubleshooting](#troubleshooting)
---
## Architecture Overview
```
┌───────────────────────────────────────────────────────┐
│                       MCP Host                        │
│            (VS Code, Claude Desktop, etc.)            │
└───────────────────────────┬───────────────────────────┘
                            │ stdio (JSON-RPC)
┌───────────────────────────▼───────────────────────────┐
│               MCP Server (src/index.ts)               │
│                                                       │
│ ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐ │
│ │  Index  │   │  Query  │   │  Graph  │   │ Manage  │ │
│ │  Tools  │   │  Tools  │   │  Tools  │   │  Tools  │ │
│ └────┬────┘   └────┬────┘   └────┬────┘   └────┬────┘ │
│      │             │             │             │      │
│ ┌────▼─────────────▼─────────────▼─────────────▼────┐ │
│ │                     Services                      │ │
│ │ ┌────────┐   ┌────────┐   ┌────────┐   ┌────────┐ │ │
│ │ │ Indexer│   │ Qdrant │   │ Ollama │   │ Docker │ │ │
│ │ │        │   │ Client │   │ Client │   │  Mgmt  │ │ │
│ │ └───┬────┘   └───┬────┘   └───┬────┘   └────────┘ │ │
│ │     │            │            │                   │ │
│ │ ┌───▼────┐   ┌───▼────┐   ┌───▼─────┐             │ │
│ │ │ Ignore │   │Embedder│   │ Watcher │             │ │
│ │ │ Filter │   │        │   │(@parcel)│             │ │
│ │ └────────┘   └────────┘   └─────────┘             │ │
│ └───────────────────────────────────────────────────┘ │
└───────────┬──────────────────────────────┬────────────┘
            │                              │
  ┌─────────▼──────────┐         ┌─────────▼──────────┐
  │  Qdrant (Docker)   │         │  Ollama (Docker)   │
  │  localhost:16333   │         │  localhost:11435   │
  │                    │         │                    │
  │   Vector storage   │         │  nomic-embed-text  │
  │   768-dim cosine   │         │ 768-dim embeddings │
  └────────────────────┘         └────────────────────┘
```
---
## Prerequisites for Development
| Tool | Version | Purpose |
|------|---------|---------|
| Node.js | 18+ | Runtime |
| npm | 9+ | Package manager |
| TypeScript | 5.7+ | Installed as devDependency |
| Docker | Any recent | Runs the Qdrant and Ollama containers |
| Ollama | Runs in Docker | Serves the embedding model; started automatically as the `socraticode-ollama` container |
---
## Building and Running
### Install dependencies
```bash
npm install
```
### Build
```bash
npm run build
```
This compiles TypeScript from `src/` to `dist/` with source maps and declarations.
### Run directly (development)
```bash
npm run dev
```
Uses `tsx` to run TypeScript directly without a build step.
### Run built version
```bash
npm start
# or
node dist/index.js
```
The server communicates over **stdio** using JSON-RPC (MCP protocol). It's designed to be launched by an MCP host, not run standalone in a terminal. For testing, you can use the MCP Inspector or pipe JSON-RPC messages.
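
A minimal sketch of piping JSON-RPC messages by hand, assuming the MCP stdio transport's newline-delimited JSON framing. The `frame()`, `handshake()` and `probe()` helpers, the client name, and the `2024-11-05` protocol version string are illustrative assumptions; the MCP Inspector remains the easier route.

```typescript
import { spawn } from "node:child_process";

// The MCP stdio transport is newline-delimited: one JSON-RPC message per line.
// JSON.stringify without indentation never emits raw newlines, so this is safe.
function frame(message: object): string {
  return `${JSON.stringify(message)}\n`;
}

// Handshake: initialize, confirm with the initialized notification, then list tools.
function handshake(): string[] {
  return [
    frame({
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2024-11-05", // assumption: adjust to a version the server accepts
        capabilities: {},
        clientInfo: { name: "manual-probe", version: "0.0.0" },
      },
    }),
    frame({ jsonrpc: "2.0", method: "notifications/initialized" }),
    frame({ jsonrpc: "2.0", id: 2, method: "tools/list" }),
  ];
}

// Hypothetical helper: spawn the built server, echo its stdout, stop it after 5 s.
function probe(entry = "dist/index.js"): void {
  const child = spawn("node", [entry], { stdio: ["pipe", "pipe", "inherit"] });
  child.stdout.on("data", (chunk: Buffer) => process.stdout.write(chunk));
  for (const line of handshake()) child.stdin.write(line);
  setTimeout(() => child.kill(), 5000);
}
```

After `npm run build`, calling `probe()` should print the `initialize` result followed by the `tools/list` response; if the server rejects the handshake, the protocol version is the first thing to adjust.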
### TypeScript Configuration
- **Target**: ES2022
- **Module**: Node16 (ESM)
- **Strict mode**: Enabled
- **Output**: `dist/` with source maps and `.d.ts` declarations
### Linting
SocratiCode uses [Biome](https://biomejs.dev/) for linting. Biome is fast, zero-config, and catches unused imports, style issues, and potential bugs.
```bash
# Check for lint issues
npm run lint
# Auto-fix safe issues
npm run lint:fix
```
For VS Code, install the [Biome extension](https://marketplace.visualstudio.com/items?itemName=biomejs.biome) for real-time lint feedback and auto-fix on save.
### Versioning & Releases
SocratiCode uses [Conventional Commits](https://www.conventionalcommits.org/) and [release-it](https://github.com/release-it/release-it) for automated versioning and changelog generation.
**Commit message format:**
```
feat: add fuzzy search support → Features
fix: resolve race condition → Bug Fixes
perf: optimise embedding batching → Performance
refactor: simplify provider factory → Refactors
docs: update quickstart guide → Documentation
test: add watcher edge-case tests → Tests
chore: update deps → hidden from changelog
```
**Before committing:**
```bash
npm run lint && npx tsc --noEmit && npm run test:unit
```
**Creating a release** (maintainers only):
```bash
# Interactive — prompts for patch/minor/major
npm run release
# Dry run — preview what will happen without making changes
npm run release:dry
```
This will automatically:
1. Determine the version bump from your commits
2. Update `CHANGELOG.md` with all `feat:`, `fix:`, etc. entries
3. Bump the version in `package.json`
4. Create a git commit and tag (e.g. `v1.1.0`)
5. Push to GitHub and create a GitHub Release
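
Step 1 relies on Conventional Commits semantics: a breaking change implies a major bump, `feat:` a minor one, `fix:` a patch. A simplified sketch of that mapping follows; `bumpFor` and `releaseBump` are hypothetical, treating `perf:` as a patch is an assumption, and the release-it plugin configured in the repo applies its own preset rules, which may differ (for example for other commit types). The overall release takes the highest bump implied by any commit since the last tag.

```typescript
type Bump = "major" | "minor" | "patch";

// Map one commit message to the bump it implies (null = no release on its own).
function bumpFor(message: string): Bump | null {
  const header = message.split("\n", 1)[0];
  // type, optional (scope), optional "!" breaking marker, then ":"
  const match = /^(\w+)(\([^)]*\))?(!)?:/.exec(header);
  if (!match) return null;
  const [, type, , bang] = match;
  if (bang || /^BREAKING[ -]CHANGE:/m.test(message)) return "major";
  if (type === "feat") return "minor";
  if (type === "fix" || type === "perf") return "patch";
  return null;
}

const order: Bump[] = ["major", "minor", "patch"];

// Highest bump implied by any commit since the last tag.
function releaseBump(messages: string[]): Bump | null {
  const bumps = messages.map(bumpFor).filter((b): b is Bump => b !== null);
  return order.find((level) => bumps.includes(level)) ?? null;
}
```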
---
## Project Structure
```
src/
├── index.ts # MCP server entry point — registers all 21 tools
├── config.ts # Project ID generation (SHA-256), collection naming, linked projects, branch detection
├── constants.ts # All constants: ports, container names, models, chunk sizes, extensions
├── types.ts # TypeScript interfaces and types
│
├── services/
│ ├── docker.ts # Docker CLI wrapper — manage Qdrant & Ollama containers
│ ├── ollama.ts # Ollama client — model availability, embedding calls
│ ├── embeddings.ts # Embedding generation with batching and task prefixes
│ ├── qdrant.ts # Qdrant client — collections, upsert, search, metadata
│ ├── indexer.ts # Core indexing — file discovery, chunking, full/incremental
│ ├── watcher.ts # File system watcher via @parcel/watcher with debouncing
│ ├── lock.ts # Cross-process file-based locking via proper-lockfile
│ ├── ignore.ts # Ignore filter (.gitignore + .socraticodeignore + defaults)
│ ├── logger.ts # Structured JSON logging — stderr (startup/no MCP transport) or MCP notifications/message (when hosted)
│ ├── code-graph.ts # AST-based code graph building via ast-grep
│ ├── graph-analysis.ts # Graph queries: dependencies, stats, cycles, Mermaid diagrams
│ ├── graph-aliases.ts # Path alias resolution from tsconfig/jsconfig compilerOptions.paths
│ ├── graph-imports.ts # Import/require/use extraction for 18+ languages via AST
│ ├── graph-resolution.ts # Module specifier → file path resolution (incl. aliases, SCSS partials)
│ ├── graph-symbols.ts # Per-language symbol & call-site extraction (Impact Analysis)
│ ├── graph-symbol-resolution.ts # Three-tier cross-file call-site resolution
│ ├── graph-entrypoints.ts # Entry-point detection (orphans + main() + framework patterns + tests)
│ ├── graph-impact.ts # Impact / flow / context / list analysis primitives
│ ├── symbol-graph-store.ts # Sharded Qdrant storage layer for the symbol graph
│ ├── symbol-graph-cache.ts # Per-project LRU cache with lazy shard loading
│ ├── startup.ts # Startup lifecycle: auto-resume, graceful shutdown coordination
│ └── context-artifacts.ts # Context artifact loading, chunking, indexing, search
│
├── tools/
│ ├── index-tools.ts # Handlers: codebase_index, codebase_update, codebase_remove, codebase_stop, codebase_watch
│ ├── query-tools.ts # Handlers: codebase_search, codebase_status
│ ├── graph-tools.ts # Handlers: codebase_graph_*, codebase_impact, codebase_flow, codebase_symbol(s)
│ ├── context-tools.ts # Handlers: codebase_context, codebase_context_search/index/remove
│ └── manage-tools.ts # Handlers: codebase_health, codebase_list_projects, codebase_about
tests/
├── helpers/
│ ├── fixtures.ts # Test fixture utilities (temp projects, Docker checks)
│ └── setup.ts # Integration test infrastructure (Qdrant client, cleanup)
├── unit/ # 608 tests — no Docker required
├── integration/ # 137 tests — requires Docker
└── e2e/ # 20 tests — full lifecycle
docker-compose.yml # Alternative way to run infrastructure
vitest.config.ts # Test framework configuration
```
---
## Configuration & Constants
All constants are defined in `src/constants.ts`:
| Constant | Value | Description |
|----------|-------|-------------|
| `SEARCH_DEFAULT_LIMIT` | `10` | Default search results per query (env-configurable, 1-50) |
| `SEARCH_MIN_SCORE` | `0.10` | Minimum RRF score threshold (env-configurable, 0-1) |
| `CHUNK_SIZE` | `100` | Lines per chunk |
| `CHUNK_OVERLAP` | `10` | Overlapping lines between adjacent chunks |
| `MAX_FILE_BYTES` | `5 MB` | Max file size before skipping (env-configurable via `MAX_FILE_SIZE_MB`) |
| `MAX_AVG_LINE_LENGTH` | `500` | Avg line length above which character-based chunking is used (minified files) |
| `MAX_CHUNK_CHARS` | `2000` | Hard character limit per chunk (provider-level safety net) |
| `QDRANT_PORT` | `16333` | Qdrant HTTP API port (host-side) |
| `QDRANT_GRPC_PORT` | `16334` | Qdrant gRPC port (host-side) |
| `QDRANT_CONTAINER_NAME` | `socraticode-qdrant` | Docker container name |
| `QDRANT_IMAGE` | `qdrant/qdrant:v1.17.0` | Docker image (pinned version) |
| `OLLAMA_PORT` | `11435` | Ollama API port (host-side) |
| `OLLAMA_CONTAINER_NAME` | `socraticode-ollama` | Docker container name |
| `OLLAMA_IMAGE` | `ollama/ollama:latest` | Docker image |
> **Note**: `EMBEDDING_MODEL`, `EMBEDDING_DIMENSIONS`, and `EMBEDDING_CONTEXT_LENGTH` are defined in `src/services/embedding-config.ts`, not in `src/constants.ts`. Defaults are `nomic-embed-text` / `768` for Ollama, `text-embedding-3-small` / `1536` for OpenAI, and `gemini-embedding-001` / `3072` for Google.
### Embedding batch size
Defined in `src/services/embeddings.ts`: texts are sent to Ollama in batches of **32**.
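
Fixed-size batching bounds the size of each embedding request. A generic sketch (the `toBatches` helper is hypothetical; `embeddings.ts` may split batches differently):

```typescript
// Split items into fixed-size batches; the last batch carries the remainder.
function toBatches<T>(items: readonly T[], size = 32): T[][] {
  if (!Number.isInteger(size) || size < 1) throw new RangeError("size must be a positive integer");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const texts = Array.from({ length: 70 }, (_, i) => `search_document: file${i}.ts`);
console.log(toBatches(texts).map((b) => b.length)); // [ 32, 32, 6 ]
```

With 70 texts this yields three requests instead of one request per text.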
### File watcher debounce
Defined in `src/services/watcher.ts`: file changes are debounced for **2000ms** before triggering an index update.
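
A common way to implement such a debounce is to collect changed paths and restart the timer on every event, so a burst of saves produces a single update. A sketch under that assumption (`ChangeDebouncer` is hypothetical, not the code in `watcher.ts`):

```typescript
// Collects changed paths and fires one callback after a quiet period.
// Each new event restarts the timer, so a burst of saves triggers one update.
class ChangeDebouncer {
  private readonly pending = new Set<string>();
  private timer: ReturnType<typeof setTimeout> | undefined;
  private readonly onFlush: (paths: string[]) => void;
  private readonly delayMs: number;

  constructor(onFlush: (paths: string[]) => void, delayMs = 2000) {
    this.onFlush = onFlush;
    this.delayMs = delayMs;
  }

  add(path: string): void {
    this.pending.add(path); // duplicates collapse in the Set
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  // Emit everything collected so far and cancel any scheduled flush.
  flush(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
    if (this.pending.size === 0) return;
    const paths = [...this.pending];
    this.pending.clear();
    this.onFlush(paths);
  }
}
```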
### Maximum file size
Defined in `src/constants.ts` as `MAX_FILE_BYTES`: files larger than **5 MB** are skipped (configurable via `MAX_FILE_SIZE_MB` env var).
### Qdrant health check
Defined in `src/services/docker.ts`: after starting the container, the server polls `/healthz` up to **30 times** with **1000ms** between retries.
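
The retry loop amounts to a bounded poll. A sketch with the attempt count and delay as parameters (`pollUntil` and `qdrantHealthy` are hypothetical names; the built-in `fetch` assumes Node 18+):

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry `check` until it returns true or the attempts run out.
async function pollUntil(check: () => Promise<boolean>, attempts = 30, delayMs = 1000): Promise<boolean> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      if (await check()) return true;
    } catch {
      // Connection refused while the container boots: treat as "not ready yet".
    }
    if (attempt < attempts) await sleep(delayMs);
  }
  return false;
}

// Qdrant answers GET /healthz with a 2xx status once it is up.
const qdrantHealthy = async (): Promise<boolean> =>
  (await fetch("http://localhost:16333/healthz")).ok;
```

`await pollUntil(qdrantHealthy)` would then wait roughly 30 seconds before giving up.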
### Project ID & Collection Naming
Defined in `src/config.ts`. `projectIdFromPath()` resolves the project ID with the following precedence (highest first):
1. **`SOCRATICODE_PROJECT_ID` env var** — per-machine override; bypasses both file lookup and path hashing.
2. **`projectId` field in `.socraticode.json`** — committed, team-wide stable identifier; survives different filesystem layouts and OS users.
3. **First 12 characters of SHA-256 of the absolute project path** — default fallback.
In both override paths the value must match `[a-zA-Z0-9_-]+`; whitespace is trimmed; empty/whitespace-only values fall through to the next level. Invalid characters in an explicit override fail loud (throw) — silent fallback would hide misconfigurations that map a project to the wrong (or new) collection.
Collection names derived from the project ID:
- **Code collection**: `codebase_{projectId}`
- **Graph collection**: `codegraph_{projectId}`
- **Context artifacts collection**: `context_{projectId}`
With the default (path-hash) ID, the same folder path always maps to the same collection across restarts. With either override, the mapping is stable across machines and checkouts.
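
The precedence and validation rules above can be sketched as follows. `resolveProjectId` and `checkOverride` are hypothetical stand-ins for `projectIdFromPath()`; reading `.socraticode.json` is left to the caller, and hashing the `path.resolve()`d string as UTF-8 with a hex digest is an assumption about details this guide doesn't pin down.

```typescript
import { createHash } from "node:crypto";
import { resolve } from "node:path";

const ID_PATTERN = /^[a-zA-Z0-9_-]+$/;

// Returns a trimmed override, or undefined to fall through to the next level.
// Invalid characters throw instead of silently falling back.
function checkOverride(source: string, value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  if (!trimmed) return undefined;
  if (!ID_PATTERN.test(trimmed)) {
    throw new Error(`${source} must match [a-zA-Z0-9_-]+, got "${trimmed}"`);
  }
  return trimmed;
}

// Env var first, then .socraticode.json, then a 12-char SHA-256 prefix of the absolute path.
function resolveProjectId(projectPath: string, envId?: string, configId?: string): string {
  return (
    checkOverride("SOCRATICODE_PROJECT_ID", envId) ??
    checkOverride(".socraticode.json projectId", configId) ??
    createHash("sha256").update(resolve(projectPath)).digest("hex").slice(0, 12)
  );
}

const id = resolveProjectId("/work/my-app");
console.log(`codebase_${id}`, `codegraph_${id}`, `context_${id}`);
```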
#### Branch-aware mode
When `SOCRATICODE_BRANCH_AWARE=true`, the current git branch is detected via `git rev-parse --abbrev-ref HEAD` and appended to the project ID (e.g. `abc123def456__feat_my-feature`). Branch names are sanitized: non-alphanumeric characters (except `-`) become `_`, consecutive underscores collapse, leading/trailing underscores are stripped. Detached HEAD states fall back to the branchless ID. Ignored when `SOCRATICODE_PROJECT_ID` is set explicitly or when `projectId` is set in `.socraticode.json` — explicit identifiers are treated as stable and not augmented per branch.
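
The sanitization rules above, as a sketch (helper names are hypothetical; reading the branch via `git rev-parse` is left out, and the caller is assumed to skip this entirely when an explicit project ID is set):

```typescript
// Anything except letters, digits and "-" becomes "_", runs of "_" collapse,
// and leading/trailing "_" are stripped.
function sanitizeBranch(branch: string): string {
  return branch
    .replace(/[^a-zA-Z0-9-]/g, "_")
    .replace(/_+/g, "_")
    .replace(/^_+|_+$/g, "");
}

// `git rev-parse --abbrev-ref HEAD` prints "HEAD" when detached: keep the branchless ID then.
function branchAwareId(baseId: string, branch: string | undefined): string {
  const trimmed = branch?.trim();
  if (!trimmed || trimmed === "HEAD") return baseId;
  const safe = sanitizeBranch(trimmed);
  return safe ? `${baseId}__${safe}` : baseId;
}

console.log(branchAwareId("abc123def456", "feat/my-feature")); // abc123def456__feat_my-feature
```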
#### Linked projects
`loadLinkedProjects()` reads `.socraticode.json` and `SOCRATICODE_LINKED_PROJECTS` env var. `resolveLinkedCollections()` maps linked paths to `{ name, label }` descriptors for `searchMultipleCollections()`. The current project is always first (highest dedup priority).
### Supported File Extensions (54)
| Category | Extensions |
|----------|-----------|
| JavaScript/TypeScript | `.js`, `.jsx`, `.ts`, `.tsx`, `.mjs`, `.cjs` |
| Python | `.py`, `.pyw`, `.pyi` |
| Java/Kotlin/Scala | `.java`, `.kt`, `.kts`, `.scala` |
| C/C++ | `.c`, `.h`, `.cpp`, `.hpp`, `.cc`, `.hh`, `.cxx` |
| C# | `.cs` |
| Go | `.go` |
| Rust | `.rs` |
| Ruby | `.rb` |
| PHP | `.php` |
| Swift | `.swift` |
| Shell | `.sh`, `.bash`, `.zsh` |
| Web | `.html`, `.htm`, `.css`, `.scss`, `.sass`, `.less`, `.vue`, `.svelte` |
| Config | `.json`, `.yaml`, `.yml`, `.toml`, `.xml`, `.ini`, `.cfg` |
| Documentation | `.md`, `.mdx`, `.rst`, `.txt` |
| SQL | `.sql` |
| Dart | `.dart` |
| Lua | `.lua` |
| R | `.r`, `.R` |
| Docker | `.dockerfile` |
Special files always indexed: `Dockerfile`, `Makefile`, `Rakefile`, `Gemfile`, `Procfile`, `.env.example`, `.gitignore`, `.dockerignore`.
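
A sketch of the selection check, assuming special filenames match exactly and extensions match case-sensitively (suggested by `.r` and `.R` both being listed). The abbreviated sets and `isIndexable` are illustrative; the real lists live in `src/constants.ts`.

```typescript
import { basename, extname } from "node:path";

// Abbreviated: the full 54 extensions live in src/constants.ts.
const SUPPORTED_EXTENSIONS = new Set([".ts", ".tsx", ".py", ".go", ".rs", ".md", ".r", ".R", ".dockerfile"]);
const SPECIAL_FILES = new Set([
  "Dockerfile", "Makefile", "Rakefile", "Gemfile", "Procfile", ".env.example", ".gitignore", ".dockerignore",
]);

// Special filenames win; otherwise the extension must be supported or passed as an extra.
// Note extname(".gitignore") is "", which is why dotfiles need the special list.
function isIndexable(filePath: string, extraExtensions: readonly string[] = []): boolean {
  const name = basename(filePath);
  if (SPECIAL_FILES.has(name)) return true;
  const ext = extname(name);
  return ext !== "" && (SUPPORTED_EXTENSIONS.has(ext) || extraExtensions.includes(ext));
}
```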
### Built-in Ignore Patterns (45)
The full list is in `src/services/ignore.ts`. Key entries: `node_modules`, `.git`, `dist`, `build`, `.next`, `__pycache__`, `.venv`, `target`, `.idea`, `.vscode`, `*.min.js`, `*.lock`, `package-lock.json`, `yarn.lock`, `coverage`, `vendor`, `.DS_Store`, `Thumbs.db`.
---
## Data Flow: Indexing
When `codebase_index` is called:
```
1. INFRASTRUCTURE CHECK
   handleIndexTool() → ensureQdrantReady() + ensureOllamaReady()
   ├── Check Docker CLI: docker info
   ├── Check Qdrant image: docker images (qdrant/qdrant:v1.17.0)
   ├── Pull image if missing: docker pull qdrant/qdrant:v1.17.0
   ├── Check container: docker ps --filter name=socraticode-qdrant
   ├── Create/start container with volume mount
   ├── Wait for /healthz (up to 30s, 30 × 1s retries)
   ├── Check Ollama container: docker ps --filter name=socraticode-ollama
   ├── Start Ollama container if needed
   ├── Check for nomic-embed-text model
   └── Pull model if missing

2. FILE DISCOVERY
   getIndexableFiles(projectPath, extraExts?)
   ├── glob("**/*") to enumerate all files
   ├── Build ignore filter: defaults + .gitignore + .socraticodeignore
   ├── Filter by supported extension, special filename, or extra extensions
   └── Filter out ignored paths

3. COLLECTION SETUP
   ensureCollection(collectionName)
   ├── Check if collection exists
   ├── Create with: provider-dependent dimensions (768/1536/3072), cosine distance, on-disk payload
   └── Create payload indexes: filePath, relativePath, language, contentHash

4. FILE SCANNING & CHUNKING (parallel batches of 50 files)
   ├── Read file content (skip if > 5 MB or unreadable)
   ├── Hash content: SHA-256 → 16-char prefix
   ├── Skip if hash matches existing (re-index mode)
   ├── Chunk using three-tier strategy:
   │   ├── Minified/bundled (avg line length > 500): character-based chunking
   │   │   └── Splits at safe boundaries (newline, space, semicolon, comma)
   │   ├── AST-aware (supported languages): chunk at function/class boundaries
   │   │   ├── ast-grep parses top-level declarations
   │   │   ├── Small declarations merged, large ones sub-chunked
   │   │   └── Preamble (imports) and epilogue handled separately
   │   └── Line-based fallback: 100-line segments with 10-line overlap
   ├── Hard character cap (2000 chars) applied to all chunks
   ├── Generate chunk ID: SHA-256 of "filePath:startLine" formatted as UUID
   └── Detect language from file extension

5. BATCHED EMBEDDING + UPSERT (50 files per batch)
   For each batch of files:
   ├── Prepare text: "search_document: {relativePath}\n{content}"
   ├── Generate embeddings via configured provider (further batched internally)
   ├── Upsert to Qdrant with dense vector + BM25 text + payload
   ├── Update in-memory file hashes
   ├── Checkpoint: persist hashes to Qdrant (progress survives crashes)
   └── Check for cancellation request before next batch

6. POST-INDEX
   ├── Save final metadata (status: "completed")
   ├── Auto-build code dependency graph (non-fatal on failure)
   └── Auto-index context artifacts if config exists (non-fatal on failure)
```
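
Two pieces of the pipeline above, the line-based fallback chunking and the deterministic chunk ID, as a sketch. The window arithmetic follows `CHUNK_SIZE`/`CHUNK_OVERLAP`; which 32 hex digits of the digest form the UUID, and the exact input to the content hash, are assumptions. Because the ID depends only on path and start line, re-indexing an unchanged file upserts onto the same point IDs instead of creating duplicates.

```typescript
import { createHash } from "node:crypto";

const CHUNK_SIZE = 100; // lines per chunk
const CHUNK_OVERLAP = 10; // lines shared with the previous chunk

interface LineChunk { startLine: number; endLine: number; content: string }

// Line-based fallback: fixed windows that advance by CHUNK_SIZE - CHUNK_OVERLAP lines.
function chunkByLines(text: string): LineChunk[] {
  const lines = text.split("\n");
  const chunks: LineChunk[] = [];
  const step = CHUNK_SIZE - CHUNK_OVERLAP;
  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + CHUNK_SIZE, lines.length);
    // startLine is 1-based, endLine inclusive (matching FileChunk).
    chunks.push({ startLine: start + 1, endLine: end, content: lines.slice(start, end).join("\n") });
    if (end === lines.length) break;
  }
  return chunks;
}

// Deterministic point ID: SHA-256 of "filePath:startLine", first 32 hex chars laid out 8-4-4-4-12.
function chunkId(filePath: string, startLine: number): string {
  const hex = createHash("sha256").update(`${filePath}:${startLine}`).digest("hex");
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20, 32)].join("-");
}

// Change-detection hash: 16-char prefix of the SHA-256 hex digest.
const contentHash = (text: string): string => createHash("sha256").update(text).digest("hex").slice(0, 16);
```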
---
## Data Flow: Search
When `codebase_search` is called:
```
1. Generate query embedding
   ├── Prepare text: "search_query: {query}"
   └── Send to configured embedding provider → provider-dependent vector (768 / 1536 / 3072 dims)

2. HYBRID SEARCH (dense + BM25, RRF-fused)
   ├── Build two parallel prefetch sub-queries:
   │   ├── Dense: query vector → semantic cosine similarity (embedding computed client-side)
   │   └── BM25: query text → server-side BM25 inference (Qdrant v1.15.2+)
   ├── Apply optional filters (filePath, language) as payload conditions on both sub-queries
   ├── Qdrant Query API runs both sub-queries then fuses results via Reciprocal Rank Fusion (RRF)
   └── Return top N results with RRF-combined scores and payloads

3. Format results
   └── Each result: file path, line range, language, RRF score, code content
```
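
For intuition about the scores: RRF ignores raw similarity values and sums `1 / (k + rank)` over every ranked list a result appears in. A sketch of the classic formulation with `k = 60` and 1-based ranks; the constant Qdrant applies internally may differ, so `SEARCH_MIN_SCORE` should be tuned against real results rather than derived from this sketch.

```typescript
// Each input list is ordered best-first; a document's fused score is the sum of
// 1 / (k + rank) over every list it appears in (rank is 1-based here).
function rrfFuse(lists: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

const dense = ["a.ts:1", "b.ts:40", "c.ts:7"];
const bm25 = ["b.ts:40", "d.ts:3", "a.ts:1"];
console.log(rrfFuse([dense, bm25]));
```

`b.ts:40` ranks first because it sits near the top of both lists, even though dense search alone put `a.ts:1` first.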
### nomic-embed-text Task Prefixes
The `nomic-embed-text` model uses task-specific prefixes for asymmetric retrieval:
- **Documents** are prefixed with `search_document: ` — this tells the model to encode the text as a passage to be retrieved.
- **Queries** are prefixed with `search_query: ` — this tells the model to encode as a search query.
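This asymmetric encoding matters: Nomic's documentation requires a task prefix on every input, and embedding queries or documents with the wrong prefix (or none) degrades retrieval quality.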
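
A sketch of applying the prefixes, plus a batch call to Ollama. The `/api/embed` endpoint and its `model`/`input`/`embeddings` fields come from Ollama's public API, but whether `ollama.ts` uses this endpoint is an assumption, as are the helper names. Documents also carry their relative path, matching the `search_document: {relativePath}\n{content}` format used during indexing.

```typescript
type EmbedTask = "document" | "query";

// Prefix text the way nomic-embed-text expects for asymmetric retrieval.
function toEmbeddingInput(task: EmbedTask, text: string, relativePath?: string): string {
  if (task === "query") return `search_query: ${text}`;
  return relativePath ? `search_document: ${relativePath}\n${text}` : `search_document: ${text}`;
}

// Assumption: Ollama's batch endpoint POST /api/embed on the host-side port 11435.
async function embedBatch(inputs: string[]): Promise<number[][]> {
  const res = await fetch("http://localhost:11435/api/embed", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", input: inputs }),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const body = (await res.json()) as { embeddings: number[][] };
  return body.embeddings;
}
```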
---
## Data Flow: Incremental Update
When `codebase_update` is called:
```
1. Check if collection exists and has points
   └── If empty/missing → fall back to full indexProject()

2. Enumerate current files on disk
   └── Same filtering as full index (extensions, ignore rules)

3. Compare against in-memory hash map
   ├── File hash matches → skip (unchanged)
   ├── File hash differs → delete old chunks, re-chunk, re-embed, upsert
   ├── File not in hash map → new file, chunk, embed, upsert
   └── Hash map entry not on disk → deleted file, remove chunks

4. Return delta: { added, updated, removed, chunksCreated, cancelled }
```
> **Note**: File content hashes are persisted in Qdrant after each batch. On server restart, hashes are loaded from Qdrant on first use, so incremental updates remain truly incremental across restarts.
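
Step 3 is a two-way comparison of hash maps (path to content hash). A sketch with `diffHashes` and `FileDelta` as hypothetical names; added and updated files then go through chunking and embedding, while removed files only need their chunks deleted.

```typescript
interface FileDelta { added: string[]; updated: string[]; removed: string[]; unchanged: number }

// Compare stored hashes (loaded from Qdrant or memory) with hashes of files on disk now.
function diffHashes(stored: ReadonlyMap<string, string>, current: ReadonlyMap<string, string>): FileDelta {
  const delta: FileDelta = { added: [], updated: [], removed: [], unchanged: 0 };
  for (const [path, hash] of current) {
    const previous = stored.get(path);
    if (previous === undefined) delta.added.push(path);
    else if (previous !== hash) delta.updated.push(path);
    else delta.unchanged++;
  }
  for (const path of stored.keys()) {
    if (!current.has(path)) delta.removed.push(path);
  }
  return delta;
}
```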
---
## Data Flow: Code Graph
When `codebase_graph_build` is called:
```
0. CONCURRENCY GUARD
   ├── If a build is already in progress for this project, return the
   │   existing in-flight promise (deduplication — callers share the result)
   └── Otherwise, start a new tracked build

1. BACKGROUND EXECUTION (fire-and-forget)
   ├── Tool returns immediately with "build started" message
   ├── Actual build runs asynchronously on the event loop
   ├── Progress tracked via GraphBuildProgress { filesTotal, filesProcessed, phase }
   └── Client polls codebase_graph_status for progress %

2. FILE DISCOVERY (phase: "scanning files")
   ├── Get graphable files from project (same ignore filters as indexing)
   └── Include files with AST-grep grammar + files with extra extensions

3. PARSE IMPORTS (phase: "analyzing imports", per file, via ast-grep)
   ├── Determine AST-grep language from file extension
   │   ├── Built-in: TypeScript, JavaScript, Python, Java, Kotlin, etc.
   │   └── Dynamic: C, C++, C#, Go, Rust, Ruby, PHP, Swift, Bash, Scala
   ├── Files with extra extensions (no AST grammar) → leaf nodes only
   ├── Parse file with ast-grep
   ├── Extract import/require/use/include statements using AST patterns
   │   ├── JavaScript/TypeScript: import ... from, require(), dynamic import()
   │   ├── Python: import, from ... import
   │   ├── Java/Kotlin/Scala: import statements
   │   ├── Go: import declarations
   │   ├── Rust: use, mod
   │   ├── Ruby: require, require_relative
   │   ├── PHP: use, require, include
   │   ├── C/C++: #include
   │   ├── Swift: import
   │   ├── Bash: source, . (dot)
   │   ├── Dart/Lua: regex-based extraction
   │   ├── Svelte/Vue: HTML parse → …
```
… in a symbol name cannot break out.
3. `services/graph-visualize-browser.ts::writeInteractiveGraphFile()`
writes to `os.tmpdir()/socraticode-graph/${projectId}.html` —
deterministic path per project, overwritten on each call.
4. `openInBrowser()` uses the `open` npm package (no new transitive
deps; wraps `open` on macOS, `xdg-open` on Linux, `start` on
Windows). Failure is soft — the tool output still includes the file
path so the user can open it manually.
5. The viewer (`viewer-app.js`) uses `document.createElement` +
`textContent` exclusively (no `innerHTML`) so data fields with
HTML-looking content are neutral.
When the symbol graph overflows the caps, the HTML still renders the
file view; the Symbols toggle is shown but explicitly disabled with a
banner directing the user to `codebase_impact` / `codebase_symbols` for
symbol-level queries. Per-file symbol lists in the sidebar remain
available regardless of total graph size.
#### `codebase_graph_remove`
```
Parameters:
projectPath: string — Absolute path (required)
Returns: Confirmation message
Behaviour:
Waits for any in-flight graph build to finish before removing the persisted
graph data and clearing the in-memory cache.
```
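
The concurrency guard (step 0 of the graph build) and this wait-before-remove behaviour both fit a per-project map of in-flight promises. A minimal sketch assuming one build per project ID; `InFlightBuilds` is hypothetical, not the code in `code-graph.ts`.

```typescript
// One in-flight promise per project: concurrent callers share it,
// and removal can await it before deleting data.
class InFlightBuilds<T> {
  private readonly builds = new Map<string, Promise<T>>();

  start(projectId: string, build: () => Promise<T>): Promise<T> {
    const existing = this.builds.get(projectId);
    if (existing) return existing; // deduplicate: share the running build
    const promise = build().finally(() => this.builds.delete(projectId));
    this.builds.set(projectId, promise);
    return promise;
  }

  // Wait for a running build (if any) to settle, ignoring its outcome.
  async settle(projectId: string): Promise<void> {
    await this.builds.get(projectId)?.catch(() => undefined);
  }

  isBuilding(projectId: string): boolean {
    return this.builds.has(projectId);
  }
}
```

A fire-and-forget caller would attach `.catch()` to the returned promise so a failed background build does not surface as an unhandled rejection.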
#### `codebase_graph_status`
```
Parameters:
projectPath?: string — Absolute path (defaults to cwd)
Returns:
If build in progress: Status BUILDING with phase, progress %, elapsed time
If ready: Status READY with node/edge count, last built time, cache status, last build duration
If not found: Instructions to build
```
### Management Tools
#### `codebase_health`
```
Parameters: none
Returns: Status of Docker, Qdrant image, Qdrant container, Ollama, and embedding model
```
#### `codebase_list_projects`
```
Parameters: none
Returns: List of all Qdrant collections (indexed projects and their graph status)
```
#### `codebase_about`
```
Parameters: none
Returns: Information about SocratiCode — philosophy, features, and capabilities
```
### Context Artifact Tools
#### `codebase_context`
```
Parameters:
projectPath?: string — Absolute path (defaults to cwd)
Returns: List of all artifacts defined in .socraticodecontextartifacts.json with names, descriptions, paths, and index status
```
#### `codebase_context_search`
```
Parameters:
query: string — Natural language search query
projectPath?: string — Absolute path (defaults to cwd)
artifactName?: string — Filter to a specific artifact name
limit?: number — Max results (default: 10, range: 1-50)
minScore?: number — 0-1, minimum RRF score threshold (default: 0.10, env: SEARCH_MIN_SCORE)
Returns: Ranked context chunks matching the query. Auto-indexes on first use, auto-detects staleness.
```
#### `codebase_context_index`
```
Parameters:
projectPath?: string — Absolute path (defaults to cwd)
Returns: Number of indexed artifacts and chunks per artifact
```
#### `codebase_context_remove`
```
Parameters:
projectPath: string — Absolute path (required)
Returns: Confirmation message
Behaviour:
Refuses removal if indexing/update is in progress (which includes context
artifact indexing). Returns a message suggesting codebase_stop or waiting.
```
---
## Data Structures
### FileChunk
```typescript
interface FileChunk {
  id: string; // SHA-256 of "filePath:startLine" formatted as UUID (36 chars, 8-4-4-4-12)
  filePath: string; // Absolute path
  relativePath: string; // Relative to project root
  content: string; // Chunk text content
  startLine: number; // 1-based line number
  endLine: number; // Inclusive
  language: string; // Detected from extension
  type: "code" | "comment" | "mixed";
}
```
### SearchResult
```typescript
interface SearchResult {
  filePath: string;
  relativePath: string;
  content: string;
  startLine: number;
  endLine: number;
  language: string;
  score: number; // RRF (Reciprocal Rank Fusion) score from hybrid search
}
```
### CodeGraph / CodeGraphNode / CodeGraphEdge
```typescript
interface CodeGraph {
  nodes: CodeGraphNode[];
  edges: CodeGraphEdge[];
}

interface CodeGraphNode {
  filePath: string;
  relativePath: string;
  imports: string[]; // Module specifiers (e.g. "./utils")
  exports: string[];
  dependencies: string[]; // Resolved relative paths
  dependents: string[]; // Files that import this file
}

interface CodeGraphEdge {
  source: string; // Relative path of importer
  target: string; // Relative path of imported
  type: "import" | "re-export" | "dynamic-import";
}
```
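
`graph-analysis.ts` reports cycles, but its algorithm isn't described here. As a generic illustration over edges shaped like `CodeGraphEdge`, a depth-first search with three states finds a cycle whenever an edge points back into the current path. This sketch reports one cycle per back edge found, not every elementary cycle; `findCycles` is a hypothetical name.

```typescript
interface Edge { source: string; target: string }

// DFS states: unvisited (absent), "grey" (on the current path), "black" (finished).
// An edge into a grey node closes a cycle.
function findCycles(edges: readonly Edge[]): string[][] {
  const adjacency = new Map<string, string[]>();
  for (const { source, target } of edges) {
    if (!adjacency.has(source)) adjacency.set(source, []);
    adjacency.get(source)!.push(target);
    if (!adjacency.has(target)) adjacency.set(target, []);
  }
  const state = new Map<string, "grey" | "black">();
  const path: string[] = [];
  const cycles: string[][] = [];

  const visit = (node: string): void => {
    state.set(node, "grey");
    path.push(node);
    for (const next of adjacency.get(node) ?? []) {
      const s = state.get(next);
      if (s === "grey") cycles.push([...path.slice(path.indexOf(next)), next]);
      else if (s === undefined) visit(next);
    }
    path.pop();
    state.set(node, "black");
  };

  for (const node of adjacency.keys()) if (!state.has(node)) visit(node);
  return cycles;
}
```

Recursion depth grows with the longest import chain; a very deep graph would need an explicit stack instead.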
### HealthStatus
```typescript
interface HealthStatus {
  docker: boolean;
  ollama: boolean;
  qdrant: boolean;
  ollamaModel: boolean;
  qdrantImage: boolean;
  ollamaImage: boolean;
}
```
### ContextArtifact / ArtifactIndexState
```typescript
/** A context artifact defined in .socraticodecontextartifacts.json */
interface ContextArtifact {
  name: string; // Unique name (e.g. "database-schema")
  path: string; // File or directory path (relative or absolute)
  description: string; // Describes what this artifact is and how AI should use it
}

/** Runtime state of an indexed artifact */
interface ArtifactIndexState {
  name: string;
  description: string;
  resolvedPath: string; // Absolute path
  contentHash: string; // For staleness detection
  lastIndexedAt: string; // ISO timestamp
  chunksIndexed: number; // Number of chunks stored in Qdrant
}
```
### ProjectConfig
```typescript
interface ProjectConfig {
  projectId: string; // 12-char SHA-256 prefix
  projectPath: string;
  collectionName: string; // "codebase_{projectId}"
  graphCollectionName: string; // "codegraph_{projectId}"
  lastIndexedAt?: string;
}
```
---
## Docker & Infrastructure
### Qdrant Container
The server manages a single Qdrant container with these settings:
```
Name: socraticode-qdrant
Image: qdrant/qdrant:v1.17.0
Ports: 16333:6333 (HTTP REST API), 16334:6334 (gRPC)
Volume: socraticode_qdrant_data:/qdrant/storage
Restart: unless-stopped
```
### Ollama Container
The server manages a single Ollama container:
```
Name: socraticode-ollama
Image: ollama/ollama:latest
Ports: 11435:11434 (API)
Volume: socraticode_ollama_data:/root/.ollama
Restart: unless-stopped
```
**Data persistence**: The named Docker volumes persist across container restarts and upgrades. Your indexes and models survive server and Docker restarts.
**Alternative**: Instead of the server auto-managing the containers, you can run them yourself via `docker-compose up -d` using the included `docker-compose.yml`. The server will detect the already-running containers and skip creation.
### Qdrant Collection Schema
Each project gets a collection with:
- **Vectors**: Provider-dependent dimensions (768 for Ollama, 1536 for OpenAI, 3072 for Google), cosine distance
- **Optimizers**: 2 segments
- **Payload storage**: On-disk (to handle large codebases)
- **Payload indexes**: `filePath` (keyword), `relativePath` (keyword), `language` (keyword), `contentHash` (keyword)
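
The schema above maps onto two Qdrant REST calls: `PUT /collections/{name}` and `PUT /collections/{name}/index` once per keyword field. A dense-only sketch: the real collection also needs the vector configuration for the BM25 side of hybrid search (and likely uses named vectors), which is omitted here, and mapping "2 segments" to `default_segment_number` is an assumption. Function names are hypothetical.

```typescript
const QDRANT_URL = "http://localhost:16333";
const KEYWORD_FIELDS = ["filePath", "relativePath", "language", "contentHash"] as const;

// Body for PUT /collections/{name}: dense vectors only in this sketch.
function collectionBody(dimensions: number) {
  return {
    vectors: { size: dimensions, distance: "Cosine" },
    optimizers_config: { default_segment_number: 2 }, // assumption: "2 segments"
    on_disk_payload: true,
  };
}

async function createCodeCollection(name: string, dimensions = 768): Promise<void> {
  const put = async (path: string, body: unknown) => {
    const res = await fetch(`${QDRANT_URL}${path}`, {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`PUT ${path} failed: HTTP ${res.status}`);
  };
  await put(`/collections/${name}`, collectionBody(dimensions));
  // One keyword payload index per filterable field.
  for (const field of KEYWORD_FIELDS) {
    await put(`/collections/${name}/index`, { field_name: field, field_schema: "keyword" });
  }
}
```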
---
## Extending the Indexer
### Adding a new file extension
Edit `SUPPORTED_EXTENSIONS` and `getLanguageFromExtension()` in `src/constants.ts`.
### Changing chunk size or overlap
Edit `CHUNK_SIZE` and `CHUNK_OVERLAP` in `src/constants.ts`. Smaller chunks give more precise search results but use more storage and embedding calls. Larger chunks give more context per result.
### Switching embedding model or provider
1. Set `EMBEDDING_PROVIDER` in your MCP config env block (`ollama`, `openai`, or `google`).
2. Optionally override `EMBEDDING_MODEL` and `EMBEDDING_DIMENSIONS` for the chosen provider (auto-detected defaults exist for all built-in models).
3. Re-index all projects (`codebase_remove` then `codebase_index`) since existing vectors have different dimensions.
See `src/services/embedding-config.ts` for all supported environment variables and per-provider defaults.
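
A sketch of how the provider defaults and overrides combine, using the defaults listed in this guide. `resolveEmbeddingConfig` is hypothetical, falling back to `ollama` when `EMBEDDING_PROVIDER` is unset is an assumption, and `embedding-config.ts` remains the source of truth.

```typescript
type Provider = "ollama" | "openai" | "google";

// Defaults as listed in this guide.
const DEFAULTS: Record<Provider, { model: string; dimensions: number }> = {
  ollama: { model: "nomic-embed-text", dimensions: 768 },
  openai: { model: "text-embedding-3-small", dimensions: 1536 },
  google: { model: "gemini-embedding-001", dimensions: 3072 },
};

// EMBEDDING_PROVIDER picks the defaults; EMBEDDING_MODEL / EMBEDDING_DIMENSIONS override them.
function resolveEmbeddingConfig(env: Record<string, string | undefined>) {
  const provider = (env.EMBEDDING_PROVIDER ?? "ollama") as Provider;
  if (!Object.prototype.hasOwnProperty.call(DEFAULTS, provider)) {
    throw new Error(`Unknown EMBEDDING_PROVIDER: ${provider}`);
  }
  const base = DEFAULTS[provider];
  const dimensions = env.EMBEDDING_DIMENSIONS ? Number.parseInt(env.EMBEDDING_DIMENSIONS, 10) : base.dimensions;
  if (!Number.isInteger(dimensions) || dimensions <= 0) {
    throw new Error("EMBEDDING_DIMENSIONS must be a positive integer");
  }
  return { provider, model: env.EMBEDDING_MODEL ?? base.model, dimensions };
}

console.log(resolveEmbeddingConfig({ EMBEDDING_PROVIDER: "openai" }));
```

Because a collection's vector size is fixed when it is created, a change in the resolved `dimensions` is what forces the re-index in step 3.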
### Adding a new MCP tool
1. Define the handler in the appropriate file under `src/tools/`
2. Register the tool in `src/index.ts` using `server.tool()`
3. Follow the existing pattern: zod schema for input, string return value
### Adding new ignore patterns
Edit `DEFAULT_IGNORE_PATTERNS` in `src/services/ignore.ts`.
---
## VS Code / Open VSX Extension
The repo also ships a regular VS Code extension that auto-registers the
SocratiCode MCP server in any MCP-aware host (Copilot agent mode, Cline,
Continue, Roo Code) and adds native UI: sidebar, status bar, interactive
graph webview, walkthrough, and palette commands. The same `.vsix` is
published to both VS Code Marketplace and Open VSX, so it installs in
Cursor, VSCodium, Gitpod, code-server, Theia, Antigravity, and Particle
Workbench in addition to stock VS Code.
Source: [`extension/`](./extension)
### Layout
```text
extension/
├── package.json # extension manifest
├── tsconfig.json # TS strict, ES2022, Node 18 target
├── biome.json # lint config
├── esbuild.config.mjs # bundles src/extension.ts -> dist/extension.js
├── README.md # marketplace landing page
├── CHANGELOG.md # extension-specific changelog
├── images/icon.png # marketplace icon
├── walkthroughs/ # markdown shown in the getting-started walkthrough
└── src/
    ├── extension.ts # activation entrypoint
    ├── mcpProvider.ts # registerMcpServerDefinitionProvider
    ├── sidebar.ts # TreeDataProvider for the activity-bar view
    ├── graphPanel.ts # webview panel for the interactive graph
    ├── statusBar.ts # status-bar item
    ├── commands.ts # command palette commands
    ├── settings.ts # typed config accessors
    ├── output.ts # log/output channel
    └── __tests__/ # node:test smoke tests for the manifest
```
### Local development
```bash
cd extension
npm install
npm run watch # rebuild dist/extension.js on save
```
To debug the extension live in a real VS Code instance:
1. Open the `extension/` folder in VS Code.
2. `F5` (Run > Start Debugging) launches a new "Extension Development Host"
window with the extension loaded.
3. Open any folder in that host window. The MCP server registration kicks
in on activation; the sidebar appears in the Activity Bar.
4. Output Channel: `View > Output > SocratiCode`.
### Build, lint, test, package
```bash
cd extension
npm run lint # biome check
npm run typecheck # tsc --noEmit
npm run compile # esbuild bundle to dist/
npm test # node:test smoke tests on the manifest
npm run package # produces socraticode-.vsix
```
### Publishing
The extension is published to two registries from the same `.vsix`:
- **VS Code Marketplace** via `vsce publish` (Azure DevOps PAT)
- **Open VSX Registry** via `ovsx publish` (Eclipse PAT)
CI handles this on `v*` tags via `.github/workflows/extension-release.yml`.
The required GitHub Actions secrets are `VSCE_PAT` (Azure DevOps personal
access token with Marketplace Manage scope) and `OVSX_PAT` (Eclipse
access token with Open VSX namespace). Each is set up once per
publisher account. Maintainer notes for the full submission flow live
outside this public repo.
To publish manually:
```bash
cd extension
npm run publish:vsce # Microsoft marketplace
npm run publish:ovsx # Open VSX
# or both at once:
npm run publish:all
```
### Versioning
The extension's version tracks the engine version. The
`scripts/bump-plugin-versions.mjs` `release-it` hook bumps
`extension/package.json` along with every plugin manifest, so an engine
release `vX.Y.Z` automatically bumps the extension to `X.Y.Z`. Patch
drift is allowed for extension-only hotfixes (e.g. release the engine
at `1.7.2` but ship the extension at `1.7.3` for a UI bug fix).
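
The drift rule can be expressed as a simple check: same major and minor as the engine, any patch. A sketch of such a check (hypothetical; this guide doesn't say the repo enforces it in CI):

```typescript
// Parse "X.Y.Z"; a leading "v" (as in git tags) is tolerated.
function parseVersion(version: string): [number, number, number] {
  const match = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(version.trim());
  if (!match) throw new Error(`Not an X.Y.Z version: ${version}`);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// The extension may drift by patch only: major and minor must match the engine.
function extensionVersionOk(engine: string, extension: string): boolean {
  const [engineMajor, engineMinor] = parseVersion(engine);
  const [extMajor, extMinor] = parseVersion(extension);
  return engineMajor === extMajor && engineMinor === extMinor;
}
```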
### What the extension does NOT do
- It does **not** re-implement the engine. All search, dependency-graph
analysis, impact analysis and indexing happen in the engine via the
registered MCP server. The extension is a thin distribution and UI
shell.
- It does **not** ship its own copy of the engine. The engine launches
via `npx -y socraticode` (configurable via the `socraticode.command` /
`socraticode.args` settings).
- It does **not** add language-server features (code lenses, hovers,
diagnostics). Those would conflict with the host editor's existing
language servers and aren't aligned with the engine's value
proposition.
---
## Troubleshooting
### "Docker is not available"
Make sure Docker Desktop is installed and running. On Linux, ensure the Docker daemon is started and your user is in the `docker` group.
### "Ollama is not available"
The Ollama container is managed automatically via Docker. Check that the `socraticode-ollama` container is running with `docker ps`. If it's not starting, check `docker logs socraticode-ollama`.
### Qdrant health check times out
The container may be slow to start. Try:
```bash
docker logs socraticode-qdrant
```
### Search returns no results
Make sure the project has been indexed first (`codebase_index`). Check the status with `codebase_status`.
### Code graph returns empty
The code graph uses ast-grep for AST-based import extraction. It works for 18+ languages. If a file has no recognised imports (or uses non-standard import patterns), it may appear as an orphan node.
### Large codebase is slow to index
- Initial indexing is CPU/IO intensive (embedding generation). Subsequent updates are incremental and much faster.
- Files over 5 MB are automatically skipped (configurable via `MAX_FILE_SIZE_MB`).
- Consider adding large generated or data files to `.socraticodeignore`.
### Server crashes on startup
Check that the `dist/` directory exists. Run `npm run build` first.
### Qdrant Manual Management
Qdrant exposes a REST API on port 16333. You can inspect and clean up directly via `curl`:
**List all collections:**
```bash
curl -s http://localhost:16333/collections | python3 -m json.tool
```
**Delete a specific collection:**
```bash
curl -X DELETE http://localhost:16333/collections/codebase_
curl -X DELETE http://localhost:16333/collections/codegraph_
```
**Delete ALL collections (nuclear option):**
```bash
curl -s http://localhost:16333/collections | python3 -c "
import sys, json
for c in json.load(sys.stdin)['result']['collections']:
    print(c['name'])
" | while read -r name; do
  echo "Deleting $name"
  curl -s -X DELETE "http://localhost:16333/collections/$name" > /dev/null
done
```
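
A more conservative variant only touches collections that follow the naming scheme from Project ID & Collection Naming (`codebase_`, `codegraph_`, `context_` prefixes) and defaults to a dry run. A TypeScript sketch using Node 18's built-in `fetch`, based on Qdrant's documented `GET /collections` response shape; the function names are hypothetical.

```typescript
const QDRANT = "http://localhost:16333";
const PREFIXES = ["codebase_", "codegraph_", "context_"];

// Keep only collection names that match SocratiCode's naming scheme.
function socraticodeCollections(names: readonly string[]): string[] {
  return names.filter((name) => PREFIXES.some((prefix) => name.startsWith(prefix)));
}

// GET /collections returns { result: { collections: [{ name }] } }.
async function deleteSocraticodeCollections(dryRun = true): Promise<string[]> {
  const res = await fetch(`${QDRANT}/collections`);
  if (!res.ok) throw new Error(`GET /collections failed: HTTP ${res.status}`);
  const body = (await res.json()) as { result: { collections: { name: string }[] } };
  const targets = socraticodeCollections(body.result.collections.map((c) => c.name));
  for (const name of targets) {
    if (dryRun) console.log(`would delete ${name}`);
    else await fetch(`${QDRANT}/collections/${encodeURIComponent(name)}`, { method: "DELETE" });
  }
  return targets;
}
```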
**Wipe the entire Qdrant volume (most thorough):**
```bash
docker stop socraticode-qdrant
docker rm socraticode-qdrant
docker volume rm socraticode_qdrant_data
```
The server will recreate everything automatically on next use.