diff --git a/.gitignore b/.gitignore index 9e8870f..d0becba 100644 --- a/.gitignore +++ b/.gitignore @@ -48,3 +48,6 @@ pnpm-lock.yaml .opencode/ .opencode-agenthub/ .opencode-agenthub.user.json + +# Superpowers local planning artifacts +docs/superpowers/plans/ diff --git a/docs/architecture.md b/docs/architecture.md index f13c8d4..4d2fe03 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -73,15 +73,16 @@ Long-term memory that persists across sessions within the same workspace. Perfec ### Memory Extraction -During compaction, the plugin scans for `` blocks: +During compaction, the plugin scans for `Memory candidates:` sections: ``` - +Memory candidates: - [decision] Use npm cache for plugin loading - [project] This repo uses TypeScript with strict mode - ``` +**Legacy Format**: The plugin also accepts `` XML blocks for backward compatibility, but this format is deprecated. + **Quality Gate**: Not all candidates become memories. The plugin rejects: - Git commit hashes (e.g., `abc1234`) - Raw errors (e.g., `Error: something failed`) @@ -180,15 +181,20 @@ Hot session state is injected after workspace memory: ``` --- - -- [project] This repo uses TypeScript with strict mode - -Active Files: +Hot session state (current session): + +active_files: - src/plugin.ts (edit, 18x) - tests/plugin.test.ts (edit, 5x) -Open Errors: (none) +open_errors: (none) + +recent_decisions: +- Use frozen workspace memory snapshots for cache stability + +pending_memories: +- [decision] Parser supports 3 candidate formats ``` ## Layer 3: Native OpenCode State diff --git a/docs/superpowers/plans/2026-04-25-memory-v2-redesign.md b/docs/superpowers/plans/2026-04-25-memory-v2-redesign.md deleted file mode 100644 index 2e60a51..0000000 --- a/docs/superpowers/plans/2026-04-25-memory-v2-redesign.md +++ /dev/null @@ -1,976 +0,0 @@ -# Memory V2 Redesign Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or 
superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Replace the current heavy four-tier memory plugin with a low-token, no-extra-agent-call memory system that provides workspace-scoped long-term memory and session hot state. - -**Architecture:** Implement three layers: stable workspace memory, hot session state, and native OpenCode state integration. Workspace memory is frozen per session and refreshed at compaction boundaries; hot session state tracks active files and unresolved blocking errors automatically from tool events; OpenCode todos remain owned by OpenCode and are only read during compaction. - -**Tech Stack:** TypeScript, OpenCode Plugin hooks, Node/Bun file APIs, JSON sidecar storage under user data directory, TypeScript typecheck via `npm run typecheck`. - ---- - -## Design Summary - -### What changes - -- Remove default agent-visible memory tools from the normal flow. -- Remove raw tool-output cache and pressure-monitor intervention from the core path. -- Add workspace-scoped long-term memory that persists across sessions but does not cross workspaces. -- Add hot session state that is fully automatic and tiny: active files, open blocking errors, and recent decisions for compaction only. -- Reuse OpenCode compaction to extract long-term memory candidates with no extra LLM call. -- Read OpenCode todos during compaction instead of duplicating todo storage. - -### What stays out of memory - -- Long-term memory does **not** save file lists, stack traces, code signatures, API docs, git history, architecture snapshots, or temporary task progress. -- Short-term memory does **not** save todos or dependency facts because OpenCode and project files already own those. - ---- - -## File Structure - -Current project has a single `index.ts`. This plan splits memory behavior into focused modules while keeping `index.ts` as the plugin entrypoint. 
- -### Create - -- `src/paths.ts` — computes workspace-scoped storage paths under user data directory. -- `src/storage.ts` — atomic JSON read/write helpers with safe defaults. -- `src/types.ts` — canonical schemas and constants for long-term memory and session state. -- `src/workspace-memory.ts` — load/save/merge/render long-term workspace memory. -- `src/session-state.ts` — load/save/update/render active files, open errors, recent decisions. -- `src/extractors.ts` — deterministic extraction from user messages, tool args, bash output, and compaction summaries. -- `src/opencode.ts` — thin wrappers around OpenCode SDK calls for latest user messages, summaries, and todos. -- `src/plugin.ts` — hook orchestration. -- `tests/extractors.test.ts` — unit tests for deterministic extraction. -- `tests/workspace-memory.test.ts` — unit tests for merge, dedupe, limits, staleness rendering. -- `tests/session-state.test.ts` — unit tests for active files and error lifecycle. - -### Modify - -- `index.ts` — replace monolithic implementation with `export { default } from "./src/plugin";`. -- `package.json` — add a test script using Node’s built-in test runner or Bun test depending available runtime. -- `README.md` — update feature description from four-tier memory to Memory V2. -- `docs/architecture.md` — replace stale four-tier docs with three-layer design. -- `docs/configuration.md` — document limits and optional debug tools. -- `AGENTS.md` — update development guide, storage paths, and testing commands. 
- ---- - -## Wave 1 — Storage, Types, and Deterministic Core - -### Task 1: Add canonical types and limits - -**Files:** -- Create: `src/types.ts` - -- [ ] **Step 1: Create memory and session schemas** - -Add this file: - -```ts -export type LongTermType = "feedback" | "project" | "decision" | "reference"; - -export type LongTermSource = "explicit" | "compaction" | "manual"; - -export type LongTermMemoryEntry = { - id: string; - type: LongTermType; - text: string; - rationale?: string; - source: LongTermSource; - confidence: number; - status: "active" | "superseded"; - createdAt: string; - updatedAt: string; - staleAfterDays?: number; - supersedes?: string[]; - tags?: string[]; -}; - -export type WorkspaceMemoryStore = { - version: 1; - workspace: { - root: string; - key: string; - }; - limits: { - maxRenderedChars: number; - maxEntries: number; - }; - entries: LongTermMemoryEntry[]; - updatedAt: string; -}; - -export type ActiveFile = { - path: string; - action: "read" | "grep" | "edit" | "write"; - count: number; - lastSeen: number; -}; - -export type OpenError = { - id: string; - category: "typecheck" | "test" | "lint" | "build" | "runtime" | "tool"; - summary: string; - command?: string; - file?: string; - fingerprint: string; - status: "open" | "maybe_fixed"; - firstSeen: number; - lastSeen: number; - seenCount: number; -}; - -export type SessionDecision = { - id: string; - text: string; - rationale?: string; - source: "assistant" | "user" | "compaction"; - createdAt: number; - promotedToLongTerm?: boolean; -}; - -export type SessionState = { - version: 1; - sessionID: string; - turn: number; - updatedAt: string; - activeFiles: ActiveFile[]; - openErrors: OpenError[]; - recentDecisions: SessionDecision[]; -}; - -export const LONG_TERM_LIMITS = { - maxRenderedChars: 5200, - targetRenderedChars: 4200, - maxEntries: 28, - maxEntryTextChars: 260, - maxRationaleChars: 180, -} as const; - -export const HOT_STATE_LIMITS = { - maxRenderedChars: 1200, - 
maxActiveFilesStored: 20, - maxActiveFilesRendered: 8, - maxOpenErrorsStored: 5, - maxOpenErrorsRendered: 3, - maxRecentDecisionsStored: 8, -} as const; -``` - -- [ ] **Step 2: Run typecheck** - -Run: `npm run typecheck` - -Expected: PASS or existing unrelated failures only. Since the file is not imported yet, it should not introduce errors. - ---- - -### Task 2: Add workspace-scoped paths and atomic storage - -**Files:** -- Create: `src/paths.ts` -- Create: `src/storage.ts` - -- [ ] **Step 1: Create `src/paths.ts`** - -```ts -import { createHash } from "crypto"; -import { homedir } from "os"; -import { join } from "path"; -import { realpath } from "fs/promises"; - -export function dataHome(): string { - return process.env.XDG_DATA_HOME ?? join(homedir(), ".local", "share"); -} - -export async function workspaceKey(root: string): Promise<string> { - const resolved = await realpath(root).catch(() => root); - return createHash("sha256").update(resolved).digest("hex").slice(0, 16); -} - -export async function memoryRoot(root: string): Promise<string> { - return join(dataHome(), "opencode-working-memory", "workspaces", await workspaceKey(root)); -} - -export async function workspaceMemoryPath(root: string): Promise<string> { - return join(await memoryRoot(root), "workspace-memory.json"); -} - -export async function sessionStatePath(root: string, sessionID: string): Promise<string> { - return join(await memoryRoot(root), "sessions", `${sessionID}.json`); -} -``` - -- [ ] **Step 2: Create `src/storage.ts`** - -```ts -import { existsSync } from "fs"; -import { mkdir, readFile, rename, writeFile } from "fs/promises"; -import { dirname } from "path"; - -export async function readJSON<T>(path: string, fallback: () => T): Promise<T> { - if (!existsSync(path)) return fallback(); - try { - return JSON.parse(await readFile(path, "utf8")) as T; - } catch { - return fallback(); - } -} - -export async function atomicWriteJSON(path: string, data: unknown): Promise<void> { - await mkdir(dirname(path), { recursive: true }); - const 
tmp = `${path}.${process.pid}.${Date.now()}.tmp`; - await writeFile(tmp, JSON.stringify(data, null, 2), { encoding: "utf8", mode: 0o600 }); - await rename(tmp, path); -} -``` - -- [ ] **Step 3: Run typecheck** - -Run: `npm run typecheck` - -Expected: PASS. - ---- - -### Task 3: Add extractor tests before implementation - -**Files:** -- Create: `tests/extractors.test.ts` -- Modify: `package.json` - -- [ ] **Step 1: Add test script** - -Modify `package.json` scripts: - -```json -{ - "scripts": { - "build": "node -e \"console.log('No build step required: OpenCode loads index.ts directly')\"", - "typecheck": "tsc --noEmit", - "test": "node --test --experimental-strip-types tests/*.test.ts" - } -} -``` - -- [ ] **Step 2: Write failing tests** - -Create `tests/extractors.test.ts`: - -```ts -import test from "node:test"; -import assert from "node:assert/strict"; -import { - extractExplicitMemories, - extractActiveFiles, - extractErrorsFromBash, - parseWorkspaceMemoryCandidates, -} from "../src/extractors.ts"; - -test("extractExplicitMemories captures clear remember instruction", () => { - const items = extractExplicitMemories("请记住:这个 workspace 的 memory 功能必须默认无感"); - assert.equal(items.length, 1); - assert.equal(items[0].type, "feedback"); - assert.match(items[0].text, /默认无感/); -}); - -test("extractExplicitMemories avoids casual negative commands", () => { - assert.equal(extractExplicitMemories("不要吃这个").length, 0); - assert.equal(extractExplicitMemories("以后再说").length, 0); -}); - -test("extractActiveFiles uses tool args before output", () => { - assert.deepEqual(extractActiveFiles("read", { filePath: "/repo/index.ts" }, "random content"), [ - { path: "/repo/index.ts", action: "read" }, - ]); -}); - -test("extractErrorsFromBash captures typecheck failure", () => { - const errors = extractErrorsFromBash("npm run typecheck", "src/index.ts(10,3): error TS2345: bad type"); - assert.equal(errors.length, 1); - assert.equal(errors[0].category, "typecheck"); - 
assert.match(errors[0].summary, /TS2345/); -}); - -test("parseWorkspaceMemoryCandidates parses compaction block", () => { - const entries = parseWorkspaceMemoryCandidates(`summary - -- [decision] Use JSON as canonical storage because it is easier to validate. -- [reference] External design notes are in Notion. -`); - assert.equal(entries.length, 2); - assert.equal(entries[0].type, "decision"); - assert.equal(entries[1].type, "reference"); -}); -``` - -- [ ] **Step 3: Run tests and confirm failure** - -Run: `npm test` - -Expected: FAIL because `src/extractors.ts` does not exist. - ---- - -### Task 4: Implement deterministic extractors - -**Files:** -- Create: `src/extractors.ts` - -- [ ] **Step 1: Add extractor implementation** - -```ts -import { createHash } from "crypto"; -import type { ActiveFile, LongTermMemoryEntry, LongTermType, OpenError } from "./types"; -import { LONG_TERM_LIMITS } from "./types"; - -function id(prefix: string): string { - return `${prefix}_${Date.now()}_${Math.random().toString(36).slice(2, 8)}`; -} - -function hash(value: string): string { - return createHash("sha1").update(value).digest("hex").slice(0, 12); -} - -export function extractExplicitMemories(text: string): LongTermMemoryEntry[] { - const patterns = [ - /(?:请记住|記住|记住这一点|remember this|commit to memory)[::]?\s*(.+)$/im, - /(?:从现在开始|從現在開始|从今以后|從今以後|from now on|always)[::]?\s*(.+)$/im, - ]; - - const now = new Date().toISOString(); - const entries: LongTermMemoryEntry[] = []; - - for (const pattern of patterns) { - const match = text.match(pattern); - const body = match?.[1]?.trim(); - if (!body || body.length < 8) continue; - if (/^(再说|再說|later|next time)$/i.test(body)) continue; - - entries.push({ - id: id("mem"), - type: classifyExplicitMemory(body), - text: body.slice(0, LONG_TERM_LIMITS.maxEntryTextChars), - source: "explicit", - confidence: 1, - status: "active", - createdAt: now, - updatedAt: now, - staleAfterDays: staleAfterDaysFor(classifyExplicitMemory(body)), - }); - } - 
- return entries; -} - -function classifyExplicitMemory(text: string): LongTermType { - const lower = text.toLowerCase(); - if (/https?:\/\/|linear|slack|notion|dashboard|grafana/.test(lower)) return "reference"; - if (/decide|decision|choose|chosen|决定|決定|选择|選擇/.test(lower)) return "decision"; - if (/project|workspace|repo|项目|專案/.test(lower)) return "project"; - return "feedback"; -} - -export function staleAfterDaysFor(type: LongTermType): number | undefined { - if (type === "feedback") return undefined; - if (type === "decision") return 45; - if (type === "project") return 60; - return 90; -} - -export function extractActiveFiles( - toolName: string, - args: Record, - output: string, -): Array<{ path: string; action: ActiveFile["action"] }> { - if (toolName === "read" && typeof args.filePath === "string") return [{ path: args.filePath, action: "read" }]; - if (toolName === "edit" && typeof args.filePath === "string") return [{ path: args.filePath, action: "edit" }]; - if (toolName === "write" && typeof args.filePath === "string") return [{ path: args.filePath, action: "write" }]; - if (toolName === "grep") return extractGrepPaths(output).map(path => ({ path, action: "grep" as const })); - return []; -} - -function extractGrepPaths(output: string): string[] { - const matches = output.match(/^(\/[^ - return [...new Set(matches.map(match => match.replace(/:$/, "")))].slice(0, 10); -} - -export function extractErrorsFromBash(command: string, output: string): OpenError[] { - const lines = output.split("\n").filter(line => /error|failed|failure|exception|TS\d{4}|ERR!/i.test(line)).slice(0, 5); - if (lines.length === 0) return []; - - const category = classifyCommand(command) ?? 
"runtime"; - const summary = lines.join(" ").slice(0, 280); - const fingerprint = hash(`${category}:${summary.toLowerCase().replace(/\s+/g, " ")}`); - const now = Date.now(); - - return [{ - id: `err_${fingerprint}`, - category, - summary, - command, - file: extractFirstPath(summary), - fingerprint, - status: "open", - firstSeen: now, - lastSeen: now, - seenCount: 1, - }]; -} - -export function classifyCommand(command: string): OpenError["category"] | null { - const c = command.toLowerCase(); - if (/\b(tsc|typecheck)\b/.test(c)) return "typecheck"; - if (/\b(test|vitest|jest|mocha|pytest|go test|cargo test)\b/.test(c)) return "test"; - if (/\b(lint|eslint|biome)\b/.test(c)) return "lint"; - if (/\b(build|vite build|webpack|tsup)\b/.test(c)) return "build"; - return null; -} - -function extractFirstPath(text: string): string | undefined { - return text.match(/[\w./-]+\.(ts|tsx|js|jsx|json|md|py|go|rs)/)?.[0]; -} - -export function parseWorkspaceMemoryCandidates(summary: string): LongTermMemoryEntry[] { - const match = summary.match(/([\s\S]*?)<\/workspace_memory_candidates>/i); - if (!match) return []; - - const now = new Date().toISOString(); - const entries: LongTermMemoryEntry[] = []; - - for (const line of match[1].split("\n")) { - const item = line.trim().match(/^-\s*\[(feedback|project|decision|reference)\]\s*(.+)$/i); - if (!item) continue; - const type = item[1].toLowerCase() as LongTermType; - const body = item[2].trim(); - if (body.length < 12) continue; - entries.push({ - id: id("mem"), - type, - text: body.slice(0, LONG_TERM_LIMITS.maxEntryTextChars), - source: "compaction", - confidence: 0.75, - status: "active", - createdAt: now, - updatedAt: now, - staleAfterDays: staleAfterDaysFor(type), - }); - } - - return entries; -} -``` - -- [ ] **Step 2: Run extractor tests** - -Run: `npm test` - -Expected: PASS for extractor tests. 
- ---- - -### Wave 1 verification checkpoint - -- [ ] **Step 1: Run all checks** - -Run: `npm test && npm run typecheck` - -Expected: PASS. - -- [ ] **Step 2: Review wave output** - -Confirm: Types, paths, storage helpers, and deterministic extractors exist and tests cover clear remember, false positives, active files, bash errors, and compaction candidates. - -- [ ] **Step 3: Commit wave** - -```bash -git add package.json src tests -git commit -m "refactor: add memory v2 core primitives" -``` - ---- - -## Wave 2 — Workspace Memory and Hot Session State - -### Task 5: Implement workspace memory store - -**Files:** -- Create: `src/workspace-memory.ts` -- Test: `tests/workspace-memory.test.ts` - -- [ ] **Step 1: Write failing tests** - -Create `tests/workspace-memory.test.ts`: - -```ts -import test from "node:test"; -import assert from "node:assert/strict"; -import type { LongTermMemoryEntry } from "../src/types.ts"; -import { enforceLongTermLimits, renderWorkspaceMemory } from "../src/workspace-memory.ts"; - -function entry(text: string, type: LongTermMemoryEntry["type"] = "feedback"): LongTermMemoryEntry { - const now = new Date().toISOString(); - return { id: text, type, text, source: "explicit", confidence: 1, status: "active", createdAt: now, updatedAt: now }; -} - -test("enforceLongTermLimits dedupes entries", () => { - const kept = enforceLongTermLimits([entry("Memory must be invisible"), entry("Memory must be invisible")]); - assert.equal(kept.length, 1); -}); - -test("renderWorkspaceMemory includes verify marker for stale decisions", () => { - const old = entry("Use JSON storage", "decision"); - old.createdAt = "2020-01-01T00:00:00.000Z"; - old.staleAfterDays = 45; - const rendered = renderWorkspaceMemory({ version: 1, workspace: { root: "/repo", key: "abc" }, limits: { maxRenderedChars: 5200, maxEntries: 28 }, entries: [old], updatedAt: old.createdAt }); - assert.match(rendered, /verify/); -}); -``` - -- [ ] **Step 2: Implement workspace memory functions** 
- -Create `src/workspace-memory.ts` with: - -```ts -import type { LongTermMemoryEntry, WorkspaceMemoryStore } from "./types"; -import { LONG_TERM_LIMITS } from "./types"; -import { workspaceKey, workspaceMemoryPath } from "./paths"; -import { atomicWriteJSON, readJSON } from "./storage"; - -export async function emptyWorkspaceMemory(root: string): Promise<WorkspaceMemoryStore> { - return { - version: 1, - workspace: { root, key: await workspaceKey(root) }, - limits: { maxRenderedChars: LONG_TERM_LIMITS.maxRenderedChars, maxEntries: LONG_TERM_LIMITS.maxEntries }, - entries: [], - updatedAt: new Date().toISOString(), - }; -} - -export async function loadWorkspaceMemory(root: string): Promise<WorkspaceMemoryStore> { - return readJSON<WorkspaceMemoryStore>(await workspaceMemoryPath(root), () => ({ - version: 1, - workspace: { root, key: "unknown" }, - limits: { maxRenderedChars: LONG_TERM_LIMITS.maxRenderedChars, maxEntries: LONG_TERM_LIMITS.maxEntries }, - entries: [], - updatedAt: new Date().toISOString(), - })); -} - -export async function saveWorkspaceMemory(root: string, store: WorkspaceMemoryStore): Promise<void> { - store.workspace = { root, key: await workspaceKey(root) }; - store.entries = enforceLongTermLimits(store.entries); - store.updatedAt = new Date().toISOString(); - await atomicWriteJSON(await workspaceMemoryPath(root), store); -} - -export function enforceLongTermLimits(entries: LongTermMemoryEntry[]): LongTermMemoryEntry[] { - const byKey = new Map<string, LongTermMemoryEntry>(); - for (const entry of entries.filter(e => e.status === "active")) { - const text = entry.text.slice(0, LONG_TERM_LIMITS.maxEntryTextChars); - const key = `${entry.type}:${text.toLowerCase().replace(/\s+/g, " ").trim()}`; - const existing = byKey.get(key); - if (!existing || entry.source === "explicit") byKey.set(key, { ...entry, text }); - } - return [...byKey.values()] - .sort((a, b) => priority(b) - priority(a)) - .slice(0, LONG_TERM_LIMITS.maxEntries); -} - -function priority(entry: LongTermMemoryEntry): number { - const type = { feedback: 400, decision: 300, project: 
200, reference: 100 }[entry.type]; - const source = entry.source === "explicit" ? 1000 : 0; - return source + type + entry.confidence * 10; -} - -export function renderWorkspaceMemory(store: WorkspaceMemoryStore): string { - const active = enforceLongTermLimits(store.entries); - if (active.length === 0) return ""; - const lines = [ - "", - "Persistent workspace memory. Use as background; verify stale or code-related claims.", - ]; - for (const type of ["feedback", "project", "decision", "reference"] as const) { - const items = active.filter(e => e.type === type); - if (items.length === 0) continue; - lines.push(`${type}:`); - for (const item of items) lines.push(`- ${renderEntry(item)}`); - } - lines.push(""); - return lines.join("\n").slice(0, store.limits.maxRenderedChars); -} - -function renderEntry(entry: LongTermMemoryEntry): string { - const ageDays = Math.floor((Date.now() - new Date(entry.createdAt).getTime()) / 86_400_000); - const stale = entry.staleAfterDays && ageDays > entry.staleAfterDays ? ` [${ageDays}d old, verify]` : ""; - const rationale = entry.rationale ? ` Why: ${entry.rationale.slice(0, LONG_TERM_LIMITS.maxRationaleChars)}` : ""; - return `${entry.text}${rationale}${stale}`; -} -``` - -- [ ] **Step 3: Run tests** - -Run: `npm test` - -Expected: PASS. 
- ---- - -### Task 6: Implement session state lifecycle - -**Files:** -- Create: `src/session-state.ts` -- Test: `tests/session-state.test.ts` - -- [ ] **Step 1: Write failing tests** - -Create `tests/session-state.test.ts`: - -```ts -import test from "node:test"; -import assert from "node:assert/strict"; -import { createEmptySessionState, touchActiveFile, upsertOpenError, clearErrorsForSuccessfulCommand, renderHotSessionState } from "../src/session-state.ts"; -import type { OpenError } from "../src/types.ts"; - -test("touchActiveFile weights edits above reads", () => { - const state = createEmptySessionState("s1"); - touchActiveFile(state, "/repo/a.ts", "read"); - touchActiveFile(state, "/repo/b.ts", "edit"); - assert.equal(state.activeFiles[0].path, "/repo/b.ts"); -}); - -test("clearErrorsForSuccessfulCommand clears category", () => { - const state = createEmptySessionState("s1"); - const err: OpenError = { id: "e", category: "typecheck", summary: "TS error", fingerprint: "f", status: "open", firstSeen: 1, lastSeen: 1, seenCount: 1 }; - upsertOpenError(state, err); - clearErrorsForSuccessfulCommand(state, "npm run typecheck"); - assert.equal(state.openErrors.length, 0); -}); - -test("renderHotSessionState includes active files and open errors", () => { - const state = createEmptySessionState("s1"); - touchActiveFile(state, "/repo/index.ts", "edit"); - upsertOpenError(state, { id: "e", category: "test", summary: "test failed", fingerprint: "f", status: "open", firstSeen: 1, lastSeen: 1, seenCount: 1 }); - const rendered = renderHotSessionState(state, "/repo"); - assert.match(rendered, /index.ts/); - assert.match(rendered, /test failed/); -}); -``` - -- [ ] **Step 2: Implement session state functions** - -Create `src/session-state.ts` with create/load/save/touch/upsert/clear/render functions matching the tests. - -- [ ] **Step 3: Run tests** - -Run: `npm test` - -Expected: PASS. 
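Step 2 of Task 6 leaves `src/session-state.ts` unspecified beyond the function names used in the tests. A minimal in-memory sketch that would satisfy those three tests might look like the following. The action weights, the `classify` helper (a trimmed stand-in for `classifyCommand` in `src/extractors.ts`), and the rendered layout are assumptions rather than part of the plan, and load/save are omitted.

```ts
// Sketch only: trimmed copies of the Task 1 types so the block is self-contained.
type Action = "read" | "grep" | "edit" | "write";
type Category = "typecheck" | "test" | "lint" | "build" | "runtime" | "tool";
type ActiveFile = { path: string; action: Action; count: number; lastSeen: number };
type OpenError = {
  id: string; category: Category; summary: string; fingerprint: string;
  status: "open" | "maybe_fixed"; firstSeen: number; lastSeen: number; seenCount: number;
};
type SessionState = {
  version: 1; sessionID: string; turn: number; updatedAt: string;
  activeFiles: ActiveFile[]; openErrors: OpenError[]; recentDecisions: unknown[];
};

// Assumed weights: the plan only requires that edits outrank reads.
const WEIGHT: Record<Action, number> = { read: 1, grep: 1, edit: 3, write: 3 };
const MAX_FILES_STORED = 20;  // mirrors HOT_STATE_LIMITS.maxActiveFilesStored
const MAX_FILES_RENDERED = 8; // mirrors HOT_STATE_LIMITS.maxActiveFilesRendered
const MAX_ERRORS_STORED = 5;  // mirrors HOT_STATE_LIMITS.maxOpenErrorsStored
const MAX_ERRORS_RENDERED = 3; // mirrors HOT_STATE_LIMITS.maxOpenErrorsRendered

function createEmptySessionState(sessionID: string): SessionState {
  return {
    version: 1, sessionID, turn: 0, updatedAt: new Date().toISOString(),
    activeFiles: [], openErrors: [], recentDecisions: [],
  };
}

function touchActiveFile(state: SessionState, path: string, action: Action): void {
  const now = Date.now();
  const hit = state.activeFiles.find(f => f.path === path);
  if (hit) {
    hit.count += 1;
    hit.lastSeen = now;
    if (WEIGHT[action] > WEIGHT[hit.action]) hit.action = action; // keep the strongest action
  } else {
    state.activeFiles.push({ path, action, count: 1, lastSeen: now });
  }
  // Rank by weighted touch count, then by recency.
  const score = (f: ActiveFile) => f.count * WEIGHT[f.action];
  state.activeFiles.sort((a, b) => score(b) - score(a) || b.lastSeen - a.lastSeen);
  state.activeFiles = state.activeFiles.slice(0, MAX_FILES_STORED);
}

function upsertOpenError(state: SessionState, error: OpenError): void {
  const hit = state.openErrors.find(e => e.fingerprint === error.fingerprint);
  if (hit) {
    hit.lastSeen = error.lastSeen;
    hit.seenCount += 1;
    hit.status = "open"; // seen again, so it is not fixed
  } else {
    state.openErrors.push({ ...error });
  }
  // Keep only the most recently seen errors.
  state.openErrors.sort((a, b) => b.lastSeen - a.lastSeen);
  state.openErrors = state.openErrors.slice(0, MAX_ERRORS_STORED);
}

// Trimmed stand-in for classifyCommand() from src/extractors.ts.
function classify(command: string): Category | null {
  const c = command.toLowerCase();
  if (/\b(tsc|typecheck)\b/.test(c)) return "typecheck";
  if (/\b(test|vitest|jest)\b/.test(c)) return "test";
  if (/\b(lint|eslint)\b/.test(c)) return "lint";
  if (/\bbuild\b/.test(c)) return "build";
  return null;
}

function clearErrorsForSuccessfulCommand(state: SessionState, command: string): void {
  const category = classify(command);
  if (category === null) return; // unknown commands clear nothing
  state.openErrors = state.openErrors.filter(e => e.category !== category);
}

function renderHotSessionState(state: SessionState, root: string): string {
  const rel = (p: string) => (p.startsWith(root + "/") ? p.slice(root.length + 1) : p);
  const files = state.activeFiles.slice(0, MAX_FILES_RENDERED)
    .map(f => `- ${rel(f.path)} (${f.action}, ${f.count}x)`);
  const errors = state.openErrors.slice(0, MAX_ERRORS_RENDERED)
    .map(e => `- [${e.category}] ${e.summary}`);
  return [
    "active_files:", ...(files.length ? files : ["(none)"]),
    "open_errors:", ...(errors.length ? errors : ["(none)"]),
  ].join("\n");
}
```

Ranking by `count * weight` is one simple way to keep edited files ahead of files that were only read. A real implementation may also want to decay old counts as the session's `turn` advances.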
- ---- - -### Wave 2 verification checkpoint - -- [ ] **Step 1: Run all checks** - -Run: `npm test && npm run typecheck` - -Expected: PASS. - -- [ ] **Step 2: Review wave output** - -Confirm: Long-term store enforces limits and renders staleness. Hot session state ranks active files, stores open errors, and clears category errors on successful validation commands. - -- [ ] **Step 3: Commit wave** - -```bash -git add src tests -git commit -m "feat: add workspace memory and hot session state" -``` - ---- - -## Wave 3 — Plugin Hook Integration - -### Task 7: Wire OpenCode helper functions - -**Files:** -- Create: `src/opencode.ts` - -- [ ] **Step 1: Add SDK wrappers** - -Create `src/opencode.ts` with helpers: - -```ts -export async function latestUserText(client: any, sessionID: string): Promise<{ id: string; text: string } | null> { - const result = await client.session.messages({ path: { id: sessionID } }); - const messages = result.data ?? []; - for (let i = messages.length - 1; i >= 0; i--) { - const msg = messages[i]; - if (msg.info?.role !== "user") continue; - const text = msg.parts?.filter((p: any) => p.type === "text").map((p: any) => p.text).join("\n") ?? ""; - if (text.trim()) return { id: msg.info.id, text }; - } - return null; -} - -export async function latestCompactionSummary(client: any, sessionID: string): Promise { - const result = await client.session.messages({ path: { id: sessionID } }); - const messages = result.data ?? []; - for (let i = messages.length - 1; i >= 0; i--) { - const msg = messages[i]; - if (msg.info?.role !== "assistant" || msg.info?.summary !== true) continue; - const text = msg.parts?.filter((p: any) => p.type === "text").map((p: any) => p.text).join("\n") ?? ""; - if (text.trim()) return text; - } - return null; -} - -export async function pendingTodos(client: any, sessionID: string): Promise> { - try { - const result = await client.session.todo({ path: { id: sessionID } }); - return (result.data ?? 
[]).filter((todo: any) => todo.status !== "completed"); - } catch { - return []; - } -} -``` - -- [ ] **Step 2: Run typecheck** - -Run: `npm run typecheck` - -Expected: PASS. - ---- - -### Task 8: Implement plugin orchestration - -**Files:** -- Create: `src/plugin.ts` -- Modify: `index.ts` - -- [ ] **Step 1: Replace `index.ts` entrypoint** - -```ts -export { default } from "./src/plugin"; -``` - -- [ ] **Step 2: Implement hooks in `src/plugin.ts`** - -Create a plugin that: - -- caches frozen workspace memory per `sessionID` -- processes explicit memory from the latest user text once per message id -- injects frozen workspace memory and dynamic hot session state -- updates session state after tools -- augments compaction context with memory, hot state, todos, and the memory candidate instruction -- parses compaction summaries from the `session.compacted` event and merges candidates - -The compaction instruction must be: - -```ts -function memoryCandidateInstruction(): string { - return ` -At the end of the compaction summary, include: - -<workspace_memory_candidates> -- [feedback] ... -- [project] ... -- [decision] ... -- [reference] ... -</workspace_memory_candidates> - -Only include durable information useful across future sessions in this exact workspace. -Do NOT include active file lists, raw errors, temporary progress, stack traces, code signatures, API docs, git history, or facts easily rediscovered from the repository. -For decisions, include rationale in one sentence. -If nothing qualifies, output an empty block. -`.trim(); -} -``` - -- [ ] **Step 3: Run typecheck** - -Run: `npm run typecheck` - -Expected: PASS. - ---- - -### Wave 3 verification checkpoint - -- [ ] **Step 1: Run all checks** - -Run: `npm test && npm run typecheck` - -Expected: PASS. 
- -- [ ] **Step 2: Manual plugin smoke test** - -Run OpenCode with local plugin and verify: - -- user message `请记住:这个 workspace 的 memory 功能要默认无感` creates a long-term entry -- reading/editing files updates hot session state -- failed typecheck creates an open error -- successful typecheck clears typecheck errors - -- [ ] **Step 3: Commit wave** - -```bash -git add index.ts src tests -git commit -m "feat: wire memory v2 plugin hooks" -``` - ---- - -## Wave 4 — Documentation and Migration - -### Task 9: Update documentation - -**Files:** -- Modify: `README.md` -- Modify: `docs/architecture.md` -- Modify: `docs/configuration.md` -- Modify: `AGENTS.md` - -- [ ] **Step 1: Update README feature summary** - -Describe Memory V2 as: - -- workspace-scoped long-term memory -- hot session state -- no default agent-visible memory tools -- no raw tool-output cache -- compaction boundary extraction with no extra LLM call - -- [ ] **Step 2: Update architecture doc** - -Replace four-tier architecture with: - -```text -Layer 1: Stable Workspace Memory -Layer 2: Hot Session State -Layer 3: Native OpenCode State -``` - -- [ ] **Step 3: Update configuration doc** - -Document: - -- `LONG_TERM_LIMITS` -- `HOT_STATE_LIMITS` -- storage root under `XDG_DATA_HOME` or `~/.local/share` -- optional future `/memory import` - -- [ ] **Step 4: Update AGENTS.md** - -Update commands: - -```bash -npm test -npm run typecheck -``` - -Update storage and testing guidance to match Memory V2. 
- ---- - -### Task 10: Remove obsolete implementation paths - -**Files:** -- Modify: `index.ts` if old code remains -- Modify: docs references if any still mention old APIs - -- [ ] **Step 1: Remove obsolete references** - -Ensure repo no longer advertises default tools: - -- `core_memory_update` -- `core_memory_read` -- `working_memory_add` -- `working_memory_clear` -- `working_memory_clear_slot` -- `working_memory_remove` - -Unless a debug-only compatibility layer is explicitly retained, these names must not appear in README or architecture docs. - -- [ ] **Step 2: Remove obsolete concepts from docs** - -Remove or mark deprecated: - -- slots/pool/decay -- pressure monitor as core feature -- raw tool-output cache -- smart pruning replacing old tool outputs - -- [ ] **Step 3: Run docs grep** - -Run: `grep -R "core_memory_update\|working_memory_add\|pressure monitor\|tool-output cache" README.md docs AGENTS.md` - -Expected: no matches, or matches only under a clearly marked migration note. - ---- - -### Wave 4 verification checkpoint - -- [ ] **Step 1: Run all checks** - -Run: `npm test && npm run typecheck` - -Expected: PASS. - -- [ ] **Step 2: Verify docs match code** - -Confirm: README, architecture, configuration, and AGENTS describe Memory V2 and do not promise old tools or old four-tier behavior. - -- [ ] **Step 3: Commit wave** - -```bash -git add README.md docs AGENTS.md index.ts src tests package.json -git commit -m "docs: document memory v2 design" -``` - ---- - -## Verification Strategy - -### Automated - -- `npm test` validates extractors, long-term merge/render, and hot session lifecycle. -- `npm run typecheck` validates TypeScript imports and plugin entrypoint. - -### Manual OpenCode smoke tests - -1. Start a session with the plugin enabled. -2. Send: `请记住:这个 workspace 的 memory 功能要默认无感`. -3. Confirm `workspace-memory.json` is written under `~/.local/share/opencode-working-memory/workspaces//`. -4. Read and edit a file. -5. 
Confirm session state active files update. -6. Run a failing typecheck command. -7. Confirm open error appears in hot state. -8. Run a passing typecheck command. -9. Confirm typecheck error clears. -10. Trigger or simulate compaction. -11. Confirm compaction context includes memory candidate instruction and parsed candidates merge after compaction. - ---- - -## Risk Controls - -- **False memory extraction:** explicit regex only matches strong remember/from-now-on phrasing; compaction extraction uses explicit “what not to save” boundaries. -- **Token overhead:** no background LLM agent; compaction extraction piggybacks existing compaction call; hot state capped at 1200 chars. -- **Stale memory:** decision/project/reference entries have stale markers during render. -- **Privacy:** storage lives in user data directory, not repo, and writes with `0600` mode. -- **Duplicate todo state:** todos are not stored by the plugin; OpenCode remains source of truth. -- **Error staleness:** errors clear only after successful validation commands and become `maybe_fixed` after related edits. - ---- - -## Self-Review - -- Spec coverage: plan implements workspace-scoped cross-session memory, bounded long-term memory, compaction-boundary update, fully automatic hot session memory, and no extra LLM calls. -- Placeholder scan: plan contains no TBD/TODO placeholders; Tasks 8-10 reference exact expected behavior and code boundaries. -- Type consistency: `LongTermMemoryEntry`, `WorkspaceMemoryStore`, `SessionState`, `ActiveFile`, `OpenError`, and `SessionDecision` are defined once in Task 1 and reused consistently. -- Wave coherence: each wave ends with tests/typecheck and a committable checkpoint. 
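The error-staleness control listed under Risk Controls can be read as a small state transition, sketched below. `markMaybeFixedAfterEdit` and `reopenIfSeenAgain` are hypothetical helper names, and matching an edit to an error by path suffix is an assumption. The plan itself only states that errors become `maybe_fixed` after related edits and clear after a successful validation command.

```ts
// Sketch only: a trimmed OpenError with just the fields this transition needs.
type ErrorStatus = "open" | "maybe_fixed";
type TrackedError = { fingerprint: string; file?: string; status: ErrorStatus; seenCount: number };

// Hypothetical helper: an edit to the file an error points at downgrades it to
// "maybe_fixed". The error is not removed here; only a passing validation
// command for its category is meant to clear it.
function markMaybeFixedAfterEdit(errors: TrackedError[], editedPath: string): TrackedError[] {
  return errors.map((e): TrackedError => {
    if (e.status !== "open" || e.file === undefined) return e;
    // Assumed matching rule: exact path, or the edited path ends with "/<file>".
    const related = editedPath === e.file || editedPath.endsWith("/" + e.file);
    return related ? { ...e, status: "maybe_fixed" } : e;
  });
}

// Hypothetical helper: seeing the same fingerprint again reopens the error.
function reopenIfSeenAgain(errors: TrackedError[], fingerprint: string): TrackedError[] {
  return errors.map((e): TrackedError =>
    e.fingerprint === fingerprint ? { ...e, status: "open", seenCount: e.seenCount + 1 } : e);
}
```

Keeping `maybe_fixed` entries in the store, instead of deleting them on edit, means a failed fix is reported as the same error rather than as a new one.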
diff --git a/docs/superpowers/plans/2026-04-26-memory-dedup-staleness-analysis.md b/docs/superpowers/plans/2026-04-26-memory-dedup-staleness-analysis.md deleted file mode 100644 index 76b6162..0000000 --- a/docs/superpowers/plans/2026-04-26-memory-dedup-staleness-analysis.md +++ /dev/null @@ -1,815 +0,0 @@ -# Memory Deduplication and Staleness Analysis - -Date: 2026-04-26 - -## Executive recommendation - -Fix this at storage time first, then tighten ingestion prompts. - -Storage is the safety net. Every memory entry, whether from compaction, explicit user instruction, or future manual editing, already flows through `normalizeWorkspaceMemory()` in `src/workspace-memory.ts`. That is the right architectural choke point for deduplication, supersession, and lifecycle pruning. - -Prompt changes are still useful, but only as a quality reducer. They cannot be the source of truth because model output will drift, multilingual phrasing will vary, and old stores already contain bad entries. - -Do not add embeddings yet. This repo has 22 entries, a limit of 28, and all current failures are simple lexical/category problems. Embeddings would add latency, dependencies, nondeterminism, and storage shape questions for a problem that can be solved with boring code. - -## Current data flow - -```text -OpenCode session.compacted event - │ - ▼ -latestCompactionSummary(client, sessionID) - │ - ▼ -parseWorkspaceMemoryCandidates(summary) - │ src/extractors.ts - │ - validates shape and basic quality - │ - assigns type/source/confidence/staleAfterDays - ▼ -updateWorkspaceMemory(directory, store => { - store.entries.push(...candidates) -}) - │ - ▼ -normalizeWorkspaceMemory(root, store) - │ src/workspace-memory.ts - │ - exact canonical dedupe only - │ - maxEntries trim - ▼ -workspace-memory.json -``` - -The broken boundary is clear: ingestion appends all candidates, and normalization only dedupes exact normalized text per type. 
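As a rough illustration of that boundary, the sketch below shows how exact canonical dedupe behaves. It assumes canonicalization means NFKC normalization, lowercasing, and collapsing ASCII punctuation and whitespace. `canonicalText` and `exactDedupe` are hypothetical stand-ins, not the actual `canonicalMemoryText()` or `normalizeWorkspaceMemory()` in `src/workspace-memory.ts`.

```typescript
// Hypothetical sketch: names and details are assumptions, not the repo's code.
type MemoryType = "feedback" | "project" | "decision" | "reference";

interface Entry {
  type: MemoryType;
  text: string;
}

// NFKC + lowercase, then collapse runs of ASCII punctuation/whitespace to one space.
function canonicalText(text: string): string {
  return text
    .normalize("NFKC")
    .toLowerCase()
    .replace(/[\s!-\/:-@\[-`{-~]+/g, " ")
    .trim();
}

// Exact dedupe keyed by type plus canonical text; first occurrence wins in this sketch.
function exactDedupe(entries: Entry[]): Entry[] {
  const seen = new Map<string, Entry>();
  for (const entry of entries) {
    const key = `${entry.type}:${canonicalText(entry.text)}`;
    if (!seen.has(key)) seen.set(key, entry);
  }
  return Array.from(seen.values());
}

const deduped = exactDedupe([
  { type: "reference", text: "OpenCode plugin config: opencode.json" },
  { type: "reference", text: "opencode plugin config -- OPENCODE.JSON!" },
  { type: "reference", text: "OpenCode plugin config location: opencode.json" },
]);
```

Here the first two entries collapse into one, while the third survives because the extra word `location` changes the canonical text. That is the kind of near-duplicate exact matching cannot catch.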
- -## Problem 1: near-duplicate accumulation - -### Diagnosis - -`canonicalMemoryText()` catches only exact matches after NFKC, lowercase, and punctuation/whitespace collapse. It does not catch: - -- same fact with extra location detail -- same path with slightly different label text -- same decision revised from version 3 to version 4 -- bilingual restatements of the same project fact -- new fix superseding an older fix for the same issue - -This is not one dedupe problem. It is three different classes wearing the same hat. - -```text -Near duplicate classes -──────────────────────────────────────────── -project/reference → entity identity problem -feedback → topic preference/result problem -decision → supersession/history problem -``` - -Treating all of these with one fuzzy text threshold will either miss real duplicates or delete useful distinct decisions. - -### Ingestion time vs storage time - -Use both, with different jobs. - -#### Storage time, required - -Add deterministic memory normalization in `src/workspace-memory.ts`: - -1. exact canonical dedupe, keep existing behavior -2. type-specific identity keys for obvious entities -3. simple lexical similarity for same-type candidates -4. explicit supersession rules for versioned/solution-style decisions -5. lifecycle pruning before `maxEntries` trim - -Why storage first: - -- one code path for compaction, explicit, manual, and tests -- fixes existing stores on next load/save -- deterministic and unit-testable -- does not depend on model behavior - -#### Ingestion time, useful but secondary - -Improve `buildCompactionPrompt()` in `src/plugin.ts` so compaction receives existing memory and is told to emit only new or replacing facts. - -The current prompt already passes rendered workspace memory as background context and says "Do not output this context verbatim." That is not strong enough. 
Add a small rule near `Memory candidates:`: - -```text -Before emitting a memory candidate, compare it to Background context. -Do not emit a candidate that repeats an existing memory. -If a new candidate replaces an older one, write only the newer statement. -Prefer one canonical statement per project fact, reference path, user feedback topic, or implementation decision. -``` - -This will reduce noise. It will not eliminate it. Models repeat themselves. Software should expect this. - -### Recommended deduplication strategy - -Use deterministic, type-aware dedupe. Avoid embeddings. Avoid global fuzzy dedupe as the main rule. - -#### 1. Keep exact canonical dedupe - -Current logic is good as the first pass. - -```ts -dedup key = `${entry.type}:${canonicalMemoryText(text)}` -``` - -Keep source/confidence tie-breaking. - -#### 2. Add type-specific identity extraction - -For `project` and `reference`, dedupe by identifiable anchors, not prose. - -Examples: - -- repo/plugin system facts: normalized phrase key like `opencode-agenthub plugin system` -- file paths: normalized path key, with backticks stripped -- URLs/domains if they appear later - -For the current data: - -```text -reference:path:.opencode-agenthub/current/xdg/opencode/opencode.json -project:phrase:opencode-agenthub plugin system -``` - -When two entries share the same identity key, merge them by keeping the more useful text: - -1. explicit source beats manual beats compaction -2. higher confidence beats lower confidence -3. more specific text beats vague text, usually longer but cap this to avoid keeping rambles -4. newer beats older if specificity/source/confidence tie - -This directly fixes: - -- `OpenCode plugin config location: ...` vs `OpenCode plugin config: ...` -- Chinese and English variants that both mention `opencode-agenthub plugin system` - -#### 3. Add conservative lexical similarity only inside same type - -Use token Jaccard or Dice similarity over normalized tokens after stopword removal. 
No new dependencies. - -Suggested thresholds: - -```text -project/reference: >= 0.72 duplicate -feedback: >= 0.70 possible duplicate if same topic anchor exists -decision: do not use fuzzy deletion by default -``` - -This should be a fallback after identity keys, not the primary system. - -Risk: fuzzy matching can delete nearby but distinct decisions. Example: "Markdown headers cause purple text" and "Plain text labels avoid special markup" are related but both useful in the history of the bug. - -Keep fuzzy matching conservative and type-scoped. - -#### 4. Use explicit supersession for decisions - -Decision duplication is fundamentally different. Decisions often form a timeline. Some are still valuable context, some are obsolete. - -The pair below is supersession, not duplication: - -```text -Parser supports 3 formats: HTML comment, Markdown section, legacy XML -Parser supports 4 formats: plain text label, Markdown section, legacy section name, legacy XML -``` - -The right model is: newer active decision supersedes older active decision on the same topic. - -Keep this simple. Do not build a knowledge graph. - -Add a small `decisionTopicKey(text)` heuristic: - -```text -parser supports formats → decision:parser-supported-formats -solution: use ... → decision:purple-italic-output-format, if text contains purple/italic/markup/markdown/xml/html/comment/label -use output.prompt ... template → decision:compaction-template-replacement -opencode plugin load/config facts → decision:plugin-loading-config -``` - -That sounds bespoke, but that is acceptable here. The repo is small, the memory types are product-specific, and the current bad entries are product-specific. Boring beats clever. 
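A minimal sketch of what such a `decisionTopicKey(text)` heuristic might look like, assuming a handful of hand-written patterns. The regexes are illustrative guesses, not code from the repo; only the topic key strings come from the mapping above.

```typescript
// Hypothetical sketch: the patterns are assumptions chosen to match the listed examples.
function decisionTopicKey(text: string): string | undefined {
  const t = text.toLowerCase();

  if (/\bparser\s+supports\b.*\bformats?\b/.test(t)) {
    return "decision:parser-supported-formats";
  }
  if (
    /^solution:\s*use\b/.test(t) &&
    /purple|italic|markup|markdown|xml|html|comment|label/.test(t)
  ) {
    return "decision:purple-italic-output-format";
  }
  if (/\buse\s+output\.prompt\b.*\btemplate\b/.test(t)) {
    return "decision:compaction-template-replacement";
  }
  if (/\bopencode\b.*\bplugin\b.*\b(?:load|loading|config)\b/.test(t)) {
    return "decision:plugin-loading-config";
  }
  return undefined; // No known topic: leave the entry to the other dedupe passes.
}

const olderKey = decisionTopicKey(
  "Parser supports 3 formats: HTML comment, Markdown section, legacy XML",
);
const newerKey = decisionTopicKey(
  "Parser supports 4 formats: plain text label, Markdown section, legacy section name, legacy XML",
);
const templateKey = decisionTopicKey(
  "Use output.prompt to replace the default compaction template",
);
const unrelatedKey = decisionTopicKey("Use npm cache for plugin loading");
```

Both parser-format decisions map to the same key, so they can be compared as one topic. Text that matches no rule gets no key and is never touched by supersession.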
- -When same decision topic appears: - -- keep the newest active entry as active -- optionally mark the older entry `status: "superseded"` if the type supports it, or drop it during normalization if old status values are not preserved -- do not render superseded entries - -If preserving history matters later, add `supersededBy?: string` and `supersededAt?: string` to the type. Not needed for the first fix. - -### Type-specific policy - -| Type | Nature | Recommended dedupe | Keep history? | -|---|---|---|---| -| `project` | stable facts about repo/system | identity key + conservative similarity | no, keep one canonical fact | -| `reference` | pointer to path/URL/config | path/URL/entity key | no, keep one canonical pointer | -| `feedback` | user preference or resolved issue | topic key + newer wins for same issue | usually no | -| `decision` | implementation choice over time | topic supersession, not fuzzy duplicate deletion | sometimes, but render only active latest | - -## Problem 2: stale entries never cleaned - -### Diagnosis - -`staleAfterDays` exists, but only `renderEntry()` uses it to append `[Xd old, verify]`. Nothing removes or demotes stale entries. As a result, the store is monotonic until `maxEntries` forces a priority trim. - -That trim is the wrong cleanup mechanism. It sorts by type/source/confidence, not usefulness. A stale high-priority decision can beat a fresh low-priority reference. - -### When to prune - -Prune during storage normalization, not render. - -`normalizeWorkspaceMemory()` is already called by `load/save/updateWorkspaceMemory()`. That gives one central place to enforce lifecycle rules. - -```text -load/update/save - │ - ▼ -normalizeWorkspaceMemory() - │ - ├─ drop inactive/superseded from active set - ├─ exact dedupe - ├─ identity dedupe - ├─ supersession - ├─ stale lifecycle pruning - └─ maxEntries trim -``` - -Do not prune only on render. Render is presentation. 
If render hides or labels stale entries while the JSON keeps growing, the system still rots. - -Do not require explicit cleanup as the only path. It will not run often enough. An explicit cleanup command can be added later for manual inspection, but automatic normalization should handle the common case. - -### Should `staleAfterDays` be enforced? - -Yes, but not uniformly as immediate deletion for every type. - -`staleAfterDays` means "this should be revalidated after this age." It does not always mean "delete at this age." - -Use a two-tier lifecycle: - -```text -fresh age <= staleAfterDays -stale staleAfterDays < age <= staleAfterDays + grace -prunable age > staleAfterDays + grace -``` - -Suggested grace periods: - -| Type | Current staleAfterDays | Grace | Auto-prune? | Rationale | -|---|---:|---:|---|---| -| `feedback` | none | none | no age-based prune | User preference can remain valid indefinitely. Prune only by supersession/topic replacement. | -| `decision` | 45 | 15 | yes if compaction/manual and not explicit | Implementation decisions age fast. Supersession should remove most earlier. | -| `project` | 60 | 30 | yes if compaction/manual and no strong identity/path | Project facts change slower. Keep explicit project facts unless replaced. | -| `reference` | 90 | 30 | yes if path no longer exists or prunable age exceeded | References are rediscoverable and can become stale. | - -For the first implementation, a simpler rule is enough: - -```text -Never age-prune feedback. -Never age-prune explicit entries automatically. -Drop compaction/manual entries when age > staleAfterDays + 30 days. -Drop superseded entries immediately from the active set. -``` - -This keeps user-owned memory safe while preventing compaction sludge. - -### Explicit vs implicit contradiction detection - -Use explicit supersession for known memory shapes. Do not try general contradiction detection. - -General contradiction detection without LLM or embeddings is brittle. 
With an LLM it is nondeterministic and adds another model-quality surface. The current problem does not need that. - -Recommended model: - -- explicit supersession for same decision topic, same reference path, same project entity, same feedback topic -- newer entry wins inside the same topic unless older has higher source priority -- if `source === "explicit"`, require a newer explicit entry to replace it, or keep both - -This gives predictable behavior and avoids deleting user instructions because a compaction guessed a replacement. - -## Concrete implementation plan - -### P0: centralize deterministic cleanup in `src/workspace-memory.ts` - -Add helpers near `canonicalMemoryText()`: - -```text -normalizedTokens(text) -extractPathKeys(text) -memoryIdentityKeys(entry) -decisionTopicKey(text) -feedbackTopicKey(text) -isPrunableByAge(entry, now) -chooseBetterMemory(existing, candidate) -``` - -Then change `enforceLongTermLimits(entries)` to run in phases: - -```text -1. keep active entries only -2. truncate text -3. drop entries prunable by age, except feedback and explicit -4. exact canonical dedupe -5. identity-key dedupe for project/reference/feedback -6. decision-topic supersession -7. sort by priority with freshness as a tie-breaker -8. slice to maxEntries -``` - -Add freshness to `priority()` or to the final sort tie-breaker. Do not let 90-day-old compaction entries beat fresh entries just because type weight is higher. - -Minimal version: - -```text -priority desc, source priority desc, freshness desc, updatedAt desc -``` - -### P1: improve compaction prompt - -Update `buildCompactionPrompt()` with dedupe instructions before the `Memory candidates:` examples. - -Keep this short. Long prompts invite drift. - -### P1: add tests before changing behavior - -Use `tests/workspace-memory.test.ts` for normalization behavior. 
- -Required regression tests: - -```text -CODE PATH COVERAGE -================== -[+] enforceLongTermLimits(entries) - ├── [GAP] exact canonical duplicate still dedupes - ├── [GAP] project opencode-agenthub bilingual/long-short variants collapse to one - ├── [GAP] reference same config path variants collapse to one - ├── [GAP] decision parser 4 formats supersedes parser 3 formats - ├── [GAP] feedback purple/italic newer fix supersedes older fix - ├── [GAP] stale compaction decision older than staleAfterDays + grace is pruned - ├── [GAP] stale explicit decision is retained - └── [GAP] maxEntries trim runs after dedupe/prune - -[+] renderWorkspaceMemory(store) - └── [GAP] does not render superseded/pruned entries -``` - -No E2E needed. These are pure functions and deterministic store normalization paths. - -### P2: optional explicit cleanup command - -Later, add a manual cleanup/report command that prints: - -- duplicates removed -- superseded decisions -- stale entries pruned -- entries retained because explicit - -Not needed for the first fix. Useful for trust once memory stores grow. - -## Why not embeddings - -Embeddings are the wrong tool at this scale. - -Costs: - -- new dependency/API or local model decision -- cache/versioning problem for embedding vectors -- nondeterministic thresholds -- hard-to-debug deletions -- privacy and offline behavior questions - -The current store has 22 entries. The failures are obvious strings, paths, topics, and versioned decisions. Use deterministic rules now. Reconsider embeddings only if stores grow into hundreds of entries and lexical/topic rules fail in real usage. - -## Risks and tradeoffs - -### Risk: deleting useful historical decisions - -Mitigation: do not apply broad fuzzy dedupe to `decision`. Use topic-specific supersession only for known patterns. Keep explicit entries unless explicitly replaced. - -### Risk: bespoke topic keys become a pile of regexes - -Mitigation: keep the first version tiny and test-driven. 
Add keys only for observed failures. If this grows past roughly 10 topic rules, revisit the model. - -### Risk: prompt-only fix gives false confidence - -Mitigation: prompt change is P1, storage normalization is P0. The store must protect itself. - -### Risk: stale pruning removes something still useful - -Mitigation: no age pruning for feedback, no automatic age pruning for explicit entries, and grace periods for compaction/manual entries. - -### Risk: normalization mutates existing stores unexpectedly - -Mitigation: add tests with fixtures from the current store. Consider logging cleanup counts in development if a logging channel exists. The output should be deterministic. - -## NOT in scope - -- Embedding similarity, too much machinery for 22 entries. -- LLM-based contradiction detection, nondeterministic and hard to test. -- Full memory history graph with `supersededBy`, useful later but not required for current rendering quality. -- New cleanup UI or CLI, optional P2 after deterministic normalization lands. -- Changing `LongTermMemoryEntry` schema, avoid migration unless history preservation becomes required. - -## Prioritized steps - -1. **P0: Add tests in `tests/workspace-memory.test.ts` using the concrete duplicate examples from the current store.** This locks the desired behavior before touching cleanup logic. -2. **P0: Implement storage-time cleanup in `enforceLongTermLimits()`.** Exact dedupe, identity-key dedupe, decision supersession, stale pruning, then max-entry trim. -3. **P0: Make stale lifecycle enforceable but conservative.** No age pruning for feedback or explicit entries. Prune compaction/manual entries after `staleAfterDays + 30`. -4. **P1: Tighten `buildCompactionPrompt()` to avoid re-emitting existing memories and emit only replacing facts.** This reduces future noise but is not trusted as the only defense. -5. 
**P1: Add regression fixtures matching the real `workspace-memory.json` problem set.** Assert resulting entries are below the current 22 and contain the newer/canonical facts. -6. **P2: Add a cleanup report command only if users need visibility.** Defer until after the automatic path proves itself. - -## Final architecture decision - -The memory store should be self-cleaning at its storage boundary. - -Use prompt engineering to reduce bad candidates, but make `src/workspace-memory.ts` the authority for what persists. Use deterministic, type-aware dedupe instead of embeddings. Treat `project` and `reference` as entity identity problems, `feedback` as topic replacement, and `decision` as explicit supersession. - -That is the smallest design that solves the real failures without turning a 28-entry JSON file into a search platform. - -## Addendum: bracketless memory candidate format from real compaction - -Date: 2026-04-26 - -### Summary table - -| Issue | Severity | Fix | Priority | -|-------|----------|-----|----------| -| Parser silently drops `- project text` bracketless candidates | High | Accept both `- [type] text` and `- type text` | P0 | -| Prompt examples imply brackets but do not explicitly require exact syntax | Medium | Add "Use exactly this format, including square brackets" plus a negative example | P0, same small patch | -| No regression test for bracketless candidate lines | High | Add parser test covering all four types in bracketless form | P0 | -| Future compactions may re-extract useful facts with changed counts or wording | Medium | Keep storage-time type-aware dedupe/staleness plan | P0, unchanged | - -### 1. Parser fix - -Accept `- type text` with no brackets. - -Also strengthen the prompt. Do both. - -The parser is the product boundary. Model output is not a contract, it is an input from an unreliable narrator with excellent vibes. If the model emits a plainly parseable, semantically valid candidate, dropping it silently is a data loss bug. 
- -The prompt should still ask for the preferred bracketed format because bracketed type markers are less ambiguous. But prompt enforcement alone is not enough. The new evidence proves the model sometimes drops brackets even when examples include them. - -Recommended parser behavior: - -- preferred: `- [project] pathology-playground 後端健康改進計劃已完成 Phase 1-4` -- accepted fallback: `- project pathology-playground 後端健康改進計劃已完成 Phase 1-4` -- still reject unknown types -- still run `shouldAcceptWorkspaceMemoryCandidate()` -- still require body length and existing quality gates - -### 2. Prompt format enforcement - -Yes, add explicit syntax instructions. - -Current prompt shows examples, but examples are not a hard enough constraint. Add one sentence before the examples: - -```text -Use exactly this candidate format, including square brackets around the type: -``` - -Then keep the examples: - -```text -Memory candidates: -- [feedback] content -- [project] content -- [decision] content -- [reference] content -``` - -Optionally add one short warning: - -```text -Do not write `- project content`; write `- [project] content`. -``` - -Keep this short. Long formatting lectures increase prompt surface area and make the summary worse. One positive instruction plus one negative example is enough. - -### 3. Impact on dedup plan - -Parser robustness moves to P0, before storage dedup/staleness cleanup. - -This changes sequencing, not the architecture. - -Updated P0 order: - -1. **P0a: Fix parser format tolerance and add regression tests.** Lost memory is worse than duplicate memory. A deduper cannot dedupe entries that never made it into the store. -2. **P0b: Implement storage-time dedupe and stale pruning.** Still the main long-term quality fix. -3. **P0c: Tighten prompt format instruction in the same small patch as parser tolerance.** Cheap and reduces fallback-parser usage. 
-
-The earlier recommendation still stands: storage normalization remains the authority for duplicates and staleness. This new evidence adds a more basic ingestion reliability bug in front of it.
-
-### 4. Concrete implementation recommendation
-
-#### Regex change
-
-Replace the current parser line in `src/extractors.ts:parseWorkspaceMemoryCandidates()`:
-
-```ts
-const item = line.trim().match(/^-\s*\[(feedback|project|decision|reference)\]\s*(.+)$/i);
-```
-
-with a single regex that accepts bracketed and bracketless forms:
-
-```ts
-const item = line.trim().match(
-  /^-\s*(?:\[(feedback|project|decision|reference)\]|(feedback|project|decision|reference)\b)\s+(.+)$/i,
-);
-if (!item) continue;
-
-const type = (item[1] ?? item[2]).toLowerCase() as LongTermType;
-const body = item[3].trim();
-```
-
-Why this shape:
-
-- `(?:\[type\]|type\b)` accepts both formats
-- `\b` prevents `projectile` from being parsed as `project`
-- `\s+(.+)` requires real content after the type
-- unknown types still fail
-
-For readability, the positional capture indexes can be replaced with named groups if the runtime target supports them cleanly:
-
-```ts
-const item = line.trim().match(
-  /^-\s*(?:\[(?<bracketed>feedback|project|decision|reference)\]|(?<plain>feedback|project|decision|reference)\b)\s+(?<body>.+)$/i,
-);
-if (!item?.groups) continue;
-
-const type = (item.groups.bracketed ?? item.groups.plain).toLowerCase() as LongTermType;
-const body = item.groups.body.trim();
-```
-
-Recommendation: use the non-named-group version. It is uglier, but it is maximally boring and consistent with the existing code style.
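To see the non-named-group regex in action outside the plugin, here is a small self-contained sketch. The regex is the one recommended above. The `Candidate` shape and the `parseCandidateLine` helper are hypothetical, and the real `parseWorkspaceMemoryCandidates()` also applies quality gates that are omitted here.

```typescript
// Hypothetical harness around the recommended regex; not the repo's parser.
type LongTermType = "feedback" | "project" | "decision" | "reference";

interface Candidate {
  type: LongTermType;
  body: string;
}

const CANDIDATE_LINE =
  /^-\s*(?:\[(feedback|project|decision|reference)\]|(feedback|project|decision|reference)\b)\s+(.+)$/i;

function parseCandidateLine(line: string): Candidate | undefined {
  const item = line.trim().match(CANDIDATE_LINE);
  if (!item) return undefined;

  // Group 1 is the bracketed type, group 2 the bracketless type.
  const rawType = item[1] ?? item[2];
  const rawBody = item[3];
  if (!rawType || !rawBody) return undefined;

  return { type: rawType.toLowerCase() as LongTermType, body: rawBody.trim() };
}

const bracketed = parseCandidateLine("- [project] This repo uses TypeScript with strict mode");
const bracketless = parseCandidateLine("- decision Use output.prompt to replace the default compaction template");
const lookalike = parseCandidateLine("- projectile motion notes");
const unknownType = parseCandidateLine("- note this should not be parsed as memory");
const noSpace = parseCandidateLine("- [project]no space after the bracket");
```

One side effect worth knowing: because the new pattern requires `\s+` after the type, a line such as `- [project]text` with no space after the closing bracket no longer parses, whereas the original `\s*` pattern accepted it.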
- -Add tests in `tests/extractors.test.ts`: - -```ts -test("parseWorkspaceMemoryCandidates accepts bracketless candidate format", () => { - const summary = ` -Memory candidates: -- project pathology-playground 後端健康改進計劃已完成 Phase 1-4 -- reference Scrypt 參數必須是 N=16384, r=8, p=1 -- feedback 端口 9473 可能被舊進程佔用,需殺掉後重啟 -- decision Use output.prompt to replace the default compaction template -`; - - const items = parseWorkspaceMemoryCandidates(summary); - - assert.equal(items.length, 4); - assert.deepEqual(items.map(item => item.type), [ - "project", - "reference", - "feedback", - "decision", - ]); -}); -``` - -Also add a guard test: - -```ts -test("parseWorkspaceMemoryCandidates rejects unknown bracketless candidate type", () => { - const summary = ` -Memory candidates: -- note this should not be parsed as memory -`; - - const items = parseWorkspaceMemoryCandidates(summary); - - assert.equal(items.length, 0); -}); -``` - -#### Prompt change - -In `src/plugin.ts:buildCompactionPrompt()`, change this block: - -```ts -"At the end of the summary, extract durable memory entries for future", -"sessions using these labels:", -"", -"Memory candidates:", -"- [feedback] content", -"- [project] content", -"- [decision] content", -"- [reference] content", -``` - -to: - -```ts -"At the end of the summary, extract durable memory entries for future", -"sessions using exactly this candidate format, including square brackets around the type:", -"", -"Memory candidates:", -"- [feedback] content", -"- [project] content", -"- [decision] content", -"- [reference] content", -"", -"Do not write '- project content'; write '- [project] content'.", -``` - -This gives the model a crisp positive format and a concrete anti-pattern. The parser still accepts the anti-pattern because users need data capture more than format purity. - -### Final addendum decision - -Parser tolerance is now P0. - -The architecture stays the same: make the storage layer self-cleaning, and make ingestion defensive. 
But the implementation sequence changes because silent data loss beats duplicate accumulation in severity. First capture valid candidates reliably. Then dedupe and prune them. - -## Addendum 2: content quality guidance - -Date: 2026-04-26 - -### Summary table - -| Issue | Severity | Fix | Priority | -|-------|----------|-----|----------| -| Model extracts low-durability progress snapshots as `project` memory | High | Add durable-content guidance to compaction prompt | P0 | -| Exact counts like `1237 tests pass` and `37 files` churn across sessions | High | Add parser quality filter for obvious snapshot patterns | P0 | -| Stable config values are useful and should still pass | Medium | Keep `reference` guidance permissive for config/crypto/PIN values | P0 | -| Environment issues like occupied ports may be useful briefly but not long-term | Medium | Prompt says unresolved issues only; storage staleness handles aging | P1 with staleness work | - -### 1. Architecture fit - -This belongs in both the prompt and the parser, with different responsibilities. - -The prompt should teach the model what "durable" means. The model is choosing what to extract, so it needs product semantics: - -- stable configuration values are good memory -- unresolved bugs can be useful memory -- exact test counts, file counts, and phase progress are usually bad long-term memory - -The parser should still reject obvious low-durability snapshots as a backstop. The parser already has `shouldAcceptWorkspaceMemoryCandidate()` in `src/extractors.ts`; this is exactly where simple content-quality gates belong. - -Do not put subtle semantic judgment in the parser. Do put obvious anti-patterns there. 
- -Recommended split: - -```text -Prompt - └─ positive/negative guidance for durable memory selection - -Parser quality gate - └─ deterministic rejection of obvious snapshots - - exact test counts - - exact file counts - - completed Phase N-M progress lines - - temporary port/process cleanup notes when phrased as resolved/current env state - -Storage normalization - └─ dedupe, supersession, age-based pruning -``` - -This is the same design principle as the bracketless parser addendum: ask the model nicely, then make the code defensive. - -### 2. Specificity vs risk - -The proposed guidance is specific, but not too specific. - -It names examples from the observed failure mode, but the rule underneath is general: facts should stay true across sessions. Exact counts and phase numbers are classic snapshot smell in almost every codebase. - -Potential risk: sometimes an exact count is genuinely durable. Example: "USB sync protocol expects exactly 37 manifest entries" could be a stable contract, not a snapshot. - -Mitigation: word the guidance around "session-specific progress" rather than banning all numbers. Keep config values explicitly allowed. - -Good distinction: - -```text -Bad: 1237 tests pass today -Good: Test suite is expected to pass before handoff - -Bad: USB sync currently has 37 files -Good: USB sync covers bundles, server, frontend, tests, and docs - -Bad: Phase 1-4 completed -Good: Backend health work is organized into phased improvements - -Good: Scrypt parameters are N=16384, r=8, p=1 -``` - -The first three are progress snapshots. The Scrypt value is a stable configuration contract. Numbers are not the problem. Temporary state is the problem. - -### 3. Prompt length concern - -Adding four lines is worth it. - -This prompt is already making the model do extraction. Without guidance, the model optimizes for "important-looking facts," and progress snapshots look important. That creates churn, duplicates, and stale memory. 
Four lines preventing bad memory at the source are cheap. - -If trimming is needed, trim redundant formatting language before removing quality guidance. Formatting mistakes lose entries or require parser tolerance. Content mistakes pollute the store. Both matter, but the durable-content guidance carries more product value than repeated Markdown formatting reminders. - -Recommended trim posture: - -- keep one concise formatting instruction -- keep one concise candidate syntax instruction -- add one concise durable-content block -- avoid long examples or taxonomy tables in the prompt - -The prompt should not become a memory policy document. It just needs the model to stop writing "1237 tests pass" into long-term storage. Wild that we have to say this, but we do. - -### 4. Concrete prompt recommendation - -In `src/plugin.ts:buildCompactionPrompt()`, replace the candidate instruction block with this final version: - -```ts -"At the end of the summary, extract durable memory entries for future sessions.", -"Only extract facts that are likely to stay true across sessions.", -"Do not extract session-specific progress like exact test counts, file counts, or phase numbers.", -"For progress, extract the stable goal or durable milestone, not the current number.", -"For references, extract configuration values that do not usually change between sessions.", -"For feedback, extract unresolved issues or user preferences that future sessions need to know.", -"Use exactly this candidate format, including square brackets around the type:", -"", -"Memory candidates:", -"- [feedback] content", -"- [project] content", -"- [decision] content", -"- [reference] content", -"", -"Do not write '- project content'; write '- [project] content'.", -``` - -This is slightly longer than the lead's proposal, but it avoids an overbroad ban on numbers by saying "session-specific progress." It also gives a positive replacement behavior: stable goal or durable milestone. 
-
-If a shorter version is required, use this:
-
-```ts
-"At the end of the summary, extract durable memory entries for future sessions.",
-"Only extract facts likely to stay true across sessions; skip exact test counts, file counts, phase numbers, and temporary environment state.",
-"References may include stable configuration values. Feedback should be unresolved issues or user preferences future sessions need.",
-"Use exactly this candidate format, including square brackets around the type:",
-```
-
-Recommendation: use the longer block. The extra three lines buy clarity and reduce accidental over-filtering.
-
-### Parser quality gate recommendation
-
-Add deterministic snapshot rejection to `shouldAcceptWorkspaceMemoryCandidate()`.
-
-Keep this conservative. Reject obvious snapshots, not every number.
-
-Suggested first-pass rules:
-
-```ts
-// Session-specific progress snapshots, not durable memory.
-if (entry.type === "project") {
-  if (/\b\d+\s+tests?\s+pass(?:ed)?\b/i.test(text)) return false;
-  if (/\b\d+\s+suites?\b/i.test(text)) return false;
-  if (/\b\d+\s+(?:files?\b|文件)/i.test(text)) return false; // \b only on the ASCII branch; it does not fire after 文件
-  if (/\bphase\s*\d+(?:\s*[-–]\s*\d+)?\s+(?:completed|done|finished)\b/i.test(text)) return false;
-  if (/已完成\s*Phase\s*\d+(?:\s*[-–]\s*\d+)?/i.test(text)) return false;
-}
-```
-
-Do not reject stable `reference` values containing numbers. These must pass:
-
-```text
-Admin PIN 是 456123
-Scrypt 參數必須是 N=16384, r=8, p=1
-```
-
-For `feedback`, do not broadly reject ports yet. A port issue can be useful if it explains a recurring failure. Let staleness prune it, unless the text clearly says the issue was resolved. A future parser rule can reject resolved temporary env notes, but the current evidence is not enough to safely block all port-related feedback.
-
-### 5. Integration with storage-time dedup/staleness
-
-Prompt-level guidance and staleness solve different problems.
-
-Staleness is cleanup after bad or aging facts are already stored.
Prompt guidance prevents low-value facts from entering the store in the first place. Parser filtering catches obvious misses when the prompt fails. - -Do not rely on staleness for exact counts. - -Why: - -- `maxEntries` is 28, so a few bad snapshots can evict useful facts before they age out -- exact counts will churn every compaction and create near-duplicates -- stale labels still consume render budget until pruning runs -- users see noisy memory and trust the feature less - -Storage-time dedup/staleness remains required for facts that were good when written but later become outdated. Example: a config path that moves, a decision superseded by a better decision, or an unresolved bug that later gets fixed. - -Use this mental model: - -```text -Prompt guidance → prevent bad candidates -Parser quality gate → reject obvious bad candidates -Storage dedupe → merge repeated good candidates -Storage staleness → retire once-good candidates that aged out -``` - -### Updated priority - -The new content-quality evidence adds another P0 ingestion fix. - -Updated sequence: - -1. **P0a: Parser accepts bracketless candidate format and tests it.** Prevent silent data loss. -2. **P0b: Prompt durable-content guidance.** Stop obvious snapshots at the source. -3. **P0c: Parser rejects obvious low-durability `project` snapshots.** Backstop the prompt with deterministic filters. -4. **P0d: Storage-time dedupe and staleness.** Still required for duplicate accumulation and lifecycle cleanup. - -### Final addendum 2 decision - -Add the durable-content guidance to the prompt and add conservative parser filters for obvious `project` snapshots. - -This does not replace storage-time dedupe or staleness. It reduces garbage before it reaches that layer. The store still needs to clean itself, but it should not be used as a trash compactor for facts we already know are temporary. 
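The parser-gate layer of this plan can be sketched as a small predicate. This is a minimal illustration that reuses the first-pass regex rules suggested earlier; `Candidate` and `isProgressSnapshot` are hypothetical names, and the real `shouldAcceptWorkspaceMemoryCandidate()` may be structured differently.

```ts
// Hypothetical types for illustration; the plugin's real types may differ.
type CandidateType = "feedback" | "project" | "decision" | "reference";

interface Candidate {
  type: CandidateType;
  text: string;
}

// First-pass snapshot rules, copied from the suggested rules in this plan.
const SNAPSHOT_RULES: readonly RegExp[] = [
  /\b\d+\s+tests?\s+pass(?:ed)?\b/i,
  /\b\d+\s+suites?\b/i,
  /\b\d+\s+(?:files?|文件)\b/i,
  /\bphase\s*\d+(?:\s*[-–]\s*\d+)?\s+(?:completed|done|finished)\b/i,
  /已完成\s*Phase\s*\d+(?:\s*[-–]\s*\d+)?/i,
];

// Returns true when a `project` candidate looks like a session-specific progress snapshot.
// Other types are never flagged, so stable `reference` values with numbers pass through.
function isProgressSnapshot(entry: Candidate): boolean {
  if (entry.type !== "project") return false;
  return SNAPSHOT_RULES.some(rule => rule.test(entry.text));
}
```

Restricting the check to `project` entries is what keeps numeric `reference` values safe. The file-count rule is still broad: a `project` entry such as "Upload limit is 10 files" is flagged by this sketch, which is the kind of case a limit-context exception would need to cover.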
diff --git a/docs/superpowers/plans/2026-04-26-memory-plugin-quality-fixes.md b/docs/superpowers/plans/2026-04-26-memory-plugin-quality-fixes.md deleted file mode 100644 index 0c60502..0000000 --- a/docs/superpowers/plans/2026-04-26-memory-plugin-quality-fixes.md +++ /dev/null @@ -1,1057 +0,0 @@ -# Memory Plugin Quality Fixes Plan - -## Overview - -Fix the Memory Plugin's false-positive problems and its deduplication mechanism. - -## Baseline Data - -Before making any changes, capture the current state: - -```bash -# Workspace memory entry counts -find ~/.local/share/opencode-working-memory/workspaces -name workspace-memory.json \ - -print -exec jq '.entries | length' {} \; - -# Open errors false positive -find ~/.local/share/opencode-working-memory/workspaces -path '*/sessions/*.json' \ - -print -exec jq '.openErrors' {} \; -``` - ---- - -## PR-1: P0 Bug Fix - -### Task 1: Fix Bash error false positives - -#### Goal - -Stop treating `exitCode === undefined` as a failure, and narrow error extraction to avoid misclassification. - -#### Files - -- `src/plugin.ts` - fix the exitCode check inline -- `src/extractors.ts` - narrow `extractErrorsFromBash` -- `tests/plugin.test.ts` - **new** plugin hook regression test - -#### Implementation Steps - -**1. Modify `src/plugin.ts`** - inline exitCode check - -Change: - -```ts -if (exitCode === 0 && command) { - clearErrorsForSuccessfulCommand(state, command); -} else if (exitCode !== 0) { - const errors = extractErrorsFromBash(command, outputText); - for (const error of errors) { - upsertOpenError(state, error); - } -} -``` - -to: - -```ts -if (typeof exitCode !== "number") { - // Unknown exit status: do not extract and do not clear -} else if (exitCode === 0 && command) { - clearErrorsForSuccessfulCommand(state, command); -} else if (command) { - const errors = extractErrorsFromBash(command, outputText); - for (const error of errors) { - upsertOpenError(state, error); - } -} -``` - -**Do not add `bash-policy.ts`** - inline the check directly; the logic is simple and used in only one place. - -**2. 
Modify `src/extractors.ts`** - narrow the error-line check - -Add an `isErrorLine` function: - -```ts -function isErrorLine(line: string, knownValidationCommand: boolean): boolean { - // Strong signals, captured unconditionally - if (/TS\d{4}|ERR!|Traceback \(most recent call last\):|panic:/i.test(line)) return true; - - // Error-type prefix (at the start of its own line) - if (/^\s*(Error|TypeError|ReferenceError|SyntaxError|Exception):/i.test(line)) { - return true; - } - - // Loose matching only for known validation commands - if (knownValidationCommand) { - return /\b(error|failed|failure|exception)\b/i.test(line); - } - - return false; -} -``` - -Modify `extractErrorsFromBash()`: - -```ts -export function extractErrorsFromBash(command: string, output: string): OpenError[] { - const category = classifyCommand(command); - const knownValidationCommand = category !== null; - - const lines = output - .split("\n") - .filter(line => isErrorLine(line, knownValidationCommand)) - .slice(0, 5); - - if (lines.length === 0) return []; - - const finalCategory = category ?? "runtime"; - - // ... rest of function -} -``` - -**Do not add `isInspectionCommand`** - rely on the `exitCode` guard plus the narrowed regex; add it later if needed. - -#### Tests - -Add `tests/extractors.test.ts`: - -```ts -import test from "node:test"; -import assert from "node:assert/strict"; -import { extractErrorsFromBash } from "../src/extractors.ts"; - -test("git log output mentioning errors is ignored", () => { - const errors = extractErrorsFromBash( - "cd /repo && rtk git log --oneline -5", - "4832b38 fix: silence memory load errors in working-memory" - ); - assert.equal(errors.length, 0); -}); - -test("cat session json with openErrors is ignored", () => { - const errors = extractErrorsFromBash( - "rtk cat ~/.local/share/opencode-working-memory/session.json", - '"openErrors": []' - ); - assert.equal(errors.length, 0); -}); - -test("typecheck failure is captured", () => { - const errors = extractErrorsFromBash( - "npm run typecheck", - "src/index.ts(10,3): error TS2345: bad type" - ); - assert.equal(errors.length, 1); - assert.equal(errors[0].category, 
"typecheck"); -}); - -test("runtime Error prefix is captured for failed unknown command", () => { - const errors = extractErrorsFromBash( - "node script.js", - "Error: Cannot find module './missing'" - ); - assert.equal(errors.length, 1); - assert.equal(errors[0].category, "runtime"); -}); - -test("unknown command with loose error words is ignored", () => { - const errors = extractErrorsFromBash( - "some-unknown-command", - "this output has errors in it but no clear signal" - ); - assert.equal(errors.length, 0); -}); -``` - -**新增 `tests/plugin.test.ts`** - Plugin hook regression test: - -```ts -import test from "node:test"; -import assert from "node:assert/strict"; -import { mkdtemp, rm } from "node:fs/promises"; -import { tmpdir } from "node:os"; -import { join } from "node:path"; -import { MemoryV2Plugin } from "../src/plugin.ts"; -import { loadSessionState, saveSessionState } from "../src/session-state.ts"; -import type { OpenError } from "../src/types.ts"; - -// Mock client for root session (not a sub-agent) -function mockRootClient() { - return { - session: { - get: async () => ({ data: { parentID: null } }), - messages: async () => ({ data: [] }), - }, - }; -} - -// Helper: create session state with pre-populated open error -function createSessionWithError(sessionID: string, error: OpenError) { - return { - version: 1 as const, - sessionID, - turn: 0, - updatedAt: new Date().toISOString(), - activeFiles: [], - openErrors: [error], - recentDecisions: [], - }; -} - -test("tool.execute.after: undefined exitCode does NOT create open error", async () => { - // 1. Temp directory for isolated file I/O - const tmpDir = await mkdtemp(join(tmpdir(), "memory-plugin-test-")); - - try { - // 2. Mock client — root session, no user messages - const client = mockRootClient(); - - // 3. Instantiate plugin - const plugin = await MemoryV2Plugin({ directory: tmpDir, client }); - - // 4. 
Simulate bash output with NO exitCode, but output contains TS error - // This would create an open error if exitCode was non-zero - // Using STRONG error signal to catch the bug where undefined !== 0 - await (plugin as Record)["tool.execute.after"]( - { - tool: "bash", - sessionID: "test-session-1", - args: { command: "npm run typecheck" }, - }, - { - // exitCode deliberately absent (undefined !== 0 is the bug we're testing) - output: "src/index.ts(10,3): error TS2345: Argument of type 'string' is not assignable to parameter of type 'number'", - } - ); - - // 5. Assert: session state has ZERO open errors - const state = await loadSessionState(tmpDir, "test-session-1"); - assert.equal(state.openErrors.length, 0, - "exitCode === undefined must not create open errors even with strong error signal"); - - } finally { - // Cleanup - await rm(tmpDir, { recursive: true, force: true }); - } -}); - -test("tool.execute.after: undefined exitCode does NOT clear existing open error", async () => { - // 1. Temp directory - const tmpDir = await mkdtemp(join(tmpdir(), "memory-plugin-test-")); - - try { - // 2. Pre-populate session state with a real open error - const preExistingError: OpenError = { - id: "err_critical_abc", - category: "typecheck", - summary: "TS2345: Argument of type 'string' is not assignable to parameter of type 'number'", - command: "npm run typecheck", - fingerprint: "ee7b3f9a1c2d", - status: "open", - firstSeen: Date.now() - 3600000, - lastSeen: Date.now() - 3600000, - seenCount: 3, - }; - - await saveSessionState(tmpDir, createSessionWithError("test-session-2", preExistingError)); - - // 3. Mock client - const client = mockRootClient(); - - // 4. Instantiate plugin - const plugin = await MemoryV2Plugin({ directory: tmpDir, client }); - - // 5. 
Simulate bash output with NO exitCode (inspection command) - // Using STRONG error signal (TS error) to verify undefined exitCode doesn't clear - await (plugin as Record)["tool.execute.after"]( - { - tool: "bash", - sessionID: "test-session-2", - args: { command: "rtk cat ~/.local/share/opencode-working-memory/session.json" }, - }, - { - // exitCode deliberately absent (undefined) - // Even with TS error in output, should NOT clear existing error - output: "src/other.ts(5,10): error TS2794: Expected 0 arguments, but got 1", - } - ); - - // 6. Assert: pre-existing open error is PRESERVED - const state = await loadSessionState(tmpDir, "test-session-2"); - assert.equal(state.openErrors.length, 1, - "exitCode === undefined must not clear pre-existing open errors"); - assert.equal(state.openErrors[0].fingerprint, "ee7b3f9a1c2d", - "The original open error must remain intact"); - - } finally { - // Cleanup - await rm(tmpDir, { recursive: true, force: true }); - } -}); -``` - -#### Acceptance Criteria - -- `exitCode === undefined` does not produce an open error -- Output from `git log`, `cat`, and similar commands is not misclassified because it contains the word `errors` -- A failing `npm run typecheck` still produces a typecheck error -- The plugin hook regression test passes -- `npm test && npm run typecheck` passes - ---- - -### Task 2: Fix Workspace Memory XML truncation - -#### Goal - -Remove the `.slice()` truncation in `renderWorkspaceMemory()` and render line by line against the budget instead, so the output always includes the complete `` closing tag. - -#### Files - -- `src/workspace-memory.ts` - modify the render logic -- `tests/workspace-memory.test.ts` - add truncation tests - -#### Implementation Steps - -1. Add a `wouldFit` helper: - -```ts -function wouldFit( - lines: string[], - nextLine: string, - closingLine: string, - maxChars: number -): boolean { - return [...lines, nextLine, closingLine].join("\n").length <= maxChars; -} -``` - -2. Define the minimum envelope length: - -```ts -const MIN_ENVELOPE_LENGTH = 120; // minimum length of \n...\n -``` - -3. 
Modify `renderWorkspaceMemory()` - add lines one at a time until the budget would be exceeded: - -```ts -export function renderWorkspaceMemory(store: WorkspaceMemoryStore): string { - const active = enforceLongTermLimits(store.entries); - if (active.length === 0) return ""; - - const maxChars = Math.min( - store.limits.maxRenderedChars, - LONG_TERM_LIMITS.maxRenderedChars - ); - - // If maxChars is below the minimum envelope, return an empty string - if (maxChars < MIN_ENVELOPE_LENGTH) return ""; - - const closing = ""; - const lines: string[] = [ - "", - "Persistent workspace memory. Use as background; verify stale or code-related claims.", - ]; - - for (const type of ["feedback", "project", "decision", "reference"] as const) { - const items = active.filter(entry => entry.type === type); - if (items.length === 0) continue; - - const sectionLines: string[] = [`${type}:`]; - - for (const item of items) { - const line = `- ${renderEntry(item)}`; - if (wouldFit([...lines, ...sectionLines], line, closing, maxChars)) { - sectionLines.push(line); - } - } - - if (sectionLines.length > 1 && wouldFit(lines, sectionLines[0], closing, maxChars)) { - lines.push(...sectionLines); - } - } - - lines.push(closing); - return lines.join("\n"); -} -``` - -#### Tests - -Add `tests/workspace-memory.test.ts`: - -```ts -import test from "node:test"; -import assert from "node:assert/strict"; -import type { LongTermMemoryEntry, WorkspaceMemoryStore } from "../src/types.ts"; -import { renderWorkspaceMemory } from "../src/workspace-memory.ts"; - -function entry(id: string, text: string): LongTermMemoryEntry { - const now = new Date().toISOString(); - return { - id, - type: "decision", - text, - source: "compaction", - confidence: 0.75, - status: "active", - createdAt: now, - updatedAt: now, - }; -} - -test("renderWorkspaceMemory never truncates closing XML tag", () => { - const entries = Array.from({ length: 28 }, (_, i) => - entry(`mem_${i}`, `Long durable memory entry ${i} `.repeat(20)) - ); - - const store: WorkspaceMemoryStore = { - version: 1, - workspace: { root: 
"/repo", key: "abc" }, - limits: { maxRenderedChars: 700, maxEntries: 28 }, - entries, - updatedAt: new Date().toISOString(), - }; - - const rendered = renderWorkspaceMemory(store); - - assert.ok(rendered.endsWith("")); - assert.ok(rendered.length <= 700); -}); - -test("renderWorkspaceMemory returns empty string when maxChars too small", () => { - const store: WorkspaceMemoryStore = { - version: 1, - workspace: { root: "/repo", key: "abc" }, - limits: { maxRenderedChars: 50, maxEntries: 28 }, - entries: [entry("test", "test memory")], - updatedAt: new Date().toISOString(), - }; - - const rendered = renderWorkspaceMemory(store); - assert.equal(rendered, ""); -}); -``` - -#### Acceptance Criteria - -- The output always includes the complete closing tag -- Long memories are never cut off mid-line -- Returns an empty string when `maxChars` is below the minimum envelope -- `npm test && npm run typecheck` passes - ---- - -### Task 3: Remove the bare `always` trigger - -#### Goal - -Remove the `always` trigger so that "tests always fail" is not misclassified as an explicit memory. - -#### Files - -- `src/extractors.ts` - modify `extractExplicitMemories` - -#### Implementation Steps - -Change: - -```ts -/(?:从现在开始|從現在開始|从今以后|從今以後|from now on|always)[::]?\s*(.+)$/im -``` - -to: - -```ts -/(?:从现在开始|從現在開始|从今以后|從今以後|from now on|going forward)[::,,]?\s*(.+)$/gim -``` - -**Note**: every pattern must have the `g` flag, because `matchAll()` is used later. - -#### Tests - -Add to `tests/extractors.test.ts` (the `extractExplicitMemories` import must be added): - -```ts -// Note: add this import at the top of the file -// import { extractErrorsFromBash, extractExplicitMemories } from "../src/extractors.ts"; - -test("extractExplicitMemories does not treat always as memory trigger", () => { - const items = extractExplicitMemories("tests always fail on CI when cache is stale"); - assert.equal(items.length, 0); -}); - -test("extractExplicitMemories still captures going forward", () => { - const items = extractExplicitMemories("going forward: use pnpm instead of npm"); - assert.equal(items.length, 1); - assert.match(items[0].text, /pnpm/); -}); -``` - -#### Acceptance Criteria - -- `always` is no longer a memory trigger -- `going forward` is still captured as a memory -- `npm test && npm run 
typecheck` passes - ---- - -### Task 4: End-to-end verification and cleanup - -#### Implementation Steps - -1. Run the full test suite: - -```bash -npm test -npm run typecheck -``` - -2. Manual test: - -```bash -cd /Users/sd_wo/work/opencode-working-memory -rtk git log --oneline -5 -rtk cat ~/.local/share/opencode-working-memory/workspaces/*/sessions/*.json -``` - -Confirm that no new false-positive open errors are created. - -3. Clean up existing false-positive open errors: - -Back up first, then clean: - -```bash -# Back up -cp ~/.local/share/opencode-working-memory/workspaces/*/sessions/*.json /tmp/sessions-backup/ - -# Clean up (edit by hand or with a script) -# Set openErrors to [] -``` - -#### Acceptance Criteria - -- All P0 unit tests pass -- False positives no longer appear in real sessions -- Open errors in Hot Session State contain only genuine validation/runtime errors - ---- - -## PR-2: Behavior Improvements - -### Task 5: Canonical exact dedupe (Phase 1) - -#### Goal - -Upgrade exact dedupe to canonical exact dedupe so that differences in punctuation, case, and whitespace no longer defeat it. - -**Note**: This is phase 1 and does not involve Jaccard similarity. Handling entries that are semantically similar but worded differently, such as "OpenCode uses the npm cache to load plugins" and "When loading plugins, OpenCode uses the npm cache", is phase 2 (deferred until data has been collected). - -#### Files - -- `src/workspace-memory.ts` - modify the canonical key computation -- `tests/workspace-memory.test.ts` - add dedupe tests - -#### Implementation Steps - -**1. Add a source priority function**: - -```ts -function sourcePriority(source: LongTermMemoryEntry["source"]): number { - if (source === "explicit") return 3; - if (source === "manual") return 2; - return 1; -} -``` - -**2. Add a canonical text function**: - -```ts -function canonicalMemoryText(text: string): string { - return text - .normalize("NFKC") - .toLowerCase() - .replace(/[\s\p{P}]+/gu, " ") - .trim(); -} -``` - -**3. 
Modify `enforceLongTermLimits()`**: - -```ts -export function enforceLongTermLimits(entries: LongTermMemoryEntry[]): LongTermMemoryEntry[] { - const byKey = new Map(); - - for (const entry of entries.filter(entry => entry.status === "active")) { - const text = entry.text.slice(0, LONG_TERM_LIMITS.maxEntryTextChars); - const key = `${entry.type}:${canonicalMemoryText(text)}`; - - const existing = byKey.get(key); - - // Source priority: explicit > manual > compaction - // Same source: higher confidence wins - if (!existing) { - byKey.set(key, { ...entry, text }); - } else if (sourcePriority(entry.source) > sourcePriority(existing.source)) { - byKey.set(key, { ...entry, text }); - } else if (sourcePriority(entry.source) === sourcePriority(existing.source)) { - if (entry.confidence > existing.confidence) { - byKey.set(key, { ...entry, text }); - } - } - } - - return [...byKey.values()] - .sort((a, b) => priority(b) - priority(a)) - .slice(0, LONG_TERM_LIMITS.maxEntries); -} -``` - -#### Tests - -```ts -test("enforceLongTermLimits dedupes with canonical text", () => { - const now = new Date().toISOString(); - - const a: LongTermMemoryEntry = { - id: "a", - type: "decision", - text: "OpenCode uses NPM CACHE for plugin loading", - source: "compaction", - confidence: 0.75, - status: "active", - createdAt: now, - updatedAt: now, - }; - - const b: LongTermMemoryEntry = { - id: "b", - type: "decision", - text: "opencode uses npm cache for plugin loading!!!", - source: "compaction", - confidence: 0.8, - status: "active", - createdAt: now, - updatedAt: now, - }; - - const kept = enforceLongTermLimits([a, b]); - - assert.equal(kept.length, 1); - assert.equal(kept[0].confidence, 0.8); -}); - -test("enforceLongTermLimits preserves explicit over compaction", () => { - const now = new Date().toISOString(); - - const explicit: LongTermMemoryEntry = { - id: "explicit", - type: "decision", - text: "Use pnpm for this project", - source: "explicit", - confidence: 0.5, - status: "active", - 
createdAt: now, - updatedAt: now, - }; - - const compaction: LongTermMemoryEntry = { - id: "compaction", - type: "decision", - text: "Use pnpm for this project", - source: "compaction", - confidence: 0.9, - status: "active", - createdAt: now, - updatedAt: now, - }; - - const kept = enforceLongTermLimits([explicit, compaction]); - - assert.equal(kept.length, 1); - assert.equal(kept[0].source, "explicit"); - assert.equal(kept[0].confidence, 0.5); // explicit wins even with lower confidence -}); - -test("enforceLongTermLimits same source higher confidence wins", () => { - const now = new Date().toISOString(); - - const a: LongTermMemoryEntry = { - id: "a", - type: "decision", - text: "Project uses TypeScript", - source: "compaction", - confidence: 0.7, - status: "active", - createdAt: now, - updatedAt: now, - }; - - const b: LongTermMemoryEntry = { - id: "b", - type: "decision", - text: "Project uses TypeScript", - source: "compaction", - confidence: 0.9, - status: "active", - createdAt: now, - updatedAt: now, - }; - - const kept = enforceLongTermLimits([a, b]); - - assert.equal(kept.length, 1); - assert.equal(kept[0].confidence, 0.9); -}); -``` - -#### Acceptance Criteria - -- Differences in case, punctuation, and whitespace do not affect dedupe -- explicit always beats compaction (even with lower confidence) -- With the same source, higher confidence wins -- `npm test && npm run typecheck` passes - ---- - -### Task 6: Structured negative guard - -#### Goal - -Prevent negated requests such as 「不要記住」 ("do not remember") and "don't remember" from being stored in long-term memory, using a structured adjacency check. - -#### Files - -- `src/extractors.ts` - modify `extractExplicitMemories` - -#### Implementation Steps - -**1. 
Confirm every pattern has the `g` flag** - -```ts -const patterns = [ - // Note: every pattern must have the g flag because matchAll() is used - /(?:请|請)?(?:帮我|幫我)?(?:记住|記住)(?:这一点|這一點|这点|這點|这个|這個)?[::,,]?\s*(.+)$/gim, - /\bremember\s+(?:this|that)?[::,,]?\s*(.+)$/gim, - /\b(?:save|add)\s+(?:this|that)?\s*(?:to|in)\s+memory[::,,]?\s*(.+)$/gim, - /\bcommit\s+(?:this|that)?\s*to memory[::,,]?\s*(.+)$/gim, - /(?:从现在开始|從現在開始|从今以后|從今以後|from now on|going forward)[::,,]?\s*(.+)$/gim, - /(?:我的偏好是|我偏好|以后请|以後請|以后都|以後都)[::,,]?\s*(.+)$/gim, - /\b(?:my preference is|i prefer)[::,,]?\s*(.+)$/gim, -]; -``` - -**2. Add an `isNegatedMemoryRequest` function**: - -```ts -function isNegatedMemoryRequest(text: string, matchIndex: number): boolean { - const prefix = text.slice(Math.max(0, matchIndex - 30), matchIndex); - - // Chinese negation: 不要/別/不用 + optional 幫我, must be directly adjacent to the trigger - if (/(?:不要|別|别|不用|不需要|勿)\s*(?:幫我|帮我)?\s*$/u.test(prefix)) { - return true; - } - - // English negation: do not / don't / never / not + optional "please", must be directly adjacent to the trigger - if (/(?:do\s+not|don't|dont|never|not)\s+(?:please\s+)?$/i.test(prefix)) { - return true; - } - - return false; -} -``` - -**3. Use it in `extractExplicitMemories()`**: - -```ts -const seen = new Set(); - -for (const pattern of patterns) { - // Note: the pattern must have the g flag - for (const match of text.matchAll(pattern)) { - const body = match[1]?.trim(); - if (!body || body.length < 8) continue; - - // Skip negated requests - if (isNegatedMemoryRequest(text, match.index ?? 
0)) continue; - - // Skip deferrals such as "later" / "next time" - if (/^(再说|再說|later|next time)$/i.test(body)) continue; - - // Dedupe by canonical body - const key = body.toLowerCase().replace(/\s+/g, " ").trim(); - if (seen.has(key)) continue; - seen.add(key); - - // ...rest of function - } -} -``` - -#### Tests - -```ts -test("extractExplicitMemories ignores Chinese negative request", () => { - const items = extractExplicitMemories("不要記住:這個 repo 使用 npm cache"); - assert.equal(items.length, 0); -}); - -test("extractExplicitMemories ignores English negative request", () => { - const items = extractExplicitMemories("please don't remember this: use npm cache"); - assert.equal(items.length, 0); -}); - -test("extractExplicitMemories does not false positive on 'not forget'", () => { - const items = extractExplicitMemories("I will not forget to remember this"); - assert.equal(items.length, 0); -}); - -test("extractExplicitMemories still captures positive request", () => { - const items = extractExplicitMemories("from now on: reply in Traditional Chinese"); - assert.equal(items.length, 1); -}); - -test("extractExplicitMemories captures multiple memories in same message", () => { - const items = extractExplicitMemories("請記住:使用 pnpm\n記住這點:用 TypeScript"); - assert.equal(items.length, 2); -}); -``` - -#### Acceptance Criteria - -- 「不要記住」 ("do not remember") does not produce a memory -- `don't remember` does not produce a memory -- `I will not forget to remember` is not misclassified -- `from now on` is still captured as a memory -- Every pattern has the `g` flag -- `npm test && npm run typecheck` passes - ---- - -### Task 7: Compaction quality gate - -#### Goal - -Prevent low-quality candidates from being written to Workspace Memory. - -#### Files - -- `src/extractors.ts` - apply `shouldAcceptWorkspaceMemoryCandidate()` inside `parseWorkspaceMemoryCandidates()` -- `tests/extractors.test.ts` - verify reject/accept behavior through `parseWorkspaceMemoryCandidates()` - -#### Implementation Notes - -**1. 
Explicit predicate (kept inside extractors.ts)**: - -```ts -function shouldAcceptWorkspaceMemoryCandidate(entry: { - type: LongTermType; - text: string; -}): boolean { - const text = entry.text.trim(); - - // Too short - if (text.length < 20) return false; - - // Git history / commit hash - if (/\b[0-9a-f]{7,40}\b/.test(text)) return false; - if (/^(fix|feat|chore|docs|refactor|test):/i.test(text)) return false; - - // Raw error / stack trace - if (/^\s*(Error|TypeError|ReferenceError|SyntaxError):/i.test(text)) return false; - if (/at \S+ \([^)]+:\d+:\d+\)/.test(text)) return false; - - // Active file list - if (/^(modified|created|deleted|renamed)\s+\S+\.\S+$/i.test(text)) return false; - - // Temporary progress - if (/^(currently|now|pending|in progress|todo|wip):/i.test(text)) return false; - - // Code signature / API doc - if (/^(function|class|interface|type|const|let|var)\s+\w+/.test(text)) return false; - if (/^(GET|POST|PUT|DELETE|PATCH)\s+\//.test(text)) return false; - - // Path-heavy facts (rediscoverable from repo) - const pathCount = (text.match(/\/[\w.-]+(\/[\w.-]+)+/g) || []).length; - if (pathCount > 2) return false; - - return true; -} -``` - -**2. Update `memoryCandidateInstruction()`**: - -```ts -function memoryCandidateInstruction(): string { - return ` -At the end of the compaction summary, include: - - -- [feedback] ... -- [project] ... -- [decision] ... -- [reference] ... - - -Only include durable information useful across future sessions in this exact workspace. -Do NOT include: -- Active file lists or temporary progress -- Raw errors, stack traces, or git history -- Code signatures, function names, or API docs -- Facts easily rediscovered from the repository - -Write candidates in the same dominant language as existing workspace memory or the user's current language. -Keep code identifiers, paths, commands, package names, and product names unchanged. -Do not rephrase existing workspace memory as a new candidate, even if worded differently. 
- -For decisions, include rationale in one sentence. -If nothing qualifies, output an empty block. -`.trim(); -} -``` - -#### Test Matrix - -```ts -const QUALITY_GATE_TESTS = [ - // Reject - { text: "fix: update plugin config", expected: false }, - { text: "Error: Cannot find module", expected: false }, - { text: "at Plugin.run (plugin.ts:42:15)", expected: false }, - { text: "modified src/index.ts", expected: false }, - { text: "currently: working on tests", expected: false }, - { text: "function processMemory()", expected: false }, - { text: "GET /api/users", expected: false }, - { text: "Paths: /Users/foo/a.ts /Users/foo/b.ts /Users/foo/c.ts", expected: false }, - - // Accept - { text: "Use pnpm for this project", expected: true }, - { text: "OpenCode loads plugins from npm cache, not npm link", expected: true }, - { text: "Workspace memory stored at ~/.local/share/opencode-working-memory", expected: true }, -]; - -for (const { text, expected } of QUALITY_GATE_TESTS) { - test(`shouldAcceptWorkspaceMemoryCandidate: "${text.slice(0, 30)}..."`, () => { - const result = shouldAcceptWorkspaceMemoryCandidate({ type: "decision", text }); - assert.equal(result, expected); - }); -} -``` - -#### Acceptance Criteria - -- Git history / commit hashes are rejected -- Raw errors / stack traces are rejected -- Active file lists are rejected -- Temporary progress is rejected -- Code signatures are rejected -- Path-heavy facts are rejected -- Durable facts are accepted -- `npm test && npm run typecheck` passes - ---- - -### Task 8: Stale cleanup / penalty (future) - -#### Goal - -Make `staleAfterDays` more than a render marker: it should also affect ranking and retention. - -#### Reason for Deferral - -Production data must be collected first to confirm how much stale memory actually exists. - ---- - -### Task 9: Per-type quota (future) - -#### Goal - -Prevent a single type from crowding out the others. - -#### Reason for Deferral - -The 28-entry limit has not been reached yet; data must be collected first. - ---- - -## Execution Order - -### This week: PR-1 - -1. Baseline snapshot -2. Task 1: inline exitCode + narrow extractErrorsFromBash + **plugin hook regression test** ✅ DONE -3. Task 2: budget-aware render + **min envelope handling** ✅ DONE -4. Task 3: remove bare `always` + **ensure all patterns have `g` flag** ✅ DONE -5. 
Manual verification -6. Cleanup false positives - -### Hotfix: purple italic rendering issue - -**Problem**: The plugin's compaction context output is rendered as purple italic text in the OpenCode UI. - -**Root cause analysis**: -1. First attempt: XML tag `` → purple italic -2. Second attempt: HTML comment `` → still purple italic -3. Third attempt: Markdown heading `## Memory Candidates` → purple (no italic) -4. Fourth attempt: plain-text label `Memory candidates:` → no special rendering ✅ - -**Solution**: The architect recommended plain-text labels, avoiding all Markdown/XML/HTML syntax. - -**Changes**: -- `src/plugin.ts`: `compactionContextHeader()` now uses the `Memory candidates:` label -- `src/plugin.ts`: `renderTodosForCompaction()` now uses the `Pending todos:` label -- `src/extractors.ts`: `extractCandidateBlock()` parses the plain-text format (primary) -- `src/workspace-memory.ts`: `renderWorkspaceMemory()` uses the plain-text `Workspace memory:` label -- `src/session-state.ts`: `renderHotSessionState()` uses the plain-text `Hot session state:` label -- Removed the `stripXmlTags()` function (no longer needed) - -**Tests**: all 42 tests pass. - -### Next week: PR-2 - -5. Task 5: canonical exact dedupe + **source priority** -6. Task 6: structured negative guard + **all patterns with `g` flag** -7. Task 7: compaction quality gate + **predicate + test matrix** - -### Future (depending on data) - -8. Task 8: stale cleanup -9. Task 9: per-type quota -10. 
Near-duplicate Jaccard (write-time, not render-time) - ---- - -## Test Coverage Checklist - -### PR-1 Coverage - -| Test item | Status | -|----------|------| -| git log output mentioning errors ignored | ★★★ planned | -| cat session json with openErrors ignored | ★★★ planned | -| typecheck failure captured | ★★★ planned | -| unknown command loose errors ignored | ★★★ planned | -| **plugin hook with exitCode undefined** | **★★★ CRITICAL** | -| render ends with closing tag | ★★★ planned | -| render respects maxChars limit | ★★★ planned | -| **min envelope returns empty** | **★★★ planned** | -| `always` not a trigger | ★★★ planned | -| `going forward` still works | ★★★ planned | - -### PR-2 Coverage - -| Test item | Status | -|----------|------| -| canonical text dedupes | ★★★ planned | -| **explicit beats compaction** | **★★★ planned** | -| **same source higher confidence wins** | **★★★ planned** | -| Chinese negative ignored | ★★★ planned | -| English negative ignored | ★★★ planned | -| "not forget" not misjudged | ★★★ planned | -| **all patterns have `g` flag** | **★★★ planned** | -| quality gate rejects git history | ★★★ planned | -| quality gate rejects stack trace | ★★★ planned | -| quality gate accepts durable facts | ★★★ planned | \ No newline at end of file diff --git a/docs/superpowers/plans/2026-04-26-workspace-memory-cleanup-migration.md b/docs/superpowers/plans/2026-04-26-workspace-memory-cleanup-migration.md deleted file mode 100644 index 4be4717..0000000 --- a/docs/superpowers/plans/2026-04-26-workspace-memory-cleanup-migration.md +++ /dev/null @@ -1,702 +0,0 @@ -# Workspace Memory Cleanup Migration Plan (v2) - -## Status: APPROVED (v3) - -## Problem Statement - -Audit of recent workspace memories found quality issues in pre-v1.2.1 stores: - -### Issue 1: Snapshot Violations (P0) - -| Workspace | Entry | Type | -|-----------|-------|------| -| opencode-record | `測試套件:1237 tests pass, 226 suites` | Test count | -| opencode-record | `USB 同步:37 個文件(...)` | File count (Chinese) | -| 
opencode-record | `pathology-playground...已完成 Phase 1-4` | Phase progress | -| pathology-agent-reports | `Waves 1-5, 7 已完成,Wave 6 deferred` | Wave progress | - -**Root Cause**: These entries were created before P0c/P0d fix (08:02:32). Current code would reject them. - -**Risk**: Medium. Pollutes long-term memory, wastes tokens. - -### Issue 2: Sensitive Credentials (P0) - -| Workspace | Entry | Risk | -|-----------|-------|------| -| opencode-record | `Admin PIN 是 456123` | **High** - Raw credential | -| Pre-cancer-atlas | `測試用戶名:shihlab,密碼:sushi` | **High** - Raw credential | - -**Root Cause**: No credential redaction in compaction extraction or storage normalization. - -**Risk**: High. Credentials sent to model in every compaction prompt. - -### Issue 3: Wave/Sprint Not Filtered (P0) - -| Pattern | Status | -|---------|--------| -| `Phase 1-4 已完成` | ✅ Filtered by P0c | -| `Wave 1-5 已完成` | ❌ Not filtered | - -**Root Cause**: P0c filter only covers `Phase`, not `Wave/Sprint/Milestone/Task`. - -**Risk**: Medium. New snapshots still enter memory. - -### Issue 4: Duplicates (P1) - -| Workspace | Entry | Issue | -|-----------|-------|-------| -| Pre-cancer-atlas | `認證使用 Basic Auth...` x2 | Exact duplicate | -| Pre-cancer-atlas | `IP 隱私...` x2 | Semantic duplicate | -| Pre-cancer-atlas | `Cloud Run...` project + reference | Cross-type duplicate | - -**Root Cause**: `extractEntityKey()` only recognizes `opencode-agenthub`. Natural canonical dedup handles exact duplicates. - -**Risk**: Low. Wastes tokens but not dangerous. 
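To make the duplicate categories concrete, here is a minimal sketch that groups entries by normalized text while ignoring their type, which surfaces exact and cross-type duplicates in one pass. `MemoryEntry`, `canonicalText`, and `findCanonicalDuplicates` are hypothetical names, and the sample texts in the usage are invented for illustration. Semantic duplicates with different wording are not caught by this approach.

```ts
// Hypothetical entry shape for illustration; the plugin's LongTermMemoryEntry has more fields.
interface MemoryEntry {
  id: string;
  type: "feedback" | "project" | "decision" | "reference";
  text: string;
}

// Normalizes width and case, then collapses whitespace and punctuation runs to one space.
function canonicalText(text: string): string {
  return text
    .normalize("NFKC")
    .toLowerCase()
    .replace(/[\s\p{P}]+/gu, " ")
    .trim();
}

// Groups entries by canonical text only (type is ignored) and
// returns the groups that contain more than one entry.
function findCanonicalDuplicates(entries: MemoryEntry[]): MemoryEntry[][] {
  const groups = new Map<string, MemoryEntry[]>();
  for (const entry of entries) {
    const key = canonicalText(entry.text);
    const group = groups.get(key) ?? [];
    group.push(entry);
    groups.set(key, group);
  }
  return Array.from(groups.values()).filter(group => group.length > 1);
}

// Usage with invented sample entries:
const sampleDuplicates = findCanonicalDuplicates([
  { id: "a", type: "project", text: "Cloud Run deploys from the main branch" },
  { id: "b", type: "reference", text: "cloud run deploys from the main branch." },
  { id: "c", type: "decision", text: "Use pnpm for this project" },
]);
```

Leaving the type out of the key is the only difference from a per-type canonical key; whether cross-type pairs should be merged or merely reported is a separate policy decision.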
- ---- - -## Architect Review Failures (v1, v2) - -### v1 Failures - -| Issue | Problem | -|-------|---------| -| Regex | `Waves` not matched, Chinese `\b` unreliable | -| Superseded entries | Would be deleted by `enforceLongTermLimits()` | -| Credential redaction | Was migration-gated, must be always-on | -| Wave filter | Deferred to future, must be now | -| Over-broad | `Upload limit is 10 files` would be flagged | -| Rationale | Only redacted `text`, not `rationale` | - -### v2 Failures - -| Issue | Problem | -|-------|---------| -| File context | `upload` matches `Upload limit`, false positive | -| Explicit check | Missing `source === "explicit"` check before marking | -| Credential regex | `\S+` captures through Chinese comma tail | -| Filter location | Don't filter in `getFrozenWorkspaceMemory()` | - ---- - -## Proposed Solution (v3) - -### Architecture Principle - -``` - ┌─────────────────────────────────┐ - │ normalizeWorkspaceMemory() │ - │ │ - │ 1. ALWAYS redact credentials │ - │ (not migration-gated) │ - │ │ - │ 2. Mark legacy snapshots as │ - │ superseded (migration-gated)│ - │ │ - │ 3. Preserve superseded entries │ - │ in storage, exclude from │ - │ render │ - └─────────────────────────────────┘ -``` - -### Key Design Decisions - -1. **Credential redaction is always-on** - runs on every normalize, independent of migration ID -2. **Snapshot marking is migration-gated** - one-time cleanup for legacy entries -3. **Superseded entries preserved in storage** - but excluded from render -4. **Type restriction for snapshots** - only `project` type, avoid false positives -5. **Wave/Sprint/Milestone filter added now** - not deferred - ---- - -## Implementation - -### 1. 
Add Migration Tracking to Type - -```typescript -// src/types.ts - -interface WorkspaceMemoryStore { - version: number; - workspace: { root: string; key: string }; - limits: { maxRenderedChars: number; maxEntries: number }; - entries: LongTermMemoryEntry[]; - migrations?: string[]; // NEW: track applied migrations - updatedAt: string; -} - -const MIGRATION_ID = "2026-04-26-p0-cleanup"; -``` - -### 2. Snapshot Detection (Revised Regex) - -```typescript -// src/workspace-memory.ts - -/** - * Detect snapshot violations in text. - * Only apply to 'project' type entries with source !== 'explicit'. - */ -function isProjectSnapshotViolation(text: string): boolean { - // Test/suite counts - if (/\d+\s+tests?\s+pass(?:ed)?/i.test(text)) return true; - if (/\d+\s+suites?\s+(?:pass|fail)/i.test(text)) return true; - - // File counts (Chinese/English) - require sync/completion context - // And must NOT be a limit/maximum statement - if (/\d+\s*(?:個|个)?\s*(?:files?|文件)/i.test(text)) { - const hasSnapshotContext = /同步|synced|uploaded|downloaded|completed|generated|created|modified|processed|完成/i.test(text); - const hasLimitContext = /limit|max|maximum|min|minimum|supports?|allowed|per\s+(?:batch|request|upload)/i.test(text); - if (hasSnapshotContext && !hasLimitContext) return true; - } - - // Phase/Wave/Sprint/Milestone progress - // English: Phase 1-4 completed, Waves 1-5 done - if (/(?:phases?|waves?|sprints?|milestones?|tasks?)\s*\d+(?:\s*[-–]\s*\d+)?/i.test(text)) { - if (/completed|done|finished|完成/i.test(text)) return true; - } - // Chinese: 已完成 Phase 1-4 - if (/(?:已完成|完成).{0,30}(?:phases?|waves?|sprints?|milestones?|tasks?)/i.test(text)) return true; - - return false; -} -``` - -### 3. Credential Redaction (Always-On) - -```typescript -// src/workspace-memory.ts - -/** - * Bounded secret value pattern - stops at delimiters and Chinese punctuation. 
- * Avoids capturing through Chinese commas: 密碼:sushi,用於測試 - */ -const SECRET_VALUE = String.raw`[^` + "`" + String.raw`'",,,\s]+`; - -/** - * Multilingual credential labels. - * These are used in both detection and redaction patterns. - */ -const PASSWORD_LABELS = /password|passwd|pwd|密碼|密码|パスワード|비밀번호|contraseña|mot de passe|passwort/i; -const USERNAME_LABELS = /username|user name|用戶名|用户名|ユーザー名|사용자명|usuario|utilisateur|benutzer/i; - -/** - * Prefix patterns that capture label + delimiter together. - * This preserves the delimiter in output: 密碼:secret → 密碼:[REDACTED] - * The label alternation is wrapped in (?:...) so the delimiter applies to every label, not only the last alternative. - */ -const PASSWORD_PREFIX = String.raw`((?:${PASSWORD_LABELS.source})\s*(?:是|=|:|:)?\s*)`; -const USERNAME_PREFIX = String.raw`((?:${USERNAME_LABELS.source})\s*(?:是|=|:|:)?\s*)`; - -/** - * Redact sensitive credentials from text. - * This runs on EVERY normalize, not just migration. - * Idempotent - re-running on already redacted text yields the same output. - * - * Order matters: - * 1. PIN (standalone) - * 2. Username+password pairs (must run before standalone password) - * 3. Standalone password - */ -function redactCredentials(text: string): string { - let result = text; - - // 1. PIN patterns (language-neutral, supports 是, =, :, :) - // The label and delimiter are captured together so the delimiter survives: PIN 是 456123 → PIN 是 [REDACTED] - result = result.replace( - new RegExp(String.raw`(\bPIN\s*(?:是|=|:|:)?\s*)[\`'"]?(${SECRET_VALUE})`, 'gi'), - '$1[REDACTED]' - ); - - // 2. Username+Password pairs (multilingual) - // Must run BEFORE standalone password to match full pairs. - // 測試用戶名:xxx,密碼:yyy - // username: xxx, password: yyy - result = result.replace( - new RegExp( - String.raw`${USERNAME_PREFIX}[\`'"]?(${SECRET_VALUE})((?:,|,)\s*)${PASSWORD_PREFIX}[\`'"]?(${SECRET_VALUE})`, - 'gi' - ), - '$1[REDACTED]$3$4[REDACTED]' - ); - - // 3. Standalone password patterns (multilingual) - // Matches: password: secret, 密碼:secret, パスワード: secret, etc. - result = result.replace( - new RegExp(String.raw`${PASSWORD_PREFIX}[\`'"]?(${SECRET_VALUE})`, 'gi'), - '$1[REDACTED]' - ); - - return result; -} -``` - -### 4. 
Migration Function (One-Time) - -```typescript -// src/workspace-memory.ts - -function runMigrationP0Cleanup( - store: WorkspaceMemoryStore, - nowIso: string -): WorkspaceMemoryStore { - // Check if already run - if (store.migrations?.includes(MIGRATION_ID)) { - return store; - } - - const entries = store.entries.map(entry => { - // Skip explicit entries - user-added memories are preserved - if (entry.source === "explicit") { - return entry; - } - - // Skip non-project types for snapshot marking - // (Only project entries had snapshot pollution) - if (entry.type !== "project") { - return entry; - } - - // Mark legacy snapshot violations as superseded - if (isProjectSnapshotViolation(entry.text)) { - return { - ...entry, - status: "superseded" as const, - updatedAt: nowIso, - }; - } - - return entry; - }); - - return { - ...store, - entries, - migrations: [...(store.migrations || []), MIGRATION_ID], - updatedAt: nowIso, - }; -} -``` - -### 5. Normalize with Always-On Credential Redaction - -```typescript -// src/workspace-memory.ts - -// Preserve existing normalization behavior -async function normalizeWorkspaceMemory( - root: string, - store: WorkspaceMemoryStore, -): Promise<WorkspaceMemoryStore> { - const nowIso = new Date().toISOString(); - - // Start with existing store normalization - let result: WorkspaceMemoryStore = { - ...store, - workspace: { root, key: await workspaceKey(root) }, - limits: { - maxRenderedChars: store.limits?.maxRenderedChars ?? LONG_TERM_LIMITS.maxRenderedChars, - maxEntries: store.limits?.maxEntries ?? LONG_TERM_LIMITS.maxEntries, - }, - entries: Array.isArray(store.entries) ? store.entries : [], - updatedAt: nowIso, - }; - - // ALWAYS-ON: Redact credentials in all entries - // This must run regardless of migration status - result.entries = result.entries.map(entry => { - const text = redactCredentials(entry.text); - const rationale = entry.rationale - ? 
redactCredentials(entry.rationale) - : undefined; - - if (text === entry.text && rationale === entry.rationale) { - return entry; - } - - return { - ...entry, - text, - rationale, - updatedAt: nowIso, - }; - }); - - // ONE-TIME: Mark legacy snapshots as superseded - result = runMigrationP0Cleanup(result, nowIso); - - // Remove superseded from active rendering - const activeEntries = result.entries.filter(e => e.status !== "superseded"); - - // Apply dedup and limits to active entries only - const processed = enforceLongTermLimits(activeEntries); - - // Merge back: active entries + superseded entries (preserved in storage) - const superseded = result.entries.filter(e => e.status === "superseded"); - - return { - ...result, - entries: [...processed, ...superseded], - updatedAt: nowIso, - }; -} -``` - -### 6. Extend P0c Snapshot Filter (Not Deferred) - -```typescript -// src/extractors.ts - -// Add to isProjectSnapshotViolation() or equivalent filter - -// File counts - require snapshot context AND NOT limit context -const FILE_COUNT_PATTERN = /\d+\s*(?:個|个)?\s*(?:files?|文件)/i; -const FILE_SNAPSHOT_CONTEXT = /同步|synced|uploaded|downloaded|completed|generated|created|modified|processed|完成/i; -const FILE_LIMIT_CONTEXT = /limit|max|maximum|min|minimum|supports?|allowed|per\s+(?:batch|request|upload)/i; - -if (FILE_COUNT_PATTERN.test(text)) { - if (FILE_SNAPSHOT_CONTEXT.test(text) && !FILE_LIMIT_CONTEXT.test(text)) { - return true; // snapshot violation - } -} - -// Test/suite counts -if (/\d+\s+tests?\s+pass(?:ed)?/i.test(text)) return true; -if (/\d+\s+suites?\s+(?:pass|fail)/i.test(text)) return true; - -// Phase/Wave/Sprint/Milestone progress -if (/(?:phases?|waves?|sprints?|milestones?|tasks?)\s*\d+(?:\s*[-–]\s*\d+)?/i.test(text)) { - if (/completed|done|finished|完成/i.test(text)) return true; -} -if (/(?:已完成|完成).{0,30}(?:phases?|waves?|sprints?|milestones?|tasks?)/i.test(text)) return true; -``` - -**Note**: Do NOT use bare `upload|download` as context. 
Use past-tense verbs or process states. - ---- - -## Test Cases - -### Credential Redaction (Always-On) - -| Input | Expected Output | -|-------|-----------------| -| `Admin PIN 是 456123` | `Admin PIN 是 [REDACTED]` | -| `Admin PIN = 456123` | `Admin PIN = [REDACTED]` | -| `Admin PIN 456123` | `Admin PIN [REDACTED]` | -| `密碼:sushi` | `密碼:[REDACTED]` | -| `密码:sushi` | `密码:[REDACTED]` | -| `password: abc-123!` | `password: [REDACTED]` | -| `パスワード:secret` | `パスワード:[REDACTED]` | -| `비밀번호: secret` | `비밀번호: [REDACTED]` | -| `測試用戶名:shihlab,密碼:sushi` | `測試用戶名:[REDACTED],密碼:[REDACTED]` | -| `密碼:sushi,用於測試` | `密碼:[REDACTED],用於測試` | -| Credential in rationale | Redacted in both text and rationale | -| Explicit entry with PIN | Redacted, preserved | -| `[REDACTED]` in text | No change (idempotent) | - -### Snapshot Detection - -| Input | type | source | Is Violation? | -|-------|------|--------|---------------| -| `1237 tests pass, 226 suites` | project | compaction | ✅ Yes | -| `USB 同步:37 個文件` | project | compaction | ✅ Yes | -| `Phase 1-4 已完成` | project | compaction | ✅ Yes | -| `Waves 1-5 已完成` | project | compaction | ✅ Yes | -| `Upload limit is 10 files` | project | compaction | ❌ No (has "limit" context) | -| `Project supports 5 test suites` | project | compaction | ❌ No (no pass/fail) | -| `Phase 1-4 已完成` | project | explicit | ❌ No (explicit preserved) | -| Snapshot text | feedback | compaction | ❌ No (only project type) | -| Snapshot text | decision | compaction | ❌ No (only project type) | - -### Migration Behavior - -| Test | Description | -|------|-------------| -| Run once | Migration ID added | -| Run twice | No duplicate ID, entries unchanged | -| Non-project entry | Not marked superseded | -| Project snapshot | Marked superseded | -| Explicit project snapshot | Not marked (source check before type) | -| Credential in snapshot | Redacted, then marked superseded | - -### Integration Tests - -| Test | Description | -|------|-------------| -| `saveWorkspaceMemory()` 
| Superseded entries preserved in JSON | -| `updateWorkspaceMemory()` | Credential redaction runs on second normalize | -| New entry with PIN | Redacted on save (always-on) | -| `normalizeWorkspaceMemory()` | Preserves workspace root/key, limits, updatedAt | -| Memory render | Superseded entries excluded via `enforceLongTermLimits()` | - -### Extractor Tests - -| Input | Expected | -|-------|----------| -| `Upload limit is 10 files` | NOT a snapshot violation (has "limit" context) | -| `USB uploaded 37 files` | Snapshot violation (has "uploaded" process context) | -| `Project supports 5 test suites` | NOT a snapshot violation (no pass/fail context) | -| `1237 tests passed` | Snapshot violation (test count with pass) | - ---- - -## Edge Cases - -| Case | Handling | -|------|----------| -| Entry is explicit + snapshot | Not marked (source check before type check) | -| Entry has both snapshot + credential | Credential redacted, snapshot marked | -| Entry is already superseded | Keep status, still redact credentials | -| Migration runs twice | Skip if ID present | -| Store has no migrations field | Create empty array | -| `Upload limit is 10 files` | Not marked (has "limit" context) | -| Password with punctuation `abc-123!` | Captured by bounded pattern | -| Chinese comma after credential `密碼:sushi,用於測試` | Redact preserves `,用於測試` | -| Simplified Chinese `密码` | Preserved as `密码:[REDACTED]` | - ---- - -## Implementation Order - -1. Add `migrations` field to `WorkspaceMemoryStore` type -2. Add snapshot patterns to `src/extractors.ts` (not deferred) -3. Add `isProjectSnapshotViolation()` to `src/workspace-memory.ts` -4. Add `redactCredentials()` to `src/workspace-memory.ts` -5. Add `runMigrationP0Cleanup()` to `src/workspace-memory.ts` -6. Update `normalizeWorkspaceMemory()` with always-on redaction + migration -7. Do NOT add filtering to `getFrozenWorkspaceMemory()` - filtering happens in `enforceLongTermLimits()` -8. 
Add test cases for all patterns - ---- - -## What We Will NOT Do - -### Do NOT Add Project-Specific Entity Keys - -Cloud Run, Basic Auth, IP privacy — these are project-specific. Natural canonical dedup handles exact duplicates. - -### Do NOT Delete Superseded Entries - -Mark as `status: "superseded"`, preserve in storage, exclude from render. - -### Do NOT Gate Credential Redaction on Migration - -Credential redaction is always-on. Migration only marks legacy snapshots. - ---- - -## Summary - -| Issue | Priority | Solution | -|-------|----------|----------| -| Sensitive credentials | P0 | Always-on redaction | -| Snapshot violations | P0 | Migration-gated marking (project type only) | -| Wave progress not filtered | P0 | Add to extractors.ts now | -| Project-specific duplicates | N/A | Natural dedup | - -**Credential redaction runs on every normalize.** - -**Snapshot marking is one-time migration for legacy entries.** - -**Superseded entries preserved in storage, excluded from render.** - -**Wave/Sprint/Milestone filter added now, not deferred.** - ---- - -## Multilingual Scope - -### Snapshot Detection: Chinese + English Only - -Do **not** add Japanese/Korean/Spanish/French/German snapshot regexes now. - -Reasons: -- False positives silently suppress valid durable memories -- Audit evidence only shows Chinese and English pollution -- Words like "completed", "terminé", "abgeschlossen" can appear in durable process descriptions -- Extraction is always-on, so every false positive becomes permanent blind spot - -Add languages only after seeing real polluted memories in those languages. - -### Credential Redaction: Add Multilingual Labels - -For credentials, false negatives leak secrets. Add high-signal multilingual labels now. 
- -**Password labels:** - -```typescript -const PASSWORD_LABELS = - /password|passwd|pwd|密碼|密码|パスワード|비밀번호|contraseña|mot de passe|passwort/i; -``` - -**Username labels:** - -```typescript -const USERNAME_LABELS = - /username|user name|用戶名|用户名|ユーザー名|사용자명|usuario|utilisateur|benutzer/i; -``` - -PIN remains language-neutral: `/\bPIN\b/i` - -### Memory Trigger Patterns: Add Chinese Expansion + Japanese + Korean - -#### Chinese Expansion - -Add common phrases: - -```typescript -// Current: 记住/記住 -// Add: 记得/記得, 记下来/記下來 - -/(?:^|\n)\s*(?:请|請)?(?:帮我|幫我)?(?:记住|記住|记得|記得|记下来|記下來)(?:这一点|這一點|这点|這點|这个|這個)?[::,,]?\s*(.+)$/gim -``` - -#### Japanese Positive Triggers - -```typescript -/(?:^|\n)\s*(?:覚えておいて|覚えて|忘れないで|メモして)[::,,]?\s*(.+)$/gim -``` - -Note: `覚えておいて` must come before `覚えて` to prevent partial match in body. -Note: `忘れないで` ("don't forget") is a positive memory request despite negative morphology. - -#### Japanese Negation - -```typescript -/(?:覚えないで|記憶しないで|メモしないで)\s*$/u -``` - -#### Korean Positive Triggers - -```typescript -/(?:^|\n)\s*(?:기억해줘|기억해|잊지 마|잊지마|메모해줘|메모해)[::,,]?\s*(.+)$/gim -``` - -Note: `기억해줘` must come before `기억해`, `메모해줘` must come before `메모해` to prevent partial match in body. -Note: `잊지 마` ("don't forget") is a positive memory request despite negative morphology. - -#### Korean Negation - -```typescript -/(?:기억하지\s*마|기억하지마|메모하지\s*마|메모하지마)\s*$/u -``` - -#### Priority - -1. Chinese: `记得/記得`, `记下来/記下來` (small expansion) -2. Japanese (full patterns + negation) -3. Korean (full patterns + negation) -4. 
Defer: Spanish/German/French (higher collision risk with normal text) - -### Tests Required - -**Credential redaction:** - -```text -パスワード:secret → [REDACTED] -비밀번호: secret → [REDACTED] -contraseña: secret → [REDACTED] -mot de passe: secret → [REDACTED] -Passwort: secret → [REDACTED] -``` - -**Memory triggers (positive):** - -```text -记得:这个项目使用 pnpm -記下來:这个项目使用 pnpm -覚えて: このプロジェクトは pnpm を使う -覚えておいて: このプロジェクトは pnpm を使う -忘れないで: このプロジェクトは pnpm を使う -メモして: このプロジェクトは pnpm を使う -기억해: 이 프로젝트는 pnpm을 사용한다 -기억해줘: 이 프로젝트는 pnpm을 사용한다 -잊지 마: 이 프로젝트는 pnpm을 사용한다 -메모해: 이 프로젝트는 pnpm을 사용한다 -메모해줘: 이 프로젝트는 pnpm을 사용한다 -``` - -**Memory triggers (body extraction - must not include trigger suffix):** - -```text -覚えておいて: このプロジェクトは pnpm を使う -→ body is "このプロジェクトは pnpm を使う" (not "おいて: この...") - -기억해줘: 이 프로젝트는 pnpm을 사용한다 -→ body is "이 프로젝트는 pnpm을 사용한다" (not "줘: 이...") - -메모해줘: 이 프로젝트는 pnpm을 사용한다 -→ body is "이 프로젝트는 pnpm을 사용한다" (not "줘: 이...") -``` - -**Memory triggers (negation - should NOT trigger):** - -```text -覚えないで 覚えて: temporary note only -メモしないで メモして: temporary note only -기억하지 마 기억해: temporary note only -메모하지 마 메모해: temporary note only -``` - ---- - -## Memory Quality Bar (Prompt Improvement) - -### Problem - -Current extraction accepts "facts that were mentioned" instead of "facts that will change future behavior." - -Examples of low-value trivia: -- `Cloud Run revision: pre-cancer-atlas-website-00066-j8c` — transient deployment state -- `UI 要統一風格:兩個表格都要 scrollable,約 20 rows` — local implementation detail -- Paths observed from code/logs without stable contract - -### Solution: Prompt Quality Bar - -Add to compaction memory extraction prompt: - -```text -Memory quality bar: -Extract only durable facts that will change future behavior: user preferences, decisions with rationale, stable constraints, or hard-to-rediscover references. 
- -Do not extract trivia: transient IDs/revisions, task progress, test/file counts, bare status updates, local UI details, or facts easily rediscovered from the repo. - -When unsure, skip it. Fewer high-signal memories are better than many low-value ones. -``` - -### Example Pair (Optional) - -If model still stores junk, add one example: - -```text -Bad: Cloud Run revision: xyz-00066 -Good: Revision xyz-00066 is the last known good deploy before the auth regression. -``` - -### What This Captures - -| Keep | Reject | -|------|--------| -| User preferences | Transient IDs/revisions | -| Decisions with rationale | Task progress, test/file counts | -| Stable constraints | Bare status updates | -| Hard-to-rediscover references | Local UI details | -| | Rediscoverable facts | - -### Why Prompt Instead of Code Filters - -- Context matters: "Cloud Run revision" might be useful if framed as "last known good before regression" -- Avoid regex whack-a-mole for every trivia pattern -- Model can judge wording and context -- Easier to iterate on prompt than code - -### Code Filters (Stay Minimal) - -Keep only hard invariants: -- Credentials (security) -- Obvious snapshots (test counts, phase progress) - -Do NOT add new filters for deployment revisions, status updates, or UI trivia. Let prompt handle those. - ---- - -## Summary \ No newline at end of file diff --git a/docs/superpowers/plans/2026-04-27-memory-quality-optimization.md b/docs/superpowers/plans/2026-04-27-memory-quality-optimization.md deleted file mode 100644 index 096c237..0000000 --- a/docs/superpowers/plans/2026-04-27-memory-quality-optimization.md +++ /dev/null @@ -1,1260 +0,0 @@ -# Memory Quality Optimization Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. 
- -**Goal:** Make workspace memory promotion and extraction safer by distinguishing promoted/absorbed/rejected pending memories, adding offline memory-quality evals, and tightening the compaction extraction prompt with concrete negative examples. - -**Architecture:** Keep the current three-layer memory architecture unchanged: frozen workspace snapshot, hot session state, and OpenCode todos. Add deterministic promotion accounting around the existing `updateWorkspaceMemory(...)` flow, add parser-level quality eval fixtures with no extra LLM calls, and make the compaction prompt more explicit about what must not become durable memory. - -**Tech Stack:** TypeScript ESM, OpenCode plugin hooks, Node `node:test`, existing JSON storage helpers, existing workspace memory normalization/dedup logic. - ---- - -## Goal - -- Prevent pending memories from staying in `SessionState.pendingMemories` or `workspace-pending-journal.json` forever when they were successfully absorbed by existing workspace memory through deduplication or supersession. -- Preserve the recent retained-keys safety fix: memories rejected by caps, pruning, or failed I/O must remain pending for retry instead of being silently lost. -- Add a repeatable memory-quality eval suite so future prompt/parser changes can be checked against durable-fact and noise examples. -- Tighten the compaction prompt so the model has concrete examples of what to extract and what to skip. - -## Non-Goals - -- Do not add semantic similarity, embeddings, or extra LLM calls. -- Do not add cross-workspace shared memory. -- Do not add manual `forget/list` commands in this P0 plan. -- Do not redesign storage layout or change workspace memory file paths. -- Do not change OpenCode core behavior. 
- -## Current System Map - -```text -normal chat turn - │ - ├── experimental.chat.system.transform - │ ├── promote old pending journal before first frozen snapshot - │ ├── process latest explicit user memory - │ ├── push frozen workspace memory into system[1] - │ └── push hot session state into system[2+] - │ - ├── tool.execute.after - │ ├── update active files / open errors - │ └── process latest explicit user memory once per message id - │ - └── session.compacted event - ├── parse compaction summary memory candidates - ├── append candidates to pending journal - ├── promote session pending + journal pending into workspace memory - └── clear only pending memories that are safe to clear -``` - -Current weak point: - -```text -pending memory - │ - ├── exact retained in workspace memory -> clear today - ├── absorbed by existing duplicate/topic -> NOT cleared today, can retry forever - ├── rejected by caps/pruning -> should stay pending - └── I/O failure -> should stay pending -``` - -Target behavior: - -```text -pending memory - │ - ├── promoted exact key appears after normalization -> clear - ├── absorbed same identity/topic already represented -> clear - ├── rejected no equivalent active entry survived -> keep pending - └── failed updateWorkspaceMemory threw -> keep pending -``` - -## File Structure - -### Modify - -- `src/workspace-memory.ts` - - Export a deterministic `workspaceMemoryIdentityKey(entry)` helper derived from the existing dedup/supersession identity rules. - - Reuse that helper inside `enforceLongTermLimits(...)` so promotion accounting and normalization agree on identity semantics. -- `src/plugin.ts` - - Capture workspace entries before promotion. - - Use promotion accounting to compute clearable pending keys. - - Keep rejected pending memories in session state and journal. - - Add concrete negative/positive examples to `buildCompactionPrompt(...)`. 
-- `tests/plugin.test.ts` - - Add regression tests for absorbed duplicate pending memories and rejected overflow pending memories. - - Add prompt test assertions for negative examples. - -### Create - -- `src/promotion-accounting.ts` - - Small pure helper for classifying pending memories as `promoted`, `absorbed`, or `rejected` after workspace normalization. -- `tests/promotion-accounting.test.ts` - - Unit tests for exact promotion, exact duplicate absorption, same-topic absorption, and rejected overflow. -- `tests/memory-quality-eval.test.ts` - - Offline eval fixtures for durable facts vs noise using `parseWorkspaceMemoryCandidates(...)`. - -### Existing verification commands - -```bash -npm test -npm run typecheck -``` - -Expected final result: both commands pass. - ---- - -## Wave 1 — Promotion Accounting - -### Task 1: Add shared workspace-memory identity keys - -**Objective:** Make dedup/supersession identity explicit and reusable. Promotion accounting must use the same identity semantics as `enforceLongTermLimits(...)`; otherwise the plugin will clear the wrong pending memories. - -**Files:** -- Modify: `src/workspace-memory.ts:338-383` -- Test: `tests/promotion-accounting.test.ts` in Task 2 - -- [ ] **Step 1: Export `workspaceMemoryIdentityKey(...)` from `src/workspace-memory.ts`** - -Add this function after `feedbackTopicKey(...)` and before `isPrunableByAge(...)`: - -```ts -export function workspaceMemoryIdentityKey(entry: Pick<LongTermMemoryEntry, "type" | "text">): string { - if (entry.type === "project" || entry.type === "reference") { - return `${entry.type}:${extractEntityKey(entry.text) ?? canonicalMemoryText(entry.text)}`; - } - - if (entry.type === "feedback") { - return `${entry.type}:${feedbackTopicKey(entry.text) ?? canonicalMemoryText(entry.text)}`; - } - - return `decision:${decisionTopicKey(entry.text) ?? 
canonicalMemoryText(entry.text)}`; -} -``` - -- [ ] **Step 2: Replace duplicated key construction in `enforceLongTermLimits(...)`** - -In the project/reference/feedback loop, replace: - -```ts -const entityKey = entry.type === "project" || entry.type === "reference" - ? extractEntityKey(entry.text) - : feedbackTopicKey(entry.text); -const key = entityKey ? `${entry.type}:${entityKey}` : `${entry.type}:${canonicalMemoryText(entry.text)}`; -``` - -with: - -```ts -const key = workspaceMemoryIdentityKey(entry); -const hasTopicIdentity = key !== `${entry.type}:${canonicalMemoryText(entry.text)}`; -``` - -Then replace: - -```ts -const mode = entry.type === "feedback" && entityKey ? "supersession" as const : "entity" as const; -``` - -with: - -```ts -const mode = entry.type === "feedback" && hasTopicIdentity ? "supersession" as const : "entity" as const; -``` - -In the decision loop, replace: - -```ts -const topic = decisionTopicKey(entry.text); -const key = topic ? `decision:${topic}` : `decision:${canonicalMemoryText(entry.text)}`; -``` - -with: - -```ts -const key = workspaceMemoryIdentityKey(entry); -``` - -- [ ] **Step 3: Run the existing workspace-memory tests** - -Run: - -```bash -node --test --experimental-strip-types tests/workspace-memory.test.ts -``` - -Expected: PASS. Existing dedup behavior must not change. - -**Risks and boundaries:** - -- Do not export `chooseBetterMemory(...)` unless a later task truly needs it. Identity is enough for P0 accounting. -- `workspaceMemoryIdentityKey(...)` must stay deterministic and must not depend on timestamps. -- This function intentionally mirrors current heuristic topic keys. It is not semantic similarity. - -### Task 2: Add pure promotion accounting helper - -**Objective:** Separate accounting from plugin I/O. The plugin should ask a pure function which pending keys are safe to clear after workspace normalization. 
- -**Files:** -- Create: `src/promotion-accounting.ts` -- Create: `tests/promotion-accounting.test.ts` - -- [ ] **Step 1: Write failing tests for promotion accounting** - -Create `tests/promotion-accounting.test.ts`: - -```ts -import test from "node:test"; -import assert from "node:assert/strict"; -import type { LongTermMemoryEntry } from "../src/types.ts"; -import { accountPendingPromotions } from "../src/promotion-accounting.ts"; -import { memoryKey } from "../src/pending-journal.ts"; - -function mem( - id: string, - text: string, - opts: Partial<LongTermMemoryEntry> = {}, -): LongTermMemoryEntry { - const now = opts.createdAt ?? new Date().toISOString(); - return { - id, - type: opts.type ?? "decision", - text, - source: opts.source ?? "compaction", - confidence: opts.confidence ?? 0.75, - status: opts.status ?? "active", - createdAt: now, - updatedAt: opts.updatedAt ?? now, - staleAfterDays: opts.staleAfterDays, - rationale: opts.rationale, - supersedes: opts.supersedes, - tags: opts.tags, - }; -} - -test("accountPendingPromotions marks exact retained pending memory as promoted", () => { - const pending = [mem("pending", "Use frozen rendered snapshots for cache stability.")]; - const before: LongTermMemoryEntry[] = []; - const after = [pending[0]]; - - const result = accountPendingPromotions({ pending, before, after }); - - assert.deepEqual([...result.promotedKeys], [memoryKey(pending[0])]); - assert.equal(result.absorbedKeys.size, 0); - assert.equal(result.rejectedKeys.size, 0); - assert.deepEqual([...result.clearableKeys], [memoryKey(pending[0])]); -}); - -test("accountPendingPromotions marks exact duplicate already represented before promotion as absorbed", () => { - const existing = mem("existing", "Prefer stable cache boundaries.", { source: "explicit" }); - const pending = [mem("pending", "prefer stable cache boundaries.", { source: "explicit" })]; - const before = [existing]; - const after = [existing]; - - const result = accountPendingPromotions({ pending, before, after }); 
- - assert.equal(result.promotedKeys.size, 0); - assert.deepEqual([...result.absorbedKeys], [memoryKey(pending[0])]); - assert.equal(result.rejectedKeys.size, 0); - assert.deepEqual([...result.clearableKeys], [memoryKey(pending[0])]); -}); - -test("accountPendingPromotions marks same exact key present before promotion as absorbed, not promoted", () => { - const existing = mem("existing", "Use stable cache boundaries.", { source: "explicit" }); - const pending = [mem("pending", "Use stable cache boundaries.", { source: "explicit" })]; - const before = [existing]; - const after = [existing]; - - const result = accountPendingPromotions({ pending, before, after }); - - assert.equal(result.promotedKeys.size, 0, - "a pending memory whose exact key already existed before promotion is absorbed, not newly promoted"); - assert.deepEqual([...result.absorbedKeys], [memoryKey(pending[0])]); - assert.equal(result.rejectedKeys.size, 0); -}); - -test("accountPendingPromotions marks same-topic decision represented after normalization as absorbed", () => { - const existing = mem("existing", "Parser supports 2 candidate formats.", { - type: "decision", - source: "compaction", - confidence: 0.9, - createdAt: "2026-04-27T10:00:00.000Z", - updatedAt: "2026-04-27T10:00:00.000Z", - }); - const pending = [mem("pending", "Parser supports 3 candidate formats.", { - type: "decision", - source: "compaction", - confidence: 0.75, - createdAt: "2026-04-27T09:00:00.000Z", - updatedAt: "2026-04-27T09:00:00.000Z", - })]; - const before = [existing]; - const after = [existing]; - - const result = accountPendingPromotions({ pending, before, after }); - - assert.equal(result.promotedKeys.size, 0); - assert.deepEqual([...result.absorbedKeys], [memoryKey(pending[0])]); - assert.equal(result.rejectedKeys.size, 0); -}); - -test("accountPendingPromotions keeps pending memory rejected when no equivalent survived", () => { - const pending = [mem("pending", "Low priority memory that did not fit the workspace 
cap.", { - type: "reference", - source: "compaction", - })]; - const before: LongTermMemoryEntry[] = []; - const after: LongTermMemoryEntry[] = []; - - const result = accountPendingPromotions({ pending, before, after }); - - assert.equal(result.promotedKeys.size, 0); - assert.equal(result.absorbedKeys.size, 0); - assert.deepEqual([...result.rejectedKeys], [memoryKey(pending[0])]); - assert.equal(result.clearableKeys.size, 0); -}); -``` - -- [ ] **Step 2: Run the new test and verify it fails** - -Run: - -```bash -node --test --experimental-strip-types tests/promotion-accounting.test.ts -``` - -Expected: FAIL with module not found for `../src/promotion-accounting.ts`. - -- [ ] **Step 3: Implement `src/promotion-accounting.ts`** - -Create `src/promotion-accounting.ts`: - -```ts -import type { LongTermMemoryEntry } from "./types.ts"; -import { memoryKey } from "./pending-journal.ts"; -import { workspaceMemoryIdentityKey } from "./workspace-memory.ts"; - -export type PendingPromotionAccounting = { - promotedKeys: Set<string>; - absorbedKeys: Set<string>; - rejectedKeys: Set<string>; - clearableKeys: Set<string>; -}; - -export function accountPendingPromotions(input: { - pending: LongTermMemoryEntry[]; - before: LongTermMemoryEntry[]; - after: LongTermMemoryEntry[]; -}): PendingPromotionAccounting { - const beforeExactKeys = new Set(input.before.map(entry => memoryKey(entry))); - const afterExactKeys = new Set(input.after.map(entry => memoryKey(entry))); - const afterIdentityKeys = new Set(input.after.map(entry => workspaceMemoryIdentityKey(entry))); - - const promotedKeys = new Set<string>(); - const absorbedKeys = new Set<string>(); - const rejectedKeys = new Set<string>(); - - for (const memory of input.pending) { - const key = memoryKey(memory); - const identityKey = workspaceMemoryIdentityKey(memory); - - if (beforeExactKeys.has(key)) { - absorbedKeys.add(key); - continue; - } - - if (afterExactKeys.has(key)) { - promotedKeys.add(key); - continue; - } - - if (afterIdentityKeys.has(identityKey)) { - absorbedKeys.add(key); 
- continue; - } - - rejectedKeys.add(key); - } - - return { - promotedKeys, - absorbedKeys, - rejectedKeys, - clearableKeys: new Set([...promotedKeys, ...absorbedKeys]), - }; -} -``` - -- [ ] **Step 4: Run the new accounting tests** - -Run: - -```bash -node --test --experimental-strip-types tests/promotion-accounting.test.ts -``` - -Expected: PASS. - -**Risks and boundaries:** - -- The helper deliberately works with keys, not entry ids. Pending journal dedup already uses `memoryKey(...)`, so clearing by key preserves current storage semantics. -- If the exact pending key already existed before promotion, classify it as absorbed, not promoted. It is safe to clear, but the classification must not imply a new write occurred. -- If `after` contains the same identity but no exact key, that means normalization kept a representative memory. For P0, this is treated as absorbed and safe to clear. -- Do not classify failed writes here. Failures are represented by `updateWorkspaceMemory(...)` throwing before this helper is called. - -### Task 3: Wire promotion accounting into `promotePendingMemories(...)` - -**Objective:** Clear pending memories when they were either exactly promoted or absorbed by an equivalent workspace memory. Keep rejected memories pending. - -**Files:** -- Modify: `src/plugin.ts:34-45`, `src/plugin.ts:222-261` -- Modify: `tests/plugin.test.ts` - -- [ ] **Step 1: Add failing plugin regression tests** - -Append these tests to `tests/plugin.test.ts`: - -These snippets use imports that already exist in the current `tests/plugin.test.ts`: `mkdir`, `mkdtemp`, `rm`, `dirname`, `join`, `workspaceMemoryPath`, `workspacePendingJournalPath`, `loadPendingJournal`, `savePendingJournal`, `updateWorkspaceMemory`, `loadSessionState`, `saveSessionState`, and `MemoryV2Plugin`. If `promotion failure does not clear pending memories in session or journal` already exists, keep the existing test and verify it still passes instead of adding a duplicate with the same body. 
- -```ts -test("session.compacted clears pending memory absorbed by existing workspace duplicate", async () => { - const tmpDir = await mkdtemp(join(tmpdir(), "memory-plugin-test-")); - - try { - const now = new Date().toISOString(); - await updateWorkspaceMemory(tmpDir, store => { - store.entries.push({ - id: "mem_existing_duplicate", - type: "decision", - text: "Prefer stable cache boundaries.", - source: "explicit", - confidence: 1, - status: "active", - createdAt: now, - updatedAt: now, - }); - return store; - }); - - await saveSessionState(tmpDir, { - version: 1, - sessionID: "absorbed-duplicate-session", - turn: 0, - updatedAt: now, - activeFiles: [], - openErrors: [], - recentDecisions: [], - pendingMemories: [{ - id: "mem_pending_duplicate", - type: "decision", - text: "prefer stable cache boundaries.", - source: "explicit", - confidence: 1, - status: "active", - createdAt: now, - updatedAt: now, - }], - }); - - const plugin = await MemoryV2Plugin({ directory: tmpDir, client: mockRootClient() }); - await (plugin as Record)["event"]({ - event: { type: "session.compacted", properties: { sessionID: "absorbed-duplicate-session" } }, - }); - - const state = await loadSessionState(tmpDir, "absorbed-duplicate-session"); - assert.equal(state.pendingMemories.length, 0, - "duplicate pending memory should be cleared after it is absorbed by existing workspace memory"); - } finally { - await rm(tmpDir, { recursive: true, force: true }); - } -}); - -test("session.compacted keeps pending memory rejected by workspace entry cap", async () => { - const tmpDir = await mkdtemp(join(tmpdir(), "memory-plugin-test-")); - - try { - const now = new Date().toISOString(); - await updateWorkspaceMemory(tmpDir, store => { - for (let i = 0; i < 28; i += 1) { - store.entries.push({ - id: `mem_high_${i}`, - type: "feedback", - text: `High priority user feedback memory ${i} that should outrank low priority references.`, - source: "explicit", - confidence: 1, - status: "active", - 
createdAt: now, - updatedAt: now, - }); - } - return store; - }); - - await saveSessionState(tmpDir, { - version: 1, - sessionID: "rejected-cap-session", - turn: 0, - updatedAt: now, - activeFiles: [], - openErrors: [], - recentDecisions: [], - pendingMemories: [{ - id: "mem_low_priority_reference", - type: "reference", - text: "Low priority reference memory that should not fit when the workspace cap is full.", - source: "compaction", - confidence: 0.1, - status: "active", - createdAt: now, - updatedAt: now, - }], - }); - - const plugin = await MemoryV2Plugin({ directory: tmpDir, client: mockRootClient() }); - await (plugin as Record)["event"]({ - event: { type: "session.compacted", properties: { sessionID: "rejected-cap-session" } }, - }); - - const state = await loadSessionState(tmpDir, "rejected-cap-session"); - assert.equal(state.pendingMemories.length, 1, - "pending memory rejected by workspace cap should remain pending for retry"); - assert.match(state.pendingMemories[0].text, /Low priority reference/); - } finally { - await rm(tmpDir, { recursive: true, force: true }); - } -}); - -test("session.compacted keeps pending memories when all rejected by workspace cap", async () => { - const tmpDir = await mkdtemp(join(tmpdir(), "memory-plugin-test-")); - - try { - const now = new Date().toISOString(); - await updateWorkspaceMemory(tmpDir, store => { - for (let i = 0; i < 28; i += 1) { - store.entries.push({ - id: `mem_high_all_rejected_${i}`, - type: "feedback", - text: `Pinned high priority feedback ${i} that keeps the workspace entry cap full.`, - source: "explicit", - confidence: 1, - status: "active", - createdAt: now, - updatedAt: now, - }); - } - return store; - }); - - await saveSessionState(tmpDir, { - version: 1, - sessionID: "all-rejected-session", - turn: 0, - updatedAt: now, - activeFiles: [], - openErrors: [], - recentDecisions: [], - pendingMemories: [{ - id: "mem_session_rejected", - type: "reference", - text: "Session pending reference should 
remain when every pending memory is rejected by cap.", - source: "compaction", - confidence: 0.1, - status: "active", - createdAt: now, - updatedAt: now, - }], - }); - - const journal = await loadPendingJournal(tmpDir); - journal.entries = [{ - id: "mem_journal_rejected_other_session", - type: "reference", - text: "Journal pending reference from another session should not be cleared by an empty clearable set.", - source: "compaction", - confidence: 0.1, - status: "active", - createdAt: now, - updatedAt: now, - }]; - await savePendingJournal(tmpDir, journal); - - const plugin = await MemoryV2Plugin({ directory: tmpDir, client: mockRootClient() }); - await (plugin as Record)["event"]({ - event: { type: "session.compacted", properties: { sessionID: "all-rejected-session" } }, - }); - - const state = await loadSessionState(tmpDir, "all-rejected-session"); - assert.equal(state.pendingMemories.length, 1, - "session pending memory must remain when all pending memories are rejected"); - - const pendingAfter = await loadPendingJournal(tmpDir); - assert.equal(pendingAfter.entries.length, 1, - "journal pending memories must not be cleared when accounting.clearableKeys is empty"); - assert.match(pendingAfter.entries[0].text, /another session/); - } finally { - await rm(tmpDir, { recursive: true, force: true }); - } -}); - -test("promotion failure does not clear pending memories in session or journal", async () => { - const tmpDir = await mkdtemp(join(tmpdir(), "memory-plugin-test-")); - - try { - const now = new Date().toISOString(); - await saveSessionState(tmpDir, { - version: 1, - sessionID: "failure-session", - turn: 0, - updatedAt: now, - activeFiles: [], - openErrors: [], - recentDecisions: [], - pendingMemories: [{ - id: "mem_pending_failure", - type: "decision", - text: "Keep pending when promotion fails", - source: "explicit", - confidence: 1, - status: "active", - createdAt: now, - updatedAt: now, - }], - }); - - const journalPath = await 
workspacePendingJournalPath(tmpDir); - await mkdir(dirname(journalPath), { recursive: true }); - const journal = await loadPendingJournal(tmpDir); - journal.entries = [{ - id: "mem_pending_failure_journal", - type: "decision", - text: "Keep pending when promotion fails", - source: "explicit", - confidence: 1, - status: "active", - createdAt: now, - updatedAt: now, - }]; - await savePendingJournal(tmpDir, journal); - - const wmPath = await workspaceMemoryPath(tmpDir); - await rm(wmPath, { force: true }).catch(() => undefined); - await mkdir(wmPath, { recursive: true }); - - const plugin = await MemoryV2Plugin({ directory: tmpDir, client: mockRootClient() }); - let didThrow = false; - try { - await (plugin as Record)["event"]({ - event: { type: "session.compacted", properties: { sessionID: "failure-session" } }, - }); - } catch { - didThrow = true; - } - - assert.equal(didThrow, false, - "promotion failure should not throw from session.compacted handler"); - const state = await loadSessionState(tmpDir, "failure-session"); - assert.equal(state.pendingMemories.length, 1, - "session pending memories should remain when promotion fails"); - const pendingAfter = await loadPendingJournal(tmpDir); - assert.equal(pendingAfter.entries.length, 1, - "journal pending memories should remain when promotion fails"); - } finally { - await rm(tmpDir, { recursive: true, force: true }); - } -}); -``` - -- [ ] **Step 2: Run the failing tests** - -Run: - -```bash -node --test --experimental-strip-types tests/plugin.test.ts -``` - -Expected: FAIL on at least these two regressions before implementation: - -- `session.compacted clears pending memory absorbed by existing workspace duplicate`, because absorbed duplicate pending memory is not cleared yet. -- `session.compacted keeps pending memories when all rejected by workspace cap`, because calling `clearPendingMemories(directory, emptySet)` currently clears the whole pending journal. 
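The second expected failure follows from the empty-set contract of `clearPendingMemories(...)`. A minimal in-memory sketch of that contract and of a guarded caller is shown here, assuming a simplified journal; `PendingEntry`, `clearPending`, and `clearOnlyAccounted` are hypothetical names for illustration, not the plugin's API:

```ts
// Hypothetical in-memory stand-in for a pending journal entry.
type PendingEntry = { key: string; text: string };

// Models the contract described in this plan: an undefined or empty
// key set is treated as "clear all pending journal entries".
function clearPending(entries: PendingEntry[], keys?: Set<string>): PendingEntry[] {
  if (!keys || keys.size === 0) return [];
  return entries.filter(entry => !keys.has(entry.key));
}

// Guarded caller: skips the clear call entirely when nothing is clearable,
// so an all-rejected promotion cannot wipe the journal.
function clearOnlyAccounted(entries: PendingEntry[], clearableKeys: Set<string>): PendingEntry[] {
  return clearableKeys.size > 0 ? clearPending(entries, clearableKeys) : entries;
}
```

With an empty clearable set, the guarded caller leaves the journal untouched, whereas calling the clear function directly would drop every entry under this contract.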
- -- [ ] **Step 3: Import accounting helper in `src/plugin.ts`** - -Add near the existing imports: - -```ts -import { accountPendingPromotions } from "./promotion-accounting.ts"; -``` - -- [ ] **Step 4: Capture `beforeEntries` and compute clearable keys** - -Replace the current `promotePendingMemories(...)` body from the `updateWorkspaceMemory(...)` call through clearing logic with this structure: - -```ts - let beforeEntries: Awaited>["entries"] = []; - - const updatedWorkspaceMemory = await updateWorkspaceMemory(directory, workspaceMemory => { - beforeEntries = [...workspaceMemory.entries]; - const existingKeys = new Set(workspaceMemory.entries.map(memory => memoryKey(memory))); - - for (const memory of pending) { - const key = memoryKey(memory); - if (!existingKeys.has(key)) { - workspaceMemory.entries.push(memory); - existingKeys.add(key); - } - } - - return workspaceMemory; - }); - - const accounting = accountPendingPromotions({ - pending, - before: beforeEntries, - after: updatedWorkspaceMemory.entries, - }); - - if (sessionID) { - await updateSessionState(directory, sessionID, state => { - state.pendingMemories = state.pendingMemories.filter(memory => !accounting.clearableKeys.has(memoryKey(memory))); - return state; - }); - clearFrozenWorkspaceMemoryCache(sessionID); - } - - if (accounting.clearableKeys.size > 0) { - await clearPendingMemories(directory, accounting.clearableKeys); - } -``` - -Keep the existing `try/catch` behavior in event handlers unchanged. If `updateWorkspaceMemory(...)` throws, this code is never reached, so nothing is cleared. - -- [ ] **Step 5: Run plugin tests** - -Run: - -```bash -node --test --experimental-strip-types tests/plugin.test.ts -``` - -Expected: PASS. - -**Risks and boundaries:** - -- Do not call `clearPendingMemories(...)` with no argument or an empty set in this flow. Its current contract treats `undefined` and `size === 0` as "clear all pending journal entries". 
-- The `if (accounting.clearableKeys.size > 0)` guard is data-loss protection. Do not remove it unless `clearPendingMemories(...)` gets a new contract. -- `beforeEntries` must be captured inside the `updateWorkspaceMemory(...)` callback after the store has been loaded and normalized. -- Keep cache invalidation only for the session being compacted/deleted. Do not clear all session caches. - -### Wave 1 verification checkpoint - -- [ ] **Step 1: Run targeted tests** - -Run: - -```bash -node --test --experimental-strip-types tests/promotion-accounting.test.ts tests/plugin.test.ts tests/workspace-memory.test.ts -``` - -Expected: PASS. - -Confirm these regression tests are present and passing: - -- `session.compacted clears pending memory absorbed by existing workspace duplicate` -- `session.compacted keeps pending memory rejected by workspace entry cap` -- `session.compacted keeps pending memories when all rejected by workspace cap` -- `promotion failure does not clear pending memories in session or journal` - -- [ ] **Step 2: Run full verification** - -Run: - -```bash -npm test -npm run typecheck -``` - -Expected: PASS. - -- [ ] **Step 3: Commit Wave 1** - -```bash -git add src/workspace-memory.ts src/promotion-accounting.ts src/plugin.ts tests/promotion-accounting.test.ts tests/plugin.test.ts -git commit -m "fix: account for absorbed pending memories" -``` - ---- - -## Wave 2 — Memory Quality Eval - -### Task 4: Add offline memory-quality fixture evals - -**Objective:** Create a zero-API-call quality gate for memory extraction. This catches regressions where parser/gate changes start accepting noisy compaction facts or rejecting useful durable facts. 
- -**Files:** -- Create: `tests/memory-quality-eval.test.ts` -- Modify only if tests expose a real gap: `src/extractors.ts:229-275` - -- [ ] **Step 1: Create fixture eval test file** - -Create `tests/memory-quality-eval.test.ts`: - -```ts -import test from "node:test"; -import assert from "node:assert/strict"; -import { parseWorkspaceMemoryCandidates } from "../src/extractors.ts"; - -const acceptedCases = [ - { - name: "durable user language preference", - line: "- [feedback] User prefers architecture reviews in Traditional Chinese", - expectedType: "feedback", - expectedText: /Traditional Chinese/, - }, - { - name: "stable cache architecture decision", - line: "- [decision] Use frozen workspace memory snapshots plus ephemeral hot state for cache stability", - expectedType: "decision", - expectedText: /frozen workspace memory/, - }, - { - name: "stable zero API call constraint", - line: "- [project] The plugin piggybacks memory extraction on OpenCode compaction and should not add extra LLM calls", - expectedType: "project", - expectedText: /extra LLM calls/, - }, - { - name: "hard to rediscover reference", - line: "- [reference] Workspace memory uses a frozen system[1] snapshot and pending memories remain in hot session state until compaction", - expectedType: "reference", - expectedText: /system\[1\]/, - }, - { - name: "short stable config reference", - line: "- [reference] Config parser supports bracketless format", - expectedType: "reference", - expectedText: /bracketless/, - }, -] as const; - -const rejectedCases = [ - { - name: "test count snapshot", - line: "- [project] 42 tests passed after the latest implementation", - }, - { - name: "suite count snapshot", - line: "- [project] 3 suites pass and 0 suites fail right now", - }, - { - name: "phase progress snapshot", - line: "- [project] Wave 2 completed successfully", - }, - { - name: "commit hash", - line: "- [reference] Commit 4309cb8 contains the promotion accounting fix", - }, - { - name: "raw transient 
error", - line: "- [feedback] TypeError: Cannot read properties of undefined", - }, - { - name: "path heavy rediscoverable fact", - line: "- [project] Important files are /src/plugin.ts /src/workspace-memory.ts /src/session-state.ts", - }, - { - name: "temporary pending task", - line: "- [decision] currently: run npm test before the next reply", - }, -] as const; - -for (const item of acceptedCases) { - test(`memory quality accepts ${item.name}`, () => { - const summary = ` -Memory candidates: -${item.line} -`; - const entries = parseWorkspaceMemoryCandidates(summary); - - assert.equal(entries.length, 1); - assert.equal(entries[0].type, item.expectedType); - assert.match(entries[0].text, item.expectedText); - }); -} - -for (const item of rejectedCases) { - test(`memory quality rejects ${item.name}`, () => { - const summary = ` -Memory candidates: -${item.line} -`; - const entries = parseWorkspaceMemoryCandidates(summary); - - assert.equal(entries.length, 0); - }); -} -``` - -- [ ] **Step 2: Run the eval test** - -Run: - -```bash -node --test --experimental-strip-types tests/memory-quality-eval.test.ts -``` - -Expected: FAIL if any rejected case exposes a missing quality gate; otherwise PASS. - -- [ ] **Step 3: Tighten `shouldAcceptWorkspaceMemoryCandidate(...)` only for failing rejected cases** - -If the `temporary pending task` case fails, add this near the existing temporary progress checks in `src/extractors.ts:259-260`: - -```ts - if (/^(currently|now|pending|in progress|todo|wip)\b[::]?/i.test(text)) return false; -``` - -If the `commit hash` case fails because a hash written with uppercase hex digits is accepted, replace: - -```ts - if (/\b[0-9a-f]{7,40}\b/.test(text)) return false; -``` - -with: - -```ts - if (/\b[0-9a-f]{7,40}\b/i.test(text)) return false; -``` - -Note that the fixture's lowercase hash `4309cb8` already matches the case-sensitive pattern; the `i` flag only extends the check to uppercase digits such as `4309CB8`. - -If all cases pass, do not change `src/extractors.ts`. 
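The behavior of these gate patterns can be checked in isolation. The sketch below copies the three regular expressions from this step; the `matches` helper and the constant names are hypothetical and exist only for illustration, not in `src/extractors.ts`:

```ts
// Patterns copied from this plan step; names are illustrative only.
const hashPattern = /\b[0-9a-f]{7,40}\b/;
const hashPatternIgnoreCase = /\b[0-9a-f]{7,40}\b/i;
const temporaryPrefix = /^(currently|now|pending|in progress|todo|wip)\b[::]?/i;

// Returns true when the gate pattern would flag the candidate text.
// The patterns are non-global, so repeated test() calls are stateless.
function matches(pattern: RegExp, text: string): boolean {
  return pattern.test(text);
}

// Candidate text as it would look after the "- [type] " prefix is stripped (assumed).
const lowercaseHash = "Commit 4309cb8 contains the promotion accounting fix";
const uppercaseHash = "Commit 4309CB8 contains the promotion accounting fix";
const temporaryTask = "currently: run npm test before the next reply";

const flaggedLower = matches(hashPattern, lowercaseHash);
const flaggedUpperStrict = matches(hashPattern, uppercaseHash);
const flaggedUpperLoose = matches(hashPatternIgnoreCase, uppercaseHash);
const flaggedTemporary = matches(temporaryPrefix, temporaryTask);
```

In this sketch the case-sensitive pattern flags the lowercase hash but not the uppercase variant, while the case-insensitive pattern flags both. Either pattern would also flag ordinary words made only of hex letters (for example `defaced`), which is worth keeping in mind when adding accepted fixtures.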
- -- [ ] **Step 4: Re-run eval and extractor tests** - -Run: - -```bash -node --test --experimental-strip-types tests/memory-quality-eval.test.ts tests/extractors.test.ts -``` - -Expected: PASS. - -**Risks and boundaries:** - -- These are parser/gate evals, not live LLM evals. That is intentional for P0 because it preserves zero extra API calls. -- Do not over-tighten length limits. Short stable config references are already special-cased. -- If a rejected fixture seems useful, change the fixture, not the product, but document why in the test name. - -### Wave 2 verification checkpoint - -- [ ] **Step 1: Run targeted quality tests** - -Run: - -```bash -node --test --experimental-strip-types tests/memory-quality-eval.test.ts tests/extractors.test.ts -``` - -Expected: PASS. - -- [ ] **Step 2: Run full verification** - -Run: - -```bash -npm test -npm run typecheck -``` - -Expected: PASS. - -- [ ] **Step 3: Commit Wave 2** - -```bash -git add tests/memory-quality-eval.test.ts src/extractors.ts -git commit -m "test: add memory quality eval fixtures" -``` - -If `src/extractors.ts` did not change, use: - -```bash -git add tests/memory-quality-eval.test.ts -git commit -m "test: add memory quality eval fixtures" -``` - ---- - -## Wave 3 — Compaction Prompt Negative Examples - -### Task 5: Tighten compaction extraction prompt - -**Objective:** Reduce memory pollution at the source. The prompt already says to skip noise, but concrete examples give the model a sharper boundary between durable facts and session snapshots. 
- -**Files:** -- Modify: `src/plugin.ts:107-130` -- Modify: `tests/plugin.test.ts:221-292` - -- [ ] **Step 1: Add failing prompt assertions** - -In `tests/plugin.test.ts`, inside `test("compaction hook sets output.prompt with ---free template", ...)`, after the existing memory candidate assertions, add: - -```ts - assert.equal(prompt!.includes("Good memory examples:"), true, - "Prompt should include concrete positive memory examples"); - assert.equal(prompt!.includes("Bad memory examples to skip:"), true, - "Prompt should include concrete negative memory examples"); - assert.equal(prompt!.includes("42 tests passed"), true, - "Prompt should explicitly reject test-count snapshots"); - assert.equal(prompt!.includes("commit 4309cb8"), true, - "Prompt should explicitly reject commit-hash snapshots"); -``` - -- [ ] **Step 2: Run the prompt test and verify it fails** - -Run: - -```bash -node --test --experimental-strip-types tests/plugin.test.ts -``` - -Expected: FAIL because the prompt does not yet include these exact example headings/strings. - -- [ ] **Step 3: Update `buildCompactionPrompt(...)` in `src/plugin.ts`** - -Insert the following lines after the existing line: - -```ts -"When unsure, skip it. 
Fewer high-signal memories are better than many low-value ones.", -``` - -Add: - -```ts -"", -"Good memory examples:", -"- [feedback] User prefers architecture reviews in Traditional Chinese.", -"- [decision] Use frozen workspace memory snapshots plus ephemeral hot state for cache stability.", -"- [project] The plugin should piggyback memory extraction on OpenCode compaction and avoid extra LLM calls.", -"- [reference] Workspace memory appears in frozen system[1]; pending memories appear in hot session state until compaction.", -"", -"Bad memory examples to skip:", -"- 42 tests passed.", -"- Wave 2 completed successfully.", -"- Modified 5 files.", -"- commit 4309cb8 contains the latest fix.", -"- TypeError: Cannot read properties of undefined.", -"- Currently running npm test.", -"", -"A memory should still be useful if a new agent opens this workspace next week.", -``` - -Keep the existing candidate format section unchanged: - -```ts -"Memory candidates:", -"- [feedback] content", -"- [project] content", -"- [decision] content", -"- [reference] content", -``` - -- [ ] **Step 4: Run the prompt test** - -Run: - -```bash -node --test --experimental-strip-types tests/plugin.test.ts -``` - -Expected: PASS. - -**Risks and boundaries:** - -- Do not change the candidate output format in this wave. Parser compatibility depends on it. -- Keep examples short. The compaction prompt grows every compaction, so avoid turning this into a long policy document. -- Do not include secrets or real private project details in examples. - -### Wave 3 verification checkpoint - -- [ ] **Step 1: Run prompt and quality tests** - -Run: - -```bash -node --test --experimental-strip-types tests/plugin.test.ts tests/memory-quality-eval.test.ts -``` - -Expected: PASS. - -- [ ] **Step 2: Run full verification** - -Run: - -```bash -npm test -npm run typecheck -``` - -Expected: PASS. 
- -- [ ] **Step 3: Commit Wave 3** - -```bash -git add src/plugin.ts tests/plugin.test.ts -git commit -m "docs: sharpen compaction memory extraction prompt" -``` - ---- - -## Final Verification - -- [ ] **Step 1: Run all tests** - -```bash -npm test -``` - -Expected: all `tests/*.test.ts` pass. - -- [ ] **Step 2: Run typecheck** - -```bash -npm run typecheck -``` - -Expected: TypeScript exits with code 0. - -- [ ] **Step 3: Inspect changed files** - -```bash -git diff --stat -git diff -- src/plugin.ts src/workspace-memory.ts src/promotion-accounting.ts src/extractors.ts tests/plugin.test.ts tests/promotion-accounting.test.ts tests/memory-quality-eval.test.ts -``` - -Expected: - -- `src/plugin.ts` only changes promotion accounting and prompt examples. -- `src/workspace-memory.ts` only exports/reuses identity-key logic. -- `src/promotion-accounting.ts` is pure and has no filesystem or OpenCode client dependency. -- `tests/memory-quality-eval.test.ts` has no live API dependency. - -- [ ] **Step 4: Manual sanity check for no data-loss regression** - -Use the existing tests as the source of truth: - -```bash -node --test --experimental-strip-types tests/plugin.test.ts -``` - -Confirm these existing behaviors still pass: - -- `promotion failure does not clear pending memories in session or journal` -- `session.compacted promotes pending memories to workspace memory and clears pending list` -- `compaction intentionally refreshes frozen system[1] with promoted memories` - -## Test Coverage Diagram - -```text -PROMOTION ACCOUNTING -==================== -[+] src/promotion-accounting.ts - │ - └── accountPendingPromotions() - ├── [★★★ TESTED] exact pending retained after normalization -> promoted -> clear - ├── [★★★ TESTED] exact duplicate represented before/after -> absorbed -> clear - ├── [★★★ TESTED] exact key present before promotion -> absorbed, not promoted -> clear - ├── [★★★ TESTED] same-topic represented after normalization -> absorbed -> clear - └── [★★★ TESTED] no 
equivalent survived -> rejected -> keep pending - -[+] src/plugin.ts promotePendingMemories() - │ - ├── [★★★ TESTED] normal promotion clears pending - ├── [★★★ TESTED] absorbed duplicate clears pending - ├── [★★★ TESTED] cap-rejected memory stays pending - ├── [★★★ TESTED] all rejected with empty clearable set preserves pending journal - └── [★★★ TESTED] update failure keeps session + journal pending - -MEMORY QUALITY -============= -[+] src/extractors.ts parseWorkspaceMemoryCandidates() - │ - ├── [★★★ TESTED] accepts durable user preference - ├── [★★★ TESTED] accepts stable architecture decision - ├── [★★★ TESTED] accepts stable project constraint - ├── [★★★ TESTED] accepts hard to rediscover reference - ├── [★★★ TESTED] accepts short stable config reference - ├── [★★★ TESTED] rejects test/suite counts - ├── [★★★ TESTED] rejects phase/wave progress - ├── [★★★ TESTED] rejects commit hashes - ├── [★★★ TESTED] rejects raw errors - ├── [★★★ TESTED] rejects path-heavy rediscoverable facts - └── [★★★ TESTED] rejects temporary pending task - -COMPACTION PROMPT -================= -[+] src/plugin.ts buildCompactionPrompt() - │ - ├── [★★★ TESTED] preserves ---free formatting rules - ├── [★★★ TESTED] includes candidate output format - ├── [★★★ TESTED] includes positive examples - └── [★★★ TESTED] includes negative examples -``` - -## Risks and Edge Cases - -### Risk: clearing absorbed memories too aggressively - -If `workspaceMemoryIdentityKey(...)` is too broad, pending memory could be cleared even though the surviving workspace memory is not actually equivalent. - -Mitigation: - -- P0 uses only existing narrow topic heuristics and canonical text. -- No semantic similarity is introduced. -- Tests cover exact duplicate and known same-topic behavior only. - -### Risk: rejected memories retry forever - -This is already possible today. This plan intentionally keeps cap-rejected memories pending to avoid data loss. - -Mitigation: - -- Keep this behavior for P0. 
-- P1 should add manual forget/list and better rejected-state visibility. - -### Risk: prompt gets too long - -The prompt examples add roughly 15 short lines. - -Mitigation: - -- Keep examples compact. -- Do not add rationale/topic schema in P0. -- Do not add a long policy section. - -### Risk: offline eval gives false confidence - -The eval verifies parser/gate behavior, not live model behavior. - -Mitigation: - -- Name it an offline eval. -- Use it as a regression guard, not a quality guarantee. -- Future P1/P2 can add live evals if cost is acceptable. - -## Future Work, Not in This Plan - -- Candidate rationale parsing: use the existing `LongTermMemoryEntry.rationale` field and parse `Why: ...` from candidates. -- Topic-aware dedup: add optional model-supplied `topic="..."` without embeddings. -- Manual `memory list` and `memory forget` operations. -- Superseded storage cap for old inactive entries. -- Live LLM eval suite for extraction prompt quality. - -## Completion Criteria - -This plan is complete when: - -- Pending memories have explicit promoted/absorbed/rejected accounting. -- Absorbed duplicates no longer remain stuck in session state or the pending journal. -- Rejected/cap-dropped pending memories remain pending. -- Memory quality fixtures run without live API calls. -- The compaction prompt includes concrete good/bad memory examples. -- `npm test` passes. -- `npm run typecheck` passes. diff --git a/docs/superpowers/plans/2026-04-27-workspace-memory-cache-optimization.md b/docs/superpowers/plans/2026-04-27-workspace-memory-cache-optimization.md deleted file mode 100644 index 687bace..0000000 --- a/docs/superpowers/plans/2026-04-27-workspace-memory-cache-optimization.md +++ /dev/null @@ -1,1230 +0,0 @@ -# Workspace Memory Cache Optimization Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. 
Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Keep OpenCode's cache-controlled system prefix stable during a session by freezing rendered workspace memory and moving dynamic hot state into an uncached ephemeral segment. - -**Architecture:** Split memory context into three layers: `system[0]` base agent header, `system[1]` frozen rendered workspace snapshot, and `system[2+]` dynamic ephemeral state. Mid-session explicit memory writes become pending deltas and are promoted to long-term workspace memory during compaction, so the running prompt remains stable. - -**Tech Stack:** TypeScript ESM, OpenCode plugin hooks, Node `node:test`, OpenCode provider transform cache-control behavior. - ---- - -## Goal - -- Make workspace memory behave like Hermes' frozen snapshot pattern: loaded and rendered once at session start, then immutable for the running session. -- Preserve Claude Code-style cache locality by separating stable facts from dynamic execution state. -- Use OpenCode's existing two cached system-message policy so only stable messages receive cache control. -- Expected outcome: in a typical 10-turn tool-heavy session, the cache-controlled system prefix should remain stable after the first request, improving effective cache reuse from roughly 30-45% to roughly 80-85% of reusable prompt bytes. - -## Background - -Current code injects both workspace memory and hot session state in the same hook: - -- `src/plugin.ts:264-291` registers `experimental.chat.system.transform`. -- `src/plugin.ts:274-275` loads frozen workspace memory through `getFrozenWorkspaceMemory(...)`. -- `src/plugin.ts:277-289` loads and injects hot session state on every chat turn. -- `src/plugin.ts:294-341` updates hot session state after every tool call. - -OpenCode then collapses plugin-added system messages: - -- `/Users/sd_wo/work/opencode-clone/packages/opencode/src/session/llm.ts:114-119` joins all system messages after `system[0]` into `system[1]` when the header is unchanged. 
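The effect of this collapse on cache stability can be modeled roughly as follows. This is a simplified sketch, not OpenCode's code: `collapseSystemMessages` is a hypothetical helper, and the newline separator and message strings are assumptions made for illustration.

```ts
// Simplified model: when the header is unchanged, everything after
// system[0] is joined into a single system[1] message.
function collapseSystemMessages(system: string[], originalHeader: string): string[] {
  if (system.length <= 2 || system[0] !== originalHeader) return system;
  return [originalHeader, system.slice(1).join("\n")];
}

// Two consecutive turns: workspace memory is identical, hot state differs.
const turn1 = collapseSystemMessages(
  ["base header", "workspace memory", "hot state: 1 edit"],
  "base header",
);
const turn2 = collapseSystemMessages(
  ["base header", "workspace memory", "hot state: 2 edits"],
  "base header",
);

// The merged second message differs between turns even though the
// workspace memory portion did not change.
const secondMessageStable = turn1[1] === turn2[1];
```

In this model `secondMessageStable` is false because the hot state text is folded into the second system message, which is the kind of cache churn this plan sets out to remove.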
- -Provider cache control is applied later: - -- `/Users/sd_wo/work/opencode-clone/packages/opencode/src/provider/transform.ts:192-194` selects the first two system messages and last two non-system messages. -- `/Users/sd_wo/work/opencode-clone/packages/opencode/src/provider/transform.ts:217-238` adds provider-specific ephemeral cache-control metadata. -- `/Users/sd_wo/work/opencode-clone/packages/opencode/src/provider/transform.ts:281-295` calls `applyCaching(...)` for Anthropic/Claude-like providers. - -Because hot state is updated after each tool call, and OpenCode currently merges it into `system[1]`, the second cache-controlled system message changes frequently. Workspace memory is less dynamic, but it is cached as a store reference rather than as a rendered prompt, and explicit memory handling can mutate the cached store mid-session: - -- `src/plugin.ts:160-167` stores the frozen workspace cache entry. -- `src/plugin.ts:238-252` returns the same cached store for a session. -- `src/plugin.ts:172-193` processes explicit user memory and currently updates workspace memory plus the cached store. -- `src/workspace-memory.ts:430-464` renders workspace memory. -- `src/workspace-memory.ts:468` uses `Date.now()` to compute age markers, so render output can drift over long sessions even when the store does not change. - -## Proposed Changes - -### Change 1: Frozen Rendered Snapshot - -- **What:** Replace the session cache's mutable workspace-memory `store` usage with a session-frozen rendered snapshot string. The plugin should render workspace memory once per session and reuse that exact string for all later turns. -- **Why:** A rendered prompt string is the actual cache-key input. Freezing the store reference is not enough because rendering can depend on wall-clock time and explicit memory code can mutate the cache. -- **How (code reference):** - - Modify `src/plugin.ts:160-167` so cache entries include `renderedPrompt: string` and `storeLoadedAt: number`. 
- - Replace `getFrozenWorkspaceMemory(...)` at `src/plugin.ts:238-252` with `getFrozenWorkspaceMemorySnapshot(...)` returning `{ store, renderedPrompt }`. - - In `src/plugin.ts:274-284`, push `snapshot.renderedPrompt` instead of re-rendering with `renderWorkspaceMemory(...)` on each turn. - - In `src/plugin.ts:374-379`, use the same frozen rendered prompt when building compaction context. - - Keep `src/workspace-memory.ts:430-464` as the renderer, but call it only during snapshot creation for normal chat turns. -- **Files to modify:** - - `src/plugin.ts` - - `tests/plugin.test.ts` - -### Change 2: Ephemeral System Segment - -- **What:** Keep hot session state in `system[2+]` instead of allowing it to merge into cache-controlled `system[1]`. -- **Why:** OpenCode only applies cache control to the first two system messages. If `system[0]` is the base header and `system[1]` is frozen workspace memory, then `system[2+]` becomes a natural Hermes-like `ephemeral_system_prompt` segment without changing provider transforms. -- **How (code reference):** - - Preserve plugin push order in `src/plugin.ts:280-290`: workspace snapshot first, hot state second. - - Modify `/Users/sd_wo/work/opencode-clone/packages/opencode/src/session/llm.ts:114-119` so it does not join dynamic `system[2+]` messages into `system[1]`. - - Keep `/Users/sd_wo/work/opencode-clone/packages/opencode/src/provider/transform.ts:192-194` unchanged: `slice(0, 2)` should continue to cache only stable system messages. -- **Files to modify:** - - `/Users/sd_wo/work/opencode-clone/packages/opencode/src/session/llm.ts` - - Add or update OpenCode-side tests near the LLM/system-message transformation tests if present in `opencode-clone`. - - `tests/plugin.test.ts` for plugin-side message ordering. - -### Change 3: Pending Delta Promotion - -- **What:** Change explicit user memory handling from immediate workspace-memory mutation to session-local pending deltas, then promote those deltas during compaction. 
-- **Why:** Mid-session writes should update durable state on disk or pending session state, but they must not mutate the running frozen prompt. This matches Hermes' "mid-session writes update disk but do not mutate running prompt" behavior while preserving user intent in the current session through an ephemeral delta. -- **How (code reference):** - - Extend `SessionState` in `src/types.ts:64-72` with `pendingMemories: LongTermMemoryEntry[]`. - - Update `createEmptySessionState(...)`, `loadSessionState(...)`, `updateSessionState(...)`, and `normalizeSessionState(...)` in `src/session-state.ts:14-60` to initialize, normalize, and cap pending memories. - - Update `renderHotSessionState(...)` in `src/session-state.ts:174-208` to include `pending_memory_updates:` in the ephemeral hot state prompt. - - Modify `processLatestUserMessage(...)` in `src/plugin.ts:172-193` so explicit memories are appended to `SessionState.pendingMemories` instead of calling `updateWorkspaceMemory(...)`. - - Modify the compaction event handler at `src/plugin.ts:411-432` so it promotes both parsed compaction candidates and unpromoted `pendingMemories` to workspace memory, then clears the pending list. -- **Files to modify:** - - `src/types.ts` - - `src/session-state.ts` - - `src/plugin.ts` - - `tests/plugin.test.ts` - - `tests/workspace-memory.test.ts` only if memory-entry limit behavior changes. - -### Change 4: Durable Pending Journal (P0) - -- **What:** Add a workspace-level pending journal on disk so explicit memories survive sessions that end without compaction. -- **Why:** `SessionState.pendingMemories` is session-scoped. If the user says "remember X", then closes the session before `session.compacted`, the memory may never be promoted to `workspace-memory.json`. If `session.deleted` removes the session state first, the memory is lost. Explicit memory must be durable even when compaction never happens. 
-- **How (code reference):** - - Add a pending journal file at the workspace memory root, named `workspace-pending-journal.json`. - - Add `src/pending-journal.ts` with helpers: - - `loadPendingJournal(root)` - - `appendPendingMemories(root, memories)` - - `hasPendingJournalEntries(root)` - - `clearPendingMemories(root, promotedKeys)` - - `savePendingJournal(root, journal)` - - Add `workspacePendingJournalPath(root)` to `src/paths.ts` near `workspaceMemoryPath(root)`. - - Modify `processLatestUserMessage(...)` in `src/plugin.ts:172-193` so explicit memory writes go to both: - - `SessionState.pendingMemories`, for same-session visibility through `system[2+]`. - - `workspace-pending-journal.json`, for durability across session end/no compaction. - - Modify `experimental.chat.system.transform` in `src/plugin.ts:264-291` so the first turn of a session attempts promotion-on-start from the journal into `workspace-memory.json` before creating the frozen rendered snapshot. This must run only before the session's first frozen snapshot, not on every turn, so current-session explicit memories do not get promoted into the same session's frozen `system[1]`. - - Modify `session.compacted` handling at `src/plugin.ts:411-432` so it promotes both session pending memories and journal pending memories. - - Modify `session.deleted` handling at `src/plugin.ts:435-442` so it promotes pending memories before deleting session state. -- **Files to modify:** - - Create `src/pending-journal.ts` - - Modify `src/paths.ts` - - Modify `src/plugin.ts` - - Modify `src/session-state.ts` - - Modify `src/types.ts` - - Modify `tests/plugin.test.ts` - -## Cache Impact Estimate - -### Assumptions for a typical 10-turn session - -- 10 model requests, with 9 follow-up turns after tool execution. -- Each follow-up turn has at least one tool call, so `tool.execute.after` updates hot state each time through `src/plugin.ts:294-341`. -- Base system header size: about 10-15 KB, depending on agent/provider prompt. 
-- Workspace memory snapshot size: target 4.2 KB, max 5.2 KB from `src/types.ts:74-80`. -- Hot session state size: 0.3-1.2 KB, max 1.2 KB from `src/types.ts:82-89`. -- A typical tool update changes: - - Active file count/action line: 5-25 changed characters, e.g. `(read, 1x)` to `(read, 2x)`. - - New active file line: 30-90 added characters. - - New error summary line: 80-220 added characters. - - Timestamp-derived ordering can reorder rendered lines without visible timestamps. -- Practical estimate: each tool call changes 50-300 visible characters in hot state; a failed `bash` command can change 150-500 visible characters. - -### Cache-control placement - -For Anthropic/Claude-like providers, OpenCode applies cache control here: - -```ts -// /Users/sd_wo/work/opencode-clone/packages/opencode/src/provider/transform.ts:192-194 -const system = msgs.filter((msg) => msg.role === "system").slice(0, 2) -const final = msgs.filter((msg) => msg.role !== "system").slice(-2) -``` - -Then the selected messages receive provider-specific cache-control metadata at `transform.ts:217-238`. - -### Before - -Prompt layering effectively becomes: - -```text -system[0] = base header cached, stable -system[1] = workspace memory + hot session state cached, changes after tool calls -final[-2:] = latest non-system messages cached, changes every turn -``` - -Estimated cacheability across 10 turns: - -- System[0] cached: ~95-100% after first request. -- System[1] cached: ~10-25% because hot state changes on most turns. -- Last two non-system messages cached: ~0-20% because each turn appends new assistant/tool/user content. 
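The selection rule quoted under "Cache-control placement" can be modeled in isolation. The sketch below is illustrative only: `Msg` and `selectCacheControlled` are hypothetical names, and the message shape is simplified compared with OpenCode's real message type. It shows that a third system message is never picked up by `slice(0, 2)`, which is why merging hot state into `system[1]` is what exposes it to cache control.

```typescript
// Simplified message shape (assumption); the real OpenCode type carries more fields.
type Msg = { role: "system" | "user" | "assistant" | "tool"; content: string };

// Mirrors the quoted selection: the first two system messages plus the last
// two non-system messages are the only cache-control candidates.
function selectCacheControlled(msgs: Msg[]): Msg[] {
  const system = msgs.filter((msg) => msg.role === "system").slice(0, 2);
  const final = msgs.filter((msg) => msg.role !== "system").slice(-2);
  return [...system, ...final];
}

const layered: Msg[] = [
  { role: "system", content: "base header" },
  { role: "system", content: "workspace memory snapshot" },
  { role: "system", content: "hot session state" },
  { role: "user", content: "first question" },
  { role: "assistant", content: "first answer" },
  { role: "user", content: "follow-up" },
];

// Only the contents are kept here to make the result easy to inspect.
const selected = selectCacheControlled(layered).map((msg) => msg.content);
```

With three separate system messages, `selected` contains the header, the workspace snapshot, and the two most recent non-system messages; the hot-state message is left without cache-control metadata.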
- -Byte-weighted estimate: - -```text -Stable bytes per request (before): - system[0] = 12 KB reusable - system[1] = 4.2 KB workspace + 0.8 KB hot = 5 KB, but invalidated on ~9/10 turns - -Reusable cached system bytes after warm-up: - system[0] reuse = 12 KB * 9 turns = 108 KB - system[1] reuse = 5 KB * 2 likely stable turns = 10 KB - total potential system bytes = 17 KB * 9 turns = 153 KB - -Effective cached system-prefix reuse = (108 + 10) / 153 = 77% -``` - -However, provider cache behavior often invalidates later prefix segments when an earlier cached block's content changes or when the cache breakpoints move. Because `system[1]` is one of the explicit cache-control breakpoints, the cache benefit observed in practice is expected to be lower than the byte-only estimate: - -- System[0] cached: ~95-100%. -- System[1] cached: ~10-25%. -- Overall effective cache hit rate: ~30-45% for the cache-controlled prompt sections in tool-heavy sessions. - -### After - -Prompt layering becomes: - -```text -system[0] = base header cached, stable -system[1] = frozen rendered workspace memory snapshot cached, stable -system[2] = hot session state + pending memory deltas uncached ephemeral -final[-2:] = latest non-system messages cached, changes every turn -``` - -Estimated cacheability across 10 turns: - -- System[0] cached: ~100% after first request. -- System[1] cached: ~90-100% within a session. Use ~90% to account for new sessions, explicit session restarts, compaction boundaries, and provider-side eviction. -- System[2] uncached: N/A by design; it is not selected by `slice(0, 2)`. -- Last two non-system messages cached: ~0-20% because the conversation tail remains dynamic. 
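The byte-weighted arithmetic used in the "Before" estimate can be reproduced with a small helper. This is only a sketch of the plan's own back-of-envelope model: `Segment` and `reuseRatio` are hypothetical names, and the inputs are the plan's assumed sizes and turn counts, not measurements.

```typescript
// One cache-controlled system segment: its size and how many follow-up
// turns are assumed to reuse it from cache.
type Segment = { sizeKB: number; reusedTurns: number };

// Ratio of reused bytes to the bytes that could have been reused if every
// segment had stayed stable for all follow-up turns.
function reuseRatio(segments: Segment[], followUpTurns: number): number {
  const reused = segments.reduce((sum, s) => sum + s.sizeKB * s.reusedTurns, 0);
  const potential = segments.reduce((sum, s) => sum + s.sizeKB * followUpTurns, 0);
  return reused / potential;
}

// "Before" layout: system[0] is 12 KB and stable for all 9 follow-up turns;
// system[1] is 5 KB (4.2 KB workspace + 0.8 KB hot state) and stable for ~2.
const before = reuseRatio(
  [
    { sizeKB: 12, reusedTurns: 9 },
    { sizeKB: 5, reusedTurns: 2 },
  ],
  9,
);
```

Here `before` equals 118 / 153, which rounds to the 77% quoted in the estimate. The same helper can be fed a different segment list to check other layouts under the same assumptions.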
- -Byte-weighted estimate: - -```text -Stable cached prefix after per request: - system[0] = 12 KB - system[1] = 4.2 KB workspace snapshot - system[2] = 0.8 KB hot state, intentionally uncached - -Reusable cached system bytes after warm-up: - system[0] reuse = 12 KB * 9 turns = 108 KB - system[1] reuse = 4.2 KB * 8.5 effective turns = 35.7 KB - total stable cacheable system bytes = 16.2 KB * 9 turns = 145.8 KB - -Effective cached stable-prefix reuse = (108 + 35.7) / 145.8 = 98.6% -``` - -Including dynamic tail messages and provider eviction, a conservative end-to-end estimate is: - -- Expected cache hit rate for stable system-prefix bytes: ~95-99% after warm-up. -- Expected overall cache hit rate across cache-controlled sections: ~80-85% in a 10-turn tool-heavy session. -- Expected improvement versus current behavior: +35 to +50 percentage points, mainly by preventing hot state from invalidating `system[1]`. - -## Timeline - -- Phase 1: Implement frozen rendered snapshot inside the plugin and add tests proving workspace memory render output is stable. All tests must pass at the end of this phase. -- Phase 2: Modify OpenCode system-message merging so `system[2+]` remains separate and ephemeral; verify cache-control still targets only `system[0]` and `system[1]`. -- Phase 3: Add durable pending journal, pending memory deltas, promotion-on-compaction, promotion-on-start, and promotion-before-delete. All explicit memory durability tests must pass at the end of this phase. - -## Risk / Tradeoffs - -- **Requires OpenCode core change:** The biggest cache win depends on changing `/Users/sd_wo/work/opencode-clone/packages/opencode/src/session/llm.ts:114-119`. Without that change, plugin-side ordering alone cannot prevent hot state from merging into `system[1]`. -- **Explicit memory becomes eventually consistent:** A user saying "remember X" will no longer alter the stable workspace snapshot immediately. 
The current session still sees the pending delta through `system[2+]`; future sessions see it after compaction promotion. -- **Compaction must not lose pending deltas:** Promotion logic needs tests for sessions with pending explicit memories but no parsed compaction candidates. -- **No-compaction sessions must not lose explicit memory:** `workspace-pending-journal.json` is required because many real sessions will end without `session.compacted`. -- **Durable pending journal adds disk I/O:** Explicit memory writes now touch session state and a workspace-level journal. This is acceptable because explicit memory events are rare compared with tool calls, but tests should cover corrupted/missing journal fallback. -- **Journal schema migration:** `workspace-pending-journal.json` needs the same boring normalization discipline as `workspace-memory.json`: tolerate missing fields, unknown versions, duplicate entries, and partial/corrupt files by falling back safely. -- **More session-state schema surface:** Adding `pendingMemories` increases normalization and migration responsibility, but this is contained in `src/session-state.ts`. -- **Provider-specific cache semantics vary:** The estimate is most applicable to Anthropic/Claude-like providers because OpenCode applies cache control to them in `transform.ts:281-295`. - -## Required Edge Cases - -- **No compaction, new session:** Explicit memory written in session A must survive in `workspace-pending-journal.json` and be promoted before session B freezes its workspace snapshot. -- **Session deleted:** `session.deleted` must promote pending memories before deleting the session state file. If promotion fails, do not delete the session state. -- **Duplicate explicit memory:** Dedupe by normalized `type + text`, not generated `id`, because `extractExplicitMemories(...)` creates a fresh id for each extraction. 
-- **Promotion failure:** If `updateWorkspaceMemory(...)` fails, leave both `SessionState.pendingMemories` and `workspace-pending-journal.json` intact. -- **Pending memory render cap:** Render at most `HOT_STATE_LIMITS.maxPendingMemoriesRendered` entries and keep total hot prompt within `HOT_STATE_LIMITS.maxRenderedChars`. -- **Oversized workspace memory:** Frozen rendered snapshot must still respect `LONG_TERM_LIMITS.maxRenderedChars` through `renderWorkspaceMemory(...)`. - ---- - -## File Structure - -- `src/plugin.ts`: Owns plugin hooks, frozen snapshot cache, explicit memory processing, compaction promotion, and injection order. -- `src/pending-journal.ts`: Owns durable workspace-level pending memories in `workspace-pending-journal.json`, including append, dedupe, promotion, clearing, and corrupt-file fallback. -- `src/paths.ts`: Owns path helpers for `workspace-memory.json`, session state, and `workspace-pending-journal.json`. -- `src/session-state.ts`: Owns hot state persistence and rendering, including pending memory deltas. -- `src/types.ts`: Owns the `SessionState` schema and limits for hot state and pending memories. -- `tests/plugin.test.ts`: Covers plugin hooks, frozen snapshot behavior, pending delta behavior, and compaction promotion. -- `/Users/sd_wo/work/opencode-clone/packages/opencode/src/session/llm.ts`: Owns system-message structure before provider transform. -- `/Users/sd_wo/work/opencode-clone/packages/opencode/src/provider/transform.ts`: No planned change; keep cache-control selection as first two system messages plus final two non-system messages. 
- ---- - -## Wave 1 — Freeze Rendered Workspace Snapshot - -### Task 1: Add frozen rendered snapshot tests - -**Files:** -- Modify: `tests/plugin.test.ts` - -- [ ] **Step 1: Add a test proving workspace memory render output is reused within the same session** - -Append this test to `tests/plugin.test.ts`: - -```ts -test("chat system transform reuses frozen rendered workspace snapshot", async () => { - const tmpDir = await mkdtemp(join(tmpdir(), "memory-plugin-test-")); - - try { - const client = mockRootClient(); - const plugin = await MemoryV2Plugin({ directory: tmpDir, client }); - - const output1 = { system: ["base header"] }; - await (plugin as Record)["experimental.chat.system.transform"]( - { sessionID: "snapshot-session", model: {} }, - output1, - ); - - const firstWorkspacePrompt = output1.system.find((part: string) => - part.startsWith("Workspace memory") - ); - - assert.equal(firstWorkspacePrompt, undefined, - "empty workspace memory should not render a prompt before any memories exist"); - - const output2 = { system: ["base header"] }; - await (plugin as Record)["experimental.chat.system.transform"]( - { sessionID: "snapshot-session", model: {} }, - output2, - ); - - assert.deepEqual(output2.system, ["base header"], - "no compaction summary means no workspace memory prompt is added"); - } finally { - await rm(tmpDir, { recursive: true, force: true }); - } -}); -``` - -- [ ] **Step 2: Run the focused test file and verify current behavior** - -Run: - -```bash -npm test -- tests/plugin.test.ts -``` - -Expected: existing tests pass. Wave 1 must not add pending-memory tests yet; pending-memory behavior belongs to Wave 3 so every wave remains green and committable. 
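The property Wave 1 is after can be illustrated without the plugin: once a session has a rendered snapshot, later turns get the identical string even if the underlying store changes. The sketch below is a standalone model, not the plugin code; `Store`, `renderStore`, `snapshotCache`, and `frozenSnapshot` are hypothetical names, and the renderer is deliberately simplified.

```typescript
type Store = { entries: string[] };

// Hypothetical renderer: output depends on the store contents at call time.
function renderStore(store: Store): string {
  if (store.entries.length === 0) return "";
  return `Workspace memory:\n${store.entries.map((entry) => `- ${entry}`).join("\n")}`;
}

// One rendered string per session ID; entries are never re-rendered.
const snapshotCache = new Map<string, string>();

function frozenSnapshot(sessionID: string, store: Store): string {
  const cached = snapshotCache.get(sessionID);
  if (cached !== undefined) return cached;
  const rendered = renderStore(store);
  snapshotCache.set(sessionID, rendered);
  return rendered;
}

const store: Store = { entries: ["[project] strict TypeScript"] };
const firstTurn = frozenSnapshot("s1", store);

// A mid-session write changes the store but not the frozen string.
store.entries.push("[decision] added mid-session");
const secondTurn = frozenSnapshot("s1", store);

// A new session renders fresh and sees the new entry.
const nextSession = frozenSnapshot("s2", store);
```

Caching the string rather than the store is the point of the design: the string is what the provider sees, so it is the thing that has to stay byte-identical across turns.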
- -### Task 2: Implement frozen rendered snapshot cache - -**Files:** -- Modify: `src/plugin.ts:160-167` -- Modify: `src/plugin.ts:238-252` -- Modify: `src/plugin.ts:274-284` -- Modify: `src/plugin.ts:374-379` - -- [ ] **Step 1: Change the frozen cache entry shape** - -Replace the cache type at `src/plugin.ts:160-167` with: - -```ts - // Cache for frozen workspace memory per session - const frozenWorkspaceMemoryCache = new Map< - string, - { - store: Awaited<ReturnType<typeof loadWorkspaceMemory>>; - renderedPrompt: string; - loadedAt: number; - } - >(); -``` - -- [ ] **Step 2: Replace the loader with a rendered snapshot loader** - -Replace `getFrozenWorkspaceMemory(...)` at `src/plugin.ts:238-252` with: - -```ts - async function getFrozenWorkspaceMemorySnapshot( - root: string, - sessionID: string - ): Promise<{ - store: Awaited<ReturnType<typeof loadWorkspaceMemory>>; - renderedPrompt: string; - }> { - const now = Date.now(); - const cached = frozenWorkspaceMemoryCache.get(sessionID); - - // Cache is valid for the session lifetime. - if (cached) { - return { store: cached.store, renderedPrompt: cached.renderedPrompt }; - } - - const store = await loadWorkspaceMemory(root); - const renderedPrompt = renderWorkspaceMemory(store); - frozenWorkspaceMemoryCache.set(sessionID, { store, renderedPrompt, loadedAt: now }); - return { store, renderedPrompt }; - } -``` - -- [ ] **Step 3: Update chat system transform to use the rendered snapshot** - -Replace `src/plugin.ts:274-284` with: - -```ts - // Get frozen workspace memory snapshot (loaded and rendered once per session) - const workspaceSnapshot = await getFrozenWorkspaceMemorySnapshot(directory, sessionID); - - // Get current hot session state - const sessionState = await loadSessionState(directory, sessionID); - - // Inject frozen workspace memory snapshot - if (workspaceSnapshot.renderedPrompt) { - output.system.push(workspaceSnapshot.renderedPrompt); - } -``` - -- [ ] **Step 4: Update compaction context to use the frozen rendered prompt** - -Replace `src/plugin.ts:374-379` with: - -```ts - 
const workspaceSnapshot = await getFrozenWorkspaceMemorySnapshot(directory, sessionID); - if (workspaceSnapshot.renderedPrompt) { - contextParts.push(workspaceSnapshot.renderedPrompt); - } -``` - -- [ ] **Step 5: Rename remaining references** - -Run: - -```bash -rg "getFrozenWorkspaceMemory\(" src/plugin.ts -``` - -Expected: no matches. - -- [ ] **Step 6: Run typecheck** - -Run: - -```bash -npm run typecheck -``` - -Expected: PASS. If TypeScript reports missing `getFrozenWorkspaceMemory`, update any missed call to `getFrozenWorkspaceMemorySnapshot`. - -### Wave 1 verification checkpoint - -- [ ] **Step 1: Run test suite** - -Run: - -```bash -npm test -``` - -Expected: PASS. Wave 1 must end with a green test suite. - -- [ ] **Step 2: Commit wave after tests pass** - -```bash -git add src/plugin.ts tests/plugin.test.ts -git commit -m "feat: freeze rendered workspace memory snapshot" -``` - ---- - -## Wave 2 — Preserve Ephemeral System Segments in OpenCode - -### Task 3: Change OpenCode system message merge behavior - -**Files:** -- Modify: `/Users/sd_wo/work/opencode-clone/packages/opencode/src/session/llm.ts:114-119` - -- [ ] **Step 1: Replace the merge logic** - -In `/Users/sd_wo/work/opencode-clone/packages/opencode/src/session/llm.ts`, replace lines `114-119` with: - -```ts - // Preserve cache locality: - // - system[0] is the stable provider/agent header. - // - system[1] is the stable plugin snapshot, if present. - // - system[2+] is dynamic ephemeral context and must not be merged into system[1]. 
- if (system.length > 2 && system[0] === header) { - const stableSnapshot = system[1] - const ephemeral = system.slice(2) - system.length = 0 - system.push(header) - if (stableSnapshot) system.push(stableSnapshot) - system.push(...ephemeral) - } -``` - -- [ ] **Step 2: Add or update a focused OpenCode test** - -Search for existing LLM/session tests: - -```bash -cd /Users/sd_wo/work/opencode-clone -rg "rejoin to maintain 2-part structure|experimental.chat.system.transform|system\[1\]" packages/opencode/test packages/opencode/src -g "*test*" -g "*.ts" -``` - -If an existing test harness can instantiate the LLM path, add a test asserting this final system layout: - -```ts -assert.deepEqual(system, [ - "base header", - "Workspace memory (cross-session, verify if stale):\nproject:\n- stable fact", - "Hot session state (current session):\nactive_files:\n- src/plugin.ts (read, 2x)", -]); -``` - -If no focused harness exists, create the smallest unit around the extracted merge helper in the same package. Extract the merge block to a local helper named `preserveEphemeralSystemSegments(system: string[], header: string): void` in `llm.ts`, export it only if the package's test pattern requires exports. - -- [ ] **Step 3: Verify provider transform remains unchanged** - -Run: - -```bash -sed -n '192,241p' /Users/sd_wo/work/opencode-clone/packages/opencode/src/provider/transform.ts -``` - -Expected: `const system = msgs.filter((msg) => msg.role === "system").slice(0, 2)` remains unchanged, so `system[2+]` is not cache-controlled. - -- [ ] **Step 4: Run OpenCode package checks** - -Run the appropriate package checks from `/Users/sd_wo/work/opencode-clone`. If the repository uses Bun, run: - -```bash -cd /Users/sd_wo/work/opencode-clone -bun test packages/opencode -``` - -If that command is not available, run the package's documented test command from its `package.json` and record the command/output in the implementation notes. - -Expected: PASS. 
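If the merge block is extracted into the `preserveEphemeralSystemSegments(system: string[], header: string): void` helper suggested in Step 2, it could look roughly like the sketch below. The body is the same logic as the replacement block in Step 1, wrapped in a function so it can be unit-tested without the LLM path; the sample strings are the ones from the layout assertion above, and nothing here is confirmed OpenCode code.

```typescript
// Keeps system[0] (header) and system[1] (stable snapshot) in place and
// leaves system[2+] as separate, dynamic messages instead of joining them
// into system[1]. Mutates the array in place.
function preserveEphemeralSystemSegments(system: string[], header: string): void {
  if (system.length > 2 && system[0] === header) {
    const stableSnapshot = system[1];
    const ephemeral = system.slice(2);
    system.length = 0;
    system.push(header);
    if (stableSnapshot) system.push(stableSnapshot);
    system.push(...ephemeral);
  }
}

const system = [
  "base header",
  "Workspace memory (cross-session, verify if stale):\nproject:\n- stable fact",
  "Hot session state (current session):\nactive_files:\n- src/plugin.ts (read, 2x)",
];
preserveEphemeralSystemSegments(system, "base header");
```

After the call, `system` still has three entries in the same order, so the provider transform's `slice(0, 2)` selects only the header and the workspace snapshot.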
- -### Wave 2 verification checkpoint - -- [ ] **Step 1: Verify cache-control targets only stable messages** - -Confirm these two facts in code review: - -- `llm.ts` preserves `system[2+]` as separate messages. -- `transform.ts:192-194` still selects only `system.slice(0, 2)` for cache control. - -- [ ] **Step 2: Commit OpenCode wave** - -```bash -cd /Users/sd_wo/work/opencode-clone -git add packages/opencode/src/session/llm.ts -git commit -m "feat: preserve ephemeral system prompt segments" -``` - ---- - -## Wave 3 — Durable Pending Journal and Promotion - -### Task 4: Add durable workspace pending journal - -**Files:** -- Create: `src/pending-journal.ts` -- Modify: `src/paths.ts` -- Test: `tests/plugin.test.ts` - -- [ ] **Step 1: Add pending journal path helper** - -In `src/paths.ts`, add this helper near `workspaceMemoryPath(root)`: - -```ts -export async function workspacePendingJournalPath(root: string): Promise<string> { - return join(await memoryRoot(root), "workspace-pending-journal.json"); -} -``` - -- [ ] **Step 2: Create the pending journal module** - -Create `src/pending-journal.ts`: - -```ts -import { workspacePendingJournalPath } from "./paths.ts"; -import { atomicWriteJSON, readJSON, updateJSON } from "./storage.ts"; -import type { LongTermMemoryEntry } from "./types.ts"; - -export type PendingJournal = { - version: 1; - entries: LongTermMemoryEntry[]; - updatedAt: string; -}; - -function emptyPendingJournal(): PendingJournal { - return { version: 1, entries: [], updatedAt: new Date().toISOString() }; -} - -export function memoryKey(memory: Pick<LongTermMemoryEntry, "type" | "text">): string { - return `${memory.type}:${memory.text.toLowerCase().replace(/\s+/g, " ").trim()}`; -} - -function normalizeJournal(input: Partial<PendingJournal>): PendingJournal { - return { - version: 1, - entries: Array.isArray(input.entries) ? input.entries : [], - updatedAt: input.updatedAt ?? 
new Date().toISOString(), - }; -} - -export async function loadPendingJournal(root: string): Promise<PendingJournal> { - return normalizeJournal(await readJSON(await workspacePendingJournalPath(root), emptyPendingJournal)); -} - -export async function appendPendingMemories(root: string, memories: LongTermMemoryEntry[]): Promise<PendingJournal> { - const path = await workspacePendingJournalPath(root); - return updateJSON(path, emptyPendingJournal, current => { - const journal = normalizeJournal(current); - const existing = new Set(journal.entries.map(memoryKey)); - for (const memory of memories) { - const key = memoryKey(memory); - if (!existing.has(key)) { - journal.entries.push(memory); - existing.add(key); - } - } - journal.updatedAt = new Date().toISOString(); - return journal; - }); -} - -export async function hasPendingJournalEntries(root: string): Promise<boolean> { - const journal = await loadPendingJournal(root); - return journal.entries.length > 0; -} - -export async function clearPendingMemories(root: string, promotedKeys: Set<string>): Promise<PendingJournal> { - const path = await workspacePendingJournalPath(root); - return updateJSON(path, emptyPendingJournal, current => { - const journal = normalizeJournal(current); - journal.entries = journal.entries.filter(memory => !promotedKeys.has(memoryKey(memory))); - journal.updatedAt = new Date().toISOString(); - return journal; - }); -} - -export async function savePendingJournal(root: string, journal: PendingJournal): Promise<void> { - await atomicWriteJSON(await workspacePendingJournalPath(root), normalizeJournal(journal)); -} -``` - -- [ ] **Step 3: Add journal unit coverage through plugin tests** - -Add tests later in Task 7 for no-compaction, session-deleted, duplicate explicit memory, and promotion failure. Do not add behavior tests here until session state and promotion paths exist. 
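The dedupe rule behind `memoryKey(...)` can be checked in isolation before any plugin wiring exists. The sketch below copies the normalization from the module above into a standalone snippet; `MemoryLike` and `dedupeByKey` are hypothetical names introduced only for this illustration, and no file I/O is involved.

```typescript
type MemoryLike = { type: string; text: string };

// Same normalization as memoryKey in the journal module: lowercase the text,
// collapse whitespace runs to one space, trim, and prefix with the type.
function memoryKey(memory: MemoryLike): string {
  return `${memory.type}:${memory.text.toLowerCase().replace(/\s+/g, " ").trim()}`;
}

// Appends only those incoming memories whose key is not already present.
function dedupeByKey<T extends MemoryLike>(existing: T[], incoming: T[]): T[] {
  const seen = new Set(existing.map((memory) => memoryKey(memory)));
  const merged = [...existing];
  for (const memory of incoming) {
    const key = memoryKey(memory);
    if (!seen.has(key)) {
      merged.push(memory);
      seen.add(key);
    }
  }
  return merged;
}

const merged = dedupeByKey(
  [{ type: "decision", text: "Prefer stable cache boundaries." }],
  [
    // Same type, same text after normalization: treated as a duplicate.
    { type: "decision", text: "  prefer   stable cache boundaries. " },
    // Same text but a different type: kept as a separate memory.
    { type: "project", text: "Prefer stable cache boundaries." },
  ],
);
```

Because the key ignores the generated `id`, re-extracting the same sentence in a later turn does not create a second journal entry, while the same sentence filed under a different type still does.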
- -### Task 5: Extend session state with pending memories - -**Files:** -- Modify: `src/types.ts:64-72` -- Modify: `src/session-state.ts:14-60` -- Modify: `src/session-state.ts:174-208` - -- [ ] **Step 1: Extend the `SessionState` type** - -In `src/types.ts`, change `SessionState` to: - -```ts -export type SessionState = { - version: 1; - sessionID: string; - turn: number; - updatedAt: string; - activeFiles: ActiveFile[]; - openErrors: OpenError[]; - recentDecisions: SessionDecision[]; - pendingMemories: LongTermMemoryEntry[]; -}; -``` - -Add this limit to `HOT_STATE_LIMITS`: - -```ts - maxPendingMemoriesStored: 12, - maxPendingMemoriesRendered: 6, -``` - -- [ ] **Step 2: Initialize pending memories** - -In `src/session-state.ts`, update `createEmptySessionState(...)` to include: - -```ts - pendingMemories: [], -``` - -- [ ] **Step 3: Normalize pending memories on load/update** - -In both `loadSessionState(...)` and the `updateJSON(...)` callback inside `updateSessionState(...)`, add: - -```ts - loaded.pendingMemories = Array.isArray(loaded.pendingMemories) ? loaded.pendingMemories : []; -``` - -and: - -```ts - current.pendingMemories = Array.isArray(current.pendingMemories) ? 
current.pendingMemories : []; -``` - -In `normalizeSessionState(...)`, add: - -```ts - state.pendingMemories = state.pendingMemories.slice(-HOT_STATE_LIMITS.maxPendingMemoriesStored); -``` - -- [ ] **Step 4: Render pending memories as ephemeral hot state** - -In `renderHotSessionState(...)`, add this after the `decisions` variable: - -```ts - const pendingMemories = state.pendingMemories.slice(-HOT_STATE_LIMITS.maxPendingMemoriesRendered); -``` - -Change the empty check to: - -```ts - if ( - activeFiles.length === 0 && - openErrors.length === 0 && - decisions.length === 0 && - pendingMemories.length === 0 - ) return ""; -``` - -Add this block before the final return: - -```ts - if (pendingMemories.length > 0) { - lines.push("pending_memory_updates:"); - for (const memory of pendingMemories) { - lines.push(`- [${memory.type}] ${memory.text}`); - } - } -``` - -- [ ] **Step 5: Update existing test fixtures** - -In `tests/plugin.test.ts`, update every inline `SessionState` fixture to include: - -```ts -pendingMemories: [], -``` - -This includes `createSessionWithError(...)` at `tests/plugin.test.ts:21-31` and the compaction fixture at `tests/plugin.test.ts:206-214`. - -- [ ] **Step 6: Run typecheck** - -Run: - -```bash -npm run typecheck -``` - -Expected: PASS after all fixtures include `pendingMemories`. 
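The normalization and render caps from this task interact in a way that is easy to get wrong, so a standalone model may help. The sketch below is not the `src/session-state.ts` code: `PendingMemory`, `LIMITS`, `normalizePending`, and `renderPending` are hypothetical stand-ins that reuse the limit values and the `pending_memory_updates:` label given above.

```typescript
type PendingMemory = { type: string; text: string };

// Limit values copied from the HOT_STATE_LIMITS additions in this task.
const LIMITS = { maxPendingMemoriesStored: 12, maxPendingMemoriesRendered: 6 };

// Tolerates state files written before the field existed, then keeps only
// the most recent entries up to the storage cap.
function normalizePending(value: unknown): PendingMemory[] {
  const list = Array.isArray(value) ? (value as PendingMemory[]) : [];
  return list.slice(-LIMITS.maxPendingMemoriesStored);
}

// Renders at most the newest maxPendingMemoriesRendered entries; returns no
// lines at all when nothing is pending.
function renderPending(pending: PendingMemory[]): string[] {
  const visible = pending.slice(-LIMITS.maxPendingMemoriesRendered);
  if (visible.length === 0) return [];
  return ["pending_memory_updates:", ...visible.map((m) => `- [${m.type}] ${m.text}`)];
}

const legacy = normalizePending(undefined);
const many = normalizePending(
  Array.from({ length: 20 }, (_, i) => ({ type: "decision", text: `item ${i}` })),
);
const lines = renderPending(many);
```

With 20 inputs, storage keeps items 8 through 19 and rendering shows items 14 through 19, so the oldest pending memories drop out of the prompt first while still being stored until the storage cap evicts them.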
- -### Task 6: Store explicit memories as pending deltas and durable journal entries - -**Files:** -- Modify: `src/plugin.ts:172-193` -- Modify: `src/pending-journal.ts` - -- [ ] **Step 1: Replace immediate workspace-memory update in `processLatestUserMessage(...)`** - -Replace `src/plugin.ts:180-193` with: - -```ts - if (memories.length > 0) { - await updateSessionState(directory, sessionID, state => { - const existingKeys = new Set(state.pendingMemories.map(memoryKey)); - for (const memory of memories) { - const key = memoryKey(memory); - if (!existingKeys.has(key)) { - state.pendingMemories.push(memory); - existingKeys.add(key); - } - } - return state; - }); - - await appendPendingMemories(directory, memories); - } -``` - -Add imports at the top of `src/plugin.ts`: - -```ts -import { - appendPendingMemories, - clearPendingMemories, - hasPendingJournalEntries, - loadPendingJournal, - memoryKey, -} from "./pending-journal.ts"; -``` - -Keep the decisions block at `src/plugin.ts:195-204`, but ensure it still runs after pending memories are recorded. - -- [ ] **Step 2: Confirm frozen cache is no longer mutated by explicit memory** - -Run: - -```bash -rg "cached\.store|Update frozen cache|workspaceMemory = await updateWorkspaceMemory" src/plugin.ts -``` - -Expected: no matches for explicit-memory cache mutation. `updateWorkspaceMemory(...)` should still exist in the `session.compacted` event handler. - -- [ ] **Step 3: Run tests** - -Run: - -```bash -npm test -``` - -Expected: PASS for the current suite. Pending-memory behavior tests are added in Task 7 after the journal and session-state plumbing exists. 
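Since explicit memories now live in two pending locations, the ordering of promotion and clearing matters for the "Promotion failure" edge case. The sketch below models that ordering only; `Memory`, `Pending`, and `promotePending` are hypothetical names, and the `promote` callback merely stands in for the awaited `updateWorkspaceMemory(...)` call rather than reproducing it.

```typescript
type Memory = { type: string; text: string };
type Pending = { session: Memory[]; journal: Memory[] };

// Promotes everything that is pending, and clears the pending lists only
// after the durable write has succeeded. Returns false when promotion
// failed, in which case both lists are left untouched for a later retry.
async function promotePending(
  pending: Pending,
  promote: (memories: Memory[]) => Promise<void>,
): Promise<boolean> {
  const batch = [...pending.session, ...pending.journal];
  if (batch.length === 0) return true;
  try {
    await promote(batch);
  } catch {
    // Promotion failed: keep session and journal entries intact.
    return false;
  }
  pending.session = [];
  pending.journal = [];
  return true;
}
```

A caller such as the `session.deleted` handler could use the boolean to decide whether deleting the session state is safe. The sketch omits key-based dedupe between the two lists; in the plan that role belongs to `memoryKey(...)`.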
- -### Task 7: Promote pending deltas during compaction, session start, and delete - -**Files:** -- Modify: `src/plugin.ts:411-432` -- Modify: `src/plugin.ts:264-291` -- Modify: `src/plugin.ts:435-442` -- Modify: `tests/plugin.test.ts` - -- [ ] **Step 1: Add helper for explicit-memory client messages** - -Append this helper to `tests/plugin.test.ts`: - -```ts -function mockClientWithLatestUser(text: string, id = "msg-explicit-1") { - return { - session: { - get: async () => ({ data: { parentID: null } }), - messages: async () => ({ - data: [ - { - id, - role: "user", - parts: [{ type: "text", text }], - }, - ], - }), - }, - }; -} -``` - -- [ ] **Step 2: Add failing test for same-session pending visibility without workspace mutation** - -Append this test to `tests/plugin.test.ts`: - -```ts -test("explicit memory is pending and does not mutate frozen workspace prompt", async () => { - const tmpDir = await mkdtemp(join(tmpdir(), "memory-plugin-test-")); - - try { - const client = mockClientWithLatestUser("remember: Use SQLite snapshots for workspace memory."); - const plugin = await MemoryV2Plugin({ directory: tmpDir, client }); - - const first = { system: ["base header"] }; - await (plugin as Record)["experimental.chat.system.transform"]( - { sessionID: "explicit-session", model: {} }, - first, - ); - - const second = { system: ["base header"] }; - await (plugin as Record)["experimental.chat.system.transform"]( - { sessionID: "explicit-session", model: {} }, - second, - ); - - const workspacePrompts = second.system.filter((part: string) => part.startsWith("Workspace memory")); - const hotPrompts = second.system.filter((part: string) => part.startsWith("Hot session state")); - - assert.equal(workspacePrompts.length, 0, - "explicit memory must not appear in the frozen workspace prompt during the same session"); - assert.equal(hotPrompts.length, 1, - "explicit memory should be visible through the ephemeral hot-state prompt"); - assert.match(hotPrompts[0], 
/pending_memory_updates:/); - assert.match(hotPrompts[0], /Use SQLite snapshots for workspace memory/); - } finally { - await rm(tmpDir, { recursive: true, force: true }); - } -}); -``` - -- [ ] **Step 3: Add failing test for no-compaction new-session durability** - -Append this test to `tests/plugin.test.ts`: - -```ts -test("no compaction: explicit memory is promoted on next session start from durable journal", async () => { - const tmpDir = await mkdtemp(join(tmpdir(), "memory-plugin-test-")); - - try { - const firstClient = mockClientWithLatestUser("remember: Prefer boring cache boundaries.", "msg-remember-1"); - const firstPlugin = await MemoryV2Plugin({ directory: tmpDir, client: firstClient }); - - await (firstPlugin as Record)["experimental.chat.system.transform"]( - { sessionID: "session-without-compaction", model: {} }, - { system: ["base header"] }, - ); - - const secondClient = mockRootClient(); - const secondPlugin = await MemoryV2Plugin({ directory: tmpDir, client: secondClient }); - const output = { system: ["base header"] }; - - await (secondPlugin as Record)["experimental.chat.system.transform"]( - { sessionID: "new-session", model: {} }, - output, - ); - - const workspacePrompt = output.system.find((part: string) => part.startsWith("Workspace memory")); - assert.match(workspacePrompt ?? 
"", /Prefer boring cache boundaries/); - } finally { - await rm(tmpDir, { recursive: true, force: true }); - } -}); -``` - -- [ ] **Step 4: Add failing test for session delete durability** - -Append this test to `tests/plugin.test.ts`: - -```ts -test("session.deleted promotes pending memories before deleting session state", async () => { - const tmpDir = await mkdtemp(join(tmpdir(), "memory-plugin-test-")); - - try { - const client = mockClientWithLatestUser("remember: Promote pending memories before delete.", "msg-delete-1"); - const plugin = await MemoryV2Plugin({ directory: tmpDir, client }); - - await (plugin as Record)["experimental.chat.system.transform"]( - { sessionID: "delete-session", model: {} }, - { system: ["base header"] }, - ); - - await (plugin as Record)["event"]({ - event: { - type: "session.deleted", - properties: { info: { id: "delete-session" } }, - }, - }); - - const nextPlugin = await MemoryV2Plugin({ directory: tmpDir, client: mockRootClient() }); - const output = { system: ["base header"] }; - await (nextPlugin as Record)["experimental.chat.system.transform"]( - { sessionID: "after-delete-session", model: {} }, - output, - ); - - const workspacePrompt = output.system.find((part: string) => part.startsWith("Workspace memory")); - assert.match(workspacePrompt ?? 
"", /Promote pending memories before delete/); - } finally { - await rm(tmpDir, { recursive: true, force: true }); - } -}); -``` - -- [ ] **Step 5: Add failing test for duplicate explicit memory dedupe by text** - -Append this test to `tests/plugin.test.ts`: - -```ts -test("duplicate explicit memories dedupe by normalized type and text, not generated id", async () => { - const tmpDir = await mkdtemp(join(tmpdir(), "memory-plugin-test-")); - - try { - const pluginA = await MemoryV2Plugin({ - directory: tmpDir, - client: mockClientWithLatestUser("remember: Prefer stable cache boundaries.", "msg-a"), - }); - await (pluginA as Record)["experimental.chat.system.transform"]( - { sessionID: "dedupe-session", model: {} }, - { system: ["base header"] }, - ); - - const pluginB = await MemoryV2Plugin({ - directory: tmpDir, - client: mockClientWithLatestUser("remember: prefer stable cache boundaries.", "msg-b"), - }); - await (pluginB as Record)["experimental.chat.system.transform"]( - { sessionID: "dedupe-session", model: {} }, - { system: ["base header"] }, - ); - - await (pluginB as Record)["event"]({ - event: { type: "session.compacted", properties: { sessionID: "dedupe-session" } }, - }); - - const output = { system: ["base header"] }; - const pluginC = await MemoryV2Plugin({ directory: tmpDir, client: mockRootClient() }); - await (pluginC as Record)["experimental.chat.system.transform"]( - { sessionID: "dedupe-next", model: {} }, - output, - ); - - const joined = output.system.join("\n"); - assert.equal((joined.match(/stable cache boundaries/gi) ?? 
[]).length, 1); - } finally { - await rm(tmpDir, { recursive: true, force: true }); - } -}); -``` - -- [ ] **Step 6: Add failing test for compaction promotion** - -Append this test to `tests/plugin.test.ts`: - -```ts -test("session.compacted promotes pending memories to workspace memory and clears pending list", async () => { - const tmpDir = await mkdtemp(join(tmpdir(), "memory-plugin-test-")); - - try { - const client = mockRootClient(); - const plugin = await MemoryV2Plugin({ directory: tmpDir, client }); - - await saveSessionState(tmpDir, { - version: 1, - sessionID: "promote-session", - turn: 1, - updatedAt: new Date().toISOString(), - activeFiles: [], - openErrors: [], - recentDecisions: [], - pendingMemories: [{ - id: "mem_pending_1", - type: "decision", - text: "Use frozen rendered snapshots for cache stability.", - source: "explicit", - confidence: 1, - status: "active", - createdAt: new Date().toISOString(), - updatedAt: new Date().toISOString(), - }], - }); - - await (plugin as Record)["event"]({ - event: { - type: "session.compacted", - properties: { sessionID: "promote-session" }, - }, - }); - - const state = await loadSessionState(tmpDir, "promote-session"); - assert.equal(state.pendingMemories.length, 0, - "pending memories should be cleared after promotion"); - - const after = { system: ["base header"] }; - await (plugin as Record)["experimental.chat.system.transform"]( - { sessionID: "new-session-after-promotion", model: {} }, - after, - ); - - const workspacePrompt = after.system.find((part: string) => part.startsWith("Workspace memory")); - assert.match(workspacePrompt ?? "", /Use frozen rendered snapshots for cache stability/); - } finally { - await rm(tmpDir, { recursive: true, force: true }); - } -}); -``` - -- [ ] **Step 7: Add failing test for promotion failure retaining pending memories** - -Add a test that makes `updateWorkspaceMemory(...)` fail by replacing the workspace memory path with a directory before promotion. 
The assertion must be that `(await loadSessionState(tmpDir, sessionID)).pendingMemories.length` remains `1` and that the journal still contains the pending memory after the event handler rejects or returns. - -- [ ] **Step 8: Add render cap tests** - -Add tests asserting: - -```text -pending memories > 6 - → renderHotSessionState renders only HOT_STATE_LIMITS.maxPendingMemoriesRendered entries - → final hot prompt length <= HOT_STATE_LIMITS.maxRenderedChars - -workspace memory entries exceed LONG_TERM_LIMITS.maxEntries / maxRenderedChars - → renderWorkspaceMemory remains capped by LONG_TERM_LIMITS.maxRenderedChars -``` - -- [ ] **Step 9: Run test to verify failures before implementation** - -Run: - -```bash -npm test -- tests/plugin.test.ts -``` - -Expected: FAIL because durable journal, promotion-on-start, promotion-before-delete, text-key dedupe, and failure retention are not yet implemented. - -- [ ] **Step 10: Implement promotion helper and promotion-on-start** - -Add an internal helper in `src/plugin.ts`: - -```ts -async function promotePendingMemories(sessionID?: string): Promise<void> { - const journal = await loadPendingJournal(directory); - const sessionState = sessionID ? await loadSessionState(directory, sessionID) : undefined; - const pending = [ - ...(sessionState?.pendingMemories ?? 
[]), - ...journal.entries, - ]; - if (pending.length === 0) return; - - const promotedKeys = new Set(); - await updateWorkspaceMemory(directory, workspaceMemory => { - const existingKeys = new Set(workspaceMemory.entries.map(memoryKey)); - for (const memory of pending) { - const key = memoryKey(memory); - if (!existingKeys.has(key)) { - workspaceMemory.entries.push(memory); - existingKeys.add(key); - } - promotedKeys.add(key); - } - return workspaceMemory; - }); - - if (sessionID) { - await updateSessionState(directory, sessionID, state => { - state.pendingMemories = state.pendingMemories.filter(memory => !promotedKeys.has(memoryKey(memory))); - return state; - }); - } - - await clearPendingMemories(directory, promotedKeys); - if (sessionID) clearFrozenWorkspaceMemoryCache(sessionID); -} -``` - -Call this helper in `experimental.chat.system.transform` before `processLatestUserMessage(sessionID)` and before `getFrozenWorkspaceMemorySnapshot(...)`, but only when this session has not frozen a snapshot yet: - -```ts - // Promote durable pending memories from prior sessions before freezing this session's snapshot. - // Only do this before the first snapshot for this session. Later turns must not promote - // current-session explicit memories into the same session's frozen system[1]. - if (!frozenWorkspaceMemoryCache.has(sessionID) && await hasPendingJournalEntries(directory)) { - await promotePendingMemories(); - } - - // Process explicit user memory after prior-session promotion. New explicit memory from - // this session becomes pending + ephemeral, not part of the frozen workspace snapshot. - await processLatestUserMessage(sessionID); -``` - -Remove the old unconditional `await processLatestUserMessage(sessionID);` if it now appears twice in the hook. - -- [ ] **Step 11: Implement compaction promotion** - -Replace the body inside `if (event.type === "session.compacted") { ... 
}` at `src/plugin.ts:411-432` with logic equivalent to: - -```ts - // Parse latest compaction summary for memory candidates - const summary = await latestCompactionSummary(client, sessionID); - const candidates = summary ? parseWorkspaceMemoryCandidates(summary) : []; - if (candidates.length > 0) { - await appendPendingMemories(directory, candidates); - } - await promotePendingMemories(sessionID); -``` - -- [ ] **Step 12: Implement promotion-before-delete** - -In the `session.deleted` handler at `src/plugin.ts:435-442`, call promotion before removing session state: - -```ts - await promotePendingMemories(sessionID); -``` - -Only delete the session state after promotion succeeds. If promotion fails, leave session state and journal intact. - -- [ ] **Step 13: Run tests** - -Run: - -```bash -npm test -``` - -Expected: PASS. - -### Wave 3 verification checkpoint - -- [ ] **Step 1: Run typecheck and tests** - -Run: - -```bash -npm run typecheck -npm test -``` - -Expected: both PASS. - -- [ ] **Step 2: Commit plugin wave** - -```bash -git add src/types.ts src/session-state.ts src/plugin.ts src/paths.ts src/pending-journal.ts tests/plugin.test.ts tests/workspace-memory.test.ts -git commit -m "feat: persist explicit memory through durable pending journal" -``` - ---- - -## Final Verification - -- [ ] **Step 1: Verify no dynamic state is in cached system[1]** - -Manually inspect: - -```bash -sed -n '264,291p' src/plugin.ts -sed -n '108,122p' /Users/sd_wo/work/opencode-clone/packages/opencode/src/session/llm.ts -sed -n '192,241p' /Users/sd_wo/work/opencode-clone/packages/opencode/src/provider/transform.ts -``` - -Expected: - -- Plugin pushes workspace snapshot before hot state. -- OpenCode preserves `system[2+]` instead of merging it into `system[1]`. -- Provider transform still cache-controls only first two system messages. - -- [ ] **Step 2: Run all plugin checks** - -```bash -npm run typecheck -npm test -``` - -Expected: PASS. 
- -- [ ] **Step 3: Record cache-impact evidence** - -During manual dogfooding, capture one 10-turn tool-heavy session and record: - -```text -turn_count = 10 -workspace_snapshot_chars = length(system[1]) -hot_state_chars_by_turn = [length(system[2]) per turn] -system_1_changed_between_turns = false -system_2_changed_between_turns = true -``` - -Expected: `system_1_changed_between_turns = false` for all turns until compaction/session boundary. - ---- - -## Self-Review - -- Spec coverage: The plan covers frozen rendered snapshot, ephemeral `system[2+]`, durable pending journal, pending delta promotion, no-compaction durability, delete-time promotion, dedupe, caps, failure retention, and cache impact estimate. -- Placeholder scan: No placeholder tasks remain; each implementation step identifies exact files and code blocks. -- Type consistency: `pendingMemories` is added to `SessionState`, initialized in session-state helpers, rendered in hot state, mirrored into `workspace-pending-journal.json`, and promoted through shared plugin promotion logic. -- Wave coherence: Wave 1 creates frozen snapshot support and ends green, Wave 2 changes OpenCode message boundaries, Wave 3 implements durable pending memory and promotion. Each wave has a verification checkpoint and commit boundary.
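
The text-key dedupe that Step 5 tests and the Step 10 helper relies on can be illustrated with a minimal sketch. The `MemoryEntry` shape, the normalization rules inside `memoryKey` (trim, collapse whitespace, lowercase), and the `mergePending` helper are assumptions made for illustration; the plan does not show the real `memoryKey` implementation, which may differ.

```typescript
// Hypothetical minimal shape; the real type lives in src/types.ts and has more fields.
interface MemoryEntry {
  type: string;
  text: string;
}

// Assumed normalization: trim, collapse runs of whitespace, lowercase.
// The key ignores generated ids, so equal text in the same type dedupes.
function memoryKey(memory: MemoryEntry): string {
  const type = memory.type.trim().toLowerCase();
  const text = memory.text.trim().replace(/\s+/g, " ").toLowerCase();
  return `${type}:${text}`;
}

// Pure version of the merge loop: appends pending entries whose key is new
// and reports every key that is now covered by workspace memory.
function mergePending(
  existing: MemoryEntry[],
  pending: MemoryEntry[],
): { entries: MemoryEntry[]; promotedKeys: Set<string> } {
  const entries = [...existing];
  const existingKeys = new Set<string>(existing.map(memoryKey));
  const promotedKeys = new Set<string>();
  for (const memory of pending) {
    const key = memoryKey(memory);
    if (!existingKeys.has(key)) {
      entries.push(memory);
      existingKeys.add(key);
    }
    // A duplicate still counts as promoted, so it can be cleared from pending state.
    promotedKeys.add(key);
  }
  return { entries, promotedKeys };
}

const merged = mergePending(
  [],
  [
    { type: "decision", text: "Prefer stable cache boundaries." },
    { type: "decision", text: "prefer stable  cache boundaries." },
  ],
);
console.log(merged.entries.length, merged.promotedKeys.size);
```

Recording a key in `promotedKeys` even when the entry already exists mirrors the helper in Step 10: duplicates are removed from session state and the journal instead of being retried on every later promotion.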