53 Commits

Author SHA1 Message Date
Ralph Chang cc9656ed59 refactor(memory-diag): remove legacy aliases, centralize command metadata, prepare v1.5.4
- Remove legacy CLI aliases (health, quality, rejections, disappearances, trace)
- Centralize command metadata in command-metadata.ts
- Move trace lifecycle into explain command
- Move disappearance helpers into missing formatter
- Remove cleanup:workspaces from package scripts (dev tool preserved)
- Bump version to 1.5.4
2026-05-02 21:57:13 +08:00
Ralph Chang cf05b9fa69 feat(memory-diag): publish diagnostics CLI 2026-05-02 20:36:58 +08:00
Ralph Chang aaa4016ae8 feat(reinforcement): compaction prompt wording reuse, migration evidence, and validation baseline
Wave 1 — Compaction prompt improvement:
- Add three wording-reuse bullets to buildCompactionPrompt() under
  CRITICAL MEMORY RULES: do not create rephrased duplicates, reuse
  existing wording exactly when re-emitting, only emit new memories
  when the fact is new, materially corrected, or more specific.
- This attacks the root cause of zero reinforcement: compaction
  generating variant text for the same durable fact.

Wave 2 — Bug fixes:
- Bug #2: Add placeholder comment to superseded_existing branch in
  decision dedupe (unreachable until v1.5.4 numbered refs). Preserve
  as const type assertions.
- Bug #3: Add memory_migration_superseded evidence event type. Both
  P0 and quality cleanup migrations now produce evidence events for
  superseded entries. loadWorkspaceMemory appends migration evidence
  on first-load migrations only (idempotent via migration IDs). No
  historical backfill.
- Bug #4: Add documentation comment explaining that feedback identity
  key returns exact key (absorbed_identity currently impossible for
  feedback). Add test verifying this behavior.

Wave 3 — Validation baseline script:
- Add scripts/dev/validate-identity-keys.ts: read-only script that
  scans workspace memory stores, computes exact/identity key
  collisions, and reports reinforcement statistics. Baseline matches
  audit: 0 exact collisions, 0 identity collisions, 0 reinforcement
  events across 123 active memories.

Identity extension is gated on measurement: if the prompt change
produces measurable reinforcement (reinforcementCount > 0), identity
extension may be unnecessary. Decision dedupe stays exact-only
(Wave 4 deferred).
2026-05-02 15:03:34 +08:00
Ralph Chang ff5c568cb7 chore(release): prepare v1.5.2 2026-05-01 16:00:44 +08:00
Ralph Chang d569297c30 fix(retention): add UTC calendar-day diversity gate to reinforceMemory
Implement OQ-2 decision: allow at most one reinforcement per memory
identity per UTC calendar day. Same-day reinforcement is blocked
regardless of session or interval. This prevents repetitive-task
gaming where a daily recurring task could reach MAX_COUNT=6 in hours.

Guard order: same-session → calendar-day → 1-hour → max-count
(existing guards kept as defense-in-depth)

The 1-hour guard is redundant within the same UTC day but is preserved
for sub-hour edge cases, such as two reinforcements that straddle UTC
midnight.
2026-04-30 18:38:29 +08:00
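
The guard order described in this commit can be sketched roughly as follows. This is a minimal illustration, not the project's reinforceMemory: the `lastReinforcedAt` field, the `GuardResult` labels, and the `tryReinforce` and `utcDay` helpers are assumptions introduced for the example. Only the guard order, the one-hour interval, and MAX_COUNT=6 come from the commit messages.

```typescript
// Hypothetical entry shape; only reinforcementCount and
// lastReinforcedSessionID are named in the commit log.
interface MemoryEntry {
  reinforcementCount: number;
  lastReinforcedSessionID?: string;
  lastReinforcedAt?: number; // epoch ms (assumed field name)
}

const MAX_COUNT = 6;
const MIN_INTERVAL_MS = 60 * 60 * 1000; // 1-hour guard

type GuardResult =
  | "reinforced"
  | "same_session"
  | "same_utc_day"
  | "under_one_hour"
  | "max_count";

// UTC calendar day as YYYY-MM-DD.
function utcDay(ms: number): string {
  return new Date(ms).toISOString().slice(0, 10);
}

// Guards run in the order: same-session, calendar-day, 1-hour, max-count.
function tryReinforce(entry: MemoryEntry, sessionID: string, nowMs: number): GuardResult {
  if (entry.lastReinforcedSessionID === sessionID) return "same_session";
  if (entry.lastReinforcedAt !== undefined) {
    if (utcDay(entry.lastReinforcedAt) === utcDay(nowMs)) return "same_utc_day";
    if (nowMs - entry.lastReinforcedAt < MIN_INTERVAL_MS) return "under_one_hour";
  }
  if (entry.reinforcementCount >= MAX_COUNT) return "max_count";
  entry.reinforcementCount += 1;
  entry.lastReinforcedSessionID = sessionID;
  entry.lastReinforcedAt = nowMs;
  return "reinforced";
}
```

In this sketch the 1-hour guard only fires when the two timestamps fall on different UTC days, for example 23:30 followed by 00:10.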
Ralph Chang 4f1c0348b4 feat(explainability): add diagnostics JSON, per-memory explain, lifecycle trace
Phase 4 Tasks 4.1-4.3:
- memory-diag health --json: machine-readable MemoryDiagJSON output
- memory-diag explain: per-memory render status with strength, reasons,
  evidence event IDs
- memory-diag trace --memory <id>: lifecycle history from evidence events
  and relations (superseded_by, reinforced_by)
- MemoryRenderStatus type with 9 statuses
- All diagnostics are read-only, no storage mutations
- Privacy-safe: redacted text previews, no raw secrets
- 270 tests pass, typecheck pass
2026-04-30 18:06:28 +08:00
Ralph Chang bc0847e3ed feat(evidence): wire evidence events into extraction, promotion, reinforcement, render, storage, and hook lifecycle
Phase 3 Tasks 3.2-3.6:
- Extraction evidence: accepted/rejected/explicit_detected/explicit_ignored
- Promotion evidence with relation edges (superseded/superseded_by, absorbed/retained)
- Reinforcement evidence with reinforced/reinforced_by relations
- Render accounting helper with render_selected/render_omitted evidence
- Storage evidence: corrupt_json_quarantined, stale_lock_recovered, lock_timeout
- Hook failure evidence in plugin
- All evidence failures swallowed, never throw into memory behavior
- Privacy-safe textPreview (redacted + truncated)
- 266 tests pass, typecheck pass
2026-04-30 17:54:13 +08:00
Ralph Chang 6a81fc384c feat(evidence): add evidence infrastructure - types, append, query, retention
Phase 3 Task 3.1:
- Create src/evidence-log.ts with EvidenceEventType, EvidencePhase,
  EvidenceOutcome, MemoryEvidenceRef, EvidenceRelation, EvidenceEventV1,
  EvidenceEventInput types
- Add appendEvidenceEvent/appendEvidenceEvents with safe write, privacy
  hashing (SHA-256 truncated), textPreview redaction, bounded retention
- Add queryEvidenceEvents, summarizeMemoryEvidence, traceMemoryLifecycle
- Add workspaceEvidenceLogPath to src/paths.ts
- Add 8 evidence-log tests: round-trip, privacy, query, resilience, retention
- Relations limited to wiring roles only (no kind/derived_from/validates)
- 253 tests pass
2026-04-30 17:33:40 +08:00
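
As a rough illustration of the privacy hashing and textPreview redaction mentioned above, the sketch below uses Node's built-in `crypto` module. The truncation length, the preview length, the redaction pattern, and the `privacyHash` and `textPreview` helper names are all assumptions made for the example; the commit only states that hashes are truncated SHA-256 and that previews are redacted and truncated.

```typescript
import { createHash } from "node:crypto";

const HASH_HEX_LENGTH = 16;   // assumed truncation length
const PREVIEW_MAX_CHARS = 80; // assumed preview bound

// Truncated SHA-256 hex digest: stable for equal inputs, not reversible.
function privacyHash(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex").slice(0, HASH_HEX_LENGTH);
}

// Redact first, then truncate, so a cut-off secret can never survive
// in the preview. Only Bearer tokens are handled here (illustrative).
function textPreview(text: string): string {
  const redacted = text.replace(/Bearer\s+[A-Za-z0-9._~+\/=-]+/g, "Bearer [REDACTED]");
  return redacted.length > PREVIEW_MAX_CHARS
    ? redacted.slice(0, PREVIEW_MAX_CHARS) + "…"
    : redacted;
}
```

Redacting before truncating is a deliberate choice in this sketch: truncating first could leave part of a token in the preview that the pattern no longer matches.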
Ralph Chang ed4590ca18 refactor(retention): extract retention module from workspace-memory
Move retention constants and math to a focused src/retention.ts module:
- All half-life, reinforcement, dormancy constants
- TYPE_FACTOR, SOURCE_FACTOR, USER_IMPORTANCE_FACTOR
- RETENTION_TYPE_MAX (renamed from TYPE_MAX)
- calculateInitialStrength, calculateEffectiveHalfLife,
  calculateRetentionStrength, calculateDormantDays,
  calculateEffectiveAgeDays, reinforceMemory

No behavior changes. retention.ts imports only types from types.ts.
Workspace-memory.ts still owns storage, consolidation, and rendering.
2026-04-30 17:28:31 +08:00
Ralph Chang 09cc4a2ffb feat(deprecation): remove safetyCritical retention multiplier and type-cap bypass
- Remove SAFETY_CRITICAL_FACTOR = 6.0 from workspace-memory.ts
- Remove safetyFactor from calculateInitialStrength() - all memories now
  fade according to the same rules
- Remove safetyCritical bypass from applyTypeMaxCaps() - safetyCritical
  entries compete normally under TYPE_MAX caps
- Preserve safetyCritical?: boolean in LongTermMemoryEntry type for
  backward compatibility (no producer sets it to true)
- Update memory-diag to show deprecation warning instead of capacity alert
- Update tests: add backward-compatibility fixture test, deprecation
  strength test, normal cap competition test
- Update docs/architecture.md, RELEASE_NOTES.md, CHANGELOG.md,
  docs/configuration.md

Phase 1.5 complete: safetyCritical is now a deprecated field with no
active behavior. Safety rules belong in user-controlled agent.md files.
2026-04-30 17:23:01 +08:00
Ralph Chang c0ebd84d7e fix(security): harden hooks, quarantine corrupt JSON, test locks, fix promotion dedupe
- Wrap hooks with try/catch to prevent OpenCode disruption
- Add warnMemoryHook() for safe error logging
- Quarantine corrupt JSON files before fallback
- Add cross-process lock safety tests
- Fix pending promotion same-batch dedupe
- Update docs/architecture.md with lock semantics
- 242 tests passing
2026-04-30 11:52:01 +08:00
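
The hook hardening above can be pictured as a wrapper that catches every failure and logs it instead of letting it reach the host. The sketch is hypothetical: `safeHook` and the `warnings` array are invented for the example, and this `warnMemoryHook` is a stand-in with an assumed signature, not the project's function.

```typescript
// Collected warnings; a real plugin would write to its logger instead.
const warnings: string[] = [];

// Stand-in for the project's warnMemoryHook(); signature is assumed.
function warnMemoryHook(hookName: string, error: unknown): void {
  const message = error instanceof Error ? error.message : String(error);
  warnings.push(`[memory] ${hookName} failed: ${message}`);
}

// Wrap a hook so that sync throws and async rejections are both
// swallowed and logged, never propagated to the caller.
function safeHook<A extends unknown[]>(
  hookName: string,
  hook: (...args: A) => void | Promise<void>,
): (...args: A) => Promise<void> {
  return async (...args: A): Promise<void> => {
    try {
      await hook(...args);
    } catch (error) {
      warnMemoryHook(hookName, error);
    }
  };
}
```

A caller would register `safeHook("session.compacted", handler)` in place of `handler`, so a bug in memory handling degrades to a warning.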
Ralph Chang 20a6cfe1a6 chore(release): prepare v1.5.0 2026-04-29 16:56:47 +08:00
Ralph Chang 36b78ea91c feat(memory): add retention model test gaps and health diagnostics
Wave 1 - P0 Test Gaps:
- Add hard stale prune removed regression test
- Add dormant overlap tests (entry created during dormancy)
- Add invalid timestamp NaN protection test
- Add reinforcement ordering test with reference type
- Add dedupe same-session/under-1hr guard tests
- Fix NaN handling with Number.isFinite check

Wave 2 - Helper Functions:
- Add timestampMs() for safe timestamp conversion
- Add isSafetyCriticalForDiag() aligned with runtime

Wave 3 - Health Output Format:
- Fix top rendered candidates sorted by strength (not text length)
- Add stored vs rendered counts breakdown
- Add type caps and global cap overflow display
- Track globalCapped array explicitly
- Add dormant status section

Wave 4 - Monitoring Metrics:
- Add high_importance_ratio (alert > 30%)
- Add safety_critical_count (alert > 5)
- Add max_reinforced_count (alert > 10% active)

Wave 5 - Integration Fixture:
- Add 34-entry over-cap test
- Add mixed retention regression fixture
- Test TYPE_MAX caps, safety-critical exemption, reinforcement ordering

Tests: 224 → 237
2026-04-29 15:26:44 +08:00
Ralph Chang 406c160c9f fix(memory): correct dormant formula, remove hard prune, integrate reinforcement
P0.1 - Fix dormant effective age formula:
- Use overlap logic: only apply dormancy to entry's lifetime
- Formula: activeDays + dormantOverlapDays * 0.25
- calculateDormantDays now returns total days (not excess past grace)
- Test: 28 dormant days → 17.5 effective days

P0.2 - Remove hard stale pruning:
- Remove isPrunableByAge from enforcement
- Remove rejected_stale from accounting reasons
- Elimination now by cap competition only

P0.3 - Integrate reinforcement:
- Call reinforceMemory in dedupe absorption path
- Call reinforceMemory in promotion duplicate path
- Update retentionClock on reinforcement

A1 - Retention clock reset on reinforcement

A4 - Fix tests to encode correct formula
2026-04-29 14:55:25 +08:00
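
The overlap formula in P0.1 (activeDays + dormantOverlapDays * 0.25) can be sketched as below. The `Interval` type and both helper names are assumptions for illustration, and the way the dormancy window is derived (grace period, lastActivityAt) is deliberately left to the caller because the log does not spell it out.

```typescript
const DAY_MS = 86_400_000;
const DORMANT_WEIGHT = 0.25; // dormant days age at a quarter of the normal rate

interface Interval {
  startMs: number;
  endMs: number;
}

// Length in days of the intersection of two intervals (0 if disjoint).
function overlapDays(a: Interval, b: Interval): number {
  const start = Math.max(a.startMs, b.startMs);
  const end = Math.min(a.endMs, b.endMs);
  return Math.max(0, end - start) / DAY_MS;
}

// Only the part of the dormancy window that falls inside the entry's
// own lifetime is discounted; the rest of its life counts in full.
function effectiveAgeDays(entryLife: Interval, dormancy: Interval | null): number {
  const totalDays = Math.max(0, entryLife.endMs - entryLife.startMs) / DAY_MS;
  const dormantOverlapDays = dormancy ? overlapDays(entryLife, dormancy) : 0;
  const activeDays = totalDays - dormantOverlapDays;
  return activeDays + dormantOverlapDays * DORMANT_WEIGHT;
}
```

For a 28-day-old entry with 14 discounted days this gives 14 + 14 * 0.25 = 17.5. That matches the figure quoted in the commit only under the assumption that the 14-day grace period counts as active time.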
Ralph Chang 968aedd5c5 feat(memory): add dormant tracking and reinforcement mechanism
Wave 2c - Dormant workspace tracking:
- Add lastActivityAt to WorkspaceMemoryStore
- Implement calculateDormantDays with 14-day grace period
- Wire dormant days into retention-strength calculation

Wave 3 - Reinforcement:
- Add lastReinforcedSessionID to LongTermMemoryEntry
- Implement reinforceMemory with guards (same-session, 1hr interval, max 6)
- Set retentionClock on memory creation in extractors.ts and plugin.ts

Tests: 219 → 222, all pass
2026-04-29 14:32:39 +08:00
Ralph Chang d4053b2d35 feat(memory): implement retention decay model with strength-based ordering
- Add retention model constants (45-day half-life, 6.0 safety factor)
- Add TYPE_MAX caps (feedback:10, decision:10, project:8, reference:6)
- Add strength calculation: initialStrength × 2^(-age/halfLife)
- Integrate strength-based sorting into enforceLongTermLimits
- Safety-critical entries bypass type caps
- Add fields: retentionClock, reinforcementCount, userImportance, safetyCritical
2026-04-29 14:18:51 +08:00
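
A minimal sketch of the decay formula and the per-type caps listed in this commit. The `Entry` shape, the id tie-breaker, and the function names are assumptions for the example; the safety-critical bypass is omitted, and the real enforceLongTermLimits is likely to consider more inputs.

```typescript
type MemoryType = "feedback" | "decision" | "project" | "reference";

const HALF_LIFE_DAYS = 45;
const TYPE_MAX: Record<MemoryType, number> = {
  feedback: 10,
  decision: 10,
  project: 8,
  reference: 6,
};

// Hypothetical entry shape for the example.
interface Entry {
  id: string;
  type: MemoryType;
  initialStrength: number;
  ageDays: number;
}

// strength = initialStrength * 2^(-age / halfLife)
function retentionStrength(e: Entry): number {
  return e.initialStrength * Math.pow(2, -e.ageDays / HALF_LIFE_DAYS);
}

// Keep the strongest entries of each type, up to that type's cap.
function applyTypeCaps(entries: Entry[]): Entry[] {
  const sorted = [...entries].sort(
    (a, b) => retentionStrength(b) - retentionStrength(a) || a.id.localeCompare(b.id),
  );
  const counts: Record<MemoryType, number> = { feedback: 0, decision: 0, project: 0, reference: 0 };
  const kept: Entry[] = [];
  for (const e of sorted) {
    if (counts[e.type] < TYPE_MAX[e.type]) {
      counts[e.type] += 1;
      kept.push(e);
    }
  }
  return kept;
}
```

With a 45-day half-life, an entry that starts at strength 1 is at 0.5 after 45 days and 0.25 after 90.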
Ralph Chang 60b9ca75c8 fix(memory): isolate test workspace cleanup 2026-04-28 14:50:30 +08:00
Ralph Chang 8da39c7a9d fix(memory): address quality cleanup audit findings 2026-04-28 14:29:28 +08:00
Ralph Chang 56d7ef9a68 test(memory): add real workspace quality cleanup regression fixture 2026-04-28 14:17:43 +08:00
Ralph Chang 7427221640 feat(memory): add local quality cleanup audit logs 2026-04-28 14:17:17 +08:00
Ralph Chang 9991c95ff6 fix(memory): make quality cleanup migration conservative 2026-04-28 14:15:34 +08:00
Ralph Chang 465edfabf1 fix: unify all memory quality rules in single module 2026-04-28 13:34:33 +08:00
Ralph Chang 6a80f4b047 fix: auto-supersede low-quality compaction memories 2026-04-28 13:29:28 +08:00
Ralph Chang b21347c12b fix: tighten compaction memory candidate prompt 2026-04-28 13:24:43 +08:00
Ralph Chang ffb0477251 fix: unify workspace memory quality gate 2026-04-28 13:21:15 +08:00
Ralph Chang a762e863d1 fix: owner scope in global unowned promotion
Problem: clearPendingMemories() and recordPromotionRejections() would
incorrectly clear or mutate owned entries during global unowned promotion.

Fixes:
1. clearPendingMemories() now respects owner/unowned scope:
   - global clearUnowned only clears unowned same-key entries
   - owned same-key entries are preserved
   - explicit global clear-all-by-key fallback still works

2. recordPromotionRejections() now has includeUnownedOnly option:
   - global unowned rejection only increments/exhausts unowned entries
   - owned same-key entries are preserved

3. Added regression tests:
   - global unowned clear keeps owned same-key entries
   - global unowned rejection only exhausts unowned same-key entries

Tests: 182 pass, 0 fail
2026-04-28 12:27:46 +08:00
Ralph Chang 53aa6d3c31 feat: implement Plan 1 - Critical Stability fixes
Wave 1: Storage and Journal Safety
- Add frozen cache TTL (1h) and size bounds (50 sessions)
- Add pending journal source-aware retention (compaction-only TTL)
- Add inter-process file lock with stale recovery
- Move processLatestUserMessage to first transform (after isSubAgent guard)

Wave 2: Promotion Ownership and Bounded Rejection
- Add pendingOwnerSessionID/pendingMessageID metadata
- Add owner-aware pending journal clearing
- Add explicit/manual bounded retry (max 3 attempts)
- Fix session.deleted cleanup idempotency

Wave 3: Normalize, Security, and Cache Hardening
- Fix load-time write loop (only write on security/migration change)
- Add deterministic sort tie-breaker (createdAt -> id)
- Add Bearer token redaction
- Add processed message cache bounds
- Remove priorityWithFreshness dead code

Tests: 180 pass, 0 fail
2026-04-28 11:59:29 +08:00
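
The deterministic tie-breaker from Wave 3 (createdAt, then id) might look like the comparator below. The `priority` field, the newest-first direction, and the `safeMs` helper are assumptions; the commit only names the two tie-break keys and their order.

```typescript
// Hypothetical shape; only createdAt and id are named in the commit.
interface Sortable {
  id: string;
  createdAt: string; // ISO timestamp
  priority: number;
}

// Invalid timestamps map to 0 so the comparator never returns NaN.
function safeMs(iso: string): number {
  const ms = Date.parse(iso);
  return Number.isFinite(ms) ? ms : 0;
}

// Primary key: priority (higher first, assumed).
// Tie-breakers: createdAt (newer first, assumed), then id (ascending).
function compareEntries(a: Sortable, b: Sortable): number {
  if (a.priority !== b.priority) return b.priority - a.priority;
  const aMs = safeMs(a.createdAt);
  const bMs = safeMs(b.createdAt);
  if (aMs !== bMs) return bMs - aMs;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}
```

Because id is unique, two distinct entries never compare as equal, so the sorted order does not depend on the input order.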
Ralph Chang 77d60abf5f refactor: make memory dedupe repo-agnostic 2026-04-27 21:19:42 +08:00
Ralph Chang 11361abc91 test: cover security hardening edge cases 2026-04-27 20:22:09 +08:00
Ralph Chang e071095422 merge: integrate PR #3 security hardening 2026-04-27 20:14:08 +08:00
Ralph Chang 909d6c7767 docs: document concise compatibility limitations 2026-04-27 19:57:21 +08:00
Ralph Chang c697f63c67 fix: cap and prune pending memory journal 2026-04-27 18:54:44 +08:00
Ralph Chang 25b673fbb7 test: add opencode plugin compatibility checks 2026-04-27 18:54:14 +08:00
Steven Choo acaa829df4 feat: implement indirect prompt injection protection and expanded secret redaction 2026-04-27 12:42:20 +02:00
Ralph Chang 3cc6dff7ae feat: add consolidation accounting for workspace memory promotion
P0 implementation with four waves:

Wave 1: Dedup with accounting
- Add dedupeLongTermEntriesWithAccounting()
- Classify exact duplicate, identity duplicate, topic duplicate

Wave 2: Normalization with accounting
- Add normalizeWorkspaceMemoryWithAccounting()
- Chain redaction → migration → enforceLongTermLimitsWithAccounting

Wave 3: Promotion accounting integration
- Update accountPendingPromotions() to use new accounting API
- Add supersededKeys to classification
- Distinguish promoted / absorbed / superseded / rejected

Wave 4: Integration tests
- End-to-end tests covering full pipeline

Bug fixes:
- Fix active vs superseded boundary (superseded entries no longer block promotion)
- Remove unused rejected_duplicate_lower_quality type
- Defer pending journal safety cap (TODO added)

Tests: 135 passing (up from 115)
2026-04-27 16:45:55 +08:00
Ralph Chang ca68b7f55c feat: sharpen compaction memory extraction prompt
Wave 3 of memory quality optimization plan.

- Add good memory examples in buildCompactionPrompt()
- Add bad memory examples to skip (test counts, commit hashes, etc.)
- Add prompt assertions in tests to prevent regression
- Emphasize 'useful if a new agent opens this workspace next week'
2026-04-27 14:40:32 +08:00
Ralph Chang 023589a905 test: add memory quality eval fixtures
Wave 2 of memory quality optimization plan.

- 5 accepted cases: durable facts that should be kept
- 7 rejected cases: noise that should be filtered
- Parser-level regression guard (zero API call)
- All cases pass against current extractors.ts
2026-04-27 14:34:53 +08:00
Ralph Chang 24f807fed0 fix: account for absorbed pending memories
- Add workspaceMemoryIdentityKey() to unify dedup/supersession identity semantics
- Add accountPendingPromotions() to distinguish promoted/absorbed/rejected
- Wire promotion accounting into promotePendingMemories()
- Add clearableKeys.size > 0 guard to prevent journal wipe
- Add regression tests for absorbed duplicate, cap-rejected, all-rejected edge cases

Wave 1 of memory quality optimization plan.
2026-04-27 14:27:43 +08:00
Ralph Chang 4309cb855f fix: promotion accounting, sessionID extraction, and strengthened regression tests
Architecture review fixes:

- Promotion accounting: only clear pending memories that survived
  workspace memory normalization/cap limits. Use retainedKeys from
  the returned normalized store instead of attemptedKeys.

- Shared sessionID extraction: add sessionIDFromEventProperties()
  helper and use it in both session.compacted and session.deleted,
  fixing the previous gap where session.deleted only read info.id.

- Strengthen compaction refresh test: seed workspace memory before
  first transform so firstSystem1 is non-empty, then assert
  refreshed system[1] preserves existing entries AND contains
  promoted memories.
2026-04-27 10:02:18 +08:00
Ralph Chang 2437a9dc71 fix: clarify cache epoch semantics and add regression tests
- Update plugin.ts comments to describe 'session cache epoch' instead
  of misleading 'session lifetime' wording
- Add regression test: same-session explicit memory does not mutate
  frozen system[1]; pending memory goes to ephemeral system[2+]
- Add regression test: session.compacted intentionally refreshes
  system[1] as a new cache epoch boundary (promotes pending memories,
  clears frozen cache, next transform re-renders workspace memory)
- Both tests use one plugin instance with mutable mock client to
  preserve in-memory frozen cache across turns
2026-04-27 09:55:03 +08:00
Ralph Chang e7c7a5cfb2 feat: add durable pending memory journal 2026-04-27 02:20:26 +08:00
Ralph Chang 026c75a5e4 feat: freeze rendered workspace memory snapshot 2026-04-27 01:57:41 +08:00
Ralph Chang f6f35e87c1 feat: release v1.2.2 with multilingual memory hardening 2026-04-27 00:21:18 +08:00
Ralph Chang 3d44269228 fix: resolve remaining architect issues - split feedback keys, remove generic config key, supersession mode
- Split feedbackTopicKey: server-error now separate from port-occupied-environment
- Remove generic plugin.*config entity key (too broad), fall back to canonical dedup
- Feedback topic conflicts now use supersession mode (newer beats longer)
- Add 3 regression tests: English port/split, unrelated configs, feedback supersession

70/70 tests pass.
2026-04-26 16:54:24 +08:00
Ralph Chang a154139b27 fix: P0c/P0d architect review corrections
P0c fixes:
- Chinese file count regex now accepts 個/个 between number and 文件
- Admin PIN short reference (<20 chars) passes via config value allowlist
- Phase snapshot uses semantic window (.{0,20}) instead of absolute position

P0d fixes:
- Feedback key split: 500 error and port issue remain separate entries
- extractEntityKey avoids over-merging unrelated plugin configs
- chooseBetterMemory supports supersession mode (newer beats longer)
- Sort comparator now includes source priority as secondary tie-breaker

New regression tests (11 total):
- Real Admin PIN short reference passes
- Real Chinese 37 個文件 snapshot rejected
- Real pathology Phase 1-4 snapshot rejected
- Feedback 500 vs port entries not collapsed
- Unrelated plugin configs not collapsed
- Supersession prefers newer shorter over older longer

67/67 tests pass.
2026-04-26 16:50:58 +08:00
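
For the Chinese file-count fix in P0c, a pattern of roughly this shape would accept an optional measure word between the number and 文件. The exact regex in the project is not shown in the log, so the pattern and the `looksLikeFileCountSnapshot` name are assumptions.

```typescript
// Digits, then an optional measure word (traditional 個 or simplified 个),
// then 文件. Whitespace between the parts is tolerated.
const FILE_COUNT_SNAPSHOT = /\d+\s*[個个]?\s*文件/;

// True when the text contains an exact file count, which the quality
// gate treats as a session snapshot rather than a durable fact.
function looksLikeFileCountSnapshot(text: string): boolean {
  return FILE_COUNT_SNAPSHOT.test(text);
}
```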
Ralph Chang 7527765207 feat: storage-time dedupe, stale pruning, and supersession (P0d)
- Project/reference entries dedupe by entity key (bilingual aware)
- Decision entries supersede by topic key (parser formats, template, etc)
- Feedback entries supersede by topic key (same issue, newer fix wins)
- Stale compaction/manual entries pruned after staleAfterDays + 30
- Explicit and feedback entries never age-pruned
- Freshness used as tie-breaker in priority-based trimming
- Adds 10 new tests covering dedup, supersession, staleness, and freshness
2026-04-26 16:37:18 +08:00
Ralph Chang f9acfd6136 fix: parser accepts bracketless format, rejects project snapshots, adds durable-content prompt
P0a: Parser now accepts both "- [type] text" and "- type text" formats
P0b: Prompt adds durable-content guidance to avoid session-specific snapshots
P0c: Parser quality gate rejects exact test counts, file counts, phase progress
- Only rejects phase progress when it appears early in the string (snapshot)
- Stable config values with numbers (Admin PIN, Scrypt) still pass
- Adds 7 new tests covering bracketless parsing and snapshot rejection
2026-04-26 16:28:55 +08:00
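
The two accepted candidate formats from P0a could be parsed along these lines. This is a sketch under assumptions: the set of valid types is borrowed from the TYPE_MAX caps elsewhere in this log, and the `Candidate` shape, the regex, and the `parseCandidateLine` name are illustrative, not the code in extractors.ts.

```typescript
// Assumed type vocabulary, taken from the TYPE_MAX caps in this log.
const TYPES = ["feedback", "decision", "project", "reference"] as const;
type MemoryType = (typeof TYPES)[number];

interface Candidate {
  type: MemoryType;
  text: string;
}

// Matches "- [type] text" (group 1) or "- type text" (group 2).
const CANDIDATE_LINE = /^\s*-\s+(?:\[(\w+)\]|(\w+))\s+(.+)$/;

function parseCandidateLine(line: string): Candidate | null {
  const m = CANDIDATE_LINE.exec(line);
  if (!m) return null;
  const rawType = (m[1] ?? m[2] ?? "").toLowerCase();
  // Unknown leading words are rejected, so ordinary bullets are not
  // mistaken for bracketless candidates.
  const type = TYPES.find((t) => t === rawType);
  if (!type) return null;
  const text = (m[3] ?? "").trim();
  return text ? { type, text } : null;
}
```

Checking the first word against a known type list is what keeps the bracketless form from swallowing arbitrary list items.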
Ralph Chang 5e9ada6859 fix: replace default compaction template to prevent purple italic rendering
Root cause: OpenCode's default compaction template uses --- separators.
When our plugin adds structured context (Memory candidates: format), the
model strictly follows the template, outputting --- at position 0. The
markdown textmate grammar treats this as YAML frontmatter, applying the
'comment' syntax scope (purple + italic in themes like palenight).

Fix: Set output.prompt in the compacting hook to replace the entire
template with a version free of --- separators. Uses only ## Markdown headings and
explicitly forbids YAML frontmatter, horizontal rules, and delimiter
lines. Preserves context from other plugins by merging output.context.

- Replace compactionContextHeader() with buildCompactionPrompt()
- Set output.prompt instead of pushing to output.context
- Merge existing output.context from other plugins before clearing
- Add 'Instructions' section to the template (per architect review)
- Update tests: verify output.prompt, absence of --- separators, context merging
2026-04-26 15:46:41 +08:00
Ralph Chang 721544e7a8 fix: use plain text labels instead of Markdown headers
- Changed '## Memory Candidates' to 'Memory candidates:' in compaction context
- Changed '## Pending Todos' to 'Pending todos:' in todo rendering
- Updated extractCandidateBlock() to parse plain text format (primary)
- Removed stripXmlTags() function (no longer needed)
- All 42 tests pass

Root cause: Markdown headings (##) render as purple in OpenCode UI,
same issue as XML tags and HTML comments. Plain text labels avoid
all special markup rendering.
2026-04-26 15:13:58 +08:00
Ralph Chang eff0d3784c fix: change compaction output to HTML comment, prevent Markdown rendering issues
Root cause: Model was instructed to output <workspace_memory_candidates> XML
tags in the user-visible compaction summary, causing purple/italic rendering
when combined with --- delimiters in Markdown.

Fixes:
- compactionContextHeader(): Now instructs model to use HTML comment format
  <!-- workspace_memory_candidates ... --> which is hidden from users
- extractCandidateBlock(): New function supports 3 formats:
  1. HTML comment (preferred, hidden from user)
  2. Markdown section (visible but clean)
  3. Legacy XML (backward compatible)
- Added "DO NOT use XML tags" and "DO NOT start with ---" instructions

Tests:
- Verify compaction context header uses HTML comment format
- Test parser accepts all 3 formats (HTML comment, Markdown, legacy XML)
2026-04-26 14:49:38 +08:00