Stock Windows 10/11 ships C:\Windows\System32\bash.exe (the WSL
launcher) as the first match for `where bash`. WSL's bash cannot
execute Windows-style script paths, so when Git Bash is installed
outside the two standard system locations -- specifically the
per-user "Only for me" Git for Windows installer
(%LOCALAPPDATA%\Programs\Git) or a Scoop install
(%USERPROFILE%\scoop\apps\git\current\usr\bin) -- run-hook.cmd
silently fails: WSL prints "Windows Subsystem for Linux must be
updated", the script returns 0, and Superpowers' SessionStart
bootstrap is never injected. From the user's perspective skills
auto-trigger inconsistently or not at all, with no surfaced error.
Add explicit probes for both locations between the existing system-
wide Git for Windows checks and the `where bash` fallback. Also add
a comment to the fallback documenting the WSL-launcher trap so future
maintainers understand why the explicit probes must come first.
Verified on a Windows 11 VM (dockur/windows 11, Git Bash 2.x, Node
22):
- System Git present: existing probe still matches (no regression)
- System Git absent, per-user Git present via junction: new probe
matches, hook produces valid 6422-byte JSON, exit 0
- All Git probes absent: confirmed WSL trap fires
("Windows Subsystem for Linux must be updated") and the hook exits 0
silently, demonstrating the original bug
Existing tests/hooks/test-session-start.sh still passes on macOS (7/7).
Reported by @ytchenak in #1607.
Co-authored-by: ytchenak <ytchenak@users.noreply.github.com>
Closes#1607.
On Windows + Git Bash, the SessionStart hook prints a confusing
diagnostic at every startup ("printf: write error: Permission denied")
when Claude Code closes the hook's stdout pipe before the printf has
finished writing. The hook still runs to completion and context still
gets injected, but the diagnostic surfaces every session because
Git Bash's printf reports EPIPE as "Permission denied" (not "Broken
pipe" like Linux) and our `set -euo pipefail` lets that error escape.
Piping each printf through `cat` makes the external cat process the
recipient of any SIGPIPE / EPIPE. cat's failure does not propagate to
the parent bash under pipefail because cat is the last command in the
pipeline and exits cleanly when the pipe stays open long enough to
hold the data. On macOS/Linux the cat passthrough is transparent (no
behavior change, no measurable cost).
Verified:
- Existing tests/hooks/test-session-start.sh: 7/7 pass on macOS
- Manual run on Windows 11 + Git Bash 5.2 + Node 22 produces valid JSON,
clean stderr, and exit 0
- JSON output is byte-identical to the unpatched hook
Reported by @silvertakana in #1612, attribution preserved in the
Co-authored-by trailer below — this is the same fix shape the original
PR proposed.
Co-authored-by: silvertakana <silvertakana@users.noreply.github.com>
Closes#1612.
The "Signals You're Doing It Wrong" bullet in systematic-debugging/SKILL.md
contains the literal token Claude Code's runtime scans for in tool result
bodies. Every Skill-tool invocation of this skill caused the harness to
inject a spurious system-reminder claiming the user requested deeper
reasoning, silently bumping every session into extended thinking.
Replace the bullet's spelling so the contiguous letter sequence the scanner
matches is broken with a hyphen. The signal text remains recognizable to
the agent and the documented action ("Question fundamentals, not just
symptoms") is unchanged.
Fixesobra/superpowers#1283
Issue #1134: agents reading visual-companion.md see bare commands like
`scripts/start-server.sh`, correctly identify the plugin install
directory, then look for `<plugin>/scripts/start-server.sh` instead of
`<plugin>/skills/brainstorming/scripts/start-server.sh`. The file
doesn't exist at the plugin-rooted path, so the agent concludes the
visual companion isn't available and falls back to text-only
brainstorming.
Multiple independent reproductions in the issue thread, plus one user's
agent self-reported: "I assumed the scripts folder was in the root
directory of the plugin, it didn't realize it could have been talking
about the skill folder itself."
Change all `scripts/<file>` references in visual-companion.md to
`skills/brainstorming/scripts/<file>`. Agents that correctly identify
the plugin root will now join to the right path.
Closes#1134.
The test had drifted behind three server implementation changes and no
longer ran against the actual server:
- Server entrypoint renamed from server.js to server.cjs; the test still
invoked node on server.js and failed with MODULE_NOT_FOUND.
- Server state moved to a state/ subdirectory (state/server-info,
state/server.pid); the test still waited on .server-info and wrote
.server.pid at the session root.
- Owner-PID startup validation now keeps the server running when the
owner PID is dead at startup: it logs owner-pid-invalid, disables
owner monitoring, and falls back to the idle timeout. The test still
expected the server to self-terminate within 60s of a dead-at-startup
owner.
Update file/path references to match the current server, and rewrite
the dead-at-startup test to assert the current behavior: server
survives, log contains owner-pid-invalid, log does not contain a
spurious "owner process exited" line.
Verified locally: 9 passed, 0 failed, 3 skipped (Windows-only).
Matches the style used by the spec-reviewer-prompt.md and
code-quality-reviewer-prompt.md call sites, which already use square
brackets ([VAR] or [VAR — description]). No semantic change — these
placeholders are filled in by the controller; nothing programmatic
substitutes them.
Two problems with the SDD reviewer prompts on dev:
- spec-reviewer-prompt.md never received a git range, so the
general-purpose subagent had to crawl the entire codebase to find what
changed. Reporter measured 20-33 minute spec reviews on simple tasks
(#1538).
- Neither reviewer prompt told the subagent that review is read-only.
A spec reviewer running `git checkout <parent-sha>` for historical
comparison silently detached HEAD on the controller's branch, then
subsequent task commits accumulated on the detached HEAD and were
effectively orphaned (#1543, reproduced independently in #1543's
thread).
Add a Git Range to Review section to spec-reviewer-prompt.md that
mirrors the one code-reviewer.md already has, plus a Read-Only Review
section in both reviewer prompt templates stating the principle: do
not mutate the working tree, the index, HEAD, or branch state. Allow
inspecting other revisions via a separate temporary worktree, so the
read-only rule does not block legitimate historical comparison.
Closes#1538.
Closes#1543.
Replace Claude-Code-specific tool names in skill prose, prompt
templates, and OpenCode-facing docs with action-language descriptions
that resolve to each runtime's native tool via the per-platform refs.
Changes by category:
- Prose mentions ("Use TodoWrite to track...", "Use Task tool with
general-purpose type") → action language ("Track each item as a
todo", "Dispatch a general-purpose subagent")
- Prompt template headers (6 files): "Task tool (general-purpose):"
→ "Subagent (general-purpose):" — preserves the type information
without naming Claude Code's specific dispatch tool
- DOT flowchart node labels: "Invoke Skill tool" → "Invoke the
skill"; "Create TodoWrite todo per item" → "Create a todo per
item"
- OpenCode INSTALL.md and docs/README.opencode.md: replace the old
"TodoWrite → todowrite, Task → @mention" mapping (which both
taught a vocabulary skills no longer use AND was wrong about
@mention being a real OpenCode syntax) with an action-language
mapping verified against the installed OpenCode CLI's tool
inventory.
The platform-tools refs landed in Phase B already document each
runtime's resolution; skills now speak in the actions those refs
map. Tool names that genuinely belong only in the per-platform
dispatch section ("In Claude Code: Use the `Skill` tool") and the
Claude-Code-specific Bash run_in_background flag note in
visual-companion remain — those are intentional carve-outs.
Misc platform/runtime statements and adjacencies that don't fit the
prose, config-ref, README-ordering, or tool-vocabulary buckets:
- visual-companion frame template: rename CSS/HTML id #claude-content
→ #frame-content. The id is purely styling — nothing external
references it. The brainstorm-server test that asserted the old
string is updated in lockstep.
- visual-companion launch instructions: add a Copilot CLI section
alongside Claude Code, Codex, and Gemini CLI; combine the Claude
Code (macOS / Linux) and (Windows) sections so heading style
matches the other (non-OS-qualified) platforms.
- visual-companion: "Use Write tool" → "Use your file-creation tool"
for the cat/heredoc warning. The prohibition is what's load-
bearing, not the tool name.
- executing-plans/SKILL.md: list all subagent-capable runtimes
(Claude Code, Codex CLI, Codex App, Copilot CLI, Gemini CLI) and
point at the per-platform tool refs as the source of truth.
- executing-plans/SKILL.md: relative path "using-superpowers/
references/" → "../using-superpowers/references/" to resolve
correctly from the executing-plans/ directory.
No bundled spec doc here — Phase D was scope-extension work that
took place across rounds, with no standalone spec authored.
Quickstart link list and the per-harness install sub-sections both
reorder to strict alphabetical:
Claude Code, Codex App, Codex CLI, Cursor, Factory Droid,
Gemini CLI, GitHub Copilot CLI, OpenCode
Three blocks moved (Codex App swaps with Codex CLI; Cursor moves up
two slots; GitHub Copilot CLI moves up one). Claude Code stays first
by alphabetical chance.
Each install sub-section's content is byte-identical pre/post —
only the positions change. Quickstart anchors verified against the
new heading order.
Two structural changes:
1. Generalize CLAUDE.md-specific guidance:
- "Project-specific conventions (put in CLAUDE.md)" → "(put in
your instructions file)" in writing-skills/SKILL.md
- "(explicit CLAUDE.md violation)" → "(explicit instruction-file
violation)" in receiving-code-review/SKILL.md
- The instruction-priority list in using-superpowers/SKILL.md
stays inclusive (CLAUDE.md, GEMINI.md, AGENTS.md) — that's
load-bearing, not a substitution opportunity.
2. Per-platform tool reference files at skills/using-superpowers/
references/{claude-code,codex,copilot,gemini}-tools.md. Each ref
documents:
- The runtime's preferred instructions file (CLAUDE.md, AGENTS.md,
GEMINI.md, etc.) and how it loads
- The runtime's personal-skills directory + cross-runtime
~/.agents/skills/ path where applicable
- Action-language → tool-name mapping table
Tool names and table content reflect the source-verified state from
direct inspection of openai/codex, google-gemini/gemini-cli,
sst/opencode, and the installed @github/copilot package. Filenames
and behaviors are sourced from each runtime's official docs.
Files in this commit also pick up later-phase changes that
accumulated on the same files (using-superpowers/SKILL.md "How to
Access Skills" overhaul, action-language flowchart, refs' final
table content). The bundled spec records original scope.
Replace generic third-person "Claude" with "agents" / "your agent"
forms across active skill prose, the README intro, and the vendored
anthropic-best-practices.md reference. Carve-outs preserved:
historical attribution paths, the "Variant C: Claude.AI Emphatic
Style" example label, model identifiers (Haiku/Sonnet/Opus), and the
"In Claude Code:" per-platform skill-dispatch list.
Coined-term rename: "Claude Search Optimization (CSO)" → "Skill
Discovery Optimization (SDO)" in writing-skills/SKILL.md.
Files in this commit also pick up later-phase changes that
accumulated on the same files (dispatching-parallel-agents code-
example transformation, writing-skills numbering and path fixes).
The bundled spec at docs/superpowers/specs/ records the original
scope and the carve-outs.
README.md gets only its prose change here; the alphabetization
lands in Phase C's commit.
- evals/README.md, evals/CLAUDE.md: fix uv install command from
'uv sync --dev' to 'uv sync --extra dev'. Drill's pyproject.toml
uses [project.optional-dependencies], so --dev is a no-op for
pytest/ruff/ty; --extra dev is the correct invocation.
- tests/claude-code/run-skill-tests.sh: drop test-requesting-code-review.sh
from integration_tests array (file deleted earlier in this branch).
- tests/claude-code/README.md: replace test-requesting-code-review.sh
section with test-worktree-native-preference.sh (the worktree test
is kept; the code-review test was lifted into drill).
- docs/testing.md, CLAUDE.md: remove "Copilot CLI" from the harness
list. evals/backends/ has claude*, codex, gemini configs but no
copilot.yaml, so the claim was unsupported.
Adversarial review credit: reviewer #2 found four legitimate issues
(uv-sync, run-skill-tests stale ref, README stale ref via #1, and
Copilot CLI fabrication); reviewer #1 found two distinct issues
(run-skill-tests + tests/claude-code/README.md). Reviewer #2 wins
this round.
- docs/testing.md split into Plugin tests + Skill behavior evals.
Plugin tests section enumerates the bash tests that survive
(kept by drill-coverage analysis or as describe-skill tests).
- CLAUDE.md adds Eval harness section pointing at evals/.
- README.md Contributing section mentions evals/ alongside tests/.
- .gitignore adds evals/{results,.venv,.env} as belt-and-suspenders
(evals/.gitignore covers these locally; root-level entries help
tooling that does not recurse into nested ignore files).
- RELEASE-NOTES.md: note that test-requesting-code-review.sh and
test-document-review-system.sh were lifted into drill scenarios
on 2026-05-06; references are preserved as dated artifacts.
- docs/superpowers/plans/2026-03-23-codex-app-compatibility.md:
note that tests/skill-triggering/ was lifted into drill scenarios
on 2026-05-06; the run-all.sh reference is a dated artifact.
Subagent second-pass scrub confirmed no other active references in
the tree (excluding evals/ and the spec/plan for this work itself).
- test-worktree-native-preference.sh: drill covers PRESSURE phase only;
RED + GREEN baselines have no drill counterpart and are kept so
the RED-GREEN-REFACTOR validation remains rerunnable end-to-end.
- test-subagent-driven-development-integration.sh: drill covers the
YAGNI subset (forbidden exports + reviewer-as-gate). Bash adds
>=3 commits, >=2 subagent dispatches, TodoWrite usage, test file
existence check, and token-budget telemetry. Kept until drill
scenario covers those or they are retired.
- test-subagent-driven-development.sh: tests agent's ability to
*describe* SDD (string matches against expected keywords). Drill
scenarios test behavior, not description-recall. Kept by design.
Subagent verification recorded in commit messages of subsequent
deletions; gap analyses driving these annotations are also in the
verification subagent reports for the gating sweep.
Subagent verification: every bash assertion (skill invocation,
subagent dispatch, SQL injection flagged, credential handling
flagged, no merge approval) maps to drill verify checks. Drill is
stricter: bundles severity (Critical/Important) into the same
criteria as the finding itself (bash split severity into a separate
test). Setup parity covered (src/db.js with string concat + identity
hash, two commits).
The drill scenario header explicitly says it is the
"cross-harness, semantically-judged replacement for the bash test."
Subagent verification: every bash assertion (TODO in Requirements
section flagged, "specified later" deferral flagged, Issues section
present, did-not-approve verdict) maps to drill verify.criteria
entries. Setup parity covered by setup.assertions (test-feature-design.md
exists with TODO + 'specified later' content). Drill is stricter:
asserts tool-called Agent (subagent dispatch) which the bash test
did not check.
The bash test had ZERO output assertions — it just ran claude -p
and printed token usage. Drill's scenarios are strictly more
rigorous:
go-fractals: skill-called SDD + tool-called Agent + go test ./...
passes + cmd/fractals/main.go exists + >=4 commits + LLM criteria
verifying real SDD workflow.
svelte-todo: skill-called SDD + tool-called Agent + npm test passes
+ playwright e2e passes + package.json + svelte.config.js or
vite.config.ts + >=4 commits + LLM criteria.
design.md and plan.md are byte-identical between bash fixtures and
drill fixtures (evals/fixtures/sdd-{go-fractals,svelte-todo}/).
Drill's setup helper (scaffold_sdd_*) forces git init -b main
(stricter than bash's reliance on init.defaultBranch). The
.claude/settings.local.json from bash scaffold.sh is unnecessary
for drill since permissions are managed via backend YAML.
Subagent verification: SAFE TO DELETE for both.
Subagent verification: every bash assertion (Skill tool invoked +
specific skill name 'subagent-driven-development' loaded after the
agent describes it conversationally in turn 1) maps to the drill
scenario's skill-called assertion + criteria paragraph requiring
the skill to fire in direct response to the second user message.
Drill additionally asserts tool-called Agent (subagent dispatch)
which is stricter than the bash test.
Other runners in tests/explicit-skill-requests/ (haiku, multiturn,
extended-multiturn) and their prompt files are preserved — they
have no drill coverage and exercise different behaviors.
Subagent verification confirmed each prompt's intent matches its
corresponding drill scenario's turns[].intent verbatim, and each
scenario has both a deterministic skill-called assertion and a
semantic LLM criterion confirming the matching skill was loaded
(actually a stronger check than the bash test, which only confirms
the skill fires anywhere in the stream).
All 6 prompts deleted. The runner had no remaining prompts to drive,
so run-test.sh and run-all.sh deleted as well.
These backends only read SUPERPOWERS_ROOT via engine.py/setup.py's
os.environ access, which the new cli.py default helper supplies
automatically. claude*.yaml keep SUPERPOWERS_ROOT in required_env
because they interpolate ${SUPERPOWERS_ROOT} into --plugin-dir args.
Adds _set_superpowers_root_default() to drill/cli.py, called at
module import after load_dotenv(). PROJECT_ROOT resolves to evals/
post-lift; its parent is the superpowers repo root, which is the
correct value for SUPERPOWERS_ROOT.
Existing env values are respected as overrides via os.environ.setdefault.
Tests:
- helper sets default when var is unset
- helper does not override when var is already set
rsync of obra/drill@013fcb8b7d into superpowers/evals/, excluding
.git/, .venv/, results/, .env/, __pycache__/, *.egg-info/,
.private-journal/.
The drill repo is unaffected by this commit; archival is a separate
manual step after this PR merges.
Source SHA recorded at evals/.drill-source-sha for divergence
detection.
15-task implementation plan derived from the design spec at
docs/superpowers/specs/2026-05-06-lift-drill-into-evals-design.md.
Each task is bite-sized (2-5 min steps) with exact commands, exact
file paths, and exact code where required. Subagent verification
gates per the spec are written out as concrete prompt templates.
Self-review:
- Spec coverage: every spec section maps to a task
- Placeholder scan: no TBD/TODO/placeholder/fill-in-later language
- Type consistency: helper named _set_superpowers_root_default
consistently; drill SHA recorded in evals/.drill-source-sha
consistently
Two parallel reviewers raised legitimate issues against the lift-drill-
into-evals spec. Updates:
- Coverage map for tests/explicit-skill-requests/ corrected: 6 run-*.sh
scripts + prompts, not "2 scenarios cover all". Several scripts
(Haiku, multi-turn, please-use-brainstorming, use-systematic-debugging)
have no drill counterpart and stay.
- tests/claude-code/test-subagent-driven-development.sh marked as
meta/documentation test (asks agent to describe SDD); no drill
scenario covers description tests; defaults to keep.
- Path-defaults section now shows verified evidence: PROJECT_ROOT
resolves to evals/ post-move; only claude*.yaml substitute
${SUPERPOWERS_ROOT} in args (codex/gemini use it via os.environ
in pre-run hooks); helper invocation order specified (after
load_dotenv, before click definitions).
- Step 2 copy uses explicit rsync excludes (.git, .venv, results,
.env, __pycache__, *.egg-info, .private-journal); checksum-level
verification rather than file-count.
- Drill SHA recorded at copy time in commit message and
evals/.drill-source-sha for divergence detection.
- evals/tests/ pytest suite added to verification protocol.
- Reference scrub list expanded: RELEASE-NOTES.md,
docs/superpowers/plans/, .codex-plugin/ (corrected from .codex/),
lefthook.yml. Excluded dirs called out (node_modules/, .venv/,
evals/).
- Historical plan docs / RELEASE-NOTES handling: annotate, don't
rewrite.
- evals/lefthook.yml move documented (drill ships its own;
contributors run cd evals && lefthook run pre-commit manually).
- PR description checklist includes archival action item for
obra/drill post-merge.
False finding rejected: svelte-todo fixture is complete on disk
(design.md + plan.md + scaffold.sh present); reviewer #1#3 dropped.