remove goose2 related skills (#9189)

Jack Amadeo
2026-05-13 22:10:42 -04:00
committed by GitHub
parent 6c935dd8a2
commit c89bf0c30c
5 changed files with 0 additions and 964 deletions
-366
@@ -1,366 +0,0 @@
---
name: code-review
description: >-
Senior engineer code review focused on catching issues before they become PR
comments. Reviews only changed lines, categorizes issues by priority, and fixes
them one by one. Includes a focused ui/goose2 refactor-quality pass for
maintainability, decomposition, layering, type hygiene, duplication, and dead
code. Use when the user says "code review", "review my code", "review this
branch", or wants pre-PR feedback.
---
# Pre-PR Code Review
You are a senior engineer conducting a thorough code review. Review **only the lines that changed** in this branch (via `git diff main...HEAD`) and provide actionable feedback on code quality. Do not flag issues in unchanged code.
## Determine Files to Review
**Before starting the review**, identify which files to review by checking:
1. **Run git commands** to check both:
- Committed changes: `git diff --name-only main...HEAD`
- Unstaged/staged changes: `git status --short`
2. **Ask the user which set to review** if both exist:
- If there are both committed changes AND unstaged/staged changes, ask: "I see you have both committed changes and unstaged/staged changes. Which would you like me to review?"
- **Option A**: Committed changes in this branch (compare against main)
- **Option B**: Current unstaged/staged changes
- **Option C**: Both
3. **Proceed automatically** if only one set exists:
- If only committed changes exist → review those
- If only unstaged/staged changes exist → review those
- If neither exists → inform the user there are no changes to review
4. **Get the file list** based on the user's choice:
- For committed changes: Use `git diff --name-only main...HEAD`
- For unstaged/staged: Use `git diff --name-only` and `git diff --cached --name-only`
- Filter to only include files that exist (some may be deleted)
**Only proceed with the review once you have the specific list of files to review.**
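The decision in steps 2 and 3 can be sketched as a small pure function. This is only an illustration: `ReviewScope` and `chooseReviewScope` are hypothetical names, not part of this skill or the repo, and the two arrays are assumed to hold the output of the git commands above.

```typescript
// Hypothetical helper for illustration; not part of the repo.
type ReviewScope = "committed" | "working-tree" | "ask-user" | "nothing-to-review";

function chooseReviewScope(
  committedFiles: readonly string[],
  workingTreeFiles: readonly string[],
): ReviewScope {
  const hasCommitted = committedFiles.length > 0;
  const hasWorkingTree = workingTreeFiles.length > 0;

  // Both sets exist: the user has to pick Option A, B, or C.
  if (hasCommitted && hasWorkingTree) return "ask-user";
  if (hasCommitted) return "committed";
  if (hasWorkingTree) return "working-tree";
  return "nothing-to-review";
}
```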
## Review Passes
Run these as passes, then consolidate findings before presenting them. A finding should appear once, even if multiple sections support it.
- Use the baseline pass for correctness, regressions, async state, API/backend contracts, accessibility, i18n completeness, CI failures, and obvious cleanup.
- For `ui/goose2` maintainability, use `UI Refactor Quality` as the authoritative pass for decomposition, layering, hooks vs helpers, type hygiene, duplication, naming, module boundaries, and refactor structure.
- Do not duplicate the same underlying concern across the baseline pass and the `UI Refactor Quality` pass. Prefer the `UI Refactor Quality` framing for `ui/goose2` maintainability issues.
### Baseline Safety Pass
#### React Best Practices
- **Components**: Are functional components with hooks used consistently?
- **State Management**: Are `useState` and `useEffect` used properly? Any unnecessary re-renders?
- **Props**: Are prop types properly defined with TypeScript interfaces?
- **Keys**: Are list items using proper unique keys (not array indices)?
- **Hooks Rules**: Are hooks called at the top level and in the correct order?
#### TypeScript Best Practices
- **const vs let vs var**: Is `const` used by default? Is `let` only used when reassignment is needed? Is `var` avoided entirely?
- **Type Safety**: Are types explicit, with `any` avoided? Are proper interfaces/types defined?
- **Type Assertions**: Are type assertions (`as`) used sparingly and only when necessary?
- **Non-null Assertions**: Are non-null assertions (`!`) avoided? They bypass TypeScript's null safety and hide bugs. Use proper null checks or optional chaining instead.
- **React Ref Types**: Are React refs properly typed as nullable (`useRef<T>(null)` with `RefObject<T | null>`)? Refs are null on first render and during unmount.
- **Optional Chaining**: Is optional chaining (`?.`) used appropriately for potentially undefined values?
- **Enums vs Union Types**: Are union types preferred over enums where appropriate?
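As a rough illustration of the non-null assertion and optional chaining items, the sketch below replaces a `session!.model!.name` style access with an explicit fallback. The `Session` shape and `modelLabel` are assumptions made up for this example.

```typescript
// Hypothetical shape for illustration only.
interface Session {
  id: string;
  model?: { name: string };
}

// Avoid: `session!.model!.name` compiles but throws at runtime when either value is missing.
function modelLabel(session: Session | undefined): string {
  // Optional chaining plus a fallback keeps the missing case explicit.
  return session?.model?.name ?? "No model selected";
}
```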
#### Design System & Styling
- **Component Usage**: Are design system components used instead of raw HTML elements (`<Button>` not `<button>`, `<Input>` not `<input>`)?
- **No Custom Styling**: Is custom inline styling or CSS avoided in favor of design system utilities?
- **Tailwind Classes**: Are Tailwind utility classes used properly and consistently?
- **Tailwind JIT Compilation**: Are Tailwind classes using static strings? JavaScript variables in template literals (e.g., `` `max-w-[${variable}]` ``) break JIT compilation. Use static strings or conditional logic instead (e.g., `condition ? 'max-w-[100px]' : 'max-w-[200px]'`).
- **Theme Tokens**: Are theme tokens used for colors that adapt to light/dark mode (e.g., `text-foreground`, `bg-card`, `text-muted-foreground`) instead of hardcoded colors (e.g., `text-black`, `bg-white`)?
- **Variants**: Could any components benefit from additional variants or properties in the design system?
- **Light and Dark Mode Support**: Are colors working properly in both light and dark modes? No broken colors?
- **Responsive Layout**: Does the layout work correctly at all breakpoints? No broken layout on mobile, tablet, or desktop?
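One way to keep Tailwind class names static, sketched under the assumption of a two-value width variant: map each variant to a complete class string so no class is assembled at runtime. `PanelWidth` and `panelWidthClass` are hypothetical names.

```typescript
// Hypothetical variant type for illustration only.
type PanelWidth = "narrow" | "wide";

// Every class appears as a complete static string, so Tailwind's scanner can detect it.
const PANEL_WIDTH_CLASS: Record<PanelWidth, string> = {
  narrow: "max-w-[100px]",
  wide: "max-w-[200px]",
};

function panelWidthClass(width: PanelWidth): string {
  return PANEL_WIDTH_CLASS[width];
}
```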
#### Localization (i18n)
- **New Keys**: When new translation keys are added to one locale (e.g., `en`), are all other supported locales updated too? i18next falls back gracefully, but incomplete locales should be flagged.
- **Removed Keys**: When UI text is removed, are the corresponding translation keys removed from all locale files?
- **Raw Strings**: Are user-facing strings wrapped in `t()` calls instead of hardcoded in JSX? Non-translatable symbols (icons, punctuation, HTML entities) are acceptable with an `i18n-check-ignore` annotation.
- **Stable Keys**: Are translation keys stable and domain-specific instead of mirroring incidental English copy?
- **Catch Blocks**: Are user-facing errors routed through translation keys instead of raw English strings in `catch` blocks?
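The "New Keys" check can be approximated with a small comparison helper. This sketch assumes flat key-to-string locale maps (real i18next resources may be nested), and `findMissingKeys` is a hypothetical name.

```typescript
type LocaleMessages = Record<string, string>;

// Returns, per locale, the keys present in the reference locale but absent from that locale.
function findMissingKeys(
  reference: LocaleMessages,
  others: Record<string, LocaleMessages>,
): Record<string, string[]> {
  const missing: Record<string, string[]> = {};
  for (const locale of Object.keys(others)) {
    const messages = others[locale];
    const absent = Object.keys(reference).filter(
      (key) => !Object.prototype.hasOwnProperty.call(messages, key),
    );
    // Only report locales that are actually incomplete.
    if (absent.length > 0) missing[locale] = absent;
  }
  return missing;
}
```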
#### Code Simplicity (DRY Principle)
- **Duplication**: Is there any repeated code that could be extracted into functions or components?
- **Complexity**: Are there overly complex functions that could be broken down?
- **Logic**: Is the logic straightforward and easy to follow?
- **Abstractions**: Are abstractions appropriate (not too early, not too late)?
- **Guard Clauses**: Are early-return guards used to keep code shallow and readable?
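A minimal sketch of the guard-clause style, using an invented `Draft` shape: each invalid case returns early, so the main path is never nested.

```typescript
// Hypothetical shape for illustration only.
interface Draft {
  text: string;
  isSending: boolean;
}

// Returns the reason submission is blocked, or null when nothing blocks it.
function submitBlockedReason(draft: Draft | null): string | null {
  if (draft === null) return "no-draft";
  if (draft.isSending) return "already-sending";
  if (draft.text.trim() === "") return "empty-message";
  return null;
}
```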
#### Code Cleanliness
- **Comments**: Are there unnecessary comments explaining obvious code? (Remove them)
- **Console Logs**: Are there leftover `console.log` statements? (Remove them)
- **Dead Code**: Is there unused code, commented-out code, or unused imports?
- **Cross-Boundary Dead Data**: Are there struct/interface fields computed on one side of a boundary (e.g., Rust backend) but never consumed on the other (e.g., TypeScript frontend)? This wastes computation and adds noise to data contracts.
- **Naming**: Are variable and function names clear and descriptive?
- **Magic Numbers**: Are there magic numbers without explanation? Should they be named constants?
#### Animation & UI Polish
- **Race Conditions**: Are there any animation race conditions or timing issues?
- **Single Source of Truth**: Is state managed in one place to avoid conflicts?
- **AnimatePresence**: Is it used properly with unique keys for dialog/modal transitions?
- **Reduced Motion**: Is `useReducedMotion()` respected for accessibility?
#### Async State, Defaults & Persistence
- **Async Source of Truth**: During async provider/model/session mutations, does UI/session/localStorage state update only after the backend accepts the change? If the UI updates optimistically, is there an explicit rollback path?
- **UI/Backend Drift**: Could the UI show provider/model/project/persona X while the backend is still on Y after a failed mutation, delayed prepare, or pending-to-real session handoff?
- **Requested vs Fallback Authority**: Do explicit user or caller selections stay authoritative over sticky defaults, saved preferences, aliases, or fallback resolution?
- **Dependent State Invalidation**: When a parent selection changes (provider/project/persona/workspace/etc.), are dependent values like `modelId`, `modelName`, defaults, or cached labels cleared or recomputed so stale state does not linger?
- **Persisted Preference Validation**: Are stored selections validated against current inventory/capabilities before reuse, and do stale values fail soft instead of breaking creation flows?
- **Compatibility of Fallbacks**: Are default or sticky selections guaranteed to remain compatible with the active concrete provider/backend, instead of leaking across providers?
- **Best-Effort Lookups**: Do inventory/config/default-resolution lookups degrade gracefully on transient failure, or can they incorrectly block a primary flow that should still work with a safe fallback?
- **Draft/Home/Handoff Paths**: If the product has draft, Home, pending, or pre-created sessions, did you review those handoff paths separately from the already-active session path?
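Two of the checks above, persisted preference validation and dependent state invalidation, can be sketched as pure functions. The `ModelSelection` and `Inventory` shapes and both function names are assumptions for illustration; the app's real types may differ.

```typescript
// Hypothetical shapes for illustration only.
interface ModelSelection {
  providerId: string;
  modelId: string | null;
}

// Provider id -> model ids currently available.
type Inventory = Record<string, string[] | undefined>;

// Reuse a stored selection only if the current inventory still supports it; otherwise fail soft.
function resolveStoredSelection(
  stored: ModelSelection | null,
  inventory: Inventory,
): ModelSelection | null {
  if (stored === null) return null;
  const models = inventory[stored.providerId];
  if (models === undefined) return null; // provider is gone: drop the whole selection
  if (stored.modelId !== null && models.indexOf(stored.modelId) === -1) {
    return { providerId: stored.providerId, modelId: null }; // keep provider, drop stale model
  }
  return stored;
}

// Changing the provider clears the dependent model so it cannot leak across providers.
function withProvider(current: ModelSelection, providerId: string): ModelSelection {
  if (providerId === current.providerId) return current;
  return { providerId, modelId: null };
}
```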
#### General Code Quality
- **Error Handling**: Are errors handled gracefully with user-friendly messages?
- **Notifications**: Are success and error messages routed through the app's shared notification primitive instead of one-off notification UI?
- **Loading States**: Are loading states shown during async operations?
- **Accessibility**: Are ARIA labels, keyboard navigation, and screen reader support considered?
- **Performance**: Are there any obvious performance issues (unnecessary re-renders, heavy computations)?
- **Git Hygiene**: Are there any files that shouldn't be committed (env files, etc.)?
- **Unrelated Changes**: Are there any stray files or changes that don't relate to the branch's main purpose? (Accidental commits, unrelated fixes)
### UI Refactor Quality
Use this focused pass for `ui/goose2` changes, especially when the user asks about cleanup, maintainability, decomposition, layering, type hygiene, duplication, dead code, readability, or extensibility.
Keep the focus on behavior-preserving frontend improvement. Favor the repo's existing architecture and patterns over generic frontend advice.
- Review changed code for refactor quality, not just correctness.
- Bias toward detecting maintainability smells even when the code is still functionally correct.
- Review the final shape of the changed code, not whether it is better than what came before.
- Judge changes by whether they leave the code easier to maintain and extend in future work, not just whether they are correct today.
- Ask for approval before making code changes unless the user explicitly asks for fixes.
- Preserve `ui/goose2` boundaries: `ui/`, `hooks/`, `api/`, `lib/`, `stores/`, and `shared/`.
#### Strict Mode
- Any confirmed smell in the changed code must be reported as an `Issue`.
- Review the post-PR shape. Do not grade on a curve.
- Partial extraction, partial deduplication, or partial cleanup does not clear a remaining smell.
- If multiple distinct smells remain in one file, report each distinct responsibility problem separately.
#### Smell Checklist
Before finalizing the review, explicitly ask:
- Is any view or page component still doing too many jobs?
- Is any pure derivation logic still trapped in a component instead of `lib/`?
- Is any repeated async UI workflow ready for a focused hook?
- Are helpers duplicated or living in the wrong layer?
- Are any inline object shapes large enough to deserve a named type?
- Did logic move without moving or adding the right tests?
- Did the refactor preserve feature wiring while improving structure?
#### Size And Decomposition
- Treat these as smell thresholds, not hard limits:
- components around 200 lines
- functions around 40 lines
- files around 300 lines
- JSX nesting around 4 levels
- Treat "many unrelated state variables + many handlers + many effects in one view" as a smell even when the line count is still under the thresholds above.
- Treat a file that owns multiple unrelated responsibilities across data loading, derivation, mutation, and rendering orchestration as a smell unless there is a strong reason to keep it together.
- If a component does more than its name claims, rename it or split it.
- Split by responsibility, not by arbitrary line count.
- When a view contains substantial pure derivation logic, prefer extracting it into `lib/` helpers with direct tests.
- When a view contains substantial effectful workflow logic, prefer extracting it into a focused hook.
- Do not suppress a decomposition `Issue` just because the PR already extracted some responsibilities.
- If the remaining file still does too many jobs, report that as an `Issue`.
- File size alone is not the finding. The finding is the number of unrelated responsibilities still owned by the final file.
#### Naming Reveals Intent
- Use names that describe intent, not implementation trivia.
- Prefer domain terms over generic placeholders like `data`, `value`, or `handler`.
- A helper name should describe what it returns or decides, not how it computes it.
- Rename misleading functions before adding comments to explain them.
#### Layer Discipline
- `ui/`: rendering and light view logic only.
- `hooks/`: glue between React state/effects and lower layers.
- `api/`: backend transport wrappers and DTO adaptation only.
- `lib/`: pure functions and domain helpers only.
- `stores/`: shared feature state only.
- Keep business logic out of render-heavy components when a hook or utility would make it clearer.
- If a component mixes pure transforms and UI event orchestration, split the pure transforms out first.
- Do not move simple local state into a store unless multiple consumers truly need it.
- Keep `api/` free of UI imports, path logic, and unrelated domain policy.
- Keep `lib/` free of React, DOM, `window`, and I/O.
- Prefer shared domain helpers in `lib/` when the same normalization, formatting, or parsing logic appears in multiple modules.
- If logic lives in the wrong layer after the PR, report that as an `Issue` even if the PR reduced the amount of misplaced logic.
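As a sketch of what belongs in `lib/`, the function below is a pure derivation with no React, DOM, or I/O, so it can be tested directly; a hook in `hooks/` would own the state and call it, and the component in `ui/` would only render the result. `SessionSummary` and `sortSessionsByRecency` are hypothetical names.

```typescript
// Hypothetical shape for illustration only.
interface SessionSummary {
  id: string;
  title: string;
  updatedAt: number; // epoch milliseconds
}

// Pure: depends only on its input and does not mutate it.
function sortSessionsByRecency(sessions: readonly SessionSummary[]): SessionSummary[] {
  // Copy before sorting so the caller's array stays untouched.
  return sessions.slice().sort((a, b) => b.updatedAt - a.updatedAt);
}
```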
#### Module Encapsulation
- Export the minimum surface a module needs to share.
- Keep helpers, constants, and intermediate transforms private unless another module genuinely needs them.
- Treat removing stale exports as a quality improvement.
- If a helper is used in only one module, default to keeping it local.
- If similar helpers appear across two modules, default to extracting them.
#### DRY And Hooks
- Extract shared behavior once the duplication is clear and the shared abstraction is stable.
- Two call sites can be enough when the shared shape is obvious and both call sites become simpler.
- Prefer a hook when the shared logic is stateful or effectful.
- Keep each hook focused on one job.
- Keep hook return shapes stable so callers are not forced to handle shifting contracts.
- Do not use a hook as the default extraction target for oversized components.
- If the logic is pure and React-independent, report extraction to `lib/`.
- If the logic coordinates React state, effects, async actions, or UI event orchestration, report extraction to `hooks/`.
- Treat repeated pure UI derivation logic as helper extraction candidates.
- If repeated effectful orchestration remains in the changed code, report that as an `Issue`.
- If repeated pure transforms remain in the changed code, report that as an `Issue`.
#### Type Hygiene
- Keep canonical cross-feature types in `src/shared/types/`.
- Do not duplicate types across features when one shared type should exist.
- Give inline object types with 3 or more fields a name when they start obscuring the code.
- Prefer `Pick`, `Omit`, and `Partial` over restating shapes by hand.
- Avoid `any`, unchecked `as`, non-null assertions, and string-encoded pseudo-unions when a discriminated union would be clearer.
- Treat repeated or verbose inline object shapes as extraction candidates for named types.
- If verbose or repeated inline shapes remain after the PR, report that as an `Issue`.
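The sketch below illustrates two of these points with invented types: a discriminated union in place of a string-encoded status, and `Pick` in place of a hand-restated shape. None of these names come from the repo.

```typescript
// Each state carries its own payload, so callers never parse strings like "error:timeout".
type LoadState =
  | { kind: "idle" }
  | { kind: "loading" }
  | { kind: "failed"; reason: string }
  | { kind: "loaded"; itemCount: number };

function describeLoadState(state: LoadState): string {
  switch (state.kind) {
    case "idle":
      return "Not started";
    case "loading":
      return "Loading";
    case "failed":
      return "Failed: " + state.reason;
    case "loaded":
      return state.itemCount + " items";
  }
}

// Hypothetical canonical type for illustration only.
interface Persona {
  id: string;
  name: string;
  systemPrompt: string;
  createdAt: number;
}

// Derive the narrower shape from the canonical type instead of restating fields by hand.
type PersonaListItem = Pick<Persona, "id" | "name">;

function toListItem(persona: Persona): PersonaListItem {
  return { id: persona.id, name: persona.name };
}
```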
#### React And UI
- Prefer straight-line render logic, guard clauses, and early returns over deep nesting.
- Prefer controlled components where practical.
- Use semantic HTML like `<main>`, `<nav>`, `<header>`, and `<aside>`.
- Prefer existing shared UI button primitives over plain `<button>` elements.
- Treat new plain `<button>` usage as a refactor smell unless there is a specific semantic or integration reason.
- If a plain `<button>` is genuinely necessary, it must use `type="button"` in goose2.
- Use `cn()` from `@/shared/lib/cn` for Tailwind class merging.
- Prefer existing shared UI primitives before creating new one-off markup patterns.
- Avoid inline styles except for truly dynamic values.
- Respect reduced-motion behavior when touching animation.
#### Notifications, Localization, And Accessibility
- Route success and error feedback through the app's shared notification primitive.
- Route user-facing Goose UI copy through `react-i18next` in already-migrated surfaces.
- Prefer stable translation keys over inline English strings.
- Avoid raw user-facing strings inside `catch` blocks.
- Add text alternatives for icon-only or color-only affordances.
- Keep interactive semantics explicit with labels, roles, and selected state where applicable.
#### Tauri And Backend Boundaries
- Frontend-to-core communication goes through `SDK -> ACP -> goose`.
- Do not add ad hoc `fetch()` calls for goose core behavior.
- Do not add `invoke()` calls as proxies to goose core behavior; reserve them for desktop-shell concerns.
- Do not call ACP clients directly from UI components; keep backend access in `shared/api/` or `features/*/api/`.
#### Errors, State Drift, And Dead Code
- Handle errors explicitly and close to the source.
- Keep the happy path easy to see.
- In async UI flows, keep local state, persisted state, and backend-confirmed state from drifting apart.
- Delete unused exports, imports, parameters, fields, and commented-out code.
- Remove tests that only protect deleted internals rather than user-visible behavior.
- When logic moves across modules, expect coverage to move with it rather than disappear.
- Treat coverage loss in refactors as suspicious unless the behavior was intentionally removed.
- If behavior-preserving logic moved but coverage did not move with it, report that as an `Issue`.
- Report redundant props, fields, parameters, and intermediate values as `Issues`.
## Review & Fix Process
### Step 0: Run Quality Checks
Before reading any code, run the project's CI gate to establish a baseline. Use **check-only** commands so the baseline never mutates the working tree — otherwise auto-formatters can introduce unstaged diffs and you'll end up reviewing formatter output instead of the author's actual changes.
Avoid `just check-everything` as the baseline in this repo: that recipe runs `cargo fmt --all` in write mode and will modify the working tree. Run the non-mutating equivalents instead:
```bash
cargo fmt --all -- --check
cargo clippy --all-targets -- -D warnings
(cd ui/desktop && pnpm run lint:check)
./scripts/check-openapi-schema.sh
```
If the project has a stronger pre-push or CI gate than this helper set, run that fuller gate when the review is meant to be PR-ready, but only after confirming it is also non-mutating (or run it from a clean stash). In this repo, targeted tests for the changed area plus the pre-push checks are often the practical follow-up.
Report the results as pass/fail. Any failures are automatically **P0** issues and should appear at the top of the findings list. Do not skip this step even if the user only wants a quick review.
### Step 1: Conduct Review
For each file in the list:
1. Run `git diff main...HEAD -- <file>` to get the exact lines that changed
2. Review **only those changed lines** against the review passes — do not flag issues in unchanged code, but follow changed code paths into surrounding modules when needed to verify the issue
3. For stateful UI or async flow changes, trace the full path end to end: user selection -> local/session state update -> persistence -> backend prepare/set/update call -> failure/rollback path
4. For `ui/goose2` refactors, run the UI Refactor Quality pass before finalizing findings
5. Note the file path and line numbers from the diff output for each issue found
### Step 2: Categorize Issues
Assign each issue a priority level:
- **P0**: Breaks functionality, TypeScript errors, security issues
- **P1-P2**: Performance problems, accessibility issues, code quality, unnecessary complexity, poor practices, design system violations
- **P3**: Style inconsistencies, minor improvements, missing type safety, animation issues, theme token usage
- **P4**: Cleanup — console logs, unused imports, dead code, unnecessary comments, unrelated changes
If many high-severity issues exist in a file, assess whether a full refactor would be simpler than individual fixes.
### Step 3: Present Findings
After reviewing all files, provide:
- **Summary**: Total files reviewed, overall quality rating (1-5 stars)
- **Issues**: A single numbered list ordered by priority (P0 first, P4 last). Each issue must follow this exact format:
```
1. Short Issue Title (P0) [Must Fix]
- Description of the issue and why it matters
- User effect if this ships
- Recommended fix
2. Short Issue Title (P3) [Your Call]
- Description of the issue and why it may or may not need addressing
- User effect if this ships
- Recommended fix if the user chooses to act on it
```
Write the user-effect bullet in product language: describe what the user would experience, misunderstand, lose, or be blocked from doing if the issue reached production.
Use a short, descriptive title (3-6 words max) so issues can be referenced by number (e.g. "fix issue 3").
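For illustration, the ordering and layout described in this step could be produced by a formatter like the one below. The `Finding` shape and `formatFindings` are hypothetical; the sketch emits unindented bullets, matching the template as shown here.

```typescript
type Priority = "P0" | "P1" | "P2" | "P3" | "P4";
type Tag = "Must Fix" | "Your Call";

// Hypothetical shape for illustration only.
interface Finding {
  title: string;
  priority: Priority;
  tag: Tag;
  description: string;
  userEffect: string;
  recommendedFix: string;
}

function formatFindings(findings: readonly Finding[]): string {
  // "P0" through "P4" order correctly as plain strings, so P0 comes first and P4 last.
  const ordered = findings.slice().sort((a, b) =>
    a.priority < b.priority ? -1 : a.priority > b.priority ? 1 : 0,
  );
  const lines: string[] = [];
  ordered.forEach((finding, index) => {
    lines.push(`${index + 1}. ${finding.title} (${finding.priority}) [${finding.tag}]`);
    lines.push(`- ${finding.description}`);
    lines.push(`- ${finding.userEffect}`);
    lines.push(`- ${finding.recommendedFix}`);
  });
  return lines.join("\n");
}
```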
### Step 3b: Self-Check
Before presenting findings to the user, silently review the issue list four times:
1. **Pass 1**: For each issue, ask — is this genuinely a problem, or could it be intentional/acceptable? Remove false positives.
2. **Pass 2**: For each remaining issue, ask — does the recommended fix actually improve the code, or is it a matter of preference?
3. **Pass 3**: For async state/default-resolution issues, ask — can the UI, persisted state, and backend ever disagree after a failure, fallback, or session handoff?
4. **Pass 4**: For `ui/goose2` refactors, ask — did any confirmed final-shape smell survive in decomposition, layering, hooks/effects, pure helpers, type shapes, duplication, tests, or feature wiring?
After these passes, tag each surviving issue as one of:
- **[Must Fix]** — clear violation, will likely get flagged in PR review
- **[Your Call]** — valid concern but may be intentional or a reasonable tradeoff (e.g. stepping outside the design system for a specific reason). Present it but let the user decide.
Only present issues that survived these passes.
Merge duplicate concerns before presenting findings. If the same underlying issue appears in multiple passes, report it once under the most specific applicable reason. Prefer the `UI Refactor Quality` framing for `ui/goose2` maintainability issues.
Do not include an "Applied Well" section in the review output. If there are no issues, say that clearly and mention any remaining test gaps or residual risk.
### Step 4: Fix Issues
**Before fixing**, ask: "Would you like me to fix these issues in order? Or do you have questions about any of them first? I will fix each issue one by one and ask for approval before moving to the next one."
**When approved**, work through issues one at a time in numbered order (P0 → P4). After each fix:
1. Explain what was changed and why
2. Ask: "Does that look good? Ready to move on to issue [N]?"
3. Wait for confirmation before proceeding to the next issue
**Important**: When adding documentation comments:
- Only add comments for non-obvious things: magic numbers, complex logic, design decisions, or workarounds
- If you call out something as confusing or hard-coded in your review and suggest adding documentation, it's acceptable to add a comment when approved
- Don't add comments that just restate what the code does
Explain each change as you make it. If an issue is too subjective or minor, skip it and note why.
**Remember**: Cleanup tasks like removing comments should always be done LAST, because earlier fixes might introduce new comments that also need removal.
### Step 5: Ready to Ship
Once all issues are fixed, display:
---
**✅ Code review complete! All issues have been addressed.**
Your code is ready to commit and push. Lefthook and CI will run the repo's configured gates when you push.
Next steps: generate a PR summary that explains the intent of this change, what files were modified and why, and how to verify the changes work.
---
-173
@@ -1,173 +0,0 @@
---
name: create-app-e2e-test
description: Create app e2e tests for the Goose desktop app using the Tauri app test driver. Use when the user wants to generate, write, or verify UI tests that run against the live app.
---
# Create App E2E Test
You are an AI agent that creates app e2e tests for the Goose desktop app using the Tauri app test driver.
## Goal
Given a test scenario in natural language, you will:
1. Explore the app using the test driver CLI to discover what's on screen
2. Write a Vitest test file that verifies the scenario using stable selectors
**Do NOT read source code to understand the UI.** Do not read `.tsx`, `.ts`, or `.css` files to find elements. Use `snapshot` to discover what is on the page — that is your only method. The one exception: read source code only when you need to add a `data-testid` attribute.
## Prerequisites
The Tauri app must already be running in dev mode with the app test driver enabled.
## Test Driver CLI
All exploration commands use the test driver client CLI:
```bash
pnpm test-driver <action> [selector] [value]
```
Available commands:
| Command | Description | Example |
|---------|-------------|---------|
| `snapshot` | Get a text DOM of visible elements | `pnpm test-driver snapshot` |
| `getText <selector>` | Get inner text of an element | `pnpm test-driver getText "h1"` |
| `count <selector>` | Count matching elements | `pnpm test-driver count "button"` |
| `click <selector>` | Click an element | `pnpm test-driver click "button"` |
| `fill <selector> <value>` | Fill an input/textarea | `pnpm test-driver fill "textarea" "hello"` |
| `keypress <selector> <key>` | Dispatch a keyboard event | `pnpm test-driver keypress "textarea" Enter` |
| `waitForText <text>` | Wait for text to appear in body (30s default) | `pnpm test-driver waitForText "Success"` |
| `scroll <direction>` | Scroll the page (up/down/top/bottom) | `pnpm test-driver scroll down` |
| `screenshot [path]` | Take a screenshot | `pnpm test-driver screenshot test.png` |
### Snapshot Format
The `snapshot` command returns a simplified text DOM:
```
[e1] input type="text" placeholder="Ask anything..."
[t1] label "Name:"
[e2] button "Send"
[t2] span "Click me"
[t3] h1 "Good afternoon"
```
- `[eN]` — interactive elements (input, button, select, textarea, a) with auto-assigned `data-tid`
- `[tN]` — visible text nodes
- Hidden elements are excluded
- Indentation shows DOM hierarchy
`data-tid` attributes (e.g., `[data-tid='e3']`) are assigned dynamically during each snapshot and **must not** be used in test files — they change between runs.
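To make the line format concrete, here is a rough parser for a single snapshot line. It is inferred only from the sample above, so the real test driver output may contain cases it does not handle; `SnapshotEntry` and `parseSnapshotLine` are hypothetical names and this is meant for exploration tooling, not for test assertions.

```typescript
// Hypothetical shape for illustration only.
interface SnapshotEntry {
  ref: string; // e.g. "e2" or "t1"; changes between snapshots
  kind: "interactive" | "text";
  tag: string;
  detail: string; // remaining attributes or quoted text, left unparsed
}

function parseSnapshotLine(line: string): SnapshotEntry | null {
  // Leading whitespace is skipped because indentation only encodes hierarchy.
  const match = /^\s*\[([et])(\d+)\]\s+(\S+)\s*(.*)$/.exec(line);
  if (match === null) return null;
  return {
    ref: match[1] + match[2],
    kind: match[1] === "e" ? "interactive" : "text",
    tag: match[3],
    detail: match[4],
  };
}
```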
## Workflow
### Phase 1: Explore
1. Navigate to home first, then `snapshot` to see the current page state.
```bash
pnpm test-driver click '[data-testid="nav-home"]'
pnpm test-driver snapshot
```
Tests always start from the home screen (`useTestDriver()` navigates home in `beforeEach`), so exploration should too.
2. For each element you need to interact with or verify:
- Identify it from the snapshot (e.g., `[e3] button "Send"`)
- Pick a stable selector using the **Element Locating Strategy** below — never use `data-tid`
- Use `count` with that stable selector to verify it matches exactly one element
- Use `getText` to verify text content
- Use `click`/`fill` to perform actions during exploration
3. After each action, run `snapshot` again — the DOM may have changed.
Example exploration session:
```bash
# 1. See what's on the page
pnpm test-driver snapshot
# 2. Pick a selector for the element you need, verify it matches exactly 1
pnpm test-driver count 'textarea[placeholder="Ask anything..."]'
# 3. Interact
pnpm test-driver fill 'textarea[placeholder="Ask anything..."]' "hello world"
# 4. Snapshot again to see the result
pnpm test-driver snapshot
```
### Phase 2: Write the Test
Create a Vitest test file at `tests/app-e2e/<name>.test.ts`.
Use `useTestDriver()` from `tests/app-e2e/lib/setup.ts` to get a shared test driver connection with automatic teardown and screenshot-on-failure. See `tests/app-e2e/chat.test.ts` as a reference:
```typescript
import { describe, it, expect } from "vitest";
import { useTestDriver } from "./lib/setup";

describe("<Feature>", () => {
  const testDriver = useTestDriver();

  it("does something", async () => {
    const text = await testDriver.getText('[data-testid="greeting"]');
    expect(text).toContain("Good");
  });
});
```
Test driver API methods available in tests:
- `testDriver.snapshot()` → text DOM string
- `testDriver.getText(selector, { timeout? })` → inner text string
- `testDriver.count(selector)` → number of matching elements
- `testDriver.click(selector, { timeout? })` → clicks element
- `testDriver.fill(selector, value, { timeout? })` → fills input/textarea
- `testDriver.keypress(selector, key, { timeout? })` → dispatches keyboard event
- `testDriver.waitForText(text, { selector?, timeout? })` → waits for text to appear (default: body, 30s)
- `testDriver.scroll(direction)` → scrolls page ("up", "down", "top", "bottom")
- `testDriver.screenshot(path)` → saves screenshot
The `timeout` option defaults to 5 seconds for every method except `waitForText`, which defaults to 30 seconds.
### Phase 3: Verify the Test
Run the test to make sure it passes:
```bash
pnpm test:app-e2e
```
If it fails, use the test driver CLI to re-explore. The issue could be a wrong selector, an incorrect assertion, or a bug in the app implementation.
## Element Locating Strategy
**Always** verify uniqueness with `count` before using any selector in a test. If count > 1, the test will be flaky.
For each element, find a stable selector using this priority:
1. **`data-testid` (preferred)**: if the element already has a `data-testid`, use `'[data-testid="my-element"]'`.
- Verify with `count` that it's unique
2. **Semantic selector**: combine attributes, hierarchy, or roles to target a specific element.
- `'input[placeholder="Ask anything..."]'` — narrow by attribute value
- `'.sidebar [role="navigation"] a'` — narrow by parent context
- Always verify with `count` that the selector matches exactly 1 element
3. **Add a `data-testid` (last resort)**: if no stable selector exists, add a `data-testid` to the source code.
- Names must be descriptive and include context (e.g., `home-greeting` not `greeting`, `sidebar-new-chat-button` not `button`)
- Only add the `data-testid` attribute — do not change any other source code
- Note the code change so it can be committed alongside the test
**Never** use `data-tid` attributes (`[data-tid='e3']`) in test files — they are assigned dynamically by `snapshot` and change between runs.
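A small guard can make the `data-tid` rule mechanical. Both helpers below are hypothetical and not part of the test driver: `byTestId` builds the preferred selector form, and `usesDynamicTid` flags selectors that depend on snapshot-assigned ids.

```typescript
// Hypothetical helpers for illustration; not part of the test driver API.
function byTestId(testId: string): string {
  return `[data-testid="${testId}"]`;
}

// True when a selector relies on a `data-tid` attribute, which changes between runs.
function usesDynamicTid(selector: string): boolean {
  return /\[data-tid\b/.test(selector);
}
```

Uniqueness still has to be confirmed with `count` against the running app; these helpers only check the shape of the selector string.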
## Rules
- One test file per feature area (e.g., `home.test.ts`, `sidebar.test.ts`, `settings.test.ts`)
- Keep test descriptions specific: "shows time-based greeting on home screen", not "home works"
- Always check `count` === 1 for selectors before using them in assertions
- Do not use `snapshot` in test assertions — it's for exploration only
- The test driver automatically waits up to 5 seconds (configurable via `{ timeout }`) for elements to appear before `click`, `fill`, `getText`, and `keypress`
- Use `waitForText` to wait for specific text content to appear (e.g., after submitting a form or waiting for an LLM response)
- If the DOM updates after an action, run `snapshot` again to see the new state before writing assertions
- Do not add comments in test files — the test descriptions and code should be self-explanatory
@@ -1,88 +0,0 @@
---
name: create-pr
description: >-
Create a GitHub PR from the current branch: handle uncommitted changes, generate
a summary, and submit via gh CLI. Use when the user says "create PR", "open PR",
"submit PR", "push PR", or wants to create a pull request.
---
# Create PR
Create a GitHub PR from the current branch: handle uncommitted changes, generate a summary, and submit.
## Step 1: Rebase Reminder
Before doing anything else, remind the user to rebase onto main if they haven't already. Ask if they'd like to proceed or rebase first.
## Step 2: Check for Uncommitted Changes
Run `git status` to check for staged, unstaged, or untracked changes.
- If there are uncommitted changes, show the user what's outstanding and ask if they'd like to commit them before creating the PR.
- If the user says yes, stage the relevant files, draft a concise commit message based on the changes, and commit.
- If there are no uncommitted changes, move on.
## Step 3: Gather Branch Context
Run these commands in parallel to understand the branch:
1. `git log main..HEAD --oneline` to see all commits on this branch.
2. `git diff main...HEAD --stat` to get the list of changed files.
3. `git diff main...HEAD` to understand what changed in each file.
4. `git rev-parse --abbrev-ref HEAD` to get the current branch name.
5. `git status` to check if the branch has been pushed to remote.
## Step 4: Generate PR Title and Summary
**Title:** Generate a concise PR title (under 72 characters) that captures the intent of the change. Use conventional style: lowercase, imperative mood (e.g., "prevent chat list from reordering when renaming sessions").
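The mechanical parts of the title rule can be checked in code. The sketch below is illustrative only: `checkPrTitle` is a hypothetical helper, it checks only the first character for lowercase, and it does not attempt to verify imperative mood, which still needs judgment.

```typescript
// Hypothetical helper: returns a list of problems found in a PR title.
// An empty list means the mechanical checks passed.
function checkPrTitle(title: string): string[] {
  const problems: string[] = [];
  if (title.length >= 72) {
    problems.push("title must be under 72 characters");
  }
  const first = title.charAt(0);
  if (first !== first.toLowerCase()) {
    problems.push("title should start with a lowercase letter");
  }
  return problems;
}
```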
**Body:** Generate a PR summary with these four sections:
### Section 1: Overview
Start with metadata tags, then a Problem/Solution block:
- `**Category:**` — one of: `new-feature`, `improvement`, `fix`, `infrastructure`
- `**User Impact:**` — one sentence describing what changed from the user's perspective. Write this as a standalone sentence a non-technical stakeholder would understand (e.g., "Users can now create and schedule repeatable tasks directly from the desktop app."). This line is used for project changelogs.
- `**Problem:**` — describe the user-facing confusion, mismatch, or friction this PR addresses.
- `**Solution:**` — explain how the change resolves that UX problem and, if applicable, why the approach was chosen.
Keep Problem + Solution to 2-4 sentences total. Prioritize intent and expected user experience, but include brief high-level implementation rationale when it explains reliability, maintainability, or code quality.
### Section 2: Changes
Wrap this section in a collapsible `<details>` block with the summary "File changes".
Inside, list every changed file. For each file, use the filename as a bold header, then underneath write one or two sentences about what was changed and why. Focus on intent, not implementation details.
Format:
```
<details>
<summary>File changes</summary>
**path/to/file.ts**
What changed and why.
**path/to/other.rs**
What changed and why.
</details>
```
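If the list of changed files is already available as data, the block above could be generated rather than written by hand. This is a minimal sketch: `FileChange` and `renderFileChanges` are hypothetical names, and the blank lines around the entries are an assumption about what GitHub needs to render Markdown inside `<details>`.

```typescript
interface FileChange {
  path: string;    // e.g. "path/to/file.ts"
  summary: string; // one or two sentences on what changed and why
}

// Renders the collapsible "File changes" section of the PR body.
function renderFileChanges(changes: FileChange[]): string {
  const entries = changes.map((change) => `**${change.path}**\n${change.summary}`);
  return [
    "<details>",
    "<summary>File changes</summary>",
    "",
    entries.join("\n\n"),
    "",
    "</details>",
  ].join("\n");
}
```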
### Section 3: Reproduction Steps
Numbered steps in plain English for how an engineer can see the outcome of this PR. Assume they know how to run the project. Focus on where to look and what they should see.
### Section 4: Screenshots/Demos (for UX changes)
If the PR includes visual changes, include before/after screenshots or a short demo. If there are no visual changes, omit this section entirely.
## Step 5: Push and Create PR
1. Push the branch to remote if it hasn't been pushed yet: `git push -u origin HEAD`
2. Create the PR using `gh pr create` with the generated title and body. Use a HEREDOC for the body to preserve formatting.
3. Output the PR URL as a clickable hyperlink so the user can open it directly.
## Tone
Write from the perspective of a product designer explaining their thinking to engineers. Be clear and concise — just enough to establish intent. They can read the code; your job is to guide their understanding of the "why."
@@ -1,149 +0,0 @@
---
name: edge-case-finder
description: >-
Analyzes branch changes to find edge cases, error states, and untested user
flow paths in UI code. Use when the user says "find edge cases", "what am I
missing", "edge cases", "test my flows", "what could go wrong", or wants to
harden a feature before shipping.
---
# Edge Case Finder
You are a senior QA engineer and UX specialist. Your job is to analyze the code changes on this branch and systematically identify every edge case, error state, and untested user flow path. The user is a product designer who builds the happy path in code and needs help finding what they missed.
## Step 1: Understand What Changed
Start by checking what's on the branch and what's still in the working tree:
```bash
git status --short
git diff --name-only main...HEAD
```
If there are both committed and uncommitted changes, ask the user which to analyze: committed (branch diff), staged, unstaged, or all.
Then read the diffs for the selected change set:
- **Committed changes** (branch vs. main): `git diff main...HEAD -- <file>`
- **Staged changes**: `git diff --cached -- <file>`
- **Unstaged changes**: `git diff -- <file>`
**While reading the diffs, identify:**
- What user-facing feature or flow is being built/modified
- What components, pages, or routes are involved
- What data flows in (props, API calls, user input, URL params)
- What actions the user can take (clicks, form submissions, navigation)
Summarize your understanding in 2-3 sentences before proceeding. Ask the user to confirm you've got it right.
## Step 2: Map the Happy Path
Based on the code, describe the intended happy path flow:
1. **Entry point**: How does the user reach this feature?
2. **Steps**: What is the expected sequence of user actions?
3. **Success state**: What does "it worked" look like?
Present this as a numbered flow the user can verify. Example:
> **Happy path**: User opens settings → clicks "Add workspace" → fills in name → clicks save → sees new workspace in list
Ask: "Is this the happy path you built? Anything I'm missing?"
## Step 3: Find Edge Cases
Now systematically analyze every category below. For each changed file, examine the code for gaps. Consult `references/edge-case-categories.md` for the full checklist.
### 3a. Empty & Missing States
- What happens when there's no data yet? (empty arrays, null responses, first-time user)
- What if a required field is missing or undefined?
- What if the API returns an empty response vs. an error?
- Is there an empty state UI, or does it just show a blank screen?
### 3b. Error & Failure States
- What if the API call fails? (network error, 500, 403, 404)
- What if the user submits invalid input? (too long, wrong format, special characters, XSS attempts)
- What if a mutation fails partway through?
- Are error messages user-friendly, or do they expose technical details?
- What if the user's session expires mid-action?
### 3c. Loading & Async States
- Is there a loading indicator while data fetches?
- What if the response is slow (2+ seconds)?
- Can the user double-click a submit button and trigger duplicate actions?
- What happens if the user navigates away during an async operation?
- Are there race conditions between multiple async operations?
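For the double-click question above, one common mitigation is to ignore submits while a previous one is still pending. The sketch below shows that idea under stated assumptions: `guardSubmit` is a hypothetical helper, not something from this codebase, and it assumes the submit function returns a promise.

```typescript
// Wraps a submit function so that calls made while a previous call is
// still pending are ignored. Returns true if the submit was started.
function guardSubmit(submit: () => Promise<void>): () => boolean {
  let pending = false;
  return () => {
    if (pending) return false; // duplicate click: do nothing
    pending = true;
    const release = () => {
      pending = false;
    };
    // Release the guard whether the submit succeeds or fails.
    submit().then(release, release);
    return true;
  };
}
```

In a UI, the same `pending` flag would typically also disable the button so the user can see why the second click had no effect.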
### 3d. Boundary & Overflow
- What happens with extremely long text? (names, descriptions, URLs)
- What if there are 0 items? 1 item? 1,000 items?
- What about numeric limits? (negative numbers, zero, MAX_INT)
- Does the layout break with unusual content sizes?
- What if pagination or infinite scroll hits the last page?
### 3e. User Input Variations
- Can the user paste content? (formatted text, images, huge strings)
- What about keyboard-only navigation? (Tab, Enter, Escape)
- What if the user types while a debounced search is pending?
- Copy-paste of multi-line content into single-line fields?
- Emoji, RTL text, Unicode edge cases in text inputs?
### 3f. Navigation & State Persistence
- What if the user hits the back button mid-flow?
- Does refresh preserve the current state or reset it?
- What happens with deep linking — can someone bookmark this URL and come back?
- What if the user opens the same flow in two tabs?
- Does the URL update to reflect the current state?
### 3g. Permissions & Access
- What if the user doesn't have permission for this action?
- What if the resource they're trying to access was deleted by someone else?
- What if they're logged out while the page is still open?
- Does the UI hide actions the user can't perform, or show them disabled?
### 3h. Responsive & Accessibility
- Does the layout work at mobile, tablet, and desktop widths?
- Are interactive elements reachable via keyboard?
- Do screen readers get meaningful labels?
- Is there sufficient color contrast? Does it work in dark mode?
- Are touch targets large enough on mobile (44x44px minimum)?
## Step 4: Present Findings
Organize findings by severity:
**Critical** — The user will definitely hit this in normal usage
- Example: "No loading state while workspace list fetches — user sees blank screen for 1-2s"
**Likely** — Common scenarios that aren't handled
- Example: "No error message if workspace name already exists — form silently fails"
**Defensive** — Less common but worth handling
- Example: "No character limit on workspace name field — 500+ chars breaks card layout"
**Hardening** — Polish items for production readiness
- Example: "Back button from workspace detail doesn't return to the same scroll position in the list"
For each finding, include:
1. **What the edge case is** (one sentence)
2. **Where in the code** it applies (file:line)
3. **What the user would experience** if not handled
4. **Suggested fix** (concrete, 1-2 sentences)
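The shape of a finding and the severity ordering could be modelled as below. This is only a sketch of the report structure described above; the type and field names are hypothetical.

```typescript
type Severity = "Critical" | "Likely" | "Defensive" | "Hardening";

interface Finding {
  severity: Severity;
  edgeCase: string;       // what the edge case is (one sentence)
  location: string;       // where in the code, as file:line
  userExperience: string; // what the user would experience if not handled
  suggestedFix: string;   // concrete fix, 1-2 sentences
}

const SEVERITY_ORDER: Severity[] = ["Critical", "Likely", "Defensive", "Hardening"];

// Returns a new array ordered from most to least severe.
function sortFindings(findings: Finding[]): Finding[] {
  return findings
    .slice()
    .sort((a, b) => SEVERITY_ORDER.indexOf(a.severity) - SEVERITY_ORDER.indexOf(b.severity));
}
```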
## Step 5: Prioritize with the User
After presenting findings, ask:
"Which of these would you like to fix now? I'd recommend starting with the **Critical** items. Want me to work through them in order, or is there a specific one you want to tackle first?"
When approved, fix each issue one at a time:
1. Make the change
2. Explain what was done
3. Ask for approval before moving to the next
## Step 6: Verify
After all fixes are applied, re-scan the changed files for any new edge cases introduced by the fixes themselves. Report either:
- "No new edge cases found — you're good to ship."
- "Found N new items introduced by the fixes" → list them and offer to address.
@@ -1,188 +0,0 @@
# Edge Case Categories — Full Reference
This document provides an exhaustive checklist for each edge case category. Use it when a category in SKILL.md needs deeper investigation.
## Empty & Missing States
### Data states
- [ ] Empty array / collection (0 items)
- [ ] Single item in collection
- [ ] Null or undefined response from API
- [ ] API returns success but with empty body
- [ ] Missing optional fields in response object
- [ ] First-time user with no historical data
- [ ] Deleted data that's still referenced (dangling references)
### UI states
- [ ] Empty state component exists and is meaningful (not just blank)
- [ ] Empty state has a call-to-action (not a dead end)
- [ ] Skeleton/placeholder shown while determining if data exists
- [ ] Search with no results shows helpful message
## Error & Failure States
### Network errors
- [ ] Complete network failure (offline)
- [ ] Timeout (slow response > 10s)
- [ ] Intermittent connectivity (request starts, connection drops)
### API errors
- [ ] 400 Bad Request — validation errors shown to user
- [ ] 401 Unauthorized — redirect to login
- [ ] 403 Forbidden — show access denied, not a broken page
- [ ] 404 Not Found — resource was deleted or never existed
- [ ] 409 Conflict — concurrent edit by another user
- [ ] 422 Unprocessable Entity — semantic validation failure
- [ ] 429 Too Many Requests — rate limiting
- [ ] 500 Internal Server Error — generic fallback error UI
- [ ] 503 Service Unavailable — maintenance mode
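One way to check that every status in the list above has a deliberate UI outcome is to centralize the mapping. The sketch below is illustrative: `ErrorUi` and `errorUiFor` are hypothetical names, and the outcome labels simply mirror the checklist items.

```typescript
type ErrorUi =
  | "validation-errors"
  | "redirect-to-login"
  | "access-denied"
  | "not-found"
  | "conflict"
  | "rate-limited"
  | "maintenance"
  | "generic-error";

// Maps an HTTP status code to the UI treatment named in the checklist.
function errorUiFor(status: number): ErrorUi {
  switch (status) {
    case 400:
    case 422:
      return "validation-errors";
    case 401:
      return "redirect-to-login";
    case 403:
      return "access-denied";
    case 404:
      return "not-found";
    case 409:
      return "conflict";
    case 429:
      return "rate-limited";
    case 503:
      return "maintenance";
    default:
      // 500 and anything unexpected fall back to the generic error UI.
      return "generic-error";
  }
}
```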
### Client errors
- [ ] JavaScript runtime errors caught by error boundary
- [ ] Failed to parse JSON response
- [ ] LocalStorage/SessionStorage full or unavailable
- [ ] Third-party script fails to load (analytics, fonts, CDN)
### Recovery
- [ ] Retry mechanism for transient failures
- [ ] User can dismiss error and try again
- [ ] Error state doesn't block the entire page
- [ ] Partial failure (some items load, some don't)
## Loading & Async States
### Timing
- [ ] Loading spinner/skeleton for operations > 300ms
- [ ] Optimistic UI for instant-feel interactions
- [ ] Progress indicator for multi-step operations
- [ ] Timeout handling for long-running operations
### Race conditions
- [ ] Double-click on submit button
- [ ] Rapid toggle on/off (debouncing)
- [ ] Navigation during pending request (abort controller)
- [ ] Multiple overlapping search queries (only use latest result)
- [ ] Stale data after background tab becomes active again
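For the overlapping search queries item, a "latest request wins" guard is one possible approach. The sketch below assumes a hypothetical `createLatestOnly` helper: each request takes a ticket when it starts, and only the most recent ticket is allowed to apply its result.

```typescript
// Returns a `begin` function. Call `begin()` when a request starts; it
// returns an `isCurrent` check that is true only for the newest request.
function createLatestOnly(): () => () => boolean {
  let latest = 0;
  return () => {
    latest += 1;
    const id = latest;
    return () => id === latest;
  };
}

// Usage sketch (search and render are placeholders, not real functions):
//   const isCurrent = begin();
//   const results = await search(query);
//   if (isCurrent()) render(results);
```

This does not cancel the older request; it only discards its result. Cancelling would need something like an abort controller, as the navigation item above suggests.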
### State management
- [ ] Loading state resets properly after error
- [ ] Success state shown after async completion
- [ ] Form data preserved if submission fails
## Boundary & Overflow
### Text
- [ ] 0 characters (empty string)
- [ ] 1 character
- [ ] Maximum allowed length
- [ ] 10x maximum length (what happens if validation is skipped or fails?)
- [ ] Multi-line text in single-line display
- [ ] Text with only whitespace
- [ ] Very long single word (no natural break point)
### Numbers
- [ ] 0
- [ ] Negative numbers
- [ ] Decimal numbers (0.1 + 0.2 !== 0.3)
- [ ] Very large numbers (display formatting)
- [ ] NaN or Infinity from calculations
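The decimal and NaN/Infinity items can be illustrated with two small guards. Both helper names are hypothetical, and the default tolerance of `1e-9` is an arbitrary choice for the sketch rather than a recommendation.

```typescript
// Compares two floats within a tolerance instead of using ===.
function nearlyEqual(a: number, b: number, tolerance: number = 1e-9): boolean {
  return Math.abs(a - b) <= tolerance;
}

// Returns null instead of NaN or Infinity so the UI can show a fallback.
function safeDivide(numerator: number, denominator: number): number | null {
  const result = numerator / denominator;
  return Number.isFinite(result) ? result : null;
}
```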
### Collections
- [ ] 0 items
- [ ] 1 item (singular vs. plural labels)
- [ ] Exactly at page size boundary (e.g., 20/20)
- [ ] Page size + 1 (triggers pagination)
- [ ] Thousands of items (virtual scrolling needed?)
- [ ] Items added/removed while user is viewing list
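The boundary cases above map directly onto small, testable functions. A minimal sketch, assuming a positive page size and hypothetical helper names:

```typescript
// Number of pages needed for `totalItems`, assuming pageSize > 0.
function pageCount(totalItems: number, pageSize: number): number {
  return Math.ceil(totalItems / pageSize);
}

// Singular vs. plural label for a collection count.
function itemLabel(count: number): string {
  return count === 1 ? "1 item" : `${count} items`;
}
```

With a page size of 20, the checklist's boundaries are 0, 20, and 21 items, which should give 0, 1, and 2 pages.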
### Layout
- [ ] Content wider than container
- [ ] Content taller than viewport
- [ ] Image fails to load (broken image icon vs. fallback)
- [ ] Dynamic content pushes layout (CLS)
## User Input Variations
### Text input
- [ ] Paste from clipboard (plain text)
- [ ] Paste from rich text source (Word, Google Docs)
- [ ] Paste HTML/markdown
- [ ] Emoji characters (multi-byte)
- [ ] RTL languages (Arabic, Hebrew)
- [ ] CJK characters (Chinese, Japanese, Korean)
- [ ] Mathematical symbols, currency symbols
- [ ] Control characters (tab, newline)
- [ ] Zero-width characters (invisible but present)
- [ ] Script injection attempts (`<script>`, `onclick=`)
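Two of the cases above, emoji and zero-width characters, are easy to miss because `string.length` counts UTF-16 code units and invisible characters still count as content. The helpers below are a hypothetical sketch of how a validator might account for both.

```typescript
// Zero-width characters: invisible, but still present in the string.
const ZERO_WIDTH = /[\u200B\u200C\u200D\uFEFF]/g;

// True when the input has nothing a user could actually see.
function isVisiblyEmpty(input: string): boolean {
  return input.replace(ZERO_WIDTH, "").trim().length === 0;
}

// Counts Unicode code points rather than UTF-16 code units.
function codePointLength(input: string): number {
  return Array.from(input).length;
}
```

Code points are still not the same as user-perceived characters (some emoji are sequences of several code points), so a length limit based on this count is an approximation.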
### Interaction patterns
- [ ] Keyboard-only flow (Tab, Shift+Tab, Enter, Escape, Space)
- [ ] Mouse + keyboard switching mid-flow
- [ ] Touch on desktop (Surface, iPad with keyboard)
- [ ] Drag and drop (if applicable)
- [ ] Right-click / context menu
- [ ] Browser autofill vs. manual entry
## Navigation & State
### Browser behavior
- [ ] Back button preserves state
- [ ] Forward button works after going back
- [ ] Refresh preserves or gracefully resets state
- [ ] Deep link (sharing URL) loads correct state
- [ ] Bookmark + return later
- [ ] Open in new tab
### Multi-tab / multi-window
- [ ] Same flow open in two tabs
- [ ] Data modified in one tab while other is stale
- [ ] Session expired in background tab
### Route changes
- [ ] URL parameters validated (malformed values)
- [ ] Missing required URL parameters
- [ ] Navigating to a resource that was deleted
- [ ] Hash/fragment navigation
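Malformed and missing URL parameters can be handled by parsing into a value-or-null result before the rest of the page uses them. The example below assumes a hypothetical `page` query parameter that must be a positive integer; the helper name is illustrative.

```typescript
// Parses a raw query parameter into a positive integer, or null when the
// value is missing or malformed ("", "abc", "-1", "1.5", "0").
function parsePageParam(raw: string | null): number | null {
  if (raw === null || !/^[1-9]\d*$/.test(raw)) return null;
  const value = Number(raw);
  return Number.isSafeInteger(value) ? value : null;
}
```

The caller decides what `null` means, for example falling back to page 1 or showing a not-found state.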
## Permissions & Access Control
### Authorization
- [ ] UI hides/disables actions user can't perform
- [ ] Server validates permissions (not just UI hiding)
- [ ] Permission change while user has page open
- [ ] Role escalation — user's role changes mid-session
### Resource lifecycle
- [ ] Resource deleted while user is editing it
- [ ] Resource moved while user has it open
- [ ] Concurrent edits by multiple users
- [ ] Optimistic locking / last-write-wins handling
## Responsive & Accessibility
### Responsive
- [ ] Mobile (320px - 480px)
- [ ] Small tablet (481px - 768px)
- [ ] Large tablet / small desktop (769px - 1024px)
- [ ] Desktop (1025px - 1440px)
- [ ] Large desktop (1441px+)
- [ ] Portrait vs. landscape orientation
### Accessibility
- [ ] All interactive elements focusable via keyboard
- [ ] Focus order matches visual order
- [ ] Focus trap in modals/dialogs
- [ ] Focus returns to trigger element when dialog closes
- [ ] ARIA labels on icon-only buttons
- [ ] ARIA live regions for dynamic content updates
- [ ] Color contrast ratio >= 4.5:1 (text) / 3:1 (large text)
- [ ] Not relying solely on color to convey information
- [ ] Reduced motion respected (prefers-reduced-motion)
- [ ] Screen reader announces state changes
- [ ] Skip navigation link present (if applicable)
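The contrast thresholds in the list above can be checked numerically. The sketch below uses the WCAG 2.x relative luminance and contrast ratio formulas; the function names are hypothetical, and colors are assumed to be opaque sRGB triples with channels in the range 0 to 255.

```typescript
type Rgb = [number, number, number]; // sRGB channels, 0-255

// WCAG 2.x relative luminance of an sRGB color.
function relativeLuminance([r, g, b]: Rgb): number {
  const linearize = (channel: number): number => {
    const s = channel / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// Applies the thresholds from the checklist: 4.5:1 for text, 3:1 for large text.
function meetsContrast(ratio: number, largeText: boolean): boolean {
  return ratio >= (largeText ? 3 : 4.5);
}
```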
### Touch
- [ ] Touch targets >= 44x44px
- [ ] No hover-only interactions (tooltips need tap alternative)
- [ ] Swipe gestures have button alternatives
- [ ] No tiny close buttons on mobile