Compress agent skills

2026-05-11 12:05:04 +01:00
parent e145482cd3
commit 7fed91d98b
20 changed files with 1009 additions and 2350 deletions
--- a/dogfood/SKILL.md
+++ b/dogfood/SKILL.md
@@ -1,220 +1,81 @@
 ---
 name: dogfood
-description: Systematically explore and test a web application to find bugs, UX issues, and other problems. Use when asked to "dogfood", "QA", "exploratory test", "find issues", "bug hunt", "test this app/site/platform", or review the quality of a web application. Produces a structured report with full reproduction evidence -- step-by-step screenshots, repro videos, and detailed repro steps for every issue -- so findings can be handed directly to the responsible teams.
-allowed-tools: Bash(agent-browser:*), Bash(npx agent-browser:*)
+description: Systematically explore and test a web application to find bugs, UX issues, and other problems. Use when asked to "dogfood", "QA", "exploratory test", "find issues", "bug hunt", "test this app/site/platform", or review the quality of a web application. Produces a structured report with reproduction evidence.
+allowed-tools: Bash(agent-browser:*) Bash(npx agent-browser:*)
 ---

 # Dogfood

-Systematically explore a web application, find issues, and produce a report with full reproduction evidence for every finding.
+Systematically explore a web application, find issues, and produce a report with reproduction evidence for each finding.

-## Setup
+## Defaults

-Only the **Target URL** is required. Everything else has sensible defaults -- use them unless the user explicitly provides an override.
+Only the target URL is required. Use defaults unless the user overrides them.

-| Parameter | Default | Example override |
-|-----------|---------|-----------------|
-| **Target URL** | _(required)_ | `vercel.com`, `http://localhost:3000` |
-| **Session name** | Slugified domain (e.g., `vercel.com` -> `vercel-com`) | `--session my-session` |
-| **Output directory** | `./dogfood-output/` | `Output directory: /tmp/qa` |
-| **Scope** | Full app | `Focus on the billing page` |
-| **Authentication** | None | `Sign in to user@example.com` |
+| Parameter | Default |
+|-----------|---------|
+| Target URL | Required |
+| Session name | Slugified domain |
+| Output directory | `./dogfood-output/` |
+| Scope | Full app |
+| Authentication | None |

-If the user says something like "dogfood vercel.com", start immediately with defaults. Do not ask clarifying questions unless authentication is mentioned but credentials are missing.
+If the user says "dogfood <url>", start immediately. Ask only when authentication is required but credentials are missing.

-Always use `agent-browser` directly -- never `npx agent-browser`. The direct binary uses the fast Rust client. `npx` routes through Node.js and is significantly slower.
+Use `agent-browser` directly, not `npx agent-browser`, when available.

 ## Workflow

-```
-1. Initialize    Set up session, output dirs, report file
-2. Authenticate  Sign in if needed, save state
-3. Orient        Navigate to starting point, take initial snapshot
-4. Explore       Systematically visit pages and test features
-5. Document      Screenshot + record each issue as found
-6. Wrap up       Update summary counts, close session
-```
-
-### 1. Initialize
+1. Initialize output dirs and copy `templates/dogfood-report-template.md` to `{OUTPUT_DIR}/report.md`.
+2. Open `{TARGET_URL}` in a named `agent-browser` session and wait for `networkidle`.
+3. Authenticate if needed. Ask the user for OTP/email codes.
+4. Take initial annotated screenshot and `snapshot -i`.
+5. Read `references/issue-taxonomy.md` for severity and exploration checklist.
+6. Explore top-level navigation, core workflows, forms, empty/error states, boundaries, modals/dropdowns, console errors, and realistic create/edit/delete flows.
+7. Document each issue immediately as it is found; never batch findings at the end.
+8. Aim for 5-10 well-documented issues. Depth of evidence beats count.
+9. Re-read the report, update summary counts, close the browser session, and summarize findings.

 ```bash
 mkdir -p {OUTPUT_DIR}/screenshots {OUTPUT_DIR}/videos
-```
-
-Copy the report template into the output directory and fill in the header fields:
-
-```bash
 cp {SKILL_DIR}/templates/dogfood-report-template.md {OUTPUT_DIR}/report.md
-```
-
-Start a named session:
-
-```bash
 agent-browser --session {SESSION} open {TARGET_URL}
 agent-browser --session {SESSION} wait --load networkidle
-```
-
-### 2. Authenticate
-
-If the app requires login:
-
-```bash
-agent-browser --session {SESSION} snapshot -i
-# Identify login form refs, fill credentials
-agent-browser --session {SESSION} fill @e1 "{EMAIL}"
-agent-browser --session {SESSION} fill @e2 "{PASSWORD}"
-agent-browser --session {SESSION} click @e3
-agent-browser --session {SESSION} wait --load networkidle
-```
-
-For OTP/email codes: ask the user, wait for their response, then enter the code.
-
-After successful login, save state for potential reuse:
-
-```bash
-agent-browser --session {SESSION} state save {OUTPUT_DIR}/auth-state.json
-```
-
-### 3. Orient
-
-Take an initial annotated screenshot and snapshot to understand the app structure:
-
-```bash
 agent-browser --session {SESSION} screenshot --annotate {OUTPUT_DIR}/screenshots/initial.png
 agent-browser --session {SESSION} snapshot -i
 ```

-Identify the main navigation elements and map out the sections to visit.
+## Evidence Rules

-### 4. Explore
-
-Read [references/issue-taxonomy.md](references/issue-taxonomy.md) for the full list of what to look for and the exploration checklist.
-
-**Strategy -- work through the app systematically:**
-
- Start from the main navigation. Visit each top-level section.
- Within each section, test interactive elements: click buttons, fill forms, open dropdowns/modals.
- Check edge cases: empty states, error handling, boundary inputs.
- Try realistic end-to-end workflows (create, edit, delete flows).
- Check the browser console for errors periodically.
-
-**At each page:**
-
-```bash
-agent-browser --session {SESSION} snapshot -i
-agent-browser --session {SESSION} screenshot --annotate {OUTPUT_DIR}/screenshots/{page-name}.png
-agent-browser --session {SESSION} errors
-agent-browser --session {SESSION} console
-```
-
-Use your judgment on how deep to go. Spend more time on core features and less on peripheral pages. If you find a cluster of issues in one area, investigate deeper.
-
-### 5. Document Issues (Repro-First)
-
-Steps 4 and 5 happen together -- explore and document in a single pass. When you find an issue, stop exploring and document it immediately before moving on. Do not explore the whole app first and document later.
-
-Every issue must be reproducible. When you find something wrong, do not just note it -- prove it with evidence. The goal is that someone reading the report can see exactly what happened and replay it.
-
-**Choose the right level of evidence for the issue:**
-
-#### Interactive / behavioral issues (functional, ux, console errors on action)
-
-These require user interaction to reproduce -- use full repro with video and step-by-step screenshots:
-
-1. **Start a repro video** _before_ reproducing:
+Interactive/behavioral issues need full repro evidence:

 ```bash
 agent-browser --session {SESSION} record start {OUTPUT_DIR}/videos/issue-{NNN}-repro.webm
-```
-
-2. **Walk through the steps at human pace.** Pause 1-2 seconds between actions so the video is watchable. Take a screenshot at each step:
-
-```bash
 agent-browser --session {SESSION} screenshot {OUTPUT_DIR}/screenshots/issue-{NNN}-step-1.png
-sleep 1
-# Perform action (click, fill, etc.)
-sleep 1
-agent-browser --session {SESSION} screenshot {OUTPUT_DIR}/screenshots/issue-{NNN}-step-2.png
-sleep 1
-# ...continue until the issue manifests
-```
-
-3. **Capture the broken state.** Pause so the viewer can see it, then take an annotated screenshot:
-
-```bash
-sleep 2
+# perform action at human pace
 agent-browser --session {SESSION} screenshot --annotate {OUTPUT_DIR}/screenshots/issue-{NNN}-result.png
-```
-
-4. **Stop the video:**
-
-```bash
 agent-browser --session {SESSION} record stop
 ```

-5. Write numbered repro steps in the report, each referencing its screenshot.
-
-#### Static / visible-on-load issues (typos, placeholder text, clipped text, misalignment, console errors on load)
-
-These are visible without interaction -- a single annotated screenshot is sufficient. No video, no multi-step repro:
+Static visible-on-load issues need one annotated screenshot and no video:

 ```bash
 agent-browser --session {SESSION} screenshot --annotate {OUTPUT_DIR}/screenshots/issue-{NNN}.png
 ```

-Write a brief description and reference the screenshot in the report. Set **Repro Video** to `N/A`.
+For every issue, verify it reproduces before collecting evidence, increment `ISSUE-001`, `ISSUE-002`, and write report steps that map to screenshots.

---
+## Guardrails

-**For all issues:**
+- Test as a user. Do not read the target app's source code.
+- Never delete output files mid-session.
+- Do not close and restart the session unless necessary; work forward.
+- Use `snapshot -i` for clickable/fillable elements and `snapshot` for reading content.
+- Check console/errors periodically.
+- During repro video, use `type` instead of `fill` and add small pauses so the video is watchable.
+- Batch independent browser commands when efficient.

-1. **Append to the report immediately.** Do not batch issues for later. Write each one as you find it so nothing is lost if the session is interrupted.
+## Resources

-2. **Increment the issue counter** (ISSUE-001, ISSUE-002, ...).
-
-### 6. Wrap Up
-
-Aim to find **5-10 well-documented issues**, then wrap up. Depth of evidence matters more than total count -- 5 issues with full repro beats 20 with vague descriptions.
-
-After exploring:
-
-1. Re-read the report and update the summary severity counts so they match the actual issues. Every `### ISSUE-` block must be reflected in the totals.
-2. Close the session:
-
-```bash
-agent-browser --session {SESSION} close
-```
-
-3. Tell the user the report is ready and summarize findings: total issues, breakdown by severity, and the most critical items.
-
-## Guidance
-
- **Repro is everything.** Every issue needs proof -- but match the evidence to the issue. Interactive bugs need video and step-by-step screenshots. Static bugs (typos, placeholder text, visual glitches visible on load) only need a single annotated screenshot.
- **Verify reproducibility before collecting evidence.** Before recording video or taking screenshots, verify the issue is reproducible with at least one retry. If it can't be reproduced consistently, it's not a valid issue.
- **Don't record video for static issues.** A typo or clipped text doesn't benefit from a video. Save video for issues that involve user interaction, timing, or state changes.
- **For interactive issues, screenshot each step.** Capture the before, the action, and the after -- so someone can see the full sequence.
- **Write repro steps that map to screenshots.** Each numbered step in the report should reference its corresponding screenshot. A reader should be able to follow the steps visually without touching a browser.
- **Use the right snapshot command.**
-  - `snapshot -i` — for finding clickable/fillable elements (buttons, inputs, links)
-  - `snapshot` (no flag) — for reading page content (text, headings, data lists)
- **Be thorough but use judgment.** You are not following a test script -- you are exploring like a real user would. If something feels off, investigate.
- **Write findings incrementally.** Append each issue to the report as you discover it. If the session is interrupted, findings are preserved. Never batch all issues for the end.
- **Never delete output files.** Do not `rm` screenshots, videos, or the report mid-session. Do not close the session and restart. Work forward, not backward.
- **Never read the target app's source code.** You are testing as a user, not auditing code. Do not read HTML, JS, or config files of the app under test. All findings must come from what you observe in the browser.
- **Check the console.** Many issues are invisible in the UI but show up as JS errors or failed requests.
- **Test like a user, not a robot.** Try common workflows end-to-end. Click things a real user would click. Enter realistic data.
- **Type like a human.** When filling form fields during video recording, use `type` instead of `fill` -- it types character-by-character. Use `fill` only outside of video recording when speed matters.
- **Pace repro videos for humans.** Add `sleep 1` between actions and `sleep 2` before the final result screenshot. Videos should be watchable at 1x speed -- a human reviewing the report needs to see what happened, not a blur of instant state changes.
- **Be efficient with commands.** Batch multiple `agent-browser` commands in a single shell call when they are independent (e.g., `agent-browser ... screenshot ... && agent-browser ... console`). Use `agent-browser --session {SESSION} scroll down 300` for scrolling -- do not use `key` or `evaluate` to scroll.
-
-## References
-
-| Reference | When to Read |
-|-----------|--------------|
-| [references/issue-taxonomy.md](references/issue-taxonomy.md) | Start of session -- calibrate what to look for, severity levels, exploration checklist |
-
-## Templates
-
-| Template | Purpose |
-|----------|---------|
-| [templates/dogfood-report-template.md](templates/dogfood-report-template.md) | Copy into output directory as the report file |
+- `references/issue-taxonomy.md`: severity levels and exploration checklist.
+- `templates/dogfood-report-template.md`: report template.