Verification
Verification is where many agent workflows either become trustworthy or fall apart. A convincing explanation is not verification. A passing command, screenshot, or reproduced behavior is.
Quick Take
Section titled “Quick Take”Agents do better when they have a feedback loop. Tests, screenshots, and command results are that loop.
What To Ask For
Section titled “What To Ask For”- relevant tests for behavior changes
- formatting, linting, or type checks
- screenshots or browser evidence for UI changes
- logs or command output summaries for debugging work
- an explicit statement of what was verified versus assumed
Testing Rules Worth Writing Down
Section titled “Testing Rules Worth Writing Down”## Testing Rules
- Write or update tests when behavior changes.- Run the relevant test command before finishing.- For UI changes, capture visual evidence when possible.- Never present an unrun check as if it passed.- Say what remains unverified.TDD Prompt Pattern
Section titled “TDD Prompt Pattern”For risky logic changes, say so explicitly:
Write the failing tests first, show that they fail, then implement until they pass.
A Useful End-Of-Task Pattern
Section titled “A Useful End-Of-Task Pattern”Ask the agent to report back in two buckets:
Verified
Section titled “Verified”What it actually ran, checked, or observed.
Assumed
Section titled “Assumed”Anything it could not validate directly, such as environment-specific behavior or an unavailable integration.
Verification Questions
Section titled “Verification Questions”Before accepting a change, make sure you can answer:
- Did the agent run the right checks, not just any check?
- Did it add tests when the change affected behavior?
- Did it verify the user-facing outcome, not only compile success?
- Did it call out what remains uncertain?
UI Counts Too
Section titled “UI Counts Too”For frontend work, screenshots are often the missing piece. A passing build is not enough if the page is visually wrong.
Why This Matters
Section titled “Why This Matters”Coding agents fail most dangerously when they sound done before the work is truly validated. Verification turns a plausible change into a trustworthy one.