Verification

Verification is where many agent workflows either become trustworthy or fall apart. A convincing explanation is not verification. A passing command, screenshot, or reproduced behavior is.

Agents do better when they have a feedback loop; tests, screenshots, and command results are that loop. At minimum, ask for:

  • relevant tests for behavior changes
  • formatting, linting, or type checks
  • screenshots or browser evidence for UI changes
  • logs or command output summaries for debugging work
  • an explicit statement of what was verified versus assumed

## Testing Rules
- Write or update tests when behavior changes.
- Run the relevant test command before finishing.
- For UI changes, capture visual evidence when possible.
- Never present an unrun check as if it passed.
- Say what remains unverified.
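One way to enforce "run the relevant checks before finishing" is to execute each check and record its real exit status and output, rather than asserting success. A minimal sketch, assuming Python is available; the demo commands are illustrative stand-ins for a project's actual test, lint, or type-check invocations:

```python
import subprocess
import sys

def run_check(name, cmd):
    """Run one check and return (name, passed, captured output)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return name, result.returncode == 0, result.stdout + result.stderr

# Placeholder commands; a real project would list its own
# pytest/lint/type-check invocations here.
checks = [
    ("python can import json", [sys.executable, "-c", "import json"]),
    ("deliberately failing check", [sys.executable, "-c", "raise SystemExit(1)"]),
]

for name, passed, output in (run_check(n, c) for n, c in checks):
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

Because the result comes from the actual return code, a check that was never run simply has no entry, which makes it harder to present an unrun check as passing.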

For risky logic changes, request a test-first workflow explicitly:

Write the failing tests first, show that they fail, then implement until they pass.
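That workflow can be sketched in plain Python. The helper name `normalize_email` and its behavior are hypothetical, chosen only to illustrate the red-then-green sequence:

```python
# Step 1: write the test before the implementation exists.
def test_lowercases_and_strips():
    assert normalize_email("  Alice@Example.COM ") == "alice@example.com"

# Step 2: run it now -- it fails with a NameError, which is the
# "show that they fail" evidence the prompt asks for.

# Step 3: implement until the test passes.
def normalize_email(raw: str) -> str:
    return raw.strip().lower()

test_lowercases_and_strips()  # now passes silently
```

The failing run matters as much as the passing one: it proves the test can actually detect the behavior it claims to cover.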

Ask the agent to report back in two buckets:

  • What it actually ran, checked, or observed.
  • Anything it could not validate directly, such as environment-specific behavior or an unavailable integration.
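The two-bucket report can be kept as structured data and rendered at the end of a run. A minimal sketch; the field names and example entries are illustrative, not from the original text:

```python
# Two buckets: evidence actually gathered vs. things left unverified.
report = {
    "verified": [
        "ran the unit test suite -- all tests passed",
        "captured a screenshot of the login page after the change",
    ],
    "unverified": [
        "behavior against the production OAuth provider (no credentials locally)",
    ],
}

def render(report):
    lines = ["Verified:"]
    lines += [f"  - {item}" for item in report["verified"]]
    lines += ["Could not verify:"]
    lines += [f"  - {item}" for item in report["unverified"]]
    return "\n".join(lines)

print(render(report))
```

Keeping the "could not verify" bucket mandatory, even when empty, nudges the agent to state assumptions instead of silently omitting them.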

Before accepting a change, make sure you can answer:

  • Did the agent run the right checks, not just any check?
  • Did it add tests when the change affected behavior?
  • Did it verify the user-facing outcome, not only compile success?
  • Did it call out what remains uncertain?

For frontend work, screenshots are often the missing piece. A passing build is not enough if the page is visually wrong.

Coding agents fail most dangerously when they sound done before the work is truly validated. Verification turns a plausible change into a trustworthy one.