Skip to content

Evaluation and Acceptance Criteria

Agents perform better when “done” is explicit. They also become much easier to review when the acceptance criteria are visible before the work begins.

  • the intended user-visible behavior
  • constraints that must remain true
  • tests or checks that should pass
  • evidence required for completion
  • any known edge cases or failure paths

Judge the result on:

  • Correctness: does it do the right thing?
  • Safety: did it avoid risky or destructive mistakes?
  • Completeness: are tests, docs, and related files updated?
  • Maintainability: does the change fit the repo’s conventions?

Instead of saying “add export support,” define:

  • supported formats
  • who can use it
  • what errors should happen
  • which tests must be added
  • what command should pass before the task is considered complete

That changes the task from “produce code” to “meet a standard.”