
Evaluation and Acceptance Criteria

Agents perform better when “done” is explicit. Their output also becomes much easier to review when the acceptance criteria are visible before the work begins.

What Good Acceptance Criteria Include

the intended user-visible behavior
constraints that must remain true
tests or checks that should pass
evidence required for completion
any known edge cases or failure paths
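
One way to make these five elements hard to skip is to write them down as structured data before the task starts. The sketch below is only an illustration: the class name `AcceptanceCriteria`, its field names, and the `missing()` helper are assumptions made for this example, not part of any particular tool.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class AcceptanceCriteria:
    """Hypothetical container mirroring the five elements listed above."""

    behavior: str = ""                                     # intended user-visible behavior
    constraints: List[str] = field(default_factory=list)   # what must remain true
    checks: List[str] = field(default_factory=list)        # tests or checks that should pass
    evidence: List[str] = field(default_factory=list)      # proof required for completion
    edge_cases: List[str] = field(default_factory=list)    # known edge cases or failure paths

    def missing(self) -> List[str]:
        """Return the names of the elements that were left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

Calling `missing()` before handing the task to an agent gives a quick list of what the brief still fails to say.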

A Practical Review Rubric

Judge the result on:

Correctness: does it do the right thing?
Safety: did it avoid risky or destructive mistakes?
Completeness: are tests, docs, and related files updated?
Maintainability: does the change fit the repo’s conventions?
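
The rubric can also be recorded as a small checklist so that a review names the dimension that failed rather than giving a single verdict. This is a minimal sketch under stated assumptions: the `Review` class and its methods are hypothetical, and the pass/fail booleans are a simplification of what a real review would capture.

```python
from dataclasses import dataclass
from typing import List

# The four rubric dimensions, in the order given above.
RUBRIC = ("correctness", "safety", "completeness", "maintainability")


@dataclass
class Review:
    """Hypothetical record of one review, one boolean per rubric dimension."""

    correctness: bool
    safety: bool
    completeness: bool
    maintainability: bool

    def failures(self) -> List[str]:
        """Return the rubric dimensions that did not pass."""
        return [name for name in RUBRIC if not getattr(self, name)]

    def accepted(self) -> bool:
        """A result is accepted only when every dimension passes."""
        return not self.failures()
```

For example, `Review(True, True, False, True).failures()` reports only `completeness`, which tells the agent or author what to fix next.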

Example

Instead of saying “add export support,” define:

supported formats
who can use it
which errors should be raised, and under what conditions
which tests must be added
what command should pass before the task is considered complete

That changes the task from “produce code” to “meet a standard.”
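
As a rough illustration, the export example could be written down like this. Every value in `EXPORT_TASK` is a hypothetical placeholder (the formats, roles, error descriptions, and test path are invented for the sketch), and the helper names `unanswered` and `completion_check_passes` are assumptions as well. The completion command is stored as an argument list and passed to `subprocess.run`.

```python
import subprocess
import sys
from typing import Dict, List

# The five questions from the example above, as required keys.
REQUIRED = (
    "supported_formats",
    "allowed_roles",
    "expected_errors",
    "required_tests",
    "completion_command",
)

# All values are hypothetical placeholders, not taken from a real project.
EXPORT_TASK: Dict[str, List[str]] = {
    "supported_formats": ["csv", "json"],
    "allowed_roles": ["admin", "analyst"],
    "expected_errors": ["unsupported format is rejected", "unauthorized role is rejected"],
    "required_tests": ["tests/test_export.py"],
    # Run the (hypothetical) export tests with pytest via the current interpreter.
    "completion_command": [sys.executable, "-m", "pytest", "tests/test_export.py"],
}


def unanswered(task: Dict[str, List[str]]) -> List[str]:
    """Return the required questions the task definition leaves missing or empty."""
    return [key for key in REQUIRED if not task.get(key)]


def completion_check_passes(argv: List[str]) -> bool:
    """Run the completion command; True only if it exits with status 0."""
    return subprocess.run(argv, capture_output=True).returncode == 0
```

A task is ready to hand off when `unanswered(EXPORT_TASK)` is empty, and it is complete only when `completion_check_passes(EXPORT_TASK["completion_command"])` returns `True`.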