## Methodology principles
- Practical over aspirational
- Evidence over sales claims
- Product usability over vague readiness language
- Actionability over long reports
Scores should be based on the surfaces that matter in real agent workflows: discovering capabilities, authenticating, taking safe actions, handling failures, and reacting to changes. The goal is not to produce a perfect theoretical rating. The goal is to make a manual assessment legible and actionable.
## Scoring categories
| Category | What it checks |
|---|---|
| API readiness | Whether the API exposes the workflow with predictable inputs, outputs, and machine-usable behavior. |
| Auth friction | Whether access can be granted and maintained without brittle manual workarounds. |
| Action safety | Whether write paths include safeguards, reversibility, or risk-reduction patterns. |
| Docs clarity | Whether an agent builder can implement from the docs and examples you reviewed without hidden assumptions. |
| Webhook / event support | Whether the product exposes useful state changes for reactive workflows. |
| Sandbox / demo availability | Whether there is a safe place to test before touching production. |
| Rate-limit transparency | Whether limits and throttling behavior are clear enough for agent adaptation. |
| Error recovery | Whether structured errors and retry-friendly responses help agents recover. |
| MCP readiness | Whether the product shows a practical path into emerging tool-based agent ecosystems. |
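The rate-limit transparency and error recovery rows can be made concrete with a small sketch of how an agent might decide whether and when to retry. This is an illustrative assumption, not part of the scoring rubric: the `next_delay` helper, the retry cap, and the backoff constants are invented here, while the 429/5xx status codes and the `Retry-After` header are standard HTTP behavior a transparent API would expose.

```python
# Hypothetical sketch: an agent recovering from structured errors.
# A product scores well on these categories when it returns retryable
# status codes and an explicit Retry-After hint the agent can obey.
RETRYABLE = {429, 500, 502, 503, 504}

def next_delay(status, headers, attempt, base=1.0, cap=30.0):
    """Return seconds to wait before retrying, or None to give up."""
    if status not in RETRYABLE or attempt >= 5:
        return None  # non-retryable error, or retry budget exhausted
    # Prefer the server's explicit hint when the API exposes one.
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return min(float(retry_after), cap)
    # Otherwise fall back to capped exponential backoff.
    return min(base * (2 ** attempt), cap)
```

When an API documents neither its limits nor a `Retry-After` hint, the agent is stuck guessing at the backoff branch, which is exactly what the rate-limit transparency category penalizes.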
## Confidence matters
Every score should be read alongside its confidence level.
- High confidence: based on direct evidence such as current docs, hands-on testing, or clear product references.
- Medium confidence: based on useful but incomplete evidence, such as partial docs or limited testing.
- Low confidence: based on thin, indirect, or outdated public signal.
A low-confidence high score should not be treated as a strong result.
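The score-plus-confidence reading above can be sketched as a small qualifier. Everything here is an illustrative assumption rather than part of the methodology: the 0–100 score scale, the thresholds, and the output labels are invented for the example; only the rule that low confidence caps the reading comes from the text.

```python
# Illustrative sketch (assumed 0-100 score scale, invented labels):
# pair each score with its confidence so a low-confidence high score
# is never read as a strong result.
def interpret(score, confidence):
    """confidence is one of 'high', 'medium', 'low' (assumed scale)."""
    if confidence == "low":
        return "weak signal"  # thin evidence caps the reading
    if score >= 70:
        return "strong" if confidence == "high" else "promising"
    return "weak"
```

The design point is the order of the checks: confidence is applied before the score threshold, so a 90 backed only by thin public signal still reads as a weak signal.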
## How to read the result
A strong score usually means your evidence suggests that:
- the product exposes the workflow through a usable API,
- access can be granted cleanly,
- agents can act with clear boundaries,
- docs reduce ambiguity, and
- event and test support make automation more reliable.

A weak score usually means that:
- part of the workflow is still human-only,
- authentication is hard to automate safely,
- there are few safeguards around writes,
- docs leave too many unknowns, or
- testing and operational controls are weak.
AgentGrade is informational. It is not a legal opinion, security certification, compliance certification, or guarantee of production safety. It also does not claim to have independently reviewed the submitted product unless a separate human review process exists outside this page.