Agent Health

Agent Health is an evaluation and observability framework for AI agents. It measures agent performance through “Golden Path” trajectory comparison, in which an LLM judge evaluates the agent's actions against expected outcomes. Source code and contribution information are in the GitHub repository.

# Start Agent Health with demo data (no configuration needed)
npx @opensearch-project/agent-health@latest

This opens http://localhost:4001 with pre-loaded sample data for exploration.

Who uses Agent Health:

  • AI teams building autonomous agents (RCA, customer support, data analysis)
  • QA engineers testing agent behavior across scenarios
  • Platform teams monitoring agent performance in production

Key features:

  • Real-time agent execution streaming and visualization
  • LLM-based evaluation with pass/fail scoring
  • Batch experiments comparing agents and models
  • OpenTelemetry trace integration for performance analysis
  • Pluggable connectors for different agent types (REST, SSE, CLI)
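
To make the pass/fail evaluation idea more concrete, here is a minimal sketch of the data a judge might work with: a test case listing expected outcomes, the trajectory an agent produced, and a verdict. Every name here (`TestCase`, `TrajectoryStep`, `Verdict`, `keywordJudge`) is hypothetical and not taken from Agent Health's code. The judge is a simple substring check standing in for the LLM call, which in a real setup would be asynchronous and far more nuanced.

```typescript
// Hypothetical shapes for illustration only; not Agent Health's actual types.
interface TrajectoryStep {
  tool: string;   // name of the tool or action the agent invoked
  input: string;  // arguments the agent passed to it
}

interface TestCase {
  prompt: string;
  expectedOutcomes: string[]; // things a correct run is expected to show
}

interface Verdict {
  passed: boolean;
  reasoning: string;
}

type Judge = (testCase: TestCase, trajectory: TrajectoryStep[]) => Verdict;

// Stand-in judge: passes only if every expected outcome appears somewhere
// in the trajectory text. An LLM judge would replace this substring check.
const keywordJudge: Judge = (testCase, trajectory) => {
  const transcript = trajectory
    .map((step) => `${step.tool} ${step.input}`)
    .join("\n")
    .toLowerCase();
  const missing = testCase.expectedOutcomes.filter(
    (outcome) => !transcript.includes(outcome.toLowerCase()),
  );
  return {
    passed: missing.length === 0,
    reasoning:
      missing.length === 0
        ? "All expected outcomes were observed."
        : `Missing expected outcomes: ${missing.join(", ")}`,
  };
};

// Illustrative data: an RCA-style prompt and a two-step agent trajectory.
const testCase: TestCase = {
  prompt: "Why is checkout latency high?",
  expectedOutcomes: ["query_metrics", "checkout-service"],
};
const trajectory: TrajectoryStep[] = [
  { tool: "query_metrics", input: "service=checkout-service metric=p99_latency" },
  { tool: "search_logs", input: "service=checkout-service level=error" },
];
const verdict = keywordJudge(testCase, trajectory);
```

Keeping the judge behind a single function type means the substring check could later be swapped for a model-backed judge without changing the test case or trajectory shapes.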

Agent Health Architecture

Agent Health uses a client-server architecture where all clients (UI, CLI) access storage through a unified HTTP API. The server handles agent communication via pluggable connectors and proxies LLM judge calls to AWS Bedrock.

| Connector | Protocol | Description |
|---|---|---|
| agui-streaming | AG-UI SSE | ML-Commons agents (default) |
| rest | HTTP POST | Non-streaming REST APIs |
| subprocess | CLI | Command-line tools |
| claude-code | Claude CLI | Claude Code agent comparison |
| mock | In-memory | Demo and testing |

For creating custom connectors, see Connectors.
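
As a rough illustration of what "pluggable" can mean here, the sketch below puts each connector behind one shared interface and looks it up by the type names from the connector table. It assumes a design that the source does not confirm: `AgentConnector`, `AgentStep`, `MockConnector`, and `ConnectorRegistry` are hypothetical names, and the real connector API is the one described in the Connectors documentation.

```typescript
// Hypothetical interface for illustration; not Agent Health's connector API.
type ConnectorType =
  | "agui-streaming"
  | "rest"
  | "subprocess"
  | "claude-code"
  | "mock";

interface AgentStep {
  kind: "action" | "response";
  content: string;
}

interface AgentConnector {
  readonly type: ConnectorType;
  // Sends a prompt to the agent and resolves with the steps it produced.
  run(prompt: string): Promise<AgentStep[]>;
}

// In-memory connector that fabricates a fixed two-step run, in the spirit
// of the "mock" connector used for demos and testing.
class MockConnector implements AgentConnector {
  readonly type = "mock" as const;

  async run(prompt: string): Promise<AgentStep[]> {
    return [
      { kind: "action", content: `lookup(${JSON.stringify(prompt)})` },
      { kind: "response", content: `Echo: ${prompt}` },
    ];
  }
}

// Registry that maps a connector type to its implementation, so the caller
// never needs to know which protocol sits behind a given agent.
class ConnectorRegistry {
  private connectors = new Map<ConnectorType, AgentConnector>();

  register(connector: AgentConnector): void {
    this.connectors.set(connector.type, connector);
  }

  get(type: ConnectorType): AgentConnector {
    const connector = this.connectors.get(type);
    if (!connector) {
      throw new Error(`No connector registered for "${type}"`);
    }
    return connector;
  }
}

const registry = new ConnectorRegistry();
registry.register(new MockConnector());
```

With this shape, supporting a new agent type would mean adding one class that implements `run` and registering it; the code that requests a connector by type stays unchanged.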