CLI Reference

Agent Health provides a CLI for running evaluations, managing benchmarks, and generating reports.

npm install -g @opensearch-project/agent-health # Global install
npx @opensearch-project/agent-health <command> # No install required

Start the web server. This is the default action when no subcommand is specified.

agent-health [serve] [options]

| Option | Description | Default |
| --- | --- | --- |
| -p, --port <n> | Server port | 4001 |
| -e, --env-file <path> | Load env file | .env |
| --no-browser | Skip auto-open browser | |

agent-health --port 8080 --env-file prod.env
agent-health serve -p 8080 --no-browser

List available resources.

agent-health list <resource> [-o table|json]

| Resource | Aliases | Description |
| --- | --- | --- |
| agents | | Configured agents |
| connectors | | Available connectors |
| models | | Available models |
| test-cases | testcases, tc | Stored test cases |
| benchmarks | bench | Stored benchmarks |

agent-health list agents
agent-health list tc -o json
agent-health list bench

Run a single test case evaluation.

agent-health run -t <test-case> [options]

| Option | Description |
| --- | --- |
| -t, --test-case <id> | Test case ID or name (required) |
| -a, --agent <key> | Agent key (repeatable for comparison) |
| -m, --model <id> | Model override |
| -o, --output <fmt> | Output: table, json |
| -v, --verbose | Show full trajectory |

agent-health run -t demo-otel-001 -a ml-commons -v
agent-health run -t demo-otel-001 -a ml-commons -a claude-code # compare agents
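
Because -a is repeatable, a comparison across many agents means repeating the flag once per agent key. The sketch below builds that argument list from a plain list of keys. The helper name build_run_args is hypothetical and not part of the CLI; it only prints the arguments.

```shell
# Hypothetical helper (not part of agent-health): print the arguments for
# `agent-health run`, with one -a flag per agent key.
# Usage: build_run_args <test-case> <agent-key>...
build_run_args() {
  test_case="$1"
  shift
  printf '%s' "run -t $test_case"
  for agent in "$@"; do
    printf ' -a %s' "$agent"
  done
  printf '\n'
}
```

Calling agent-health $(build_run_args demo-otel-001 ml-commons claude-code) would then expand to the comparison command shown above. This relies on shell word splitting, so it only suits IDs and keys that contain no spaces.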

Run a benchmark (batch of test cases) against one or more agents.

agent-health benchmark [options]

| Option | Description | Default |
| --- | --- | --- |
| -n, --name <name> | Benchmark name or ID | |
| -f, --file <path> | JSON file of test cases to import and benchmark | |
| -a, --agent <key> | Agent key (repeatable) | First enabled agent |
| -m, --model <id> | Model override | Agent default |
| -o, --output <fmt> | Output: table, json | table |
| --export <path> | Export results to file | |
| --format <type> | Report format for --export: json, html, pdf | json |
| -v, --verbose | Show per-test-case results and errors | |
| --stop-server | Stop the server after benchmark completes | Keep running |

Modes:

  • Quick mode (no -n, no -f): Auto-creates a benchmark from all stored test cases
  • Named mode (-n <name>): Runs a specific existing benchmark
  • File mode (-f <path>): Imports test cases from a JSON file, creates a benchmark, and runs it

agent-health benchmark # quick mode
agent-health benchmark -n "Baseline" -a ml-commons # named mode
agent-health benchmark -f ./test-cases.json -a my-agent -v # file mode
agent-health benchmark -f ./test-cases.json -n "My Run" --export results.json
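
The mode selection described above can be sketched as a small decision function. This mirrors the documented rules rather than the CLI's actual implementation, and the function name benchmark_mode is hypothetical. Treating -f together with -n as file mode (with -n supplying the benchmark name) is an assumption based on the last example.

```shell
# Hypothetical sketch of the mode rules; not the CLI's real logic.
# Usage: benchmark_mode <name-or-empty> <file-or-empty>
benchmark_mode() {
  name="$1"
  file="$2"
  if [ -n "$file" ]; then
    # Assumption: -f wins when both -f and -n are given.
    echo "file"
  elif [ -n "$name" ]; then
    echo "named"
  else
    echo "quick"
  fi
}
```

For instance, benchmark_mode "" "" corresponds to running agent-health benchmark with no -n and no -f.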

Export benchmark test cases as import-compatible JSON.

agent-health export -b <benchmark> [options]

| Option | Description |
| --- | --- |
| -b, --benchmark <id-or-name> | Benchmark ID or name (required) |
| -o, --output <file> | Output file path (default: <benchmark-name>.json) |
| --stdout | Write to stdout instead of file |

agent-health export -b "Baseline" -o test-cases.json
agent-health export -b bench-123 --stdout | jq '.[] | .name'

Generate a downloadable report for benchmark runs.

agent-health report -b <benchmark> [options]

| Option | Description | Default |
| --- | --- | --- |
| -b, --benchmark <id> | Benchmark name or ID (required) | |
| -r, --runs <ids> | Comma-separated run IDs | All runs |
| -f, --format <type> | Report format: json, html, pdf | html |
| -o, --output <file> | Output file path | Auto-generated |
| --stdout | Write to stdout (JSON format only) | |

agent-health report -b "Baseline" # HTML report (all runs)
agent-health report -b "Baseline" -f pdf -o report.pdf # PDF report
agent-health report -b "Baseline" -r run-123,run-456 # Specific runs
agent-health report -b "Baseline" -f json --stdout # JSON to stdout

Check system configuration and connectivity.

agent-health doctor [-o text|json]

Checks:

  • Config file (agent-health.config.ts)
  • Environment file (.env)
  • AWS credentials (for Bedrock judge)
  • Claude Code CLI
  • Configured agents
  • Available connectors
  • OpenSearch Storage (test cases, benchmarks)
  • OpenSearch Observability (traces, logs)

$ agent-health doctor
✓ Config File: Found: agent-health.config.ts
✓ AWS Credentials: Profile: Bedrock
✓ Agents: 3 agents configured
⚠ OpenSearch Storage: Not configured
⚠ OpenSearch Observability: Not configured
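
To act on doctor results in a script, one option is to count the warning lines in the text output. The sketch below assumes the text format matches the sample above, with each warning line starting with ⚠. The function name count_warnings is hypothetical.

```shell
# Hypothetical helper: read doctor text output on stdin and print the number
# of lines that start with the warning marker.
# Assumption: the output format matches the sample shown in this section.
count_warnings() {
  # grep -c exits non-zero when the count is 0; `|| true` keeps the
  # function's exit status at 0 in that case.
  grep -c '^⚠' || true
}
```

Piping the command into it, as in agent-health doctor | count_warnings, prints the count. The json output format is likely a sturdier basis for scripting, but its schema is not documented here.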

Initialize project configuration files.

agent-health init [options]

| Option | Description |
| --- | --- |
| --force | Overwrite existing files |
| --with-examples | Include sample test case |

Creates agent-health.config.ts and .env.example.

agent-health init
agent-health init --force --with-examples

One-time migration to add stats to existing benchmark runs. Only needed if you have benchmarks created before stats tracking was added.

agent-health migrate [options]

| Option | Description |
| --- | --- |
| --dry-run | Show what would be migrated without making changes |
| -v, --verbose | Show detailed progress |

agent-health migrate --dry-run # Preview changes
agent-health migrate -v # Run migration with details

Environment variables:

| Variable | Description |
| --- | --- |
| AWS_PROFILE | AWS profile for Bedrock judge |
| AWS_REGION | AWS region |
| DEBUG | Enable verbose debug logging (true/false) |
| OPENSEARCH_STORAGE_ENDPOINT | Storage cluster URL |
| OPENSEARCH_STORAGE_USERNAME | Storage auth user |
| OPENSEARCH_STORAGE_PASSWORD | Storage auth password |
| OPENSEARCH_LOGS_ENDPOINT | Logs cluster URL |
| OPENSEARCH_LOGS_USERNAME | Logs auth user |
| OPENSEARCH_LOGS_PASSWORD | Logs auth password |
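
As an illustration, these variables could be set in the env file loaded with --env-file. The sketch assumes the file uses plain KEY=value lines. Every value is a placeholder chosen for the example, not a default of the tool.

```shell
# Example env file. All values are placeholders; replace them with your own.
AWS_PROFILE=default
AWS_REGION=us-west-2
DEBUG=false

# Storage cluster (test cases, benchmarks)
OPENSEARCH_STORAGE_ENDPOINT=https://localhost:9200
OPENSEARCH_STORAGE_USERNAME=admin
OPENSEARCH_STORAGE_PASSWORD=change-me

# Logs cluster
OPENSEARCH_LOGS_ENDPOINT=https://localhost:9200
OPENSEARCH_LOGS_USERNAME=admin
OPENSEARCH_LOGS_PASSWORD=change-me
```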

Exit codes:

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | Error |
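
In a script or CI job, the exit codes above can gate later steps. The wrapper below is a minimal sketch that works with any command; the name check_exit is hypothetical.

```shell
# Hypothetical wrapper: run a command, report the outcome based on its exit
# code, and pass a non-zero code back to the caller.
check_exit() {
  if "$@"; then
    echo "ok"
  else
    code=$?
    echo "failed (exit $code)"
    return "$code"
  fi
}
```

For example, check_exit agent-health benchmark -n "Baseline" prints ok on success, and otherwise returns the failing code so the calling script can stop.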

Most commands support -o, --output:

| Format | Use case |
| --- | --- |
| table | Human-readable (default) |
| json | Machine-readable, scripting |