# AI Observability

The Observability Stack provides end-to-end tooling for AI agent observability: from instrumenting your code to viewing traces, scoring quality, and running evaluations.
## End-to-end platform

```mermaid
flowchart LR
    subgraph instrument["1 Instrument"]
        direction TB
        A1["<b>GenAI SDK</b><br/>@observe, enrich(),<br/>auto-instrument LLMs"]
    end
    subgraph normalize["2 Normalize"]
        direction TB
        B1["<b>OTEL Collector</b><br/>Standardize spans,<br/>semantic conventions"]
    end
    subgraph local["3 Local Tooling"]
        direction TB
        C1["<b>Agent Health</b><br/>Debugging, scoring,<br/>evaluations"]
    end
    subgraph process["4 Process"]
        direction TB
        D1["<b>Data Prepper</b><br/>Service maps, trace<br/>correlation, aggregation"]
    end
    subgraph analyze["5 Analyze"]
        direction TB
        E1["<b>OpenSearch Dashboards</b><br/>Agent traces, graphs,<br/>evals, APM"]
    end
    instrument --> normalize --> process --> analyze
    normalize --> local
    local --> process
```
## Capabilities

- Agent tracing - visualize LLM agent execution as trace trees, DAG graphs, and timelines
- GenAI semantic conventions - standard `gen_ai.*` attributes for model, tokens, tools, and sessions
- Evaluation & scoring - attach quality scores to traces, run experiments against datasets
- Trace retrieval - query stored traces from OpenSearch for evaluation pipelines
- Auto-instrumentation - OpenAI, Anthropic, Bedrock, LangChain, and 20+ libraries traced automatically
- MCP server - query OpenSearch from AI agents via the built-in Model Context Protocol server
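To make the semantic-conventions bullet concrete, here is a minimal sketch of the attribute shape that normalized GenAI spans carry. The `gen_ai.*` keys follow the OpenTelemetry GenAI semantic conventions; the `genai_span_attributes` helper and the example values are illustrative only (the SDK populates these attributes for you).

```python
# Illustrative only: a dict of span attributes using OTel GenAI
# semantic-convention keys (gen_ai.*). The SDK sets these on spans
# automatically; this just shows the shape of the normalized data.
def genai_span_attributes(model, input_tokens, output_tokens):
    """Build gen_ai.* attributes for a single LLM-call span."""
    return {
        "gen_ai.operation.name": "chat",       # kind of GenAI operation
        "gen_ai.request.model": model,         # model requested by the caller
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }

attrs = genai_span_attributes("gpt-4o", 128, 256)
print(attrs["gen_ai.request.model"])  # -> gpt-4o
```

Because every instrumented library emits the same keys, dashboards can aggregate token usage and model breakdowns across heterogeneous agents.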
## Getting started

Start here for a hands-on walkthrough, from pip install to seeing traces and scoring quality:

- Getting Started - instrument an agent, view traces, and score quality in 5 minutes
## Instrument

Send agent trace data to the observability stack:

- Python SDK - `@observe`, `enrich()`, auto-instrumentation, AWS SigV4
- TypeScript SDK - coming soon
- AI Agents overview - why use the SDK vs. manual OTel
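For intuition, here is a toy sketch of what a tracing decorator like `@observe` conceptually does: open a span around the call, record timing and status, and export the span. This is not the SDK's implementation — the real decorator emits OpenTelemetry spans to the collector, and its signature will differ; `SPANS` here is a stand-in for an exporter.

```python
import functools
import time

SPANS = []  # stand-in for an exporter; the real SDK ships OTel spans to the collector

# Toy illustration of a tracing decorator (NOT the SDK's @observe):
# wrap the function, time it, record success or failure, export the span.
def observe(name=None):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            span = {"name": name or fn.__name__, "start": time.time()}
            try:
                result = fn(*args, **kwargs)
                span["status"] = "OK"
                return result
            except Exception as exc:
                span["status"] = "ERROR"
                span["error"] = repr(exc)
                raise
            finally:
                span["duration_s"] = time.time() - span["start"]
                SPANS.append(span)
        return inner
    return wrap

@observe(name="plan_step")
def plan(goal):
    return f"plan for {goal}"

plan("book a flight")
print(SPANS[0]["name"], SPANS[0]["status"])  # -> plan_step OK
```

The decorator pattern is why a single line of annotation is enough: every call site is traced without touching the function body.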
## Analyze

Explore traces in OpenSearch Dashboards:

- Agent Tracing - the Agent Traces UI, span tables, detail flyouts
- Agent Graph & Path - DAG visualization, trace tree, and timeline views
## Evaluate

Score agent quality and run experiments:

- Evaluation & Scoring - `score()`, `evaluate()`, `Benchmark`, trace retrieval
- Agent Health - Golden Path trajectory comparison, LLM judge scoring, batch experiments via UI and CLI
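The core idea behind scoring is simple: a score is a small record keyed to a trace id, so dashboards can join evaluations onto traces. The sketch below is hypothetical — `score_trace`, its field names, and the 0-to-1 range are assumptions for illustration, not the SDK's `score()`/`evaluate()` API.

```python
# Hypothetical sketch of a score record attached to a stored trace.
# Field names and the [0, 1] range are assumptions; the real SDK's
# score()/evaluate() APIs differ. The point is the data shape:
# scores reference a trace by id.
def score_trace(trace_id, name, value, rationale=""):
    """Return a score record keyed to a trace (value assumed in [0, 1])."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("score value must be in [0, 1]")
    return {
        "trace_id": trace_id,
        "score.name": name,        # e.g. "answer_relevance"
        "score.value": value,
        "score.rationale": rationale,
    }

record = score_trace("abc123", "answer_relevance", 0.9, "on-topic answer")
```

Batch experiments then reduce to producing many such records, one per (trace, metric) pair, and aggregating them per dataset.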
## Connect

- MCP Server - query OpenSearch from AI agents via MCP