
See Everything. From Services to AI Agents.

APM traces. Logs. Prometheus metrics. Service maps. Dashboards. Plus agent tracing, MCP support, and AI observability SDKs.

One open-source platform. Full-stack observability. Zero lock-in.

Traces
Metrics
Dashboards
AI Agents
Services
APM traces, service maps, and distributed tracing across your stack
Quick Start
$ curl -fsSL https://raw.githubusercontent.com/opensearch-project/observability-stack/main/install.sh | bash

Works everywhere. Installs everything. You're welcome. 🔥

5 Minutes to Production Observability

From zero to full tracing in minutes. No complex configuration, no vendor lock-in.

main.py Python
from opensearch_genai_sdk_py import register, agent, tool

register(service_name="my-app")

@tool(name="search")
def search(query: str) -> dict:
    # search_api is your own search client
    return search_api.query(query)

@agent(name="assistant")
def assistant(prompt):
    data = search(prompt)
    # llm is your own model client
    return llm.generate(prompt, context=data)

# Automatic OTEL traces captured
result = assistant("Hello AI")
Terminal Output
Instrumentation initialized
OTEL exporter configured
Trace captured: process_request
Spans exported: 3
Latency: 342ms
View dashboard at http://localhost:8000

Choose Your Integration Style

Three paths to production observability. Pick the one that fits your workflow.

GenAI SDK

One-line setup with automatic OpenTelemetry instrumentation. Decorators for agents, tools, and workflows.

example.py Python
from opensearch_genai_sdk_py import register, agent, tool

# One-line setup: configures the OTEL pipeline automatically
register(service_name="my-app")

@tool(name="get_weather")
def get_weather(city: str) -> dict:
    return {"city": city, "temp": 22, "condition": "sunny"}

@agent(name="weather_assistant")
def assistant(query: str) -> str:
    data = get_weather("Paris")
    return f"{data['condition']}, {data['temp']}C"

# Automatic OTEL traces, metrics, and logs
result = assistant("What's the weather?")

Key Benefits

  • Zero configuration required
  • Automatic instrumentation of popular frameworks
  • Instant OTEL traces and metrics
  • Works with existing code
  • Production-ready in 5 minutes

Why OpenTelemetry Matters

The foundation for traces, metrics, and logs across services and AI. No compromises, no lock-in.

Industry Standard (CNCF)

OpenTelemetry is a CNCF graduated project, backed by major tech companies and trusted by thousands of organizations worldwide.


Your Data, Your Rules

Own your observability data completely. Export to any backend, store locally, or switch providers anytime without losing history.


Future-Proof Investment

Built on open standards that evolve with the industry. Your instrumentation code stays relevant as technology advances.


No Vendor Lock-In

Switch observability backends in minutes, not months. Your instrumentation is portable across any OTEL-compatible platform.


Language Agnostic

Consistent instrumentation across Python, JavaScript, Go, Java, and 10+ languages. One standard for your entire stack.


Community Driven

Benefit from contributions by thousands of developers. Active community, extensive documentation, and continuous improvements.


Full-Stack Observability, One Platform

From service health to AI agent performance — traces, logs, metrics, dashboards, and more

APM & Distributed Tracing

End-to-end visibility across services with service maps, latency breakdowns, and error tracking. OpenTelemetry-native.

  • Service maps and dependency visualization
  • Distributed trace correlation across services
  • Latency percentiles, error rates, and throughput
otel-collector-config.yaml YAML
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  opensearch:
    http:
      endpoint: https://opensearch:9200
    dataset: traces

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [opensearch]

Metrics & Dashboards

Prometheus-compatible metrics with PromQL support. Custom dashboards and RED metrics computed automatically from trace data.

  • Prometheus remote-write and native PromQL support
  • RED metrics (Rate, Errors, Duration) from Data Prepper
  • Custom dashboards with real-time panels and charts
[Dashboard preview panels: Request Rate 99%, P99 Latency 95%, Error Rate 82%, Throughput 97%, Saturation 88%, Availability 94%, P50 Latency 91%, Success Rate 96%]

Log Analytics & Correlation

Ingest, search, and correlate logs with traces. Full-text search with PPL and OpenTelemetry-native log collection.

  • Log-to-trace correlation in one click
  • Full-text search with PPL structured field queries
  • OTel-native log ingestion via OTLP
Log Explorer (PPL query, live):

source = ss4o_logs-* | where severity = "ERROR" | sort - timestamp

ERR   Connection timeout to upstream service   (trace_id: 3f2a...8c1d · service: api-gateway)
WARN  Retry attempt 2/3 for payment-service    (trace_id: 7b4e...2f9a · service: checkout)
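One-click log-to-trace correlation works because every log record carries the trace_id of the request that produced it. Here is a stdlib-only sketch of injecting a trace_id into Python log records; the hard-coded `CURRENT_TRACE_ID` is a placeholder for what an OTel SDK would read from the active span, not the platform's actual mechanism.

```python
import logging

# Hypothetical: a real OTel setup reads this from the active span context
CURRENT_TRACE_ID = "3f2a9c4d8b1e4f2a9c4d8b1e4f2a8c1d"

class TraceIdFilter(logging.Filter):
    """Attach the active trace_id to every log record."""
    def filter(self, record):
        record.trace_id = CURRENT_TRACE_ID
        return True

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    "%(levelname)s %(message)s trace_id=%(trace_id)s"
))
logger = logging.getLogger("api-gateway")
logger.addHandler(handler)
logger.addFilter(TraceIdFilter())
logger.setLevel(logging.INFO)

logger.error("Connection timeout to upstream service")
```

Because the trace_id lands in a structured field, the Log Explorer can pivot from any log line straight to the matching trace.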

AI Agent & LLM Observability

Trace AI agent workflows with OpenTelemetry GenAI semantic conventions. Visualize execution graphs, monitor token usage, and debug agent behavior.

  • Agent tracing with tool-call and reasoning step visualization
  • GenAI semantic conventions for standard, interoperable traces
  • MCP support and Python/JavaScript SDKs
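Under the GenAI semantic conventions, each LLM call becomes a span whose attributes use standardized `gen_ai.*` keys, which is what makes token rollups and cross-vendor tooling possible. A sketch of what one such span's attributes might look like (the key names follow the OTel GenAI semconv; the values are illustrative):

```python
# Illustrative attributes for one LLM-call span, keyed by the
# OpenTelemetry GenAI semantic conventions (values are made up)
llm_span_attributes = {
    "gen_ai.operation.name": "chat",
    "gen_ai.system": "openai",
    "gen_ai.request.model": "gpt-4o",
    "gen_ai.usage.input_tokens": 1203,
    "gen_ai.usage.output_tokens": 644,
}

def total_tokens(attrs: dict) -> int:
    """Per-span token count that a backend rolls up across the trace."""
    return (attrs["gen_ai.usage.input_tokens"]
            + attrs["gen_ai.usage.output_tokens"])

print(total_tokens(llm_span_attributes))
```

Summing these attributes across a trace's spans is how the waterfall view arrives at totals like the token count shown below.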
Agent Trace Waterfall (trace_id: 3f2a...8c1d · status OK):

  Agent  invoke_agent (1240ms)
    LLM   chat (680ms)
    Tool  search_docs (320ms)
    LLM   chat (240ms)

5 spans · 1,847 tokens · 1.24s duration