Skip to content

OpenTelemetry

OpenTelemetry (OTel) is the CNCF standard for generating, collecting, and exporting telemetry data. The OpenSearch Observability Stack uses OpenTelemetry as its primary ingestion interface — all telemetry flows through OTel SDKs and the OTel Collector before reaching storage backends.

This section covers the OTel Collector configuration, instrumentation approaches, and sampling strategies.

OpenTelemetry defines three core signal types:

A trace represents a single request as it propagates through your system. Each trace is composed of spans — units of work with a start time, duration, status, and parent-child relationships.

Traces answer questions like:

  • Which services handled this request?
  • Where did latency occur?
  • What caused the error?

The stack stores traces in OpenSearch via Data Prepper, indexed in otel-v1-apm-span-* indices.

A metric is a numerical measurement captured over time. OTel supports counters, histograms, gauges, and exponential histograms.

Metrics answer questions like:

  • What is the request rate for this service?
  • What is the p99 latency?
  • How much memory is the process using?

The stack routes metrics to Prometheus via OTLP HTTP for efficient time-series storage and querying.

A log is a timestamped text or structured record emitted by an application. OTel logs support trace correlation, meaning each log record can carry a traceId and spanId to link it to the distributed trace that was active when the log was emitted.

The stack stores logs in OpenSearch via Data Prepper, enabling full-text search and trace-correlated log exploration.

All three signals are transmitted using the OpenTelemetry Protocol (OTLP), which supports two transports:

TransportEndpointEncodingBest For
gRPChttp://localhost:4317ProtobufBackend services, high throughput
HTTPhttp://localhost:4318Protobuf or JSONBrowsers, serverless, restricted networks

OTLP is the only protocol you need. Unlike vendor-specific formats (Zipkin, Jaeger, StatsD), OTLP carries all three signals over a single connection, reducing operational complexity.

The following diagram shows how OpenTelemetry components fit into the stack:

flowchart TD
    subgraph Application
        SDK["OTel SDK<br/>(TracerProvider, MeterProvider, LoggerProvider)"]
        Auto["Auto-Instrumentation<br/>(framework hooks)"]
        Manual["Manual Instrumentation<br/>(custom spans, metrics)"]
        Auto --> SDK
        Manual --> SDK
    end

    SDK -->|"OTLP"| Collector["OTel Collector"]

    subgraph Collector Pipeline
        R["Receivers<br/>otlp (4317, 4318)"]
        P["Processors<br/>batch, memory_limiter,<br/>resourcedetection, transform"]
        E["Exporters<br/>otlp/opensearch, otlphttp/prometheus"]
        R --> P --> E
    end

    Collector --> DP["Data Prepper :21890"]
    Collector --> Prom["Prometheus :9090"]
    DP --> OS["OpenSearch"]
    Prom --> OSD["OpenSearch Dashboards"]
    OS --> OSD

Key points:

  1. SDKs run in your application process. They create spans, record metrics, and bridge log frameworks.
  2. Auto-instrumentation hooks into frameworks and libraries to generate telemetry without code changes.
  3. Manual instrumentation lets you add custom spans, attributes, and metrics for business-specific observability.
  4. The OTel Collector receives, processes, and exports telemetry. It runs as a standalone service (not embedded in your app).
  5. Data Prepper receives traces and logs from the Collector and writes them to OpenSearch indices.
  6. Prometheus receives metrics from the Collector via OTLP HTTP.

OpenTelemetry defines semantic conventions — standardized attribute names for common concepts. The stack relies on these conventions for its dashboards and visualizations:

ConventionPrefixExample Attributes
HTTPhttp.*, url.*http.request.method, url.path, http.response.status_code
Databasedb.*db.system.name, db.query.text, db.operation.name
RPCrpc.*rpc.system, rpc.service, rpc.method
Gen-AIgen_ai.*gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens
Serviceservice.*service.name, service.version, service.namespace

Using standard attribute names means the APM dashboards, service maps, and agent trace views work out of the box.