Data Pipeline

The data pipeline is the processing layer between your OpenTelemetry Collectors and OpenSearch. It handles routing, transformation, enrichment, and indexing of traces, logs, and metrics.

In the default stack configuration, Data Prepper serves as the primary pipeline engine. It receives OTLP data from collectors, routes events by signal type, applies processors, and writes to the appropriate OpenSearch indices.

flowchart LR
    A[OTel Collector] -->|OTLP gRPC :21890| B[Data Prepper]
    B -->|Route: LOG| C[Log Pipeline]
    B -->|Route: TRACE| D[Trace Pipeline]
    C -->|copy_values processor| E[OpenSearch<br/>log-analytics-plain]
    D --> F[Raw Traces Pipeline]
    D --> G[Service Map Pipeline]
    F -->|otel_traces processor| H[OpenSearch<br/>trace-analytics-plain-raw]
    G -->|otel_apm_service_map processor| I[OpenSearch<br/>otel-v2-apm-service-map]
    G --> J[Prometheus<br/>Remote Write]

| Component | Role | Default Port |
| --- | --- | --- |
| OTel Collector | Collects, batches, and exports telemetry from applications | 4317 (gRPC), 4318 (HTTP) |
| Data Prepper | Routes, transforms, and indexes telemetry into OpenSearch | 21890 (OTLP source) |
| OpenSearch | Stores and indexes traces, logs, and metrics | 9200 (REST API) |
| Prometheus | Receives service map metrics via remote write | 9090 |
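
The routing step in the diagram can be pictured as a lookup from signal type to destination indices. The sketch below is a conceptual model only, not Data Prepper code or configuration: the index names come from the diagram, while the `ROUTES` table, the `route` function, and the `signal` field are hypothetical names chosen for illustration. It also simplifies the service map branch, which in the real pipeline derives aggregated relationships and sends metrics to Prometheus rather than copying spans.

```python
from collections import defaultdict

# Destination indices per signal type. Index names are taken from the
# diagram; the table itself is a hypothetical illustration.
ROUTES = {
    "LOG": ["log-analytics-plain"],
    "TRACE": ["trace-analytics-plain-raw", "otel-v2-apm-service-map"],
}


def route(events):
    """Group events by destination index based on their signal type.

    Events whose signal type has no route are dropped.
    """
    by_index = defaultdict(list)
    for event in events:
        for index in ROUTES.get(event["signal"], []):
            by_index[index].append(event)
    return dict(by_index)


# Hypothetical sample events; real OTLP records carry many more fields.
events = [
    {"signal": "LOG", "body": "user login"},
    {"signal": "TRACE", "span_id": "abc123"},
    {"signal": "LOG", "body": "cache miss"},
]

routed = route(events)
for index, items in routed.items():
    print(index, len(items))
```

A trace event fans out to two destinations here, mirroring how the trace pipeline feeds both the raw traces pipeline and the service map pipeline.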

Customize the default pipeline configuration when you need to:

  • Add processors — Enrich, filter, or transform events before indexing (e.g., add Kubernetes metadata, redact PII, parse log formats).
  • Change routing — Send specific log sources to different indices or add conditional processing.
  • Adjust performance — Tune worker counts, batch sizes, and delays for your throughput requirements.
  • Add sinks — Export data to additional destinations beyond OpenSearch (e.g., S3, Kafka, Prometheus).
  • Bypass the pipeline — Use the OpenSearch Bulk API directly for pre-processed data or simple ingestion patterns.

The following pages cover these topics in detail:

  • Data Prepper: Full pipeline configuration walkthrough, including routing, processors, and sinks.
  • Ingest API: Direct ingestion via the OpenSearch Bulk API for pre-processed data.
  • Batching & Performance: Tune batch sizes, memory limits, and worker counts for production throughput.
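
To make the "bypass the pipeline" and batching ideas concrete, here is a minimal sketch of how pre-processed documents could be split into batches and serialized as newline-delimited JSON for the OpenSearch Bulk API. It only builds the request bodies and does not send them. The helper names `chunked` and `to_bulk_payload`, the `message` field, and the batch size of 2 are assumptions for illustration; writing directly to `log-analytics-plain` is also an assumption, and your index may expect a different document shape.

```python
import json


def chunked(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def to_bulk_payload(index, docs):
    """Serialize docs as a Bulk API body: one action line, then one source line.

    The body must end with a newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# Hypothetical pre-processed documents.
docs = [{"message": f"event {i}"} for i in range(5)]

# A batch size of 2 is only for demonstration; production values are
# typically much larger and tuned against throughput and memory.
payloads = [
    to_bulk_payload("log-analytics-plain", batch)
    for batch in chunked(docs, 2)
]

print(len(payloads))
print(payloads[0], end="")
```

Each payload would then be sent as the body of a `POST` to the `_bulk` endpoint on the OpenSearch REST port (9200 in the default stack) with the `Content-Type: application/x-ndjson` header. Larger batches reduce per-request overhead but increase memory use and the cost of a failed request.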