OTel Collector Configuration
The OpenTelemetry Collector is the central routing layer in the observability stack. It receives telemetry from your applications via OTLP, processes it (batching, resource detection, transformations), and exports it to Data Prepper and Prometheus.
This page walks through the complete Collector configuration used by the stack.
Prerequisites
Section titled “Prerequisites”- The observability stack running (OTel Collector, Data Prepper, Prometheus, OpenSearch)
- Applications configured to send OTLP to the Collector endpoints
Full Configuration
Section titled “Full Configuration”receivers: otlp: protocols: grpc: endpoint: 0.0.0.0:4317 http: endpoint: 0.0.0.0:4318 cors: allowed_origins: ["http://*", "https://*"]
processors: memory_limiter: check_interval: 5s limit_percentage: 80 spike_limit_percentage: 25 batch: timeout: 10s send_batch_size: 1024 resourcedetection: detectors: [env, system] transform: error_mode: ignore trace_statements: - context: span statements: - replace_pattern(attributes["url.path"], "\\?.*", "") - replace_pattern(name, "\\?.*", "") - replace_pattern(name, "GET /api/products/[^/]+", "GET /api/products/{productId}") - set(attributes["db_system_name"], attributes["db.system.name"]) where attributes["db.system.name"] != nil - set(attributes["code_function_name"], attributes["code.function.name"]) where attributes["code.function.name"] != nil - set(attributes["user_agent_original"], attributes["user_agent.original"]) where attributes["user_agent.original"] != nil - set(attributes["upstream_cluster_name"], attributes["upstream_cluster.name"]) where attributes["upstream_cluster.name"] != nil - set(attributes["error_type"], attributes["error.type"]) where attributes["error.type"] != nil - set(attributes["app_ads_contextKeys"], attributes["app.ads.contextKeys"]) where attributes["app.ads.contextKeys"] != nil - set(attributes["rpc_system"], attributes["rpc.system"]) where attributes["rpc.system"] != nil log_statements: - context: log statements: - flatten(body, "") - merge_maps(body, body[""], "upsert") - delete_key(body, "") - set(body, Concat([body], ", ")) where IsMap(body)
exporters: debug: verbosity: detailed otlp/opensearch: endpoint: data-prepper:21890 tls: insecure: true otlphttp/prometheus: endpoint: http://prometheus:9090/api/v1/otlp
service: telemetry: metrics: readers: - pull: exporter: prometheus: host: '0.0.0.0' port: 8888 pipelines: traces: receivers: [otlp] processors: [resourcedetection, memory_limiter, transform, batch] exporters: [otlp/opensearch, debug] metrics: receivers: [otlp] processors: [resourcedetection, memory_limiter, batch] exporters: [otlphttp/prometheus, debug] logs: receivers: [otlp] processors: [resourcedetection, memory_limiter, transform, batch] exporters: [otlp/opensearch, debug]Receivers
Section titled “Receivers”The otlp receiver accepts all three telemetry signals over two protocols:
receivers: otlp: protocols: grpc: endpoint: 0.0.0.0:4317 http: endpoint: 0.0.0.0:4318 cors: allowed_origins: ["http://*", "https://*"]| Setting | Value | Purpose |
|---|---|---|
grpc.endpoint | 0.0.0.0:4317 | Accept OTLP gRPC from backend services |
http.endpoint | 0.0.0.0:4318 | Accept OTLP HTTP from browsers, serverless |
http.cors.allowed_origins | ["http://*", "https://*"] | Allow browser-based instrumentation from any origin |
The CORS configuration is permissive by default for development. In production, restrict allowed_origins to your application domains.
Processors
Section titled “Processors”Processors run in the order specified in each pipeline definition. The stack uses four processors:
memory_limiter
Section titled “memory_limiter”memory_limiter: check_interval: 5s limit_percentage: 80 spike_limit_percentage: 25Prevents the Collector from running out of memory under load. When memory usage exceeds 80%, the Collector starts dropping data. The spike limit reserves 25% headroom for sudden bursts.
Place memory_limiter early in the processor chain (after resourcedetection) so it can shed load before expensive processing occurs.
batch: timeout: 10s send_batch_size: 1024Batches telemetry records before export to reduce the number of outbound requests. A batch is flushed when it reaches 1024 items or 10 seconds have elapsed, whichever comes first.
| Setting | Default | Recommendation |
|---|---|---|
timeout | 200ms | 5-30s for production |
send_batch_size | 8192 | 512-2048 depending on payload size |
resourcedetection
Section titled “resourcedetection”resourcedetection: detectors: [env, system]Automatically populates resource attributes from the runtime environment:
env— ReadsOTEL_RESOURCE_ATTRIBUTESenvironment variablesystem— Addshost.name,host.arch,os.type
Additional detectors are available for cloud environments: aws, gcp, azure, docker, kubernetes.
transform
Section titled “transform”The transform processor uses the OTel Transformation Language (OTTL) to modify telemetry in-flight.
Trace transformations:
transform: error_mode: ignore trace_statements: - context: span statements: # Strip query parameters from URL paths and span names - replace_pattern(attributes["url.path"], "\\?.*", "") - replace_pattern(name, "\\?.*", "") # Normalize high-cardinality span names - replace_pattern(name, "GET /api/products/[^/]+", "GET /api/products/{productId}")Query parameter stripping prevents high-cardinality URL paths from creating excessive unique span names. The product ID normalization collapses GET /api/products/123, GET /api/products/456, etc. into a single operation name.
Dotted attribute flattening:
# Flatten dotted attribute keys (workaround for data-prepper#5616) - set(attributes["db_system_name"], attributes["db.system.name"]) where attributes["db.system.name"] != nilOTel semantic convention attributes use dots (e.g., db.system.name), but Data Prepper may interpret dots as nested object paths. These statements copy dotted attributes to underscore-delimited equivalents so they are indexed correctly.
Log transformations:
log_statements: - context: log statements: - flatten(body, "") - merge_maps(body, body[""], "upsert") - delete_key(body, "") - set(body, Concat([body], ", ")) where IsMap(body)These statements normalize structured log bodies. Nested maps are flattened and merged into the top-level body, ensuring consistent indexing in OpenSearch.
Exporters
Section titled “Exporters”otlp/opensearch
Section titled “otlp/opensearch”otlp/opensearch: endpoint: data-prepper:21890 tls: insecure: trueSends traces and logs to Data Prepper over OTLP gRPC. Data Prepper then processes and indexes them in OpenSearch. TLS is disabled because the Collector and Data Prepper communicate over a private Docker network.
In production with network exposure, enable TLS:
otlp/opensearch: endpoint: data-prepper:21890 tls: cert_file: /certs/collector.crt key_file: /certs/collector.key ca_file: /certs/ca.crtotlphttp/prometheus
Section titled “otlphttp/prometheus”otlphttp/prometheus: endpoint: http://prometheus:9090/api/v1/otlpSends metrics to Prometheus via its native OTLP HTTP endpoint (available since Prometheus 2.47). No conversion or remote-write adapter is needed.
debug: verbosity: detailedLogs all exported telemetry to the Collector’s stdout. Useful for development and troubleshooting. Remove or set verbosity: basic in production.
Service Pipelines
Section titled “Service Pipelines”The service.pipelines section wires receivers, processors, and exporters together:
service: pipelines: traces: receivers: [otlp] processors: [resourcedetection, memory_limiter, transform, batch] exporters: [otlp/opensearch, debug] metrics: receivers: [otlp] processors: [resourcedetection, memory_limiter, batch] exporters: [otlphttp/prometheus, debug] logs: receivers: [otlp] processors: [resourcedetection, memory_limiter, transform, batch] exporters: [otlp/opensearch, debug]Note that the metrics pipeline does not include the transform processor — metric transformations are not needed for the default setup.
Collector self-monitoring
Section titled “Collector self-monitoring”service: telemetry: metrics: readers: - pull: exporter: prometheus: host: '0.0.0.0' port: 8888The Collector exposes its own metrics on port 8888 in Prometheus format. Scrape this endpoint to monitor Collector health, queue depths, and export errors.
Customization Tips
Section titled “Customization Tips”Add a new receiver (e.g., Kafka):
receivers: otlp: ... kafka: brokers: ["kafka:9092"] topic: telemetry encoding: otlp_proto
service: pipelines: traces: receivers: [otlp, kafka]Filter unwanted spans (e.g., health checks):
processors: filter: error_mode: ignore traces: span: - 'attributes["url.path"] == "/health"' - 'attributes["url.path"] == "/ready"'Add tail sampling at the Collector level (see Sampling Strategies):
processors: tail_sampling: decision_wait: 30s policies: - name: errors type: status_code status_code: {status_codes: [ERROR]} - name: slow type: latency latency: {threshold_ms: 5000}Related Links
Section titled “Related Links”- OpenTelemetry Overview — Signals, protocols, and architecture
- Sampling Strategies — Head and tail sampling configuration
- Data Pipeline — Data Prepper and Prometheus configuration
- OpenTelemetry Collector documentation — Official Collector configuration reference
- Collector contrib components — Community receivers, processors, and exporters