Batching & Performance

Telemetry data passes through multiple batching stages: the OTel SDK, the OTel Collector, and Data Prepper. Each stage has independent configuration that affects throughput, latency, and memory usage. Tuning these settings correctly prevents data loss and ensures stable ingestion under load.

```mermaid
flowchart LR
    A[OTel SDK<br/>BatchSpanProcessor] -->|batch export| B[OTel Collector<br/>batch processor]
    B -->|batch export| C[Data Prepper<br/>workers + delay]
    C -->|bulk write| D[OpenSearch]
```

The Collector’s batch processor groups telemetry into larger payloads before exporting to Data Prepper:

```yaml
processors:
  batch:
    timeout: 10s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 5s
    limit_percentage: 80
    spike_limit_percentage: 25
```

| Parameter | Default | Description |
| --- | --- | --- |
| `timeout` | `200ms` | Maximum time to wait before sending a partial batch |
| `send_batch_size` | `8192` | Number of spans/logs/metrics that triggers an immediate export |
| `send_batch_max_size` | `0` (unlimited) | Hard upper limit on batch size; set it to prevent oversized exports |

Setting timeout: 10s and send_batch_size: 1024 means the Collector exports a batch when either 1,024 items accumulate or 10 seconds elapse, whichever comes first. This balances throughput with acceptable latency.
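The either/or trigger can be sketched in a few lines of Python. This is a toy model of the logic, not Collector code; the `BatchTrigger` class and its method names are illustrative:

```python
class BatchTrigger:
    """Toy model of the Collector's batch trigger: flush when either the
    size threshold or the timeout is reached, whichever comes first."""

    def __init__(self, send_batch_size=1024, timeout_s=10.0):
        self.send_batch_size = send_batch_size
        self.timeout_s = timeout_s
        self.items = []
        self.opened_at = None  # timestamp of the first item in this batch

    def add(self, item, now):
        """Add an item; return True when the batch should be exported."""
        if self.opened_at is None:
            self.opened_at = now
        self.items.append(item)
        # Size trigger: enough items have accumulated.
        if len(self.items) >= self.send_batch_size:
            return True
        # Time trigger: the timeout elapsed since the batch was opened.
        return now - self.opened_at >= self.timeout_s

trigger = BatchTrigger(send_batch_size=3, timeout_s=10.0)
assert trigger.add("span-1", now=0.0) is False
assert trigger.add("span-2", now=1.0) is False
assert trigger.add("span-3", now=2.0) is True   # size trigger fires first
```

A half-full batch that sits longer than `timeout_s` flushes on the time trigger instead, which is what bounds worst-case latency.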

The memory_limiter processor guards against out-of-memory crashes:

| Parameter | Default | Description |
| --- | --- | --- |
| `check_interval` | `0s` | How often to check memory usage |
| `limit_percentage` | `0` | Percentage of total memory that triggers throttling |
| `spike_limit_percentage` | `0` | Additional headroom reserved for sudden spikes |

With limit_percentage: 80 and spike_limit_percentage: 25, the hard limit is 80% of total memory and the soft limit sits 25 percentage points below it, at 55%. The Collector starts refusing incoming data once usage crosses the soft limit, keeping the remaining headroom for sudden spikes, and forces garbage collection if usage reaches the hard limit. This prevents OOM crashes during traffic bursts.
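The threshold arithmetic can be made concrete. This is a back-of-the-envelope sketch; the `memory_limits` helper and the 8 GiB host figure are illustrative, not Collector internals:

```python
def memory_limits(total_memory_mib, limit_percentage, spike_limit_percentage):
    """Hard limit = limit_percentage of total memory; the soft limit
    (where data starts being refused) sits spike_limit_percentage of
    total memory below it, leaving headroom for bursts."""
    hard = total_memory_mib * limit_percentage / 100
    soft = hard - total_memory_mib * spike_limit_percentage / 100
    return hard, soft

# On an 8 GiB (8192 MiB) host with limit_percentage: 80 and
# spike_limit_percentage: 25:
hard, soft = memory_limits(8192, 80, 25)
assert round(hard, 1) == 6553.6  # forced-GC threshold, ~6.4 GiB
assert round(soft, 1) == 4505.6  # throttling begins here, ~4.4 GiB
```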

Place memory_limiter first in your processor pipeline so it runs before any memory-intensive operations:

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/dataprepper]
```

Each Data Prepper pipeline has independent concurrency and batching settings:

```yaml
otlp-pipeline:
  workers: 5
  delay: 10
```

| Parameter | Default | Description |
| --- | --- | --- |
| `workers` | `1` | Number of threads processing events in parallel |
| `delay` | `3000` | Milliseconds to buffer events before processing a batch |

Increasing workers improves throughput for CPU-bound processors. Increasing delay creates larger batches, which improves OpenSearch bulk indexing efficiency but adds latency.

| Pipeline throughput | Recommended workers | Delay (ms) |
| --- | --- | --- |
| < 1,000 events/sec | 1-2 | 3000 |
| 1,000-10,000 events/sec | 3-5 | 1000-3000 |
| 10,000-50,000 events/sec | 5-10 | 500-1000 |
| > 50,000 events/sec | 10-20 | 100-500 |
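The interplay between throughput, delay, and workers can be estimated with a rough sizing heuristic. `events_per_bulk_request` is a hypothetical helper, not part of Data Prepper, and it assumes events are spread evenly across workers:

```python
def events_per_bulk_request(events_per_sec, delay_ms, workers):
    """Rough estimate of the batch each worker hands to the OpenSearch
    sink: events accumulated during one delay window, split across
    workers. Larger batches amortize bulk-indexing overhead."""
    return events_per_sec * (delay_ms / 1000) / workers

# 20,000 events/sec with delay: 500 and workers: 8 yields roughly
# 1,250-event bulk requests per worker.
assert events_per_bulk_request(20_000, 500, 8) == 1250.0
```

Doubling `delay` doubles the estimated batch size but also doubles the buffering latency, which is the trade-off the table above encodes.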

The OTel SDK batches spans before sending to the Collector. Tune these settings to control export frequency and memory usage in your application:

```python
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# exporter is any configured SpanExporter instance, e.g. an OTLP exporter
processor = BatchSpanProcessor(
    exporter,
    max_queue_size=2048,
    schedule_delay_millis=5000,
    max_export_batch_size=512,
    export_timeout_millis=30000,
)
```

| Parameter | Default | Description |
| --- | --- | --- |
| `max_queue_size` | `2048` | Maximum spans buffered in memory before dropping |
| `schedule_delay_millis` | `5000` | Delay between export attempts |
| `max_export_batch_size` | `512` | Spans per export batch |
| `export_timeout_millis` | `30000` | Timeout for each export call |

These defaults work well for most applications. Increase max_queue_size and max_export_batch_size for high-throughput services. Decrease schedule_delay_millis if you need lower latency from span creation to visibility.

Environment variable equivalents:

| Environment Variable | Parameter |
| --- | --- |
| `OTEL_BSP_MAX_QUEUE_SIZE` | `max_queue_size` |
| `OTEL_BSP_SCHEDULE_DELAY` | `schedule_delay_millis` |
| `OTEL_BSP_MAX_EXPORT_BATCH_SIZE` | `max_export_batch_size` |
| `OTEL_BSP_EXPORT_TIMEOUT` | `export_timeout_millis` |
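The override-or-default pattern these variables follow can be sketched in Python. The `bsp_setting` helper is illustrative; the real SDK resolves these variables itself when constructing the processor:

```python
import os

def bsp_setting(name, default):
    """Read an OTEL_BSP_* override from the environment, falling back
    to the documented default when the variable is unset."""
    raw = os.environ.get(name)
    return int(raw) if raw is not None else default

os.environ["OTEL_BSP_MAX_QUEUE_SIZE"] = "4096"      # explicit override
os.environ.pop("OTEL_BSP_SCHEDULE_DELAY", None)      # ensure unset for the demo
assert bsp_setting("OTEL_BSP_MAX_QUEUE_SIZE", 2048) == 4096
assert bsp_setting("OTEL_BSP_SCHEDULE_DELAY", 5000) == 5000  # falls back
```

Environment variables are convenient in containerized deployments because batching can be retuned without an image rebuild.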

Suggested starting points by environment:

| Setting | Development | Production |
| --- | --- | --- |
| Collector `batch.timeout` | 1s | 5-10s |
| Collector `batch.send_batch_size` | 256 | 1024-4096 |
| Collector `memory_limiter.limit_percentage` | 80 | 75-80 |
| Data Prepper `workers` | 1 | 5-10 |
| Data Prepper `delay` | 3000 | 500-2000 |
| SDK `max_queue_size` | 2048 | 4096-8192 |
| SDK `schedule_delay_millis` | 5000 | 2000-5000 |
| SDK `max_export_batch_size` | 512 | 512-1024 |
Watch these indicators to detect batching or performance issues:

  • Collector dropped spans/logs — Check otelcol_processor_dropped_spans and otelcol_processor_dropped_log_records metrics. Non-zero values indicate the memory limiter is active or the export queue is full.
  • Data Prepper buffer usage — Monitor Data Prepper logs for Buffer is full warnings. Increase workers or reduce upstream throughput.
  • OpenSearch indexing latency — High bulk indexing latency causes backpressure through the pipeline. Check _nodes/stats for thread pool rejections.
  • SDK queue drops — The otel.bsp.spans.dropped metric indicates the SDK is producing spans faster than it can export them. Increase max_queue_size or reduce schedule_delay_millis.
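A minimal sketch of extracting the Collector drop counters, assuming you have already fetched the Prometheus text exposition from the Collector's telemetry endpoint; the `dropped_counts` parsing helper is illustrative:

```python
import re

def dropped_counts(metrics_text):
    """Sum otelcol_processor_dropped_* counters across label sets in a
    Prometheus text exposition; non-zero totals mean data is being lost."""
    totals = {}
    pattern = re.compile(
        r"^(otelcol_processor_dropped_\w+)(?:\{[^}]*\})?\s+([0-9.e+]+)",
        re.MULTILINE,
    )
    for name, value in pattern.findall(metrics_text):
        totals[name] = totals.get(name, 0.0) + float(value)
    return totals

sample = """\
otelcol_processor_dropped_spans{processor="memory_limiter"} 42
otelcol_processor_dropped_log_records{processor="memory_limiter"} 0
"""
assert dropped_counts(sample)["otelcol_processor_dropped_spans"] == 42.0
```

Wiring a check like this into an alerting job turns the bullet points above into an automated early warning rather than a manual log inspection.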