Skip to content

Infrastructure Monitoring

Infrastructure monitoring gives you visibility into the health and performance of the systems that run your applications. The OpenSearch Observability Stack uses OpenTelemetry as the primary collection mechanism, with support for additional tools like Prometheus, Logstash, and Fluent Bit for specialized use cases.

The following diagram shows how infrastructure telemetry flows through the stack:

graph LR
    subgraph Sources
        D[Docker]
        K[Kubernetes]
        VM[VMs / Bare Metal]
        CW[AWS CloudWatch]
    end

    subgraph Collection
        OC[OTel Collector]
        P[Prometheus]
        FB[Fluent Bit]
        LS[Logstash]
    end

    subgraph Pipeline
        DP[Data Prepper]
        OS[OpenSearch]
        PR[Prometheus TSDB]
    end

    D --> OC
    K --> OC
    VM --> OC
    CW --> OC
    D --> FB
    K --> FB
    VM --> LS

    OC -->|traces, logs| DP --> OS
    OC -->|metrics OTLP| PR
    FB --> DP
    LS --> OS
    P --> PR

Choose your collection approach based on what you need to monitor:

StrategyBest forTelemetry typesTool
OTel Collector receiversContainer and host metrics, logsMetrics, logsOpenTelemetry Collector
Prometheus scrapingApplication and service metricsMetricsPrometheus
Log forwardersHigh-volume log collectionLogsFluent Bit, Fluentd, Logstash
Cloud integrationsManaged cloud servicesMetrics, logs, tracesADOT, CloudWatch

The recommended approach is to deploy the OpenTelemetry Collector as your primary collection agent. The Collector provides:

  • Unified collection of metrics, logs, and traces from a single agent
  • Receivers for Docker stats, Kubernetes cluster metrics, host metrics, and more
  • Processors for enriching data with resource attributes (hostname, container ID, pod name)
  • Exporters to route data to Data Prepper (traces/logs) and Prometheus (metrics)

For specific use cases, you can add specialized tools alongside the OTel Collector:

  • Prometheus for scraping metrics endpoints and long-term metric storage
  • Fluent Bit for lightweight, high-throughput log forwarding
  • Logstash for complex log transformation pipelines
  • ADOT for AWS-native environments with managed collector support

Infrastructure metrics provide quantitative measurements of system health:

  • Host metrics: CPU usage, memory utilization, disk I/O, network throughput
  • Container metrics: Container CPU/memory limits and usage, restart counts, network stats
  • Kubernetes metrics: Pod status, node capacity, deployment replicas, resource requests vs limits
  • Service metrics: Prometheus-scraped application metrics, custom gauges and counters

Infrastructure logs capture events and state changes:

  • Container logs: stdout/stderr from running containers
  • System logs: Syslog, journald, kernel messages
  • Kubernetes logs: Pod logs, kubelet logs, control plane component logs
  • Cloud service logs: CloudWatch Logs, ECS task logs, Lambda invocation logs

Infrastructure traces are less common but useful for:

  • Request routing: Tracing requests through load balancers and proxies
  • Service mesh: Envoy/Istio sidecar proxy traces
  • Cloud function invocations: Lambda execution traces via ADOT

Before setting up infrastructure monitoring, ensure you have:

  • A running OpenSearch Observability Stack (see Sandbox)
  • Network access from your infrastructure to the OTel Collector endpoints (ports 4317/4318)
  • Appropriate permissions to deploy agents on your target infrastructure

Choose the guide that matches your infrastructure:

  • Docker — Monitor Docker containers and Compose deployments
  • Kubernetes — Monitor Kubernetes clusters, pods, and workloads
  • AWS — Integrate with AWS services using ADOT
  • Prometheus — Scrape and store Prometheus metrics
  • Logstash — Route logs through Logstash pipelines
  • Fluentd & Fluent Bit — Forward logs with lightweight agents