Infrastructure Monitoring
Infrastructure monitoring gives you visibility into the health and performance of the systems that run your applications. The OpenSearch Observability Stack uses OpenTelemetry as the primary collection mechanism, with support for additional tools like Prometheus, Logstash, and Fluent Bit for specialized use cases.
Architecture overview
The following diagram shows how infrastructure telemetry flows through the stack:
graph LR
subgraph Sources
D[Docker]
K[Kubernetes]
VM[VMs / Bare Metal]
CW[AWS CloudWatch]
end
subgraph Collection
OC[OTel Collector]
P[Prometheus]
FB[Fluent Bit]
LS[Logstash]
end
subgraph Pipeline
DP[Data Prepper]
OS[OpenSearch]
PR[Prometheus TSDB]
end
D --> OC
K --> OC
VM --> OC
CW --> OC
D --> FB
K --> FB
VM --> LS
OC -->|traces, logs| DP --> OS
OC -->|metrics OTLP| PR
FB --> DP
LS --> OS
P --> PR
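The edges in the diagram can also be read as a routing table. The following is a minimal Python sketch of that table, for illustration only: the `ROUTES` mapping, the collector keys, and the `route` function are hypothetical names, not part of any stack component.

```python
# Hypothetical routing table that mirrors the diagram edges.
# Keys are (collector, signal); values are the downstream hops in order.
ROUTES = {
    ("otel-collector", "traces"): ["Data Prepper", "OpenSearch"],
    ("otel-collector", "logs"): ["Data Prepper", "OpenSearch"],
    ("otel-collector", "metrics"): ["Prometheus TSDB"],
    ("fluent-bit", "logs"): ["Data Prepper", "OpenSearch"],
    ("logstash", "logs"): ["OpenSearch"],
    ("prometheus", "metrics"): ["Prometheus TSDB"],
}


def route(collector: str, signal: str) -> list[str]:
    """Return the downstream hops for a collector/signal pair."""
    try:
        return ROUTES[(collector, signal)]
    except KeyError:
        raise ValueError(
            f"no route in the diagram for {collector!r} {signal!r}"
        ) from None
```

Reading the table this way makes one design point explicit: the OTel Collector is the only agent that fans out by signal type, sending metrics to Prometheus and traces and logs to Data Prepper.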
Collection strategies
Choose your collection approach based on what you need to monitor:
| Strategy | Best for | Telemetry types | Tool |
|---|---|---|---|
| OTel Collector receivers | Container and host metrics, logs | Metrics, logs | OpenTelemetry Collector |
| Prometheus scraping | Application and service metrics | Metrics | Prometheus |
| Log forwarders | High-volume log collection | Logs | Fluent Bit, Fluentd, Logstash |
| Cloud integrations | Managed cloud services | Metrics, logs, traces | ADOT, CloudWatch |
OpenTelemetry-first approach
The recommended approach is to deploy the OpenTelemetry Collector as your primary collection agent. The Collector provides:
- Unified collection of metrics, logs, and traces from a single agent
- Receivers for Docker stats, Kubernetes cluster metrics, host metrics, and more
- Processors for enriching data with resource attributes (hostname, container ID, pod name)
- Exporters to route data to Data Prepper (traces/logs) and Prometheus (metrics)
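To make the enrichment step concrete, here is a rough Python sketch of what such a processor does conceptually: it merges resource attributes into each record before export. This is not the Collector's implementation. The `enrich` function and the sample values are hypothetical, and the attribute keys `host.name` and `container.id` are assumed to follow OpenTelemetry semantic conventions.

```python
import socket


def enrich(record: dict, resource: dict) -> dict:
    """Return a copy of record with resource attributes merged in.

    Attributes already set on the record win, so enrichment never
    overwrites data supplied by the source.
    """
    merged = dict(resource)
    merged.update(record.get("attributes", {}))
    return {**record, "attributes": merged}


# Resource attributes an agent could detect once at startup.
resource = {"host.name": socket.gethostname()}

# "abc123" is a placeholder container ID, not real data.
log = {"body": "container started", "attributes": {"container.id": "abc123"}}
enriched = enrich(log, resource)
```

The function returns a new record instead of mutating its input, which keeps the original record available if an exporter needs to retry.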
Supplementary tools
For specific use cases, you can add specialized tools alongside the OTel Collector:
- Prometheus for scraping metrics endpoints and long-term metric storage
- Fluent Bit for lightweight, high-throughput log forwarding
- Logstash for complex log transformation pipelines
- ADOT for AWS-native environments with managed collector support
Core infrastructure signals
Metrics
Infrastructure metrics provide quantitative measurements of system health:
- Host metrics: CPU usage, memory utilization, disk I/O, network throughput
- Container metrics: Container CPU/memory limits and usage, restart counts, network stats
- Kubernetes metrics: Pod status, node capacity, deployment replicas, resource requests vs limits
- Service metrics: Prometheus-scraped application metrics, custom gauges and counters
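In this stack such values are gathered by collection agents, but a short Python sketch shows what a host metric is at its simplest: a sampled value with a timestamp. The sketch uses only the standard library; the `host_snapshot` function and the metric names are made up for illustration and do not match any receiver's output.

```python
import os
import shutil
import time


def host_snapshot(path: str = "/") -> dict:
    """Collect a few point-in-time host gauges for the filesystem at path."""
    disk = shutil.disk_usage(path)
    return {
        "timestamp": time.time(),
        "cpu.count": os.cpu_count(),
        "disk.total_bytes": disk.total,
        "disk.used_bytes": disk.used,
        # Guard against a zero-sized filesystem to avoid dividing by zero.
        "disk.utilization": disk.used / disk.total if disk.total else 0.0,
    }
```

Calling `host_snapshot()` on an interval and comparing successive snapshots is the basic pattern behind gauge-style host metrics.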
Logs
Infrastructure logs capture events and state changes:
- Container logs: stdout/stderr from running containers
- System logs: Syslog, journald, kernel messages
- Kubernetes logs: Pod logs, kubelet logs, control plane component logs
- Cloud service logs: CloudWatch Logs, ECS task logs, Lambda invocation logs
Traces
Infrastructure traces are less common but useful for:
- Request routing: Tracing requests through load balancers and proxies
- Service mesh: Envoy/Istio sidecar proxy traces
- Cloud function invocations: Lambda execution traces via ADOT
Prerequisites
Before setting up infrastructure monitoring, ensure you have:
- A running OpenSearch Observability Stack (see Sandbox)
- Network access from your infrastructure to the OTel Collector endpoints (port 4317 for OTLP over gRPC, port 4318 for OTLP over HTTP)
- Appropriate permissions to deploy agents on your target infrastructure
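One way to verify the network prerequisite is a plain TCP connection test against ports 4317 and 4318. The sketch below is a minimal Python example under that assumption; `port_open` and `check_otlp` are hypothetical helper names, and the host you pass in is whatever address your Collector listens on.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def check_otlp(host: str, ports=(4317, 4318)) -> dict:
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_open(host, port) for port in ports}
```

For example, `check_otlp("localhost")` returns a dictionary keyed by port. A successful connection only shows that something is listening on the port; it does not confirm that the listener accepts OTLP data.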
Next steps
Choose the guide that matches your infrastructure:
- Docker — Monitor Docker containers and Compose deployments
- Kubernetes — Monitor Kubernetes clusters, pods, and workloads
- AWS — Integrate with AWS services using ADOT
- Prometheus — Scrape and store Prometheus metrics
- Logstash — Route logs through Logstash pipelines
- Fluentd & Fluent Bit — Forward logs with lightweight agents