Prometheus
Prometheus serves as the metrics backend in the OpenSearch Observability Stack. It receives metrics from the OpenTelemetry Collector via the OTLP endpoint and scrapes its own targets. The stack’s Prometheus configuration includes OTLP resource attribute promotion, which maps OpenTelemetry resource attributes to Prometheus labels for rich metric querying.
Architecture
    graph LR
        subgraph Applications
            A1[App 1]
            A2[App 2]
        end
        subgraph Observability Stack
            OC[OTel Collector]
            P[Prometheus :9090]
            OS[OpenSearch]
        end
        A1 -->|OTLP| OC
        A2 -->|OTLP| OC
        OC -->|OTLP HTTP| P
        P -->|scrape /metrics| OC
        P -.->|query| OS
Prerequisites
- A running OpenSearch Observability Stack
- OTel Collector configured to export metrics via OTLP HTTP
- Network access to Prometheus on port 9090
Stack Prometheus configuration
The Observability Stack ships with the following Prometheus configuration:

    global:
      scrape_interval: 60s
      scrape_timeout: 10s
      evaluation_interval: 60s
      external_labels:
        cluster: observability-stack-dev
        environment: development

    otlp:
      keep_identifying_resource_attributes: true
      promote_resource_attributes:
        - service.instance.id
        - service.name
        - service.namespace
        - service.version
        - deployment.environment.name
        - gen_ai.agent.id
        - gen_ai.agent.name
        - gen_ai.provider.name
        - gen_ai.request.model
        - gen_ai.response.model

    storage:
      tsdb:
        out_of_order_time_window: 30m

    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'otel-collector'
        scrape_interval: 10s
        static_configs:
          - targets: ['otel-collector:8888']

Configuration breakdown
Global settings
| Setting | Value | Description |
|---|---|---|
| scrape_interval | 60s | Default interval between metric scrapes |
| scrape_timeout | 10s | Timeout for each scrape request |
| evaluation_interval | 60s | How often recording and alerting rules are evaluated |
| external_labels.cluster | observability-stack-dev | Cluster label added to all metrics |
| external_labels.environment | development | Environment label added to all metrics |
OTLP receiver configuration
The otlp section configures how Prometheus handles metrics received over OTLP from the OTel Collector:
- keep_identifying_resource_attributes: true keeps the identifying resource attributes (service.name, service.namespace, and service.instance.id) on the target_info metric in addition to mapping them to the job and instance labels, so the full identity of the metric source is preserved.
- promote_resource_attributes promotes the listed OpenTelemetry resource attributes to first-class Prometheus labels on the metrics themselves. This lets you query and aggregate metrics by service name, version, environment, and AI model attributes.
Promoted attributes:
| Resource attribute | Use case |
|---|---|
| service.instance.id | Distinguish between instances of the same service |
| service.name | Filter metrics by service |
| service.namespace | Group services by namespace |
| service.version | Track metrics across deployments/versions |
| deployment.environment.name | Separate dev/staging/production metrics |
| gen_ai.agent.id | Filter by AI agent instance |
| gen_ai.agent.name | Filter by AI agent name |
| gen_ai.provider.name | Filter by LLM provider (OpenAI, Anthropic, etc.) |
| gen_ai.request.model | Filter by requested model |
| gen_ai.response.model | Filter by actual model used in response |
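
The promotion step can be pictured with a small Python sketch. This is an illustration only, not Prometheus code: the function names and the sample resource are hypothetical, and the name conversion is simplified to "replace anything outside letters, digits, and underscores with an underscore".

```python
import re

# Subset of the stack's promote_resource_attributes list, for illustration.
PROMOTED = [
    "service.name",
    "service.version",
    "deployment.environment.name",
    "gen_ai.request.model",
]


def to_label_name(attribute: str) -> str:
    """Simplified conversion of an OTel attribute name to a Prometheus label name."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", attribute)


def promoted_labels(resource_attributes: dict, promoted: list) -> dict:
    """Keep only the promoted attributes and rename them to label names."""
    return {
        to_label_name(name): value
        for name, value in resource_attributes.items()
        if name in promoted
    }


# Hypothetical resource attached to a batch of OTLP metrics.
resource = {
    "service.name": "checkout",
    "service.version": "1.4.2",
    "host.name": "node-7",  # not in the promote list, so it is not copied onto metrics
}

print(promoted_labels(resource, PROMOTED))
```

In this sketch host.name is dropped from the per-metric labels because it is not in the promote list; only attributes named in promote_resource_attributes end up as labels on every series.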
Storage configuration
The out_of_order_time_window: 30m setting allows Prometheus to accept samples that arrive up to 30 minutes out of order. This is important for OTLP ingestion because the OTel Collector may batch and send metrics with slight delays.
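
As a rough mental model of the window, the Python sketch below checks whether a late sample would still fall inside it. It is a simplification under stated assumptions: the function name is hypothetical, timestamps are Unix seconds, and the comparison is made against the newest timestamp already stored; the real TSDB logic has more cases.

```python
from datetime import timedelta

# Mirrors out_of_order_time_window: 30m from the stack configuration.
OUT_OF_ORDER_WINDOW = timedelta(minutes=30)


def is_within_window(sample_ts: float, newest_ts: float,
                     window: timedelta = OUT_OF_ORDER_WINDOW) -> bool:
    """Return True if the sample is no older than `window` behind the newest sample."""
    return sample_ts >= newest_ts - window.total_seconds()


newest = 1_700_000_000.0  # arbitrary example timestamp

print(is_within_window(newest - 10 * 60, newest))  # 10 minutes late
print(is_within_window(newest - 45 * 60, newest))  # 45 minutes late
```

With a 30-minute window, the sample that is 10 minutes late is inside the window and the one that is 45 minutes late is outside it.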
Scrape configs
The default configuration scrapes two targets:
- prometheus: Prometheus self-monitoring metrics at localhost:9090
- otel-collector: OTel Collector internal metrics at otel-collector:8888, scraped every 10 seconds for near-real-time collector health monitoring
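
The interaction between global and per-job settings can be sketched in Python as follows. This is a simplified model, not Prometheus code: the function name is hypothetical, durations are plain seconds, and it assumes a job inherits any setting it does not define, with an inherited timeout capped at the job's own interval.

```python
# Global values from the stack configuration, in seconds.
GLOBAL = {"scrape_interval": 60, "scrape_timeout": 10}


def effective_scrape(job: dict, global_cfg: dict = GLOBAL) -> dict:
    """Resolve the interval and timeout a job ends up with (simplified)."""
    interval = job.get("scrape_interval", global_cfg["scrape_interval"])
    # Assumption: an inherited timeout never exceeds the job's interval.
    timeout = job.get("scrape_timeout", min(global_cfg["scrape_timeout"], interval))
    if timeout > interval:
        raise ValueError("scrape_timeout must not exceed scrape_interval")
    return {"scrape_interval": interval, "scrape_timeout": timeout}


print(effective_scrape({"job_name": "prometheus"}))
print(effective_scrape({"job_name": "otel-collector", "scrape_interval": 10}))
```

Under these assumptions the prometheus job runs with a 60-second interval, while the otel-collector job runs every 10 seconds with a timeout equal to its interval.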
OTel Collector metrics export
Configure the OTel Collector to send metrics to Prometheus via the OTLP HTTP exporter:

    exporters:
      otlphttp/prometheus:
        endpoint: http://prometheus:9090/api/v1/otlp
        tls:
          insecure: true

The exporter appends /v1/metrics to the configured endpoint, so the Collector posts metrics to /api/v1/otlp/v1/metrics, which Prometheus processes using its native OTLP receiver. This is the preferred method over Prometheus remote write because it preserves OpenTelemetry resource attributes.
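
To make the final request URL explicit, here is a tiny Python sketch of how the metrics path is derived from the base endpoint. The helper name is hypothetical and the logic only mirrors the path-appending convention of the OTLP HTTP exporter; it is not Collector code.

```python
def otlp_metrics_url(endpoint: str) -> str:
    """Append the OTLP/HTTP metrics path to a base endpoint."""
    return endpoint.rstrip("/") + "/v1/metrics"


print(otlp_metrics_url("http://prometheus:9090/api/v1/otlp"))
```

This is useful when testing connectivity by hand: requests must target the full path, not the base endpoint from the exporter configuration.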
Add custom scrape targets
To scrape additional targets, extend the scrape_configs section:

    scrape_configs:
      # Existing configs...

      - job_name: 'my-application'
        scrape_interval: 30s
        static_configs:
          - targets: ['my-app:8080']
        metrics_path: /metrics

      - job_name: 'node-exporter'
        scrape_interval: 15s
        static_configs:
          - targets: ['node-exporter:9100']

      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          # Keep only pods annotated with prometheus.io/scrape: "true"
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          # Combine the pod address with the port from the prometheus.io/port annotation
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2

Remote write from external Prometheus
If you have an existing Prometheus instance, you can forward metrics to the stack’s Prometheus using remote write. The receiving Prometheus must be started with the --web.enable-remote-write-receiver flag to accept these requests.

    # In your external Prometheus config
    remote_write:
      - url: http://your-stack-prometheus:9090/api/v1/write
        queue_config:
          max_samples_per_send: 5000
          batch_send_deadline: 5s

Query examples
Once metrics are flowing, query them using PromQL:

    # CPU usage by service
    rate(process_cpu_seconds_total{service_name="my-app"}[5m])

    # Request rate by AI model
    rate(gen_ai_client_operation_duration_count{gen_ai_request_model="gpt-4"}[5m])

    # Memory usage by environment
    process_resident_memory_bytes{deployment_environment_name="production"}

    # OTel Collector throughput
    rate(otelcol_exporter_sent_metric_points_total[5m])

Note that Prometheus converts OTel resource attribute names to label names by replacing dots with underscores (for example, service.name becomes service_name).
When to use Prometheus vs OpenSearch for metrics
| Consideration | Prometheus | OpenSearch |
|---|---|---|
| Query language | PromQL (optimized for time-series) | PPL, SQL, DQL |
| Retention | Typically 15-90 days | Long-term storage |
| Cardinality | Best for low-to-medium cardinality | Handles high cardinality well |
| Alerting | Built-in Alertmanager integration | OpenSearch alerting plugin |
| Dashboards | Grafana, built-in UI | OpenSearch Dashboards |
| Use case | Real-time monitoring, alerting | Historical analysis, correlation with logs/traces |
Recommendation: Use Prometheus for real-time metrics, alerting, and operational dashboards. Use OpenSearch for long-term metric storage, cross-signal correlation (metrics + logs + traces), and historical analysis.
Verify metrics ingestion
- Check that Prometheus is reachable and list the health of its scrape targets:

      curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'

- Verify that promoted resource attributes appear as labels, which confirms that OTLP metrics are being ingested:

      curl -s 'http://localhost:9090/api/v1/label/service_name/values' | jq .

- Check the OTel Collector scrape target health (a value of "1" means the target is up). The query is passed with --data-urlencode so that curl does not treat the braces as URL globbing:

      curl -sG 'http://localhost:9090/api/v1/query' --data-urlencode 'query=up{job="otel-collector"}' | jq '.data.result[0].value[1]'

Related links
- Infrastructure Monitoring Overview
- Docker
- Kubernetes
- OTel Prometheus compatibility — Prometheus and OpenMetrics compatibility spec
- Prometheus configuration reference — Official Prometheus configuration docs