
Prometheus

Prometheus serves as the metrics backend in the OpenSearch Observability Stack. It receives metrics from the OpenTelemetry Collector via the OTLP endpoint and scrapes its own targets. The stack’s Prometheus configuration includes OTLP resource attribute promotion, which maps OpenTelemetry resource attributes to Prometheus labels for rich metric querying.

graph LR
    subgraph Applications
        A1[App 1]
        A2[App 2]
    end

    subgraph Observability Stack
        OC[OTel Collector]
        P[Prometheus :9090]
        OS[OpenSearch]
    end

    A1 -->|OTLP| OC
    A2 -->|OTLP| OC
    OC -->|OTLP HTTP| P
    P -->|scrape /metrics| OC
    P -.->|query| OS

Prerequisites

  • A running OpenSearch Observability Stack
  • The OTel Collector configured to export metrics via OTLP HTTP
  • Network access to Prometheus on port 9090

The Observability Stack ships with the following Prometheus configuration:

global:
  scrape_interval: 60s
  scrape_timeout: 10s
  evaluation_interval: 60s
  external_labels:
    cluster: observability-stack-dev
    environment: development

otlp:
  keep_identifying_resource_attributes: true
  promote_resource_attributes:
    - service.instance.id
    - service.name
    - service.namespace
    - service.version
    - deployment.environment.name
    - gen_ai.agent.id
    - gen_ai.agent.name
    - gen_ai.provider.name
    - gen_ai.request.model
    - gen_ai.response.model

storage:
  tsdb:
    out_of_order_time_window: 30m

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'otel-collector'
    scrape_interval: 10s
    static_configs:
      - targets: ['otel-collector:8888']
| Setting | Value | Description |
| --- | --- | --- |
| scrape_interval | 60s | Default interval between metric scrapes |
| scrape_timeout | 10s | Timeout for each scrape request |
| evaluation_interval | 60s | How often recording and alerting rules are evaluated |
| external_labels.cluster | observability-stack-dev | Cluster label added to all metrics |
| external_labels.environment | development | Environment label added to all metrics |

The otlp section configures how Prometheus receives metrics from the OTel Collector:

  • keep_identifying_resource_attributes: true — Retains OTel resource attributes as metric labels, preserving the full identity of the metric source.
  • promote_resource_attributes — Promotes specific OpenTelemetry resource attributes to first-class Prometheus labels. This allows you to query and aggregate metrics by service name, version, environment, and AI model attributes.

Promoted attributes:

| Resource attribute | Use case |
| --- | --- |
| service.instance.id | Distinguish between instances of the same service |
| service.name | Filter metrics by service |
| service.namespace | Group services by namespace |
| service.version | Track metrics across deployments/versions |
| deployment.environment.name | Separate dev/staging/production metrics |
| gen_ai.agent.id | Filter by AI agent instance |
| gen_ai.agent.name | Filter by AI agent name |
| gen_ai.provider.name | Filter by LLM provider (OpenAI, Anthropic, etc.) |
| gen_ai.request.model | Filter by requested model |
| gen_ai.response.model | Filter by actual model used in response |
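The promotion step can be modeled roughly as follows. This is a simplified Python sketch of the behavior, not Prometheus internals, and the attribute list is abbreviated:

```python
# Abbreviated version of the promote_resource_attributes list above
PROMOTED = {"service.name", "service.version", "deployment.environment.name"}

def promote(resource_attrs: dict) -> dict:
    """Turn promoted OTel resource attributes into Prometheus labels.

    Non-promoted attributes are left out of the per-metric label set;
    dots in attribute names become underscores.
    """
    return {k.replace(".", "_"): v
            for k, v in resource_attrs.items() if k in PROMOTED}

labels = promote({"service.name": "my-app", "host.arch": "arm64"})
print(labels)  # {'service_name': 'my-app'}
```

In real Prometheus, non-promoted attributes are not lost entirely: with keep_identifying_resource_attributes they remain available via the target_info metric.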

The out_of_order_time_window: 30m setting allows Prometheus to accept samples that arrive up to 30 minutes out of order. This is important for OTLP ingestion because the OTel Collector may batch and send metrics with slight delays.
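The effect of this setting can be sketched with a toy acceptance check. This is a simplified model of the rule, not Prometheus's actual TSDB implementation:

```python
from datetime import datetime, timedelta

# Mirrors out_of_order_time_window: 30m from the config above
OUT_OF_ORDER_WINDOW = timedelta(minutes=30)

def sample_accepted(sample_ts: datetime, newest_ts: datetime) -> bool:
    """A late sample is ingestible if it is no more than 30 minutes
    behind the newest sample already stored for that series."""
    return sample_ts >= newest_ts - OUT_OF_ORDER_WINDOW

newest = datetime(2024, 1, 1, 12, 0)
print(sample_accepted(newest - timedelta(minutes=10), newest))  # True
print(sample_accepted(newest - timedelta(minutes=45), newest))  # False
```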

The default configuration scrapes two targets:

  1. prometheus — Prometheus self-monitoring metrics at localhost:9090
  2. otel-collector — OTel Collector internal metrics at otel-collector:8888, scraped every 10 seconds for near-real-time collector health monitoring

Configure the OTel Collector to send metrics to Prometheus via the OTLP HTTP exporter:

exporters:
otlphttp/prometheus:
endpoint: http://prometheus:9090/api/v1/otlp
tls:
insecure: true

The Collector sends metrics to the /api/v1/otlp endpoint, which Prometheus ingests through its native OTLP receiver. This is preferred over Prometheus remote write because it preserves OpenTelemetry resource attributes. Note that the OTLP receiver is disabled by default in stock Prometheus and must be enabled (via the --web.enable-otlp-receiver flag in recent releases, or the otlp-write-receiver feature flag in older 2.x versions).

To scrape additional targets, extend the scrape_configs section:

scrape_configs:
  # Existing configs...

  - job_name: 'my-application'
    scrape_interval: 30s
    static_configs:
      - targets: ['my-app:8080']
    metrics_path: /metrics

  - job_name: 'node-exporter'
    scrape_interval: 15s
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only scrape pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Rewrite the scrape address to use the annotated port,
      # keeping the pod's host/IP portion of __address__
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
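With the kubernetes-pods job above, a pod opts in to scraping via annotations. A minimal pod manifest fragment might look like this (names such as my-app are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  annotations:
    prometheus.io/scrape: "true"  # matched by the 'keep' relabel rule
    prometheus.io/port: "8080"    # rewritten into the scrape address
spec:
  containers:
    - name: my-app
      image: my-app:latest
      ports:
        - containerPort: 8080
```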

If you have an existing Prometheus instance, you can forward metrics to the stack’s Prometheus using remote write:

# In your external Prometheus config
remote_write:
  - url: http://your-stack-prometheus:9090/api/v1/write
    queue_config:
      max_samples_per_send: 5000
      batch_send_deadline: 5s

The receiving Prometheus must be started with --web.enable-remote-write-receiver for the /api/v1/write endpoint to accept pushed samples. Note that remote write does not carry OTel resource attributes the way the OTLP path does.

Once metrics are flowing, query them using PromQL:

# CPU usage by service
rate(process_cpu_seconds_total{service_name="my-app"}[5m])
# Request rate by AI model
rate(gen_ai_client_operation_duration_count{gen_ai_request_model="gpt-4"}[5m])
# Memory usage by environment
process_resident_memory_bytes{deployment_environment_name="production"}
# OTel Collector throughput
rate(otelcol_exporter_sent_metric_points_total[5m])
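Conceptually, rate() computes the per-second increase of a counter over the lookback window. A simplified sketch, ignoring counter resets and the extrapolation Prometheus applies at window boundaries:

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """samples: (timestamp_seconds, counter_value) pairs, oldest first.

    Per-second increase between the first and last sample in the
    window; real PromQL rate() also handles resets and extrapolation.
    """
    (t0, v0), (tn, vn) = samples[0], samples[-1]
    return (vn - v0) / (tn - t0)

# Counter went from 100 to 400 over 300 seconds -> 1.0 per second
print(simple_rate([(0, 100), (60, 160), (300, 400)]))  # 1.0
```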

Note that Prometheus converts OTel resource attribute names to label names by replacing dots with underscores (e.g., service.name becomes service_name).
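This conversion can be approximated in a few lines. The sketch below mimics the sanitization rule (any character outside [a-zA-Z0-9_] becomes an underscore); it is an approximation, not Prometheus's exact implementation:

```python
import re

def otel_attr_to_label(name: str) -> str:
    """Approximate how an OTel attribute name maps to a Prometheus
    label name: dots (and other invalid characters) -> underscores."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", name)

print(otel_attr_to_label("service.name"))          # service_name
print(otel_attr_to_label("gen_ai.request.model"))  # gen_ai_request_model
```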

When to use Prometheus vs OpenSearch for metrics

| Consideration | Prometheus | OpenSearch |
| --- | --- | --- |
| Query language | PromQL (optimized for time-series) | PPL, SQL, DQL |
| Retention | Typically 15-90 days | Long-term storage |
| Cardinality | Best for low-to-medium cardinality | Handles high cardinality well |
| Alerting | Built-in Alertmanager integration | OpenSearch alerting plugin |
| Dashboards | Grafana, built-in UI | OpenSearch Dashboards |
| Use case | Real-time monitoring, alerting | Historical analysis, correlation with logs/traces |

Recommendation: Use Prometheus for real-time metrics, alerting, and operational dashboards. Use OpenSearch for long-term metric storage, cross-signal correlation (metrics + logs + traces), and historical analysis.

Verification

  1. Check that Prometheus and its scrape targets are healthy:

     curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'

  2. Verify that promoted resource attributes appear as labels:

     curl -s 'http://localhost:9090/api/v1/label/service_name/values' | jq .

  3. Check the OTel Collector scrape target health:

     curl -s 'http://localhost:9090/api/v1/query?query=up{job="otel-collector"}' | jq '.data.result[0].value[1]'