dedup

import { Aside } from '@astrojs/starlight/components';

The dedup command removes duplicate documents from search results based on the values of one or more specified fields. By default, it keeps the first occurrence of each unique combination of field values and discards subsequent duplicates.

You can retain more than one duplicate per combination by specifying a count, preserve rows that have null values with keepempty=true, and limit deduplication to consecutive rows only with consecutive=true.

dedup [<count>] <field-list> [keepempty=<bool>] [consecutive=<bool>]
| Argument | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| `<field-list>` | Yes | Comma-delimited field names | | The fields used to determine uniqueness. At least one field is required. When multiple fields are specified, uniqueness is based on the combination of all field values. |
| `<count>` | No | Integer (> 0) | 1 | The number of duplicate documents to retain for each unique combination of field values. |
| `keepempty` | No | Boolean | false | When true, keeps documents where any field in the field list has a NULL value or is missing. When false, those documents are discarded. |
| `consecutive` | No | Boolean | false | When true, removes only consecutive duplicate documents rather than all duplicates. |
  • Operates on field combinations. When you specify multiple fields, dedup considers the combination of values across all those fields. For example, dedup service, severity keeps one row for each unique (service, severity) pair.
  • keepempty=true preserves rows with null values. By default, rows where any of the specified fields is null are removed. Set keepempty=true to retain them.
  • consecutive=true only removes adjacent duplicates. This is useful when your data is sorted and you want to collapse runs of identical values while preserving non-adjacent duplicates.
  • Common pattern: one representative per group. Use dedup to get one sample document per unique value of a field. This is faster than stats when you need the actual document, not just a count.
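The consecutive option is most useful on sorted data. A sketch that collapses runs of repeated severity values while keeping non-adjacent repeats, assuming the logs carry a timestamp field to sort on (the field name here is illustrative):

| sort timestamp
| dedup severityText consecutive=true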

Keep one log entry per unique severity level:

| dedup severityText


Keep up to 2 log entries per severity level:

| dedup 2 severityText


Keep one log entry per unique combination of service and severity:

| dedup `resource.attributes.service.name`, severityText


Keep one log per unique traceId, including logs that have no traceId:

| dedup traceId keepempty=true

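The count and keepempty options can be combined. A sketch keeping up to 3 logs per unique traceId while also retaining logs that have no traceId at all:

| dedup 3 traceId keepempty=true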

One representative error log per OTel service


Get one sample error log from each service to quickly see what kinds of errors each service produces:

| where severityText = 'ERROR'
| dedup `resource.attributes.service.name`


Unique service-severity combinations with OTel context


Find every distinct combination of service and severity level, showing one sample log body for each. This is useful for building a quick inventory of what each service is logging:

| dedup `resource.attributes.service.name`, severityText
| sort `resource.attributes.service.name`, severityText


Deduplicate traces to find one slow span per service


Get one representative slow span (over 1 second) from each service in your OTel trace data:

source = otel-v1-apm-span-*
| where durationInNanos > 1000000000
| dedup serviceName
| sort - durationInNanos
  • top - Find the most common values of a field
  • rare - Find the least common values of a field
  • stats - Aggregate results when you need counts rather than sample documents
  • head - Limit the number of results returned
  • PPL Command Reference - All PPL commands