# dedup
import { Aside } from '@astrojs/starlight/components';
The dedup command removes duplicate documents from search results based on the values of one or more specified fields. By default, it keeps the first occurrence of each unique combination of field values and discards subsequent duplicates.
You can retain more than one duplicate per combination by specifying a count, preserve rows that have null values with `keepempty=true`, and limit deduplication to consecutive rows only with `consecutive=true`.
## Syntax

```
dedup [<count>] <field-list> [keepempty=<bool>] [consecutive=<bool>]
```

## Arguments

| Argument | Required | Type | Default | Description |
|---|---|---|---|---|
| `<field-list>` | Yes | Comma-delimited field names | — | The fields used to determine uniqueness. At least one field is required. When multiple fields are specified, uniqueness is based on the combination of all field values. |
| `<count>` | No | Integer (> 0) | 1 | The number of duplicate documents to retain for each unique combination of field values. |
| `keepempty` | No | Boolean | `false` | When `true`, keeps documents where any field in the field list has a NULL value or is missing. When `false`, those documents are discarded. |
| `consecutive` | No | Boolean | `false` | When `true`, removes only consecutive duplicate documents rather than all duplicates. |
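The arguments can be combined in a single query. For instance, the following sketch (using hypothetical fields `host` and `status`) keeps up to two rows per unique (host, status) pair while also retaining rows where either field is null:

```
| dedup 2 host, status keepempty=true
```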
## Usage notes

- **Operates on field combinations.** When you specify multiple fields, `dedup` considers the combination of values across all those fields. For example, `dedup service, severity` keeps one row for each unique (service, severity) pair.
- **`keepempty=true` preserves rows with null values.** By default, rows where any of the specified fields is null are removed. Set `keepempty=true` to retain them.
- **`consecutive=true` only removes adjacent duplicates.** This is useful when your data is sorted and you want to collapse runs of identical values while preserving non-adjacent duplicates.
- **Common pattern: one representative per group.** Use `dedup` to get one sample document per unique value of a field. This is faster than `stats` when you need the actual document, not just a count.
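The semantics described above can be sketched in Python. This is a simplified model for illustration, not the engine's actual implementation; documents are modeled as plain dicts and missing fields read as `None`:

```python
def dedup(docs, fields, count=1, keepempty=False, consecutive=False):
    """Keep the first `count` docs per unique combination of `fields`.

    Simplified model of dedup semantics: docs are dicts, and a field
    that is absent or None counts as null.
    """
    out = []
    seen = {}            # key -> docs kept so far (non-consecutive mode)
    prev_key, run = None, 0
    for doc in docs:
        key = tuple(doc.get(f) for f in fields)
        if any(v is None for v in key):
            # Null/missing field: kept only when keepempty=True,
            # and never counted toward any group.
            if keepempty:
                out.append(doc)
            continue
        if consecutive:
            # Only collapse runs of adjacent identical keys.
            run = run + 1 if key == prev_key else 1
            prev_key = key
            if run <= count:
                out.append(doc)
        else:
            n = seen.get(key, 0) + 1
            seen[key] = n
            if n <= count:
                out.append(doc)
    return out
```

For example, `dedup(docs, ["service", "severity"])` keeps the first document for each (service, severity) pair, mirroring `| dedup service, severity`.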
## Examples

### Deduplicate on a single field

Keep one log entry per unique severity level:

```
| dedup severityText
```

### Keep multiple duplicates per group

Keep up to 2 log entries per severity level:

```
| dedup 2 severityText
```

### Deduplicate on multiple fields

Keep one log entry per unique combination of service and severity:

```
| dedup `resource.attributes.service.name`, severityText
```

### Preserve rows with null values

Keep one log per unique traceId, including logs that have no traceId:

```
| dedup traceId keepempty=true
```

### One representative error log per OTel service

Get one sample error log from each service to quickly see what kinds of errors each service produces:

```
| where severityText = 'ERROR'
| dedup `resource.attributes.service.name`
```

## Extended examples

### Unique service-severity combinations with OTel context

Find every distinct combination of service and severity level, showing one sample log body for each. This is useful for building a quick inventory of what each service is logging:

```
| dedup `resource.attributes.service.name`, severityText
| sort `resource.attributes.service.name`, severityText
```

### Deduplicate traces to find one slow span per service

Get one representative slow span (over 1 second) from each service in your OTel trace data:

```
source = otel-v1-apm-span-*
| where durationInNanos > 1000000000
| dedup serviceName
| sort - durationInNanos
```

## See also
Section titled “See also”- top - Find the most common values of a field
- rare - Find the least common values of a field
- stats - Aggregate results when you need counts rather than sample documents
- head - Limit the number of results returned
- PPL Command Reference - All PPL commands