Skip to content

join

import { Tabs, TabItem, Aside } from ‘@astrojs/starlight/components’;

The join command combines two datasets by matching rows on a condition. The left side is your current pipeline (an index or piped commands); the right side is another index or a subsearch enclosed in square brackets.

Use join to correlate logs with traces, enrich spans with service metadata, or combine any two datasets that share a common field.

The join command supports two syntax forms: basic and extended.

[joinType] join [left = <leftAlias>] [right = <rightAlias>] (on | where) <joinCriteria> <right-dataset>
join [type=<joinType>] [overwrite=<bool>] [max=<n>] (<join-field-list> | [left = <leftAlias>] [right = <rightAlias>] (on | where) <joinCriteria>) <right-dataset>
ParameterRequiredDescription
joinTypeNoThe type of join. Valid values: inner (default), left, right, full, semi, anti, cross.
left = <leftAlias>NoAlias for the left dataset to disambiguate shared field names. Must appear before right.
right = <rightAlias>NoAlias for the right dataset to disambiguate shared field names.
<joinCriteria>YesA comparison expression placed after on or where that specifies how to match rows.
<right-dataset>YesThe right-side dataset. Can be an index name or a subsearch in [ ... ].
ParameterRequiredDescription
typeNoJoin type when using extended syntax. Valid values: inner (default), left, outer (alias for left), right, full, semi, anti, cross.
overwriteNoWhen using a field list, whether right-side fields with duplicate names overwrite left-side fields. Default: true.
maxNoMaximum number of right-side matches per left row. Default: 0 (unlimited) when legacy mode is enabled; 1 otherwise.
<join-field-list>NoCommon field names used to build the join condition automatically. Fields must exist in both datasets.
<joinCriteria>YesA comparison expression placed after on or where that specifies how to match rows.
<right-dataset>YesThe right-side dataset. Can be an index name or a subsearch in [ ... ].
TypeKeywordDescription
Innerinner joinReturns only rows with a match on both sides. This is the default.
Left outerleft joinReturns all left rows. Unmatched right fields are null.
Right outerright joinReturns all right rows. Unmatched left fields are null. Requires plugins.calcite.all_join_types.allowed = true.
Full outerfull joinReturns all rows from both sides. Unmatched fields are null on the missing side. Requires plugins.calcite.all_join_types.allowed = true.
Left semileft semi joinReturns left rows that have at least one match on the right. No right-side fields are included.
Left antileft anti joinReturns left rows that have no match on the right. Useful for finding orphaned records.
Crosscross joinReturns the Cartesian product of both sides. Requires plugins.calcite.all_join_types.allowed = true.
  • Assign aliases when both sides share field names. Without aliases, ambiguous field names are automatically prefixed with the table name or alias (e.g., table1.id, table2.id).
  • Keep the right side small. The right dataset is loaded into memory. Filter or limit the right-side subsearch to keep queries efficient.
  • Right, full, and cross joins are disabled by default for performance reasons. Enable them by setting plugins.calcite.all_join_types.allowed to true.
  • Subsearch row limit. The maximum number of rows from a subsearch is controlled by plugins.ppl.join.subsearch_maxout (default: 50000).
  • When using the extended syntax with a field list, duplicate field names are deduplicated based on the overwrite setting.

Correlate log entries with trace spans using traceId to see which spans produced each log line:

source = logs-otel-v1*
| inner join left=l right=r ON l.traceId = r.traceId otel-v1-apm-span-*

Join logs with a filtered subset of trace spans - only slow spans above a latency threshold:

source = logs-otel-v1* as l
| where severityText = 'ERROR'
| left join ON l.traceId = r.traceId [
source = otel-v1-apm-span-*
| where durationInNanos > 5000000000
| sort - durationInNanos
| head 100
] as r

Join logs with trace spans on the shared traceId field using the field-list shorthand:

source = logs-otel-v1*
| where severityText = 'ERROR'
| join type=left overwrite=true traceId [
source = otel-v1-apm-span-*
| where durationInNanos > 1000000000
| sort - durationInNanos
| head 100
]

Return only log events that have at least one matching trace span:

source = logs-otel-v1*
| left semi join left=l right=r on l.traceId = r.traceId otel-v1-apm-span-*

Anti join - find orphaned logs without spans

Section titled “Anti join - find orphaned logs without spans”

Return log events that have no matching trace span - useful for finding gaps in instrumentation:

source = logs-otel-v1*
| left anti join left=l right=r on l.traceId = r.traceId otel-v1-apm-span-*

Join log events with trace span data using traceId to see which spans produced each log line:

source = logs-otel-v1*
| left join left=l right=r on l.traceId = r.traceId [
source = otel-v1-apm-span-*
]
| head 50

Join raw spans with the service map index to add dependency context:

source = otel-v1-apm-span-*
| inner join left=span right=svc on span.serviceName = svc.serviceName [
source = otel-v2-apm-service-map*
| dedup serviceName
]
| sort - span.durationInNanos
| head 20