patterns
import { Aside } from '@astrojs/starlight/components';
The `patterns` command helps you analyze logs by automatically extracting patterns from unstructured text. It replaces the variable parts of each message (numbers, IP addresses, timestamps, identifiers) with `<*>` placeholders. Log lines that differ only in those values then share the same pattern and can be grouped together. This lets you summarize large volumes of logs with a single command instead of writing regular expressions by hand.
Two pattern extraction methods are available:

- `simple_pattern` (default): Fast, regex-based extraction that replaces alphanumeric tokens with `<*>` placeholders.
- `brain`: An ML-based clustering algorithm that is better at separating constant words from variable ones, producing more accurate groupings.
Two output modes control how results are returned:

- `label` (default): Adds a `patterns_field` column to each event showing its pattern.
- `aggregation`: Groups events by pattern and returns `pattern_count` and `sample_logs` for each pattern.
## Syntax

```
patterns <field> [by <byClause>] [method=simple_pattern|brain] [mode=label|aggregation] [max_sample_count=<int>] [show_numbered_token=<bool>] [new_field=<name>] [pattern=<regex>] [buffer_limit=<int>] [variable_count_threshold=<int>] [frequency_threshold_percentage=<decimal>]
```

## Arguments

### Common arguments

| Argument | Required | Default | Description |
|---|---|---|---|
| `<field>` | Yes | N/A | The text field to analyze for log patterns. |
| `by <byClause>` | No | N/A | One or more fields or scalar functions used to group logs before pattern extraction. |
| `method` | No | `simple_pattern` | Pattern extraction method: `simple_pattern` (fast, regex-based) or `brain` (ML-based clustering). |
| `mode` | No | `label` | Output mode: `label` adds a pattern column to each event; `aggregation` groups events by pattern. |
| `max_sample_count` | No | `10` | Maximum number of sample log entries returned per pattern in aggregation mode. |
| `show_numbered_token` | No | `false` | When `true`, variables use numbered placeholders (`<token1>`, `<token2>`, ...) instead of `<*>`, and aggregation mode includes a `tokens` mapping. |
| `new_field` | No | `patterns_field` | Name of the output field that contains the extracted pattern. |
### simple_pattern arguments

| Argument | Required | Default | Description |
|---|---|---|---|
| `pattern` | No | Alphanumeric tokens | A custom Java regular expression that matches the characters to replace with `<*>` placeholders. For example, `[0-9]` replaces only digits. |
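As a rough illustration of what a custom `pattern` does, the Python sketch below applies a regular expression and replaces every match with `<*>`. It assumes the plugin replaces match by match, like Python's `re.sub`. The helper name `apply_custom_pattern` is hypothetical. Under that assumption, `[0-9]` yields one placeholder per digit, while `[0-9]+` yields one placeholder per number.

```python
import re


def apply_custom_pattern(message: str, regex: str) -> str:
    """Replace each match of `regex` with a <*> placeholder (assumed semantics)."""
    return re.sub(regex, "<*>", message)


line = "10.0.1.55 - GET /api/v1/agents 200 1234ms"

# One placeholder per digit: letters, dots, and slashes are preserved.
print(apply_custom_pattern(line, r"[0-9]"))

# One placeholder per run of digits.
print(apply_custom_pattern(line, r"[0-9]+"))
```

Letters such as `GET`, `api`, and the `ms` suffix survive either way. Only digits are affected.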
### brain arguments

| Argument | Required | Default | Description |
|---|---|---|---|
| `buffer_limit` | No | `100000` | Maximum internal buffer size. Must be at least `50000`. |
| `variable_count_threshold` | No | `5` | Controls how readily a word is classified as variable rather than constant. Lower values produce more general patterns. |
| `frequency_threshold_percentage` | No | `0.3` | Minimum word frequency, expressed as a fraction (`0.3` means 30%). Words that appear less often than this are ignored. |
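The Brain algorithm itself is more involved than can be shown here. The toy Python sketch below is *not* Brain. It is a simplified, hypothetical positional heuristic that only illustrates the direction of `variable_count_threshold`. A token position counts as variable when it takes more than the threshold number of distinct values. A lower threshold therefore marks more positions as variable and yields more general patterns.

```python
from collections import defaultdict


def toy_patterns(lines, variable_count_threshold=5):
    """Toy positional heuristic, NOT the Brain algorithm.

    Lines are split on whitespace. A token position (among lines with the same
    number of tokens) is treated as variable when it holds more than
    `variable_count_threshold` distinct values; otherwise its word is kept.
    """
    tokenized = [line.split() for line in lines]

    # Collect the distinct values seen at each (token count, position) pair.
    distinct = defaultdict(set)
    for tokens in tokenized:
        for i, token in enumerate(tokens):
            distinct[(len(tokens), i)].add(token)

    patterns = []
    for tokens in tokenized:
        width = len(tokens)
        words = [
            "<*>" if len(distinct[(width, i)]) > variable_count_threshold else token
            for i, token in enumerate(tokens)
        ]
        patterns.append(" ".join(words))
    return patterns


lines = [
    "user alice logged in from web",
    "user bob logged in from web",
    "user carol logged in from mobile",
]
print(toy_patterns(lines, variable_count_threshold=2))
print(toy_patterns(lines, variable_count_threshold=1))
```

With a threshold of 2, only the user name becomes a placeholder. Lowering the threshold to 1 also turns the client type into a placeholder, so all three lines collapse into one pattern.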
## Usage notes

- The `patterns` command runs on the coordinator node, not on data nodes. It extracts patterns only from the log messages that have already been returned to the coordinator.
- In label mode, each event gets an additional `patterns_field` column (or the name set by `new_field`) showing its pattern. Use it to spot similar log lines.
- In aggregation mode, the output contains one row per unique pattern, with `pattern_count` and `sample_logs` columns. Use it to see at a glance which message templates make up your logs.
- The `brain` method is better at identifying which parts of a log message are variable and which are static. It produces more meaningful groupings than `simple_pattern`, especially for complex log formats.
- The `by` clause lets you discover patterns per group (for example, per service or per severity level).
- The defaults for `patterns` can be overridden with cluster settings prefixed with `plugins.ppl.pattern.*`.
## When to use patterns

| Use case | Why `patterns` helps |
|---|---|
| Incident investigation | Quickly answer: “What log patterns appeared during the outage?” |
| Log volume reduction | Identify the noisiest patterns consuming storage and bandwidth |
| Anomaly detection | Spot new or rare patterns that were not seen before |
| Log categorization | Group thousands of unique messages into a manageable set of templates |
| Regex bootstrapping | Use discovered patterns as a starting point for parse or grok rules |
## Basic examples

### Simple pattern discovery (label mode)

Add a pattern label to each log body:
```
source = logs-otel-v1*
| patterns body method=simple_pattern
| head 20
```

Example output:

| body | patterns_field |
|---|---|
| 10.0.1.55 - GET /api/v1/agents 200 1234ms | `<*>.<*>.<*>.<*> - <*> /<*>/<*>/<*> <*> <*>` |
| 192.168.1.10 - POST /api/v1/invoke 201 567ms | `<*>.<*>.<*>.<*> - <*> /<*>/<*>/<*> <*> <*>` |
| 172.16.0.42 - GET /health 200 12ms | `<*>.<*>.<*>.<*> - <*> /<*> <*> <*>` |
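The following minimal Python sketch shows how the default method arrives at labels like these. It assumes that `simple_pattern` without a custom `pattern` behaves like replacing each run of ASCII letters and digits with `<*>`. The plugin's real implementation may differ in details such as the handling of non-ASCII characters.

```python
import re

# Assumption: with no custom `pattern`, simple_pattern replaces each maximal run
# of ASCII letters and digits with "<*>". Sketch only, not the plugin's code.
DEFAULT_TOKEN = re.compile(r"[A-Za-z0-9]+")


def simple_pattern(message: str) -> str:
    """Return the label-mode pattern for one log message."""
    return DEFAULT_TOKEN.sub("<*>", message)


bodies = [
    "10.0.1.55 - GET /api/v1/agents 200 1234ms",
    "192.168.1.10 - POST /api/v1/invoke 201 567ms",
    "172.16.0.42 - GET /health 200 12ms",
]

patterns = [simple_pattern(body) for body in bodies]
for body, pattern in zip(bodies, patterns):
    print(f"{body} -> {pattern}")
```

The first two bodies map to the same pattern, which is what allows them to be grouped together.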
### Aggregation mode: group by pattern

Count how many log entries match each pattern:

```
source = logs-otel-v1*
| patterns body method=simple_pattern mode=aggregation
| head 20
```

This returns one row per unique pattern. Each row contains its `pattern_count` and, in `sample_logs`, up to 10 sample log lines (controlled by `max_sample_count`). `head 20` limits the result to 20 patterns.
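Conceptually, aggregation mode groups events that share a pattern, counts them, and keeps at most `max_sample_count` of them as samples. The minimal Python sketch below illustrates that grouping. `aggregate` is a hypothetical helper. It assumes that the default regex replaces runs of ASCII letters and digits. It also simply keeps the first samples it sees, which may differ from how the plugin chooses samples.

```python
import re

# Assumed default simple_pattern regex: runs of ASCII letters and digits.
TOKEN = re.compile(r"[A-Za-z0-9]+")


def aggregate(messages, max_sample_count=10):
    """Group messages by pattern; return one row per pattern with
    pattern_count and up to max_sample_count sample_logs."""
    groups = {}  # insertion-ordered: patterns appear in first-seen order
    for message in messages:
        pattern = TOKEN.sub("<*>", message)
        row = groups.setdefault(
            pattern, {"pattern": pattern, "pattern_count": 0, "sample_logs": []}
        )
        row["pattern_count"] += 1
        if len(row["sample_logs"]) < max_sample_count:
            row["sample_logs"].append(message)
    return list(groups.values())


rows = aggregate(
    [
        "10.0.1.55 - GET /api/v1/agents 200 1234ms",
        "192.168.1.10 - POST /api/v1/invoke 201 567ms",
        "172.16.0.42 - GET /health 200 12ms",
    ],
    max_sample_count=1,
)
for row in rows:
    print(row["pattern_count"], row["pattern"], row["sample_logs"])
```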
### Brain method: smarter clustering

The `brain` method keeps more of a message's structure than `simple_pattern`:

```
source = logs-otel-v1*
| patterns body method=brain
| head 20
```

The brain algorithm identifies that HTTP methods (`GET`, `POST`, and so on), URL paths, and status codes are variable. It treats structural elements (brackets, dashes, quotes) and fixed words such as `HTTP` as constant. For Apache-style access logs, this produces cleaner patterns like:

```
<*IP*> - <*> [<*>/<*>/<*>:<*>:<*>:<*> <*>] "<*> <*> HTTP/<*>" <*> <*>
```

### Custom regex pattern
Replace only digits, preserving all other characters:

```
source = logs-otel-v1*
| patterns body method=simple_pattern new_field='no_numbers' pattern='[0-9]'
| head 20
```

The pattern is written to a `no_numbers` column instead of `patterns_field`.

### Aggregation with numbered tokens
Enable numbered tokens to see exactly which parts of the pattern are variable:

```
source = logs-otel-v1*
| patterns body method=simple_pattern mode=aggregation show_numbered_token=true
| head 1
```

The output includes a `tokens` map showing what each `<tokenN>` placeholder matched, for example `{'<token1>': ['200'], '<token2>': ['404'], ...}`.
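The minimal Python sketch below shows what such a mapping might contain. It assumes that placeholders are numbered left to right and that each `<tokenN>` collects the values it matched in the sample logs. `numbered_tokens` is a hypothetical helper, not part of PPL, and the regex is an assumed stand-in for the default `simple_pattern` behavior.

```python
import itertools
import re

# Assumed default simple_pattern regex: runs of ASCII letters and digits.
TOKEN = re.compile(r"[A-Za-z0-9]+")


def numbered_tokens(samples):
    """Hypothetical helper. Assumes every sample shares one pattern and that
    placeholders are numbered left to right; maps each <tokenN> to the
    values it matched across the samples."""
    pattern = None
    tokens = {}
    for message in samples:
        counter = itertools.count(1)
        pattern = TOKEN.sub(lambda _m: f"<token{next(counter)}>", message)
        for i, value in enumerate(TOKEN.findall(message), start=1):
            tokens.setdefault(f"<token{i}>", []).append(value)
    return pattern, tokens


pattern, tokens = numbered_tokens(["GET /health 200", "POST /health 503"])
print(pattern)  # <token1> /<token2> <token3>
print(tokens)
```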
## Extended examples

### Discover patterns in OTel log bodies

Find the dominant log patterns across all services in your OpenTelemetry data:
```
source = logs-otel-v1*
| patterns body method=brain mode=aggregation
| sort - pattern_count
| head 20
```

This reveals the most common log message shapes across your entire system. The `pattern_count` column shows which patterns dominate your log volume. Often a small number of patterns account for the vast majority of log entries.
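If you export the `pattern_count` column from this query, a small helper can tell you how many of the top patterns make up a given share of the volume. The hypothetical `patterns_covering` below is one such helper. It is plain Python over exported counts, not a PPL feature, and the counts in the usage line are illustrative values. Because the query ends with `head 20`, the total covers only those 20 patterns. Drop `head` if you want the share of all log entries.

```python
def patterns_covering(pattern_counts, share=0.9):
    """Hypothetical helper: smallest number of top patterns whose combined
    pattern_count reaches `share` (0..1) of all counted log entries."""
    total = sum(pattern_counts)
    if total == 0:
        return 0
    covered = 0
    for n, count in enumerate(sorted(pattern_counts, reverse=True), start=1):
        covered += count
        if covered >= share * total:
            return n
    return len(pattern_counts)


# Illustrative counts, not real output.
counts = [700, 200, 50, 30, 20]
print(patterns_covering(counts, share=0.8))
```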
### Find dominant patterns per service

Use the `by` clause to discover patterns grouped by the originating service:
```
source = logs-otel-v1*
| patterns body by `resource.attributes.service.name` method=brain mode=aggregation
| sort - pattern_count
| head 30
```

This helps answer questions such as "Which service produces the most repetitive log patterns?" and "Are any services emitting unique patterns that might indicate errors?"
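The minimal Python sketch below shows what the `by` clause changes in aggregation mode. Counts are kept per group value and pattern, so the same pattern from two services appears as two separate rows. `patterns_by_group` and the `service` key are hypothetical. For brevity, the sketch uses a `simple_pattern`-style regex rather than the brain method used in the query.

```python
import re
from collections import Counter

# Assumed simple_pattern-style regex; the PPL example above uses brain instead.
TOKEN = re.compile(r"[A-Za-z0-9]+")


def patterns_by_group(events, group_field, message_field="body"):
    """Hypothetical helper: count log entries per (group value, pattern) pair."""
    counts = Counter()
    for event in events:
        pattern = TOKEN.sub("<*>", event[message_field])
        counts[(event[group_field], pattern)] += 1
    return counts


events = [
    {"service": "checkout", "body": "GET /health 200"},
    {"service": "checkout", "body": "GET /health 503"},
    {"service": "cart", "body": "GET /health 200"},
]
for (service, pattern), count in patterns_by_group(events, "service").most_common():
    print(service, pattern, count)
```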
## Configuring defaults

Override the default `patterns` settings at the cluster level:

```
PUT _cluster/settings
{
  "persistent": {
    "plugins.ppl.pattern.method": "brain",
    "plugins.ppl.pattern.mode": "aggregation",
    "plugins.ppl.pattern.max.sample.count": 5,
    "plugins.ppl.pattern.buffer.limit": 50000,
    "plugins.ppl.pattern.show.numbered.token": true
  }
}
```