
# patterns

import { Aside } from '@astrojs/starlight/components';

The patterns command is one of PPL’s most powerful features for log analysis. It automatically extracts patterns from unstructured text by replacing variable parts (numbers, IPs, timestamps, identifiers) with <*> placeholders, grouping similar log lines together. This replaces hours of manual regex writing with a single command.

Two pattern extraction methods are available:

  • simple_pattern (default): Fast, regex-based extraction that replaces alphanumeric tokens with <*> placeholders.
  • brain: A smarter ML-based clustering algorithm that preserves semantic meaning and produces more accurate groupings.

Two output modes control how results are returned:

  • label (default): Adds a patterns_field column to each event showing its pattern.
  • aggregation: Groups events by pattern and returns pattern_count and sample_logs.
```
patterns <field> [by <byClause>]
  [method=simple_pattern|brain]
  [mode=label|aggregation]
  [max_sample_count=<int>]
  [show_numbered_token=<bool>]
  [new_field=<name>]
  [pattern=<regex>]
  [buffer_limit=<int>]
  [variable_count_threshold=<int>]
  [frequency_threshold_percentage=<decimal>]
```
| Argument | Required | Default | Description |
| --- | --- | --- | --- |
| `<field>` | Yes | — | The text field to analyze for log patterns. |
| `by <byClause>` | No | — | Fields or scalar functions to group logs before pattern extraction. |
| `method` | No | `simple_pattern` | Pattern extraction method: `simple_pattern` (fast, regex-based) or `brain` (ML-based clustering). |
| `mode` | No | `label` | Output mode: `label` adds a pattern column to each event; `aggregation` groups events by pattern. |
| `max_sample_count` | No | 10 | Maximum number of sample log entries returned per pattern in aggregation mode. |
| `show_numbered_token` | No | `false` | When `true`, variables use numbered placeholders (`<token1>`, `<token2>`) instead of `<*>`, and aggregation mode includes a `tokens` mapping. |
| `new_field` | No | `patterns_field` | Name for the output field that contains the extracted pattern. |
Argument specific to `simple_pattern`:

| Argument | Required | Default | Description |
| --- | --- | --- | --- |
| `pattern` | No | Auto-detect | A custom Java regular expression that identifies characters to replace with `<*>` placeholders. For example, `[0-9]` replaces only digits. |
Arguments specific to `brain`:

| Argument | Required | Default | Description |
| --- | --- | --- | --- |
| `buffer_limit` | No | 100000 | Maximum internal buffer size (minimum 50000). |
| `variable_count_threshold` | No | 5 | Controls sensitivity to detecting constant vs. variable words. Lower values produce more general patterns. |
| `frequency_threshold_percentage` | No | 0.3 | Minimum word frequency percentage. Words below this threshold are ignored. |
  • The patterns command runs on the coordinator node, not on data nodes. It groups patterns from log messages that have already been returned.
  • In label mode, each event gets an additional patterns_field column showing its pattern. Use this to visually identify similar log lines.
  • In aggregation mode, the output contains one row per unique pattern with pattern_count and sample_logs columns. This is ideal for understanding log composition at a glance.
  • The brain method is better at identifying which parts of a log message are variable vs. static. It produces more meaningful groupings than simple_pattern, especially for complex log formats.
  • The by clause lets you discover patterns per group (e.g., per service, per severity level).
  • The defaults for patterns can be overridden with cluster settings prefixed with plugins.ppl.pattern.*.
| Use case | Why `patterns` helps |
| --- | --- |
| Incident investigation | Quickly answer: "What log patterns appeared during the outage?" |
| Log volume reduction | Identify the noisiest patterns consuming storage and bandwidth |
| Anomaly detection | Spot new or rare patterns that were not seen before |
| Log categorization | Group thousands of unique messages into a manageable set of templates |
| Regex bootstrapping | Use discovered patterns as a starting point for `parse` or `grok` rules |

Add a pattern label to each log body:

```
source = logs-otel-v1*
| patterns body method=simple_pattern
| head 20
```

| body | patterns_field |
| --- | --- |
| `10.0.1.55 - GET /api/v1/agents 200 1234ms` | `<*>.<*>.<*>.<*> - <*> /<*>/<*>/<*> <*> <*>` |
| `192.168.1.10 - POST /api/v1/invoke 201 567ms` | `<*>.<*>.<*>.<*> - <*> /<*>/<*>/<*> <*> <*>` |
| `172.16.0.42 - GET /health 200 12ms` | `<*>.<*>.<*>.<*> - <*> /<*> <*> <*>` |
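The idea behind `simple_pattern` can be approximated in Python. This is a toy sketch for intuition only, not the plugin's actual implementation: it replaces every alphanumeric run with a `<*>` placeholder while leaving punctuation intact.

```python
import re

def simple_pattern(text: str) -> str:
    """Toy approximation of simple_pattern: mask each alphanumeric run."""
    return re.sub(r"[a-zA-Z0-9]+", "<*>", text)

print(simple_pattern("10.0.1.55 - GET /api/v1/agents 200 1234ms"))
# <*>.<*>.<*>.<*> - <*> /<*>/<*>/<*> <*> <*>
```

Lines that differ only in their variable tokens (IPs, methods, paths, status codes, latencies) collapse to the same pattern string, which is what makes grouping possible.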

Try in playground →

Count how many log entries match each pattern:

```
source = logs-otel-v1*
| patterns body method=simple_pattern mode=aggregation
| head 20
```

This returns one row per unique pattern with the count and up to 10 sample log lines.
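Conceptually, aggregation mode is a group-by over the extracted pattern. The sketch below illustrates this shape in Python; the `simple_pattern` helper is the same toy approximation as above, and the row layout (`pattern_count`, `sample_logs`) mirrors the documented output rather than the plugin's internals.

```python
import re
from collections import defaultdict

def simple_pattern(text):
    # Toy approximation: mask each alphanumeric run.
    return re.sub(r"[a-zA-Z0-9]+", "<*>", text)

def aggregate(logs, max_sample_count=10):
    """Group log lines by their extracted pattern, keeping counts and samples."""
    groups = defaultdict(list)
    for line in logs:
        groups[simple_pattern(line)].append(line)
    return [
        {"patterns_field": p,
         "pattern_count": len(samples),
         "sample_logs": samples[:max_sample_count]}
        for p, samples in groups.items()
    ]

logs = [
    "10.0.1.55 - GET /api/v1/agents 200 1234ms",
    "192.168.1.10 - POST /api/v1/invoke 201 567ms",
    "172.16.0.42 - GET /health 200 12ms",
]
for row in aggregate(logs):
    print(row["pattern_count"], row["patterns_field"])
```

The two `/api/v1/...` lines collapse into one pattern with a count of 2, while the `/health` line forms its own group.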

Try in playground →

The brain method preserves more semantic meaning than simple_pattern:

```
source = logs-otel-v1*
| patterns body method=brain
| head 20
```

The brain algorithm identifies that HTTP methods (GET, POST, etc.), URL paths, and status codes are variable while structural elements (brackets, dashes, quotes) are constant, producing cleaner patterns like:

```
<*IP*> - <*> [<*>/<*>/<*>:<*>:<*>:<*> <*>] "<*> <*> HTTP/<*>" <*> <*>
```

Try in playground →

Replace only digits, preserving all other characters:

```
source = logs-otel-v1*
| patterns body method=simple_pattern new_field='no_numbers' pattern='[0-9]'
| head 20
```
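The effect of a custom `pattern` regex can be previewed in Python. A toy sketch under an assumption: adjacent placeholders are collapsed into one, which may differ in detail from the plugin's behavior.

```python
import re

def custom_pattern(text, pattern=r"[0-9]"):
    """Toy preview of a custom pattern regex: substitute matches with <*>,
    then collapse runs of adjacent placeholders (plugin behavior may differ)."""
    masked = re.sub(pattern, "<*>", text)
    return re.sub(r"(<\*>)+", "<*>", masked)

print(custom_pattern("10.0.1.55 - GET /api/v1/agents 200 1234ms"))
# <*>.<*>.<*>.<*> - GET /api/v<*>/agents <*> <*>ms
```

Only digits are masked; the HTTP method, path words, and the `ms` suffix survive, giving a more readable pattern when the variability you care about is numeric.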

Try in playground →

Enable numbered tokens to see exactly which parts of the pattern are variable:

```
source = logs-otel-v1*
| patterns body method=simple_pattern mode=aggregation show_numbered_token=true
| head 1
```

The output includes a tokens map showing what each <tokenN> placeholder matched, e.g. {'<token1>': ['200'], '<token2>': ['404'], ...}.
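Numbered tokens can be mimicked with a substitution callback. A toy single-line sketch: the real plugin tracks token values per pattern across all grouped events, while this version numbers tokens within one line.

```python
import re

def numbered_pattern(text):
    """Toy sketch of show_numbered_token=true for a single line: each
    alphanumeric run gets a numbered placeholder, and a tokens map
    records what each placeholder matched."""
    tokens, counter = {}, 0

    def repl(match):
        nonlocal counter
        counter += 1
        name = f"<token{counter}>"
        tokens.setdefault(name, []).append(match.group())
        return name

    pattern = re.sub(r"[a-zA-Z0-9]+", repl, text)
    return pattern, tokens

p, t = numbered_pattern("GET /health 200")
print(p)  # <token1> /<token2> <token3>
print(t)  # {'<token1>': ['GET'], '<token2>': ['health'], '<token3>': ['200']}
```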

Try in playground →

Find the dominant log patterns across all services in your OpenTelemetry data:

```
source = logs-otel-v1*
| patterns body method=brain mode=aggregation
| sort - pattern_count
| head 20
```

This reveals the most common log message shapes across your entire system. The pattern_count column shows which patterns dominate your log volume — often a small number of patterns account for the vast majority of log entries.

Try in playground →

Use the by clause to discover patterns grouped by the originating service:

```
source = logs-otel-v1*
| patterns body method=brain mode=aggregation by `resource.attributes.service.name`
| sort - pattern_count
| head 30
```

This helps answer questions like: “Which service produces the most repetitive log patterns?” and “Are there services emitting unique patterns that might indicate errors?”

Try in playground →

Override the default patterns settings at the cluster level:

```
PUT _cluster/settings
{
  "persistent": {
    "plugins.ppl.pattern.method": "brain",
    "plugins.ppl.pattern.mode": "aggregation",
    "plugins.ppl.pattern.max.sample.count": 5,
    "plugins.ppl.pattern.buffer.limit": 50000,
    "plugins.ppl.pattern.show.numbered.token": true
  }
}
```
  • parse — extract specific fields using Java regex when you know the pattern
  • grok — extract fields using predefined grok patterns for known log formats
  • rex — regex extraction with sed mode and multiple match support