ml
The ml command applies machine learning algorithms from the ML Commons plugin directly in your PPL query pipeline. It supports anomaly detection using Random Cut Forest (RCF) and clustering using k-means, running train-and-predict operations in a single step.
Syntax
Anomaly detection (time-series):
    ml action='train' algorithm='rcf' time_field=<field> [<parameters>]
Anomaly detection (batch/non-time-series):
    ml action='train' algorithm='rcf' [<parameters>]
K-means clustering:
    ml action='train' algorithm='kmeans' [<parameters>]
Arguments
RCF time-series parameters
| Argument | Required | Default | Description |
|---|---|---|---|
| time_field=<field> | Yes | None | The timestamp field for time-series analysis. |
| number_of_trees=<int> | No | 30 | Number of trees in the forest. |
| shingle_size=<int> | No | 8 | Consecutive records in a shingle (sliding window). |
| sample_size=<int> | No | 256 | Sample size for stream samplers. |
| output_after=<int> | No | 32 | Minimum data points before results are produced. |
| time_decay=<float> | No | 0.0001 | Decay factor for stream samplers. |
| anomaly_rate=<float> | No | 0.005 | Expected anomaly rate (0.0 to 1.0). |
| date_format=<string> | No | yyyy-MM-dd HH:mm:ss | Format of the time field. |
| time_zone=<string> | No | UTC | Time zone of the time field. |
| category_field=<field> | No | None | Group input by category; prediction runs independently per group. |
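As a rough illustration of how these arguments combine, the sketch below shortens the shingle and the warm-up for a query that covers only a short time range. The index pattern and field names are borrowed from the examples on this page, the values 4 and 16 are arbitrary choices rather than recommendations, and the `//` comment line assumes a PPL version that accepts comments (remove it otherwise).

```ppl
// Illustrative values only: shingle_size=4 and output_after=16 are assumptions, not tuned settings.
source = otel-v1-apm-span-*
| stats avg(durationInNanos) as avg_latency by span(startTime, 1m) as minute
| ml action='train' algorithm='rcf' time_field='minute' shingle_size=4 output_after=16
```

Going by the output_after description in the table, results here would start after 16 one-minute buckets instead of the default 32.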
RCF batch (non-time-series) parameters
| Argument | Required | Default | Description |
|---|---|---|---|
| number_of_trees=<int> | No | 30 | Number of trees in the forest. |
| sample_size=<int> | No | 256 | Random samples per tree from training data. |
| output_after=<int> | No | 32 | Minimum data points before results are produced. |
| training_data_size=<int> | No | Full dataset | Size of the training dataset. |
| anomaly_score_threshold=<float> | No | 1.0 | Score threshold above which a point is anomalous. |
| category_field=<field> | No | None | Group input by category; prediction runs independently per group. |
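A batch run that uses these arguments might look like the sketch below, which raises the score threshold and scores each service separately. The index pattern and field names come from the examples on this page. The threshold of 2.0 is an arbitrary illustrative value, narrowing the input with fields is an assumption about which columns you want the model to see, and the `//` comment line assumes a PPL version that accepts comments.

```ppl
// Assumed threshold (2.0) for illustration only; the documented default is 1.0.
source = otel-v1-apm-span-*
| fields serviceName, durationInNanos
| ml action='train' algorithm='rcf' anomaly_score_threshold=2.0 category_field='serviceName'
| where anomalous = 'True'
```

A higher threshold should flag fewer points, so it is worth comparing the row count against a run with the default before settling on a value.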
K-means parameters
| Argument | Required | Default | Description |
|---|---|---|---|
| centroids=<int> | No | 2 | Number of clusters. |
| iterations=<int> | No | 10 | Maximum iterations for convergence. |
| distance_type=<type> | No | EUCLIDEAN | Distance metric: COSINE, L1, or EUCLIDEAN. |
Output fields
RCF time-series output
The command appends these fields to each row:
| Field | Description |
|---|---|
| score | Anomaly score (higher = more anomalous). |
| anomaly_grade | Anomaly grade (0.0 = normal, higher = more anomalous). |
RCF batch output
| Field | Description |
|---|---|
| score | Anomaly score. |
| anomalous | Boolean indicating whether the point is anomalous (True/False). |
K-means output
| Field | Description |
|---|---|
| ClusterID | The cluster assignment (integer starting from 0). |
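Because the output fields are appended to each row, later commands in the pipeline can use them like any other field. As a minimal sketch, assuming per-service aggregates like those in the k-means example on this page, the query below counts how many services landed in each cluster. The alias services_in_cluster is a made-up name, and the `//` comment line assumes a PPL version that accepts comments.

```ppl
// ClusterID is the field appended by k-means; services_in_cluster is a hypothetical alias.
source = otel-v1-apm-span-*
| stats avg(durationInNanos) as avg_duration, count() as total by serviceName
| ml action='train' algorithm='kmeans' centroids=3
| stats count() as services_in_cluster by ClusterID
```

A cluster that holds only one or two services is often the first place to look, since those services behave unlike the rest.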
Usage notes
- For time-series RCF, ensure data is ordered by the time field before passing it to ml. The algorithm expects sequential data.
- The output_after parameter controls the warm-up period. The first output_after data points (32 by default) have a score of 0 while the model learns normal patterns.
- Batch RCF treats each data point independently, making it suitable for detecting outliers in non-sequential data.
- K-means works best when numeric fields are on similar scales. Consider normalizing with eval before clustering.
- Use category_field to run independent models per category (e.g., per service), avoiding cross-contamination between different baseline behaviors.
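The normalization note can be sketched with min-max scaling, which maps each numeric field to the range 0 to 1 before clustering. This is one possible approach, not the only one. It assumes eventstats (listed under See also) is available in your PPL version, the min_, max_, and _scaled names are made up for illustration, and it does not guard against a field whose minimum equals its maximum, which would divide by zero. The `//` comment line assumes a PPL version that accepts comments.

```ppl
// Hypothetical field names; min-max scaling is an assumed normalization choice.
source = otel-v1-apm-span-*
| stats avg(durationInNanos) as avg_duration, count() as throughput by serviceName
| eventstats min(avg_duration) as min_duration, max(avg_duration) as max_duration, min(throughput) as min_throughput, max(throughput) as max_throughput
| eval duration_scaled = (avg_duration - min_duration) * 1.0 / (max_duration - min_duration)
| eval throughput_scaled = (throughput - min_throughput) * 1.0 / (max_throughput - min_throughput)
| fields serviceName, duration_scaled, throughput_scaled
| ml action='train' algorithm='kmeans' centroids=3
```

Without scaling, a field measured in nanoseconds would dominate a Euclidean distance over a field measured in request counts, so the clusters would reflect latency almost exclusively.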
Examples
Detect anomalous latency in time-series data
Aggregate span duration into 1-minute buckets and detect anomalies:
    source = otel-v1-apm-span-*
    | stats avg(durationInNanos) as avg_latency by span(startTime, 1m) as minute
    | ml action='train' algorithm='rcf' time_field='minute'
    | where anomaly_grade > 0
    | sort - anomaly_grade
Time-series anomaly detection by service
Run independent anomaly detection per service:
    source = otel-v1-apm-span-*
    | stats avg(durationInNanos) as avg_latency by span(startTime, 1m) as minute, serviceName
    | ml action='train' algorithm='rcf' time_field='minute' category_field='serviceName'
    | where anomaly_grade > 0
Batch outlier detection on request durations
Detect unusually slow spans without considering time ordering:
    source = otel-v1-apm-span-*
    | ml action='train' algorithm='rcf'
    | where anomalous = 'True'
Cluster services by error and latency behavior
Use k-means to group services into behavioral clusters based on error rate and average latency:
    source = otel-v1-apm-span-*
    | stats avg(durationInNanos) as avg_duration, count() as total, sum(case(status.code = 2, 1 else 0)) as errors by serviceName
    | eval error_rate = errors * 100.0 / total
    | ml action='train' algorithm='kmeans' centroids=3
Tune anomaly detection sensitivity
Lower the anomaly_rate and increase shingle_size for stricter detection with more context:
    source = otel-v1-apm-span-*
    | stats avg(durationInNanos) as avg_latency by span(startTime, 1m) as minute
    | ml action='train' algorithm='rcf' time_field='minute' anomaly_rate=0.001 shingle_size=16
    | where anomaly_grade > 0
Extended examples
End-to-end latency anomaly investigation (OTel)
Detect anomalous latency spikes and then find the specific traces responsible:
    source = otel-v1-apm-span-*
    | stats avg(durationInNanos) as avg_latency, max(durationInNanos) as max_latency by span(startTime, 5m) as window, serviceName
    | ml action='train' algorithm='rcf' time_field='window' category_field='serviceName'
    | where anomaly_grade > 0
    | sort - anomaly_grade
    | head 10
After identifying the anomalous time windows, inspect the slowest spans of an affected service (checkout in this example):
    source = otel-v1-apm-span-*
    | where serviceName = 'checkout'
    | sort - durationInNanos
    | head 20
Cluster OTel services by operational profile
Group services by their latency and throughput characteristics to identify operational tiers:
    source = otel-v1-apm-span-*
    | stats avg(durationInNanos) as avg_duration, count() as throughput by serviceName
    | eval avg_duration_ms = avg_duration / 1000000
    | ml action='train' algorithm='kmeans' centroids=3 distance_type=EUCLIDEAN
See also
- stats: aggregate data before feeding to ML algorithms
- eventstats: append aggregation results alongside original events
- trendline: simple and weighted moving averages
- eval: normalize fields before clustering