SELF-LEARNING AGENT

Gets smarter with every investigation.

The agent learns correct queries, correlations, and resolution paths from past incidents. The next time, it skips the dead ends.

HOW IT WORKS

Watch the agent learn in real time.

The same alert fires twice. The first time, the agent explores and makes mistakes. The second time, it remembers and goes straight to the answer.

1. Query accuracy improvement

On the first encounter, the agent may query the wrong metric name, get an error, and retry with the corrected name. It stores this correction so that next time it queries the right metric on the first try.

Zero repeat errors: Once a metric name is corrected, the agent never gets it wrong again.
Faster first response: Correct queries on the first try mean faster data retrieval and quicker root cause identification.

First encounter
get_metrics("latency.p95") ✗ not found
retry
get_metrics("trace.duration.p95") ✓ found
★ saved correction to memory
Next encounter
🧠 get_metrics("trace.duration.p95") ✓ first try
← recalled correct name from memory
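
The first-encounter and next-encounter traces above can be sketched as a thin wrapper around a metrics lookup. This is a minimal illustration, not the product's implementation: `CorrectingMetricsClient`, `MetricNotFound`, `fetch`, and `suggest` are hypothetical names, and the backend is a toy dictionary.

```python
class MetricNotFound(Exception):
    """Raised by the hypothetical backend when a metric name is unknown."""


class CorrectingMetricsClient:
    """Wraps a metrics lookup and remembers name corrections (illustrative only)."""

    def __init__(self, fetch, suggest):
        self._fetch = fetch        # callable: name -> data, raises MetricNotFound
        self._suggest = suggest    # callable: wrong name -> candidate correct name
        self.corrections = {}      # memory: wrong name -> corrected name
        self.calls = []            # every name actually sent to the backend

    def get_metrics(self, name):
        # Apply a remembered correction before touching the backend.
        resolved = self.corrections.get(name, name)
        self.calls.append(resolved)
        try:
            return self._fetch(resolved)
        except MetricNotFound:
            candidate = self._suggest(resolved)
            self.calls.append(candidate)
            data = self._fetch(candidate)       # may raise again; no further retry
            self.corrections[name] = candidate  # save only after the retry succeeds
            return data


# Toy backend standing in for a real metrics store (assumption).
BACKEND = {"trace.duration.p95": [120, 135, 410]}


def fetch(name):
    if name not in BACKEND:
        raise MetricNotFound(name)
    return BACKEND[name]


def suggest(name):
    # Hypothetical resolver; a real agent might search a metric catalog instead.
    return {"latency.p95": "trace.duration.p95"}.get(name, name)


client = CorrectingMetricsClient(fetch, suggest)
client.get_metrics("latency.p95")  # first encounter: fails, retries, saves
client.get_metrics("latency.p95")  # next encounter: corrected name on first try
```

In this sketch the correction is written to memory only after the retried query succeeds, so a bad guess is never stored.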
2. Correlation learning

When investigating root cause, the agent explores multiple hypotheses. After finding that a p95 spike was caused by pod memory pressure, it stores this correlation. Next time, it skips the dead ends entirely.

Skip dead ends: Hypotheses that led nowhere are deprioritized – the agent won't waste time checking deploys if they've never been the cause.
Pattern recognition: Symptom-to-root-cause links grow stronger with each investigation, building a permanent knowledge base.

First encounter
check recent deploys ✗ none
check config changes ✗ none
check pod metrics ✓ OOM 94%
★ saved: p95 spike → pod memory
Next encounter
check deploys skipped
check config skipped
🧠 get_pod_metrics() ✓ confirmed
← recalled correlation from memory
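
One way to picture the correlation recall above is a memory that counts which check confirmed the root cause for a given symptom, then tries confirmed checks first. The sketch below is an assumption about how such a memory could work; `CorrelationMemory`, `investigate`, and the check names are hypothetical.

```python
from collections import defaultdict


class CorrelationMemory:
    """Counts which check confirmed the cause for each symptom (illustrative)."""

    def __init__(self):
        # symptom -> check name -> number of confirmations
        self._counts = defaultdict(lambda: defaultdict(int))

    def record(self, symptom, check):
        self._counts[symptom][check] += 1

    def plan(self, symptom, default_checks):
        """Order checks so previously confirmed ones come first."""
        seen = self._counts.get(symptom, {})
        # sorted() is stable, so checks with equal counts keep their default order.
        return sorted(default_checks, key=lambda check: -seen.get(check, 0))


def investigate(symptom, checks, memory, default_order):
    """Run checks in planned order and stop at the first confirmed cause."""
    tried = []
    for name in memory.plan(symptom, default_order):
        tried.append(name)
        if checks[name]():
            memory.record(symptom, name)
            break
    return tried


# Toy checks: only pod metrics reveal the cause in this scenario.
checks = {
    "recent deploys": lambda: False,
    "config changes": lambda: False,
    "pod metrics": lambda: True,
}
order = ["recent deploys", "config changes", "pod metrics"]

memory = CorrelationMemory()
first = investigate("p95 spike", checks, memory, order)   # explores all three
second = investigate("p95 spike", checks, memory, order)  # starts with pod metrics
```

Here the deploy and config checks are not deleted; they are skipped only because the recalled check confirms the cause first. If it did not, the loop would fall back to the remaining checks.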
GROWING KNOWLEDGE

What the agent learns.

Every investigation adds to a permanent, compounding knowledge base specific to your infrastructure.

Metric name corrections

Wrong metric names are mapped to correct ones, eliminating repeat query failures.

"latency.p95" → "trace.duration.p95"

Symptom → root cause links

The agent stores which symptoms map to which root causes, so it can skip exploratory checks the next time the symptom appears.

"p95 spike" → "pod memory pressure"

Effective tool sequences

The agent records which tools, in which order, lead to the fastest resolution for a given alert type.

OOM → pod_metrics → container_logs → hpa_status
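
A sequence like the one above could be kept as "the fastest successful tool order seen so far" per alert type. The following is a rough sketch under that assumption; `SequenceMemory`, the extra tool names `deploy_history` and `config_diff`, and the timings are all illustrative rather than taken from the product.

```python
class SequenceMemory:
    """Keeps the fastest successful tool sequence per alert type (illustrative)."""

    def __init__(self):
        self._best = {}  # alert type -> (seconds, tuple of tool names)

    def record(self, alert_type, tools, seconds):
        current = self._best.get(alert_type)
        # Replace the stored sequence only when the new one resolved faster.
        if current is None or seconds < current[0]:
            self._best[alert_type] = (seconds, tuple(tools))

    def recall(self, alert_type):
        entry = self._best.get(alert_type)
        return list(entry[1]) if entry else None


sequences = SequenceMemory()
# Illustrative runs: a slow exploratory pass, then a direct one.
sequences.record(
    "OOM",
    ["deploy_history", "config_diff", "pod_metrics", "container_logs", "hpa_status"],
    seconds=34.0,
)
sequences.record("OOM", ["pod_metrics", "container_logs", "hpa_status"], seconds=12.0)

best = sequences.recall("OOM")
```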

Dead-end paths

Hypotheses that never lead to root cause are deprioritized permanently.

check deploys – 0/4 investigations
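
A tally such as "0/4 investigations" suggests a per-check hit rate. As a hedged sketch, the tracker below moves a check to the back of the queue once it has been tried a minimum number of times without ever finding the cause. `DeadEndTracker`, `CheckStats`, and the `min_tries=4` threshold are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class CheckStats:
    hits: int = 0   # investigations where this check found the root cause
    tries: int = 0  # investigations where this check was run

    @property
    def label(self):
        return f"{self.hits}/{self.tries} investigations"


class DeadEndTracker:
    """Deprioritizes checks that have never found a root cause (illustrative)."""

    def __init__(self, min_tries=4):
        self.min_tries = min_tries  # assumed threshold before a check is demoted
        self.stats = {}

    def record(self, check, found_cause):
        stats = self.stats.setdefault(check, CheckStats())
        stats.tries += 1
        stats.hits += int(found_cause)

    def is_dead_end(self, check):
        stats = self.stats.get(check)
        return stats is not None and stats.tries >= self.min_tries and stats.hits == 0

    def order(self, checks):
        # Stable sort on a boolean key: dead ends move to the back,
        # everything else keeps its original order.
        return sorted(checks, key=self.is_dead_end)


tracker = DeadEndTracker()
for _ in range(4):
    tracker.record("check deploys", found_cause=False)
    tracker.record("check pod metrics", found_cause=True)

ordered = tracker.order(["check deploys", "check pod metrics"])
```

Demoting rather than deleting keeps the check available if the usual cause is ruled out, which is a design choice of this sketch and not something the page specifies.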
THE RESULT

Compounding speed.

Every investigation makes the next one faster. The improvements are cumulative and permanent.

Metric               Investigation #1   Investigation #10   Change
Tool calls           6                  3                   ↓ 50%
Errors               1                  0                   ↓ 100%
Resolution time      34s                12s                 ↓ 65%
Dead ends explored   2                  0                   ↓ 100%
Patterns in memory   0                  12                  ↑ growing

65% shorter resolution time
50% fewer tool calls
100% patterns retained
SELF-LEARNING AI AGENTS

See the self-learning agent in action.

Connect your stack and watch the agent get smarter with every investigation.