SELF-LEARNING AGENT

Gets smarter with every investigation.

The agent learns correct queries, correlations, and resolution paths from past incidents. The next time, it skips the dead ends.

HOW IT WORKS

Watch the agent learn in real time.

The same alert fires twice. The first time, the agent explores and makes mistakes. The second time, it remembers and goes straight to the answer.

Query accuracy improvement

On the first encounter, the agent may query the wrong metric name, get an error, and retry with the corrected name. It stores this correction so that next time it queries the right metric on the first try.

Zero repeat errors

Once a metric name is corrected, the agent never gets it wrong again.

Faster first response

Correct queries on the first try mean faster data retrieval and quicker root cause identification.

First encounter

▶ get_metrics("latency.p95") ✗ not found

retry

▶ get_metrics("trace.duration.p95") ✓ found

★ saved correction to memory

Next encounter

🧠 get_metrics("trace.duration.p95") ✓ first try

← recalled correct name from memory
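
The correction loop above can be sketched in a few lines of Python. This is a minimal illustration, not the agent's actual implementation: `CorrectionMemory`, `query_with_memory`, the `MetricNotFound` error, the list of candidate names, and the in-memory `fake_get_metrics` backend are all assumptions made for the example. Only the two metric names come from the trace.

```python
from typing import Callable, Dict, List


class MetricNotFound(Exception):
    """Raised by the (hypothetical) metrics backend for an unknown name."""


class CorrectionMemory:
    """Remembers which failed metric names map to which working ones."""

    def __init__(self) -> None:
        self._corrections: Dict[str, str] = {}

    def resolve(self, name: str) -> str:
        # Return the remembered correction, or the name unchanged.
        return self._corrections.get(name, name)

    def save(self, wrong: str, right: str) -> None:
        self._corrections[wrong] = right


def query_with_memory(
    name: str,
    candidates: List[str],
    get_metrics: Callable[[str], List[float]],
    memory: CorrectionMemory,
) -> List[str]:
    """Query a metric, retrying candidates on failure.

    Returns the list of names that were tried, in order.
    """
    resolved = memory.resolve(name)
    tried: List[str] = []
    for attempt in [resolved] + [c for c in candidates if c != resolved]:
        tried.append(attempt)
        try:
            get_metrics(attempt)
        except MetricNotFound:
            continue
        if attempt != name:
            memory.save(name, attempt)  # store the correction
        return tried
    raise MetricNotFound(name)


# Hypothetical backend that only knows the correct metric name.
KNOWN = {"trace.duration.p95": [120.0, 480.0, 950.0]}


def fake_get_metrics(name: str) -> List[float]:
    if name not in KNOWN:
        raise MetricNotFound(name)
    return KNOWN[name]


memory = CorrectionMemory()
first = query_with_memory(
    "latency.p95", ["trace.duration.p95"], fake_get_metrics, memory
)
second = query_with_memory(
    "latency.p95", ["trace.duration.p95"], fake_get_metrics, memory
)
print(first)   # two attempts: the wrong name, then the corrected one
print(second)  # one attempt: the corrected name, recalled from memory
```

In this sketch the second call goes straight to the remembered name, which mirrors the "first try" step in the trace.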

Correlation learning

When investigating root cause, the agent explores multiple hypotheses. After finding that a p95 spike was caused by pod memory pressure, it stores this correlation. Next time, it skips the dead ends entirely.

Skip dead ends

Hypotheses that led nowhere are deprioritized – the agent won't waste time checking deploys if they've never been the cause.

Pattern recognition

Symptom-to-root-cause links grow stronger with each investigation, building a permanent knowledge base.

First encounter

▶ check recent deploys ✗ none

▶ check config changes ✗ none

▶ check pod metrics ✓ OOM 94%

★ saved: p95 spike → pod memory

Next encounter

check deploys: skipped

check config: skipped

🧠 get_pod_metrics() ✓ confirmed

← recalled correlation from memory
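
One way to picture correlation learning is a counter of confirmed symptom-to-cause links that reorders the checks for the next investigation. The sketch below is illustrative only: `CorrelationMemory`, `plan_checks`, and the check labels are hypothetical names. It also moves unconfirmed checks to the back of the queue rather than dropping them outright, which is a simplification of the skipping shown in the trace.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical default order in which hypotheses are checked.
DEFAULT_CHECKS = ["deploy", "config change", "pod memory"]


class CorrelationMemory:
    """Counts how often each root cause was confirmed for a symptom."""

    def __init__(self) -> None:
        self._links: Dict[str, Counter] = {}

    def record(self, symptom: str, root_cause: str) -> None:
        self._links.setdefault(symptom, Counter())[root_cause] += 1

    def ranked_causes(self, symptom: str) -> List[str]:
        # Most frequently confirmed causes first.
        counts = self._links.get(symptom, Counter())
        return [cause for cause, _ in counts.most_common()]


def plan_checks(symptom: str, memory: CorrelationMemory) -> List[str]:
    """Put remembered causes first, then the remaining default checks."""
    remembered = memory.ranked_causes(symptom)
    return remembered + [c for c in DEFAULT_CHECKS if c not in remembered]


memory = CorrelationMemory()
before = plan_checks("p95 spike", memory)

# After the first investigation confirms the cause, store the link.
memory.record("p95 spike", "pod memory")
after = plan_checks("p95 spike", memory)

print(before)  # default order: nothing learned yet
print(after)   # "pod memory" is now checked first
```

If the first remembered check confirms the cause, the remaining checks never run, which is how the dead ends end up skipped in practice.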

GROWING KNOWLEDGE

What the agent learns.

Every investigation adds to a permanent, compounding knowledge base specific to your infrastructure.

Metric name corrections

Wrong metric names are mapped to correct ones, eliminating repeat query failures.

"latency.p95" → "trace.duration.p95"

Symptom → root cause links

The agent stores which symptoms map to which root causes, so it can skip exploration the next time the same symptom appears.

"p95 spike" → "pod memory pressure"

Effective tool sequences

The agent records which tools, in which order, lead to the fastest resolution for a given alert type.

OOM → pod_metrics → container_logs → hpa_status
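
A minimal sketch of how such sequences might be stored, assuming "fastest" is approximated by the fewest tool calls. `SequenceMemory` and the `recent_deploys` and `config_changes` tool names are hypothetical; the three-step OOM sequence is the one shown above.

```python
from typing import Dict, List, Optional, Tuple


class SequenceMemory:
    """Keeps the shortest successful tool sequence per alert type."""

    def __init__(self) -> None:
        self._best: Dict[str, Tuple[str, ...]] = {}

    def record(self, alert_type: str, tools: List[str]) -> None:
        # Replace the stored sequence only if the new one is shorter.
        current = self._best.get(alert_type)
        if current is None or len(tools) < len(current):
            self._best[alert_type] = tuple(tools)

    def recall(self, alert_type: str) -> Optional[Tuple[str, ...]]:
        return self._best.get(alert_type)


sequences = SequenceMemory()

# First investigation: two dead ends before the useful tools.
sequences.record(
    "OOM",
    ["recent_deploys", "config_changes", "pod_metrics",
     "container_logs", "hpa_status"],
)
# A later investigation resolves the same alert type in three calls.
sequences.record("OOM", ["pod_metrics", "container_logs", "hpa_status"])

best = sequences.recall("OOM")
print(" → ".join(best))
```

A real system would more likely rank sequences by measured resolution time. Counting calls keeps the example short.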

Dead-end paths

Hypotheses that never lead to root cause are deprioritized permanently.

check deploys – 0/4 investigations
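
The "0/4 investigations" tally suggests a simple per-hypothesis counter. The sketch below assumes a hypothesis is treated as a dead end once it has been checked at least `min_checks` times without ever being the cause. The threshold of 4 is chosen only to match the example above, and `DeadEndTracker` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class HypothesisStats:
    checked: int = 0    # how many investigations tested this hypothesis
    was_cause: int = 0  # how many times it turned out to be the root cause


class DeadEndTracker:
    def __init__(self, min_checks: int = 4) -> None:
        # Assumed threshold: evidence needed before deprioritizing.
        self.min_checks = min_checks
        self._stats: Dict[str, HypothesisStats] = {}

    def record(self, hypothesis: str, was_cause: bool) -> None:
        stats = self._stats.setdefault(hypothesis, HypothesisStats())
        stats.checked += 1
        stats.was_cause += int(was_cause)

    def is_dead_end(self, hypothesis: str) -> bool:
        stats = self._stats.get(hypothesis)
        return (
            stats is not None
            and stats.checked >= self.min_checks
            and stats.was_cause == 0
        )

    def ratio(self, hypothesis: str) -> str:
        stats = self._stats.get(hypothesis, HypothesisStats())
        return f"{stats.was_cause}/{stats.checked}"


tracker = DeadEndTracker()
for _ in range(3):
    tracker.record("check deploys", was_cause=False)
after_three = tracker.is_dead_end("check deploys")  # not enough evidence yet

tracker.record("check deploys", was_cause=False)
after_four = tracker.is_dead_end("check deploys")

print(tracker.ratio("check deploys"), after_three, after_four)
```

Requiring a minimum number of checks keeps a single unlucky investigation from ruling out a hypothesis too early.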

THE RESULT

Compounding speed.

Every investigation makes the next one faster. The improvements are cumulative and permanent.

Metric               Investigation #1   Investigation #10   Change
Tool calls           6                  3                   ↓ 50%
Errors               1                  0                   ↓ 100%
Resolution time      34s                12s                 ↓ 65%
Dead ends explored   2                  0                   ↓ 100%
Patterns in memory   0                  12                  ↑ growing

65% faster resolution

50% fewer tool calls

100% patterns retained
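
The percentages follow directly from the before and after figures. As a quick check, the helper below (a hypothetical `percent_change` function) recomputes them and rounds to the nearest whole percent:

```python
def percent_change(before: float, after: float) -> int:
    """Relative change from before to after, as a rounded percentage."""
    return round((after - before) / before * 100)


tool_calls = percent_change(6, 3)
errors = percent_change(1, 0)
resolution_time = percent_change(34, 12)  # 34s to 12s is about -64.7%
dead_ends = percent_change(2, 0)

print(tool_calls, errors, resolution_time, dead_ends)
```

Patterns in memory start at 0, so a percentage change is undefined for that row, which is why it is reported as "growing" rather than as a number.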

SELF-LEARNING AI AGENTS

See the self-learning agent in action.

Connect your stack and watch the agent get smarter with every investigation.