AI in Production

All articles filed under AI in Production.

5 Articles

How we turned on-call judgment into skills an AI agent can load

An AI agent in the incident channel can run kubectl and read a dashboard. What it can't do is judge whether the last deploy is the suspect or a red herring. We open-sourced the SRE skills that encode that judgment, runnable offline against fixtures with no credentials.

Louis Fradin · Jun 9, 2026

Top 10 AI SRE Tools in 2026: A Comprehensive Comparison

The 10 best AI SRE tools in 2026, compared by architecture, root cause analysis, remediation, and change awareness, from Anyshift's versioned graph to Resolve AI's autonomous agents.

Roxane Fischer · Mar 13, 2026

Agentic Context Engineering in Production: How AI Agents Build Institutional Expertise

AI agents start every run from scratch. ACE (Agentic Context Engineering) gives them institutional memory that evolves through use, cutting root cause analysis time by 30%.

Ghazi Felhi · Mar 11, 2026

Why AI-SRE Needs Topology, Not Just Telemetry

The limits of telemetry-only AI approaches to SRE and why topology is the missing piece.

Roxane Fischer · Jan 27, 2026

Navigating AI in your Infrastructure: Dos, Don'ts, and Why It Matters

GenAI is everywhere. But the exciting demos often don't hold up the same way in production.

Roxane Fischer · Oct 15, 2024