Production Debugging

All articles filed under Production Debugging.

5 Articles

How to Trace a Production Incident Back to the Commit

Burned 25 minutes on a Friday-morning page before I realized the responsible commit was in another team's repo. This is the four-command sequence I now run when an alert lands and `git log` on my own service comes up empty, with the outputs at each step and where the search space gets cut.

Louis Fradin · May 15, 2026

My Workers Stopped Polling: a K8s + Temporal Whodunit

Temporal workflows sit stuck in Running with zero pollers, yet Temporal still reports a healthy task queue. The root cause lives one layer down: a CrashLoopBackOff in the Kubernetes worker pod, caused by a single bad environment variable. A walkthrough of debugging Temporal workers on Kubernetes the manual way (10 minutes), then with an infrastructure context layer that bridges the two systems (seconds).

Louis Fradin · Apr 8, 2026

Common Weak Points in Infrastructure Management: An In-Depth Guide

Managing infrastructure at scale is a complex endeavor that demands meticulous planning, robust tooling, and continuous adaptation.

Roxane Fischer · Sep 19, 2024

5 Key Reasons You're Struggling to Debug Your Infrastructure in Under an Hour

Most infrastructure debugging sessions blow past the one-hour mark for the same five structural reasons: scattered visibility across cloud accounts, missing historical state, `terraform plan` output that hides downstream impact, runbooks that lag the live infrastructure, and post-merger environments that no one has fully mapped. A walkthrough of each, with concrete examples and what reduces the time.

Roxane Fischer · Jul 30, 2024

Top 3 Weak Points in Your Infrastructure and How to Mitigate Them

Three structural patterns recur in growing infrastructure orgs: single-repo bottlenecks where dozens of teams share one approval queue, ClickOps and dead IaC that drift outside any state file, and module version fragmentation that quietly bypasses security patches. A walkthrough of each, with the practices that contain the blast radius.

Roxane Fischer · Jul 30, 2024
