Insights

Articles on reliability, leadership, and platform engineering. Lessons from the field and late-night debugging sessions.

Interactive Demos

Hands-On Learning

Interactive simulations to practice incident response and observability skills.

SRE Dashboard

Experience a real 2AM incident response — golden signals, distributed traces, log correlation.

P1 Incident Simulator

Black Friday checkout outage scenario. Make decisions, get scored on your SRE skills.

SRE Learning Hub

Interactive modules on observability and resilience. Percentiles, SLOs, chaos engineering, and more.

Core Insights

5/20/2026 · 20 min read

The context tax, and how to stop paying it

Every AI feature you ship runs a meter you can't see on the dashboard. It shows up as a token bill that climbs faster than usage and answers that quietly get worse. The reflex is to blame the model and swap in a cheaper one. That fixes nothing. The charge comes from the harness you built around it, collected on every turn. Here's where it comes from and how to stop feeding it.

5/13/2026 · 17 min read

Every new hire goes through a scavenger hunt. I found an alternate way and won at it.

Most engineering onboarding fails not because the documentation is bad but because documentation was never the right tool for the job. Here's the architecture that retrieves the connected trail of decisions instead.

5/11/2026 · 23 min read

We have 400 dashboards and still don't know if we're healthy

Somewhere between 'we have no visibility' and 'we have too much data to make sense of,' something went wrong. This is about what that something is, and how the signals your systems emit can either answer the health question or make it impossible to answer and drive you bonkers on a 2AM page.

5/3/2026 · 14 min read

Burning tokens or building outcomes?

Companies are buying AI tools for coding, operations, support, and data analysis. The real question is whether the token spend is turning into better work, measurable outcomes, and OKRs the business can defend.

4/3/2026 · 17 min read

Building AI you can actually trust

Guardrails, secure code, environment segregation, and the review patterns we built for a healthcare AI platform under HIPAA and SOC 2 scrutiny. No fluff. Just what we actually shipped.

3/25/2026 · 19 min read

How AI Actually Works: Claude, ChatGPT and LLMs Explained Simply

No jargon, no hype. This guide explains how ChatGPT and Claude actually work under the hood: how they read your question, find context, call tools, remember things, and write an answer. Covers AI safety, guardrails, real risks, and why the doomsday headlines miss the point. Written for anyone, not just engineers.

3/21/2026 · 41 min read

Agentic AI for the enterprise

The term AI agent gets used for everything from a chatbot with one tool call to a system autonomously managing production infrastructure across a dozen APIs. That gap matters. This is a practitioner breakdown of what an agent actually is, how to build one properly, and what it looks like when you deploy the pattern in healthcare IT and banking operations today.

3/13/2026 · 18 min read

Claude skills, MCP, and knowing which one to use.

I use Claude constantly. At some point the repeated instructions piling up in every chat become a productivity tax. Skills handle how Claude thinks and executes. MCP handles what Claude can see and touch. Get that separation right from the start and everything else follows cleanly.

3/7/2026 · 14 min read

When the Grid Goes Dark

If crew dispatch is this chaotic at a small contracting company, what does it look like for a utility during a Midwest ice storm? The fix is just good engineering — the kind that architects think about before anyone writes a line of code.

2/27/2026 · 18 min read

Observe for Observability

Most observability platforms give you logs, metrics, and traces in separate tabs and call it a single pane of glass. Observe actually connects them. A practitioner breakdown of how Observe handles the three pillars, what the knowledge graph model means in practice, and how it compares to New Relic.
