Facebook’s little mishap this week prompted the joke that some poor network engineer might be getting fired for mishandling a configuration file.
Honestly, I hope not.
It brought to mind all the times I’ve seen companies respond with punitive measures when things go wrong, and every single time it was the wrong decision.
Why is that?
A massive outage is rarely the fault or responsibility of a single engineer. Systems at this scale are built by many different people over long periods of time, and they should have a multitude of checks, automated safeguards, redundancies, and failovers, all meant to keep exactly this kind of failure from happening.
If a lone engineer can become a single point of failure for such a system, the flaw was already inherent in the system.
Punishing an engineer or team for a systemic or process flaw misses the point. Instead, the company should produce an honest accounting of how the failure occurred and make sure it never happens again.
Punishing the engineers deemed responsible only encourages a culture of fear and CYA. It inevitably leads to worse technical organizations.