Events vs. Systems

“Humans tend to read things as events rather than an outcome of a system that can be predicted.”

Thinking in Systems by Donella Meadows

When software systems fail, we tend to attribute the cause to a single event or a series of events, and we often miss the opportunity to understand the system dependencies that led to the event.

For example, a database outage is caused by a misconfiguration or a slow-running query, and the team quickly fixes the issue. Or one of our customers has an unexpected spike in traffic that affects the entire system and all customers, and the team quickly adds capacity to the infrastructure so that it won’t happen again. To be clear, the first priority is to restore service and create the time and space to assess further.

However, these issues occur as the result of a series of interconnected systems, and if you don’t evaluate the entire system that put you on the path to the issue, you are setting yourself up for more failures in the future.

If we revisit the traffic spike as an example, we might find that the customer who experienced the spike already knew spikes were possible because of their own customers’ cyclical buying habits. They may not be our ideal customer, but our sales team closed the sale with no validation from a sales engineer. We planted the seeds of today’s failure back in our sales process.

When you understand that every part of your company is a system, and that every outcome results from the interactions between these systems rather than from any single one, you can unlock a profound understanding of why things are failing.

  • You sell your product to the wrong customer
  • That customer demands more modifications to the legal agreement
  • During implementation, the customer demands that new features be built to address a use case that you had not planned to address
  • Your engineering team gets pulled into the firefight to deliver the new feature
  • Planned work gets delayed
  • Your customers become frustrated with the lack of progress on these committed roadmap items and leave
  • Your sales team now has a bigger hurdle to overcome
  • And on it goes

The next time you have a failure in your business, resist the temptation to view it as a singular event and troubleshoot only that event. Instead, take the extra time to understand what might have happened in other parts of the business that put you in the position where the event could occur. Chances are you’ll find you could’ve seen it coming.