Please read the first part of this series and then come back. :)
Fail #1: “Not Defining the Solution”
One of the core principles of chaos engineering is to run experiments with a clear hypothesis. It sounds simple, but I think the most difficult part is deciding what you want to improve and test again.
I launched a chaos engineering experiment to test the resilience of my network under delays. I started the experiment without understanding the delay problem well.
I had no baseline for success or failure. I had no clue when the latency injection should happen, how long it should last, or what the logs might show. The run only reported that it had completed; that was the only response I got.
My experiment -> pods on different nodes with delay -> pods with affinity and how they work.
My major takeaway here is that defining the scenario in my environment before initiating an experiment is important for checking whether everything is working well. Otherwise, it’s easy to get lost in the noise of unanticipated issues that arise during the test.
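As a minimal sketch of what "defining the scenario first" could look like, the hypothesis can be written down as data before anything is injected. Everything here is an assumption for illustration: the `LatencyHypothesis` class, its field names, and all the numbers are hypothetical and are not taken from my cluster or from any chaos tool's API.

```python
from dataclasses import dataclass
from statistics import quantiles


def p95(samples: list[float]) -> float:
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    # method="inclusive" keeps the result inside the observed range.
    return quantiles(samples, n=20, method="inclusive")[-1]


@dataclass
class LatencyHypothesis:
    """Hypothetical record of what the experiment is expected to show."""
    baseline_p95_ms: float    # measured before any fault is injected
    injected_delay_ms: float  # delay added between the pods during the run
    tolerance_ms: float       # extra slack that still counts as success

    def max_allowed_p95_ms(self) -> float:
        # Success threshold: baseline plus the injected delay plus the slack.
        return self.baseline_p95_ms + self.injected_delay_ms + self.tolerance_ms

    def holds(self, observed_ms: list[float]) -> bool:
        # The hypothesis holds if the observed p95 stays under the threshold.
        return p95(observed_ms) <= self.max_allowed_p95_ms()


# Illustrative numbers only (assumed, not measured).
hypothesis = LatencyHypothesis(baseline_p95_ms=20, injected_delay_ms=100, tolerance_ms=30)
observed = [110, 115, 120, 118, 125, 130, 112, 119, 121, 117]
print("threshold (ms):", hypothesis.max_allowed_p95_ms())
print("hypothesis holds:", hypothesis.holds(observed))
```

With something like this written down before the run, "the run completed" is no longer the only answer: the result is either inside the threshold or it is not.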
In conclusion, I made basic mistakes in setting up the cluster, but each failure has been a lesson learned. I will keep doing it. It’s important to take a break. So this is my break.
Learned List Links:
- pods affinity
- debugging dns