Photography by Roger Brown Photography on Shutterstock

Prepping

Created on 2021-03-12 21:41

Published on 2021-03-17 17:17

"Anything that can go wrong will go wrong" (Murphy's Law)

"Preppers" are persons that spend a lot of time preparing for an imminent disaster. In case of doomsday, they might survive it as they planned minutely for any possible outcome long before. Preppers are stocking equipment, clothes, food and often create shelters so that they could face even the most atrocious situations with relative calm. Prepping puts some financial strain on people but not as much one might think as preppers' most important assets are knowledge and improvisation. Regardless how elaborated the disaster recovery plan is, it's never perfect so creativity should always be welcomed.

Preppers use some techniques that IT should learn from:

Rehearsal - preppers really exercise their plan. They try surviving wilderness, with their bushcraft skills, thus testing their theories and equipment. In IT from time to time it's worthy simulating a nasty situation in the infrastructure. It doesn't need to be a complicated one that would imply the whole SimianArmy but at least simulate a disaster, solve it and take notes. Ask colleagues to stop a service or make a ethernet loop in a switch. Learn where to look for clues and when the incident is solved try to automate the solution. There are organizations that take rehearsals very seriously. I've been part of probably three fire simulation drills at Adastral Park in less than a month but I can say that It was the only occasion where I've seen so much calm and organization as people knew their drill.
Share information - preppers are sometime organized in clubs and they share lots of relevant information on materials and techniques. Community driven DRP might be interesting as people come with different viewpoints that might reveal flaws or provide better solutions. Information sharing might be the most important source of learning.
Use cheap and easy to find materials. For example many things can be simulated on scaled down versions of the real systems. A full K8S/EKS cluster can be simulated with a petty RaspberryPi running K3S. Most of the things will be the same but costs will be negligible. The differences can be documented and there are also lots of third parties that can simulate other services cheaply (Lambdas, Object Storage). In some situation one will have to improvise massively to recover from a disaster and then knowing what to do with almost off-the-shelve components becomes invaluable.
Invest in understandable systems - preppers try to understand nature and mechanisms so that they increase their survival odds. Although automation is a must it should be transparent to the engineers and clearly present what's behind the scenes. If the state is extremely bad and automation magic stopped working then understanding the system might be handy at least to go back in a restorable state with minimum effort. I for one was in a similar situation when I discovered that the backup I made had to be restored on a disk with different physical characteristics. Knowing what was the backup process I was able to hack the restore on the new drive although the tool was not supporting it.
Keep stashes - preppers often have hidden stashes of food and tools around the house. Engineers should also make stashes of resources if possible offsite or in different clouds. If something goes horribly wrong at least some things could be partially running. An OCI solution, replicated on a different vendor's platform, would make a lot of sense so if a provider is unavailable restoring the solution on another would not mean an entire rewrite but rather a reconfiguration. Vendor agnostic IAC solutions and CNI packed services are handy in this case - they can be easily run on different environments.
Have spare batteries - preppers don' t rely on a single centralised solution – e.g. electricity; they always have alternate independent energy sources that they can use - batteries, generators, etc. In the light of the recent OVH fire let's imagine the following scenario: the company's automation solution (Chef, Puppet, Octopus) is also affected somehow then the situation will be similar to electricity or transport failure. On the other hand, having the IAC implemented through standalone tools (Terraform, Ansible) increase the resilience as there is no longer a single point of failure, any engineer could replicate the infrastructure from his/her own machine as long as one can connect to the datacenter/cloud.
Learn how to use and build tools - preppers spend time in learning how to build bows, arrows, tents and also how to use them. Off the shelf tools are fine (backup/restore, IAC, etc) but sometime one needs to do specific operations (e.g. recover data from an ZFS formatted drive). Then building small and specific tools would be invaluable so a programming language that permits rapid development is handy - currently I see that people are looking towards Python and Go.

What I am trying to say is that pessimism and preparation are key for survivability in case of a disaster. Even a digital one.

Nilu's Thinkyard

Wednesday, March 17, 2021

Prepping

Prepping

No comments:

Post a Comment