This is meant to be the smallest possible intro to the things I want my coworkers and mentees to know. It is not comprehensive.

Basics

Normal operation

  1. What is blue/green deployment?
  2. What are nines, and why can’t you have all of them?
  3. Servers talk to each other in many ways.
  4. Why should configuration be code? So you can version control it. Version controlling your config is good because then you know what changed and why. This does not conflict with auto-scaling.
  5. What is auto-scaling, why do you want it, and what are the dangers? It’s when your infrastructure automatically spins up more servers as load (number of users / amount of usage) rises, and shuts them down again as load falls. You want it so that you don’t pay for extra servers when you’re not using them, and so you don’t max out your capacity when you have lots of users (for healthtech, think of the days right before and after a holiday weekend, which are busy, while the holiday weekend itself is quiet). Common dangers include scaling up very far and getting a big bill, or not scaling up fast enough, so users feel that something is slow.
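
To make the "nines" item concrete, here is a small sketch of the arithmetic. It assumes a 365-day year and that components fail independently of each other, which real systems rarely do; the function names are mine, for illustration only.

```python
# Rough availability arithmetic: how much downtime each "nine" allows,
# and what happens when a request depends on several services in a row.
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: ignores leap years


def availability(nines: int) -> float:
    """Availability as a fraction, e.g. 3 nines is roughly 0.999."""
    return 1 - 10 ** -nines


def downtime_minutes_per_year(nines: int) -> float:
    """Minutes per year you are allowed to be down at this many nines."""
    return MINUTES_PER_YEAR * 10 ** -nines


def serial_availability(*parts: float) -> float:
    """Availability of a request that needs every part to be up.

    Assumes independent failures, so the fractions simply multiply.
    """
    result = 1.0
    for part in parts:
        result *= part
    return result


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} nines: {downtime_minutes_per_year(n):.1f} minutes/year")
    # Three services at three nines each give you less than three nines overall.
    print(serial_availability(0.999, 0.999, 0.999))
```

Each extra nine cuts the allowed downtime by a factor of ten, and every dependency you add drags the total down, which is roughly why you can’t have all of them.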

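For the auto-scaling item, a minimal sketch of the kind of decision an auto-scaler makes. This is not any particular cloud provider’s algorithm; the function name, the 60% target, and the min/max limits are all made-up values for illustration. The upper limit is there for the big-bill danger.

```python
def desired_servers(current: int, load_percent: int, target_percent: int = 60,
                    min_servers: int = 2, max_servers: int = 20) -> int:
    """Pick a server count that brings average load back near the target.

    Hypothetical proportional rule: if servers are at 90% and the target
    is 60%, ask for 1.5x as many. The result is clamped so a traffic spike
    (or a bug) cannot scale you into an enormous bill, and so you never
    drop below a safe minimum.
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    # Integer ceiling division, to avoid floating-point rounding surprises.
    wanted = (current * load_percent + target_percent - 1) // target_percent
    return max(min_servers, min(max_servers, wanted))


if __name__ == "__main__":
    print(desired_servers(current=4, load_percent=90))    # busy: scale up
    print(desired_servers(current=4, load_percent=10))    # quiet: scale down
    print(desired_servers(current=10, load_percent=300))  # spike: hits the cap
```

Note the trade-off: the cap protects your bill, but once you hit it users will feel the slowness, so both dangers from the list show up in one line of code.
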
Disaster recovery

  1. Back up everything. If you haven’t tested restoring from your disaster recovery backup, it doesn’t exist.
  2. Geographically distribute key resources (if all your NY employees are in a giant storm and have no power, you want to hand on-call to someone somewhere else who actually has power).
  3. Have backup communications (if Slack is down, use Google Hangouts or cell phones; if Zoom is down, use Google Hangouts). Don’t be like Facebook.
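
As a sketch of what "testing your backup" can mean at the smallest scale: make the backup, restore it somewhere else, and check that what came back matches what went in. This uses only the Python standard library and a zip file as a stand-in for a real backup system; the function names are hypothetical, and a real test would restore a database or a whole service, not a folder.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


def checksums(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def backup(source: Path, archive: Path) -> None:
    """Write every file under source into a zip archive (stand-in backup)."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in sorted(source.rglob("*")):
            if p.is_file():
                zf.write(p, arcname=p.relative_to(source).as_posix())


def restore_matches_source(source: Path, archive: Path) -> bool:
    """Restore into a scratch directory and compare checksums with source."""
    with tempfile.TemporaryDirectory() as scratch:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(scratch)
        return checksums(Path(scratch)) == checksums(source)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        data = Path(work) / "data"
        data.mkdir()
        (data / "patients.csv").write_text("id,name\n1,example\n")
        archive = Path(work) / "backup.zip"
        backup(data, archive)
        print("restore OK:", restore_matches_source(data, archive))
```

The point is the last step: a backup only counts once you have restored from it and compared the result, ideally on a schedule and not only during an incident.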

Other resources

  1. The Google book, but it’s long and 30% bullshit.
  2. Sigje’s and ryn’s book, but it’s 5 years old now. It mostly talks about cultural stuff, though, so it ages better…
  3. The Phoenix Project is the traditional thing to give people, but a novel can be a hard sell, and it doesn’t cover the technical bits at all.
  4. https://github.com/Lets-DevOps/awesome-learning
  5. https://github.com/Tikam02/DevOps-Guide