
A cautionary tale of automation

Joyent, a cloud provider, outlines what caused a major outage:

https://www.joyent.com/blog/postmortem-for-outage-of-us-east-1-may-27-2014

Automation

I’m a huge fan of automation. About 20 years ago, I started scripting common tasks that I performed as part of my day-to-day job. It made my life easier. For instance, at one job, instead of going through the 15 steps of creating a user account by hand, I developed a script that, when kicked off, did everything. What was a 30-minute process became a 5-second process.

Over the years, the tools that we use for automation have grown more powerful and complex. If you’ve been reading my blog, you may have noticed my growing interest in Ansible. With Ansible, I can create a configuration change and deploy it to hundreds or thousands of servers in moments. This is incredibly powerful, and just as dangerous.

If the configuration I deploy across my enterprise is incorrect, I could potentially render every one of my servers useless. With one swift command, I could wipe out the data on every single server we have. As Uncle Ben once told Spider-Man, “with great power comes great responsibility”.

My advice:

  • Run automations against test servers first.
  • Do dry runs first. It’s better to be safe than sorry.
  • Implement your automation in stages: first your TEST servers, then DEV, QA/UAT, DR, and finally PROD. Do checks each step of the way to verify that you’re getting the results that you want.
  • Make backup copies of configuration files. Ansible has this ability, but it is turned off by default. I recommend always turning this feature on with the backup=yes directive on modules that support it (such as copy, template, and lineinfile).
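
To make the staged approach concrete, here is a minimal sketch of how it might be scripted in Python. It is only an illustration under some assumptions: the inventory group names (test, dev, qa, dr, prod), the playbook name site.yml, and the helper names are all hypothetical, and it treats a zero exit code from ansible-playbook as "this stage looks fine", which is a much weaker check than actually verifying the servers. The only Ansible features it relies on are the --limit, --check, and --diff command-line options.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical inventory group names, in the order recommended above.
STAGES = ["test", "dev", "qa", "dr", "prod"]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code (0 means success)."""
    return subprocess.run(list(cmd)).returncode


def staged_rollout(
    playbook: str,
    stages: Sequence[str] = STAGES,
    run: Callable[[Sequence[str]], int] = run_command,
) -> List[str]:
    """Dry-run, then apply, the playbook one stage at a time.

    Stops at the first non-zero exit code and returns the stages
    that were fully applied.
    """
    completed: List[str] = []
    for stage in stages:
        base = ["ansible-playbook", playbook, "--limit", stage]
        # Dry run first: --check makes no changes, --diff shows what would change.
        if run(base + ["--check", "--diff"]) != 0:
            break
        # Real run, only after the dry run for this stage succeeded.
        if run(base) != 0:
            break
        completed.append(stage)
    return completed


if __name__ == "__main__":
    done = staged_rollout("site.yml")  # site.yml is a placeholder playbook name
    print("Stages applied:", ", ".join(done) or "none")
```

The command runner is passed in as a parameter so the rollout logic can be exercised without touching a real server. Keep in mind that a dry run is not a guarantee: some tasks (for example, those using the command or shell modules) are skipped in check mode, so a clean --check is a reason to proceed to the next stage carefully, not a reason to skip verification.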
