- Site Reliability Engineering Books
- Chapter 6 - Monitoring Distributed Systems
- Chapter 12 - Effective Troubleshooting
- Video Hotkeys
I wanted to recommend the Site Reliability Engineering books, or just SRE books for short. These books offer lots of great advice, along with useful patterns for sysadmins, devops, and operations folks. The SRE book, along with this workbook, is in my opinion the best material out today on this topic. They are also totally free to read online. It used to be that you had to purchase them, and you still can, but they have been opened up online too, which makes them an awesome resource to check out.
You can view the table of contents here, and there is lots of material on setting up infrastructure, troubleshooting, logging, monitoring, being on-call, and so on. There are a couple of really cool chapters that I wanted to call out though. This is all linked in the episode notes below too.
The first one is Monitoring Distributed Systems, and this applies heavily to running containers in production too, as you just have so many moving parts. There is a pretty cool description of the four golden signals that you likely want to track. They are latency, traffic, errors, and saturation. I’d suggest reading through the chapter, but my main takeaway is that monitoring these key metrics will help you spot potential problems, and will also greatly assist in after-the-fact troubleshooting. We’ll be deploying monitoring shortly to our audio-to-text transcription build project, in episodes #65 and #66, if you’re following along. Should be pretty cool.
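As a rough illustration of three of those four signals, here is a minimal sketch that summarizes traffic, errors, and average latency from request records. The input format (one request per line, as `<status_code> <latency_ms>`) and the function name `golden_signals` are assumptions made up for this example; they do not come from the book or from any particular monitoring tool. Saturation is left out because it usually comes from resource metrics (CPU, memory, queue depth) rather than from request logs.

```shell
# Hypothetical helper: reads lines of "<status_code> <latency_ms>" on stdin
# and prints a one-line summary. The log format is an assumption.
golden_signals() {
  awk '
    { total++; latency += $2 }   # traffic: count every request, sum latency
    $1 >= 500 { errors++ }       # errors: count server-side failures (5xx)
    END {
      if (total == 0) { print "no requests"; exit }
      printf "traffic=%d errors=%d avg_latency_ms=%.1f\n", total, errors, latency / total
    }
  '
}
```

Usage: pipe request records into it, for example `printf '200 10\n500 30\n200 20\n' | golden_signals`. A real setup would normally export these values to a monitoring system rather than compute them ad hoc.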
The second chapter that I wanted to call out is on troubleshooting. This chapter gives you a pretty good pattern to follow when troubleshooting problems. It covers the theory, what logging you likely want to have in place, and the metrics you probably want to keep an eye on. It also walks you through an example case study. I find myself constantly checking these books out, and recommending chapters to people when they ask questions and I know a good resource to point them at. So, I cannot recommend checking these out enough. Especially since they are online for free, you cannot go wrong.
Next, I wanted to quickly let you know that I have added video playback hotkeys to the website. If you check out my about page, down here at the bottom, I have added a couple of notes about which video playback hotkeys are enabled. For example, you can use the arrow keys to seek the video forwards and backwards a few seconds. You can also use the number keys to jump ahead. So, this might help you out when you’re watching videos on my site. I wanted to thank the subscriber who suggested this feature too. These types of suggestions help improve the site for everyone! Also, if you have any suggestions on how I can improve the site, please let me know. I am more than happy to have your feedback!
Finally, the last quick tip I wanted to share is around using the command line. I often find myself wanting to rerun a command from weeks or months ago. Things like a command I used for building a docker container, or maybe some for loop that I was using to do something. So, this isn’t the prettiest thing out there, but I often use the history command, pipe the results through grep with the -i option for case-insensitive matching, and then type what I am looking for. Using this, you can quickly get a dump of the commands you were working with long ago.
history | grep -i docker
So, you can see a bunch of docker commands that I had run recently, but I’m doing this all the time for all types of stuff, going back many months. I also set my command history to a really huge buffer so I can track years’ worth of commands. This is sort of like my extended memory. Thought you might find this handy too. By the way, you can use Ctrl+R to search too, but I find that quite limiting, as it only shows you a single result at a time. I always want more than that to get a sense of what I’m working with.
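The episode does not show how the history buffer was enlarged, so the following is only a sketch of one way to do it, assuming bash and a `~/.bashrc` file. The sizes are illustrative values, not the author's settings, and the `hgrep` helper name is hypothetical; it simply wraps the `history | grep -i` pipeline shown above.

```shell
# Assumed to live in ~/.bashrc. The sizes below are illustrative only.
HISTSIZE=1000000        # commands kept in memory for the current session
HISTFILESIZE=1000000    # lines kept in the history file on disk

# Append to the history file on exit instead of overwriting it (bash only).
if [ -n "$BASH_VERSION" ]; then
  shopt -s histappend
fi

# Hypothetical helper: case-insensitive search of the command history.
hgrep() {
  history | grep -i -- "$1"
}
```

With this in place, `hgrep docker` in an interactive bash session behaves like `history | grep -i docker`.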
Alright, that’s it for this episode, thanks for watching. Bye.