DevOps and SRE are less a specific skill set than a cultural toolkit. These “tools” are things like values (automation, reliability, repeatability, data-driven decision making, etc.) and processes (blameless postmortems, monitoring based on the four golden signals, interdisciplinary project groups, often some Agile/Scrum workflow, etc.).
There are some common tools (Git, CI systems) and artifacts (infrastructure as code), but they aren’t set in stone.
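The four golden signals mentioned above (latency, traffic, errors, and saturation, as described in Google’s SRE book) can be made concrete with a small Python sketch. It is only an illustration under stated assumptions. The `Request` record and the `golden_signals` helper are hypothetical rather than part of any monitoring tool. Only HTTP 5xx responses are counted as errors, and saturation is passed in as a utilization figure measured elsewhere.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record: duration in milliseconds and HTTP status code.
    duration_ms: float
    status: int


def percentile(values, pct):
    """Nearest-rank percentile; assumes values is non-empty."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def golden_signals(requests, window_s, utilization):
    """Summarize one time window as the four golden signals.

    requests:    Request records observed during the window (assumed non-empty)
    window_s:    window length in seconds
    utilization: fraction (0.0-1.0) of the most constrained resource in use,
                 measured elsewhere and used here as a stand-in for saturation
    """
    durations = [r.duration_ms for r in requests]
    # Assumption: only server-side failures (5xx) count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p99_ms": percentile(durations, 99),  # tail latency
        "traffic_rps": len(requests) / window_s,      # demand on the system
        "error_rate": errors / len(requests),         # fraction of failed requests
        "saturation": utilization,                    # how "full" the service is
    }


if __name__ == "__main__":
    sample = [Request(120, 200), Request(80, 200), Request(950, 503), Request(100, 200)]
    print(golden_signals(sample, window_s=2, utilization=0.75))
```

The sketch reports tail latency (p99) instead of an average because averages hide slow outliers. A real setup would also track the latency of successful and failed requests separately.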
General Advice for DevOps Interviews
- Show that you have a good grasp of proper automation and common DevOps practices. Be ready to tell stories that demonstrate this experience.
- Have a philosophy about testing changes to any part of a system, whether code or infrastructure. Talk about isolating the change, testing it, communicating about it as needed, integrating it with other changes if necessary, and then moving it through staging and performance testing before rolling it out to production.
- Have a good, deep understanding of Linux and operating systems. Interviews focus less on trick questions, but you should still be prepared for the common ones.
- Have solid programming chops.
- Have a good, positive attitude.
Common DevOps and SRE Job Interview Questions
- What’s the difference between DevOps and SRE?
- What’s important to you when you’re doing a postmortem for an outage?
- What’s the difference between scalability and availability?
- Tell the story of the best-run outage you’ve been a part of, from beginning to end.
- What about the worst-run outage?
- What is the purpose of a postmortem?
- How do you define a cutting-edge technology, and what are some cutting-edge technologies that you’re paying attention to right now?
- How do you achieve stability when working with brand-new tech? What about when providing a platform for unstable applications?
- How are containers different from virtual machines (VMs)?
- What are some advantages and disadvantages of containers?
- What is a Docker container “made of” in its actual implementation? In general terms, how does it work?
- What’s the difference between continuous integration and continuous deployment?
- Explain mutable vs immutable infrastructure. What are some benefits and drawbacks of each? What tools might you use to set up a pipeline for each approach?
- What are some of the benefits and pitfalls of infrastructure as code (IaC)?
- How do you test that your changes to a given configuration won’t cause negative impacts in production environments?
- What is the experience like for a new SRE or administrator onboarding to your team and your infrastructure/system?
- What does your metrics and monitoring setup look like? How do you use it to debug issues that are happening in your system?
- What steps does it currently take to roll out a change in your system? What’s good and bad about that? What would you change, and why?
- How do you keep your servers and software up to date and patched?
Discussion / Behavioral DevOps Questions
- How do you feel about on-call rotations as part of your work? What needs to happen for on-call to be better? If you were able to wave a magic engineering wand to make on-call better, what underlying issues would be fixed, and how?
- What’s your current split between project work and interrupt-driven (on-call/reactive) work?
- What has been your biggest win in your current role?
- How would you achieve zero-downtime deployments?
- What was the hardest or longest issue you’ve ever had to track down? What did you learn, and what would you do differently?