Notes by Doctor Droid

How can Playbooks help improve Developer Experience during on-call?


Playbooks help on-call engineers debug issues faster & automate investigations


What is PlayBooks?

PlayBooks is an open-source framework for codifying and automating on-call investigations.

GitHub URL: https://github.com/DrDroidLab/playbooks

Why do you need Playbooks?

Engineering teams today typically use Confluence or a wiki to document how to investigate and fix an issue. But:

  • An engineer has to follow the docs manually, which is time-consuming and prone to human error.

  • The investigation requires access to multiple monitoring tools from a laptop, which brings context overload, permission hurdles, and a dependency on having the laptop at hand.

  • Sharing the context of an analysis with a teammate is hard when the data is spread across multiple dashboards and tools.

You can replace these wikis with executable playbooks.

How does the PlayBooks framework operate?

The PlayBooks framework ships with 10+ pre-built integrations and the flexibility to build more. For instance, you can fetch log query results from AWS CloudWatch as a step in a playbook. It supports many other types of tasks as well, including but not limited to:

  • Fetching Logs

  • Fetching metrics

  • Running API calls

  • Running Bash commands on a remote server

  • Fetching deployment information

  • Running DB queries

  • Adding external links & notes

Using these tasks, you can build the playbook that best fits your use case.
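To make the idea of a playbook as an ordered set of tasks concrete, here is a minimal Python sketch. It is not the PlayBooks API: the `Step` and `Playbook` classes, the three stub functions, and their canned outputs are all hypothetical, and a real step would call an integration such as a log or metrics source instead of returning a fixed string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    """One task in a playbook, e.g. a log query or an API call (hypothetical type)."""
    name: str
    run: Callable[[], str]  # assumption: each task returns a short text summary


@dataclass
class Playbook:
    """An ordered list of steps for one kind of investigation (hypothetical type)."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def execute(self) -> Dict[str, str]:
        """Run every step in order and collect the outputs by step name."""
        results: Dict[str, str] = {}
        for step in self.steps:
            try:
                results[step.name] = step.run()
            except Exception as exc:
                # Record the failure and continue, so one broken source
                # does not hide the data from the remaining steps.
                results[step.name] = f"failed: {exc}"
        return results


# Stub tasks with made-up outputs, standing in for real integrations.
def fetch_error_logs() -> str:
    return "3 timeout errors in the last 15 minutes"


def fetch_latency_metric() -> str:
    return "p99 latency: 2.4s"


def run_db_query() -> str:
    raise RuntimeError("connection refused")


playbook = Playbook(
    name="checkout-latency",
    steps=[
        Step("error logs", fetch_error_logs),
        Step("latency metric", fetch_latency_metric),
        Step("db query", run_db_query),
    ],
)
report = playbook.execute()
for step_name, output in report.items():
    print(f"{step_name}: {output}")
```

Collecting every step's output into one report is what lets an on-call engineer review logs, metrics, and query results side by side instead of opening each tool in turn.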

Automate Playbooks using Workflows:

You can further automate the execution of Playbooks using "Workflows". A workflow lets you define trigger points that automatically run a playbook and send its output to a Slack channel.
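The trigger-then-notify pattern described here can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration, not the PlayBooks implementation: the `Workflow` class, the alert names, and the `notify` callback are hypothetical, and the callback stands in for whatever actually posts to Slack.

```python
from typing import Callable, Dict, List

# Assumption: a playbook is modelled as a function returning a text report.
PlaybookFn = Callable[[], str]


class Workflow:
    """Maps alert names to playbooks and forwards each report to a notifier."""

    def __init__(self, notify: Callable[[str], None]) -> None:
        self._routes: Dict[str, PlaybookFn] = {}
        self._notify = notify  # hypothetical stand-in for a Slack channel post

    def on_alert(self, alert_name: str, playbook: PlaybookFn) -> None:
        """Register the playbook to run when this alert fires."""
        self._routes[alert_name] = playbook

    def handle(self, alert_name: str) -> bool:
        """Run the matching playbook, if any, and send its report."""
        playbook = self._routes.get(alert_name)
        if playbook is None:
            return False  # no trigger configured for this alert
        self._notify(f"[{alert_name}] {playbook()}")
        return True


# Messages are collected in a list here instead of being sent to Slack.
sent: List[str] = []
workflow = Workflow(notify=sent.append)
workflow.on_alert(
    "HighErrorRate",
    lambda: "error logs: 3 timeouts; last deploy: 12 minutes ago",
)

handled = workflow.handle("HighErrorRate")
ignored = workflow.handle("DiskFull")
print(sent)
```

Passing the notifier in as a callback keeps the trigger logic independent of the destination, so the same routing could deliver to a chat channel, an email, or a test list.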

How can Playbooks improve Developer Experience:

  • Faster Investigation:

    When every alert comes with a related playbook or supporting data, the time needed to investigate an issue shrinks.

  • Reduced context switching between multiple tools:

    Data from multiple sources can be fetched into a single playbook, which makes it easier to review while assessing an issue. It also avoids the hassle of logging in to each tool separately and needing the relevant access for each one.

  • Reduced training:

    Teams often spend months teaching a new engineer how the system's internals operate. With playbooks, engineers can ramp up by learning the correlations in the context of investigations and then extrapolating them to the product.

  • Reduced escalations to senior engineers:

    With automation like this, senior engineers can capture their tribal knowledge in the form of 1-click playbooks. These playbooks help an on-call engineer with limited context go further in debugging and can reduce the need for a senior engineer to step in manually.

  • Historical reference to similar issues in the past:

    PlayBooks stores a reference of past executions, so if your team faces a similar issue again, they can look at the earlier run to see which steps were taken and what happened.

If on-call investigations are a poor developer experience for your team, PlayBooks could help turn around how engineers feel about on-call by making it seamless.

Try it out here: https://github.com/DrDroidLab/playbooks