Quizlet faced a challenge common in the Site Reliability Engineering (SRE) world: an incident management process that was not conducive to improving their tooling or processes after incidents. The organization was searching for a solution that would be easier to use, streamline information, provide robust learning capabilities, and make incident response more efficient. Quizlet wanted to shift their incident culture toward a deeper, more collaborative approach to understanding incidents, one that would prioritize learning opportunities and group discussions in order to continuously improve their tooling, systems, and processes.
Prior to adopting Jeli, Quizlet used the Root Cause Analysis and 5 Whys methods, which they had identified as dated relative to today’s understanding of incidents. Their post-incident report process offered two options, a traditional report and a lightweight one; the latter left learnings on the table. Their post-incident report template, in Google Docs, was a source of friction for Incident Commanders (ICs). Some ICs found the timeline section of the report challenging to read and understand because there was no visual representation of duration or relative impact. The team also found that creating a timeline was a time-consuming, manual process that involved copying and pasting countless messages and timestamps into the Google Doc. Additionally, the organization was seeking a solution that would better support their responders during incidents by automating parts of incident response and streamlining their workflows.
Identifying A Solution
Quizlet wanted a solution that would streamline the toilsome coordination and communication tasks during an incident, while also making it easier to gather the data needed to understand, afterwards, what had happened. Jeli partnered with Quizlet on the creation of Jeli’s Incident Response (IR) Bot, building key features to address Quizlet’s needs:
Broadcast channels to automate updates to relevant Slack channels of their choosing, improving cross-channel communication while decreasing cognitive load on the responding engineers.
Automatically import event data and Slack messages into the Jeli platform after an incident is closed.
Create Jira tickets easily during an incident with the Jira Integration for Jeli to help eliminate extra steps and cognitive load during incidents.
"I love the automation of toil Jeli provides." -- Yanet L., Quizlet Platform Engineering Manager
The Results: Better Learnings, Faster
Quizlet now holds biweekly Learning From Incidents (LFI) meetings, improving how learnings are shared. During the meeting, the team asks participants what their thought process was at key moments and what themes they noticed, facilitating a collaborative experience that helps reveal the key learnings from the incident. The LFI meeting also helps the Quizlet team identify gaps in their tooling and in how people work, as well as additional training needed on critical tools. With the adoption of Jeli, Quizlet has created efficiencies across workflows and increased resiliency:
Engineers are able to focus on responding to an incident while the adjacent coordination and communication tasks are automated, reducing cognitive load. Engineers create high-quality action items in less time.
Engineering has increased the number of incidents being analyzed, revealing deeper insights and learnings about their systems and tools to drive continuous improvement.
Google Docs and old manual processes have been replaced with a platform designed for incident reviews, one that automates much of the work and captures all of the key details efficiently.
The Quizlet team now creates post-incident reports more often and with less friction by using Jeli. Timelines are built faster, with more key details, using the Jeli Narrative Builder. These more comprehensive timelines now include the context needed to understand what happened during the incident and who participated, along with the visual representation needed to identify key takeaways and areas for improvement. The Jeli incident response and management platform has enabled Quizlet to move their incident management forward to a modern, more sophisticated approach with streamlined, predictable workflows.