
On-call Process

Source: Notion | Last edited: 2025-12-16 | ID: 2c62d2dc-3ef...


  • Database: Incident Tracker

The on-call process is a crucial part of modern business operations, particularly in industries where downtime significantly affects customers or stakeholders. On-call members monitor systems, respond to alerts and notifications, and resolve issues quickly.

If you are the primary on-call, stay near your laptop and make sure you have a dependable internet connection. If you do not respond to an alert, it is escalated and the other members of the rotation are notified.
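
The escalation described above can be modeled roughly as follows. This is a hypothetical sketch, not our OpsGenie configuration: it assumes the primary is paged immediately and that each further 10-minute period without an ACK (`ESCALATION_TIMEOUT`, an assumed value) pages the next member in rotation order. `notified_members` and the member names are illustrative only.

```python
from datetime import timedelta

# Assumed delay before an unacknowledged alert moves to the next person.
# The real value lives in the team's alerting (e.g. OpsGenie) configuration.
ESCALATION_TIMEOUT = timedelta(minutes=10)


def notified_members(rotation: list[str], primary: str, elapsed: timedelta,
                     timeout: timedelta = ESCALATION_TIMEOUT) -> list[str]:
    """Return everyone paged so far for an alert that is still unacknowledged.

    `elapsed` is how long the alert has gone without an ACK. The primary is
    paged at once; each full `timeout` that passes pages one more member,
    following rotation order after the primary. Raises ValueError if the
    primary is not in the rotation.
    """
    start = rotation.index(primary)
    # Rotation order beginning with the primary, wrapping around the list.
    order = rotation[start:] + rotation[:start]
    # timedelta // timedelta yields an int: the number of full timeouts passed.
    steps = elapsed // timeout
    # Never page more people than the rotation contains.
    return order[: min(1 + steps, len(order))]
```

For example, `notified_members(["alice", "bob", "carol"], "bob", timedelta(minutes=25))` returns `["bob", "carol", "alice"]`: two full timeouts have passed, so two more members have been paged after the primary.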

An effective on-call process lets us address customer issues promptly and shorten their outages. It also gives every team member a broader understanding of the whole system, because each rotation exposes them to the different issues that arise. This shared knowledge makes the team more efficient at resolving problems and at improving the system's overall stability.

⚠️ The on-call system is not meant to assign blame for issues. It is a framework and process that helps the team build accountability and supports continuous learning and improvement.


The on-call member is responsible for the following:

  1. Handle communication and address urgent issues or outages
    • Receive, categorize, and assess the complexity of incoming issues
    • Coordinate with the reporter to confirm priorities
    • Log the issue in the Incident Tracker, then create tickets and assign them to the appropriate engineers to investigate and resolve
  2. Host all of the team's meetings during the on-call week
    • Prepare the meeting agendas
    • Host the meetings (Daily Standup, Sprint Planning, Sprint Review, RCA Review, etc.)
    • Send the meeting notes as a recap to keep the team informed
  3. Act as the release manager
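
If the Incident Tracker is the Notion database referenced at the top of this page, logging an issue can also be scripted through the Notion API. The sketch below only builds the `POST /v1/pages` request and does not send it. The token, database ID, and the `Name` title property are placeholders and assumptions, since the tracker's real schema is not documented here, and `build_incident_request` is a hypothetical helper.

```python
import json
import urllib.request

NOTION_PAGES_URL = "https://api.notion.com/v1/pages"


def build_incident_request(token: str, database_id: str, title: str,
                           notion_version: str = "2022-06-28") -> urllib.request.Request:
    """Build (but do not send) a request that adds one row to a Notion database."""
    body = {
        "parent": {"database_id": database_id},
        "properties": {
            # Hypothetical: assumes the tracker's title property is called "Name".
            # Other columns (status, priority, owner) would be added here too.
            "Name": {"title": [{"text": {"content": title}}]},
        },
    }
    return urllib.request.Request(
        NOTION_PAGES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Notion-Version": notion_version,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` requires a Notion integration token whose integration has been connected to the database.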

We maintain a dedicated On-call Scheduler that lists all on-call members for the project. It is the team's central reference for the current roster of on-call participants.

On-call Scheduler

  • Frequency: Once a week
  • Start Date: 2025-06-23
    1. @Engineer
    2. @Engineer
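
To work out who is on call on a given date, the weekly rotation reduces to simple arithmetic: count the whole weeks since the start date and index into the roster modulo its length. This sketch assumes handoffs happen every 7 days from 2025-06-23 (a Monday) and that members rotate in the order listed; `on_call_for` and the names passed to it are hypothetical.

```python
from datetime import date

# Rotation start date, taken from the On-call Scheduler.
ROTATION_START = date(2025, 6, 23)


def on_call_for(day: date, rotation: list[str], start: date = ROTATION_START) -> str:
    """Return who is on call on `day`, assuming a weekly handoff in list order."""
    if day < start:
        raise ValueError("date is before the rotation started")
    # Number of complete 7-day periods since the rotation began.
    week_index = (day - start).days // 7
    # Wrap around the roster once everyone has had a turn.
    return rotation[week_index % len(rotation)]
```

For example, `on_call_for(date(2025, 6, 30), ["eng-a", "eng-b"])` returns `"eng-b"`, since 2025-06-30 falls exactly one week after the start date.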

Contact information for emergencies: if an incident occurs, please call the on-call member.

OpsGenie is our optional tool for managing alerts. Familiarize yourself with the platform and its basic features; the quickstart guide is a good place to start: https://support.atlassian.com/opsgenie/docs/read-opsgenies-quickstart-guide/.

To prevent an alert from being escalated to the other engineers on the schedule, acknowledge (ACK) it. Once you have acknowledged the alert, contact the reporter and ask questions to understand the issue. Document your findings and confirm that the incident is valid.
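
Alerts are usually ACKed from the OpsGenie app, but the same action is available through the Opsgenie Alert API (`POST /v2/alerts/{identifier}/acknowledge`, authenticated with a `GenieKey` API key). The sketch below only builds the request. The API key, alert ID, and user are placeholders, `build_ack_request` is a hypothetical helper, and accounts hosted in the EU region would use `api.eu.opsgenie.com` instead.

```python
import json
import urllib.parse
import urllib.request

OPSGENIE_ALERTS_URL = "https://api.opsgenie.com/v2/alerts"


def build_ack_request(api_key: str, alert_id: str, user: str,
                      note: str = "") -> urllib.request.Request:
    """Build (but do not send) an Opsgenie 'acknowledge alert' request."""
    # identifierType=id means alert_id is the alert's ID (not its alias or tiny ID).
    url = (f"{OPSGENIE_ALERTS_URL}/{urllib.parse.quote(alert_id, safe='')}"
           "/acknowledge?identifierType=id")
    # "user" is shown as the owner of the action; "note" is attached to the alert log.
    body = {"user": user, "note": note}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"GenieKey {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Send it with `urllib.request.urlopen(req)`. The API processes the acknowledgement asynchronously and replies with HTTP 202, so check the alert afterwards to confirm it is ACKed.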

The on-call engineer must closely monitor customer Slack channels, pay attention to customer messages, and quickly declare an incident when a major outage or significant service disruption occurs.