Senior Software Engineer - Incident Management

Global SaaS business delivering cloud monitoring, security and analytics solutions, helping organizations monitor their entire technology stack.
Site Reliability
Senior Software Engineer
Hybrid
5,000+ Employees
3+ years of experience
Enterprise SaaS

Description For Senior Software Engineer - Incident Management

Datadog, a leading global SaaS company, is seeking a Senior Software Engineer to join its Incident Management SRE team. This role focuses on fostering a resilient culture by treating incidents as learning opportunities and catalysts for growth. The position involves close collaboration with teams across departments to improve the on-call experience, incident response, and post-incident analysis.

As a Senior Software Engineer in Incident Management, you'll be responsible for establishing best practices for on-call rotations, building supporting platforms, and streamlining incident response processes. You'll contribute significantly to the company's post-mortem process, facilitate incident reviews, and train on-callers in incident management practices. The role calls for experience with Go and Python, along with familiarity with Kubernetes and distributed systems.

The ideal candidate brings at least 3 years of software engineering experience, strong communication skills, and hands-on experience being on call and responding to incidents. You'll work in a hybrid environment that values office culture while providing flexibility for work-life harmony. The position offers competitive benefits, including equity compensation, professional development opportunities, and comprehensive health benefits.

Working at Datadog means joining a company that champions professional development, diversity of thought, and innovation. You'll be part of a collaborative, pragmatic, and thoughtful people-first community focused on solving complex problems in the cloud age. The company's mission involves breaking down silos and enabling digital transformation, cloud migration, and infrastructure monitoring for customers' entire technology stacks.

Responsibilities For Senior Software Engineer - Incident Management

  • Steer the on-call experience by establishing best practices and building platforms to support on-call rotations
  • Define incident response processes and write software to streamline them
  • Contribute to the post-mortem process and run a weekly post-mortem reading group
  • Support teams in facilitating incident reviews that emphasize learning and blamelessness
  • Train on-callers in incident and post-mortem processes
  • Collaborate cross-functionally with other teams

Requirements For Senior Software Engineer - Incident Management

Skills: Go, Python, TypeScript, Kubernetes
  • At least 3 years of experience building software that solves real user problems
  • Familiarity with Kubernetes and distributed systems
  • Experience being on-call and responding to incidents
  • Strong communication skills in English
  • Empathy and collaboration skills
  • Willingness to teach and train other engineers

Benefits For Senior Software Engineer - Incident Management

Highlights: Equity, Mental Health Assistance
  • New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
  • Continuous professional development and career pathing
  • Intradepartmental mentor and buddy program
  • Inclusive company culture with Community Guilds
  • Access to Inclusion Talks
  • Free global mental health benefits for employees and dependents
  • Competitive global benefits