Datadog, a leading global SaaS company, is seeking a Senior Software Engineer to join its Incident Management SRE team. This role focuses on fostering a resilient culture by treating incidents as learning opportunities and catalysts for growth. The position involves close collaboration with teams across departments to improve the on-call experience, incident response, and post-incident analysis.
As a Senior Software Engineer in Incident Management, you'll be responsible for establishing best practices for on-call rotations, building supporting platforms, and streamlining incident response processes. You'll contribute significantly to the company's post-mortem process, facilitate incident reviews, and train on-callers in incident management practices. The role requires expertise in Go and Python programming, along with knowledge of Kubernetes and distributed systems.
The ideal candidate brings at least 3 years of software engineering experience, strong communication skills, and a proven track record in incident response. You'll work in a hybrid environment that values office culture while providing flexibility for work-life harmony. The position offers a competitive package including equity compensation, professional development opportunities, and comprehensive health benefits.
Working at Datadog means joining a company that champions professional development, diversity of thought, and innovation. You'll be part of a collaborative, pragmatic, and thoughtful people-first community focused on solving complex problems in the cloud age. The company's mission involves breaking down silos and enabling digital transformation, cloud migration, and infrastructure monitoring for customers' entire technology stacks.