Datadog, a leading global SaaS company, is seeking a Senior Software Engineer to join its Incident Management SRE team. The team works to build a resilient culture by treating incidents as learning opportunities and catalysts for growth. The position involves close collaboration with teams across departments to improve the on-call experience, incident response, and post-incident analysis.
As a Senior Software Engineer in Incident Management, you'll be responsible for building and improving platforms that support on-call rotations, streamlining incident response processes, and facilitating post-mortem analyses. You'll work with Go, Python, and TypeScript in a distributed systems environment, helping teams navigate complex technical challenges while maintaining system reliability.
The ideal candidate brings at least 3 years of software engineering experience, strong knowledge of Kubernetes and distributed systems, and a track record of on-call incident response. You should be passionate about teaching others and driving organizational improvements through influence and collaboration.
Datadog offers a hybrid work environment and competitive benefits, including equity-based benefits (RSUs and an ESPP) and a strong focus on professional development. The company maintains an inclusive culture with various employee resource groups and emphasizes continuous learning and growth. The role offers a chance to significantly shape how a major tech company handles incidents and maintains system reliability while working with modern technologies.