Software Engineer - Observability Infrastructure SRE

Datadog

Global SaaS business delivering cloud monitoring and security solutions, enabling digital transformation and infrastructure monitoring of technology stacks.

New York, NY, USA

$187,000 - $240,000

Site Reliability

Mid-Level Software Engineer

Hybrid

1,000 - 5,000 Employees

3+ years of experience

Enterprise SaaS

Description For Software Engineer - Observability Infrastructure SRE

Datadog is seeking a Software Engineer for its Observability Infrastructure SRE team within the Core Observability SRE group. This role focuses on managing Datadog's internal observability tooling and practices, specifically the telemetry data plane that collects large volumes of observability data across all Datadog environments. The position involves working with libraries, a fleet of agents, data processors, and endpoints in a multi-region, multi-cloud ecosystem.

The role requires collaboration with engineering teams to test and build new functionality, improving products for both internal use and customers. As part of the team, you'll be responsible for ensuring the scalability, reliability, and efficiency of critical observability systems across Datadog. The position offers the opportunity to work with cutting-edge technology while contributing to the company's core infrastructure.

Datadog operates in a hybrid workplace environment, emphasizing office culture while allowing flexibility for work-life harmony. The company offers comprehensive benefits including equity packages, professional development opportunities, and inclusive community initiatives. The role combines technical expertise with cross-team collaboration, making it ideal for engineers passionate about large-scale systems and observability.

Working at Datadog means joining a global SaaS business that champions professional development, diversity of thought, and innovation. The company maintains a collaborative, pragmatic, and thoughtful people-first community focused on solving complex problems in the cloud age. This role offers the chance to make significant impacts on critical systems while working with modern technologies and practices.

Last updated 5 hours ago

Responsibilities For Software Engineer - Observability Infrastructure SRE

Solve Datadog-specific observability, instrumentation, or telemetry collection problems
Build, scale, and operate a robust telemetry data plane
Work with the infrastructure team to design architecture across cloud providers and regions
Gather requirements for operational use cases and implement the supporting telemetry collection

Requirements For Software Engineer - Observability Infrastructure SRE

3+ years of experience in software engineering, running production systems at scale
Ability to explore and analyze problems to propose efficient solutions
Hands-on experience with Go or Python
Strong communication skills and experience working in cross-team projects
Experience leading the adoption of programs/projects with wide impact across Engineering

Benefits For Software Engineer - Observability Infrastructure SRE

Equity

Medical Insurance

Dental Insurance

Mental Health Assistance

401k

New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
Continuous professional development and career pathing
Intradepartmental mentor and buddy program
Inclusive company culture with Community Guilds
Free global mental health benefits for employees and dependents
Competitive global benefits
401(k) plan and match
Healthcare and dental benefits
Paid time off
Fitness reimbursements
