
Software Engineer - Observability Infrastructure SRE

Global SaaS business delivering cloud monitoring and security solutions, enabling digital transformation and infrastructure monitoring of technology stacks.
$187,000 - $240,000
Site Reliability
Mid-Level Software Engineer
Hybrid
1,000 - 5,000 Employees
3+ years of experience
Enterprise SaaS

Description For Software Engineer - Observability Infrastructure SRE

Datadog is seeking a Software Engineer for its Observability Infrastructure SRE team within the Core Observability SRE group. This role focuses on managing Datadog's internal observability tooling and practices, specifically the telemetry data plane that collects large volumes of observability data across all Datadog environments. The position involves working with libraries, a fleet of agents, data processors, and endpoints in a multi-region, multi-cloud-provider ecosystem.

The role requires collaboration with engineering teams to test and build new functionality, improving products for both internal use and customers. As part of the team, you'll be responsible for ensuring the scalability, reliability, and efficiency of critical observability systems across Datadog. The position offers the opportunity to work with cutting-edge technology while contributing to the company's core infrastructure.

Datadog operates in a hybrid workplace environment, emphasizing office culture while allowing flexibility for work-life harmony. The company offers comprehensive benefits including equity packages, professional development opportunities, and inclusive community initiatives. The role combines technical expertise with cross-team collaboration, making it ideal for engineers passionate about large-scale systems and observability.

Working at Datadog means joining a global SaaS business that champions professional development, diversity of thought, and innovation. The company maintains a collaborative, pragmatic, and thoughtful people-first community focused on solving complex problems in the cloud age. This role offers the chance to make significant impacts on critical systems while working with modern technologies and practices.


Responsibilities For Software Engineer - Observability Infrastructure SRE

  • Solve Datadog-specific observability, instrumentation, or telemetry collection problems
  • Build, scale, and operate a robust telemetry data plane
  • Work with the infrastructure team to design architecture across cloud providers and regions
  • Gather requirements for operational use cases and implement the supporting telemetry collection

Requirements For Software Engineer - Observability Infrastructure SRE

  • 3+ years of experience in software engineering, running production systems at scale
  • Ability to explore and analyze problems to propose efficient solutions
  • Hands-on experience with Go or Python
  • Strong communication skills and experience working on cross-team projects
  • Experience leading the adoption of programs/projects with wide impact across Engineering

Benefits For Software Engineer - Observability Infrastructure SRE

  • New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
  • Continuous professional development and career pathing
  • Intradepartmental mentor and buddy program
  • Inclusive company culture with Community Guilds
  • Free global mental health benefits for employees and dependents
  • Competitive global benefits
  • 401(k) plan and match
  • Healthcare and dental benefits
  • Paid time off
  • Fitness reimbursements


Jobs Related To Datadog Software Engineer - Observability Infrastructure SRE

Site Reliability Engineer (2+ Years)

Site Reliability Engineer position at Fam, India's leading youth-focused payments app, requiring 2+ years of AWS experience and expertise in CI/CD, Kubernetes, and cloud infrastructure.

Site Reliability Engineer II

Site Reliability Engineer II position at Sinch, working remotely in France to ensure system reliability and performance for a global cloud communications provider.

Software Engineer II, Site Reliability Engineering

Site Reliability Engineer position at Google focusing on maintaining and improving large-scale distributed systems reliability and performance.

Software Engineer III, Site Reliability Engineering, Google Cloud

Site Reliability Engineer role at Google Cloud focusing on building and maintaining large-scale distributed systems with competitive compensation and benefits.

Site Reliability Engineer II

Remote Site Reliability Engineer II position at Sinch, focusing on managing global infrastructure using GCP, Terraform, and monitoring tools.