At Databricks, we are at the forefront of enabling data teams to tackle the world's most challenging problems. As a Staff Software Engineer on the Observability Platform team, you'll work on one of the largest-scale software platforms, managing millions of virtual machines that generate terabytes of logs and process exabytes of data daily.
The role involves developing cutting-edge observability solutions that provide crucial insights into the health and performance of Databricks' products and infrastructure. You'll be responsible for building next-generation platforms handling billions of active time series and processing petabytes of logs daily, while managing infrastructure across nearly a hundred cloud regions.
You'll have significant impact by developing advanced workflows that accelerate incident diagnosis, leveraging Databricks' data intelligence platform, and setting industry standards for troubleshooting practices. You'll also play a crucial role in upleveling monitoring and reliability practices across Databricks engineering, developing opinionated tools for managing structured logs, metrics, alerts, dashboards, and on-call rotations.
The ideal candidate brings 7+ years of production-level experience in languages like Go, Python, Java, Scala, or Rust, along with deep expertise in large-scale distributed systems and cloud technologies. You'll work across AWS, Azure, and GCP, and you'll have the opportunity to mentor and uplevel other engineers.
At Databricks, you'll be part of a team that's passionate about technical excellence and innovation, working on solutions that directly impact the reliability and performance of one of the most sophisticated data and AI platforms in the industry. The role offers competitive compensation, comprehensive benefits, and the opportunity to work on challenging technical problems at scale.