Staff Software Engineer - Site Reliability and Observability

A global automotive company leading the change towards Zero Crashes, Zero Emissions and Zero Congestion through engineering, technology and design.
Austin, TX, USA; Roswell, GA, USA; Warren, MI, USA
Site Reliability
Staff Software Engineer
Hybrid
5,000+ Employees
7+ years of experience
Automotive

Description For Staff Software Engineer - Site Reliability and Observability

General Motors is seeking a Staff Software Engineer specializing in Site Reliability and Observability to join their innovative team. This role is crucial in ensuring the reliability, scalability, and performance of GM's software systems as they pursue their vision of Zero Crashes, Zero Emissions, and Zero Congestion.

The position offers a unique opportunity to work with cutting-edge technology in the automotive industry, implementing and maintaining critical observability platforms and tools. As an SRE, you'll be responsible for monitoring system health, implementing automation, and ensuring high availability of production systems. The role requires expertise in cloud platforms (particularly Azure), containerization technologies, and modern monitoring tools.

The ideal candidate will bring 7+ years of hands-on SRE experience and a strong background in distributed systems. You'll work in a hybrid environment, collaborating with cross-functional teams to drive reliability improvements and implement best practices. The position offers the chance to make a significant impact on GM's digital infrastructure while working with modern technologies like Kubernetes, Terraform, and various programming languages.

Benefits include a company vehicle, relocation assistance, and comprehensive healthcare coverage. The role provides an excellent opportunity to work on large-scale systems while contributing to GM's mission of transforming mobility. The position is based in multiple locations, including Austin, TX; Roswell, GA; and Warren, MI, and is hybrid, with collaborative in-person work three days per week.

Last updated 10 days ago

Responsibilities For Staff Software Engineer - Site Reliability and Observability

  • Implement a scalable, reliable, and secure SRE and Observability platform
  • Deliver tools/software to improve reliability, scalability and operability
  • Collaborate with engineering teams on architecture and infrastructure
  • Conduct production readiness reviews and deployments
  • Monitor system availability, latency and service health
  • Participate in on-call engineering duty
  • Perform incident root cause analysis
  • Build runbooks and tooling for production support
  • Participate in technical discussions with the Architecture group

Requirements For Staff Software Engineer - Site Reliability and Observability

Kubernetes
Python
Java
Go
  • 7+ years of hands-on SRE experience with cloud providers (Azure preferred)
  • Experience with high-availability, fault-tolerant, scalable distributed systems
  • Experience with monitoring tools like Azure Monitor/Sentinel, Datadog, Dynatrace
  • Strong knowledge of Docker, Kubernetes, Terraform
  • Experience troubleshooting JVM applications
  • Experience with chaos engineering
  • Knowledge of OpenTelemetry
  • Strong scripting/programming skills in Python, Java, Go, PowerShell, Bash
  • Experience with SSO and Big Data/NoSQL in the cloud
  • Knowledge of CI/CD automation frameworks
  • Strong understanding of cloud networking
  • Experience improving uptime to 99.99%
  • Experience with source control (GitHub, Azure DevOps)
  • BS/MS in Computer Science/Engineering preferred

Benefits For Staff Software Engineer - Site Reliability and Observability

Medical Insurance
Vision Insurance
Dental Insurance
  • Company vehicle provided
  • Relocation benefits available
  • Comprehensive benefits package
