
SRE Engineer - Air Platform Team

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twins technology.
$120,000 - $235,750
DevOps
Mid-Level Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS

Description For SRE Engineer - Air Platform Team

NVIDIA, a global leader in accelerated computing and AI technology, is seeking a skilled SRE Engineer to join its AIR (Digital Twin for Data Center Simulation) team. This role combines infrastructure management with cutting-edge technology, focusing on creating identical replicas of real-world data center infrastructure deployments.

The position offers an exciting opportunity to work with NVIDIA's innovative AIR platform, which enables cloud-scale efficiency through advanced simulation capabilities. As an SRE Engineer, you'll be responsible for designing and managing high-availability infrastructure, implementing automation solutions, and ensuring robust monitoring systems.

The ideal candidate will bring 3-5+ years of relevant experience, with strong capabilities in infrastructure automation, Linux systems, and modern DevOps practices. You'll work with technologies including Kubernetes and Docker, along with monitoring tools such as Prometheus and Grafana.

NVIDIA offers a competitive compensation package, with base salary ranging from $120,000 to $235,750 USD, plus equity and comprehensive benefits. The company is known for its outstanding growth and innovation, making it one of the technology world's most desirable employers. Working at NVIDIA means joining a team of forward-thinking professionals who are pushing the boundaries of technology in AI and digital twins.

The role is based in Durham, NC, where you'll be part of a team that's transforming how data centers are designed and operated. NVIDIA's commitment to diversity and inclusion, combined with its market-leading position in accelerated computing, makes this an exceptional opportunity for a skilled SRE professional looking to make an impact in a rapidly evolving technology landscape.

Last updated 3 days ago

Responsibilities For SRE Engineer - Air Platform Team

  • Design, deploy, and manage IaaS platforms with a focus on high availability and performance
  • Automate infrastructure operations using tools like Terraform, Ansible, and Python
  • Focus on efficiency by automating repetitive workflows
  • Develop monitoring and observability tooling to detect and prevent outages
  • Execute and troubleshoot non-disruptive cloud operations
  • Manage deployments and upgrades of operating systems, Kubernetes clusters, and other orchestration tools
  • Provide day-to-day support for engineering activities with CI/CD tools
  • Implement and enforce best practices around infrastructure security

Requirements For SRE Engineer - Air Platform Team

  • BS degree in Computer Science, Software Engineering, or related field
  • 3-5+ years of experience in Site Reliability, DevOps, or Systems Engineering
  • Strong automation and scripting skills with Ansible, Python, and shell
  • Experience in IaaS environments
  • Deep experience in infrastructure engineering
  • Skilled in observability practices
  • Solid grasp of Linux internals and core networking concepts
  • Experience with modern deployment architecture
  • Proficiency in Kubernetes, Docker, QEMU, and libvirt

Benefits For SRE Engineer - Air Platform Team

  • Medical insurance
  • Competitive base salary
  • Equity
  • Comprehensive benefits package
