Senior Software Engineer – AI Infrastructure and Tooling

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twins technology.
$184,000 - $356,500
DevOps
Senior Software Engineer
Hybrid
5,000+ Employees
4+ years of experience
AI · Enterprise SaaS

Description For Senior Software Engineer – AI Infrastructure and Tooling

NVIDIA is seeking a Senior Software Engineer to join its AI Infrastructure and Tooling team. This role combines DevOps expertise with cutting-edge AI infrastructure development, focusing on building and maintaining large-scale cloud and on-premises computing clusters. The position offers a competitive salary range of $184,000 - $356,500 USD, along with equity and comprehensive benefits.

As part of a small, fast-moving team, you'll own operational excellence across the full stack, from the OS level up to services. The role requires expertise in Kubernetes, AWS, and modern DevOps practices, with a focus on building sophisticated infrastructure automation tools. You'll work directly with NVIDIA's Autonomous Vehicles development team, making a significant impact on their development efficiency.

The ideal candidate should have at least 4 years of experience with Kubernetes-based platforms and cloud automation using tools like Terraform, Python, and Go. Strong knowledge of AWS services, traffic engineering, and observability platforms is essential. The role offers opportunities to work with GPU/CPU clusters and emerging technologies in AI infrastructure management.

NVIDIA, as the world leader in accelerated computing, offers an environment focused on innovation and tackling challenging problems that matter to the world. The company is committed to diversity and inclusion, providing equal opportunities to all candidates regardless of background. This position represents an opportunity to join a company at the forefront of AI and machine learning, working on infrastructure that powers next-generation technology development.

The role combines technical depth with strategic thinking, requiring both hands-on development skills and the ability to understand complex system interactions. You'll be part of a team that values innovation, operational excellence, and continuous learning, with the chance to work on cutting-edge technology that directly impacts NVIDIA's autonomous vehicle initiatives.


Responsibilities For Senior Software Engineer – AI Infrastructure and Tooling

  • Apply strong programming skills to craft production-grade software
  • Design and implement Continuous Deployment (CD) pipelines
  • Maintain a big-picture view of how systems interact
  • Build AWS infrastructure automation and deployment tools
  • Drive advancements in large-scale cloud and on-premises computing clusters

Requirements For Senior Software Engineer – AI Infrastructure and Tooling

Python · Go · Kubernetes · Linux
  • BS or MS in CS/CE/EE or equivalent experience
  • 4+ years developing tooling and APIs for Kubernetes (k8s)-based computing platforms
  • At least 4 years building cloud automation software using Terraform, Python, and Go
  • Strong AWS fundamentals: IAM, VPC, RDS, S3, CDN, EC2
  • Expert knowledge of DevOps principles, tools, and methodologies
  • Working experience with Continuous Deployment (CD) pipelines
  • Good understanding of traffic engineering solutions
  • In-depth understanding of Internet protocols
  • Operational expertise with observability tooling, including the Prometheus ecosystem
  • Proficiency in Linux environments
  • Excellent written, verbal, and interpersonal skills


Jobs Related To NVIDIA Senior Software Engineer – AI Infrastructure and Tooling

Senior HPC DevOps Engineer

Senior HPC DevOps Engineer position at NVIDIA, focusing on building and maintaining large-scale HPC/AI clusters and implementing advanced DevOps practices.

Senior Software Engineer - Build and Deployment Tools

Senior Software Engineer position at NVIDIA focusing on build and deployment tools development for chip design infrastructure.

Senior HPC AI Cluster Engineer

Senior HPC AI Cluster Engineer role at NVIDIA focusing on building and maintaining large-scale HPC/AI infrastructure and supercomputers.

Site Reliability Engineer III - DevOps

Senior SRE role at JPMorgan Chase focusing on AWS infrastructure, Kubernetes, and DevOps practices, with competitive compensation between $133K and $185K.

Senior DevOps Engineer

Senior DevOps Engineer role at Nomagic, combining cloud and hardware expertise to advance robotics technology, offering 25-32K PLN monthly with equity and hybrid work options.