Senior DGX Cloud AI Infrastructure Software Engineer

World leader in accelerated computing, pioneering AI and digital twins technology.
$224,000 - $425,500
Cloud
Staff Software Engineer
Hybrid
5,000+ Employees
12+ years of experience
AI · Enterprise SaaS

Job Description

NVIDIA is seeking a Senior DGX Cloud AI Infrastructure Software Engineer to join its AI research team. This role is crucial in developing and optimizing the infrastructure that powers NVIDIA's AI initiatives. The position focuses on building and maintaining scalable AI systems that enable large-scale training and inference.

The role combines deep technical expertise with strategic thinking, requiring the ability to design and implement robust infrastructure solutions while ensuring high efficiency and availability. You'll be working with cutting-edge AI technologies, including LLMs and GenAI, while contributing to the development of tools and services that support NVIDIA's AI platforms.

As a senior engineer, you'll be responsible for complex problem-solving, from application-level issues to hardware-level challenges. The position offers significant autonomy while providing the support and mentorship needed for success. NVIDIA's culture promotes blameless postmortems, continuous improvement, and innovative thinking.

The ideal candidate brings extensive experience in software infrastructure for AI systems, strong debugging capabilities, and a proven track record in scaling distributed systems. Knowledge of GPU technologies, network protocols, and deep learning frameworks is highly valued. The role offers competitive compensation, including equity and comprehensive benefits, reflecting NVIDIA's position as a leader in the AI and accelerated computing space.

Working at NVIDIA means being at the forefront of AI innovation, contributing to groundbreaking developments that transform industries. The company's commitment to diversity and inclusion, combined with its focus on pushing technological boundaries, creates an exciting and rewarding environment for professional growth and impact.

Last updated a month ago

Responsibilities For Senior DGX Cloud AI Infrastructure Software Engineer

  • Develop infrastructure software and tools for large-scale AI, LLM, and GenAI infrastructure
  • Develop and optimize tools to improve infrastructure efficiency and resiliency
  • Root-cause, analyze, and triage failures from the application level to the hardware level
  • Enhance infrastructure and products underpinning NVIDIA's AI platforms
  • Co-design and implement APIs for integration with NVIDIA's resiliency stacks
  • Define meaningful and actionable reliability metrics to track and improve system and service reliability

Requirements For Senior DGX Cloud AI Infrastructure Software Engineer

Python
Linux
Kubernetes
  • 12+ years of experience in developing software infrastructure for large-scale AI systems
  • Bachelor's degree or higher in Computer Science or a related technical field
  • Strong debugging skills and experience in analyzing and triaging issues in AI applications
  • Proven track record in building and scaling large-scale distributed systems
  • Experience with AI training, inference, and data infrastructure services
  • Familiarity with operating large-scale observability platforms for monitoring and logging
  • Proficiency in programming languages such as Python and C/C++, as well as scripting languages
  • Excellent communication and collaboration skills
