Senior DGX Cloud AI Infrastructure Software Engineer

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twins technology.
$184,000 - $356,500
Machine Learning
Senior Software Engineer
Remote
5,000+ Employees
8+ years of experience
AI · Enterprise SaaS

Description For Senior DGX Cloud AI Infrastructure Software Engineer

NVIDIA is seeking a Senior DGX Cloud AI Infrastructure Software Engineer to join their innovative AI research team. This role focuses on optimizing efficiency and resiliency of AI workloads, as well as developing scalable AI and Data infrastructure tools and services. The position offers an opportunity to work with cutting-edge AI technologies and contribute to NVIDIA's mission of powering the future of AI and data science.

The role involves developing and maintaining infrastructure software for large-scale AI systems, with a focus on LLM and GenAI infrastructure. You'll be responsible for implementing software engineering practices to ensure high efficiency and availability of AI systems, working with NVIDIA's GPU technologies and network infrastructure.

As a senior engineer, you'll be part of a dynamic team that values learning, growth, and innovation. The position requires strong technical expertise in AI infrastructure, distributed systems, and programming languages like Python and C++. You'll work on meaningful projects that directly impact NVIDIA's AI platforms and contribute to a culture of continuous improvement.

The compensation package is competitive, ranging from $184,000 to $356,500 USD base salary, plus equity and benefits. The role offers flexibility with multiple location options including Santa Clara, Austin, Redmond, and remote work possibilities. This is an excellent opportunity for experienced engineers who want to make a significant impact in the AI infrastructure space while working for a global leader in accelerated computing.

Responsibilities For Senior DGX Cloud AI Infrastructure Software Engineer

  • Develop infrastructure software and tools for large-scale AI, LLM, and GenAI infrastructure
  • Develop and optimize tools to improve infrastructure efficiency and resiliency
  • Root-cause, analyze, and triage failures from the application level to the hardware level
  • Enhance infrastructure and products underpinning NVIDIA's AI platforms
  • Co-design and implement APIs for integration with NVIDIA's resiliency stacks
  • Define meaningful and actionable reliability metrics to track and improve system and service reliability

Requirements For Senior DGX Cloud AI Infrastructure Software Engineer

Python · Linux · Kubernetes
  • 8+ years of experience developing software infrastructure for large-scale AI systems
  • Bachelor's degree or higher in Computer Science or related technical field
  • Strong debugging skills and experience analyzing and triaging failures in AI applications
  • Proven track record in building and scaling large-scale distributed systems
  • Experience with AI training, inference, and data infrastructure services
  • Familiarity with operating large-scale observability platforms (ELK, Prometheus, Loki)
  • Proficiency in programming languages such as Python and C/C++, as well as scripting languages
  • Excellent communication and collaboration skills

Benefits For Senior DGX Cloud AI Infrastructure Software Engineer

  • Equity

Jobs Related To NVIDIA Senior DGX Cloud AI Infrastructure Software Engineer

Senior Deep Learning Software Engineer, Recipe Pathfinding

Senior Deep Learning Software Engineer role at NVIDIA focusing on developing software systems for LLM optimization through recipe pathfinding and efficiency improvements.

Senior Software Engineer, Digital Human Technology

Senior Software Engineer position at NVIDIA focusing on Digital Human Technology, involving AI, machine learning, and high-performance computing.

Senior AI and LLM Solutions Software Engineer

Senior AI/ML engineer role at NVIDIA developing cutting-edge AI and LLM solutions for chip design and verification processes.

Senior Prediction and Planning Machine Learning Engineer - Autonomous Vehicles

Senior ML Engineer role at NVIDIA focusing on autonomous vehicle prediction and planning, requiring 5+ years of ML experience and deep expertise in neural networks.

Senior Software Engineer - Robotics and AI

Senior Software Engineer position at NVIDIA focusing on robotics and AI, developing solutions for humanoid robots and embodied agents.