Senior DGX Cloud AI Infrastructure Software Engineer

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twin technology.
Cloud
Senior Software Engineer
Hybrid
5,000+ Employees
8+ years of experience
AI · Enterprise SaaS

Description For Senior DGX Cloud AI Infrastructure Software Engineer

Join NVIDIA's DGX Cloud Team and contribute to the cutting-edge infrastructure powering AI innovation. As a Senior DGX Cloud AI Infrastructure Software Engineer, you'll be at the forefront of designing and building systems that enable large-scale AI training and inferencing. The role combines deep technical expertise with the opportunity to shape the future of AI infrastructure.

NVIDIA, the inventor of the GPU and leader in accelerated computing, offers a unique environment where you'll work on groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. The position involves developing and optimizing infrastructure software for AI systems, implementing resilient architectures, and ensuring high availability of critical AI platforms.

You'll be part of a dynamic team that values learning, growth, and innovation. The role offers significant autonomy while providing the support and mentorship needed to succeed. You'll work with state-of-the-art technology, including NVIDIA GPUs, advanced network technologies, and modern AI frameworks like PyTorch, TensorFlow, and JAX.

The ideal candidate combines strong technical skills in distributed systems and AI infrastructure with excellent problem-solving abilities. You'll have the opportunity to impact the efficiency and reliability of AI systems used by researchers and developers worldwide. The position offers exposure to cutting-edge AI technologies and the chance to work on meaningful projects that advance the field of artificial intelligence.

NVIDIA's culture promotes intellectual curiosity, diversity, and open collaboration. The company's work extends beyond traditional software development, opening up new universes to explore and enabling innovations from artificial intelligence to autonomous vehicles. If you're passionate about building the infrastructure that powers the next wave of AI innovation, this role offers an exciting opportunity to make a significant impact.

Last updated a day ago

Responsibilities For Senior DGX Cloud AI Infrastructure Software Engineer

  • Develop infrastructure software and tools for large-scale AI, LLM, and GenAI infrastructure
  • Develop and optimize tools to improve infrastructure efficiency and resiliency
  • Root-cause, analyze, and triage failures from the application level to the hardware level
  • Enhance infrastructure and products underpinning NVIDIA's AI platforms
  • Co-design and implement APIs for integration with NVIDIA's resiliency stacks
  • Define meaningful and actionable reliability metrics to track and improve system and service reliability

Requirements For Senior DGX Cloud AI Infrastructure Software Engineer

Python
Linux
  • Bachelor's degree or higher in Computer Science or related technical field
  • 8+ years of experience in developing software infrastructure for large-scale AI systems
  • Strong debugging skills and experience in analyzing and triaging AI applications
  • Proven track record in building and scaling large-scale distributed systems
  • Experience with AI training, inferencing, and data infrastructure services
  • Familiarity with operating large-scale observability platforms (ELK, Prometheus, Loki)
  • Proficiency in programming languages such as Python and C/C++, as well as scripting languages
  • Excellent communication and collaboration skills
