Senior AI Cluster Tools Developer

NVIDIA is the world leader in accelerated computing, pioneering solutions for challenges no one else can solve. Their work in AI and digital twins is transforming major industries and impacting society.
$148,000 - $276,000
Backend
Senior Software Engineer
Hybrid
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Senior AI Cluster Tools Developer

NVIDIA is seeking a Senior AI Cluster Tools Developer to join their multifaceted software team. This role involves developing tools for GPU cluster users and admins and working with teams such as Architecture and Software. The successful candidate will build internal performance/power profiling and analysis tools for AI workloads at cluster scale, create debugging tools for common GPU cluster problems, and collaborate with users to build and calibrate performance/power models for next-generation hardware or systems.

Key responsibilities include:

  • Developing internal perf/power profiling and analysis tools for AI workloads at cluster scale
  • Creating debugging tools for common GPU cluster issues
  • Collaborating with users to build and calibrate perf/power models
  • Partnering with architects to propose new hardware features or improve existing ones

Requirements:

  • BS+ in Computer Science or related field (or equivalent experience)
  • 5+ years of software development experience
  • Strong software design and implementation skills with Python/Go/C++
  • Good understanding of Deep Learning and AI frameworks (PyTorch, TensorFlow, etc.)
  • Knowledge of AI cluster job scheduling, storage management, and networking management
  • Linux kernel knowledge
  • Excellent problem-solving and project management skills

Preferred qualifications:

  • Experience with continuous profiling and analysis tools or platforms at GPU cluster scale
  • Solid experience in troubleshooting large AI jobs and in failure detection and recovery
  • Skilled in Deep Learning application performance analysis and optimization
  • Knowledge of GPU/CPU architecture and application performance or power efficiency analysis

NVIDIA offers competitive salaries, comprehensive benefits, and the opportunity to work with some of the most brilliant and talented people in the world. The company is committed to fostering a diverse work environment and is an equal opportunity employer.

Last updated 17 days ago

Responsibilities For Senior AI Cluster Tools Developer

  • Build internal perf/power profiling and analysis tools and platforms for AI workloads at cluster scale
  • Build debugging tools for commonly encountered problems in GPU clusters
  • Work with users to build and calibrate perf/power models for next-generation hardware or systems
  • Partner with architects to propose new hardware features or improve existing features based on real-world use cases

Requirements For Senior AI Cluster Tools Developer

  • BS+ in Computer Science or a related field (or equivalent experience)
  • 5+ years of software development experience
  • Strong software design and implementation ability with Python/Go/C++
  • Good understanding of Deep Learning and AI frameworks such as PyTorch and TensorFlow
  • Knowledge of AI cluster job scheduling, storage management, and networking management
  • Knowledge of the Linux kernel
  • Excellent problem-solving and project management skills

Benefits For Senior AI Cluster Tools Developer

  • Equity
  • Comprehensive benefits package

