Senior Performance Research and Analysis Engineer

NVIDIA is the world leader in accelerated computing, pioneering solutions in AI and digital twins transforming major industries.
Senior Software Engineer
Remote
5+ years of experience
AI

Description For Senior Performance Research and Analysis Engineer

NVIDIA, the world leader in accelerated computing, is seeking a Senior Performance Research and Analysis Engineer to join its Performance group. This role focuses on profiling and analyzing AI workloads on large-scale GPU and CPU clusters, specifically for distributed Deep Learning LLM training.

The position offers a unique opportunity to work with cutting-edge hardware and platforms, including HCAs, Switches, CPUs, GPUs, and Systems. You'll be at the forefront of performance optimization for AI systems, developing and implementing analysis tools and methodologies to understand performance expectations, limitations, and bottlenecks.

Key responsibilities include researching AI workloads and DL models for large-scale training, conducting comprehensive performance analysis, and collaborating across hardware and software teams. The role requires expertise in high-performance networking, with a focus on RDMA and MPI, along with strong programming skills in Python, Bash, and C.

The ideal candidate will have at least 5 years of experience in high-performance networking, a strong background in computer science or software engineering, and demonstrated expertise with NVIDIA GPUs and deep learning frameworks. This position offers the opportunity to work with state-of-the-art technology and contribute to advancing AI computing performance.

NVIDIA provides a diverse and inclusive work environment, offering the chance to work on transformative technology that impacts major industries worldwide. The remote work option across multiple European locations provides flexibility while working with global teams on cutting-edge AI and computing challenges.

Last updated 7 months ago

Responsibilities For Senior Performance Research and Analysis Engineer

  • Profile and analyze AI workloads on large-scale GPU and CPU clusters for distributed Deep Learning LLM training
  • Research AI workloads and DL models for large-scale deep learning LLM training
  • Benchmark, profile, and analyze performance to find bottlenecks
  • Implement performance analysis tools
  • Collaborate with hardware and software teams
  • Define performance test planning and set performance expectations

Requirements For Senior Performance Research and Analysis Engineer

  • B.Sc. in Computer Science or Software Engineering
  • 5+ years of experience with high-performance networking (RDMA, MPI)
  • Performance Analysis skills and methodologies
  • Experience with NVIDIA GPUs, the CUDA library, and deep learning frameworks
  • Fast, self-directed learner with strong analytical skills
  • Programming languages: Python, Bash, and C
  • Experience with Linux distributions
  • Team player with good communication skills
