Senior Principal Software Engineer - GPU Cluster Performance and Benchmark Engineering

Oracle is a world leader in cloud solutions, using tomorrow's technology to tackle today's challenges. It partners with industry leaders in almost every sector and has operated with integrity for 40+ years.
$96,800 - $251,600
Machine Learning
Principal Software Engineer
In-Person
5,000+ Employees
10+ years of experience
AI · Enterprise SaaS · Cloud

Description For Senior Principal Software Engineer - GPU Cluster Performance and Benchmark Engineering

We are seeking a highly skilled and experienced Large GPU Cluster Performance and Benchmark Engineer to join our advanced technology team as a Senior Principal. This role focuses on designing, optimizing, and benchmarking large-scale GPU clusters, specifically running MLPerf benchmarks from MLCommons across thousands of NVIDIA and AMD GPUs. The position offers an opportunity to be at the forefront of GPU performance benchmarking and large-scale infrastructure design, working with cutting-edge technology and a highly skilled team.

The role involves leading performance optimization for AI/ML workloads, conducting comprehensive STAC benchmarks, and architecting solutions for high-performance computing environments. You'll collaborate with cross-functional teams to develop innovative GPU cluster architectures while serving as a technical thought leader in the field.

As part of Oracle, a world leader in cloud solutions, you'll have access to extensive resources and opportunities to work on challenging projects that push the boundaries of technology. The position offers competitive compensation ($96,800 - $251,600) and comprehensive benefits, including medical, dental, and vision insurance, a 401(k) with company match, flexible vacation, and parental leave.

The ideal candidate will bring 10+ years of experience in GPU cluster architecture and benchmarking, strong programming skills, and expertise in container orchestration and cloud infrastructure. You'll need to demonstrate exceptional analytical abilities and strong communication skills to succeed in this collaborative environment.

This role represents an excellent opportunity for an experienced professional looking to make significant contributions to the advancement of GPU cluster performance and AI/ML infrastructure at a global technology leader. Join us in shaping the future of cloud computing and artificial intelligence technologies.

Responsibilities For Senior Principal Software Engineer - GPU Cluster Performance and Benchmark Engineering

  • Lead and execute performance benchmarking of large-scale GPU clusters using MLPerf
  • Conduct end-to-end STAC benchmarks for compute and storage performance
  • Design and architect complex solutions leveraging OCI services
  • Collaborate with cross-functional teams on GPU cluster architectures
  • Serve as a thought leader in GPU cluster performance optimization
  • Mentor junior engineers
  • Stay current with industry trends in GPU performance and benchmarking
  • Drive innovation in large-scale GPU clusters

Requirements For Senior Principal Software Engineer - GPU Cluster Performance and Benchmark Engineering

Python
Kubernetes
  • 10+ years of experience in GPU cluster architecture and benchmarking
  • Experience running MLPerf benchmarks across large-scale environments with thousands of NVIDIA and AMD GPUs
  • Expertise in conducting STAC benchmarks
  • Strong knowledge of GPU architectures and parallel computing
  • Proficiency with Oracle Cloud Infrastructure (OCI) services
  • Expertise in container orchestration
  • Strong programming skills in Python, C++, or CUDA
  • English language proficiency

Benefits For Senior Principal Software Engineer - GPU Cluster Performance and Benchmark Engineering

  • Medical, dental, and vision insurance
  • Short-term and long-term disability
  • Life insurance and AD&D
  • Health care and dependent care Flexible Spending Accounts
  • Pre-tax commuter and parking benefits
  • 401(k) with company match
  • Flexible Vacation
  • 11 paid holidays
  • Paid sick leave
  • Paid parental leave
  • Adoption assistance
  • Employee Stock Purchase Plan
