Principal Software Developer - AI Infra Compute

Oracle is a world leader in cloud solutions, using tomorrow's technology to tackle today's challenges. The company has partnered with industry leaders in almost every sector and has operated with integrity for over 40 years.
$96,800 - $223,400
Distributed Systems
Principal Software Engineer
In-Person
5,000+ Employees
6+ years of experience
AI · Enterprise SaaS

Description For Principal Software Developer - AI Infra Compute

OCI (Oracle Cloud Infrastructure) AI Infrastructure is at the forefront of building a cutting-edge, ultra-high-performance GPU platform designed to support AI/ML/HPC workloads. This is an opportunity to be part of the AI revolution, creating systems that allow customers to scale from tens to thousands of GPUs without compromising performance.

The team is responsible for designing and developing fundamental architectural changes for GPU delivery, health monitoring, triage automation, and diagnostic services. These are essential for running distributed AI/ML/HPC workloads across thousands of GPUs, leveraging technologies like RoCE and InfiniBand.

As a Principal Software Developer, you'll be working on innovative projects building groundbreaking solutions from the ground up. You'll be part of a young, fast-growing team working on ambitious new initiatives in a dynamic, agile environment where learning and adaptability are key.

The role requires a self-motivated individual with strong technical skills in distributed systems and algorithms. You should be comfortable diving deep into any part of the stack, including software debugging and low-level systems troubleshooting. The ideal candidate values simplicity and scalability in design and implementation, and can collaborate effectively with the teams this work depends on, including Network and Data Center operations.

This position offers competitive compensation ($96,800 - $223,400) along with comprehensive benefits including medical/dental/vision insurance, 401(k) with company match, flexible vacation, and parental leave. Join Oracle's AI Infrastructure team and be part of pushing the boundaries of AI technology while working with cutting-edge GPU systems and distributed computing challenges.

Responsibilities For Principal Software Developer - AI Infra Compute

  • Designing, implementing, and delivering software and firmware for managing GPU-based AI servers
  • Working closely with partner teams to deliver high-quality software to manage, triage, and repair GPU systems
  • Working closely with product teams to debug and resolve customer issues

Requirements For Principal Software Developer - AI Infra Compute

Linux
MySQL
Redis
Go
Java
Python
  • BS or MS degree in Computer Science or relevant technical field
  • Deep understanding of operating systems, computer networks, and high-performance applications
  • 6+ years of experience delivering and operating large-scale production systems
  • Proficient in one programming language (Java/Python/C/C++/Go/shell scripting)
  • Strong background in Linux systems
  • Experience with Server/GPU hardware architecture and system management
  • Experience with InfiniBand or RoCE networking
  • Good understanding of databases and SQL (MySQL) and caching technologies

Benefits For Principal Software Developer - AI Infra Compute

  • Medical, dental, and vision insurance
  • Short-term and long-term disability
  • Life insurance and AD&D
  • Health care and dependent care Flexible Spending Accounts
  • 401(k) Savings with company match
  • Flexible Vacation
  • 11 paid holidays
  • Paid sick leave
  • Paid parental leave
  • Adoption assistance
  • Employee Stock Purchase Plan
