OCI (Oracle Cloud Infrastructure) AI Infrastructure is at the forefront of building a cutting-edge, ultra-high-performance GPU platform designed to support AI/ML/HPC workloads. This is an opportunity to be part of the AI revolution, creating systems that allow customers to scale from tens to thousands of GPUs without compromising performance.
The team designs and develops fundamental architectural changes for GPU delivery, health monitoring, triage automation, and diagnostic services. These services are essential for running distributed AI/ML/HPC workloads across thousands of GPUs, using technologies such as RoCE and InfiniBand.
As a Principal Software Developer, you'll build new solutions from the ground up. You'll be part of a young, fast-growing team working on ambitious new initiatives in a dynamic, agile environment where learning and adaptability are key.
The role requires a self-motivated individual with strong technical skills in distributed systems and algorithms. You should be comfortable diving deep into any part of the stack, debugging software, and troubleshooting low-level systems. The ideal candidate values simplicity and scalability in design and implementation, and can collaborate effectively with the teams this work depends on, including Network and Data Center operations.
This position offers competitive compensation ($96,800 - $223,400) along with comprehensive benefits including medical/dental/vision insurance, 401(k) with company match, flexible vacation, and parental leave. Join Oracle's AI Infrastructure team and be part of pushing the boundaries of AI technology while working with cutting-edge GPU systems and distributed computing challenges.