OCI (Oracle Cloud Infrastructure) AI Infrastructure is building a cutting-edge, ultra-high-performance GPU platform for AI/ML/HPC workloads. The role focuses on designing and developing fundamental architectural changes for GPU delivery, health monitoring, triage automation, and diagnostic services. These services are essential for running distributed AI/ML/HPC workloads across thousands of GPUs connected by technologies such as RoCE and InfiniBand.
The position offers the opportunity to build innovative solutions from the ground up, to join a young, fast-growing team working on ambitious initiatives, and to collaborate in a vibrant, agile environment where learning and adaptability are key.
We're seeking self-motivated engineers with strong distributed systems expertise who can quickly adapt and learn. The ideal candidate will have rock-solid development skills and a deep understanding of distributed systems and algorithms, and will be comfortable with low-level systems troubleshooting. They should value simplicity and scalability in design and collaborate effectively across teams.
The role offers competitive compensation ($96,800 - $223,400) plus comprehensive benefits including medical/dental/vision insurance, 401(k) with match, flexible vacation, and parental leave. This is an opportunity to be part of Oracle's cutting-edge AI infrastructure team while working on challenging technical problems at scale.
Join us to push the boundaries of AI technology while building critical infrastructure used by customers worldwide. The position requires 6+ years of experience and offers both technical challenges and career growth opportunities.