OCI (Oracle Cloud Infrastructure) AI Infrastructure is building a cutting-edge, ultra-high-performance GPU platform for AI/ML/HPC workloads. The team designs and develops fundamental architectural changes for GPU delivery, health monitoring, triage automation, and diagnostic services. These services are essential for running distributed AI/ML/HPC workloads across thousands of GPUs connected by high-performance networking technologies such as RoCE (RDMA over Converged Ethernet) and InfiniBand.
The role offers opportunities to work on innovative projects and build groundbreaking solutions from scratch. You'll be part of a young, fast-growing team working on ambitious initiatives in a dynamic, agile environment where learning and adaptability are key.
We're seeking adaptable, self-motivated engineers with a deep understanding of distributed systems and algorithms. The ideal candidate is comfortable diving deep into any part of the stack, excels at software debugging and low-level systems troubleshooting, and values simplicity and scalability in design and implementation.
This position offers competitive compensation ($79,800 to $178,100) with comprehensive benefits, including medical, dental, and vision insurance, a 401(k) with company match, flexible vacation, and parental leave. Join us in pushing the boundaries of AI technology while working with GPU infrastructure at scale.