OCI (Oracle Cloud Infrastructure) AI Infrastructure is seeking a Senior Software Developer to join its cutting-edge GPU platform team. This role is at the forefront of the AI revolution, focusing on building and maintaining systems that enable customers to scale from tens to thousands of GPUs without compromising performance.
The position involves working with high-performance networking technologies, including RoCE (RDMA over Converged Ethernet) and InfiniBand, and designing and developing fundamental architectural changes to GPU delivery, health monitoring, triage automation, and diagnostic services. These systems are essential for supporting distributed AI/ML/HPC workloads across massive GPU clusters.
As a senior member of the team, you'll be responsible for building novel solutions from the ground up in a fast-paced, agile environment. The role requires deep technical expertise in distributed systems, strong problem-solving abilities, and excellent collaboration skills.
The ideal candidate will be a self-motivated individual who values simplicity and scalability in design, has strong debugging skills, and can work effectively across various teams including Network and Data Center operations. This is an opportunity to be part of a young, growing team working on ambitious initiatives that are shaping the future of AI infrastructure.
The position offers competitive compensation, comprehensive benefits, and the chance to work on innovative projects at scale. You'll be joining Oracle, an established leader in cloud solutions with a 40+ year track record, while working on the technology that underpins large-scale AI computing.