We are looking for a Principal Site Reliability Engineer to join our Oracle Cloud Infrastructure (OCI) team. This role is part of a globally distributed team responsible for detecting, triaging, and mitigating OCI service-impacting events as quickly as possible. You will join one of its regional teams and will be responsible for minimizing the downtime of OCI services. You will achieve this by delivering excellent major incident management and by operating highly scalable, performant, and secure systems that help prevent incidents from occurring.
Oracle's Cloud is state-of-the-art and constantly evolving. When issues arise, your team will respond within minutes to ensure customer impact is minimized. This role will expose you to the inner workings of OCI's systems and organization. You will interact with and influence leaders across Oracle and drive broad, cross-organization programs aimed at iteratively improving OCI-wide service availability. We are an agile team with significant impact.
As a Principal SRE, you will be responsible for the design and delivery of mission-critical infrastructure, focusing on security, resiliency, scale, and performance. You will work closely with development teams to improve service architecture and implement best practices for cloud operations. The role requires deep technical expertise in cloud platforms, automation, and modern DevOps practices, making you a key contributor to Oracle's cloud infrastructure reliability and performance.