Principal Site Reliability Engineer

As a world leader in cloud solutions, Oracle uses tomorrow's technology to tackle today's problems.
Site Reliability
Principal Software Engineer
In-Person
5,000+ Employees
8+ years of experience
Enterprise SaaS · Cloud

Job Description

At Oracle Cloud Infrastructure (OCI), we are building a more intelligent future for the cloud. As a Principal Site Reliability Engineer, you will be responsible for operating production environments, including systems and databases, that support critical business operations for Singapore's government sovereign cloud environment. You will focus on automating and optimizing operations across multiple production environments and on recommending new solutions to improve availability, performance, and supportability.

The role combines deep technical knowledge with administration and analysis of Oracle's Cloud Infrastructure to provide escalation support for complex production environment problems. You'll tackle challenges related to rapid growth, scaling, effective use of the cloud, high performance, and high availability requirements. As a senior technical leader, you'll guide junior engineers, participate in large-scale incident management, and help optimize processes and procedures.

This position requires Singaporean citizenship and security clearance due to its work with government projects. You'll be part of a team that operates on a rotational shift basis, ensuring 24/7 support for critical infrastructure. The role offers opportunities to work with cutting-edge cloud technologies while contributing to national infrastructure projects.

Oracle offers a comprehensive benefits package including medical insurance, life insurance, and retirement options. We promote work-life balance and provide opportunities for community involvement through volunteer programs. As a world leader in cloud solutions, we use tomorrow's technology to tackle today's challenges, making this an excellent opportunity for experienced SRE professionals looking to make a significant impact.

The ideal candidate will bring extensive experience in Linux system administration, cloud operations, and modern DevOps tools, combined with strong leadership abilities and a track record of solving complex technical challenges at scale.

Last updated 7 days ago

Responsibilities For Principal Site Reliability Engineer

  • Develop automation and optimizations focused on operational excellence
  • Investigate systemic issues in depth, identify their root causes, and resolve them
  • Install, monitor, maintain, support, and optimize all production server hardware and software
  • Provide escalated technical support for complex technical issues
  • Coordinate escalated support cases and lead technical resources
  • Lead communications with key partners in solving complex technical problems
  • Provide technical guidance and leadership to junior members

Requirements For Principal Site Reliability Engineer

Linux
Kubernetes
  • Experience with Linux System Administration, Networking, Storage, Compute, and Virtualization
  • Understanding of and experience with Kubernetes, Terraform, Ansible, Chef, and Puppet
  • Experience participating in or running incident bridges of significant scale
  • Experience in SRE, cloud technical support, cloud operations, a NOC, or a similar function
  • Must be a Singaporean citizen
  • Security clearance required
  • 6 to 10+ years of experience

Benefits For Principal Site Reliability Engineer

Medical Insurance
Vision Insurance
Dental Insurance
  • Flexible medical insurance
  • Life insurance
  • Retirement options
  • Volunteer programs