Apple's Manufacturing Systems & Infrastructure (MSI) team is seeking a Senior Site Reliability Engineer to play a critical role in maintaining and enhancing the reliability of production systems. The role involves collaborating with engineering teams to design, implement, and monitor infrastructure and services, drawing on expertise in automation and performance optimization.
Key responsibilities include:
- Designing, developing, and maintaining scalable, reliable, and efficient infrastructure
- Implementing monitoring, alerting, and logging systems
- Automating tasks and improving system efficiency
- Collaborating with development teams to improve service reliability
- Conducting root cause analysis of system failures
- Participating in on-call rotations and incident response
- Driving continuous improvement initiatives
- Mentoring junior team members
The ideal candidate will have:
- 7+ years of experience in site reliability engineering, DevOps, or a related field
- Bachelor's degree in Computer Science or Engineering, or equivalent practical experience
- Strong experience with cloud platforms (AWS, Google Cloud Platform, or Microsoft Azure)
- Proficiency in infrastructure as code tools (Terraform, Ansible, or CloudFormation)
- Expertise in containerization and orchestration (Docker, Kubernetes, Helm)
- Experience with CI/CD pipelines and tools (Jenkins, Argo CD)
- Strong scripting and programming skills (Python, Go, Shell, or Ruby)
- Knowledge of monitoring tools (Prometheus, Grafana, OpenTelemetry, Splunk)
- Familiarity with version control systems (Git)
- Solid understanding of Linux/Unix system administration and networking
- Experience with database management and optimization
- Knowledge of message brokers and streaming platforms
This role offers a competitive base pay range of $175,800 to $264,200, along with additional benefits including stock options, comprehensive medical and dental coverage, retirement benefits, and educational reimbursement opportunities.