
Lead Site Reliability Engineer – Cloud Platform (AWS)

Toyota Financial Services (TFS) is the finance and insurance brand for Toyota and Lexus in North America, delivering best-in-class customer experience.
Plano, TX, USA
Site Reliability
Staff Software Engineer
In-Person
5,000+ Employees
7+ years of experience
Finance · Enterprise SaaS

Job Description

Toyota Financial Services (TFS) is seeking a Lead Site Reliability Engineer to spearhead its cloud platform operations on AWS. The role sits at the intersection of infrastructure management and software engineering, and focuses on scaling and supporting the reliability, automation, and observability of TFS's AWS infrastructure. It requires a deep understanding of cloud infrastructure and SRE best practices, with responsibilities ranging from operating cloud-native infrastructure to implementing self-healing automation workflows. The ideal candidate brings 7+ years of relevant experience and strong expertise in AWS services, particularly EKS, Lambda, and CloudWAN.

The role offers comprehensive benefits, including healthcare, a 401(k) with company match, and professional development opportunities. As part of Toyota, one of the world's most admired brands, you'll work in a collaborative environment focused on innovation and delivering best-in-class customer experiences. The position is based in Plano, Texas, and offers the opportunity to work with cutting-edge cloud technologies while contributing to Toyota's vision of moving people beyond what's possible.

Last updated 18 days ago

Responsibilities For Lead Site Reliability Engineer – Cloud Platform (AWS)

  • Operate and optimize cloud-native infrastructure in AWS, with a focus on EKS, Lambda, CloudWAN, Systems Manager, and ECR
  • Build and maintain self-healing automation workflows
  • Create and manage AWS Systems Manager Automation Documents
  • Define and track SLIs/SLOs and error budgets
  • Implement observability using Dynatrace and AWS-native tools
  • Develop and maintain infrastructure as code using Terraform
  • Enhance and support CI/CD pipelines using GitHub and Harness
  • Participate in incident management and on-call rotations
  • Lead blameless postmortems
  • Collaborate with cloud development teams
  • Troubleshoot cloud infrastructure and networking issues

Requirements For Lead Site Reliability Engineer – Cloud Platform (AWS)

Python · Kubernetes
  • 7+ years of experience in SRE, DevOps, or Cloud Infrastructure roles
  • Solid understanding of SRE principles: SLIs, SLOs, error budgets, incident response
  • Hands-on experience with AWS services
  • Strong knowledge of network architecture and protocols within AWS
  • Experience building automated remediation and self-healing systems
  • Proficiency with Terraform, Python, Bash, and infrastructure as code principles
  • Experience with CI/CD tools and observability platforms
  • Familiarity with ITSM processes and cloud security best practices
  • Excellent troubleshooting, problem-solving, and collaboration skills

Benefits For Lead Site Reliability Engineer – Cloud Platform (AWS)

401k · Medical Insurance · Dental Insurance · Vision Insurance · Education Budget · Parental Leave
  • Professional growth and development programs
  • Tuition reimbursement
  • Team Member Vehicle Purchase Discount
  • Toyota Team Member Lease Vehicle Program
  • Comprehensive health care and wellness plans
  • 401(k) Savings Plan with company match
  • Annual retirement contribution
  • Paid holidays and paid time off
  • Tax Advantaged Accounts
  • Relocation assistance
