Site Reliability Engineer

Next-generation payments technology company providing cloud-native software to optimize financial transaction processing since 2012.
Site Reliability
Mid-Level Software Engineer
In-Person
3+ years of experience
Finance
Description For Site Reliability Engineer

Electrum Payments is a leading payments technology company that has been delivering enterprise-grade payment solutions since 2012. It specializes in cloud-native software for optimizing financial transaction processing, with a focus on high-volume, low-value payment schemes. As a Site Reliability Engineer, you'll be responsible for the reliability and performance of critical payment systems that millions of South Africans rely on daily.

The role combines traditional IT operations with software engineering expertise, requiring you to build and maintain robust, scalable systems. You'll work on critical tasks including incident prevention, infrastructure management, monitoring system implementation, and ensuring smooth cloud operations. The position offers significant opportunities for personal growth and career progression within a company that values technical excellence.

Key responsibilities include developing reliable applications, managing critical incidents, implementing monitoring solutions, and driving cost-optimization initiatives. You'll also be involved in disaster recovery planning and system performance optimization. The ideal candidate has strong technical skills in AWS services and observability tools, along with a solid background in SRE practices.

The company offers an excellent work environment with a strong focus on work-life balance. Benefits include flexible working hours, daily cooked lunches, and regular team social activities. Electrum fosters a transparent culture where learning from mistakes is encouraged, making it an ideal place for professional growth and development.

Last updated 8 months ago

Responsibilities For Site Reliability Engineer

  • Monitor, automate, and improve the reliability, scalability, performance, and availability of services
  • Collaborate with teams to develop reliable, available, and scalable applications
  • Participate in on-call rotations and manage critical incidents
  • Develop and maintain incident response processes and alerting mechanisms
  • Diagnose and resolve infrastructure and system-level issues
  • Implement automation tools and frameworks for deployment, configuration, and monitoring processes
  • Design and implement disaster recovery strategies
  • Drive cost-optimization initiatives

Requirements For Site Reliability Engineer

Skills: Kubernetes
  • Bachelor's degree in Computer Science, Information Technology, or related field
  • 3+ years of experience in an SRE or similar role
  • Familiarity with AWS services like EC2, S3, RDS, Lambda, EKS and CloudWatch
  • Experience with observability tools like Elastic and Grafana
  • Software development skills are advantageous
  • Strong troubleshooting and problem-solving skills
  • Excellent collaboration, communication, and time management skills
  • Attention to detail and ability to work effectively in a team environment

Benefits For Site Reliability Engineer

  • Flexible core working hours
  • Daily cooked lunches
  • Stocked kitchen
  • Team socializing and getaways
  • Social outings
