Site Reliability Engineer

Electrum Payments

Next-generation payments technology company providing cloud-native software to optimize financial transaction processing since 2012.

Cape Town, South Africa

Site Reliability

Mid-Level Software Engineer

In-Person

3+ years of experience

Finance

This job posting is no longer active. 😔

Job Description

Electrum Payments is a leading payments technology company that has been delivering enterprise-grade payment solutions since 2012. The company specializes in cloud-native software for optimizing financial transaction processing, focusing on high-volume, low-value payment schemes. As a Site Reliability Engineer, you'll help ensure the reliability and performance of critical payment systems that impact millions of South Africans daily.

The role combines traditional IT operations with software engineering expertise, requiring you to build and maintain robust, scalable systems. You'll work on critical tasks including incident prevention, infrastructure management, monitoring system implementation, and ensuring smooth cloud operations. The position offers significant opportunities for personal growth and career progression within a company that values technical excellence.

Key responsibilities include developing reliable applications, managing critical incidents, implementing monitoring solutions, and driving cost-optimization initiatives. You'll also be involved in disaster recovery planning and system performance optimization. The ideal candidate has strong technical skills in AWS services and observability tools, along with a solid background in SRE practices.

The company offers an excellent work environment with a strong focus on work-life balance. Benefits include flexible working hours, daily cooked lunches, and regular team social activities. Electrum fosters a transparent culture where learning from mistakes is encouraged, making it an ideal place for professional growth and development.

Last updated 10 months ago

Responsibilities For Site Reliability Engineer

Monitor, automate, and improve the reliability, scalability, performance, and availability of services
Collaborate with teams to develop reliable, available, and scalable applications
Participate in on-call rotations and manage critical incidents
Develop and maintain incident response processes and alerting mechanisms
Diagnose and resolve infrastructure and system-level issues
Implement automation tools and frameworks for deployment, configuration, and monitoring processes
Design and implement disaster recovery strategies
Drive cost-optimization initiatives

Requirements For Site Reliability Engineer

Kubernetes

Bachelor's degree in Computer Science, Information Technology, or related field
3+ years experience in an SRE or similar role
Familiarity with AWS services like EC2, S3, RDS, Lambda, EKS and CloudWatch
Experience with observability tools like Elastic and Grafana
Development skills advantageous
Strong troubleshooting and problem-solving skills
Excellent collaboration, communication, and time management skills
Attention to detail and ability to work effectively in a team environment

Benefits For Site Reliability Engineer

Flexible core working hours
Daily cooked lunches
Stocked kitchen
Team socializing and getaways
Social outings