Site Reliability Engineer

Tecsys Inc.

Tecsys is a global supply chain technology company that helps organizations achieve operational excellence through smarter supply chains.

Bengaluru, Karnataka, India

Site Reliability

Senior Software Engineer

Hybrid

501 - 1,000 Employees

5+ years of experience

Enterprise SaaS · Logistics

Job Description

Tecsys, a global supply chain technology company, is expanding its presence with a new office in Bengaluru, India, and is seeking a Site Reliability Engineer to join its Network and Security Operations Center (NSOC) team. The role involves working with a high degree of autonomy while collaborating with teams across time zones, particularly in North America.

The position focuses on improving platform reliability and uptime through data-driven approaches, implementing automation, and maintaining critical infrastructure. The engineer will work on large-scale systems and contribute to the company's 24/7 "follow the sun" global support model. Because of this international collaboration, the role requires flexibility in working hours and includes on-call responsibilities.

The ideal candidate has strong experience in systems engineering, cloud platforms (AWS/Azure), and automation tools. This is an opportunity for an experienced SRE to join a growing global team that is transforming supply chain technology, working with modern tools and practices including CI/CD, monitoring systems such as Datadog, and cloud platforms. The role combines technical expertise with cross-functional collaboration, making it a good fit for someone who enjoys both technical challenges and teamwork.

Responsibilities For Site Reliability Engineer

Collaborate with Engineering teams to support services through system design consulting and by developing platforms and frameworks
Maintain services by measuring and monitoring availability, latency and system health
Develop tools and automation on top of Azure and AWS
Scale systems through automation and improve reliability
Practice sustainable incident response and blameless postmortems
Implement CI/CD automation
Implement monitoring, logging, alerting, and SLA reporting
Create and maintain technical documentation
Take command of high-severity incidents
Collaborate with Platform Engineering team

Requirements For Site Reliability Engineer

Java

Kubernetes

Linux

Bachelor's degree in computer science or related technical discipline
5+ years of systems engineering experience
Experience designing and deploying large-scale systems
Strong knowledge of system design and high-performance computing
Experience with full stack automation
Knowledge of Datadog or similar tools
Knowledge of and experience with AWS or Azure (required)
Basic knowledge of Java- or .NET-based development
Knowledge of GitLab or Jenkins
Proficient English communication skills
Experience at a SaaS company preferred
Experience with FedRAMP compliance is an asset