
Site Reliability Engineer (SRE)

Air Apps: AI-powered Personal & Entrepreneurial Resource Planner (PRP) company, founded in 2018, with over 100 million downloads worldwide.
Site Reliability
Senior Software Engineer
In-Person
4+ years of experience
AI · Enterprise SaaS
This job posting is no longer active.

Job Description

Air Apps is a technology company founded in 2018, with offices in Lisbon and San Francisco, that develops an AI-powered Personal & Entrepreneurial Resource Planner (PRP). With over 100 million downloads worldwide, it is seeking a Site Reliability Engineer (SRE) to join its Platform DevOps Squad in Lisbon.

The role calls for at least 4 years of experience in SRE, DevOps, or System Engineering. The engineer in this role will be responsible for the reliability, availability, and scalability of systems across cloud environments. Key responsibilities include implementing automation, monitoring, and performance optimization using tools such as Prometheus, Grafana, and Terraform.

This position offers the opportunity to work with modern infrastructure tooling and contribute to a rapidly growing platform. The successful candidate will be involved in critical aspects of infrastructure management, from designing fault-tolerant systems to optimizing cloud costs and maintaining high availability.

The company offers competitive benefits including flexible working hours, Apple hardware, health insurance, and unique perks like the Air Conference 2025 in Las Vegas. Air Apps maintains a strong commitment to diversity and inclusion, fostering an environment where varied perspectives are valued and celebrated.

For engineers passionate about infrastructure, automation, and system reliability, this role presents an exciting chance to make a significant impact while working with a dynamic, forward-thinking team that's reshaping resource management through AI-driven solutions.

Last updated 3 months ago

Responsibilities For Site Reliability Engineer (SRE)

  • Design and implement scalable, reliable, and fault-tolerant systems across cloud environments
  • Develop and maintain observability tools, including monitoring, logging, and alerting
  • Automate infrastructure provisioning, deployment, and incident response using Infrastructure as Code
  • Optimize system performance, scalability, and incident response workflows
  • Work closely with development and DevOps teams to improve system design
  • Conduct root cause analysis and implement preventative measures
  • Ensure high availability through load balancing, failover, and disaster recovery strategies
  • Improve CI/CD pipelines
  • Optimize cloud cost and resource utilization
  • Participate in on-call rotations

Requirements For Site Reliability Engineer (SRE)

Python · Go · Kubernetes · Linux
  • 4+ years of experience in Site Reliability Engineering, DevOps, or System Engineering
  • Strong knowledge of cloud platforms (AWS, Azure, or GCP) and cloud-native architectures
  • Experience with observability and monitoring tools
  • Proficiency in Infrastructure as Code tools
  • Hands-on experience with containerization and orchestration
  • Strong Linux system administration and networking fundamentals
  • Experience with incident management, debugging, and root cause analysis
  • Proficiency in scripting (Bash, Python, or Go)
  • Knowledge of load balancing, failover strategies, and distributed systems
  • Understanding of security best practices
  • Strong communication skills

Benefits For Site Reliability Engineer (SRE)

Medical Insurance · Commuter Benefits
  • Remote-first approach with flexible working hours
  • Apple hardware ecosystem for work
  • Flexible Paid Time Off (PTO)
  • Annual Bonus
  • Top-tier Health Insurance
  • Public Transportation Pass
  • Coverflex benefits package
  • Air Conference 2025 in Las Vegas