Site Reliability Engineer (SRE)

Air Apps

AI-powered Personal & Entrepreneurial Resource Planner (PRP) company with over 100 million downloads worldwide, founded in 2018.

Lisbon, Portugal

Site Reliability

Senior Software Engineer

In-Person

4+ years of experience

AI · Enterprise SaaS

This job posting is no longer active.

Job Description

Air Apps is a technology company founded in 2018, with offices in Lisbon and San Francisco, that develops an AI-powered Personal & Entrepreneurial Resource Planner (PRP) with over 100 million downloads worldwide. The company is seeking a Site Reliability Engineer (SRE) to join its Platform DevOps Squad in Lisbon.

The role demands a seasoned professional with 4+ years of experience in SRE, DevOps, or System Engineering. The ideal candidate will be responsible for ensuring system reliability, availability, and scalability across cloud environments. Key responsibilities include implementing automation, monitoring solutions, and performance optimization strategies using tools like Prometheus, Grafana, and Terraform.

This position offers the chance to contribute to a rapidly growing platform. The successful candidate will work across infrastructure management, from designing fault-tolerant systems to optimizing cloud costs and maintaining high availability.

The company offers competitive benefits including flexible working hours, Apple hardware, health insurance, and unique perks like the Air Conference 2025 in Las Vegas. Air Apps maintains a strong commitment to diversity and inclusion, fostering an environment where varied perspectives are valued and celebrated.

For engineers passionate about infrastructure, automation, and system reliability, this role presents an exciting chance to make a significant impact while working with a dynamic, forward-thinking team that's reshaping resource management through AI-driven solutions.

Responsibilities For Site Reliability Engineer (SRE)

Design and implement scalable, reliable, and fault-tolerant systems across cloud environments
Develop and maintain observability tools, including monitoring, logging, and alerting
Automate infrastructure provisioning, deployment, and incident response using Infrastructure as Code
Optimize system performance, scalability, and incident response workflows
Work closely with development and DevOps teams to improve system design
Conduct root cause analysis and implement preventative measures
Ensure high availability through load balancing, failover, and disaster recovery strategies
Improve CI/CD pipelines
Optimize cloud cost and resource utilization
Participate in on-call rotations

Requirements For Site Reliability Engineer (SRE)

Python · Kubernetes · Linux

4+ years of experience in Site Reliability Engineering, DevOps, or System Engineering
Strong knowledge of cloud platforms (AWS, Azure, or GCP) and cloud-native architectures
Experience with observability and monitoring tools
Proficiency in Infrastructure as Code tools
Hands-on experience with containerization and orchestration
Strong Linux system administration and networking fundamentals
Experience with incident management, debugging, and root cause analysis
Proficiency in scripting (Bash, Python, or Go)
Knowledge of load balancing, failover strategies, and distributed systems
Understanding of security best practices
Strong communication skills

Benefits For Site Reliability Engineer (SRE)

Medical Insurance · Commuter Benefits

Remote-first approach with flexible working hours
Apple hardware ecosystem for work
Flexible Paid Time Off (PTO)
Annual Bonus
Top-tier Health Insurance
Public Transportation Pass
Coverflex benefits package
Air Conference 2025 in Las Vegas