Site Reliability Engineer, AI/ML Platforms

Adobe

Adobe is a global leader in digital experiences, helping everyone from emerging artists to global brands create and deliver exceptional digital content.

San Jose, CA, USA

$133,900 - $242,000

Site Reliability

Staff Software Engineer

In-Person

5,000+ Employees

5+ years of experience

AI · Enterprise SaaS

Description For Site Reliability Engineer, AI/ML Platforms

Adobe is seeking an exceptional Site Reliability Engineer to join its AI Training and Inference Platforms team within Adobe Firefly. This is a unique opportunity to work at the intersection of Site Reliability Engineering and cutting-edge AI technology at one of the world's leading software companies.

The role focuses on building, scaling, and securing Adobe's AI Platform, which enables Firefly product teams to efficiently manage and deploy Machine Learning capabilities across Adobe's client applications. You'll be working with a team of SREs to support a platform that will handle thousands of ML models from Adobe's Applied Research groups and App Teams, spanning various lifecycle stages from early research to production deployment.

As an SRE, you'll be responsible for ensuring the platform's reliability, scalability, and efficiency across multiple cloud environments. Your work will directly impact Adobe's ability to deliver high-quality AI services to its customers, making you a crucial part of Adobe's AI innovation journey.

The ideal candidate brings a strong background in distributed systems and container orchestration, particularly with Kubernetes. You should be comfortable with programming in Python or Go, and have experience with modern DevOps practices and tools. Your expertise in observability, automation, and system reliability will be essential in maintaining and improving the platform's performance.

This role offers the opportunity to work with cutting-edge AI/ML technologies while solving complex technical challenges at scale. You'll collaborate with talented engineers across Adobe, contributing to the company's mission of changing the world through digital experiences. The position comes with competitive compensation, comprehensive benefits, and the chance to work on technology that impacts millions of users worldwide.

If you're passionate about reliability engineering, excited about AI/ML technologies, and want to work on infrastructure that powers next-generation creative tools, this role at Adobe could be your next career milestone.

Last updated 15 hours ago

Responsibilities For Site Reliability Engineer, AI/ML Platforms

Identify and implement methodologies to increase reliability, scalability, security, and efficiency
Ensure the highest uptime and Quality of Service (QoS) through operational excellence
Define service level objectives (SLOs) and indicators (SLIs)
Support and maintain globally distributed, multi-cloud environments
Automate common, repeatable tasks at large scale
Identify areas to improve service resiliency through chaos engineering and performance testing
Coordinate with other Adobe platform teams and service providers

Requirements For Site Reliability Engineer, AI/ML Platforms

Kubernetes

Python

Bachelor's or Master's degree in Computer Science, Electrical Engineering, or related field
5+ years relevant industry experience
Experience in building and scaling distributed systems
Production-level expertise with container orchestration engines (Kubernetes)
Fundamental programming skills in Python or Go
Knowledge of infrastructure configuration management tools like Ansible and Terraform
Experience with observability tools like InfluxDB, Prometheus, and Elastic Stack
Understanding of AI/ML, including ML frameworks, public cloud, and commercial AI/ML solutions
