Site Reliability Engineer – AIOps

A world leader in cloud solutions using tomorrow's technology to tackle today's challenges, partnering with industry leaders for over 40 years.
Site Reliability
Staff Software Engineer
In-Person
5,000+ Employees
8+ years of experience
AI · Enterprise SaaS · Cloud

Description For Site Reliability Engineer – AIOps

Oracle is seeking a Site Reliability Engineer specializing in AIOps to join its team. This role combines traditional SRE practices with AI/ML technologies to improve system reliability at scale. You'll design and implement AI-powered solutions for monitoring, anomaly detection, and automated incident response across Oracle's cloud infrastructure. The position requires expertise in both machine learning and operational systems, and involves work on large-scale distributed systems. You'll collaborate with cloud architects, data engineers, and SRE teams to transform reliability practices through AI. The role offers exposure to state-of-the-art technologies and the chance to shape the future of cloud operations at one of the world's leading technology companies. Oracle provides comprehensive benefits, promotes work-life balance, and fosters an inclusive environment. This is an opportunity for experienced engineers who want to combine AI/ML with operational excellence to improve the reliability of cloud infrastructure.

Last updated 16 days ago

Responsibilities For Site Reliability Engineer – AIOps

  • Design, build, and deploy AI/ML models for analyzing large-scale monitoring and telemetry data
  • Develop algorithms for anomaly detection and predictive maintenance
  • Implement AI-powered automation for incident management
  • Design data pipelines for monitoring and log data
  • Build dashboards and visualizations for AI-driven insights
  • Partner with the SRE team to align AIOps initiatives
  • Integrate AI-driven tools into observability platforms
  • Research and implement state-of-the-art AIOps tools
  • Mentor junior engineers in AI/ML methodologies

Requirements For Site Reliability Engineer – AIOps

Python
Kubernetes
  • 3+ years of experience in machine learning, data science, or AI-driven automation
  • Proficiency in Python, TensorFlow, PyTorch
  • Experience with cloud platforms (OCI, AWS, Azure, or GCP)
  • Knowledge of cloud monitoring tools (Prometheus, Grafana, OpenSearch)
  • Experience with data processing tools (Apache Kafka, Apache Spark)
  • Knowledge of SRE principles
  • Bachelor's or Master's degree in Computer Science, Data Science, or Engineering
  • Experience with containerization and orchestration technologies
  • Knowledge of observability and telemetry standards

Benefits For Site Reliability Engineer – AIOps

Medical Insurance
Vision Insurance
Dental Insurance
401k
Parental Leave
  • Competitive benefits package
  • Work-life balance
  • Medical, life insurance, and retirement options
  • Volunteer programs

Jobs Related To Oracle Site Reliability Engineer – AIOps

Site Reliability Developer 3

Site Reliability Developer role at Oracle focusing on cloud infrastructure, automation, and system reliability with emphasis on security and scalability.

Site Reliability Developer 3

Oracle is hiring a Site Reliability Developer 3 to design, implement, and maintain secure, scalable infrastructure for cloud services, focusing on automation and system reliability.

Site Reliability Developer 4

Senior Site Reliability Developer position at Oracle focusing on infrastructure cloud services, automation, and system optimization.

Site Reliability Developer 4

Senior Site Reliability Engineering role at Oracle focusing on cloud infrastructure automation and reliability, offering $206K+ and comprehensive benefits in Redwood City.