Software Development and DevOps Engineer, EFA

Haifa, Israel
DevOps
Senior Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Enterprise SaaS

Description For Software Development and DevOps Engineer, EFA

AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to a steady stream of new products that continue to set AWS's services and features apart in the industry. As a member of the UC organization, you'll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS.

We seek a DevOps Engineer for the Machine Learning (ML) Infrastructure team to build the tools that are used to guarantee top performance of AWS ML and High Performance Computing (HPC) technologies developed by our organization. You will:

  • Be the lead engineer on a team that builds and maintains the infrastructure that monitors and reports on functionality and performance of massive testing workloads run at scale.
  • Use internal Amazon CI/CD tools, Linux, and public AWS products to automate the delivery of our software to customers.
  • Write Python code that spins up large clusters and runs benchmarks and applications for ML and HPC workloads.
  • Use AWS Managed Grafana, QuickSight, OpenSearch, and Athena to digest performance data and create dashboards.
  • Invent automatic mechanisms to alert developers to functional and performance regressions.
  • Manage complex infrastructure covering many instance types, software stacks, and Linux operating systems.
  • Ensure all infrastructure setup is defined as code (IaC), reviewed, and committed to automated pipelines.
  • Find innovative ways to schedule work using Jenkins, supporting the development team while keeping cluster costs down.
  • Review dashboard and automation results, triage failures, and introduce new tests and platforms.
  • Create reports and status updates on the CI/CD system for stakeholders.

Join us as we expand the AWS offerings for AI, including Trainium, Neuron, and the Elastic Fabric Adapter (EFA).

Last updated 5 days ago

Responsibilities For Software Development and DevOps Engineer, EFA

  • Serve as lead engineer on the team responsible for infrastructure monitoring and reporting
  • Automate software delivery using Amazon CI/CD tools and AWS products
  • Develop Python code for large cluster management and benchmarking
  • Create dashboards using AWS Managed Grafana, QuickSight, OpenSearch, and Athena
  • Implement automatic alerting mechanisms for regressions
  • Manage complex infrastructure across various instance types and software stacks
  • Implement Infrastructure as Code (IaC) practices
  • Optimize work scheduling using Jenkins
  • Review and triage automation results
  • Report on CI/CD system status to stakeholders

Requirements For Software Development and DevOps Engineer, EFA

Python
Linux
  • 3+ years of non-internship professional software development experience
  • 3+ years of non-internship experience designing or architecting new and existing systems (design patterns, reliability, and scaling)
  • 3+ years of full software development life cycle experience
  • Bachelor's degree in computer science or equivalent
  • 3+ years of experience coding in Python
  • Experience developing highly automated CI/CD pipelines (Jenkins preferred)

