System Development Engineer, Alexa Language and Data Ops

Amazon is a global technology company that develops and maintains industry-leading multi-modal and multi-lingual large language models (LLMs) through its Artificial General Intelligence (AGI) team.
DevOps
Mid-Level Software Engineer
5,000+ Employees
3+ years of experience
AI

Description For System Development Engineer, Alexa Language and Data Ops

The Artificial General Intelligence (AGI) team at Amazon is seeking passionate, talented, and inventive engineers to play a pivotal role in the development and maintenance of industry-leading multi-modal and multi-lingual large language models (LLMs). The AGI team's mission is to leverage hyper-scalable, general-purpose large model training and inference systems to develop and deploy cutting-edge sensory AI foundational models that revolutionize machine perception, interpretation, and interaction with humans and the physical world.

Key responsibilities include:

  • Providing support for cluster and node management to ensure smooth operation of LLM infrastructure
  • Continuously improving and automating cluster/capacity/maintenance upgrades
  • Developing automation tools for improving operational excellence
  • Working on operations- and maintenance-driven coding projects, primarily in Ruby, Rails, Java, Python, or shell scripts, along with AWS and web technologies
  • Applying hands-on experience with Kubernetes and expertise in different AWS services
  • Driving company-wide campaigns with Support and Engineering teams
  • Participating in design and code reviews and identifying bottlenecks
  • Troubleshooting and researching root causes thoroughly to resolve defects

The ideal candidate should have:

  • 3+ years of administrative experience in networking, storage systems, operating systems, and hands-on systems engineering
  • Experience programming with at least one modern language such as Python, Ruby, Golang, Java, C++, C#, or Rust
  • Experience with Linux/Unix
  • Experience with CI/CD pipelines and build processes
  • Preferred: Experience with distributed systems at scale

Amazon values a "Work Hard. Have Fun. Make History" approach, with a strong focus on sharing learning experiences from the front line with development teams. The role offers various opportunities for growth and specialization, whether you prefer mastering a domain, juggling multiple tasks, implementing process improvements, or focusing on coding.

Join the AGI team at Amazon to be at the forefront of AI innovation and contribute to the development of cutting-edge language models and AI technologies.

Last updated 23 days ago

Responsibilities For System Development Engineer, Alexa Language and Data Ops

  • Provide support for cluster and node management, ensuring smooth operation of LLM infrastructure
  • Continuously improve and automate cluster/capacity/maintenance upgrades
  • Develop automation tools for improving operational excellence
  • Work on operations- and maintenance-driven coding projects
  • Drive company-wide campaigns with Support and Engineering teams
  • Participate in design and code reviews and identify bottlenecks
  • Troubleshoot and research root causes thoroughly and resolve defects

Requirements For System Development Engineer, Alexa Language and Data Ops

  • 3+ years of administrative experience in networking, storage systems, operating systems, and hands-on systems engineering
  • Experience programming with at least one modern language such as Python, Ruby, Golang, Java, C++, C#, or Rust
  • Experience with Linux/Unix
  • Experience with CI/CD pipelines and build processes
  • Hands-on experience with Kubernetes
  • Expertise in different AWS services
