ML Systems Engineer

Genmo

A research lab dedicated to building open, state-of-the-art models for video generation towards unlocking the right brain of AGI.

San Francisco, CA, USA

Machine Learning

Senior Software Engineer

In-Person

4+ years of experience

Description For ML Systems Engineer

Genmo is a research lab building open, state-of-the-art video generation models to advance artificial general intelligence (AGI). As an ML Systems Engineer at Genmo, you'll play a crucial role in building and optimizing the infrastructure that powers those models. The position combines deep technical expertise in machine learning systems with high-performance computing, and requires experience with modern ML frameworks and serving architectures.

You'll be responsible for designing and implementing sophisticated model serving systems that can handle millions of daily requests with sub-second latency. The role involves working with cutting-edge technology, including H100 GPUs, and requires expertise in model optimization techniques using tools like TensorRT and torch.compile. You'll be building automated pipelines for model deployment and optimization while ensuring robust monitoring and observability of system performance.

The ideal candidate brings 4+ years of ML engineering experience, including 2+ years focused on model serving at scale. You should have hands-on experience with high-performance model serving frameworks, a deep understanding of Python and PyTorch, and a proven ability to optimize large-scale inference systems. Knowledge of transformer architectures and GPU optimization is essential.

Working at Genmo means being part of a team that's pushing the boundaries of what's possible in video generation and AI. The company values open-source contributions and experience with advanced serving optimizations. This is an opportunity to work on challenging technical problems while contributing to the development of next-generation AI technology. The position is based at their San Francisco headquarters, where you'll collaborate with researchers and engineers to shape the future of AI video generation.

Last updated 9 days ago

Responsibilities For ML Systems Engineer

Design and implement high-performance model serving infrastructure supporting streaming, batching, and multi-modal inputs
Build automated model compilation and optimization pipelines using TensorRT, torch.compile, and other compilers
Optimize serving systems for throughput, latency, and GPU utilization across an H100 fleet
Develop monitoring and observability for model-specific metrics
Collaborate with researchers to transition models from development to production
Implement A/B testing, canary deployments, and gradual rollout strategies for models
Integrate serving layer with platform infrastructure

Requirements For ML Systems Engineer

Python

Bachelor's or Master's degree in Computer Science or related field
4+ years of ML engineering experience, including 2+ years focused on model serving
Production experience with high-performance model serving frameworks
Strong Python proficiency and PyTorch experience
Experience with model compilation and optimization
Track record of building inference systems at scale (10K+ QPS)
Understanding of attention mechanisms and transformer architectures
Experience with containerized deployment and orchestration