Genmo develops video generation models as part of its effort to advance artificial general intelligence (AGI). As an ML Systems Engineer at Genmo, you'll build and optimize the infrastructure that powers the company's state-of-the-art models. The position combines deep expertise in machine learning systems with high-performance computing, and it requires experience with modern ML frameworks and serving architectures.
You'll design and implement model serving systems that handle millions of daily requests with sub-second latency. The role involves working with H100 GPUs and requires expertise in model optimization using tools such as TensorRT and torch.compile. You'll also build automated pipelines for model deployment and optimization, and you'll ensure robust monitoring and observability of system performance.
The ideal candidate brings 4+ years of ML engineering experience, with a strong focus on model serving at scale. You should have hands-on experience with high-performance frameworks, a deep understanding of Python and PyTorch, and a proven ability to optimize large-scale inference systems. Knowledge of transformer architectures and GPU optimization is essential.
Working at Genmo means joining a team that is advancing video generation and AI. The company values open-source contributions and experience with advanced serving optimizations. The role offers challenging technical problems and a chance to contribute to next-generation AI technology. The position is based at Genmo's San Francisco headquarters, where you'll collaborate with researchers and engineers on AI video generation.