Together AI is seeking a Machine Learning Platform Engineer to join its team in San Francisco. The role is central to enabling custom models and dedicated inference on Together's platform, with a focus on tuning autoscaling, minimizing cold starts, and maximizing model performance.
The position requires 5+ years of experience building large-scale distributed systems. You'll work at the intersection of machine learning infrastructure and platform engineering, using tools such as Kubernetes and Terraform and programming languages including Go, Rust, and Python.
As a Platform Engineer, you'll be responsible for critical infrastructure components including multi-cluster orchestration, predictive autoscaling, and API development. The role offers an opportunity to work on cutting-edge AI infrastructure, contributing to Together AI's mission of making AI systems more accessible and cost-effective.
The company has made significant contributions to open-source research, including FlashAttention, Hyena, FlexGen, and RedPajama. It offers a competitive compensation package ranging from $160,000 to $250,000, plus equity and comprehensive benefits.
This is an excellent opportunity for experienced engineers who are passionate about AI infrastructure and want to make a meaningful impact in the field of artificial intelligence. The role is hybrid and requires four days per week in the San Francisco office, balancing collaborative in-person work with some flexibility.
The ideal candidate will have strong expertise in distributed systems, an excellent understanding of operating system concepts, and proven experience with containerization and infrastructure as code. You'll be joining a research-driven company that values open and transparent AI systems, working alongside passionate researchers and engineers to advance the frontier of AI technology.