Beam is an ultrafast AI inference platform that has built a groundbreaking serverless runtime capable of launching GPU-backed containers in under 1 second and scaling to thousands of GPUs. The platform serves millions of users globally and is backed by prestigious investors including Y Combinator and Tiger Global, along with notable developer-tool founders.
As a Machine Learning Engineer at Beam, you'll be at the forefront of optimizing inference performance across the diverse models running on the platform. Your role will focus on minimizing latency, maximizing throughput, and running experiments to achieve industry-leading performance. Your work will have a direct impact on millions of users worldwide.
The ideal candidate has strong experience with modern inference stacks such as PyTorch, TensorRT, and vLLM, plus familiarity with AI workflows including ComfyUI and LoRA adapters. A deep understanding of model compilation, quantization, and serving architectures is essential. You should be comfortable with GPU architectures and kernel-level optimizations, with experience in CUDA, Triton, or similar frameworks.
The position offers competitive compensation ($120K–$200K with 0.20%–0.75% equity) and comprehensive benefits including health coverage, learning opportunities, and fitness stipends. While the team works in person in New York City, exceptional remote candidates are welcome. This is an opportunity to join a fast-growing pre-Series A company building the future of ML infrastructure.