Microsoft's Azure Machine Learning team is seeking a Principal Software Engineer to join its Inference team, working directly with OpenAI to host models efficiently on Azure. This role is at the forefront of serving all internal and external OpenAI workloads, currently handling millions of requests per day for Microsoft and third-party (3P) Copilots.
The position focuses on optimizing Large Language Models (LLMs) and Diffusion models for high-scale, low-latency inference. You'll be working with cutting-edge hardware and software stacks, implementing complex inferencing capabilities, and supporting production inference SLAs on one of the world's largest GPU fleets.
This is an excellent opportunity for experienced engineers passionate about machine learning infrastructure and optimization. The role offers competitive compensation ($137,600 - $267,000 base salary range), comprehensive benefits, and the chance to work on technology that's shaping the future of AI deployment at scale.
Key responsibilities include collaborating with OpenAI, implementing inferencing capabilities, optimizing performance, and maintaining high-availability services. The ideal candidate will have strong C++ and Python experience, ideally supplemented by knowledge of ML, online services, and CUDA or Rust.
Microsoft offers a collaborative, growth-minded environment with industry-leading benefits, including healthcare, educational resources, investment options, and generous parental leave. The position allows up to 100% work from home with 0-25% travel, providing excellent work-life balance while working on transformative AI technology.