
Principal Software Engineer-distributed training system

Microsoft is a leading technology company providing state-of-the-art online advertising platforms and services.
Principal Software Engineer
Hybrid
6+ years of experience
AI · Enterprise SaaS
This job posting may no longer be active.

Description For Principal Software Engineer-distributed training system

The MAI Ads team in Microsoft APRD is responsible for providing the advertising industry with a state-of-the-art online advertising platform and service. Our team is at the core of this effort, working on research and development in the following areas: Selection (recall), Relevance, User Response Prediction (Click Prediction and Conversion Prediction), Autobidding, Large Language Models, and Large-Scale Machine Learning & Serving Systems. We are a world-class R&D team of passionate and talented scientists and engineers who aspire to solve challenging problems and turn innovative ideas into high-quality products and services that help hundreds of millions of users and advertisers and directly impact our business.

As a Principal Software Engineer for the distributed training system, you will:

  • Design and implement a distributed training system for trillion-parameter machine learning models.
  • Drive team efforts around utilization and optimization of training and inference on GPUs.
  • Design and implement streaming training and publishing of trillion-parameter machine learning models.
  • Analyze metrics and identify opportunities based on offline and online testing; develop and deliver robust and scalable solutions.
  • Collaborate with cross-functional teams to deliver high-quality solutions.

Qualifications:

  • A Bachelor's, Master's, or PhD degree in CS/EE or a related area is required.
  • 6+ years of industry experience in software engineering.
  • Solid experience shipping high-performance code in C++, CUDA, Python, C#, or an equivalent language.
  • Experience with machine learning and TensorFlow/PyTorch distributed training is preferred.
  • Domain knowledge of ads, search, or content services is a plus.
  • Quick learner with solid problem-solving and debugging skills.
  • Good communication skills; fluent in English (both oral and written).

Microsoft offers industry-leading healthcare, educational resources, discounts on products and services, savings and investment opportunities, maternity and paternity leave, generous time away, giving programs, and opportunities to network and connect. Microsoft is an equal opportunity employer and welcomes applications from diverse backgrounds.

Last updated 8 months ago

Responsibilities For Principal Software Engineer-distributed training system

  • Design and implement a distributed training system for trillion-parameter machine learning models
  • Drive team efforts around utilization and optimization of training and inference on GPUs
  • Design and implement streaming training and publishing of trillion-parameter machine learning models
  • Analyze metrics and identify opportunities based on offline and online testing; develop and deliver robust and scalable solutions
  • Collaborate with cross-functional teams to deliver high-quality solutions

Requirements For Principal Software Engineer-distributed training system

  • Bachelor's, Master's, or PhD degree in CS/EE or a related area
  • 6+ years of industry experience in software engineering
  • Solid experience shipping high-performance code in C++, CUDA, Python, C#, or an equivalent language
  • Experience with machine learning and TensorFlow/PyTorch distributed training (preferred)
  • Domain knowledge of ads, search, or content services (a plus)
  • Quick learner with solid problem-solving and debugging skills
  • Good communication skills; fluent in English (both oral and written)

Benefits For Principal Software Engineer-distributed training system

  • Industry-leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investment opportunities
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect
