Sr Staff Engineer, ML Hardware Infrastructure and Performance

LinkedIn is the world's largest professional network, built to create economic opportunity for every member of the global workforce.

Mountain View, CA, USA

$149,000 - $247,000

Machine Learning

Staff Software Engineer

Hybrid

5,000+ Employees

8+ years of experience

AI · Enterprise SaaS

Description For Sr Staff Engineer, ML Hardware Infrastructure and Performance

LinkedIn, the world's largest professional network, is seeking a Senior Staff Engineer to lead its ML Hardware Infrastructure and Performance team. This role is an opportunity to shape the future of AI infrastructure at scale.

The position focuses on designing and maintaining large-scale GPU infrastructure for machine learning and AI workloads. As the technical leader, you'll be responsible for crucial decisions regarding hardware selection, architecture design, and operational excellence of LinkedIn's ML platform.

The role requires deep expertise in GPU computing, high-performance networking, and distributed systems. You'll work with the latest GPU architectures, high-speed interconnects, and advanced storage systems. The position demands both technical depth and leadership skills, as you'll be guiding teams and influencing strategic direction.

Key aspects of the role include optimizing GPU server configurations, implementing high-throughput networking solutions, and ensuring reliable operation of production ML infrastructure. You'll collaborate with data scientists and ML engineers to understand and support their needs while maintaining system efficiency and reliability.

This hybrid position is based in Mountain View, CA, offering competitive compensation ($149,000-$247,000) and comprehensive benefits. The role requires 8+ years of experience in large-scale distributed systems, with significant focus on GPU-based ML workloads.

LinkedIn offers a culture built on trust, care, and inclusion, with opportunities for professional growth and impact. You'll be part of a team transforming how the world works through AI technology, while enjoying the stability and resources of a leading tech company.

Last updated 18 days ago

Responsibilities For Sr Staff Engineer, ML Hardware Infrastructure and Performance

Define and evaluate GPU server SKUs, memory configurations, and CPU-GPU ratios
Design high-throughput, low-latency networking solutions for distributed training
Lead end-to-end operational support for production GPU fleets
Develop and maintain observability tooling
Establish SLAs, SLOs, and best practices
Partner with Data Scientists and ML Engineers to understand workload requirements
Guide junior engineers in architecture best practices
Influence long-term roadmap for ML/AI infrastructure

Requirements For Sr Staff Engineer, ML Hardware Infrastructure and Performance

Kubernetes

Linux

Bachelor's degree in Computer Science, Electrical Engineering, or a related field, or equivalent industry experience
8+ years of experience designing and managing large-scale, distributed systems or HPC environments
3+ years focused on GPU-based ML or AI workloads
Deep knowledge of high-performance networking and parallel storage systems
Experience with containerization and job schedulers
Strong communication skills
Master's degree preferred

Benefits For Sr Staff Engineer, ML Hardware Infrastructure and Performance

Medical Insurance

Dental Insurance

Vision Insurance

401k

Flexible work arrangements
Comprehensive health benefits
Professional development opportunities
