Sr Staff Engineer, ML Hardware Infrastructure and Performance

LinkedIn is the world's largest professional network, built to create economic opportunity for every member of the global workforce.
$149,000 - $247,000
Machine Learning
Staff Software Engineer
Hybrid
5,000+ Employees
8+ years of experience
AI · Enterprise SaaS

Description For Sr Staff Engineer, ML Hardware Infrastructure and Performance

LinkedIn, the world's largest professional network, is seeking a Senior Staff Engineer to lead its ML Hardware Infrastructure and Performance team. This role represents a unique opportunity to shape the future of AI infrastructure at scale.

The position focuses on designing and maintaining large-scale GPU infrastructure for machine learning and AI workloads. As the technical leader, you'll be responsible for crucial decisions regarding hardware selection, architecture design, and operational excellence of LinkedIn's ML platform.

The role requires deep expertise in GPU computing, high-performance networking, and distributed systems. You'll work with cutting-edge technologies, including the latest GPU architectures, high-speed interconnects, and advanced storage systems. The position demands both technical depth and leadership skills, as you'll be guiding teams and influencing strategic direction.

Key aspects of the role include optimizing GPU server configurations, implementing high-throughput networking solutions, and ensuring reliable operation of production ML infrastructure. You'll collaborate with data scientists and ML engineers to understand and support their needs while maintaining system efficiency and reliability.

This hybrid position is based in Mountain View, CA, offering competitive compensation ($149,000-$247,000) and comprehensive benefits. The role requires 8+ years of experience in large-scale distributed systems, with a significant focus on GPU-based ML workloads.

LinkedIn offers a culture built on trust, care, and inclusion, with opportunities for professional growth and impact. You'll be part of a team transforming how the world works through AI technology, while enjoying the stability and resources of a leading tech company.

Last updated 18 days ago

Responsibilities For Sr Staff Engineer, ML Hardware Infrastructure and Performance

  • Define and evaluate GPU server SKUs, memory configurations, and CPU-GPU ratios
  • Design high-throughput, low-latency networking solutions for distributed training
  • Lead end-to-end operational support for production GPU fleets
  • Develop and maintain observability tooling
  • Establish SLAs, SLOs, and best practices
  • Partner with Data Scientists and ML Engineers to understand workload requirements
  • Guide junior engineers in architecture best practices
  • Influence long-term roadmap for ML/AI infrastructure

Requirements For Sr Staff Engineer, ML Hardware Infrastructure and Performance

  • Skills: Kubernetes, Linux
  • Bachelor's degree in Computer Science, Electrical Engineering, or a related field, or equivalent industry experience
  • 8+ years of experience designing and managing large-scale, distributed systems or HPC environments
  • 3+ years focused on GPU-based ML or AI workloads
  • Deep knowledge of high-performance networking and parallel storage systems
  • Experience with containerization and job schedulers
  • Strong communication skills
  • Master's degree preferred

Benefits For Sr Staff Engineer, ML Hardware Infrastructure and Performance

  • Medical, dental, and vision insurance
  • 401(k)
  • Flexible work arrangements
  • Comprehensive health benefits
  • Professional development opportunities
