HPC Engineer, AI Infrastructure

Tesla is a leading electric vehicle and clean energy company, known for its innovative approach to sustainable transportation and energy solutions.
$120,000 - $300,000
Cloud
Senior Software Engineer
In-Person
5,000+ Employees
3+ years of experience
AI · Automotive

Description For HPC Engineer, AI Infrastructure

Tesla's Supercomputing/AI infrastructure team is seeking an HPC Engineer to support and improve its AI/ML cluster infrastructure. This role is crucial for maintaining and enhancing the platform that keeps Tesla's Full Self-Driving (FSD), Tesla Bot, and Dojo engineering teams productive.

As an HPC Engineer, you will be responsible for:

  • Managing and operating AI infrastructure
  • Monitoring compute/GPU/network metrics
  • Linux troubleshooting & performance tuning
  • Collaborating with the Data Center team to coordinate server operations
  • Facilitating neural network training at scale
  • Streamlining FSD development
  • Enabling Dojo to become the most powerful supercomputer

Key responsibilities include:

  • Supporting AI/ML cluster infrastructure on GPU and Dojo platforms
  • Improving monitoring & self-healing pipelines and security posture
  • Optimizing server, storage, and network performance
  • Managing HPC clusters, workloads, and applications
  • Performing automation and systems engineering
  • Participating in 24x7 on-call rotation

The ideal candidate will have:

  • Proficiency in Python or Bash scripting
  • Strong Linux & network fundamentals
  • Experience with configuration management software and systems monitoring
  • Knowledge of high-throughput low-latency networks and GPU-based computing systems
  • Familiarity with Slurm, LSF, and parallel file systems
  • A Bachelor's Degree in a relevant field or exceptional skills
  • 3+ years of related experience

This role offers a competitive salary range of $120,000 - $300,000 annually, plus cash and stock awards, and a comprehensive benefits package. Join Tesla in pushing the boundaries of AI and autonomous technology!

Responsibilities For HPC Engineer, AI Infrastructure

  • Support the AI/ML cluster infrastructure on both GPU and Dojo platforms
  • Improve monitoring & self-healing pipelines, as well as security posture
  • Work with hardware and storage vendors to tune and optimize server, storage and network performance
  • Tune performance and provision operating systems on Linux servers
  • Manage HPC clusters, workloads and applications
  • Perform automation and systems engineering
  • Participate in 24x7 on-call rotation

Requirements For HPC Engineer, AI Infrastructure

Python · Linux · Kubernetes
  • Proficiency with scripting languages such as Python or Bash
  • Proficiency with Linux & network fundamentals
  • Experience with configuration management software, systems monitoring & alerting is a plus
  • Experience with high-throughput low-latency networks, GPU-based computing systems, and/or high performance storage systems is a plus
  • Experience with Slurm, LSF and storage management of parallel file systems is a plus
  • Bachelor's Degree in Computer Science, Computer Engineering, Electrical Engineering, or Physics, or proof of exceptional skills in a related field
  • 3+ years of additional equivalent experience or evidence of exceptional ability related to the position

Benefits For HPC Engineer, AI Infrastructure

  • Equity
  • Parental Leave
  • Medical Insurance
  • Dental Insurance
  • Vision Insurance
  • 401k
  • Employee Stock Purchase Plans
  • Life Insurance
  • AD&D Insurance
  • Short-term Disability
  • Long-term Disability
  • Employee Assistance Program
  • Paid Time Off
  • Paid Holidays
  • Back-up Childcare
  • Parenting Support Resources
  • Critical Illness Insurance
  • Hospital Indemnity
  • Accident Insurance
  • Legal Services
  • Pet Insurance
  • Weight Loss Program
  • Tobacco Cessation Program
  • Tesla Babies Program
  • Commuter Benefits
  • Employee Discounts
