AI Infrastructure Engineer - HPC

A global technology company that designs, manufactures, and sells networking hardware, software, and telecommunications equipment.
$135,600 - $171,500
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
7+ years of experience
AI · Enterprise SaaS
This job posting may no longer be active.

Description For AI Infrastructure Engineer - HPC

Cisco is seeking an experienced AI Infrastructure Engineer to join its Information Technology team. This role combines technical leadership with hands-on engineering in building and managing enterprise-scale AI infrastructure. The position focuses on developing and expanding Cisco's artificial intelligence platform using NVIDIA DGX and Cisco UCS technologies.

The ideal candidate will have 7+ years of experience deploying and administering HPC clusters, with expertise in GPU resource scheduling, hybrid cloud technologies, and automation tools. They will lead technical initiatives to design and implement GPU compute clusters for deep learning and high-performance computing workloads.

Key responsibilities include:

  • Leading and motivating teams while managing complex AI infrastructure
  • Building and supporting artificial intelligence platforms based on NVIDIA and Cisco UCS systems
  • Automating configuration management and system maintenance using DevOps tools
  • Optimizing system performance and identifying architectural improvements
  • Collaborating with internal Cisco Business Units and cross-functional teams

The position offers competitive compensation ranging from $135,600 to $171,500 USD, along with comprehensive benefits including medical, dental, and vision insurance; a 401(k) with company matching; and flexible vacation policies. This is an opportunity to work with cutting-edge AI technologies while contributing to Cisco's mission of powering an inclusive future.

The role is based in either Research Triangle Park (RTP), North Carolina, or San Jose, California, and requires deep technical expertise combined with strong leadership and communication skills. The successful candidate will join a dynamic team focused on innovation and on transforming how Cisco operates through advanced technology solutions.

Last updated 2 days ago

Responsibilities For AI Infrastructure Engineer - HPC

  • Provide technical leadership in building and managing AI infrastructure
  • Design and implement GPU compute clusters for deep learning workloads
  • Plan, build, and install/upgrade NVIDIA DGX and Cisco UCS systems
  • Automate configuration management and system maintenance
  • Lead advancement of AI platforms and practices
  • Evaluate and optimize system performance
  • Administer Linux systems and GPU-enabled servers
  • Collaborate with internal teams and business units
  • Create technical documentation and presentations
  • Monitor system health and availability
  • Design tools for proactive issue detection

Requirements For AI Infrastructure Engineer - HPC

Python
Linux
Kubernetes
  • 7+ years of experience deploying and administering HPC clusters
  • Familiar with GPU resource schedulers (Slurm, Kubernetes, RunAI)
  • Proficient in Hybrid Cloud, Virtualization, and Container technologies
  • Experience with provisioning tools like Base Command Manager, Warewulf, Satellite, Ironic
  • Experience with Agile and DevOps operating models
  • Experience with automation tools (Ansible, SaltStack, Puppet, Chef)
  • Proficient in programming languages (Python, Go, Bash, C/C++)
  • Deep understanding of operating systems, computer networks, and high-performance applications

Benefits For AI Infrastructure Engineer - HPC

401k
Medical Insurance
Dental Insurance
Vision Insurance
Parental Leave
  • Up to 16 days vacation for non-exempt employees
  • Flexible vacation policy for exempt employees
  • 12 paid holidays per year
  • 80 hours sick time
  • Wellbeing offerings
  • Life insurance
  • Short and long-term disability coverage
