AI Infrastructure Engineer - HPC

Cisco

A global technology company that designs, manufactures, and sells networking hardware, software, and telecommunications equipment.

San Jose, CA, USA

$135,600 - $171,500

Machine Learning

Senior Software Engineer

In-Person

5,000+ Employees

7+ years of experience

AI · Enterprise SaaS

Description For AI Infrastructure Engineer - HPC

Cisco is seeking an experienced AI Infrastructure Engineer to join its Information Technology team. This role combines technical leadership with hands-on engineering in building and managing enterprise-scale AI infrastructure. The position focuses on developing and expanding Cisco's artificial intelligence platform using NVIDIA DGX and Cisco UCS technologies.

The ideal candidate will have 7+ years of experience deploying and administering HPC clusters, with expertise in GPU resource scheduling, hybrid cloud technologies, and automation tools. They will lead technical initiatives in designing and implementing GPU compute clusters for deep learning and high-performance computing workloads.

Key responsibilities include:

Leading and motivating teams while managing complex AI infrastructure
Building and supporting NVIDIA- and Cisco UCS-based artificial intelligence platforms
Automating configuration management and system maintenance using DevOps tools
Optimizing system performance and identifying architectural improvements
Collaborating with internal Cisco Business Units and cross-functional teams

The position offers competitive compensation ranging from $135,600 to $171,500 USD, along with comprehensive benefits including medical, dental, vision insurance, 401(k) with company matching, and flexible vacation policies. This is an opportunity to work with cutting-edge AI technologies while contributing to Cisco's mission of powering an inclusive future.

The role is based in either Research Triangle Park (RTP), North Carolina, or San Jose, California, and requires deep technical expertise combined with strong leadership and communication skills. The successful candidate will join a dynamic team focused on innovation and transforming how Cisco operates through advanced technology solutions.

Last updated 2 days ago

Responsibilities For AI Infrastructure Engineer - HPC

Provide technical leadership in building and managing AI infrastructure
Design and implement GPU compute clusters for deep learning workloads
Plan, build, and install/upgrade NVIDIA DGX and Cisco UCS systems
Automate configuration management and system maintenance
Lead advancement of AI platforms and practices
Evaluate and optimize system performance
Administer Linux systems and GPU-enabled servers
Collaborate with internal teams and business units
Create technical documentation and presentations
Monitor system health and availability
Design tools for proactive issue detection

Requirements For AI Infrastructure Engineer - HPC

Python

Linux

Kubernetes

7+ years of experience deploying and administering HPC clusters
Familiarity with GPU resource schedulers (Slurm, Kubernetes, Run:ai)
Proficient in Hybrid Cloud, Virtualization, and Container technologies
Experience with provisioning tools like Base Command Manager, Warewulf, Satellite, Ironic
Experience with Agile and DevOps operating models
Experience with automation tools (Ansible, SaltStack, Puppet, Chef)
Proficient in programming languages (Python, Go, Bash, C/C++)
Deep understanding of operating systems, computer networks, and high-performance applications

Benefits For AI Infrastructure Engineer - HPC

401k

Medical Insurance

Dental Insurance

Vision Insurance

Parental Leave

Up to 16 days of vacation for non-exempt employees
Flexible vacation policy for exempt employees
12 paid holidays per year
80 hours sick time
Wellbeing offerings
Life insurance
Short and long-term disability coverage
