
Senior Systems Engineer, Artificial Intelligence Operations

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twin technology.
DevOps
Staff Software Engineer
In-Person
5,000+ Employees
12+ years of experience
AI · Enterprise SaaS

Job Description

NVIDIA, a global leader in accelerated computing and AI technology, is seeking a Senior Systems Engineer for its AI Operations team. This role combines deep technical expertise in networking, automation, and AI infrastructure with customer-facing responsibilities. The position offers an opportunity to work on cutting-edge AI platforms and help build resilient operations for AI clusters.

The ideal candidate will work at the intersection of DevOps, AI infrastructure, and customer success to improve the reliability and performance of AI computing environments. You'll be responsible for developing automated workflows, conducting root cause analysis, and collaborating across teams to enhance product offerings.

This role is perfect for someone with extensive networking experience who wants to work at the forefront of AI technology. You'll be part of NVIDIA's mission to transform computing through AI, working with state-of-the-art technology and contributing to solutions that power the next generation of AI infrastructure.

The position offers the chance to work with a diverse, supportive team at a company that has been innovating in computer graphics, PC gaming, and accelerated computing for over 25 years. You'll be immersed in an environment that values technical excellence, innovation, and collaborative problem-solving.

Working at NVIDIA means being part of a team that's defining the future of computing, particularly in AI and digital twin technology. The company's commitment to pushing technological boundaries and solving complex challenges makes this an exciting opportunity for someone looking to make a significant impact in the field of AI operations.

Last updated 15 hours ago

Responsibilities For Senior Systems Engineer, Artificial Intelligence Operations

  • Understand internal and external customer requirements to improve AI cluster resiliency
  • Design AIOps-based solutions
  • Develop automated workflows for issue detection and root cause analysis
  • Collaborate with operators to debug sophisticated, full-stack AI cluster problems
  • Deliver technical presentations and lead hands-on demos or training
  • Handle evaluation deployments (POC/POV)
  • Ensure smooth, reliable installations

Requirements For Senior Systems Engineer, Artificial Intelligence Operations

Python
Linux
  • Bachelor of Science or equivalent experience
  • 12+ years of networking experience in enterprise or service provider environments
  • Strong hands-on expertise in routing and switching
  • Proficient in scripting and automation using Python or similar languages
  • Strong Linux expertise
  • Proven experience working directly with customers to resolve issues
  • Exceptional oral, written, and presentation skills
  • Demonstrated ability to collaborate effectively across teams
