Datacenter Resiliency Architect - New College Grad 2025

World leader in accelerated computing, pioneering AI and digital twins technology to transform industries.
$120,000 - $235,750
Backend
Entry-Level Software Engineer
In-Person
5,000+ Employees
AI · Enterprise SaaS

Job Description

NVIDIA, the pioneering force behind GPU technology and AI computing, is seeking a Datacenter Resiliency Architect for its New College Graduate 2025 program. This role sits at the intersection of hardware and software development and focuses on the resilience of GPUs and datacenter systems. As part of NVIDIA's Accelerated and Resilient Compute Systems team, you'll work on technology that powers AI, machine learning, and high-performance computing solutions.

The position offers an opportunity to shape the development of NVIDIA's industry-leading datacenter GPUs and SoCs. You'll be responsible for architecting hardware and software resiliency features, analyzing system reliability metrics, and developing architecture verification test plans. The role combines hardware and software engineering and calls for familiarity with GPU architectures, proficiency in RAS (Reliability, Availability, Serviceability) concepts, and programming experience in C/C++, Python, and CUDA.

This is an ideal position for recent graduates with advanced degrees (Master's or PhD) who are passionate about high-performance computing and system reliability. NVIDIA offers competitive compensation, with a base salary ranging from $120,000 to $235,750, plus equity and comprehensive benefits. The company's culture emphasizes innovation, collaboration, and pushing technological boundaries, making it an excellent environment for launching a career in technical architecture and system design.

Working at NVIDIA's Santa Clara headquarters, you'll be at the heart of Silicon Valley's tech ecosystem, collaborating with world-class engineers and researchers. The company's commitment to advancing AI and accelerated computing makes this an exceptional opportunity to contribute to technology that's transforming industries and society.

Last updated a month ago

Responsibilities For Datacenter Resiliency Architect - New College Grad 2025

  • Architect hardware and software resiliency features to improve system Reliability, Availability, and Serviceability (RAS)
  • Model and analyze RAS metrics for permanent and transient errors
  • Collaborate with architects, unit designers and software engineers
  • Develop and implement architecture verification test plans
  • Execute architecture test plans and debug failing tests
  • Analyze the Architectural Vulnerability Factor (AVF) and liveness of on-die memories
  • Develop CUDA software diagnostic kernels
  • Develop and automate fault models

Requirements For Datacenter Resiliency Architect - New College Grad 2025

  • Master's or PhD degree in Computer Engineering, Electrical Engineering or related field
  • Familiarity with GPU and Networking Architectures
  • Proficiency in RAS concepts and architecture models
  • Scripting and automation with Python
  • Proficiency in C/C++
  • Excellent interpersonal skills
  • Strong debugging and analytical skills
  • Self-driven and results-oriented

Benefits For Datacenter Resiliency Architect - New College Grad 2025

  • Equity
  • Competitive base salary
  • Full benefits package

Related Jobs

GPU Architecture Engineer - New College Grad 2025

Entry-level GPU Architecture Engineer position at NVIDIA, focusing on developing and enhancing GPU architecture features using C++ and Python, with competitive salary and benefits.

GPU Architecture Engineer - New College Grad 2025

Entry-level GPU Architecture Engineer position at NVIDIA focusing on developing and enhancing GPU architecture features and testing infrastructure.

Entry Level Automation & Controls Engineer (Start Summer/Fall 2026)

Entry-level automation and controls engineering role at Barry-Wehmiller Design Group, focusing on PLC programming, control systems design, and manufacturing automation.

2026 January - Jr. Software Engineer (New Grad)

Entry-level Software Engineer position at Bumble Inc., working on full-stack development with modern technologies in a hybrid work environment in Austin, TX.