Platform Reliability Engineer

NVIDIA is the world leader in accelerated computing, pioneering AI and digital twins technology to transform industries.
Site Reliability
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
Enterprise SaaS · AI

Description For Platform Reliability Engineer

NVIDIA is seeking a Platform Reliability Engineer to join its team working on the Unified Commerce Platform (UCP). This role is crucial to maintaining the reliability and quality of a commerce platform that handles critical functions such as subscription management, payment processing, and fraud prevention. The position requires a unique blend of software engineering expertise and a reliability engineering mindset.

The ideal candidate will be responsible for developing and implementing comprehensive testing frameworks, automation solutions, and reliability processes that ensure the platform meets its SLA commitments across all tenant environments. This includes creating automated testing strategies, performance monitoring systems, and proactive issue identification mechanisms.

As a Platform Reliability Engineer, you'll work at the intersection of development and reliability assurance, directly impacting customer trust and platform stability. The role involves designing test frameworks for various levels of testing, from unit tests to end-to-end validation, while also implementing monitoring solutions and establishing reliability processes.

The position offers the opportunity to work with cutting-edge commerce platform technology while ensuring its reliability and performance. You'll be part of a team that values both technical excellence and customer satisfaction, working on systems that process sensitive financial data and require the highest standards of security and reliability.

This role offers the chance to work on systems that directly affect NVIDIA's business operations and customer experience. The company's focus on AI and digital twins technology makes this an exciting opportunity for someone passionate about reliability engineering in a cutting-edge technical environment.


Responsibilities For Platform Reliability Engineer

  • Design automated testing strategies and frameworks across unit, integration, API and end-to-end levels
  • Create performance testing frameworks to validate platform scalability
  • Develop comprehensive monitoring solutions with alerting systems
  • Implement specialized test frameworks for security controls
  • Design test data strategies with generation frameworks
  • Establish reliability processes and incident response protocols
  • Build scalable automation infrastructure
  • Develop comprehensive validation strategies

Requirements For Platform Reliability Engineer

Skills: Go, Java, JavaScript, Python
  • Bachelor's degree in Computer Science, Software Engineering, or related technical field
  • 5+ years of experience in software development, test automation, or quality engineering
  • Strong programming skills in languages such as Golang, Java, Python, or JavaScript
  • Experience designing and implementing automated test frameworks
  • Knowledge of testing practices in agile development environments
  • Experience with API testing and web service validation
  • Experience with testing in cloud environments (AWS, Azure, or GCP)
  • Background in performance testing methodologies and tools
  • Understanding of database testing and data validation techniques
  • Familiarity with security testing approaches for sensitive financial applications


Jobs Related To NVIDIA Platform Reliability Engineer

Senior SRE Software Engineer, Storage and Data

Senior SRE position at NVIDIA focusing on storage infrastructure reliability and performance optimization for DGX Cloud platform, requiring 5+ years of experience in system administration and reliability engineering.

Site Reliability Engineer, AI/ML Platforms

Senior Site Reliability Engineer role at Adobe focusing on AI/ML platforms, requiring expertise in Kubernetes, distributed systems, and DevOps practices.

Solutions Reliability Engineer III

Senior Solutions Reliability Engineer role at Capital Group in Singapore, focusing on system reliability and infrastructure management.

Senior DBA & Site Reliability Engineer

Senior DBA & Site Reliability Engineer position at Oracle, focusing on cloud infrastructure and database management for healthcare applications, requiring 5+ years of experience.

Senior Site Reliability Engineer

Senior Site Reliability Engineer position at Salesforce, responsible for maintaining and improving the reliability and performance of Salesforce's cloud infrastructure.