System Engineer, GPU Infrastructure & Platform Engineering - GPU Optimization Department (GPUOD)

Rakuten

Japanese e-commerce and fintech company that operates 70+ businesses spanning e-commerce, digital content, communications and fintech services.

Tokyo, Japan

DevOps

Senior Software Engineer

In-Person

5,000+ Employees

3+ years of experience

AI · Enterprise SaaS

Description For System Engineer, GPU Infrastructure & Platform Engineering - GPU Optimization Department (GPUOD)

Rakuten's AI & Data Division (AIDD) is seeking a Senior System Engineer to join its GPU Optimization Department. The role is central to managing and optimizing Rakuten's company-wide AI infrastructure, with a focus on high-performance computing and GPU resource management. The infrastructure spans thousands of accelerators across a hybrid environment and includes the latest NVIDIA Hopper and upcoming Blackwell architectures.

The role combines DevOps expertise with specialized knowledge in GPU infrastructure, requiring deep understanding of Kubernetes, distributed systems, and ML/AI workloads. You'll be responsible for building and scaling GPU infrastructure that supports both training (ranking models, LLMs) and inference workloads, ensuring efficient utilization and stability of Rakuten's AI computing resources.

This is an excellent opportunity for an experienced engineer who wants to work at the intersection of infrastructure and AI, managing large-scale GPU clusters and optimizing performance for critical AI workloads. You'll be part of a team that enables AI innovation across Rakuten's global operations, working with state-of-the-art hardware and software solutions.

The position offers exposure to AI infrastructure challenges involving large language models, real-time AI, and distributed training systems. You'll collaborate with global AI/ML teams and have the opportunity to shape Rakuten's GPU platform architecture.

Last updated 5 days ago

Responsibilities For System Engineer, GPU Infrastructure & Platform Engineering - GPU Optimization Department (GPUOD)

Optimize Kubernetes for GPU workloads, including scheduling policies, autoscaling, and multi-tenant resource isolation
Deploy and maintain inference serving platforms for high-throughput and low-latency model deployment
Automate cluster provisioning, monitoring, and recovery
Collaborate with ML engineers to troubleshoot GPU-related issues
Implement observability tools to track GPU utilization and cluster health
Develop infrastructure-as-code solutions for reproducible GPU environments

Requirements For System Engineer, GPU Infrastructure & Platform Engineering - GPU Optimization Department (GPUOD)

Python · Kubernetes · Linux

3+ years of experience in DevOps/MLOps, GPU infrastructure, or distributed computing
Deep expertise in Kubernetes for GPU workload orchestration
Strong programming skills in Go or Python for platform development
Proficiency in Linux system administration, performance tuning, and networking
Experience with IaC tools and CI/CD pipelines
Bachelor's or higher degree in Computer Science, Engineering, or related field
Strong teamwork and communication skills
Advanced English language skills
