Rakuten's AI & Data Division (AIDD) is seeking a Senior System Engineer to join its GPU Optimization Department. The role is central to managing and optimizing Rakuten's company-wide AI infrastructure, with a focus on high-performance computing and GPU resource management. The work involves the current Hopper and upcoming Blackwell GPU architectures, spanning thousands of accelerators across hybrid infrastructure.
The role combines DevOps expertise with specialized knowledge of GPU infrastructure and requires a deep understanding of Kubernetes, distributed systems, and ML/AI workloads. You'll be responsible for building and scaling GPU infrastructure that supports both training (ranking models, LLMs) and inference workloads, and for keeping Rakuten's AI computing resources stable and efficiently utilized.
This is an opportunity for an experienced engineer who wants to work at the intersection of infrastructure and AI, managing large-scale GPU clusters and optimizing performance for critical AI workloads. You'll be part of a team that enables AI innovation across Rakuten's global operations, working with current-generation hardware and software.
The position offers exposure to AI infrastructure challenges involving large language models, real-time AI, and distributed training systems. You'll collaborate with global AI/ML teams and have the opportunity to shape the architecture of Rakuten's GPU platform.