Senior Site Reliability Engineer, Compute

Building the world's favorite AI-first cloud infrastructure company, pioneering vertically integrated, purpose-built AI infrastructure solutions powered by clean, renewable energy.
$183,000 - $210,000
Site Reliability
Senior Software Engineer
Hybrid
5+ years of experience
AI · Enterprise SaaS · Cloud

Description For Senior Site Reliability Engineer, Compute

Crusoe is revolutionizing AI cloud infrastructure by building sustainable, high-performance computing solutions. As a Senior Site Reliability Engineer focused on Compute, you'll be instrumental in developing and optimizing the company's virtualization and compute infrastructure. The role combines deep technical expertise in Linux kernel internals, virtualization technologies, and system optimization with a focus on supporting modern AI and HPC workloads.

You'll work with cutting-edge technologies including SmartNICs, BlueField devices, and TPUs, while being responsible for critical infrastructure components from the kernel to orchestration layers. The position requires strong programming skills in languages like C, Go, or Rust, and extensive experience with system-level debugging and performance optimization.

The company offers a comprehensive benefits package including equity, competitive salary, and various health and wellness benefits. Working in a hybrid environment in San Francisco, you'll be part of a team that's setting new standards in sustainable AI infrastructure. This is an opportunity to make a significant impact at a well-funded technology company that's aligned with both technological advancement and environmental responsibility.

The role is perfect for an experienced SRE who is passionate about infrastructure optimization, has deep Linux expertise, and wants to work on challenging problems at the intersection of AI, cloud computing, and sustainability. You'll be contributing to a platform that's considered the "gold standard" for reliability and performance in AI infrastructure.

Last updated 5 hours ago

Responsibilities For Senior Site Reliability Engineer, Compute

  • Develop automation and observability tools to monitor compute infrastructure
  • Support and scale the virtualization stack, including KVM, QEMU, and other hypervisors
  • Collaborate with Linux kernel and hardware teams to resolve performance bottlenecks
  • Optimize performance for AI and HPC workloads across CPU, GPU, and DPU/NIC resources
  • Perform root cause analysis for kernel crashes and hardware-software integration problems
  • Integrate hypervisor-level enhancements for guest VM reliability and workload isolation
  • Tune kernel subsystems, including the process scheduler, NUMA configuration, and memory management
  • Implement and validate support for emerging compute hardware

Requirements For Senior Site Reliability Engineer, Compute

Linux · Go · Rust
  • 5+ years of professional experience in SRE, Linux system engineering, or compute infrastructure roles
  • Strong proficiency in Linux kernel internals, with exposure to scheduler, memory allocation, and driver subsystems
  • Experience with virtualization architectures and technologies such as KVM, Xen, QEMU, or VMware
  • Familiarity with SmartNICs/DPUs and kernel bypass techniques
  • Expert-level skills in at least one programming language: C, Go, or Rust
  • Experience with system-level debugging, including kdump, kexec, and kernel panic analysis
  • Proficiency in Infrastructure as Code tooling and CI/CD practices
  • Strong understanding of compute scheduling, resource management, and high-throughput networking

Benefits For Senior Site Reliability Engineer, Compute

401k · Medical Insurance · Dental Insurance · Vision Insurance · Parental Leave · Education Budget · Equity · Commuter Benefits
  • Hybrid work schedule
  • Industry-competitive pay
  • Restricted Stock Units
  • Health insurance package (HDHP and PPO, vision, and dental)
  • Employer contributions to HSAs
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
  • Subscription to the Calm app
  • MetLife Legal
  • Company-paid commuter benefit ($50 per pay period)

Jobs Related To Crusoe Senior Site Reliability Engineer, Compute

Senior Site Reliability Engineer, Storage

Senior Site Reliability Engineer position at Crusoe, focusing on storage infrastructure for AI-optimized cloud services, offering a $183k-$210k salary with hybrid work in San Francisco.

Senior Site Reliability Engineer, Production Engineering

Senior Site Reliability Engineer position at Cisco ThousandEyes, focusing on production engineering and cloud infrastructure management in London with a hybrid work arrangement.

Senior Site Reliability Engineer

Senior Site Reliability Engineer position at Thomson Reuters, focusing on maintaining and improving system reliability and infrastructure scalability.

Senior Software Engineer, Site Reliability Tooling

Senior SRE role at Upstart, building and maintaining tooling for reliability and observability of AI-powered lending platforms. Remote-friendly with competitive compensation.
