Member of Technical Staff, Training Performance Engineer

AI company training and deploying frontier models for developers and enterprises building AI systems for content generation, semantic search, RAG, and agents.
Machine Learning
Senior Software Engineer
Hybrid
AI

Description For Member of Technical Staff, Training Performance Engineer

Cohere is seeking a Member of Technical Staff, Training Performance Engineer to join their mission of scaling intelligence to serve humanity. This role, part of the Pre-Training team, focuses on optimizing the performance of advanced language models and systems. The position combines software engineering, machine learning, and low-level kernel development expertise to enhance model performance and training throughput.

The ideal candidate will work on critical aspects of model optimization, including writing high-performance software, developing CUDA kernels, and implementing distributed training strategies. They will be responsible for identifying and removing performance bottlenecks while working with cutting-edge training and profiling tools.

Cohere offers a collaborative environment working alongside world-class researchers and engineers, with offices in major tech hubs like London, Toronto, San Francisco, and New York, while maintaining a remote-friendly culture. The company provides comprehensive benefits including health and dental coverage, mental health support, parental leave, and generous vacation time.

This role presents a unique opportunity to impact the future of AI development, working with frontier models and contributing to systems that power next-generation AI applications. The position requires strong technical expertise but also offers growth potential and the chance to work with leading researchers in the field. Cohere values diversity and maintains an inclusive work environment, welcoming applicants from all backgrounds.

Responsibilities For Member of Technical Staff, Training Performance Engineer

  • Design and write high-performance, scalable software for training
  • Understand architectural modifications and design choices and their effects on training throughput and quality
  • Write low-level CUDA and Triton kernels to optimize accelerator performance
  • Research, implement, and experiment with ideas on supercompute and data infrastructure
  • Work with top researchers in the field

Requirements For Member of Technical Staff, Training Performance Engineer

Python
Linux
  • Extremely strong software engineering skills
  • Proficiency in Python, ML frameworks such as JAX and PyTorch, and compiler stacks such as XLA/MLIR
  • Experience writing GPU kernels in CUDA or Triton
  • Experience using large-scale distributed training strategies
  • Familiarity with autoregressive sequence models, such as Transformers

Benefits For Member of Technical Staff, Training Performance Engineer

Dental Insurance
Medical Insurance
Mental Health Assistance
Parental Leave
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits
  • Mental health budget
  • 100% Parental Leave top-up for 6 months (Canada, US, and UK)
  • Personal enrichment benefits for arts, culture, fitness, and workspace
  • Remote-flexible work environment
  • Co-working stipend
  • 6 weeks of vacation
