Member of Technical Staff, Pre-Training Data Engineer

Cohere

AI company training and deploying frontier models for developers and enterprises building AI systems for content generation, semantic search, RAG, and agents.

Toronto, ON, Canada • Ottawa, ON, Canada • San Francisco, CA, USA…

Data

Mid-Level Software Engineer

Remote

Description For Member of Technical Staff, Pre-Training Data Engineer

Cohere is at the forefront of AI development, training and deploying frontier models for developers and enterprises. As a Pre-Training Data Engineer, you'll be instrumental in developing the data infrastructure that powers Cohere's advanced language models. The role combines technical expertise with research innovation and covers end-to-end management of training data, including ingestion, cleaning, filtering, and optimization.

You'll work with diverse data sources including web data, code data, and multilingual corpora, ensuring their quality and reliability. The position requires strong software engineering skills, particularly in Python, and experience with data processing frameworks like Apache Spark or Apache Beam. You'll be designing scalable pipelines, conducting data ablations, and experimenting with data mixtures to enhance model performance.

The company offers an inclusive work environment with offices in major tech hubs like Toronto, San Francisco, New York, London, and Paris, while embracing remote work flexibility. Benefits include comprehensive health coverage, mental health support, generous parental leave, and 6 weeks of vacation. You'll be joining a team of world-class researchers and engineers who are passionate about their craft and committed to scaling intelligence to serve humanity.

This role bridges raw data and cutting-edge AI models, and the work contributes directly to improvements in critical training metrics. It suits someone who is passionate about turning data into the foundation of AI systems and who wants to work on challenging problems with significant impact.

Last updated 6 hours ago

Responsibilities For Member of Technical Staff, Pre-Training Data Engineer

Design and build scalable data pipelines to ingest, clean, filter, and optimize diverse datasets
Conduct data ablations to assess data quality and experiment with data mixtures
Develop robust data modeling techniques for optimal training efficiency
Research and implement innovative data curation methods
Collaborate with cross-functional teams to ensure data pipelines meet requirements

Requirements For Member of Technical Staff, Pre-Training Data Engineer

Strong software engineering skills, with proficiency in Python and experience building data pipelines
Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools
Experience working with large-scale datasets, including web data, code data, and multilingual corpora
Knowledge of data quality assessment techniques and experimentation with data mixtures
A passion for bridging research and engineering to solve complex data-related challenges in AI model training

Benefits For Member of Technical Staff, Pre-Training Data Engineer

Weekly lunch stipend, in-office lunches & snacks
Full health and dental benefits
Mental health budget
100% Parental Leave top-up for 6 months (Canada, US, and UK)
Personal enrichment benefits for arts, culture, fitness, and workspace improvement
Remote-flexible work environment
Co-working stipend
6 weeks of vacation
