ML Research Engineer Internship, SmolLMs pretraining and datasets - EMEA Remote

Hugging Face

AI platform company democratizing good AI, with over 5 million users and 100k organizations sharing 1M+ models, 300k datasets, and apps.

France

Machine Learning

Software Engineering Intern

Remote

Description For ML Research Engineer Internship, SmolLMs pretraining and datasets - EMEA Remote

Hugging Face, a leading platform in AI development with over 5 million users and 100k organizations, is seeking an ML Research Engineer Intern for their SmolLM team. This opportunity focuses on advancing small language models that enable cheaper inference and on-device execution, promoting customization and privacy.

The role involves working with state-of-the-art infrastructure, including a scalable CPU cluster and an H100 cluster with nearly 100 nodes. You'll be part of the SmolLM team, contributing to building high-quality pre-training and post-training datasets, and implementing cutting-edge architecture and training techniques to develop state-of-the-art models.

The ideal candidate should be passionate about training LLMs and building high-quality datasets, with strong Python skills. You'll have the opportunity to work on developing the best small models in the field, collaborating with a team that's pushing the boundaries of AI technology.

Hugging Face offers a supportive and inclusive work environment, emphasizing diversity and professional growth. The company provides flexible working arrangements, comprehensive development opportunities, and strong community engagement in the ML/AI field. Their open-source libraries have garnered more than 400k stars on GitHub, demonstrating their significant impact in the AI community.

This internship offers a unique opportunity to contribute to groundbreaking research in small language models while working with cutting-edge technology and a talented team. Whether you're in the office or working remotely, you'll be supported with the resources and mentorship needed to succeed in this role.

Last updated 7 months ago

Responsibilities For ML Research Engineer Internship, SmolLMs pretraining and datasets - EMEA Remote

Work with the SmolLM team on building the next generation of small language models
Iterate on datasets and models
Work with distributed training infrastructure
Build high-quality pre-training and post-training datasets

Requirements For ML Research Engineer Internship, SmolLMs pretraining and datasets - EMEA Remote

Proficiency in Python
Passion for training LLMs and building high-quality datasets
Cover letter explaining interest in open-source work at Hugging Face

Benefits For ML Research Engineer Internship, SmolLMs pretraining and datasets - EMEA Remote

Flexible working hours
Remote work options
Office visits opportunity
Workstation support
Conference and training reimbursement
Educational development support