ML Research Engineer Internship, SmolLMs pretraining and datasets - EMEA Remote

Company building an AI platform to democratize good AI, with over 5 million users and 100k organizations sharing 1M+ models, 300k datasets & apps.
France
Machine Learning
Software Engineering Intern
Remote
AI
This job posting may no longer be active.

Description For ML Research Engineer Internship, SmolLMs pretraining and datasets - EMEA Remote

Hugging Face, a leading platform in AI development with over 5 million users and 100k organizations, is seeking an ML Research Engineer Intern for its SmolLMs team. The role focuses on advancing small language models, which make inference cheaper and can run on-device, promoting customization and privacy.

The role involves working with state-of-the-art infrastructure, including a scalable CPU cluster and an H100 cluster with nearly 100 nodes. You'll be part of the SmolLM team, contributing to building high-quality pre-training and post-training datasets, and implementing cutting-edge architecture and training techniques to develop state-of-the-art models.

The ideal candidate should be passionate about training LLMs and building high-quality datasets, with strong Python skills. You'll have the opportunity to work on developing the best small models in the field, collaborating with a team that's pushing the boundaries of AI technology.

Hugging Face offers a supportive and inclusive work environment, emphasizing diversity and professional growth. The company provides flexible working arrangements, comprehensive development opportunities, and strong community engagement in the ML/AI field. Its open-source libraries have garnered over 400k stars on GitHub, demonstrating their significant impact in the AI community.

This internship offers a unique opportunity to contribute to groundbreaking research in small language models while working with cutting-edge technology and a talented team. Whether you're in the office or working remotely, you'll be supported with the resources and mentorship needed to succeed in this role.

Last updated 7 months ago

Responsibilities For ML Research Engineer Internship, SmolLMs pretraining and datasets - EMEA Remote

  • Work with the SmolLM team on building the next generation of small language models
  • Iterate on datasets and models
  • Work with distributed training infrastructure
  • Build high-quality pre-training and post-training datasets

Requirements For ML Research Engineer Internship, SmolLMs pretraining and datasets - EMEA Remote

  • Proficiency in Python
  • Passion for training LLMs and building high-quality datasets
  • Cover letter explaining interest in open-source work at Hugging Face

Benefits For ML Research Engineer Internship, SmolLMs pretraining and datasets - EMEA Remote

  • Flexible working hours
  • Remote work options
  • Opportunity to visit offices
  • Workstation support
  • Conference and training reimbursement
  • Educational development support
