Google DeepMind is seeking a Research Engineer to join its multimodal features team in Media Understanding. This role presents an exceptional opportunity to advance state-of-the-art research in embedding/representation models within the context of large language models. The position involves developing cutting-edge models that will power Google products used by billions of people globally, focusing on understanding and processing diverse media types including text, images, audio, and video.
The team consists of research/software engineers, research scientists, and machine learning experts working collaboratively to achieve superhuman understanding of the visual world. The primary goal is to train the most powerful omnimodal embedding model for retrieval and other agentic use cases in Google products.
As a Research Engineer, you'll be at the forefront of developing the next generation of state-of-the-art models for multimodal understanding. Your responsibilities will include researching new modeling techniques, implementing research ideas, conducting experiments to evaluate improvements, and identifying new opportunities. The role requires a strong background in computer science or a related field, with either a Ph.D. or significant industry experience.
The position offers a competitive compensation package ranging from $215,000 to $250,000, plus bonus, equity, and benefits. You'll be working in Mountain View, California, collaborating with world-class researchers and engineers. This role provides a unique opportunity to shape the future of multimodal AI and its applications while working on projects that have a direct impact on billions of users worldwide.