Google DeepMind is seeking a Research Engineer to join its multimodal features team in Media Understanding. This role presents an exceptional opportunity to advance state-of-the-art research in embedding/representation models within the context of large language models. The position involves developing cutting-edge models that will power Google products used by billions of people globally, focusing on understanding and processing diverse media types including text, images, audio, and video.
The team consists of research/software engineers, research scientists, and machine learning experts working collaboratively to achieve superhuman understanding of the visual world. The primary goal is to train the most powerful omnimodal embedding model for retrieval and other agentic use cases in Google products.
As a Research Engineer, you'll be at the forefront of developing the next generation of state-of-the-art models for multimodal understanding. Your responsibilities will include researching new modeling techniques, implementing research ideas, conducting experiments to evaluate improvements, and identifying new opportunities. The role requires a strong background in computer science or a related field, with either a Ph.D. or significant industry experience.
The position offers a competitive compensation package ranging from $215,000 to $250,000, plus bonus, equity, and benefits. You'll be working in Mountain View, California, collaborating with world-class researchers and engineers. This role provides a unique opportunity to shape the future of multimodal AI and its applications while working on projects that have a direct impact on billions of users worldwide.