Interview Process:
TL;DR: Avoid if you value your time. The process dragged on for over a month with 5-6 rounds and no feedback between rounds (despite repeated requests), and it ended with a bare rejection sent from a no-reply address. This level of communication is unacceptable for a process this long, especially from a small start-up that wants to be the AI leader in Europe!
Note to international applicants: Mistral does a lot of consultancy work, i.e., repurposing and retraining smaller LLMs (1-3B parameters) for various downstream tasks, e.g., for clients in automotive, finance, etc. Be very cautious: there may be hidden requirements for French fluency down the line to communicate with local clients, which could make career progression difficult.
Interview Tips:
For the live coding round, make sure you can efficiently implement all fundamental transformer modules from scratch (PyTorch only), possibly with twists: MHA, GQA, MQA, self-/cross-attention, LayerNorm, RMSNorm, FFNs, positional embeddings (rotary, learned, static), masking strategies, Mixture of Experts (MoE), etc.
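As a warm-up for the from-scratch round, here is a minimal PyTorch sketch of two of the listed modules: RMSNorm and rotary positional embeddings (RoPE). The names `RMSNorm`, `rope_cache`, and `apply_rope` are hypothetical, and this RoPE variant rotates interleaved feature pairs (some codebases split the head dimension into two halves instead), so treat it as an illustration rather than what the interviewers expect verbatim.

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Rescale features by their root-mean-square.
    Unlike LayerNorm: no mean subtraction and no bias term."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable per-feature gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)


def rope_cache(seq_len: int, head_dim: int, base: float = 10000.0):
    """Precompute cos/sin tables of shape (seq_len, head_dim // 2); head_dim must be even."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)  # angle = position * frequency
    return angles.cos(), angles.sin()


def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotate each feature pair (x0, x1), (x2, x3), ... by a position-dependent angle.
    x has shape (batch, heads, seq_len, head_dim); cos/sin broadcast over batch and heads."""
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    rot_even = x_even * cos - x_odd * sin
    rot_odd = x_even * sin + x_odd * cos
    # Interleave the rotated pairs back into the original feature layout.
    return torch.stack((rot_even, rot_odd), dim=-1).flatten(-2)
```

RMSNorm only rescales, which is why it is cheaper than LayerNorm. RoPE encodes position as a rotation, so the dot product between a rotated query and a rotated key depends only on their relative offset, not on absolute positions.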
For the pair-programming round, they asked me to debug an issue with pre-norm in a transformer block with residual connections (fairly straightforward).
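The actual bug was not described, but a minimal sketch of a correct pre-norm block helps when hunting for one. It assumes a causal decoder block built on PyTorch's `nn.MultiheadAttention` and a GELU FFN; the class name `PreNormBlock` and the sizes are hypothetical. The comment in `forward` marks one common way pre-norm goes wrong: normalizing the residual stream itself instead of only the sublayer input.

```python
import torch
import torch.nn as nn


class PreNormBlock(nn.Module):
    """Pre-norm block: x = x + Attn(LN(x)); x = x + FFN(LN(x))."""

    def __init__(self, dim: int, n_heads: int, hidden: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T = x.size(1)
        # True above the diagonal = "may not attend" (future tokens are masked).
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.norm1(x)  # normalize a copy that feeds the sublayer only
        attn_out, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + attn_out  # the residual adds to the un-normalized stream
        # Typical bug: `x = self.norm1(x); x = x + attn(x)` overwrites the residual
        # stream with its normalized version, so the skip path no longer carries x.
        x = x + self.ffn(self.norm2(x))
        return x
```

A quick sanity check for this kind of bug: if both sublayers output zeros, a correct pre-norm block must return its input unchanged.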
For the quiz round, focus on the 'why'. You should be able to discuss and reason in depth about everything mentioned above and all of its variants, including geometric and algebraic explanations and intuitions (although the latter may not be appreciated that much). Additionally, you need to know the practicalities of training and serving LLMs at large scale, such as KV-caching, FlashAttention, pre-training, fine-tuning, alignment, RLHF, etc. For the scaling part, read Hugging Face's blog post The Ultra-Scale Playbook; they will ask you 3-4 questions from it about FSDP, ZeRO-1/2/3, tensor, pipeline, and data parallelism, computation-communication overlap, etc. Make sure you understand these concepts very well.
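The ZeRO stages are easiest to reason about with the memory accounting from the ZeRO paper, which the Ultra-Scale Playbook also walks through: with mixed-precision Adam, each parameter costs 2 bytes of half-precision weights, 2 bytes of gradients, and K = 12 bytes of optimizer state (fp32 master weights, momentum, variance). ZeRO-1 shards the optimizer states across N data-parallel GPUs, ZeRO-2 also shards the gradients, and ZeRO-3 (essentially the scheme behind FSDP) also shards the parameters. The helper `zero_memory_gb` is a hypothetical sketch that ignores activations, buffers, and fragmentation.

```python
def zero_memory_gb(n_params: float, n_gpus: int, stage: int, k: int = 12) -> float:
    """Per-GPU memory (GB = 1e9 bytes) for weights + gradients + optimizer states.
    Assumes 2-byte weights and gradients and k bytes of optimizer state per parameter."""
    params, grads, opt = 2 * n_params, 2 * n_params, k * n_params
    if stage == 0:    # plain data parallelism: everything replicated
        total = params + grads + opt
    elif stage == 1:  # shard optimizer states
        total = params + grads + opt / n_gpus
    elif stage == 2:  # shard optimizer states + gradients
        total = params + (grads + opt) / n_gpus
    elif stage == 3:  # shard optimizer states + gradients + parameters
        total = (params + grads + opt) / n_gpus
    else:
        raise ValueError("stage must be 0, 1, 2 or 3")
    return total / 1e9


for stage in range(4):
    print(f"ZeRO-{stage}: {zero_memory_gb(7.5e9, 64, stage):.1f} GB per GPU")
```

For a 7.5B-parameter model on 64 GPUs this gives about 120, 31.4, 16.6, and 1.9 GB for stages 0 to 3, matching the example in the ZeRO paper. The price of each extra stage is communication: ZeRO-3 must all-gather parameters before every forward and backward pass, which is where computation-communication overlap comes in.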
Salary range in EUR: 75k-100k, based in Paris.
What are MHA, GQA, and KV-caching?
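For the KV-cache question, a back-of-the-envelope size formula makes the MHA vs. GQA trade-off concrete: during decoding, every layer caches one key and one value vector per KV head per token, so the cache grows linearly with the number of KV heads. The function name and the configuration (32 layers, 32 query heads, head dimension 128, 8192 tokens, fp16) are hypothetical.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    """Total KV-cache size in bytes; the leading 2 counts one K and one V tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical config: 32 layers, 32 query heads, head_dim 128, 8192 tokens, fp16.
for name, kv_heads in [("MHA", 32), ("GQA", 8), ("MQA", 1)]:
    size = kv_cache_bytes(32, kv_heads, 128, 8192, batch=1)
    print(f"{name}: {size / 2**20:.0f} MiB")
```

Here MHA (32 KV heads) needs 4 GiB per sequence, GQA with 8 KV heads needs 1 GiB, and MQA (1 KV head) needs 128 MiB. Because decoding is usually memory-bandwidth bound, shrinking the cache this way directly speeds up generation and frees room for larger batches.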
Implement MHSA.
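A minimal sketch of causal multi-head self-attention (MHSA) in plain PyTorch, assuming a decoder-style causal mask and no dropout. The class name `SelfAttention` and the `n_kv_heads` parameter are hypothetical; the parameter is there so the same module also covers GQA and MQA, with K/V heads shared via `repeat_interleave` (one common convention, not the only one).

```python
import math
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttention(nn.Module):
    """Causal self-attention. n_kv_heads == n_heads gives MHA,
    1 < n_kv_heads < n_heads gives GQA, n_kv_heads == 1 gives MQA."""

    def __init__(self, dim: int, n_heads: int, n_kv_heads: Optional[int] = None):
        super().__init__()
        self.n_heads = n_heads
        self.n_kv_heads = n_kv_heads or n_heads
        assert dim % n_heads == 0 and n_heads % self.n_kv_heads == 0
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, self.n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, self.n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        # Project and split heads: (B, T, H * D) -> (B, H, T, D).
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # GQA/MQA: each K/V head is shared by a group of n_heads // n_kv_heads query heads.
        group = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(group, dim=1)
        v = v.repeat_interleave(group, dim=1)
        # Scaled dot-product scores, shape (B, H, T, T).
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        # Causal mask: position i may only attend to positions j <= i.
        future = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        scores = scores.masked_fill(future, float("-inf"))
        out = F.softmax(scores, dim=-1) @ v  # (B, H, T, D)
        # Merge heads back: (B, H, T, D) -> (B, T, H * D), then output projection.
        return self.o_proj(out.transpose(1, 2).reshape(B, T, -1))
```

Setting `n_kv_heads=n_heads` recovers standard MHSA. In practice, `F.scaled_dot_product_attention(q, k, v, is_causal=True)` can replace the explicit mask-and-softmax lines with a fused (e.g., FlashAttention-style) kernel.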
Fix a bug in a transformer block.
The following metrics were computed from 1 interview experience for the Mistral AI AI Engineer role in Paris, France.
Mistral AI's interview process for its AI Engineer roles in Paris, France is extremely selective, failing the vast majority of engineers.
Candidates reported having very negative feelings about Mistral AI's AI Engineer interview process in Paris, France.