Apollo Research is seeking an Evals Software Engineer to join its team in London. The role focuses on building and maintaining software for AI safety evaluations, particularly those related to deceptive alignment. Key responsibilities include extending internal libraries for language model evaluations, collaborating with researchers, and advocating for good software design practices. Candidates should have at least two years of Python development experience and a strong interest in AI safety.
The Evals team at Apollo Research works on conceptual safety cases, builds evaluations for deceptive alignment properties, runs those evaluations on frontier models, and creates model organisms that demonstrate behaviors related to deceptive alignment. The role offers an opportunity to work on cutting-edge AI safety research and development.
Apollo Research aims for a culture emphasizing truth-seeking, goal-orientation, constructive feedback, and helpfulness. They welcome applicants from all backgrounds and offer a range of benefits including private medical insurance, flexible work hours, unlimited vacation, and a professional development budget.
The position is based in London, with the possibility of visa sponsorship for international candidates. The interview process has multiple stages and focuses on practical skills relevant to the job rather than general coding challenges. Early applications are encouraged, as they are reviewed on a rolling basis.