Quick facts

    • Greater London

Apply by: 2025-04-18

Interpretability Researcher

Published 2025-02-17

Mechanistic Interpretability

AISI is launching a brand-new Mechanistic Interpretability team to research a fundamental question: how can we tell if a model is scheming? This is an ambitious bet to bring interpretability as a field into prime time. We believe this is a vital challenge that mechanistic interpretability can help solve, ensuring that dangerous capability evaluations can reliably determine whether models are safe to release, even when the models themselves are capable of gaming the evals. We also think it can lead to an entirely new field of alignment evaluations and make substantial contributions to the problem of technical AI safety.

To launch this project we're looking for a team lead, research scientists and research engineers. Apply now to join the largest technical AI safety lab on the planet - help us make this happen!

Role Summary

This team will have a large amount of scientific autonomy, with the ability to chase ambitious research bets. Your responsibilities may involve any of the following:

  • Supervised fine-tuning (SFT) of large models for scheming.
  • Training sparse autoencoders (SAEs), or fine-tuning open-source SAEs.
  • Circuit discovery/analysis.
  • Automated scheming detection.

You’ll receive coaching from your manager and mentorship from the research directors at AISI (including Geoffrey Irving and Yarin Gal). You will also regularly interact with world-famous researchers and other incredible staff (including alumni from Anthropic, DeepMind, OpenAI and ML professors from Oxford and Cambridge). We have a very strong learning & development culture to support this, including Friday afternoons devoted to deep reading and multiple weekly paper reading groups. From a compute perspective, you'll have unparalleled access to resources including 5,448 Nvidia Grace-Hopper GPUs (e.g., H100s).

Person Specification

You may be a good fit if you have some of the following skills, experience and attitudes:

  • Hands-on mechanistic interpretability research experience.
  • Experience working within a research team that has delivered multiple exceptional scientific breakthroughs in deep learning (or a related field). We’re looking for evidence of an exceptional ability to drive progress.
  • Comprehensive understanding of large language models (e.g. GPT-4), including both a broad understanding of the literature and hands-on experience with pre-training or fine-tuning LLMs.
  • Strong track record of academic excellence (e.g. multiple spotlight papers at top-tier conferences).
  • Improving scientific standards and rigour through mentorship & feedback.
  • Strong written and verbal communication skills.
  • Experience working with world-class multi-disciplinary teams, including both scientists and engineers (e.g. in a top-3 lab).
  • Acting as a bar raiser for interviews.

Salary & Benefits

We are hiring individuals at all ranges of seniority and experience within the research unit, and this advert allows you to apply for any of the roles within this range. We will discuss and calibrate with you as part of the process. The full range of salaries available is as follows:

  • L3: £65,000 - £75,000
  • L4: £85,000 - £95,000
  • L5: £105,000 - £115,000
  • L6: £125,000 - £135,000
  • L7: £145,000

There is a range of pension options available, details of which can be found on the Civil Service website.

Selection Process

In accordance with the Civil Service Commission rules, the following list contains all selection criteria for the interview process.

Required Experience

We select based on skills and experience regarding the following areas:

  • Mechanistic interpretability experience
  • Research problem selection
  • Research science
  • Writing code efficiently
  • Python
  • Frontier model architecture knowledge
  • Frontier model training knowledge
  • Model evaluations knowledge
  • AI safety research knowledge
  • Written communication
  • Verbal communication
  • Teamwork
  • Interpersonal skills
  • Tackling challenging problems
  • Learning through coaching

Desired Experience

We additionally may factor in experience with any of the areas that our work-streams specialise in:

  • Autonomous systems
  • Cyber security
  • Chemistry or Biology
  • Safeguards
  • Safety Cases
  • Societal Impacts