Research Engineer / Scientist, Safeguards

Anthropic
San Francisco, CA Full Time
POSTED ON 2/8/2025
AVAILABLE BEFORE 5/6/2025

About the role

The Safeguards Research Team conducts critical safety research and engineering to ensure AI systems can be deployed safely. As part of Anthropic's broader safeguards organization, we work on both immediate safety challenges and longer-term research initiatives, with projects spanning jailbreak robustness, automated red-teaming, monitoring techniques, and applied threat modeling. We prioritize techniques that will enable the safe deployment of more advanced AI systems (ASL-3 and beyond), taking a pragmatic approach to fundamental AI safety challenges while maintaining strong research rigor.

You take a pragmatic approach to running machine learning experiments to help us understand and steer the behavior of powerful AI systems. You care about making AI helpful, honest, and harmless, and are interested in the ways that this could be challenging in the context of human-level capabilities. You could describe yourself as both a scientist and an engineer. You’ll focus on risks from powerful future systems (like those we would designate as ASL-3 or ASL-4 under our Responsible Scaling Policy), as well as better understanding risks occurring today. You will work in collaboration with other teams including Interpretability, Fine-Tuning, Frontier Red Team, and Alignment Science.

Representative projects:

  • Test the robustness of our safety techniques by training language models to subvert them, and measure how effective those models are at defeating our interventions.
  • Run multi-agent reinforcement learning experiments to test out techniques like AI Debate.
  • Build tooling to efficiently evaluate the effectiveness of novel LLM-generated jailbreaks.
  • Write scripts and prompts to efficiently produce evaluation questions to test models’ reasoning abilities in safety-relevant contexts.
  • Contribute ideas, figures, and writing to research papers, blog posts, and talks.
  • Run experiments that feed into key AI safety efforts at Anthropic, like the design and implementation of our Responsible Scaling Policy.

You may be a good fit if you:

  • Have significant software, ML, or research engineering experience
  • Have some experience contributing to empirical AI research projects
  • Have some familiarity with technical AI safety research
  • Prefer fast-moving collaborative projects to extensive solo efforts
  • Pick up slack, even if it goes outside your job description
  • Care about the impacts of AI

Strong candidates may also:

  • Have experience authoring research papers in machine learning, NLP, or AI safety
  • Have experience with LLMs
  • Have experience with reinforcement learning
  • Have experience with Kubernetes clusters and complex shared codebases
