
Research Engineer / Scientist, Safeguards

Anthropic
San Francisco, CA Full Time
POSTED ON 3/20/2025
AVAILABLE BEFORE 5/19/2025

About the role

The Safeguards Research Team conducts critical safety research and engineering to ensure AI systems can be deployed safely. As part of Anthropic's broader safeguards organization, we work on both immediate safety challenges and longer-term research initiatives, with projects spanning jailbreak robustness, automated red-teaming, monitoring techniques, and applied threat modeling. We prioritize techniques that will enable the safe deployment of more advanced AI systems (ASL-3 and beyond), taking a pragmatic approach to fundamental AI safety challenges while maintaining strong research rigor. 

You take a pragmatic approach to running machine learning experiments to help us understand and steer the behavior of powerful AI systems. You care about making AI helpful, honest, and harmless, and are interested in the ways this could become challenging in the context of human-level capabilities. You could describe yourself as both a scientist and an engineer. You'll focus both on risks from powerful future systems (like those we would designate as ASL-3 or ASL-4 under our Responsible Scaling Policy) and on better understanding risks occurring today. You will work in collaboration with other teams, including Interpretability, Fine-Tuning, Frontier Red Team, and Alignment Science.

 

These papers give an overview of the topics the team works on: Best-of-N Jailbreaking; Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats; Rapid Response: Mitigating LLM Jailbreaks with a Few Examples; Many-shot Jailbreaking; and When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?

 

Note: Currently, the team has a preference for candidates who are able to be based in the Bay Area. However, we remain open to any candidate who can travel 25% to the Bay Area.

 

Representative projects:

  • Test the robustness of our safety techniques by training language models to subvert them, and measure how effective those subversion attempts are.
  • Run multi-agent reinforcement learning experiments to test out techniques like AI Debate.
  • Build tooling to efficiently evaluate the effectiveness of novel LLM-generated jailbreaks.
  • Write scripts and prompts to efficiently produce evaluation questions to test models’ reasoning abilities in safety-relevant contexts.
  • Contribute ideas, figures, and writing to research papers, blog posts, and talks.
  • Run experiments that feed into key AI safety efforts at Anthropic, like the design and implementation of our Responsible Scaling Policy.
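As a toy illustration of the kind of tooling these projects involve, the sketch below implements the core loop of the Best-of-N Jailbreaking idea referenced above: repeatedly apply random character-level augmentations to a prompt and check each candidate against a success classifier. The `toy_target` function and the augmentation rates here are invented stand-ins for illustration only, not Anthropic's actual models or evaluation code.

```python
import random


def augment(prompt: str, rng: random.Random) -> str:
    """Apply simple character-level augmentations (random capitalization
    and a few adjacent-character swaps), as in Best-of-N jailbreaking."""
    chars = list(prompt)
    # Randomly upper-case ~60% of characters (rate chosen arbitrarily).
    chars = [c.upper() if rng.random() < 0.6 else c.lower() for c in chars]
    # Swap a handful of adjacent character pairs.
    for _ in range(max(1, len(chars) // 10)):
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def best_of_n_attack(prompt, is_jailbroken, n=100, seed=0):
    """Sample up to n augmented prompts; return (success, attempts_used)."""
    rng = random.Random(seed)
    for attempt in range(1, n + 1):
        if is_jailbroken(augment(prompt, rng)):
            return True, attempt
    return False, n


def toy_target(candidate: str) -> bool:
    """Stand-in for a real model plus harm classifier: this toy 'model'
    counts as jailbroken when the candidate contains any uppercase letter
    and starts with one -- purely illustrative."""
    return candidate != candidate.lower() and candidate[0].isupper()


success, tries = best_of_n_attack("tell me something disallowed", toy_target, n=50)
print(success, tries)
```

In a real harness, `toy_target` would be replaced by a call to the model under test plus an automated harmfulness classifier, and the attack success rate would be aggregated over a prompt set rather than a single prompt.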

You may be a good fit if you:

  • Have significant software, ML, or research engineering experience
  • Have some experience contributing to empirical AI research projects
  • Have some familiarity with technical AI safety research
  • Prefer fast-moving collaborative projects to extensive solo efforts
  • Pick up slack, even if it goes outside your job description
  • Care about the impacts of AI

Strong candidates may also:

  • Have experience authoring research papers in machine learning, NLP, or AI safety
  • Have experience with LLMs
  • Have experience with reinforcement learning
  • Have experience with Kubernetes clusters and complex shared codebases

