
Research Engineer, Trust & Safety

Anthropic
San Francisco, CA Full Time
POSTED ON 12/20/2024
AVAILABLE BEFORE 2/20/2025

About the role

We are looking for Research Engineers to help design and build safety and oversight algorithms for our AI models and products. As a Trust and Safety Research Engineer, you will design and train ML models, informed by research progress, that detect harmful user and model behaviors and help ensure society's well-being. You will apply your research skills to uphold our principles of safety, transparency, and oversight while enforcing our terms of service and acceptable use policies.

What you will be working on:

  • Design, iterate on, and build ML models to detect unwanted or anomalous behaviors from both users and LLMs
  • Work with T&S ML engineers to review and iterate on experiment ideas. Co-author experiment success criteria and production deployment roadmaps
  • Partner with cross-functional T&S Policy and Enforcement teams to understand emerging and sustained abuse patterns in user prompts and behaviors. Incorporate these insights into T&S research datasets
  • Surface abuse patterns to sibling research teams in the company. Collaborate to harden Anthropic’s LLMs at the pre- and post-training stages
  • Stay current with state-of-the-art research in AI and machine learning, and propose ways to apply these advancements to T&S systems

You may be a good fit if you:

  • Have 4 years of experience in a research engineering or applied research scientist position, preferably with a focus on trust and safety
  • Have significant Python programming and machine learning experience
  • Are proficient in building trustworthy and safe AI technology
  • Have strong communication skills and the ability to explain complex technical concepts to non-technical stakeholders
  • Care about the societal impacts and long-term implications of your work and are results-oriented

Strong candidates may also:

  • Have experience fine-tuning large language models with supervised learning or reinforcement learning
  • Have experience with machine learning frameworks like scikit-learn, TensorFlow, or PyTorch
  • Have experience authoring research papers in machine learning, NLP, or AI alignment, or similar industry experience
  • Have developed evaluations for language models

Job openings at Anthropic

Anthropic
New York, NY Full Time
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be saf...
Anthropic
San Francisco, CA Full Time
About the role: Anthropic is working on frontier AI research that has the potential to transform how humans and machines...
Anthropic
San Francisco, CA Full Time
About the role: We are seeking a Head of Strategic Finance for Compute at Anthropic. Compute is a critical ingredient in...
Anthropic
San Francisco, CA Full Time
About the role As an Enterprise Account Executive at Anthropic, you’ll drive adoption of safe, frontier AI by securing s...

Not the job you're looking for? Here are some other Research Engineer, Trust & Safety jobs in the San Francisco, CA area that may be a better fit.

Research Engineer, Trust & Safety

Menlo Ventures Management, L.P, San Francisco, CA

Research Engineer, Trust & Safety

Lionheart Ventures, San Francisco, CA
