What are the responsibilities and job description for the Research Engineer, Trust & Safety position at Anthropic?

About the role

We are looking for Research Engineers to help design and build safety and oversight algorithms for our AI models and products. As a Trust and Safety Research Engineer, you will draw on current research to design and train ML models that detect harmful user and model behaviors and help ensure society's well-being. You will apply your research skills to uphold our principles of safety, transparency, and oversight while enforcing our terms of service and acceptable use policies.

What you will be working on:

Design, iterate on, and build ML models to detect unwanted or anomalous behaviors from both users and LLMs
Work with T&S ML engineers to review and iterate on experiment ideas. Co-author experiment success criteria and production deployment roadmaps
Partner with T&S Policy and Enforcement cross-functional teams to understand emerging and sustained abuse patterns from user prompts and behaviors. Incorporate these insights into T&S research datasets
Surface abuse patterns to sibling research teams in the company. Collaborate to harden Anthropic’s LLMs at the pre- and post-training stages
Stay current with state-of-the-art research in AI and machine learning, and propose ways to apply these advancements to T&S systems

You may be a good fit if you:

Have 4 years of experience in a research engineering or applied research scientist position, preferably with a focus on trust and safety
Have significant Python programming experience and machine learning experience
Have proficiency in building trustworthy and safe AI technology
Have strong communication skills and ability to explain complex technical concepts to non-technical stakeholders
Care about the societal impacts and long-term implications of your work and are results-oriented

Strong candidates may also:

Have experience fine-tuning large language models with supervised learning or reinforcement learning
Have experience with machine learning frameworks like scikit-learn, TensorFlow, or PyTorch
Have experience authoring research papers in machine learning, NLP, or AI alignment, or similar industry experience
Have developed evaluations for language models
