Model Behavior Architect, Alignment Finetuning

Anthropic
San Francisco, CA | Full Time
Posted on 2/17/2025 | Available before 4/17/2025

About the Role:

As a Model Behavior Architect at Anthropic, you'll be at the forefront of shaping AI system behavior to ensure it aligns with human values. Working within the Alignment Finetuning team, you'll combine your expertise in model evaluation, prompt engineering, and ethical judgment to help create AI systems that respond with good judgment across diverse scenarios.

Responsibilities:

  • Interact with models to carefully identify where model behavior and judgment can be improved
  • Gather internal and external feedback on model behavior to document areas for improvement
  • Design and implement subtle prompting strategies and data generation pipelines that improve model responses
  • Identify and fix edge case behaviors through rigorous testing of your data generation pipelines
  • Develop evaluations of language model behaviors across judgment-based domains like honesty, character, and ethics
  • Work collaboratively with researchers on related teams like Trust and Safety, Alignment Science, and Applied Finetuning

You May Be a Good Fit If You:

  • Have extensive experience with prompt engineering and chaining for language models
  • Demonstrate strong skills in evaluating AI system outputs on subtle or fuzzy tasks
  • Have a background in philosophy, psychology, data science, or related fields
  • Care about AI safety and the ethical implications of both current and future AI behaviors
  • Are comfortable writing basic Python and running scripts
  • Have a keen eye for identifying subtle issues in AI outputs
  • Understand how LLMs are trained and are familiar with concepts in reinforcement learning
  • Have experience finetuning large language models
  • Are happy to engage in test-driven development and to carefully analyze data and data pipelines

Strong Candidates May Also Have:

  • Formal training in ethics, moral philosophy, or moral psychology
  • Experience in data science with emphasis on data verification
  • Conceptual understanding of language model training and finetuning techniques
  • Previous experience developing evaluation frameworks for large language models
  • Background in AI safety research or similar fields
  • Experience with RLHF, constitutional AI, or other alignment techniques
  • Published work related to AI ethics or safety
  • Knowledge of model behavior benchmarking

Join us in our mission to ensure advanced AI systems behave reliably and ethically while staying aligned with human values.
