What are the responsibilities and job description for the Research Scientist/Engineer, Alignment Finetuning position at Anthropic?
About the role:
As a Research Scientist/Engineer on the Alignment Finetuning team at Anthropic, you'll lead the development and implementation of techniques aimed at training language models that are more aligned with human values: that demonstrate better moral reasoning, improved honesty, and good character. You'll work to develop novel finetuning techniques and to use these to demonstrably improve model behavior.
Responsibilities:
- Develop and implement novel finetuning techniques using synthetic data generation and advanced training pipelines
- Use these to train models to have better alignment properties including honesty, character, and harmlessness
- Create and maintain evaluation frameworks to measure alignment properties in models
- Collaborate across teams to integrate alignment improvements into production models
- Develop processes to help automate and scale the work of the team
You may be a good fit if you:
- Have an MS/PhD in Computer Science, ML, or related field, or equivalent experience
- Possess strong programming skills, especially in Python
- Have experience with ML model training and experimentation
- Have a track record of implementing ML research
- Demonstrate strong analytical skills for interpreting experimental results
- Have experience with ML metrics and evaluation frameworks
- Excel at turning research ideas into working code
- Can identify and resolve practical implementation challenges
Strong candidates may also have:
- Experience with language model finetuning
- Background in AI alignment research
- Published work in ML or alignment
- Experience with synthetic data generation
- Familiarity with techniques like RLHF, constitutional AI, and reward modeling
- Track record of designing and implementing novel training approaches
- Experience with model behavior evaluation and improvement