What are the responsibilities and job description for the Model Behavior Architect, Alignment Finetuning position at Anthropic?
About the Role:
As a Model Behavior Architect at Anthropic, you'll be at the forefront of shaping AI system behavior to ensure it aligns with human values. Working within the Alignment Finetuning team, you'll combine your expertise in model evaluation, prompt engineering, and ethical judgment and knowledge to help create AI systems that respond with good judgment across diverse scenarios.
Responsibilities:
- Interact with models to carefully identify where model behavior and judgment can be improved
- Gather internal and external feedback on model behavior to document areas for improvement
- Design and implement subtle prompting strategies and data generation pipelines that improve model responses
- Identify and fix edge case behaviors through rigorous testing of your data generation pipelines
- Develop evaluations of language model behaviors across judgment-based domains like honesty, character, and ethics
- Work collaboratively with researchers on related teams like Trust and Safety, Alignment Science, and Applied Finetuning
You May Be a Good Fit If You:
- Have extensive experience with prompt engineering and chaining for language models
- Demonstrate strong skills in evaluating AI system outputs on subtle or fuzzy tasks
- Have a background in philosophy, psychology, data science, or related fields
- Care about AI safety and the ethical implications of both current and future AI behaviors
- Are comfortable using basic Python and running basic scripts
- Have a keen eye for identifying subtle issues in AI outputs
- Understand how LLMs are trained and are familiar with concepts in reinforcement learning
- Have experience finetuning large language models
- Are happy to engage in test-driven development and to carefully analyze data and data pipelines
Strong Candidates May Also Have:
- Formal training in ethics or moral philosophy or moral psychology
- Experience in data science with emphasis on data verification
- Conceptual understanding of language model training and finetuning techniques
- Previous experience developing evaluation frameworks for large language models
- Background in AI safety research or similar fields
- Experience with RLHF, constitutional AI, or other alignment techniques
- Published work related to AI ethics or safety
- Knowledge of model behavior benchmarking
Join us in our mission to ensure advanced AI systems behave reliably and ethically while staying aligned with human values.