Model Behavior Architect, Alignment Finetuning

The Rundown AI, Inc.
San Francisco, CA | Full Time
POSTED ON 2/19/2025
AVAILABLE BEFORE 5/16/2025

About the Role :

As a Model Behavior Architect at Anthropic, you'll be at the forefront of shaping AI system behavior so that it aligns with human values. Working within the Alignment Finetuning team, you'll combine expertise in model evaluation, prompt engineering, and ethics to help create AI systems that respond with good judgment across diverse scenarios.

Responsibilities :

  • Interact with models to carefully identify where model behavior and judgment can be improved
  • Gather internal and external feedback on model behavior to document areas for improvement
  • Design and implement subtle prompting strategies and data generation pipelines that improve model responses
  • Identify and fix edge case behaviors through rigorous testing of your data generation pipelines
  • Develop evaluations of language model behaviors across judgment-based domains like honesty, character, and ethics
  • Work collaboratively with researchers on related teams like Trust and Safety, Alignment Science, and Applied Finetuning

You May Be a Good Fit If You :

  • Have extensive experience with prompt engineering and chaining for language models
  • Demonstrate strong skills in evaluating AI system outputs on subtle or fuzzy tasks
  • Have a background in philosophy, psychology, data science, or related fields
  • Care about AI safety and the ethical implications of both current and future AI behaviors
  • Are comfortable using basic Python and running basic scripts
  • Have a keen eye for identifying subtle issues in AI outputs
  • Understand how LLMs are trained and are familiar with concepts in reinforcement learning
  • Have experience finetuning large language models
  • Are happy to engage in test-driven development and to carefully analyze data and data pipelines

Strong Candidates May Also Have :

  • Formal training in ethics or moral philosophy or moral psychology
  • Experience in data science with emphasis on data verification
  • Conceptual understanding of language model training and finetuning techniques
  • Previous experience developing evaluation frameworks for large language models
  • Background in AI safety research or similar fields
  • Experience with RLHF, constitutional AI, or other alignment techniques
  • Published work related to AI ethics or safety
  • Knowledge of model behavior benchmarking

Join us in our mission to ensure advanced AI systems behave reliably and ethically while staying aligned with human values.



