What are the responsibilities and job description for the Member of Technical Staff - Assessment Team position at Future House USA?
About FutureHouse
FutureHouse is a philanthropically funded moonshot focused on building an AI Scientist. Our 10-year mission is to build semi-autonomous AIs that can scale scientific research, to accelerate the pace of discovery and to provide worldwide access to cutting-edge scientific, medical, and engineering expertise. At FutureHouse, we're not just envisioning the future; we're building it.
The Assessment Team will be responsible for establishing FutureHouse as the world leader in evaluating the scientific abilities of AI systems, particularly for biology research. The goal of the Assessment Team will be to monitor the capabilities of the AI systems we are building and to tell us how far along we are on the path to an AI Scientist. This work is particularly important because robust methods for evaluating performance are essential to making progress. It is also important because these methods are how we will evaluate, and develop mitigations for, the risks associated with autonomous or semi-autonomous AI Scientists.
Many aspects of scientific reasoning (like inference, hypothesis generation, etc.) are extremely difficult to assess. State-of-the-art benchmarks today are mostly question-answering benchmarks, which are usually tests of knowledge, rather than reasoning. To get at the core cognitive components that make humans good at science, the Assessment Team will need to develop fundamentally new methods. We are well-positioned to do this because, unlike most AI organizations, we have real, practicing biology expertise in-house, and a wet lab for assessment methods that require lab validation.
The team will be diverse, consisting of both biology researchers with hands-on, practical wet-lab experience and AI researchers with backgrounds in alignment, benchmarks, evals, and assessments. We are particularly excited about candidates who have experience spanning both biology and AI.
This position is full-time in-person in San Francisco.
Core Responsibilities :
- Conducting foundational research into assessment methods. The Assessment Team will develop new methods for evaluating the scientific reasoning capabilities of AI systems, such as generating hypotheses and interpreting data.
- Developing specific benchmarks and assessment procedures. The Assessment Team will develop benchmarks and assessment procedures to measure specific aspects of a model or agent's behavior that are pertinent to biology research.
- Deploying benchmarks and assessment procedures. The Assessment Team will scale up its benchmarks and assessment procedures and apply them both internally to accelerate progress and evaluate risks, and externally to evaluate the systems being built by our partners. The Assessment Team will also make its work publicly available to the greatest extent possible, so that its methods and benchmarks can benefit the entire community.
Position Requirements :
What we offer :
And also the normal HR stuff :
What can you expect from FutureHouse?
Salary : $100,000 - $500,000