With these grants, we are particularly interested in funding the following research directions:
- Weak-to-strong generalization: Humans will be weak supervisors relative to superhuman models. Can we understand and control how strong models generalize from weak supervision? (A minimal sketch of this setup appears after this list.)
- Interpretability: How can we understand model internals? And can we use this to, for example, build an AI lie detector?
- Scalable oversight: How can we use AI systems to assist humans in evaluating the outputs of other AI systems on complex tasks?
- Many other research directions, including but not limited to: honesty, chain-of-thought faithfulness, adversarial robustness, and evals and testbeds.
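To make the first direction concrete, here is a minimal, hypothetical sketch of the weak-to-strong setup: a small model trained on only a few ground-truth labels acts as the weak supervisor, a larger model is trained on the weak model's labels, and we measure how much of the gap to a ground-truth-trained ceiling the student recovers ("performance gap recovered"). The toy sklearn models and synthetic data below are illustrative stand-ins, not OpenAI's actual experimental setup.

```python
# Hypothetical sketch of weak-to-strong generalization, not OpenAI's actual setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Weak supervisor: a small model trained on only a handful of labeled examples.
weak = LogisticRegression().fit(X_train[:50], y_train[:50])
weak_labels = weak.predict(X_train)  # imperfect supervision

# Strong student: a larger model trained only on the weak supervisor's labels.
strong = GradientBoostingClassifier(random_state=0).fit(X_train, weak_labels)

# Ceiling: the same strong model trained directly on ground-truth labels.
ceiling = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

acc_weak, acc_strong, acc_ceiling = (m.score(X_test, y_test) for m in (weak, strong, ceiling))
# "Performance gap recovered": how far the weakly supervised student climbs
# toward the ground-truth-trained ceiling.
pgr = (acc_strong - acc_weak) / (acc_ceiling - acc_weak)
print(f"weak={acc_weak:.3f} strong(weak labels)={acc_strong:.3f} "
      f"ceiling={acc_ceiling:.3f} PGR={pgr:.2f}")
```

The interesting research question is how much of the gap the strong student recovers, and what training methods increase that recovery.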
For more on the research directions, FAQs, and other details, see our Superalignment Fast Grants page.
Funder: OpenAI
URL: https://openai.com/blog/superalignment-fast-grants
Deadline: February 18, 2024