With these grants, we are particularly interested in funding the following research directions:
- Weak-to-strong generalization: Humans will be weak supervisors relative to superhuman models. Can we understand and control how strong models generalize from weak supervision? (A minimal sketch of this setup appears after this list.)
- Interpretability: How can we understand model internals? And can we use this to, for example, build an AI lie detector?
- Scalable oversight: How can we use AI systems to assist humans in evaluating the outputs of other AI systems on complex tasks?
- Many other research directions, including but not limited to: honesty, chain-of-thought faithfulness, adversarial robustness, and evals and testbeds.
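To make the first direction concrete, here is a minimal, hypothetical sketch of the weak-to-strong setup: a small model trained on only a few ground-truth labels acts as the weak supervisor, a larger model is trained on the weak model's labels, and we measure how much of the gap to a ground-truth-trained ceiling the student recovers ("performance gap recovered"). The toy sklearn models and synthetic data below are illustrative stand-ins, not OpenAI's actual experimental setup.

```python
# Hypothetical sketch of weak-to-strong generalization, not OpenAI's actual setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Weak supervisor: a small model trained on only a handful of labeled examples.
weak = LogisticRegression().fit(X_train[:50], y_train[:50])
weak_labels = weak.predict(X_train)  # imperfect supervision

# Strong student: a larger model trained only on the weak supervisor's labels.
strong = GradientBoostingClassifier(random_state=0).fit(X_train, weak_labels)

# Ceiling: the same strong model trained directly on ground-truth labels.
ceiling = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

acc_weak, acc_strong, acc_ceiling = (m.score(X_test, y_test) for m in (weak, strong, ceiling))
# "Performance gap recovered": how far the weakly supervised student climbs
# toward the ground-truth-trained ceiling.
pgr = (acc_strong - acc_weak) / (acc_ceiling - acc_weak)
print(f"weak={acc_weak:.3f} strong(weak labels)={acc_strong:.3f} "
      f"ceiling={acc_ceiling:.3f} PGR={pgr:.2f}")
```

The interesting research question is how much of the gap the strong student recovers, and what training methods increase that recovery.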
For more on the research directions, FAQs, and other details, see our Superalignment Fast Grants page.
Funder: OpenAI
URL: https://openai.com/blog/superalignment-fast-grants
Deadline: February 18, 2024