Superalignment Fast Grants

With these grants, we are particularly interested in funding the following research directions:

  • Weak-to-strong generalization: Humans will be weak supervisors relative to superhuman models. Can we understand and control how strong models generalize from weak supervision?
  • Interpretability: How can we understand model internals? And can we use this understanding to, for example, build an AI lie detector?
  • Scalable oversight: How can we use AI systems to assist humans in evaluating the outputs of other AI systems on complex tasks?
  • Many other research directions, including but not limited to honesty, chain-of-thought faithfulness, adversarial robustness, and evals and testbeds.

For more on the research directions, FAQs, and other details, see our Superalignment Fast Grants page.

Funder: OpenAI

URL: https://openai.com/blog/superalignment-fast-grants

Deadline: February 18, 2024