Week 5 - Open Problems and Careers in AI Safety

You will finish the course with a week that prepares you for your next step in the field of AI safety: it gives an overview of the open problems, opportunities, and career paths within the field. Please note that working through this week's core readings in full would take you far longer than two hours. You are not expected to do so: the readings have been deliberately chosen so that many sections can be skipped while you still get full value from the rest. You are encouraged to read the sections that seem most relevant to you (for example, the open problems in a subfield that caught your attention during the course) or that you expect to broaden your perspective on work in the field of AI safety.

Core readings:

  1. Foundational Challenges in Assuring Alignment and Safety of Large Language Models (Anwar et al., 2024)

    • This reading gives a comprehensive overview of unsolved safety problems for LLMs.

  2. Unsolved Problems in ML Safety (Hendrycks et al., 2021)

    • This reading gives a significantly broader overview of unsolved problems in AI safety, treating them from a model-agnostic viewpoint.

  3. (My understanding of) What Everyone in Technical Alignment is Doing and Why (Larsen, 2022)

    • This blog post provides a comprehensive overview of organisations and research agendas in the field of AI safety as of 2022.

  4. Alignment Careers Guide (Rogers-Smith, 2022)

    • This article is long, but it is full of action-guiding advice that may help you narrow down which skills you want to build, or what sort of long-term path in technical alignment you want to pursue.

  5. Careers in Alignment (Ngo, 2022)

    • Ngo compiles a number of resources for thinking about careers in alignment research. Use this resource to get a sense of the career types that exist in technical alignment research, and to consider which paths suit and excite you.

  6. AI Safety Technical Research - Career Review (Hilton, 2023)

    • This article provides an in-depth review of the AI safety technical researcher career path. It is authored by 80,000 Hours, which is a nonprofit organisation that provides research and guidance to help individuals make high-impact career choices. The article discusses what the career path is like and what difficulties it involves, and also provides practical advice about topics such as how to upskill and whether to do a PhD to enter the field.

Further readings:

  1. Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback (Casper et al., 2023)

  2. 200 Concrete Open Problems in Mechanistic Interpretability (Nanda, 2022) (skip to the last section and follow the links that seem the most interesting to you)

  3. Concrete Problems in AI Safety (Amodei et al., 2016)

  4. Levelling Up in AI Safety Research Engineering (Mukobi, 2022)

    • A helpful guide laying out some suggested steps for gaining skills towards an eventual role as a machine learning research engineer. These are highly applicable to many roles at alignment organisations.

  5. Resources that (I think) new alignment researchers should know about (Wasil, 2023)

Podcast spotlight:

For a discussion of careers in the field of AI safety and ways of entering the field, listen to the 80,000 Hours podcast episode with Daniel Ziegler and Catherine Olsson. Ziegler is a technical researcher at Redwood Research, a non-profit alignment research organisation, while Olsson is a research engineer in the Anthropic mechanistic interpretability team. You can also listen to the 80,000 Hours podcast episode with Jan Leike on how to become an AI alignment researcher.

Project:

From a list of AI safety subjects (based on what the supervisors are interested in), write a research proposal, with or without a proof of concept. The proposal should be a one-pager, excluding references. Use at least 5 references, and keep the origin of papers in mind: arXiv preprints are discouraged.

You have 2 weeks for this. If you want early feedback or guidance, you can reach out to us; you may submit a draft for feedback once before the actual deadline.

You submit your proposal before the deadline, we review it, and you then get 3 days to revise it. After that, we send it to the supervisors, and if they are interested, we take the project further in Q4.

Topics:

  • Anything related to the safety of recursively self-improving AI models

  • Anything related to representation engineering and steering methods

  • Anything related to recurrent neural networks or hybrid transformer-recurrent models

  • Building 'Physically Grounded Autonomy', integrating energy-efficient hardware with autotelic agents that can generate their own goals to create AI that is as versatile and sustainable as biological intelligence

  • Security/privacy of LLMs and RAG systems

  • Privacy-preserving (fully) decentralized machine learning

  • Security and privacy of neuromorphic systems

  • AI-assisted software/system security testing, vulnerability discovery and patching

  • Autonomous cyberdefense