Reading Group

The Reading Group meets bi-weekly, usually Thursdays at 19:45 UTC. To join, add “soeren.elverlin” on Skype.

Usually we start with small talk and a round of introductions, after which the host summarizes the paper for roughly 20 minutes. The summary is also uploaded to the YouTube channel. This is followed by discussion, both of the article and of AI safety in general, and finally we choose the paper to read at the next meeting.

Also check out our Facebook Group.


Eliciting Latent Knowledge Paul Christiano 2022-04-21
Democratising Risk: In Search of a Methodology to Study Existential Risk 3/3 Carla Zoe Cremer et al. 2022-03-31
Democratising Risk: In Search of a Methodology to Study Existential Risk 2/3 Carla Zoe Cremer et al. 2022-03-17
Democratising Risk: In Search of a Methodology to Study Existential Risk 1/3 Carla Zoe Cremer et al. 2022-03-03
A General Language Assistant as a Laboratory for Alignment Jared Kaplan et al. 2022-02-17
Digital People Would Be An Even Bigger Deal Holden Karnofsky 2022-02-03
Finetuned Language Models are Zero-Shot Learners Jason Wei et al. 2022-01-13
Treacherous Turns from Deep Learning Søren Elverlin 2021-12-30
Soares, Tallinn, and Yudkowsky discuss AGI cognition Eliezer Yudkowsky et al. 2021-12-16
Beyond fire alarms: freeing the groupstruck Katja Grace 2021-12-02
Distinguishing AI takeover scenarios Sam Clarke and Sammy Martin 2021-11-18
A Theoretical Computer Science Perspective on Consciousness Manuel and Lenore Blum 2021-10-28
Recursively Summarizing Books with Human Feedback Paul Christiano et al. 2021-10-14
A brief review of the reasons multi-objective RL could be important in AI Safety Research Ben Smith et al. 2021-09-30
Søren Elverlin 2021-09-16
Learning to summarize from human feedback Paul Christiano et al. 2021-09-01
What does GPT-3 understand? Symbol grounding and Chinese rooms Stuart Armstrong 2021-08-19
MIRI comments on Cotra’s ‘Case for Aligning Narrowly Superhuman Models’ Eliezer Yudkowsky and Evan Hubinger 2021-08-05
The case for aligning narrowly superhuman models Ajeya Cotra 2021-07-22
Another (outer) alignment failure story Paul Christiano 2021-07-08
AI Risk Skepticism 2/2 Roman Yampolskiy 2021-06-24
Is AI Safety a Progressive Science? John Fox 2021-06-17
AI Risk Skepticism 1/2 Roman Yampolskiy 2021-06-10
Intelligence and Unambitiousness Using Algorithmic Information Theory Michael Cohen 2021-05-27
Conversation with Ernie Davis Robert Long 2021-05-20
Conversation with Rohin Shah Asya Bergal et al. 2021-05-06
Draft Report on AI Timelines Ajeya Cotra 2021-04-22
Metaethical.AI June Ku 2021-04-08
Misconceptions about continuous takeoff Matthew Barnett 2021-03-25
Extrapolating GPT-N performance Lukas Finnveden 2021-03-04
Eight Claims about Multi-AGI safety Richard Ngo 2021-02-11
Functionally Effective Conscious AI Without Suffering A. Agarwal and S. Edelman 2021-02-04
Consequences of Misaligned AI Simon Zhuang and Dylan Hadfield-Menell 2021-01-28
AI Alignment, Philosophical Pluralism, and the Relevance of Non-Western Philosophy Tan Zhi Xuan 2021-01-21
An AGI Modifying Its Utility Function in Violation of the Strong Orthogonality Thesis James D. Miller et al. 2021-01-07
On Classic Arguments for AI Discontinuities Ben Garfinkel 2020-12-17
The Human Condition Hannah Arendt 03-12-2020
Unpacking Classic AI Risk Arguments Ben Garfinkel 26-11-2020
Sharing the world with digital minds 2/2 Carl Shulman and Nick Bostrom 19-11-2020
Sharing the world with digital minds 1/2 Carl Shulman and Nick Bostrom 12-11-2020
On Scaling Laws Jared Kaplan 05-11-2020
An Empirical Model of Large Batch Training Jared Kaplan et al. 30-10-2020
Universal Intelligence Shane Legg and Marcus Hutter 22-10-2020
Roadmap to a Roadmap Matthijs Maas et al. 15-10-2020
On GPT-3 Gwern Branwen 08-10-2020
Corrigibility Ali 01-10-2020
Language Models are Few Shot Learners 2/2 Tom B. Brown et al. 24-09-2020
How close are we to creating Artificial General Intelligence? David Deutsch 17-09-2020
Language Models are Few Shot Learners 1/2 Tom B. Brown et al. 10-09-2020
Scrutinizing Classic AI Risk Arguments 2/2 Ben Garfinkel 27-08-2020
Scrutinizing Classic AI Risk Arguments 1/2 Ben Garfinkel 13-08-2020
‘Indifference’ methods for managing agent rewards Stuart Armstrong and Xavier O’Rourke 06-08-2020
AI Research Considerations for Human Existential Safety (ARCHES) Andrew Critch and David Krueger 30-07-2020
Risks from learned optimization Evan Hubinger et al. 23-07-2020
Problem of fully updated deference Eliezer Yudkowsky 16-07-2020
Pessimism About Unknown Unknowns Inspires Conservatism Michael K. Cohen and Marcus Hutter 09-07-2020
Steven Pinker on the Possible Existential Threat of AI Steven Pinker 02-07-2020
The Off-Switch Game Dylan Hadfield-Menell et al. 25-06-2020
Formal Metaethics and Metasemantics for AI Alignment June Ku 18-06-2020
Discussion: If I were a well-intentioned AI Stuart Armstrong, Scott Garrabrant 10-06-2020
Measuring the Algorithmic Efficiency of Neural Networks Danny Hernandez et al. 28-05-2020
If I were a well-intentioned AI 3+4/4 Stuart Armstrong 21-05-2020
Conversation with Adam Gleave Adam Gleave 15-05-2020
If I were a well-intentioned AI 2/4 Stuart Armstrong 07-05-2020
The Offence-Defence Balance of Scientific Knowledge Toby Shevlane et al. 01-05-2020
Conversation with Paul Christiano Paul Christiano et al. 22-04-2020
If I were a well-intentioned AI 1/4 Stuart Armstrong 15-04-2020
The Role of Cooperation in Responsible AI Development Gillian Hadfield et al. 08-04-2020
Q & A with Stuart Russell Stuart Russell 08-01-2020
Raging robots, hapless humans: the AI dystopia David Leslie 17-12-2019
Human Compatible (9-10) Stuart Russell 11-12-2019
Human Compatible (7-8) Stuart Russell 05-12-2019
Human Compatible (1-6) Stuart Russell 28-11-2019
Why AI Doomsayers are like Sceptical Theists John Danaher 20-11-2019
Policy Desiderata for Superintelligent AI 2/2 Nick Bostrom 12-11-2019
Policy Desiderata for Superintelligent AI 1/2 Nick Bostrom 06-11-2019
AI safety via debate 2/2 Paul Christiano et al. 30-10-2019
AI safety via debate 1/2 Paul Christiano et al. 22-10-2019
AI Insights Dataset Analysis Colleen McKenzie et al. 15-10-2019
A Tutorial on Machine Learning and Data Science Tools Andreas Holzinger 09-10-2019
Superintelligence Skepticism as a Political Tool Seth Baum 02-10-2019
Computing Machinery and Intelligence 2 A. M. Turing 26-09-2019
Computing Machinery and Intelligence 1 A. M. Turing 17-09-2019
A shift in arguments for AI Risk 2 Tom Sittler 11-09-2019
A shift in arguments for AI Risk 1 Tom Sittler 04-09-2019
TAISU report and retrospective Søren Elverlin 28-08-2019
Jeff Hawkins on neuromorphic AGI within 20 years Steve Byrne 20-08-2019
Stuart Armstrong presents “Synthesising…” Stuart Armstrong 14-08-2019
Synthesising a human’s preferences into a utility function 4/4 Stuart Armstrong 08-08-2019
Synthesising a human’s preferences into a utility function 3/4 Stuart Armstrong 01-08-2019
Synthesising a human’s preferences into a utility function 2/4 Stuart Armstrong 24-07-2019
Synthesising a human’s preferences into a utility function 1/4 Stuart Armstrong 17-07-2019
Reframing Superintelligence Q&A Eric Drexler 09-07-2019
Reframing Superintelligence 3 Eric Drexler 03-07-2019
Reframing Superintelligence 2 Eric Drexler 27-06-2019
Reframing Superintelligence 1 Eric Drexler 12-06-2019
Ethics Guidelines for Trustworthy AI Pekka Ala-Pietilä et al. 06-06-2019
Likelihood of discontinuous progress around the development of AGI 2 Katja Grace 29-05-2019
Likelihood of discontinuous progress around the development of AGI 1 Katja Grace 23-05-2019
Value Learning Q/A Rohin Shah 15-05-2019
Value Learning Comments Rohin Shah et al. 08-05-2019
Value Learning 6/6 Rohin Shah 30-04-2019
Value Learning 5/6 Rohin Shah 25-04-2019
Value Learning 4/6 Rohin Shah 10-04-2019
Value Learning 3/6 Rohin Shah 02-04-2019
Thoughts on Human Models Ramana Kumar and Scott Garrabrant 26-03-2019
Value Learning 2/6 Rohin Shah 19-03-2019
Value Learning 1/6 Rohin Shah 12-03-2019
Critique of Superintelligence (2/2) Fods12 05-03-2019
How Viable is Arms Control For Military AI? (2/2) Matthijs Maas 26-02-2019
How Viable is Arms Control For Military AI? (1/2) Matthijs Maas 19-02-2019
Superintelligence: Paths, Dangers, Strategies Fods12 12-02-2019
Critique of Superintelligence (1/2) Fods12 07-02-2019
Embedded Agency Q & A Scott Garrabrant 30-01-2019
Embedded Agency (4/4) Abram Demski and Scott Garrabrant 24-01-2019
Embedded Agency (3/4) Abram Demski and Scott Garrabrant 16-01-2019
Embedded Agency (2/4) Abram Demski and Scott Garrabrant 10-01-2019
Embedded Agency (1/4) Abram Demski and Scott Garrabrant 02-01-2019
The Vulnerable World Hypothesis 2/2 Nick Bostrom 12-12-2018
The Vulnerable World Hypothesis 1/2 Nick Bostrom 04-12-2018
Foom Justifies AI Risk Efforts Now Robin Hanson 28-11-2018
Why Altruists Should Perhaps Not Prioritize AI 2/2 Magnus Vinding 20-11-2018
Building Safer AGI by introducing Artificial Stupidity Roman Yampolskiy et al. 15-11-2018
Why Altruists Should Perhaps Not Prioritize AI Magnus Vinding 07-11-2018
Are we Approaching an Economic Singularity? (2/2) William D. Nordhaus 30-10-2018
Are we Approaching an Economic Singularity? William D. Nordhaus 23-10-2018
The Rocket Alignment Problem Eliezer Yudkowsky 17-10-2018
Technology Roulette Richard Danzig 10-10-2018
Towards a new Impact Measure Alex Turner 03-10-2018
Incomplete Contracting and AI Alignment Dylan Hadfield-Menell et al. 19-09-2018
Open Ended Intelligence David Weinbaum et al. 12-09-2018
Strategic Implications of Openness in AI Development Nick Bostrom 05-09-2018
A Survey of Artificial General Intelligence Projects Seth Baum 29-08-2018
MIRI’s Strategic Background Malo Bourgon 22-08-2018
The Malicious use of AI Miles Brundage et al. 15-08-2018
The Learning-Theoretic AI Alignment Research Agenda Vadim Kosoy 09-08-2018
No Basic AI Drives and A Rebuttal to Omohundro’s ‘Basic A.I. Drives’ Alexander Kruel and Scott Jackish 01-08-2018
The Basic AI Drives Stephen Omohundro 24-07-2018
AI and compute / Interpreting AI Compute trends Amodei et al., Ryan Carey 18-07-2018
Learning which reward to maximise Stuart Armstrong et al. 11-07-2018
AlphaGo Zero and the Foom Debate Eliezer Yudkowsky 04-07-2018
The Hanson-Yudkowsky AI-Foom Debate (2/2) Kaj Sotala 28-06-2018
The Hanson-Yudkowsky AI-Foom Debate (1/2) Kaj Sotala 20-06-2018
Taking AI Risk Seriously Andrew Critch 14-06-2018
Current thoughts on Paul Christiano’s research agenda Jessica Taylor 06-06-2018
Challenges to Christiano’s capability amplification proposal Eliezer Yudkowsky 30-05-2018
Long-term strategies for ending existential risk from fast takeoff Daniel Dewey 24-05-2018
Machines that Think Toby Walsh 16-05-2018
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm David Silver et al. 10-05-2018
Iterated Distillation and Amplification Ajeya Cotra 02-05-2018
Deciphering China’s AI Dream Jeffrey Ding 24-04-2018
Why the Singularity is not a Singularity Edward Felten 18-04-2018
The Ethics of Artificial Intelligence Yudkowsky and Bostrom 22-03-2018
An Untrollable Mathematician Abram Demski 14-03-2018
Takeoff Speeds Paul Christiano 07-03-2018
We’re told to fear robots. But why do we think they’ll turn on us? Steven Pinker 01-03-2018
Cognitive Biases Potentially Affecting Judgment of Global Risks Eliezer Yudkowsky 21-02-2018
Goodhart Taxonomy Scott Garrabrant 13-02-2018
An AI Race for Strategic Advantage: Rhetoric and Risks Seán S. Ó hÉigeartaigh et al. 07-02-2018
Reply to Bostrom’s arguments for a hard takeoff Brian Tomasik 31-01-2018
Superintelligence as a Cause or Cure for Risks of Astronomical Suffering Kaj Sotala et al. 24-01-2018
Impossibility of deducing preferences and rationality from human policy Stuart Armstrong et al. 17-01-2018
On the Promotion of Safe and Socially Beneficial Artificial Intelligence Seth Baum 09-01-2018
Refuting Bostrom’s Superintelligence Argument Sebastian Benthall 03-01-2018
Logical Induction (1+7) Scott Garrabrant et al. 27-12-2017
Conceptual Confusions in Assessing AGI Chris Cooper 20-12-2017
Disjunctive Scenarios of Catastrophic AI Risk (2/2) Kaj Sotala 06-12-2017
Disjunctive Scenarios of Catastrophic AI Risk (1/2) Kaj Sotala 01-12-2017
Artificial Intelligence in Life Extension: from Deep Learning to Superintelligence Alexey Turchin 22-11-2017
Good and safe uses of AI Oracles Stuart Armstrong 08-11-2017
Positively shaping the development of artificial intelligence Robert Wiblin 01-11-2017
There is no Fire Alarm for Artificial General Intelligence Eliezer Yudkowsky 25-10-2017
Fitting Values to Inconsistent Humans Stuart Armstrong 18-10-2017
Age of Em (Intelligence Explosion) Robin Hanson 11-10-2017
Age of Em (Chapter 27) Robin Hanson 27-09-2017
Meditations on Moloch Scott Alexander 20-09-2017
Incorrigibility in the CIRL Framework Ryan Carey 13-09-2017
OpenAI Makes Humanity Less Safe Ben Hoffman 06-09-2017
Open Problems Regarding Counterfactuals: An Introduction For Beginners Alex Appel 30-08-2017
A Game-Theoretic Analysis of the Off-Switch Game Tobias Wängberg et al. 23-08-2017
Benevolent Artificial Anti-Natalism Thomas Metzinger 16-08-2017
Where the Falling Einstein Meets the Rising Mouse Scott Alexander 09-08-2017
Superintelligence Risk Project Jeff Kaufman 03-08-2017
Staring into the Singularity Eliezer Yudkowsky 26-07-2017
Artificial Intelligence and the Future of Defense Matthijs Maas et al. 19-07-2017
Prosaic AI Alignment Paul Christiano 12-07-2017
A model of the Machine Intelligence Research Institute Sindy Li 05-07-2017
Deep Reinforcement Learning from Human Preferences Paul Christiano et al. 28-06-2017
–Holiday– 21-06-2017
The Singularity: A Philosophical Analysis (2/2) David J. Chalmers 14-06-2017
The Singularity: A Philosophical Analysis (1/2) David J. Chalmers 07-06-2017
Why Tool AIs want to be Agent AIs Gwern Branwen 31-05-2017
A Map: AGI Failure Modes and Levels Alexey Turchin 24-05-2017
Neuralink and the Brain’s Magical Future Tim Urban 17-05-2017
The Myth of Superhuman AI Kevin Kelly 10-05-2017
Merging our brains with machines won’t stop the rise of the robots Michael Milford 03-05-2017
Building Safe AI Andrew Trask 26-04-2017
AGI Safety Solutions Map Alexey Turchin 19-04-2017
Strong AI Isn’t Here Yet Sarah Constantin 12-04-2017
Robotics: Ethics of artificial intelligence Stuart Russell et al. 05-04-2017
Using machine learning to address AI risk Jessica Taylor 29-03-2017
Racing to the Precipice: a Model of Artificial Intelligence Development Armstrong et al. 22-03-2017
Politics is Upstream of AI Raymond Brannen 15-03-2017
Coherent Extrapolated Volition Eliezer Yudkowsky 08-03-2017
–Cancelled due to illness– 01-03-2017
Towards Interactive Inverse Reinforcement Learning Armstrong, Leike 22-02-2017
Notes from the Asilomar Conference on Beneficial AI Scott Alexander 15-02-2017
My current take on the Paul-MIRI disagreement on alignability of messy AI Jessica Taylor 08-02-2017
How feasible is the rapid development of Artificial Superintelligence? Kaj Sotala 01-02-2017
Response to Cegłowski on superintelligence Matthew Graves 25-01-2017
Disjunctive AI scenarios: Individual or collective takeoff? Kaj Sotala 18-01-2017
Policy Desiderata in the Development of Machine Superintelligence Nick Bostrom 11-01-2017
Concrete Problems in AI Safety Dario Amodei et al. 04-01-2017
–Holiday– 28-12-2016
A Wager on the Turing Test: Why I Think I Will Win Ray Kurzweil 21-12-2016
Responses to Catastrophic AGI Risk: A Survey Sotala, Yampolskiy 14-12-2016
Discussion of ‘Superintelligence: Paths, Dangers, Strategies’ Neil Lawrence 07-12-2016
Davis on AI capability and motivation Rob Bensinger 30-11-2016
Ethical guidelines for a Superintelligence Ernest Davis 22-11-2016
Superintelligence: Chapter 15 Nick Bostrom 15-11-2016
Superintelligence: Chapter 14 Nick Bostrom 09-11-2016
Superintelligence: Chapter 11 Nick Bostrom 01-11-2016
Superintelligence: Chapter 9 (2/2) Nick Bostrom 25-10-2016
Superintelligence: Chapter 9 (1/2) Nick Bostrom 18-10-2016
Superintelligence: Chapter 8 Nick Bostrom 11-10-2016
Superintelligence: Chapter 7 Nick Bostrom 04-10-2016
Superintelligence: Chapter 6 Nick Bostrom 27-09-2016
Superintelligence: Chapter 5 Nick Bostrom 20-09-2016
Taxonomy of Pathways to Dangerous Artificial Intelligence Roman V. Yampolskiy 13-09-2016
Unethical Research: How to Create a Malevolent Artificial Intelligence Roman V. Yampolskiy 06-09-2016
Superintelligence: Chapter 4 Nick Bostrom 30-08-2016
Superintelligence: Chapter 3 Nick Bostrom 23-08-2016
Superintelligence: Chapter 1+2 Nick Bostrom 16-08-2016
Why I am skeptical of risks from AI Alexander Kruel 09-08-2016
–Break due to family extension– 02-08-2016
–Break due to family extension– 26-07-2016
Intelligence Explosion FAQ Luke Muehlhauser 19-07-2016
A toy model of the treacherous turn Stuart Armstrong 12-07-2016
The Fable of the Dragon Tyrant Nick Bostrom 05-07-2016
The Fun Theory Sequence Eliezer Yudkowsky 28-06-2016
Intelligence Explosion Microeconomics Eliezer Yudkowsky 21-06-2016
Strategic Implications of Openness in AI Development Nick Bostrom 14-06-2016
That Alien Message Eliezer Yudkowsky 07-06-2016
The Value Learning Problem Nate Soares 31-05-2016
Decisive Strategic Advantage without a Hard Takeoff Kaj Sotala 24-05-2016