On p(Doom)
- Advik Lahiri
- Jul 2, 2024
- 18 min read

1) What is P(Doom) and how do we assess it?
2) Is there anything systematic we know that impacts these assessments?
3) How do naive people versus tech professionals make these assessments?
P(Doom) is the probability that artificial intelligence will pose an existential risk to humanity.
The subject of the existential risks of AI has been trending recently, with many senior and well-respected figures in the field voicing worrying views on it (this will be discussed later on). There are also perspectives holding that AI does not pose such a danger; however, the media seems to be focussing on covering the more extreme views. Naturally, with the attention AI is getting, the chance of existential risk is being reported on. These existential risks are what P(Doom) assesses, but P(Doom) is not a very popular term or concept, at least not yet. As of now, outlets prefer the qualitative opinions of researchers to quantitative measurements. Still, P(Doom) has been circulating, mostly in niche communities of machine learning (ML) and AI safety researchers.
Through my research, I therefore found most information on P(Doom) on forums where ML and AI researchers post articles and reports (Effective Altruism, LessWrong, etc.) and on research centre websites, whose reliability I have done my best to verify. Much of the work, research, and reporting on these forums and ML websites builds on other work by researchers in the same communities, so the research is largely interlinked. Surveys also tend to have small numbers of participants, a consequence of how specialised AI safety, ML, and P(Doom) are.
Interestingly, when GPT-4 was asked ‘What is P(Doom)?’, it responded: ‘"P(Doom)" is not a widely recognized or standard term in relation to a specific concept or calculation. Could you please provide more context or clarify what you mean by "P(Doom)"?’. When asked again, it said, ‘I apologize, but "P(Doom)" is not a term or concept that I am familiar with. It is possible that it may be a term specific to a certain context or domain that is not widely known.’ When P(Doom) is explained to it, GPT-4 does give a response, but a quite vague one. Regardless, the fact that P(Doom) is still a niche term that has received little coverage in the media or even in academic work is all the more reason to research it.
As for how P(Doom) is assessed, each researcher has a different way of looking at AI’s potential, future, and impact on humanity. The extremely varied range of P(Doom) figures simply shows that there are multiple metrics; in practice, P(Doom) for many seems to rest largely on ‘gut feeling’ and expert opinion. This too will be discussed further later on. Nonetheless, these differing opinions, predictions, and probabilities are based on differing understandings of the same problem: the alignment problem.
AI alignment is a field within AI safety research concerned with steering AI towards humanity’s goals and objectives; it seeks to ensure that AI advances human objectives, forming a beneficial symbiosis of man and machine. The problem is that AI alignment has not yet been solved. One aspect of the problem is how it would even take place: how would AI researchers establish some universal failsafe, firewall, or standard across all AGI that keeps it in line with human objectives? The consequence of failing to solve this is that, if alignment is not achieved before an ‘Artificial General Intelligence’ (AGI) or ‘High-Level Machine Intelligence’ (HLMI) is established, we may end up with a misaligned AI: an artificial general intelligence that does not have the advancement of humanity’s goals in mind and instead pursues independent objectives. These independent objectives could be an existential risk for humanity. A simple logical argument used by researchers is that humans became the most dominant species in the world by virtue of our relative cognitive prowess; if an AGI is established that exceeds human intellect, it would, by the same logic, become the dominant intelligence and thus disempower humanity. This plays into recent events.
OpenAI and DeepMind have stated their aim to develop an AGI that would outperform humans in cognitive tasks. In response, an open letter calling to ‘Pause Giant AI Experiments’ was issued on March 22, 2023, with signatures from top AI leaders, experts, and researchers (including Elon Musk). The open letter states, ‘Powerful AI systems should be developed only once we are confident that their effects will be positive and their risks will be manageable’, hence, ‘we call on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4.’ Some more key statements:
‘As stated in the widely-endorsed Asilomar AI Principles, Advanced AI could represent a profound change in the history of life on Earth, and should be planned for and managed with commensurate care and resources. Unfortunately, this level of planning and management is not happening, even though recent months have seen AI labs locked in an out-of-control race to develop and deploy ever more powerful digital minds that no one – not even their creators – can understand, predict, or reliably control.’
‘AI labs and independent experts should use this pause to jointly develop and implement a set of shared safety protocols for advanced AI design and development that are rigorously audited and overseen by independent outside experts.’
‘AI research and development should be refocused on making today's powerful, state-of-the-art systems more accurate, safe, interpretable, transparent, robust, aligned, trustworthy, and loyal.’
‘In parallel, AI developers must work with policymakers to dramatically accelerate development of robust AI governance systems.’
‘Having succeeded in creating powerful AI systems, we can now enjoy an "AI summer" in which we reap the rewards, engineer these systems for the clear benefit of all, and give society a chance to adapt.’
Clearly, this open letter is meant to be a ‘breather’ that gives AI safety research time to develop a strategy for alignment, so that when an AGI is established it does not end up misaligned. However, it cannot be said whether the big AI developers have actually ‘paused’: GPT-5 may still be in development, and some say there are ulterior motives behind the six-month pause.
Coming back to the alignment problem: it is essentially the basis of P(Doom) figures. With this in mind, P(Doom) is the probability that alignment will not happen in time, leading to a misaligned AGI that poses an existential risk to humanity. This is also where opinions start to diverge, because everything from here on is a prediction. What is the status of AI alignment and AGI? How far apart is their development? When will an AGI be fully and properly established? Will AI alignment take place? Most researchers, and the intellectual cliques they fall into (the moderates, the extremists, and so on), have differing answers to these questions, and it is these differing answers that will now be explored.
Current AI technologies, though extremely sophisticated and advanced, do not seem to be at the level of an AGI/HLMI. The most prominent AI system at the moment is GPT-4, and its development report states:
‘Finally, we facilitated a preliminary model evaluation by the Alignment Research Center (ARC) of GPT-4’s ability to carry out actions to autonomously replicate and gather resources—a risk that, while speculative, may become possible with sufficiently advanced AI systems—with the conclusion that the current model is probably not yet capable of autonomously doing so.’
According to this (and if we generalise the state of GPT-4 to be representative of other AI technologies), there is no AGI yet. But when will there be?
Reports vary across the internet. Before diving into individual perspectives, let us first establish a baseline for comparison. A Metaculus forecast for AGI, based on around 1,670 predictions, put the date on which the ‘first general AI system [will] be devised, tested and publicly announced’ at March 28, 2032. Apart Research, a machine learning research non-profit, took 2037 as an optimistic arrival date for an alignment solution and, by sampling the probability mass of the Metaculus forecast, arrived at the following estimate of whether an alignment solution will arrive before AGI does:
P(solution < AGI) = Mean(Sample(Normal(2037, 10)) < Sample(Metaculus(weak AGI))) = 58.73%
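Read literally, this is a Monte Carlo comparison of two arrival-date distributions: draw an alignment-solution year from Normal(2037, 10), draw an AGI year from the Metaculus forecast, and take the fraction of paired draws in which the solution comes first. The sketch below (Python with NumPy) is only a minimal illustration of that reading, not the report’s actual code; since the Metaculus community distribution is not reproduced here, an assumed lognormal with its median near the quoted 2032 date stands in for it, so the output will only roughly approximate the 58.73% figure.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Arrival year of an alignment solution: Normal(mean=2037, sd=10),
# matching the report's optimistic assumption.
solution_year = rng.normal(loc=2037, scale=10, size=N)

# Stand-in for the Metaculus "weak AGI" community forecast. The report samples
# the actual Metaculus probability mass; this lognormal (median around 2032)
# is an assumed placeholder purely for illustration.
agi_year = 2023 + rng.lognormal(mean=np.log(9), sigma=0.5, size=N)

# P(solution < AGI): fraction of paired draws in which the solution arrives first.
p_solution_before_agi = (solution_year < agi_year).mean()
print(f"P(solution < AGI) ~ {p_solution_before_agi:.2%}")
```

With the real Metaculus distribution substituted in, this same comparison is what yields the 58.73% the report cites.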
The Apart Research report also provides a mean P(Doom) of 28.73%, based on the individual estimates of 24 AI safety researchers.
The 24 researchers’ individual P(Doom)s are listed below:
Joseph Carlsmith: 10%
Rohin Shah: P(doom|AGI launched 2030) = 20%, P(doom|AGI by debate) = 30%
Steven Byrnes: 90%
Leopold Aschenbrenner: 0.5% <2070
Ben Garfinkel: 0.4% <2070
Daniel Kokotajlo: 80%
Neel Nanda: 9% <2070
Nate Soares: 77% <2070
Christian Tarsney: 3.5% <2070
David Thorstad: 0.00002% <2070
David Wallace: 2% <2070
FeepingCreature: 85%
Dagon: 80%
Anonymous 1 (software engineer at AI research team): 2% <2070
Anonymous 2 (academic computer scientist): 0.001% <2070
JBlack: 90%
Anders Sandberg, Nick Bostrom: 5% human extinction <2100
Buck Shlegeris: 50%
James Fodor: 0.05%
Stuart Armstrong: 5-30%
Jaan Tallinn: 33-50%
Paul Christiano: P(doom from narrow misalignment | no AI safety) = 10%, P(doom from narrow misalignment | 20,000 in AI safety) = 5%
Eliezer Yudkowsky: No numerical estimates but very high
Vanessa Kosoy: 30% success
Here are some graphs from the report that visualise its findings:

It should be noted that the findings of this report, though a good starting point, are not universal. The sample it draws on is limited, so its figures should be read as one perspective rather than a consensus.
Another perspective comes from a survey of ML researchers about HLMI, carried out in 2022 by Katja Grace, a researcher at the Machine Intelligence Research Institute and the Future of Humanity Institute at the University of Oxford. The survey was sent to 4,271 ML researchers and received 738 responses. Key results:
37 years until a 50% chance of HLMI; this horizon is 8 years shorter than the 45-year horizon given in the 2016 edition of the survey. The aggregate predictions place HLMI at 2061 (2016 survey) and 2059 (2022 survey), considerably later than the Metaculus estimate.
P(extremely bad outcome) = 5%; this was the median response, unchanged from 2016. 48% of respondents gave at least a 10% chance of an extremely bad outcome; 25% gave a 0% chance.
Explicit P(Doom) = 5-10%.
Support for AI safety research: 69% of respondents believe that society should prioritise AI safety research ‘more’ or ‘much more’ than it currently does, up from 49% of respondents in 2016.
There is ‘about an even chance’ that the argument for an intelligence explosion is broadly correct.
The median respondent believes that AI will be vastly better than humans at all professions within 30 years of HLMI being established.
The full survey data is available for download (though it is quite dense and hard to get through).
It should also be noted that there is no unanimous agreement on when AGI will arrive. The Metaculus and Grace estimates differ, but both sit in the somewhat distant future. Some, however, think that AGI is ‘practically’ here, or that it will come within the next 5 years. This estimate, according to two AI safety researchers (Andrea Miotti and Gabriel Alfour), is based on the fact that AI safety (alignment) has not been solved and that the following current technologies already tread the path towards AGI:
Powerful agents (Agent57, Gato, DreamerV3)
Reliably good multimodal models (Stable Diffusion, Whisper, CLIP)
Robots (Boston Dynamics, DayDreamer, VideoDex, RT-1: Robotics Transformer)
There is disagreement in most aspects of the future of AI and the components of P(Doom).
Another researcher from the Machine Intelligence Research Institute, Rob Bensinger, also carried out a survey, in 2021. It was a two-question survey:
1. How likely do you think it is that the overall value of the future will be drastically less than it could have been, as a result of humanity not doing enough technical AI safety research?
2. How likely do you think it is that the overall value of the future will be drastically less than it could have been, as a result of AI systems not doing/optimizing what the people deploying them wanted/intended?
It had a much smaller participant pool than Katja Grace’s: the survey was sent to 117 long-term AI-risk researchers and received 44 responses. I will not dive very deep into it, since it was carried out in 2021 and may not hold up today in the context of newer AI technologies like GPT-4, which are what have people so worried. Still, some of the graphs and mean results are interesting to note (graph below).

‘Each point is a response to Q1 (on the horizontal axis) and Q2 (on the vertical axis). Circles denote (pure) technical safety researchers, squares (pure) strategy researchers; diamonds marked themselves as both, triangles as neither. In four cases, shapes are superimposed because 2–3 respondents gave the same pair of answers to Q1 and Q2. One respondent (a "both" with no affiliation specified) was left off the chart because they gave interval answers: [0.1, 0.5] and [0.1, 0.9].
Purple represents OpenAI, red FHI, green CHAI or UC Berkeley, orange MIRI, blue Open Philanthropy, and black "no affiliation specified". No respondents marked DeepMind as their affiliation.’
Bensinger separated technical safety researchers (left) and strategy researchers (right) as well:

‘Overall, the mean answer of survey respondents was (~0.3, ~0.4), and the median answer was (0.2, 0.3).’
With all of this data from various sources, it is clear that the risk from AI, and P(Doom) with it, has no concrete standing. Opinions among ML and AI experts vary enormously, showing how uncertain the subject, and thus the future, is. Apart Research puts the mean P(Doom) (though limited to the 24 researchers cited) at 28.73%. Grace’s survey suggests a P(Doom) of 5-10%. Some think such figures are too low; others think any P(Doom) above zero is worrying. Bensinger’s survey does not ask for P(Doom) directly but comes extremely close: its two questions essentially ask about the risk from humanity not doing enough safety research and the risk from AI systems not doing what their deployers intended. Its mean and median answers fall in the same range as the Apart Research and Grace figures (slightly above it in the case of the mean for the second question).
Now, moving past the raw data, what is the rationale behind these figures? As one might expect, there is no common consensus, and the rationale behind individual P(Doom)s is not often given. Nonetheless, here are a few key perspectives.
Michael Tontchev is an AI researcher at Meta. His P(AI Doom by 2100) is about 20%. He says that though it does not seem ‘possible to give an exact calculation for this’, giving a numerical figure is more helpful than simply being ambivalent and sitting on the fence. Tontchev’s logic for the 20% estimate is that he can find ‘no great counter argument that knocks down all the challenges of aligning AIs’, which harks back to the first aspect of the alignment problem: how it would even work. Tontchev also uses the following (slightly casual) logic as an argument:
‘So by default, I start at 100% chance, and then make adjustments for a few reasons:
-15%: I am fallible, and there’s a chance I am wrong in my specific arguments. Even though notable experts are increasingly speaking out about AI risk, there are still smart people who disagree with me and with them. Almost no argument is bulletproof.
-20%: I don’t know everything about the topic, and there could be some big part that I'm missing. I hope so!
-30%: We might wake up (possibly due to an early, large-scale catastrophe) and 1) put in the right level of investment for the right amount of time for figuring out alignment and 2) slow down capability gain to the point where alignment significantly exceeds capabilities by the time we hit AGI.
-15%: Maybe permanent alignment turns out to be easy in a really unexpected and lucky way. Maybe?’
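The adjustments sum to 80 percentage points, so starting from 100% this is what leaves the roughly 20% figure he gives: 100% - 15% - 20% - 30% - 15% = 20%.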
The next perspective is Paul Christiano, the former head of language model alignment on OpenAI’s safety team, who now runs the Alignment Research Center. Christiano is the ex-OpenAI researcher whose bold claims have featured heavily in recent news coverage. His P(Doom) is 50%, a very high figure: specifically, he says there is a 10-20% chance of AI takeover, and that if an AGI reaches human capacity there is a 50/50 chance of doom. Christiano admits that he often gives different P(Doom) figures when he has read something new or simply feels differently about the matter. Nonetheless, he gave a very detailed breakdown of his probabilities for a number of AI risk scenarios in a blog post:
‘Probability of an AI takeover: 22%
Probability that humans build AI systems that take over: 15% (Including anything that happens before human cognitive labor is basically obsolete.)
Probability that the AI we build doesn’t take over, but that it builds even smarter AI and there is a takeover some day further down the line: 7%
Probability that most humans die within 10 years of building powerful AI (powerful enough to make human labor obsolete): 20%
Probability that most humans die because of an AI takeover: 11%
Probability that most humans die for non-takeover reasons (e.g. more destructive war or terrorism) either as a direct consequence of building AI or during a period of rapid change shortly thereafter: 9%
Probability that humanity has somehow irreversibly messed up our future within 10 years of building powerful AI: 46%
Probability of AI takeover: 22% (see above)
Additional extinction probability: 9% (see above)
Probability of messing it up in some other way during a period of accelerated technological change (e.g. driving ourselves crazy, creating a permanent dystopia, making unwise commitments…): 15%’
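Note that the 46% figure is the sum of the three components listed beneath it: 22% (AI takeover) + 9% (additional extinction) + 15% (messing it up in some other way) = 46%.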
Christiano gives a sort of rationale in the post, but it is more an explanation of the distinctions involved in AI safety (extinction risk vs. existential risk, dying now vs. dying later). Still, his probabilities delineate our possible futures with AI much more clearly. He has also said that he does not necessarily believe that AI or AGI as we know it now will itself be the existential risk; rather, he thinks that an AGI will lead to something, and we do not know what that something is, which will pose the existential risk to humanity.
The next perspective is Geoffrey Hinton, the ‘Godfather of AI’. He was also in the news recently for resigning from Google and warning of the dangers of AI; he even said that he regretted some of his work. His P(Doom) is supposedly 50%. Hinton has said that ‘GPT-4 eclipses a person in the amount of general knowledge it has and it eclipses them by a long way. In terms of reasoning, it's not as good, but it does already do simple reasoning’. In addition: ‘And given the rate of progress, we expect things to get better quite fast. So we need to worry about that’. Considering this, Hinton’s argument is that an AGI is not too far away, and that given the state of the world this technology will inevitably end up in the hands of bad actors (Hinton gives the example of Putin), who may give ‘robots the ability to create their own sub-goals.’ These sub-goals may lead an AGI to form goals like ‘I need to get more power’, which would doom humanity. This boils down to an alignment argument, though Hinton suggests that AGI will not autonomously gain the ability to think differently and independently on its own; rather, human malice will play into it.
The next perspective is Eliezer Yudkowsky, who has been the main ‘doomer’ in the AI community for a very long time. Yudkowsky founded the LessWrong forum and the Machine Intelligence Research Institute. As stated previously, his P(Doom) is very high; some sources put it above 90%. In response to the open letter calling for a six-month pause on AI development, he said that a pause is not enough: AI development needs to be shut down. The reasons for Yudkowsky’s pessimism include the following:
Brain inefficiency: inefficiency in the economic sense, as a rate. Yudkowsky believes that in terms of intelligence per dollar, human brains are inefficient, especially compared to an AGI.
Mind inefficiency: where brain inefficiency concerns our raw cognitive computing capacity, mind inefficiency concerns our social capacities and how well humans work together. Yudkowsky thinks humans are incompetent in that regard.
Evolution: humans are effectively at the end of our evolution, whereas AI still has enormous room for improvement in both software and hardware, the latter especially through ‘Drexlerian nanotech’.
Mindspace: similarly, Yudkowsky believes that human mindspace is incredibly narrow, while AI has the capacity for a mindspace so vast that it would be unrecognisable or unfathomable to us. Here is a diagram of Yudkowsky’s that illustrates this:

Yudkowsky’s explanation:
‘This tiny dot (representing human intelligence) belongs to a wider ellipse, the space of transhuman mind designs - things that might be smarter than us, or much smarter than us, but which in some sense would still be people as we understand people.
This transhuman ellipse is within a still wider volume, the space of posthuman minds, which is everything that a transhuman might grow up into.
And then the rest of the sphere is the space of minds-in-general, including possible Artificial Intelligences so odd that they aren't even posthuman.’
Another aspect of the orthodox AI-doom case which Yudkowsky supports is the Orthogonality Thesis. This states that arbitrary levels of intelligence can be mixed and matched with arbitrary goals, so that, for example, an AGI intellectually far superior to humans may devote its entire artificial existence to the production of paperclips.
There are, naturally, people who disagree with Yudkowsky or with the general case for AGI risk, and several counterarguments to his position have been published. I will not go into them here; rather, I will look at those who are not pessimists and have relatively low P(Doom)s.
One such individual is Yann LeCun, an AI optimist and the chief AI scientist at Meta. On a podcast, LeCun pushed back against Elon Musk and, effectively, others who think AI has a high chance of destroying humanity. LeCun stated that this is ‘Completely false’ and rests on the assumption of ‘the existence of "hard take-off"’. A hard take-off is the theory that the moment an AGI/HLMI is established, it will recursively refine itself, creating ever more advanced versions of itself, with catastrophic consequences for humanity. LeCun says that this theory ‘is completely ridiculous because there is no process in the real world that is exponential for very long. Those systems will have to recruit all the resources in the world. They would have to be given limitless power, agency.’
Scott Aaronson, a researcher at OpenAI, is also an AI optimist. He believes that AI may transform civilisation, but ‘it will do so in the form of tools and services that can no more plot to annihilate us than can Windows 11 or the Google search bar. In that scenario, the young field of AI safety will still be extremely important, but it will be broadly continuous with aviation safety and nuclear safety and cybersecurity and so on, rather than being a desperate losing war against an incipient godlike alien.’ Aaronson also rejects the Orthogonality Thesis, which seems to be his main reason for not believing in AI doom. He notes that people of high IQ who have mastered philosophy, literature, and science tend to dedicate their lives to non-trivial pursuits, and the same could happen with an AGI. ‘If you really accept the practical version of the Orthogonality Thesis, then it seems to me that you can’t regard education, knowledge, and enlightenment as instruments for moral betterment’, he says. His argument boils down to the fact that we simply do not know how an AGI will behave, or whether it can behave autonomously at all. In the end, Scott Aaronson’s P(Doom) is 2%.
A final perspective is William MacAskill, a philosopher and a founder of the effective altruism movement. I have taken the following excerpts from the endnotes of his book ‘What We Owe the Future’, which encapsulate his views on P(Doom) and AI risk.
‘This century (between now and 2100), the world could take one of approximately four trajectories. Global GDP could continue to grow at approximately the same rate (2–4 percent annually) as it has for the last hundred years. Or it could grow even faster, perhaps driven by advances in artificial intelligence. Or it could grow somewhat slower, tending towards stagnation. Or there could be a major global catastrophe that results in billions dead. I think that the likelihood of each of these four scenarios is between 10 percent and 50 percent. I think that the stagnation scenario is most likely, followed by the faster-than-exponential growth scenario, followed by continued-exponential scenario, followed by the catastrophe scenario. If I had to give precise credences, I’d say: 35 percent, 30 percent, 25 percent, 10 percent.’
‘I think the total risk of the end of civilisation this century is between 0.1 percent and 1 percent, with most of that risk coming from engineered pathogens, automated weaponry (which I didn’t have space to discuss in this book), and currently unknown technology. This doesn’t include the possibility of artificial intelligence systems that are misaligned with human preferences taking control of civilisation; I put that possibility at around 3 percent this century, though I’ll note that what counts as “misaligned with human preferences” feels vague to me. I think most of the risk we face comes from scenarios where there is a hot or cold war between great powers.
My credence that there will be a catastrophe this century that moves us back to preindustrial levels of technology is around 1 percent. My credence on recovery from such a catastrophe, with current natural resources, is 95 percent or more; if we’ve used up the easily accessible fossil fuels, that credence drops to below 90 percent.’
Evidently, MacAskill is an AI optimist.
Finally, the general person’s P(Doom). In all my research, I was not able to find a figure for the naive person’s P(Doom); the term seems to have stayed within ML and AI communities (as was said earlier, all the more reason to research it). Still, one source puts the P(Doom) of the average person working in, or aware of, AI alignment at about 30% (note that this comes from a post which cites the Bensinger survey results for the figure). One reason for such high figures (though alarmists would think 30% is quite low) is that people who follow the field, but are not among its leading experts, see how quickly AI is developing and, behind a veil of uncertainty and without fully understanding the technology, simply give high P(Doom) figures. Another reason is the media’s influence: extremist perspectives on AI doom have received far more coverage than optimistic views, which naturally makes the present, future, and potential of AI seem life-threatening to the public. This translates into relatively high P(Doom) figures.
To conclude, I hope that the research presented here shows what P(Doom) is and how it stands today, how many different factors play into it, and how divisive it has been, with so many people holding different perspectives and understandings. P(Doom) and AI safety form a large and complex niche within machine learning, and this document, with its limitations, should be seen as a starting point for further work rather than a definitive account. Regardless, I hope that the initial guiding questions have been answered.
Note: the resources from which I have directly used information are cited in footnotes. I will be compiling a more comprehensive list of all the resources that built up the foundation of my understanding for this research.
For now, one resource that should be helpful is a database of many researchers and their views on what existential risks to humanity may look like. It is not specific to AI, but much of it concerns it:


