Could Powerful AI Break Our Fragile World? (with Michael Nielsen)
Why this matters
This episode strengthens first-principles understanding of alignment risk and the strategic conditions that shape safe outcomes.
Summary
This conversation examines core safety questions through "Could Powerful AI Break Our Fragile World? (with Michael Nielsen)", surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.
Perspective map
The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.
An explanation of the Perspective Map framework can be found here.
Episode arc by segment
Early → late · height = spectrum position · colour = band
Risk-forward · Mixed · Opportunity-forward
Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).
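For concreteness, a minimal sketch of the tinting rule described above: map a slice score onto the amber → cyan → white strip by linear interpolation. The symmetric −100..+100 score range and the hex colours are assumptions for illustration; the page does not publish its palette.

```python
# Sketch: tint a segment bar by where its score sits on the strip.
# AMBER = most risk-forward, CYAN = mixed midpoint, WHITE = most
# opportunity-forward. Colours and score range are assumed, not the
# page's actual values.

AMBER = (255, 191, 0)
CYAN = (0, 255, 255)
WHITE = (255, 255, 255)

def lerp(a, b, t):
    """Linearly interpolate between two RGB tuples, t in 0..1."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))

def tint(score, lo=-100, hi=100):
    """Return an (r, g, b) tint for a spectrum score."""
    t = (score - lo) / (hi - lo)      # normalise to 0..1
    t = min(max(t, 0.0), 1.0)         # clamp out-of-range scores
    if t < 0.5:
        return lerp(AMBER, CYAN, t * 2)        # amber -> cyan half
    return lerp(CYAN, WHITE, (t - 0.5) * 2)    # cyan -> white half

print(tint(-26), tint(0), tint(40))
```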
Across 64 full-transcript segments: median 0 · mean −5 · spread −26 to 0 (p10–p90: −17 to 0) · 8% risk-forward, 92% mixed, 0% opportunity-forward slices.
Mixed leaning, primarily in the Technical lens. Evidence mode: interview. Confidence: medium.
- Emphasizes alignment
- Emphasizes safety
- Full transcript scored in 64 sequential slices (median slice 0).
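A minimal sketch of how the headline numbers above could be recomputed from the 64 per-slice scores. The scores list here is a placeholder and the ±10 band cutoffs are an assumption; the page does not publish its thresholds.

```python
# Recompute the summary line from per-slice scores.
# `scores` is a stand-in; the cutoffs (+/-10) are assumed.
from statistics import mean, median

scores = [0] * 59 + [-26, -20, -17, -12, -11]  # placeholder: 64 slices

def quantile(xs, q):
    """Nearest-rank quantile, adequate for a one-line summary."""
    xs = sorted(xs)
    return xs[min(int(q * len(xs)), len(xs) - 1)]

def band(s, cut=10):
    """Classify a score into the lexicon used by the chart."""
    if s <= -cut:
        return "risk-forward"
    if s >= cut:
        return "opportunity-forward"
    return "mixed"

n = len(scores)
shares = {b: sum(band(s) == b for s in scores) / n
          for b in ("risk-forward", "mixed", "opportunity-forward")}

print(f"median {median(scores):g} · mean {mean(scores):g} · "
      f"spread {min(scores)} to {max(scores)} "
      f"(p10–p90: {quantile(scores, 0.10)} to {quantile(scores, 0.90)})")
print({b: f"{v:.0%}" for b, v in shares.items()})
```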
Editor note
A high-leverage addition to the AI Safety Map that clarifies one important safety bottleneck.
Play on sAIfe Hands
Episode transcript
YouTube captions (auto or uploaded) · video hjt7B_x9GGQ · stored Apr 2, 2026 · 1,614 caption segments
Captions are an imperfect primary source: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.
No editorial assessment file yet. Add content/resources/transcript-assessments/could-powerful-ai-break-our-fragile-world-with-michael-nielsen.json when you have a listen-based summary.
People who were working on the discovery of the neutron, I doubt they were thinking about the incineration of Hiroshima at that time, even though it occurred only 13 years later. Just a tiny gap from these people playing with systems in their lab to this absolutely immense destruction. AI and AGI are going to result in a large number of just extraordinary things. I am the beneficiary of one of the very first drugs that was done using protein design, and I'm very sympathetic to people who are just excited for that sort of positive upside, but there's the possibility of an enormous negative downside. I think at first I rolled my eyes a bit. I would see the Pope commenting, or the United Nations commenting, and that was just completely wrong. It's important that all those people are thinking about this, and solutions will come from very different and maybe unexpected directions.

Michael, welcome to the Future of Life Institute podcast.

Thanks for having me, Gus.

You've written a series of wonderful, insightful essays on AI risk. One point you make is that deeply understanding reality is intrinsically dual use. What does that mean, and what are some examples of that?

I suppose I just mean that, to some extent, I'm speaking of an empirical fact. If you look back through millennia, whenever we've had sufficiently deep understanding of reality, it's always created opportunities to change reality in ways that give us power, and that can have both constructive and destructive uses. A relatively recent example in the scheme of human history is something like the discovery of subatomic physics and quantum physics in the early part of the 20th century. This led to many, many wonderful things: a lot of modern materials science, the semiconductor industry, much of the modern economy. It also led, in part, to nuclear weapons and thermonuclear weapons, and it's very difficult to see how you could get one without also getting the other. I can't prove that this is always the case, but certainly, looking historically at many examples: if you go back further and think about Newtonian mechanics and the way that plugs into artillery calculations, that's another example to some extent.

And this is the core of the problem of AI risk, in your opinion, right? It's about the fact that when we create systems that can understand reality at a deeper level, that confers power on these systems in ways that can be both good and bad.

Yeah. In fact, it's not even AI risk; it's a broader thing, which is risk created by science and technology. There's this wonderful paper from, I think, 1955 by John von Neumann which really mirrors a lot of the discussion of AI risk. Other people like Oppenheimer wrote similar things, although not quite as broad. As human beings attain this very deep understanding, our ability to do things like modify the weather, or modify the climate, starts to increase.

So this is a different framing than thinking of AI risk in terms of alignment. You are quite concerned that it would be easier to create a truth-seeking system than an aligned system. Why would that be easier?

Well, "easier": I'm not sure that's quite where I would focus.
Certainly the point is that, with the notion of alignment, famously people will ask: aligned to what? We've been solving the alignment problem as a society and as a civilization for millennia. You go back to the Code of Hammurabi and people like that: we're trying to attain a kind of social consensus, and then the institutions necessary to implement that social consensus, however imperfectly. Obviously civilization is not perfect; bad things still happen. But we've achieved a pretty good degree of alignment with some consensus notion. The issue, certainly a very significant issue, is when it is possible for people to unilaterally defect locally: if tremendous power is available to them, they can violate this social consensus. There's also just the sheer fact that different groups will converge on different consensuses. The United States military, the Russian military, and the Chinese military are unlikely to have the same standards of alignment as, say, consumer apps in the United States. And consumer apps in the United States are going to have very different notions than, say, applications in China. When DeepSeek released their model, quite a number of people at least informally reported (I didn't see any formal evaluation) that a lot of things the CCP doesn't particularly like appeared to be censored in the model. So there's a significant difference in the alignment just along that axis.

Why is it that we can't solve AI risk by simply doing what we've been doing with other risky technologies, which is adapt as we go? Something happens, there's a catastrophe, we implement new regulation, and we co-evolve with the technology. Why would advanced AI be different?

Yeah, it's a great question. It's the quadrillion-dollar question, and maybe it turns out that we can; it remains to be seen. There's a certain amount of optimism there. If you think historically about previous technologies, particularly previous major platform technologies, very often this works incredibly well. An example I love is the introduction of jet engines. One of the very first companies to work on this, with the Comet, had, I think, three, maybe four fatal crashes in their first year of operation. And of course it got shut down, but by the same token, while that's a terrible human loss, it created an enormous incentive for the other companies to get it right, and really improved their technology a lot. So not just capitalism but plain consumer sentiment often does a really good job of doing this kind of alignment. For a very large number of technologies that works remarkably well; just word of mouth is sufficient to exert a lot of pressure on the people doing it. It's not perfect. You still get snake oil being sold. You sometimes get terrible things being sold. Underground economies often have big problems with this: things like drugs, which actually cause deaths. There's no legal recourse there, and it's very difficult to communicate in a public way.
You can't say this supplier is bad on Twitter; you're opening yourself up to all kinds of problems. So the signaling doesn't work there. Now, if you think about problems like nuclear weapons, or climate change, or chlorofluorocarbons back in the 70s and 80s, those are all instances where the traditional institutions have at best struggled, for a variety of reasons. Climate is, I think, a very interesting one in particular. You've got this situation where there's this enormous engine of capital which has a strong interest in continuing to perpetuate the situation; they don't particularly want to switch away. They're not experts on renewable energy or things like that. So the question becomes: what character does AI risk, or ASI existential risk, have? How do you classify it? And my intuition, my belief, is that this ability to put enormously dangerous capabilities in the hands of individuals makes it much more similar to something like climate, or democratized nukes, than to a conventional technology.

There is also the framing of AI or ASI as a generator of new technologies that could potentially be dangerous. Perhaps talk about this framing of AI as a generator of new technologies in a world that is potentially vulnerable.

Yeah, the obvious example, a prototype for what we're going to see, is AlphaFold, which solved protein structure prediction. If you can predict structures pretty well, that's actually enormously helpful for doing design work: the way the generative models tend to work is that they try lots of slight variations, keep running the predictor, and see how well it works. So we've had a big iteration in the last few years in our ability to do antibody design and things like that. Now, at this point, my understanding (and I'm not an expert) is that we're in an intermediate state. Not a whole lot of new drugs have hit the market, not a whole lot of these kinds of things have hit the market, but it's getting close. It's really interesting just to look at the literature: the people who are interested in protein design go absolutely bananas after AlphaFold. Somebody pointed out to me the job listings at one of the companies that does this; it was all Python programmers and people who are expert in PyTorch, that kind of thing. It's still early days. But then you start to think about things like prion design, which is two words that ought to strike terror into anybody's heart. And then the question of why it was protein design and not virus design that was done first is an interesting one. My understanding is that there just wasn't as much data collected about viruses.
You had the Protein Data Bank, which was this enormous, very well curated service, and you had just less good data for viruses, partially due to historical accident, is my impression as a non-expert. But again, we're probably not that far away from being able to do high-quality virus design, and that's going to have many, many wonderful impacts, but it's also something that you don't necessarily want in the hands of unstable individuals. In the early days it is clearly going to be a specialist technology, which will require a considerable amount of background to use and deploy. Over time, access to it will become increasingly democratized. But maybe take a step back. You asked about the vulnerable world, and that is just the question of whether or not there are relatively simple, easy-to-make and easy-to-deploy technologies which can cause catastrophic or existential risk to humanity. This sounds implausible to most people a priori, but when you start to look at actual examples, even from the history of life on Earth, it's maybe not so ridiculous. Very well-informed people were extremely skeptical that nuclear weapons would work. Go back billions of years in time: there are things like the Great Oxygenation Event, where the ability to metabolize oxygen actually wiped out probably most species on Earth. There are many examples one can give along the way which show that very simple but completely transformative technologies are possible, and you have to ask the question: are we close to being able to do that again? Any tool which is capable of significantly advancing science and technology can potentially plug in and help enable that.

And that, I think, naturally leads to the question of whether our institutions can keep up with the pace of development of these new ways to affect the world that could be potentially dangerous. How do we know whether we're on the right path here?

Well, to some extent, the funny and ironic thing about this is that small-scale disasters are very helpful for improving institutions. Certainly, if you look at things like the nuclear non-proliferation treaty and the various test-ban treaties, the infrastructure (and when I say infrastructure, I mean almost the social infrastructure) that was necessary to create those all gets reused to some extent. An example I love is the Vienna and Montreal protocols, which resulted in the banning of CFCs. One of the things which is interesting about those is that every state in the world signed, and that had never been done before. So you have this interesting situation where it's like: oh, it's possible for everybody to agree and just blanket-replace a technology. I think those kinds of things represent really significant institutional progress. Unfortunately, they're quite bespoke. It's not that we've got something that helps those things directly scale up. Something that I think is valuable about podcasts like this, and terms like existential risk, is that they do start to create common norms, just a common language which people can use to communicate about this.
And that's obviously incredibly valuable; it just starts to make it easier to coordinate.

How concerned should we be about losing control to advanced AI? Is that perhaps the right framing here?

I think that's a very significant issue. It's one of many possible risks, but your listeners will have heard many people talk about this before, and it's probably the most prominent of all the ASI x-risk scenarios. So, my personal opinion: yes, I'm very concerned about that. I don't think it's the fundamental issue, though. I'm more concerned in general with this possibility of destructive technologies. Loss of control would be an example of such a destructive technology; it is not necessarily the only one.

Which direction are the incentives pushing here, the market incentives? On the one hand, you can see that it's not useful to put a product out there that's dangerous; it's not profitable in the long run to deploy dangerous products. On the other hand, it seems like companies, countries, and so on are racing towards developing advanced AI systems at a very fast pace, and perhaps without thinking long and hard about the consequences.

It's not even just the question of profitability. It's also just the fact that it's people running these companies, and they don't want to die; they don't want their children to die. For many of those people that's an even more significant thing than concerns about whether they'll get some nice stock options. But to return to the quantum mechanics and nuclear physics example: the people who were working on the discovery of the neutron, I doubt they were thinking about the incineration of Hiroshima at that time, even though it occurred only, I think, 13 years later. Just a tiny gap from these people playing with systems in their lab to this absolutely immense destruction. And yet from the point of view of the people who were discovering things like the neutron and the proton and the structure of the nucleus, it just seemed like a wonderful game, with enormous benefits to humanity. In fact, it did have enormous benefits to humanity. That's part of what is really difficult about the situation. AI and AGI are going to result in a large number of just extraordinary things. I suffer from a mild, chronic, lifelong condition. I am the beneficiary of one of the very first drugs that was done using protein design. It's a previous iteration; it wasn't done with what we would today call an AI system. But boy, do I ever think this is a great thing, at some level. And I'm so excited for other people to get access to those kinds of things as well, and very sympathetic to people who are just excited for that sort of positive upside. But there's the possibility of a very hard to defend against, enormous negative downside.

How do we recognize which world we're in, then, if the bad world, in which there's a huge downside, looks like the good world, in which we get a bunch of upside from AI and avoid the risk? This is one of the things that makes arguing about AI risk difficult: how do we distinguish between these worlds?
It is difficult. Actually, I'm speaking a little through my hat here, I'm not quite sure, but think about circa 1930: people had started to consider the possibility of nuclear-like weapons at about that time. It was still very speculative, but I think it was probably very unclear whether we were going to get things which can wipe out a city block, a city, a country. There's some unknown level of destruction. So if you're thinking, oh, maybe some of this work I'm doing on understanding subatomic physics might one day be used in this destructive way, there's just this enormous range of possible negative outcomes, and I think they probably needed to wait. But E = mc² gives you some bounds on the size of it. It turns out, I think, thermonuclear bombs are something like 7% efficient, and at some level the question of cost per megaton becomes the relevant one, and I don't think it was at all apparent what that cost was going to be for a long time. So that's the analog, I suppose, of your question. I want to put it in that concrete context just to make it clear that it wouldn't surprise me at all if, at that time in history, a plausible estimate could have ranged over five, six, seven orders of magnitude. So that's a problem. I'm not sure I've addressed all of your question.

Perhaps one thing to ask here is about alignment, and the question of whether focusing on alignment is the right approach. You describe what you call the alignment dilemma. Could you describe what that is?

Okay. An analogy I've come to like recently is imagining you're an Olympic swimmer. You correctly identify that if you want to be an outstanding sprinter, if you want to win yourself a gold medal, you need to work on your strength. In fact, you need to work very, very hard on your strength. But if all you do is work on your strength, that's actually going to be quite counterproductive: you are not going to be a good swimmer. So you need to work on it, but in the larger context. I think it's very clear that a major impact of the work on alignment which has been done so far has just been to speed up the creation of these systems. In particular, the invention and successful use of RLHF (not just RLHF, but other things as well) turned chatbots from being interesting systems into things which are very friendly: you can make them much more friendly to consumers, you can make them much more friendly to governments. So that's a way of really speeding up the capitalist race that you referenced before. It helps you with one set of important goals, which is indeed making your AI system relatively controllable and so on, but at the expense of causing tens of billions, hundreds of billions of dollars to come pouring into the space and really accelerating everything. So to me that seems maybe a little bit like having gone absolutely nuts on a certain type of strength-training program while really damaging your ability to swim. You're not actually solving the problem.
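(Editorial aside: a worked version of the E = mc² bound Nielsen gestures at a few turns back. The one-kilogram fuel mass is our assumption for illustration, and the 7% efficiency is the figure he quotes, not an independently checked value.)

```latex
% Complete conversion of a mass m sets the upper bound on yield:
%   E_max = m c^2 = 1 kg x (3 x 10^8 m/s)^2 = 9 x 10^16 J,
% and since one megaton of TNT is about 4.2 x 10^15 J, that is
% roughly 21 Mt per kilogram of fuel. At the ~7% conversion
% efficiency quoted in the conversation:
%   E ~ 0.07 x 9 x 10^16 J ~ 6 x 10^15 J ~ 1.5 Mt per kilogram.
\[
  E_{\max} = m c^{2}, \qquad E = \varepsilon\, m c^{2}
\]
```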
What would actually solve the problem, then? What is the alternative that a person who could be working on alignment should be working on?

Yeah. Well, I suppose I'm just pretty negative, actually, on most of the work on alignment: anything which is focused purely on the properties of the individual systems. The tendency is just for that to align with the interests of the companies. The things that I am more interested in, and think are much more promising, are all external. It's all governance and the rest of the world. So: people who are interested in questions like how to do real-time monitoring of biological threats, and response models there; computer security; these kinds of things. Safety isn't a property of a system. Safety is a property of the system and its complete environment. And in fact, very often it's working on the environment which matters much more. At the moment, for the people who can work on the safety of the systems, those systems are controlled by the companies, so the natural effect is that all of that work will be aligned with the interests of the companies. It's not perfect, of course, because they're large entities and it's very hard to get that alignment perfect, but generally speaking, most work on the alignment of individual systems is serving the needs of capital. If you're thinking about the alignment problem to society in general, ensuring a flourishing human civilization, the kind of thing FLI is dedicated to, now you're talking about people who are working on these much broader problems. I mentioned things like the nuclear non-proliferation treaty: that's part of a long line of work that is aimed at creating this kind of collective safety.

Do you think it matters whether the companies are aiming at creating agentic systems, or whether they're aiming to create better and better AIs that remain tools?

Yeah. I think you can flip the question around a bit and ask what's in capital's interest, and I think there it's quite clear: there's an enormous interest in making them agentic, even just for very simple reasons. People are going to love your friendship and romance bots, and they're going to be more interesting and more attractive if they're somewhat more agentic. They could buy you flowers; they can do these kinds of things. Even that very simple kind of thing. Over time, there's something of a slippery slide. Famously, there have been quite a number of flash crashes. It's often a little bit obscure what's going on, but some kind of AI-like, or early machine-learning-type or data-science-type, system hooked up to the markets in many cases seems to be causing some kind of massive change in the market, sometimes a trillion-dollar kind of fluctuation. Is that agentic in the full human sense? No.
But it's a system that has, to some degree, goals and rewards, and is capable of taking not just a little bit of action in the world but potentially being responsible for the movement of hundreds of billions or trillions of dollars. And those examples go back, let's see, certainly 15 years; there was a flash crash in 2010. So I think it's very hard to see a situation where they remain merely tools, whatever that is. It's not even clear what the criterion is by which something is just a tool versus something more: it's the difference between an amanuensis and a composer, and at some level those two things are not so distinct.

How did you come to be convinced that there's actually something to be concerned about with AI risk?

Yeah, mostly... so in the late 1980s I read a bunch of writing by Carl Sagan and some of his friends about nuclear risk. A lot of people now view some of that as being naive or wrong. I also read Eric Drexler's book Engines of Creation, where he talked about grey goo and scenarios like that. There are a lot of technical problems with some of that work, but it got me interested and aware; he's quite upfront about the technical problems, at some level. And then later on I became a theoretical physicist. I worked for a very long time on quantum computing, and it just became clear: oh yeah, the world actually has this enormous latent power inside it. Our current technology seems to me to be merely scratching the surface.

You've got to explain that. What does latent power mean? Is this something that you understand when you understand a discipline deeply, when you become a physicist or a biologist or something?

It's funny. I'm not super expert on something like nuclear fission, but I understand it well enough. It's a little bit shocking: you go through and you realize, oh wow, this is actually really quite simple, and it's just hidden. It is not obvious at all that it's out there in the world. And it's rather fortunate for us that it's not a little bit easier to make these bombs, and not a little bit easier to make them much larger. Those are due to somewhat contingent, random facts about the world. I had that experience over and over. I worked on things like quantum teleportation and quantum algorithms, and again you have this experience where people write down these very simple models of the world, and there are all these things hidden inside them that you don't realize. Science is just so full of that. Public-key cryptography is, I think, a wonderful example. It's a miracle; it could have been invented, I think, many hundreds of years ago. Just these incredible ideas hidden essentially inside number theory.

A recent example of latent power hidden in biology might be something like mirror life. Again, of course, I'm not an expert here, but it seems to me plausible that mirror life is an actual threat. And this is something that you wouldn't have been able to guess if you're not a biologist with deep understanding.

I think that's right. It is a good example.
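(Editorial aside on the public-key cryptography example above: a toy RSA round trip, our illustration rather than anything from the episode. This is textbook RSA with tiny primes and no padding, insecure by design, just to show how far plain number theory goes.)

```python
# Toy RSA: public-key encryption from nothing but modular arithmetic.
# Tiny primes, no padding; illustration only, never use for real secrets.

p, q = 61, 53                # two secret primes
n = p * q                    # public modulus (3233)
phi = (p - 1) * (q - 1)      # Euler's totient of n
e = 17                       # public exponent, coprime to phi
d = pow(e, -1, phi)          # private exponent: e*d = 1 (mod phi)

message = 42
ciphertext = pow(message, e, n)    # anyone can encrypt with (e, n)
recovered = pow(ciphertext, d, n)  # only the holder of d can decrypt

assert recovered == message
print(f"cipher={ciphertext}, recovered={recovered}")
```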
I mean, it's by no means clear that it is actually an enormous threat, but I'm not particularly keen to find out either. That's playing with fire. Actually, fire is, yeah, it's such a good example. Sometimes people say, oh, these vulnerable-world recipes for ruin, that seems so implausible. And then you say: well, can you take a simple technology which you can buy for well under a dollar, which anybody can operate, which anybody can get anywhere in the world, and with no extra input have it cause a billion dollars of damage and a thousand deaths? And of course the answer is yes: you just light a match. And it's a good thing the world is the way it is: imagine if you increased the oxygen content of the air just a little; fires would become much more fierce, and it would be much easier to establish a firestorm. There are anthropic reasons why this doesn't happen. We know we live in a pretty friendly world, because if we didn't, we wouldn't have survived to this point. Unfortunately, now we have these enormous brains and all these wonderful ways of deepening our understanding of the world, and it's by no means clear that we're going to remain in that relatively friendly regime.

It's a pretty deep question, actually, whether we live in a friendly world or not, whether the world is vulnerable. What evidence do you think we have so far about this? Is the way the world works, as we understand it scientifically, favorable to our existence? That's one way of asking the question.

I don't know how to attack the broad question except in a very negative way, where you could in fact demonstrate: oh, here's a particular lethal technology. Actually, this is a problem with discussions of ASI x-risk. I've had many conversations with sincere, thoughtful leaders in technology who say: yeah, I kind of see why people are concerned, but these AI doomers, they always get quite vague when I ask them to connect the dots in the scenarios. Well, I'm not sure how much we should want the dots connected. I get the feeling some of them are not going to be satisfied with anything other than a detailed description of a recipe, which might be intellectually satisfying, but also seems like a dreadful mistake to make.

This is one of your persuasion paradoxes: if you make a highly persuasive case, a detailed case of how AI could be risky, you're then putting dangerous information out there. And this is something that hinders the debate from moving forward.

Yeah, it hinders it somewhat. Honestly, at this point I don't think it hinders it that much, although it is, by its very nature, quite difficult to know. I hear dark whispers from people: there are rumors that a model was used to do such and such. It's maybe a fun thing to hint at. I suspect very often the reality is less interesting than the gossip, for now.

Perhaps that's a key point here, when you say "for now". Because if skepticism about AI risk is skepticism about how intelligent current systems are, then experience tells us that you can probably wait a year or five years, and then you will have more advanced systems and will at some point face the issues.
Well, there's that, and you just have to wonder as well whether there's some gap between what has been done publicly and what is being done privately. One of two things is going to happen: either militaries and intelligence agencies and whatnot are going to start working on systems themselves, or, quite likely, there are going to be contracts with the big current labs where there's some separation, and it's not necessarily going to be obvious in the public eye what the capabilities actually are anymore. Think about the intelligence agencies: significant parts of their budgets are not public information, not even that basic kind of information, much less detailed programs. So you wonder what's going to take place there, and in particular what capabilities will be developed.

I agree, and I would actually expect that as models become more capable in the domain of coding, for example, some of the labs or companies might be tempted to deploy these models internally, in order to have them help improve AI research further, so that they can get a further lead in the race. That would be the maximally profitable or interesting thing, from their perspective, to do with the models, as opposed to deploying them broadly in society. That situation could become quite dangerous, because then you don't have this public information about AI models. You don't have this back and forth, as we talked about, where institutions can adapt to the risks and so on.

That's actually something that I've changed my mind about a bit. Just a few years ago I was not wild about open-source models at all. And I think, conditional on there being major private efforts, I'm actually very much in favor of there being comparable open-source models. Even though they create a threat vector, they also create a surface area for studying the threats that are being created. It also means that the organizations who are doing computer security audits, and who are doing all these kinds of evals, don't have to work through the labs. Ideally they would be in a very adversarial relationship with the major labs, but in fact they need to work with them as partners, and so to some extent they don't really have quite the right relationship. But if they're using Llama or one of the other open-source models, it doesn't matter at all. A paper I really like, a small paper: Kevin Esvelt's group did this fine-tuning of Llama to see how much it could help with creating pandemic agents. It's telling; it's just good to be able to use Llama in that kind of way, and they don't need to get permission from OpenAI or Anthropic or whoever. That's the correct kind of situation; that's the situation you want.
So it's a way to get information into the world about the current capabilities of models, and I think that's a good point.

But in particular, not just information: you actually want to be able to do the most adversarial things possible, things which maybe the labs might not be so keen on having discussed publicly, and you can just say, well, I don't care. And in particular, not having the organizations doing the evaluations need to be aligned with the companies creating the models: I think that's very healthy.

One thing that makes me nervous about open source is that you can't recall these models. Whenever an open-source model is released, you can never recall that product, no matter what it's capable of, and it's kind of proliferated into the world. As we've been talking about, if it falls into the wrong hands, that could be a real problem.

Yeah, I agree. That's certainly a significant downside. It's interesting to think about the question of just how large the models are eventually going to be.

What do you mean, how large are the models eventually going to be?

It's very much a stop-gap, intermediate kind of argument I'm trying to make. If a model can actually only do inference running on a significant cluster, if it can only do those kinds of things... I think it's implausible as a long-term argument: compute is going to continue to get cheaper. Maybe it eventually turns out that you need, whatever, beyond a quadrillion parameters, and so it's actually quite complicated to do this. But yeah, it's a massive reach to push back on your argument at all.

I mean, so far we can kind of depend on consumer hardware not being advanced enough to run frontier models using a lot of inference: think reasoning models, thinking models using a lot of inference, right at the moment.

Yeah.

But as you mentioned yourself, that's probably going to change over time, unless the most advanced models come to require even more inference in a way that keeps up with consumer hardware. I don't think that's going to happen. I think we're going to enter into a world where you can run very advanced models on, basically, smartphones. I think that's quite likely.

There's an economic reason why that's going to fail to be true in its strongest sense, which is this: if you believe that more scale in the model helps, then you should expect that whatever the most powerful current chip is, it's probably going to be required to run the best model. And if that's costing $30,000 or whatever, it's going to have an advantage over a smartphone. I don't actually know whether or not it's true; it's probably not public information whether the current frontier models, when they're doing inference, are simply running on a single chip. I'm actually just shamefully ignorant of that. I assume the answer is probably yes: they've got it deployed across a cluster, but when you run your particular ChatGPT query, it's essentially a single chip which is responsible for the output.
I don't actually know whether that's the case, but it's interesting if so: it means that for tens of thousands of dollars you can do inference with the same chip. Actually, I suppose, okay, they are doing a lot of things in parallel. And thinking about the reasoning models, I take that back: there is quite likely something more expensive being done, which may not be so easy to do on consumer hardware. So it's a question of, long-term, what algorithms are used, and how easily inference can be scaled up beyond a single chip. And that's an interesting economic constraint on your assertion that eventually the cell-phone version won't be so different from the frontier model. Great thought.

Yeah, there are a bunch of open questions when we have a conversation like this, and some of them can be answered by some people in the world, but there are also genuinely open questions where no one has the answer. When we're thinking about AI risk, or highly advanced AI in general, we are somewhat fumbling in the dark, without having an established science. And it feels like perhaps we don't have the time to build an established science, because that can take maybe decades, maybe a century. How do we act in that situation, when things are moving incredibly quickly and it seems we don't really have enough time to deeply understand the models we're creating before they get very advanced?

No, well, I agree with all of that. I wish I had a magic solution.

What I'm asking is: how do we pragmatically approach that situation? You've been involved in open science and so on, and you know something about how science works. How do we get a scientific grasp on the situation we're in?

Well, okay, first of all, I don't think it's primarily a scientific problem, unfortunately. To a very large extent it involves everything; that's part of the problem, and it's also part of why I'm interested and, "excited" is maybe the wrong term, encouraged at least, to see people from so many different areas get involved. I think about people like Vitalik, or a bunch of economists; they have different ways of thinking about how you deal with the cost of externalities. And seeing that kind of person... you know, I think at first I rolled my eyes a bit. I would see the Pope commenting, or the United Nations commenting, or the ambassador from whatever country making an announcement at the UN. And initially I did the very arrogant physicist thing of rolling my eyes, and that was just completely wrong. It's important that all those people are thinking about this and worrying about it. And I hope, and believe it quite likely, that solutions will come from very different and maybe unexpected directions. There's a kind of really interesting hubris in pessimism.
To be pessimistic about something, you have to believe that you and your friends are so clever and so all-seeing that if there was a solution, you would know about it. When I grew up, I was told that CFCs were going to kill us all, or, excuse me, that damage to the ozone layer was going to cause terrible, terrible problems, or acid rain, or many, many other concerns. And in each case, the people diagnosing the problem, who were very pessimistic about it, didn't realize just how much cleverness was being brought to bear by people of goodwill, sometimes with very different backgrounds to themselves. So that I at least find encouraging. I also like the thought of people with very diverse expertise and very diverse interests working on it; that seems very important to me. It's why having the Pope comment is actually quite helpful, if it's done thoughtfully, in an engaged way, because it does bring more people into the conversation, and in particular starts to bring more expertise into the conversation.

Do you think that over the long term this problem is something that can be solved with governance mechanisms, or cultural norms, or so on? Can we have a world in which we haven't solved the foundational technical problems of how to align or how to control AI systems, but we have good enough governance that things somehow work out?

Okay, so governance tends to get used in two separate ways in these kinds of conversations. One is just a very practical one: we have all these institutions that you can point to. The United Nations, the US government, the judiciary: those are all governance institutions. And then it's also sometimes used in a vague way, meaning how human beings control human actions and outcomes generally speaking. It's always the case that with difficult new technologies we have to expand the former; the question is how much we need to expand it. I think a very large expansion will be necessary. It may be one that is essentially insoluble. One of the terrible things is that disasters are one of the main things that cause an expansion. You get Union Carbide, or you get Rachel Carson and Silent Spring: these kinds of things, where they can point to a very large problem and it leads to improved governance mechanisms. It is very unfortunate. There's this epistemic problem where, if you can point to a very immediate, very legible threat, it is much easier to get an expansion in the governance mechanisms we have available, whereas if you're pointing to a relatively illegible threat, it's so difficult. Climate has been such a problem: it took 60 years really to make almost any progress at all on the climate models, and then many decades of wrangling and wrangling over this very long-term problem. So you don't get the same kind of immediate turnaround as, to some extent, you got with the threat from the ozone hole, where people could just go and look: oh my goodness, there is this very rapidly expanding hole.

One thing that makes AI risk different from climate change is that the risks are going to be more apparent in people's lives.
I think people are going to be able to interact with models that are impressively advanced, and that will probably convince them that there's something to this issue. Whereas with climate change, you can, without scientific grounding, look out your window, be a climate skeptic, and think: okay, nothing is really happening here.

Yeah, and even just the timescales: we don't notice the changes over 20 years. And also just random luck.

One question I'm interested in: over the course of history, you have a lot of people coming with doomsday scenarios, predictions of danger, right? And a good heuristic throughout history has been to be quite skeptical about this. Basically, ignore them.

Y2K. Great example.

Exactly. Yeah, or pick your doomsday scenario. When do we know whether that heuristic no longer applies?

Right. Certainly, in this particular case, insofar as you can actually point to significant problems which are averted or dealt with, that's going to be very helpful. A problem with Y2K, of course, is that after the fact it's very difficult to say. There are still two schools of thought. One says: oh, there was never really a problem, and we spent goodness knows how much money on a non-problem. And then there's another school of thought that will say: oh, we spent so much money, and that's why nothing happened. I love examples like this: if you just look at the number of nuclear states, it was going up very rapidly; then the non-proliferation treaty comes into force, and it doesn't quite flatline, but you can see the impact of the intervention there. It's certainly interesting, and potentially a really good thing about some of the available organizations, that they may be able to start to make certain impacts legible; that would be very confidence-inspiring. People do this in things like computer security, where there is a certain amount of public awareness. There's not enough: by all accounts, with ransomware and whatnot, most of the damage is hidden from public sight, so it's a little bit difficult to see the impact of institutions and tools. Hopefully, people will be able to make the impact of any interventions much more legible, simply as a way of being able to see what works and what doesn't work. I would love a nice compact example like that for AI, like the tailing off in the increase in the number of nuclear states: we might have had 100 or 150 by now, and instead we've just had a small increase, because of the nuclear cartel. I don't have a ready example to mind for AI. We're not in that situation quite yet.

One paradoxical situation you've pointed out is that if humanity reacts to a threat, and thereby decreases the risk of that threat becoming a reality, then the skeptics will be able to say: okay, there was never a threat to begin with, right? I would of course love for this to be the situation with AI risk, that we actually react, and then it's fine that the skeptics can say there was no risk to begin with. But this is such a good example of how difficult it is to navigate this terrain epistemically.
We probably won't ever have a clear answer about how big the risk was at a certain time; we probably never will have a settled science here. So, yeah, I've asked you this before, but what are the good heuristics here? What are the good rules of thumb for navigating a situation like that?

I'm glad that you think I might have something intelligent to say about that. I'm sure there are people with more intelligent things to say than I do; I'm not sure anybody has solved it. Actually, I just want to lean into something a little. You asked me before, and you've mentioned it again, about this science of AI risk. Very clearly there are some local things that can be said: there are particular threats that we can reason about in a scientific manner. Thinking about it more broadly, it is peculiarly difficult. I mentioned the example of climate before. There is this astonishing fact that people started making arguments about climate change in the 19th century, and there was a famous debate between Ångström and Arrhenius at the start of the 20th, I think, where they came to opposite conclusions, both based on really quite plausible arguments and quite plausible physical experimentation. And from a policy point of view, it left us in a really strange situation. We had a good argument that climate change might happen, and a good argument that it would not. It wasn't really until the 1950s or 1960s that anything like the modern understanding started to develop, and then by the 1990s it was really becoming the consensus, and quite clear what was going to happen. That's a period of 80-odd years for what is fundamentally a very simple physical problem. And in the case of AI, it's so much harder, because fundamentally you're talking about the structure of knowledge itself: how are our systems going to change our ability to navigate that structure? It's a much, much deeper and much less accessible part of the universe. There is no science of science, in some sense. I mean, people, and I, have made contributions to the field known as the science of science, but it's not a predictive model of discovery. By definition you don't have such a thing: you don't know what hasn't been discovered. It seems much, much harder to ever imagine having a detailed science of it. Maybe it's a lot more like aircraft safety engineering, where at least you can have reasonable models of the kinds of things that can happen. I don't know. You talk about changes in epistemics. Gosh, if you think about a science of science, it's so hard to do, because any tools that you would be using are actually themselves subject to change. At some point probability theory was invented, and then people like Kolmogorov really massively improved it.
So if you had a predictive science of science back around, whatever, 1920 or 1930, when Kolmogorov was getting ready to do that work (I think it was 1933, his famous paper), the tools that you would be using were actually intertwined with the tools which he was discovering. The same goes for people like Pearl and others with the modern theory of causal inference. So there is this funny intertwining: our very epistemic tools are going to be changed. So, I don't know. That seems very, very hard.

Do you think that's a limit that we're reaching because we're human, or do you think it's more of a fundamental limit? I'm thinking of whether we could use AI to understand AI, whether we could, in a hopeful scenario, dedicate a lot of, say, AI inference or AI thinking to trying to help us navigate or understand this terrain.

Sure, sure. I mean, we actually do this kind of thing. One estimate I've seen is that the US spends about $300 billion a year on fire safety. At some level, that's quite a similar sort of situation. A lot of that is monitoring, trying to understand what the threat vectors are, what kinds of technological modification we should do locally to make things more fire-safe, and so on. It's truly remarkable that the number is that large. And you can imagine dedicating a substantial fraction of all the world's AI resources to monitoring and attempting to understand and make legible what the other AIs are doing. It's kind of like a justice system for AI.

The problem is that it's difficult to know where to begin there, right? What exactly do you have your overseeing AI system do? You're trying to have the AI solve the problem for you, but before you can begin solving the problem, you need to stake out the territory where the work is even done.

It just seems so hard, for several separate reasons. One is just the possibility of doing stuff encrypted, where it becomes illegible to outside eyes, and the other is the intrinsic illegibility of intellectual work anyway. It can be very difficult to understand the implications of what is going on. So you're monitoring your systems, and it may be very difficult to tell that something actually has very negative implications. A really tricky problem; a great problem for a philosopher like yourself. If it was just intellectual, it would be fascinating.

Yeah. On that note, as a final topic here, I would love for us to talk about your recent essay on deep atheism and optimistic cosmism. These are two different worldviews, or approaches to seeing the world. How do they differ?

Oh yeah. So Joe Carlsmith wrote this great essay series, Otherness and Control in the Age of AGI, really just about how human beings think about the control of technology, the control of the universe in general, how they relate to the universe. And deep atheism is Joe's term for having a very fundamental stance of distrust towards the universe. People often object when they hear this term: that's not atheism, it has nothing really to do with whether or not God exists. I'm sympathetic to that as an objection, but let's stick with the term anyway.
And I suppose I read this essay of Joe's a year or so ago, thought it was interesting, and then was surprised by just how often the concept has come to mind ever since. So I wanted to understand a little bit more for myself what I think about it. In particular, I've been reading William James; in fact, just before this I finished William James's wonderful book The Varieties of Religious Experience. James does this really wonderful thing: instead of asking, is this true or whatever, he just asks what individuals' religious and mystical experiences are, in quite a non-judgmental way. He's interested in going and looking for accounts of: oh, this is what it's like to go into a trance; oh, this is what it's like to have a sudden conversion experience; and so on, in a very open way. He's not completely non-judgmental, he's not a credulous person, but he is a person who is quite open to just hearing a lot of different things. And I find that very beautiful. And I think of this idea of deep atheism as a particular version of that. Some people really believe, as a psychological stance, that the universe is kind of out to get them, while others are much more trusting, and it's interesting to try to understand what causes that orientation. I think it's related, surprisingly much, to your views on AI x-risk. Like I said before, understanding things like the Great Oxygenation Event or the origin of nuclear weapons probably changed my stance towards the universe: starting to think, oh wow, there are these enormously powerful things that are sometimes hidden in plain sight, and the only barrier is understanding. We learn a little bit more, and we realize, oh wow, we can change the world a lot, sometimes in really negative ways. So that's a change in my own stance which was caused by these apparently quite innocuous types of understanding. But I'm very interested in the question. I happen to be in New York today, and I lived here for a couple of months last year, and I was fascinated, moving from neighborhood to neighborhood, by the types of feeling in each neighborhood and the types of institutions which were available, in some sense. What you see walking down the street, particularly as a kid: that's the set of actions which are available to you. Oh, I can go to the park; I can play on the swing. Well, that's only true if there are parks in your neighborhood. Otherwise, you don't internalize that. If you're a Stanford student, you internalize: oh, if I want to do something, I can raise venture capital and start a company. If you go to certain other high schools or universities, you're not going to have that experience. It's not a verb that's in your lexicon. So I'm also thinking about the way that kind of experience conditions people: what sets their level of optimism, what sets their level of agency, what sets the extent to which they feel a win-win orientation versus something else? What experiences in their past? And it's not just experiences: there's genetics, there are environmental determinants, there are myths which condition it.
These are all, I suppose, closely related to this orientation towards deep atheism, which I got interested in, to some extent, just because of the connection to AI x-risk. It's a very random set of connections.

But these factors probably play a large role in what people end up believing about AI risk in the end. We're not always as rational as we might hope we were.

Actually, one of the things that I love in Joe Carlsmith's essay is that his prototypical example, in many ways, is Eliezer Yudkowsky, who of course is one of the people who has done a great deal to develop and popularize these concerns, and he finds quite a few examples to suggest that Yudkowsky really does have this attitude of fundamental distrust towards the universe, which I thought was just fascinating.

It's a very concrete example of the connection that you mentioned. The psychologist Danny Kahneman, I heard him talk on a podcast once, and he said that in his opinion, how optimistic or pessimistic a person is tends to be just a personality feature, which is not particularly rationally grounded, and there's not much you can do to change it, but it does determine an awful lot about what you believe about the world. And this is also clearly very closely connected. At the same time, though, there is a reality out there: something is going to happen once we develop superhuman AI systems. In some sense, there might be a test of whether the world is friendly to us, or whether the world is out to get us.

Yeah, it's going to be quite a test. I don't know; I'm laughing because it's better than crying. I must admit I do find the Fermi paradox quite concerning in this context. Somebody else out there, if it was possible to navigate ASI, should have developed it, and they should have colonized the stars, and we should see them everywhere. It worries me that that's not the case. That, I think, is part of the reason why I worry a lot about the vulnerable world hypothesis. The Fermi paradox seems like a considerable piece of evidence in favor of the vulnerable world hypothesis. I should say, by the way, people sometimes worry about being killed by ASI. At some level, the alignment problem still persists even if you had a society that was just AGIs: the problem in many ways would be even worse for them. They would have more power; they would have more capacity to wreak destruction, unless you get to this singleton situation where there's essentially a single force running everything. We sometimes think of the alignment problem as being just a problem for humanity, but it's actually a problem in general.

Yeah, I agree. Michael, thanks for chatting with me. It's been a pleasure.

Thank you so much, Gus.