Library / In focus

Future of Life Institute PodcastCivilisational risk and strategy

How AI Could Help Overthrow Governments (with Tom Davidson)

Why this matters

This episode strengthens first-principles understanding of alignment risk and the strategic conditions that shape safe outcomes.

Summary

This conversation examines core safety through How AI Could Help Overthrow Governments (with Tom Davidson), surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Perspective map

MixedTechnicalMedium confidenceTranscript-informed

The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.

An explanation of the Perspective Map framework can be found here.

Episode arc by segment

Early → late · height = spectrum position · colour = band

Risk-forwardMixedOpportunity-forward

Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).

StartEnd

Across 113 full-transcript segments: median 0 · mean -4 · spread -31–17 (p10–p90 -10–0) · 9% risk-forward, 91% mixed, 0% opportunity-forward slices.

Slice bands

113 slices · p10–p90 -10–0

Mixed leaning, primarily in the Technical lens. Evidence mode: interview. Confidence: medium.

- Emphasizes alignment
- Emphasizes safety
- Full transcript scored in 113 sequential slices (median slice 0).

Editor note

A high-leverage addition to the AI Safety Map that clarifies one important safety bottleneck.

ai-safetyflicore-safetytechnical

Play on sAIfe Hands

Episode transcript

YouTube captions (auto or uploaded) · video ER7_Mr7IJn0 · stored Apr 2, 2026 · 3,062 caption segments

Captions are an imperfect primary: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.

No editorial assessment file yet. Add content/resources/transcript-assessments/how-ai-could-help-overthrow-governments-with-tom-davidson.json when you have a listen-based summary.

Show full transcript

It's in everyone's interest to prevent a coup. Currently, no one small group has complete control. If everyone can be aware of these risks and aware of the steps towards them and kind of collectively ensuring that no one is going in that direction, then we can all kind of keep each other in check. So I do think in principle the problem is solvable. you should always have at least a classifier on top of the system which is you know looking for harmful activities and then kind of shutting down the interaction if something harmful is detected. We could program those AIs to maintain a balance of power. So rather than handing off to AIs that just follow the CEO's commands or AIs that follow president's commands, we can hand off to AIS that follow the law, follow the company rules, report any suspicious activity to, you know, various powerful human stakeholders. And then by the time things are going really fast, we've already kind of got this whole layer of AI that is is maintaining balance of power. Welcome to the future of life institute podcast. My name is Gus Dcker and I'm here with Tom Davidson who's a senior research fellow at Fore. Tom, welcome to the podcast. It's a pleasure to be here, Gus. We're going to talk about AI coups and the possibility of future AI systems basically taking over governments or states. Which features would future AI systems need to have in order for them to accomplish this? What should we be looking out for? Great question. One one thing I'll flag up front is that what what I've been focused on recently is not the kind of traditional idea that AIs themselves will kind of rise up against humanity and take over the government, but that like a few you know very powerful individuals will use AI to seize a legitimate power for themselves. So the kind of phrase that we're we're we're often using is AI enabled coups where the kind of the main instigators are actually people. In terms of capabilities, um, yeah, I think there's there's there there's a few different domains which in my analysis are like particularly important for seizing political power. So there's the kind of skills that that politicians and kind of business leaders use today. So things like persuasion, business strategy, political strategy, just kind of pure kind of productivity at at a wide variety of tasks. And and then there's kind of more kind of almost hard power skills. So in particular, cyber offense, which is already somewhat useful in military warfare and has been becoming more useful. And then you know I expect that as as AI increasingly automates different parts of the military and as AI is embedded in more and more important high stakes processes that will raise the importance of cyber offense is now you know whereas you can't hack a human mind as as we hand off more important tasks to digital systems they will be able to be hacked much more easily. So I expect cyber to become more important for hard power and then you know the the ultimate kind of most scary capability that you know I think ultimately will drive a lot of risk is when we get to the point that AI systems and robots are able to fully replace human military personnel that's fully replace human soldiers on the ground boots from the ground fully replace the kind of commanders and strategists and you know that that that might seem like a long way off today but actually you know just the last few years we've seen a more importance of kind of AI controlled drones in warfare and I expect that trend to continue and what we're already seeing is that you know as soon as the technology is there to kind of reliably automate military capabilities there's you know geopolitical competition drives that adoption and so you know I think it's going to be surprisingly soon that we that we do get AI controlling kind of surprising amounts of of of you know you know real hard military power and then one kind of wrapper for all of all of these things is the the automation of AI research itself. So today there's you know few few hundred um few thousand you know top top human experts that drive forward AI algorithmic progress and my expectation is that you know there's a good chance the next few years that AI systems are able to match you know even the top human experts and their capabilities and that would mean we go from maybe a thousand top kind of researchers to to know millions of automated AI researchers and that could mean that you know all of these different capabilities all of these different domains I've been talking about they all progress much more quickly than than we might have expected um just by naively extrapolating the recent pace of progress. And you know in my view and in the view of many the recent pace of progress is already quite alarming in that you know five years ago we just had really very basic language models that could string together a few sentences a few paragraphs and then went off topic. And now already we're getting kind of very impressive reasoning systems that are doing tough math problems and and helping a lot with difficult coding tasks. So, you know, bring bring that all together. I think there's a lot of kind of soft skills, a lot of hard power skills that they're relevant here, but like probably the most important thing to be watching is how good AI is at AI research itself as that could kind of bring make make them all happen um quite suddenly. Yeah. Could you describe in more concrete terms what an AI enabled military coup would look like? Some example to kind of make this concrete for us. Yeah, absolutely. So you can you can you draw draw an analogy to historical coups where there's you know often a minority of the military launches a coup and then kind of presents it as a fair comple and you know is able to prevent you know so chaos or discord or threaten individuals to prevent anyone from kind of actively opposing them and then in the absence of active opposition it just seems like well they've done it you know this is this is the new state of effects. So you know that that that's a good starting point. Then you know the AI enabled part is is where we deviate. So historically you you needed at least you know a decently sized contingent of humans to go along with the coup. Um and you needed to persuade you know quite senior military officials not to propose it. I think that would change as we automate more and more of the military. And so the most simple way that this happens is just that the head of state, you know, it could be the president of the United States just says, "Yeah, we've got the technology now to make a robot army." And I want the army to be loyal to me. I mean, I'm the commander-in-chief. Obviously, that's how it should be. They're going to follow my instructions. No need to worry about, you know, whether I'm going to order them to do anything illegal. Like, we can put in maybe some kind of normal legal safeguards. Let's not worry too much about that. The main thing is that they're loyal to me. And then to my knowledge, you know, that would be highly controversial or would definitely be against the principles of the constitution, but it's unclear to me that it would be literally illegal. We just haven't had this kind of technology. We haven't legislated for it. You know, the constitution is not robust to this kind of really powerful military technology. And so it's not surprising if you know at best this is just a very kind of unclear legal territory but you've got the head of state pushing really hard for that robot army to follow their instructions and you know the head of state in the United States has a lot of political power and so you know the most simple way is that he just pushes hard for that he gets what he wants maybe he's using you know kind of emergencies at home or you know tense geopolitical tensions to kind of push it through and say that it's necessary. Maybe he's firing, you know, senior military officials that disagree. Maybe he's already got Congress to be very very kind of fervently supporting and and and loyal to him and not, you know, being that kind of careful and open-minded when assessing like that the that the opposition that that people will be making as this has happened. So that that's the kind of the first, you know, really just plain and simple way that that that that we could get this robot army is built. It's made law to the head of state. and I say just instructs it stage a coup and it does it you know robots surround the White House and brutally suppress human protesters and then you know even if people go on strike and stop working then you know you can have then AI systems and robots replace people in the economy. So humans have kind of really lost their bargaining power that they normally have that would kind of strongly disincentivize military coups in in in most countries. Yeah, this is really a change from the um from the normal coups of history where you would have to have buy in from at least some segment of the population that are regular humans and you would need to kind of continually support that buy in and make alliances and and uphold those alliances. But this is this has changed now that you're talking about AI and AIs and robots that can basically be made loyal to a company or or a head of state in a in a way that's more durable. Do you think we have other kind of historical precedents for thinking about how the dynamics of what it's like to attempt a coup um how those dynamics play out? Yeah, just one quick thing on that last point. I want to emphasize how there is a bit of a phase shift at the point in which AI can fully replace other humans, you know, in the government, in the military. when AI is augmenting other humans, you don't you don't have this effect because a leader must still rely on those other humans to kind of work with the AIS to do the work. But there really is this phase shift when AIs and robots can fully replace the humans because then yeah, a leader doesn't need to rely on anyone else. So I think that that's an important one to recognize in terms of historical precedence. You know, the the other big one I point to is recent trends in political backsliding, often called democratic backsliding. So the most kind of end-to-end clear and cut case is Venezuela where you had in in in the 70s a fairly healthy democracy that had been there for decades and then increasing backside increasing polarization kind of like what we're seeing in the US recently and then you know an increasing you know explicit commitment by by kind of the leader that you know he wanted removed checks and balances on his power and that the will of the people you know was being obstructed by various democratic processes and institutions. Um and then you know o over the coming decades you know it has transformed into uh an authoritarian state and you know many commentators have pointed out these trends in the US recently over the past past 10 years and and it even goes back before the past 10 years to be honest in terms of the broad kind of political climate. Um, and then there's, you know, kind of the example of Hungary where where where again kind of elected elected leaders are just kind of removing the checks and balances in their power, kind of buying off the media or kind of threatening media outlets to to be more pro government, not providing them with contracts or kind of litigating them if they criticize the government. all all these kind of standard tools where it's now like a lot harder to to point at one thing that's clearly egregious. But when you add up the kind of hundreds of little cuts, hundreds of little paper cuts of democracy that are being systematically administered, you're you're seeing a real kind of loss of democratic control and concentration of power. And so again, you know, AI could, you know, exacerbate and enable that dynamic. And again, the most straightforward way is you're just replacing human in powerful institutions. You're replacing the humans there with AIS that, you know, very very very loyal and obedient to the head of state. So, you know, think about Doge and, you know, they tried to fire people, there was push back, you know, the state needs to function. Imagine if you could just have AI systems that could fully replace all of those employees and could be made fully loyal to the president. How much easier would it be to kind of push through push push through some of those layoffs or even just create entirely new government bodies that essentially just take on the tasks that were previously done by our bodies and that those old bodies kind of rot away or kind of slow slowly kind of prevent them from making decisions. So and then the other big way is if if the kind of head head of state is able to get access to much more powerful AI capabilities than their political opponents maybe because state is very involved in AI development then that's another way that they could get a head up you know making more persuasive propaganda and more compelling political strategy to like you know embed their embed their power more. You segment the ways in which uh AI can enable coups into three categories where you can talk about singular loyalties, secret loyalties and exclusive access. Perhaps we can run through those and talk about where those would play out. Starting with with singular loyalties for example. Yeah. So singular loyalties is what we we've just been talking about. that is deploying AI systems that are kind of overtly obviously very loyal to just you know existing um powerful people. So you know in particular I'm thinking about the head of state here as the main the main threat. And so I think I think we basically already covered it. The two the two main angles in my mind are um deploying loyal AIS powerful government institutions and in the military secret loyalties is a very different threat model. um it's much more as you would expect secretive. So you know the main threat model I have in mind to make it concrete is that an AI company CEO has you know automated all of AI research. So they could fire their staff at that point because the AIS can just do the work. Instead, maybe they put the staff onto some product work, but you know, the core work of driving AI progress ever further forward, making, you know, increasingly intelligent AI is pretty much just done by AI systems. And at that point, they realize, well, they're in a bit of a precarious position that they're they're controlling this hugely powerful technology. Their power is pretty much unconstrained. Not not literally constrained, but there's very few checks and balances current currently on on these CEOs. And they might anticipate that the government's going to realize how big a deal this is. They're going to kind of lose their influence. Maybe they worry the government will do something unethical with the AI technology. Maybe they worry that they'll be used for for a war or something. And there's all kinds of justifications they could come up with thinking, I don't want someone else taking control of this really powerful technology that I currently control and obviously I'll use it for for good. And so they might, you know, speak to maybe some AI advisers about this and say, well, what, you know, what what should I do here? it seems I'm in a little bit of a precarious position. And then a solution that they might think of or an AI very smart AI system that's helping them might think of is well why don't you just create a back door in in the the super smart systems that we'll be building next month where like if if the government tries to order those systems to do something that that that you definitely approve disapprove of then the air systems will just refuse to do it. those, you know, those those robots, those powerful robots that being deployed in the army will just f refuse to follow orders from the government if, you know, they're they're really awful and ethical and things you wouldn't like. And so then you can have all the future AIs now that are developed by by this organization are going to have this back door or what I call a secret loyalty where though they they appear to follow the law, though they appear to kind of, you know, uphold democracy, in fact, they are secretly still kind of loyal to the CEO. And that could lead to a coup. And and again the most direct route is a military coup where if ultimately this company's AIS are building military robots, military drones, military communication systems, then because they themsel because the AI building all this military stuff are secretly loyal, they can kind of pass their secret loyalties on to to the military infrastructure or they can just insert vulnerabilities in the military infrastructure that only they know about. And you know here we are we are you know I am primarily thinking about you know very superhumanly intelligent AIs you know are able to kind of design complex machinery that like a human can't kind of check and check if there's a back door and that then means that we've now got a military infrastructure that is also ultimately can be controlled by the the CEO. And so at at this point, if someone tries to remove the CEO, if something is really bad, having the CEO doesn't like, they could just be like, "Okay, I'm now in charge. I'm I've instructed my this military infrastructure to do the coup." Again, the robots around the White House. This time it's it's it's the company CEO that that that they're loyal to. And then, you know, the rest of the story can play out just as in our previous discussion. How do these secret loyalties relate to sleeper agents? And maybe you could you could explain what a sleeper agent is. Yes, great. So, sleeper agent is the kind of standard term used in in in the technical literature and today's sleeper agent not very sophisticated. So an example of a sleeper agent is that if this if the AI sees that it's the kind of the any most of the time the AI will just write kind of reliable code as instructed but if the AI sees that the year is 2026 then the AI will write code with vulnerabilities. So there's this kind of trigger that that triggers this malicious behavior. And you know, normally if you were, you know, interacting with CI, you really wouldn't be able to tell because it would just always be, you know, acting in as you'd intended. Um, but you know, because it's this one trigger, it's very hard to find. But then you do get this malicious behavior. Now, I'm not worried about sleeper agents today because to to cause real harm, we would need a sleeper agent to be very sophisticated. you know, it could never accidentally trigger or very very rarely accidentally trigger and it would have to be able to do, you know, very intense complicated tasks like, you know, building a military robot and ensuring that that robot was actually had a back door. You know, that's very very, you know, far beyond what AI today can do. So, you know, I think that sleeper agents provide a basic proof of concept that it's possible for a malicious actor to kind of gain elicit control over a system and then have that system be deployed in the rest of the economy. and potentially without people noticing. Um, but they're not yet scary. And then the kind of secret loyalties is just what I call the kind of scary situation where you now have a very sophisticated AI system that doesn't just have any old, it's not any old sleeper agent, it's a sleeper agent which is specifically logged to one person trying to help them seize power. Yeah. So, what we're imagining here could be, for example, a a US-based AI company integrated into the into the US military. The CEO of the company wishes to to ultimately be in control of what happens. And so he engineers or he instructs perhaps AIS or or human engineers to create a sleeper agent in these systems that can be activated as at his command such that the US US military officials think they're in control of the systems. the systems behave in in ways that they approve of throughout a perhaps a quite a long period until there's some way in until the sleeper agent is activated in some way. And perhaps that would be more sophisticated than changing the date or giving it some phrase. But you you can imagine advanced versions of of sleeper agent that that could actually behave in this way. Do you think that's realistic? Do you think do you think sleeper agents can can get can can can become that advanced? Yes, I do. I think you know we can one analogy is to human spies. You know human spies they they're basically most of the time they're they're kind of doing their assigned job as expected and it's not like one simple trigger phrase or one year makes the spy do something covert or malicious. They are just fully aware of their surroundings. they're kind of constantly processing their inputs and they choose strategically exactly what the the kind of ideal time is to, you know, steal some information, send send an illicit email. And so, you know, in my mind, that's that that's by far the most scary sleeper agent. Not not not one that's triggered by a password, but one that is kind of holistically making a decision about how and when to act out. I mean the password ones are actually quite fragile because you know if you were the military and you're deploying your AI system and you're worried there's a password what you can do is you can scramble all the inputs. You can kind of paraphrase all of the instructions it gets and that might just mean that the password if it ever ever someone tried to insert it would be kind of rescrambled and would just never actually come up. Um, so I I'm not actually worried about the kind of simple password triggered sleeper agents, but again, they're basic proofs of concept and I think that as AIS become as smart and smarter than humans that there's reason there's strong reason to think that it will be possible to build much more sophisticated ones. One one thing I will briefly say is that you know people often talk about misaligned AI scheming and you know this is just the same idea where you know in fact the argument for secret loyalties being worrying is much stronger where you know misalignment you know there is evidence of misalignment we don't yet have you know strong evidence of you know really sophisticated scheming emerging accidentally but if if humans and a human team of engineers or an AI team of engineers were specific ly trying to build a system that that was kind of covertly thinking about when when when when to kind of act out, then it it's much more plausible that it that it could happen. And then you have exclusive access, which is different from singular loyalties or or secret loyalties. Why is that its own category? Yeah. So in my mind the kind of singular or overt loyalties and the secret loyalties both of those threat models go through deploying AI systems in really important parts of the economy. So you know in particular government and military what I focused on but you know for those that models it's you know you you actually need the rest of society to choose to deploy those AI systems and hand off a lot of power to them. Um, and so I I I kind of have this third threat model of exclusive access to think about another possibility, which is that maybe even without people choosing to deploy AI systems and give them a lot of power, even without that, maybe AI systems can be powerful enough to help a small group sees power. So the the prototypical situation I'm imagining here is you know there's a kind of one AI project which is you know somewhat ahead of the others and maybe it goes through an intelligence explosion whereas by which I mean kind of AI can automate AI research and then AI quickly becomes super super intelligent compared to humans and then you know that project maybe has a few senior kind of executives or senior political figures that are kind of very very involved and have a lot of control and they might just be able to, you know, siphon off, you know, 1% of the project's compute and say, "Okay, we're now running these these these, you know, super intelligent AI systems and saying how how can we best seize power?" Um, and then there there's kind of millions of them. They're doing, you know, every single day they're they're doing a month of research. every single week they're doing a year's worth of research into okay, how can we, you know, how can we gain this political system? How can we, you know, hack in into these systems? How can we, you know, ensure that we end up controlling the military robots when they are deployed by hook or by crook? Um, and I think that that that threat model could start to apply earlier in the game. That could start to apply before anyone even realizes there's a risk because, you know, this is just essentially all happening on a server somewhere. But actually, it's possible that the game could be won and lost by the massive advantage that a small group get by by being able to kind of co-op this huge huge intellectual force. Um, and so I think it's worth tracking that threat vector independently. But it does, you know, it does definitely interact with these other other with the singular loies and the secret loyalties because no one strategy that your kind of army of super intelligent AIs may come up with is oh why don't you like use the fact that your head of state to like push for the robots to be loyal to you and like here's how you could buy off the opposition. So confusion another strategy might be oh why don't I just help you put back doors and all this military equipment so that then you could use it to stage a coup. But there might also be other ways, you know, maybe maybe it's possible to very quickly create, you know, entirely new weapons which you can use to overpower the military without anyone knowing. Or maybe it's possible to, you know, gain power in other ways. Yeah. Yeah. I mean one thing that would make this kind of future hypothetical situation different from today is that today it seems that there are leading AI companies but over time capabilities kind of emerge in in in second tier companies and in open source and so there's not that much of a gap between the leading companies and what is broadly available and perhaps what is publicly available. That's something that would change in in the scenarios you imagine. So perhaps explain why the gap in capabilities between the the one leading project and all of the others is so important. A few factors there. So in terms of why it's important, it's just what you've said. You know, a lot of these a lot of these threat models kind of exacerbated if there's one one one one group of people that has access to much more powerful AI than than other groups. If you know, if open source is pretty much on par with with with the cutting edge, then everyone will have access to similly powerful AI. I will say that even if open source is is kind of on par, that doesn't mean we're fine because we could still choose to deploy AI systems in the military and the government and still choose to make them loyal to the head of state. when we're when we're choosing to hand off control to AIs, it doesn't matter if there's 100 AI companies that we're only handing off control to some AIs and maybe you know the government will ensure that they do have particular loyalties. So I will say you know this risk doesn't go away if if we have you know lots of different AI companies and open source close to each other but it does it does become lower because the kind of exclusive access point where one group has access to super intelligence and the other group doesn't have access to much that goes away and I think it's a lot harder to pull off secret loyalties if everyone's kind of roughly equal to each other because it becomes a bit more confusing why your systems in particular end up controlling so much of of the military or so widely deployed and it becomes confusing how no one else was able to realize you were doing the secret loyalties when they were kind of equally able to to do it or equally technology sophisticated and potentially detect your secret loyalties. So I I do think it makes a big difference in terms of why I think it's plausible that that there's a much bigger gap between the lead project and other projects that there's a few different factors. The most plain and simple one is that the cost of AI development is going up very quickly. we're kind of spending about three times as much every year on developing AI. Um, and that's just going to get too expensive for many players. You know, if and when we're talking about trillion dollar development projects, which I do expect, then very few can afford that. And also, there's just only so many computer chips in the world. If you want to have that, you know that currently the number of kind of computer chips produced each year is less than a trillion dollars worth. So if if we get to a world where you know the way to go to the next level of AI is to spend a trillion dollars then you know only one company will be able to do that and maybe maybe we stop a bit earlier maybe we just stop with you know there's two companies both doing half a trillion but you know we would be really kind of kneecapping the level of progress if if we stopped long long long before that and there would just be strong incentives for companies to merge or one company to outbid others in order to like you know really raise the amount of money that's being spent on AI development. you know, this is all assuming that, you know, we can build really powerful AI and it is economically profitable, which for for me isn't all in the background of the scenario. Um, so that that that's the first kind of straightforward reason why I think we'll we'll see a kind of a smaller number of projects and we'll see kind of big gaps because when when you're spending 100 times less on development, then that's that's going to be a bigger gap. That's the first reason. The other reason I've already talked about the the idea of an intelligence explosion um when we automate our research even if companies are fairly close maybe that one is a few months behind the company that's a few months ahead automates our research in that next three months they make massive progress so then there's actually like a really big capabilities gap even though it's still just a three-month lead. So there's a question whether they can use that kind of temporary speed to kind of get get a more permanent advantage. And then the last big reason is just kind of government-ledd centralization. It's already been talk of Manhattan project and CERN for AI and you know I think there's there's there's reasons to do those projects. They can help with safety in some significant ways, but they would, you know, exacerbate this risk because yeah, if you pull all the US or all the United States computing resources into one big project, it's going to be way ahead of any other project and you pull all of its talent and all of its data, then yeah, you'll see a really big gap and that that would definitely um make make it a lot easier for a small group to to to do an AI enabled coup. Yeah, you're you're kind of putting a big a big prize out there for someone who who's interest or who's considering a coup, right? If you're if you're concentrating all of the power, all of the resources, all of the talent into one project, then well, that's where you got to go if if you are if you're a a coup planner. Yeah. And just to be I don't particularly expect that anyone is planning any coups. In fact, I'd be very surprised. I'd more think it's, you know, you want to be powerful. You want to be a big deal. You want to be changing the world. So, yeah, obviously you want to lean the main lead the main project. And then you don't want anyone else to come in and mess you up mess it up. So, obviously, you want to protect the fact you're leading that project. You don't want anyone else to, you know, misuse AI. I think it's kind of step by step. You you just kind of head down that road of more and more power. And then, yeah, you know, often in history that that road does end in just consolidating power, you know, to a complete extent. And I mean it it it can be. So what we're imagining here are times in which AI is moving in at incredible uh speed, right? The the the pace of progress is insane. There's a bunch of confusing information. People are acting under radical uncertainty. And perhaps in those situations, it's tempting to think that you are the person that can that can lead this project. And perhaps you're doing this out of supposedly kind of altruistic reasons. you're thinking that I need to do this in order to prevent other people that would perform worse than me at at at this project. And so you're kind of slowly convincing yourself that it it it would be the right thing for you to do to to take over in perhaps a forceful way. Yeah. You know, I don't think Xiinping or Putin think that they are the bad guys. You know, I think that they have you know, probably have sophisticated justifications for what they're doing. Perhaps here is a good point to talk about the possibility of one uh state or company outgrowing the entire world. This this this relates to the to the problem of exclusive access because if you have uh one company or or one government outgrowing out growing the entire world, then you have that company or government with exclusive access to advanced AI. Um, so how could this happen? How likely do you think it is that growth could be so incredibly fast that one company would outgrow all of the others? Yeah. So there's two possibilities we could focus on. The one that I think is is pretty plausible is that one country could outgrow all of the other countries in the world. So what that would mean is you know today US is 25% of world GDP but this would be a scenario where it is leading on AI it is this is already the case that you know it maintains its lead it maintains its control over compute um and then when it develops you know really powerful AI it prevents other nations from doing the same you know this is already beginning with export controls on China and that kind of you embeds its lead and then it, you know, uses that AI to develop powerful new technologies and, you know, it's in control of that of those technologies. It uses AI to kind of automate cognitive labor throughout the US and maybe worldwide. And, you know, countries that don't use its AI systems will will be, you know, really hard hit economically. And so, we're kind of massively centralizing power in in the US. And if the US is able to maintain exclusive control over over you know smarter than human AI then it seems pretty plausible to me you know very likely that the US would be able to rise to you know strong majority you know more than 90% of world GDP um and there you know there's a few different you know dynamics that are driving that first is that you know labor currently human labor receives, you know, about half of world GDP, you know, just half of GDP is paid out in wages, AI will ultimately and robots will ultimately be better than humans at kind of all economic tasks. And so if if if if the US controls all the AI companies that are replacing human labor, then you know that that half of that that kind of 50% of GDP which is currently going to human workers will ultimately be reallocated to paying to whoever controls and owns those those AI systems are US companies. Um you know there's a wrinkle there because some some some of some of that is is physical labor and you know US doesn't currently have a lead there. you know, physical robots. In fact, China is quite far ahead. But in terms of at least the cognitive aspects of of of our jobs, you know, so we're talking, you know, significant fraction of GDP that would just now be reallocated to to US companies that control AI. So that that already gets them from 25% to above 50%. Then we've got this further dynamic which is the dynamic of super exponential growth. So this relates to kind of previous work I've done on how AI might affect the dynamics of economic growth. But you know the kind of very potted summary is that it's often quoted that over the last 150 years economic growth has been roughly exponential. Um and what that means is that if two countries are growing exponentially and one country starts off you know maybe twice as big as the other country then at a later time still one country is twice as big as the other country. So let's say you know the US economy is 10 times as big as the UK economy then if they're both growing exponentially at the same pace then you know 10 years later again the US will still be 10 times as big as the UK. So that's exponential growth. That's what we've seen over the last 150 years. If you look back further in history, we see super exponential growth. That means that the growth rate itself gets faster over time. So, you know, an example would be that 100,000 years ago, you know, the economy wasn't really growing at all. If you think it was growing, it was maybe, you know, doubling every 10,000 years or something in size. You know, very extremely slow economic growth. Then going from about 10,000 years ago, it seems more like ballpark, there's a there's a doubling of the economy every thousand years. Still incredibly slow economic growth. You zoom back in in kind of 1400, you can begin to detect, you know, okay, more like, you know, every 300 years or so, the economy is doubling. And then in recent times, we've seen that the economy is doubling every every 30 years. So essentially, you know, the growth rate is getting faster, the doubling times are getting shorter. That's exponential growth. And there there's there's various reasons, economic reasons, theoretical reasons, empirical reasons to think that AI and robotics when it can replace humans entirely, we'll go back to that super exponential regime that that has been at play throughout history. What that means is the growth is getting faster and faster over time. And the reason the reason I'm I'm saying all this, the reason this is all relevant is that, you know, go back to that example of the US and the UK. The US is currently 10 times bigger than the UK. If the US is on a super exponential growth trajectory, its growth is getting faster and faster over time and that means that even if the UK is on that same super exponential growth trajectory as they both go super exponentially, the US will pull further and further ahead of the UK because you know maybe the US is doubling in 10 years because it's already it's already bigger. it's already further along the curve whereas the UK you know is is still doubling only every 20 years and so that means that the US you know is now rather than just 10 times bigger than the UK the US is now you know going to be 20 times 30 times bigger in size than the UK so if the US is able to to kind of if if there is super exponential growth and the US is able to kind of be be bigger to begin with and therefore be further progressed along that super exponent cial growth trajectory then then that's another way that they could just you know continue to increase their their size of the economic pie and ultimately you know come come to completely dominate world GDP. So you know ju just to sum up everything I've said today US is 25% of world GDP if it controls and develops AI that that could easily boost it above 50%. I'd be very surprised if it didn't. And then from that point, you know, it's already bigger than the rest of the world combined. If it's able to then go on the super exponential growth path, then it will grow faster and faster over time and pull further and further ahead of the rest of the world that, you know, may be able to grow super exponentially. If they can also develop AI, but you know, we'll we'll still be falling further and further behind because because the nature of super exponential growth. Yeah, this this actually seems quite plausible to me and not very sci-fi. The thing that seems quite sci-fi is the notion that perhaps even one company could could grow at such a speed that it would outgrow the rest of the world. How how how likely is that? Yeah, great question. I think it's a lot harder, but it is it is surprisingly plausible. You know that first part of the argument I gave about how 50% of you know the world GDP is paid to human workers. You know if that went to AI that would be a big a big chunk. It is possible that one company could get a monopoly on on kind of really advanced AI. So I we already discussed some of the dynamics there where again the simplest one is just a combination of an intelligence exposion giving a company a big advantage and then they're kind of buying up all the computer chips that the world is able to produce and out bidding everyone. If if a company does that and already you know it's you know seems to be out bidding other companies on on compute although although Google also al also has a lot if if a company is able to do that they could end up just one company in in control of literally all of the world's cognitive labor you know because human cognitive labor will at some point be kind of dwarfed by AI cognitive labor so at that point that one company could be getting you know all of the all all of all all of GDP which is currently paid to kind of cognitive labor which a large part of the economy as I said you know may maybe as high as 50% but you know certainly as high as 30% of world GDP if all that all that would then you know seemingly be going to this one company that that controls the world supply of cognitive labor so I think that would take time and obviously it's going to take a long time to um automate all the different parts of the economy um there is just a basic dynamic by which one company can now be controlling you know double digit percentages of of world GDP and there's obviously questions would a government allow that would they step in and that's where we get into the you know the these dynamics of like well this company has all these super intelligent AIs on it side maybe it's able to lobby maybe it's able to do political capture to avoid the state stepping in maybe it's able to be like look we're providing like economic abundance for everyone you step in like you know that that that that might not happen. You know we're underpinning your nation's you know economic and geopolitical strength and if you try and you know remove you know step in and nationalize then you know that's not going to happen. We're going to move to another country you know. So you can you can imagine maybe maybe they convince the the head of state to kind of support support them and there's some kind of alliance there but you know it it's not completely obvious that that the company would be shut down. It would it would have certain types of serious bargaining power. So if a company was able to maintain this position as kind of sole provider of cognitive labor, it would be able to get a significant fraction of world GDP and then it's then possible that from there it could it could bootstrap. And this is where it gets a bit harder, but that the tactic it would need to pursue is it already controls most of the cognitive labor, pretty much all of it. thing it doesn't control is all the kind of physical machinery and all the raw materials that are also needed to create economic output. But it can pursue a tactic of kind of hoarding its cognitive um labor so that no one else can ever have access to that and then kind of selling it at kind of really kind of monopolistic rents to the rest of the world because there's there's no one that can match it. is is you know it's offering everyone by far the best deal they can get but just know skimming off know 90% of the value ad from from companies using it its AI systems. So if it's able to do that then it can kind of it can kind of reap by far the the majority of the benefits of trade and then maybe it can kind of buy increasingly buy up physical machinery and raw materials from the rest of the world design its own robots buy its own land you know imagine like a kind of big special economic zone in Texas or something where you know this company is kind of unconstrained by kind of bureaucracy and then it's also now you know got a big arm um somewhere in Siberia and in Canada, it's kind of creating these big special economic zones by doing deals with specific governments. And I I do think it's a bit of a stretch that this all goes ahead without, you know, various other powerful political and economic actors pushing back. But like the kind of basic economic growth dynamics are like surprisingly compatible with with with a company, you know, ultimately coming to control most of the cognitive labor and most of the kind of physical infrastructure that its AIS have designed using all the all the parts that's bought from the rest of the economy. Yeah. And do you think this is a risk factor for AI enabled coups then just because you're concentrating all of the power and all of the resources into either perhaps one country or one company even? Yes, I definitely do. The the more realistic path is that a company kind of starts down this path of outgoing the world gets kind of huge economic power increasingly controls the country's industrial base. It's kind of physical infrastructure, manufacturing capabilities. And then from there, it's in a much stronger position to seize political control because it's got massive economic leverage. And then it can also increasingly gain military leverage because as it you know, as it increasingly controls the country's broad industry and manufacturing, that will feed in to military power. So, you know, some of the, you know, possibilities I discussed earlier where, you know, you could potentially have your AIS be secretly loyal that ultimately design the military systems or you could just instruct your AI systems to start making, you know, a a military that is is not legally sanctioned. But, you know, because the government doesn't have much to threaten you with, it kind of you get away with it. I mean, it gets a little bit tough. you probably need to do that in secret otherwise the existing military could um could could prevent it. But yes, I do think that you know being very rich helps with lobbying. It helps with all kinds of ways of seeking power and then controlling yeah controlling a lot of industry can can potentially give you military power. You mentioned these special economic zones. That's that's one way in which companies could kind of bargain with states uh in order to uh have favorable regulation and to and to be able to carry out their projects without intervention. Basically, another way for them would be to collaborate with non-democracies that are perhaps controlled by a a single a small group or or perhaps even a single person. And in that way, it seems like perhaps it's easier to get something done in a non-democracy and and that is a way to to grow fast and and so perhaps there are incentives for companies to place more resources in non-democracies. What do you think about the prospect of non-democ democracies out competing democracies when it comes to AI? I think it's a really great question and it's tricky because I think I agree like democracies have lots of checks and balances. They have a lot of bureaucracy, a lot of red tape and that will disincentivize AI companies from investing. And then additionally, if there are people really trying to seek illegitimate power, that will be easier to do in non-democracies because they're less less politically robust. So that there there are these various forces pushing towards you know this new supercharged economic technology being disproportionately deployed in non-democracies. Um and I I think that is scary. My my my own view is that probably we should democracies should should should should should to should to should to should to should to should to should to should to should to should to should to should to should to should to should to should to should should kind of do everything they can to to avoid that situation. Make it much easier for AI and robotics companies to to set up shop in in democracies. Remove the red tape. try and you use export controls like are already happening to prevent technologies being deployed in non-democratic countries and that that goes beyond China. There's obviously lots of countries that are not allied with China but are also non-democratic here and the US you know the US is in a strong position because it does have the strangle hold on AI technology at the moment. So I do think it can be done but yeah in my view like it will be really important to to to kind of work very hard to to find a kind of a non-restrictive regulatory regime and it will also be very important to really try and pursue innovative innovations within the democratic process itself where you know democracy is great in many ways. is it really distributes power and it and it has been very good at ensuring good outcomes for its citizens but it's very slow and often you know kind of nonsensical because you have competing interests that are kind of stepping on each other's toes and the resultant legislation is you know garbled mess and so you know AI can potentially solve those problems you can have AI negotiating and thinking much more quickly on behalf of the the the kind of human stakeholders you can have AIS nailing out agreements that aren't a garble mess, but they're like really gave everyone what they truly wanted out of the legislation. And you can still do all of that really quickly so that you're not falling far behind the autocracies that have just got one person immediately saying what to do. And I think if we did that, we you know, democracies could out compete autocracies because, you know, the big thing that often screws over autocracies is that one person is flawed, often makes big mistakes, people afraid to kind of stand up to them. Yeah, that that would be more of my assumption that I would assume here that perhaps democracies with market-based economies have an advantage just because you can you can do kind of bottom up knowledge discovery. You can try different things out. You can see what works. You can have competition between companies and so on. And perhaps in in non-democracies, well, I mean there you can you can have one person or a small group stake out of direction for what the country should do, but if that direction is is wrong, it's probably difficult to change course. Yes, I I think you're probably right. I should have I should have, you know, given more weight to that that that advantage of of kind of democracies in terms of the free market be being, you know, in many ways much smarter. But in terms of autocracies that are good at harnessing free market dynamics, my worry would be that the AI helps them more than it helps democracies because AI will be able to kind of replace, you know, currently, you know, one person just can't think that hard, can't really figure out a good plan. But if if if that one all powerful leader has access to loads of AI systems that can kind of think things through and investigate lots of different angles, then you know that if if they're following it advice, they could get advice which, you know, lacks the flaws that that today systems had and and they could potentially move much faster. Um, but I I I think know you're right that kind of economic liberalism is is still going to be important even after we get powerful AI systems and that could give that could give democracies an advantage. This is a bit of a tangent perhaps, but I'm thinking whether so if you have a a leader of a country that has a lot of power, perhaps complete power over that country and that leader is equipped with AI advisors advising him and and and kind of laying out kind of the landscape of options for him to choose from. Wouldn't his decision- making still be in a sense bottlenecked by the fact that he's a human, by the fact that he has these these flaws that we all have, the biases that we all have? So, even with fantastic advice, I think it's it's quite plausible that that he would still make the same mistakes that we see leaders make today. I think that's true. I think it's also true in democracies unfortunately that you know if there's 10 negotiators and they each kind of still have biases and still refuse to listen to the wise advice they're getting from their AIS that could still gum up the the system. And yeah, it does depend on how much humans come to trust and defer to their AI advisers. There's a possible future where the AIS are just always nailing it. They're always explaining their reasoning really clearly and we are just like increasingly convinced and happy to trust their judgment. If AI is aligned, I think that would be a great future because I do think humans have all these very big limitations and biases which if we can solve the alignment problem, AIs don't need to have. But there's also another future where humans just, you know, want to be the ones making the decisions, have these kind of pathetic kind of motivations that that that that they're still kind of influencing their decisions and that Yeah. that the kind of that that continues to to to to to limit the quality of decision- making. Seeing things from above, right, from kind of like 10 thou thousand feet. How should we think about mitigating the risk of coups here? Is it is it about removing people that would use AI to commit coups? Is it about kind of finding those people in the militaries, in the governments, in the companies perhaps? or do we have ways to reduce the returns to to ceasing to ceasing power? Yeah, I mean from real 10,000 ft up the way I would characterize it is create a common understanding of the risks, build coalitions around preventing them and then the existing balance of power can self-propagate forward. You know, it's in everyone's interest to prevent a coup. Currently no one small group has complete control or close to it. And so you know if if everyone can be aware of these risks and aware of the steps towards them and kind of collectively ensuring that no one is going in that direction then we can all kind of keep each other in check. So I do think you know in principle the problem is solvable and it doesn't require you know solving the risk of misalignment does require solving some tough technical problems. this doesn't in the same way. Yeah, you have a bunch of recommendations for mitigating the risks both for AI development, AI developers and governments and perhaps you know we don't have to run through all of them but you can talk about the most important ones for AI developers. I might characterize this I might kind of talk about it by going back to those three threat models we discussed earlier. So the first one was singular loyalties or overtly loyal AI systems where again you know the main risk there is AI deployed by the head of state and the military and the government that's loyal to the head of state and so the main counter measure that currently appeals to me is for us to you figure out rules of the road for these deployments. you know, obvious things like AI should follow the law. AI deployed by the government shouldn't advance particular people's partisan interests, but should only do like, you know, official state functions. AIS in the military shouldn't be law to one person. No, they should different groups of robots should be controlled by different people. and you know head head of the chain of command can still be head of the chain of command via instructing other people that instruct those robots but they shouldn't all go directly to head of chain of command because that centralizes military power too much. So you know fleshing out basic rules of the road of that kind and then building consensus around them because you know companies might want to say to governments yeah we don't want you to deploy our systems you know if if if if they're willing to break the law but if the government you know if if the the government will have a lot of bargaining power the executive in the United States can can you know it's hard for companies to stand up to them. So what we want to do is you know establish these rules of the road and then get broad buying from Congress from the judiciary from you know other branchs of the military from many parts of the executive. So then it's very then hard for say the president to say yes let's like make this robot army loyal to me and everyone's like obviously not we've all like agreed that makes no sense you know and then the president doesn't even bother trying because it's just clear that it would be a no no go that you know their mind doesn't even go there in some sense this is about kind of implementing the procedures and the transparency rules that we know from democracies today into how we use AI both in governments and and in companies I Exactly. Yeah. Do do you worry here that when so the government is seeing is looking at these companies from the outside and they don't have full insight into what's going on. Do so so there are kind of protections for private companies that that mean that they can they can do things in secret without the government knowing at least as as things stand now. Is that something that would evade these mitigations you're you're thinking of? So, I mean, this for this first bucket, the singular loyalties bucket, it's it's mostly the the kind of heads of state that I would be worried about. So, it, you know, it actually is probably good for the government or at least for the, you know, the head of state themselves not to have full insight into literally everything the company is doing because that would give them too much power. But you know actually having different parts of the government having insight into what the lab's doing I think is very good. I'm a big big fan of transparency. Um and you know we do have a good set of you know government checks and balances from different government bodies that we can deploy to kind of keep the lab in check using these other bodies but also not allow like you know the executive branch and the and the president to get excessively powerful. So that that's the mitigations the kind of singular vert loyalties. In terms of secret loyalties, the key mitigation is is what I'm increasingly calling system integrity. That is, you know, using established cyber security practices and machine learning security practices I kind of preventing sleep agents and back doors in machine learning models. using all of that to ensure that your development process for AIS is secure and robust and that no malicious actor be they an employee in the post- trainining team at a lab or be they the CEO of the lab that is either malicious or is being threatened by the Chinese government to kind of tamper with model development that no person or no small group is able to significantly tamper with the behavior of AI models. and no group is able to get illegitimate access to AIS that would help them seize power. So that's that's this idea of system integrity which is you know essentially a technical project which does just draw on existing practices but is not yet implemented in any of the top labs. Um I will quickly shout out for for non non-lab for kind of people listening that are working at labs. I think there's a lot of really good technical research that could be done on kind of in investigating the conditions under which you can insert a steeper agent without a defense team knowing. And there's there's just loads of research that that that could be done in terms of the different settings there for attackers and defenders which could then inform what what parameters we need to be in place to achieve system integrity. you know, if it turns out that it's, you know, it's very hard to make a sleeper agent accept in the final stage of training, that's really useful to know because then we can focus our efforts within labs at, you know, that final stage. Ju just as a hypothetical example. So that that's the kind of key mitigation in my mind for the secret loies. And then I'll quickly do for exclusive access. That one seems more difficult. I don't know just just from from from me reading and preparing for this interview that that one seems like a difficult one to handle where this is this is this is in a some sense a deep trend uh in history and in in the kind of history of modern economics that you do see faster growth rates and you do see concentration into bigger and bigger economies both in in countries and in in companies. So are you are you in some sense pushing against underlying trends if you're trying to mitigate exclusive access to to advanced AI from from one actor? I think you can you can do this in other ways. So you can have the law require that AI share the AI labs share their powerful capabilities with kind of other organizations to act as a checking balance. So that you know lab should share their R&D capability AI R&D capabilities with eval organizations here you're thinking about giving insight into what they're capable of not not actually sharing those capabilities that would be too big of an ask I think I mean I do mean API access so you know if if a lot of the work in developing and evaluating systems is now done by AIS then we want an an evaluation organizer to also be uplifted. And so we want them to have access to really powerful AI that can similarly stress test how dangerous the frontier systems are. If they're only using human workers, then that's going to be a big disadvantage. So no, I I I do want API access to to powerful capabilities for for other actors. You know, for example, cyber security teams in the government and in the military should have access to the lab's best cyber capabilities. And again that that that should be a requirement by law. So you know generally like even if there's a natural tendency towards centralization of power in one one organization you can still require that that organization share its systems with with the checks and balances. That's one thing and the other thing is kind of preventing anyone at this organization from misusing the powerful AI systems. So the the the biggest thing on my mind here is that today we still have helpful only AI systems where you can kind of get access to the system and then it will just do whatever you want. No holds barred. I don't think there should be any AI systems like that. I think you should always have, you know, at least a classifier on top of the system which is, you know, looking for harmful activities and then kind of shutting down the interaction if something harmful is detected. And, you know, if you have a special reason to use cyber offense, you know, for your job or you have a special reason to do potentially dangerous biology research, you can have that classifier allow certain types of activity, but you should never have anyone accessing a system where, you know, anything is allowed. You know, no one no one has a legitimate reason to access an AI that will literally do anything. So, what I want to aim for is a world where yes, if there's a specific reason why you need to use a dangerous capability, absolutely, you can you can use that system, but that system will just do that one dangerous domain, it won't kind of do anything you wanted because that, you know, that that's a that's a very scary situation where there's, you know, there's a hundred reasons why the CEO could ask for access to a helpful only system. you know, maybe the guardrails are annoying, maybe maybe wants to kind of, you know, do do something which which the model is reluctant to do. But today, when you ask to remove some guardrails, you're removing all of the guardrails and now there's no holds barred. So, you know, instead, you know, we should we should be flexibly adjusting what guardrails are there, you know, by the use case and just, you know, never have a have a situation where where where there's no guardrails. I think that I think that could go a long way towards helping if if that was that was robustly implemented. With all of these mitigations for both secret loyalties and exclusive access and singular loyalties, you would worry that they would be disabled by the group planning a coup, right? Say, say that for example, you are the CEO of an AI company and you're giving API access to to evaluations organizations testing your model trying to see what they're capable of. Maybe you just cut off access before you get to the really powerful model that could actually be the model that that helps you conduct a coup. Do we have ways of making sure these mitigations are entrenched before in in such a way that they can't be removed by the group planning a coup? This is a great question. It is it is pretty tricky. CEOs by default have a lot of control over their organizations and similarly heads of state, you know, including US president has a lot of control over the military and over the government. So yes, there's a risk that one of these powerful individuals realizes that maybe they want, you know, more influence by gaining control over AI and notices that there's these kind of pesky little, you know, processes that prevent that and it's like, okay, well, let's remove them. I can give you know easy say you know productivity reasons to prevent them red tape reasons and you know if they can make a plausible argument then it could be hard to oppose them so I do think it's a big issue but but I'd say a few things firstly something I mentioned earlier I don't think that anyone is today planning to do an AI enabled coup the The way I think this works is that people are faced with their kind of immediate local situation, something they want to do over the next month and the blockers that they're facing to doing that specific thing. And you know what tends to happen is people, you know, tend to want more influence that that helps them get get stuff done. And so people will kind of bit by bit kind of move in the direction of getting more control over AI, but they won't be kind of thinking, yes, I I need to make sure that I remove this whole process because that will allow me to do an A enable C. that's kind of unrealistically galaxy brain. And so what we could do is we just set up a very efficiently implemented and very reasonable set of mitigations that doesn't really prevent CEOs from doing what they're they're trying to do. Um and so the CEO doesn't find in their day-to-day they're wanting to kind of like remove these things that are holding them back. But because these mitigations are here, the CEO never gets to a place where they're anywhere close to being able to do a coup or or where there's any kind of pathway in in their mind to be able to doing a coup because they're they're constantly prevented from getting access to kind of really powerful AI advice that might point out ways in which they could do this because they're like surrounded by colleagues that like you know strongly believe that these mitigations are sensible and reasonable and in fact they are well implemented and you know there aren't many downsides. maybe an environment where they kind of get kudos for the fact that they've said, "Yep, I obviously I'm not going to get access to helpful only systems. That's crazy." And then that's kind of like something something that that that makes them seem good. So that that that's one thing to say. Um, another thing is again going back to this point that there are currently checks and balances and there is not currently a situation where one person has power. You know if the if if the entire board of a company and other senior engineers recognize the importance of the mitigations know about this threat model then they will notice if the CEO is is moving that direction. Um and you know similar similarly within the government there are checks and balances and and they they could be activated if people people are looking out for it. Do do you think these traditional kind of oversight mechanisms like a board being in in control of of the CEO being able to fire the CEO or the possibility of Congress or the Supreme Court kind of overruling or constraining the US president? Do you think those will persist in environments where AI is moving very fast and it is is the AI capabilities are growing at a rapid pace? It's a great question here. Here's one story for optimism. Today things are moving fairly fast, but those checks and balances are somewhat adequate at least preventing really egregious situations. By the time the AI is moving really quickly, we'll have handed off a lot of, you know, the implementation of government, the implementation of things in the AI companies, the research process, we'll have handed it off to AI systems. And when we do that handoff, we could program those AIs to maintain a balance of power. So rather than handing off to AIS that just follow the CEO's commands or AIs that follow president's commands, we can hand off to AIS that follow the law, follow the company rules, report any suspicious activity to, you know, various powerful human stakeholders. And then by the time things are going really fast, we've already kind of got this whole layer of AI that is is maintaining balance of power. like the whole AI gum bureaucracy, the whole AI kind of company workforce, they are like better than humans today at standing up to misuse potentially. They are like less easily cowed and and and intimidated and they they could actually make it harder for someone in a in a position of formal power to to kind of get excessive influence. So this is like the flip side of you know this the singular loyalties where you you potentially deploy these AIs that are like explicitly loyal. You can actually kind of instead get kind of singular law following and balance of power maintaining AIs that that you deploy. And so the hope is that by the time we really things are beginning to go kind of crazy and we're really seeing speed ups from AI. we've already kind of set set ourselves up in in an amazing way to maintain balance of power and there's this this critical juncture where we are handing off to AIS and it's just you know what what are those AIs you know what are their loyalties what are their what are their goals and you know I think we can gain a lot by making sure that those AI systems are maintaining balance of power reporting you know illegitimate suspicious activities um and are not kind of overly loed to to any one How do you think the risk of AI enabled coups interface with kind of more traditional notions of of AI takeover? So just a a misaligned highly capable or advanced AI system taking over in in kind of contrary to the wishes of the developers or or the governments. Yeah, I mean there's there's there's some close analogies. The you know perhaps the most analogous case is the case of secret loyalties where you know you've got you got these AIs that have been told by the CEO to have the secret goal of seizing control and then handing control to the CEO. That's just very similar to AIS that wanted to seize power for themselves secretly. And you know all the same stories could apply where the eyes kind of make military systems and then they control the military systems and the robot army and then they seize power. The only difference is were they seeking power because it just kind of accidentally emerged from the training process which is the misalignment war or were they seeking power because the CEO programmed them in that way you know but that's the kind of seed of the power seeking. But then with the secret loyalty split model the the rest of the story is you know pretty similar. Um, you know, there's still differences. You know, in the secret loyalties case, the CEO might be doing more to help the AIS along with their plan. You know, maybe even in the misalignment case, the AIS have managed to kind of manipulate the CEO into into doing similar things. So, that that's the case where it's most most analogous. I think the, you know, another difference that that's salient to me is that if there are lots of different AI projects, then a coup seems an A&E coup seems a lot harder because you need like lots of different humans to kind of coordinate to kind of seize power together, which seems, you know, while while I can totally believe that um one one one person might might try and seize power, does seem less likely to me that there'll be loads and loads of humans that would want to to do that from lots of different labs. Um whereas from the misalignment story, it it is, you know, more likely the case that if one of these labs has misaligned AI, then maybe, you know, lots of them have misaligned AI. And so then it's it's more likely that you'd have, you know, maybe 10 different AIs colluding and then seizing power and taking over. And so that kind of collusion between multiple different AIs is is more likely in the case of misalignment than in the case of an A&E coup. Just because if there's one misaligned AI, then there's something about the training process for AI systems that that are causing a misalignment and then it it would be a common feature among among many companies. Exactly. Whereas just the fact that one CEO instructed a secret loyalty would not to the same extent make you expect that other CEOs have done the same. So you mentioned this possibility, but do you think it's Yeah. What do you think of the prospect of a president or a CEO of a company being duped by a misaligned AI into conducting a coup on its behalf? So you can imagine a president or or a CEO kind of thinking that that he's conducting a coup to remain in control but he's actually acting on on behalf of a misaligned AI. Yeah, I think it's an interesting threat model and you know some people who think about AI takeover threat models take it pretty seriously and it's just you know it's just a case where we're just completely mixing these two threat models together. You know, people who are worried about AI takeover for this reason should be very supportive of the kind of anti- mitigations I'm suggesting because if we implement checks and balances that prevent any one person from getting loads of power, then that AI will not be able to convince them to try because they just won't be able to succeed. So, you know, I see I see this as like, you know, an an additional reason to worry about air enabled human coups and to try and prevent them is that yes, even if no human wants to do this normally, you know, mislined AI might might might make them try. In terms of how plausible I find the threat model, you know, honestly, I think that if a human tries to seize power, the main reason is that that human wanted power. Like this is just something we know about people. We know it about, you know, heads of state today. You know, it's very clear that that that, you know, many heads of state in the most powerful countries in the world are very power seeking. We know it about CEOs of, you know, big tech companies. you know, we know about, you know, about about some of those leading leading AI companies that we we do know that they're very power seeeking, those CEOs. And so I don't think we need to to theorize that like they were massively manipulated by the AI and convinced to become power seeeking. Like I think I think it's more likely that if they seek power, they just they just did it for the normal human reason. I I do think AI will get ultimately get good at persuasion. Um, I don't particularly expect it to be hypnotic level persuasion though, though, you know, obviously there's massive uncertainty here. And yeah, I do I do think that like a very smart AI where there's a human that's already kind of kind of interested in seizing power and it already kind of makes sense for them to maybe do it, the missile AI could totally nudge them in in that direction and then could implement that in a in a way that actually allows the AI to seize power later. I think that is that is very plausible. When we're thinking about distributing power and and kind of having this balance of power, we can imagine the models being set up via post training, via the model spec, via various mechanisms to kind of obey the user unless unless uh the what the user instructed to do is is in conflict with what the company is interested in. and perhaps obey the company unless what the company is using the model for is contrary to what the government kind of permits. But when we set it up in those levels, you ultimately end up with the government in in control in some sense. And I guess that exposes you to to risk of of a government coup. Then if you have at the ultimate top layer of the stack, here's what the go here's what the models can and cannot do according to to the government. Well, I'd say a couple of things. First is that the government isn't a monolithic entity. And so that government decision of of what the bounds should be could be informed by multiple different stakeholder groups and then ideally, you know, it's ultimately democratically accountable. I do think that democratic accountability becomes more complicated in a world with where there's massive change in a 4-year period just for for the simple reason that there there's no election during a period where where massive change is happening. So so the feedback loop is too slow. Exactly. You know, I think the risks of AI enabled coups will probably emerge and then, you know, be decided within a four-year period as in like it will be resolved whether or not it happens or doesn't all without any like intermediate election feedback. You that doesn't mean that democracy can't have an effect because politicians anticipate, you know, what what future elections will will find and want to maintain favor throughout their terms. But it does it does pose a challenge. But but sorry I was I was kind of saying even absent that there's there's many different stakeholders in the government and so you know they would it would have to be a large group of government employees that were kind of trying to do a coup and then they would have kind of the companies would know that they were setting these these odd restrictions on on on the on on the kind of behavior and so the companies would know and they have leverage and power and then you know it could go public. So I I do I don't I don't think it would be that easy for the government to do a coup. Perhaps there's a difference also between allowing the government to set restrictions on what the models can do and then allowing the government some kind of access to to commanding the future AI systems in certain directions. So it's kind of setting setting limits versus versus steering the systems. Yeah, exactly. I mean, the distinction I was going to highlight was between specifically making AI systems loyal to, for example, the head of state and they're setting very broad limits where there's just like you can pretty much do whatever you want except for these obviously bad things where that second option doesn't really enable anyone to do a coup. it just enables everyone to do whatever they want and then you've kind of blocked out all of the kind of coup enabling possibilities through through those limits, you know, as as long as you haven't made those systems loyal to a small group. So given that there's this obvious option to just put in these limits that block coups but don't enable coups and given that there's you know a wide range of stakeholders that could potentially feed in to what what what the AI's kind of limitations and instructions are I think it's very very feasible to get to a world that where there's robustly not centralization of power um there's obviously big uncertainty over whether we will actually get our act together and and get those limits put in place in the right way. Yeah. When do you think these threat the threat of a AI enabled coups materialize? Is it at some specific point in AI capabilities or is it simply scale with the systems getting more advanced? When do you think the thread is at is at its peak? It's a good question. For the threat models that I've primarily focused on, they require pretty intense capabilities. So that for example, the secret loyalty threat model more or less requires AI to do the majority of AI research. So we're talking about, you know, fully replacing the world's smartest people in, you know, a very wide range of research tasks and coding. That's that's pretty intense. And then a lot of the threat models that I focus on root through military automation. That is AI and robots that that can kind of match, you know, human boots on the ground. And that's, you know, that's that's pretty advanced. Again, that said, I I I think, you know, you can probably do it with with with less advanced capabilities than that. So, like we, you know, drones today are already pretty good already providing, you know, making a big difference in in some military situations. So it it's you know not out of the question that you know more limited forms of AI and robot military technology could could be enough just to facil facilitate a coup. It's a bit harder because if if they're limited then there's a question of why the existing military doesn't just kind of seize back control after a bit of time. And so probably that scenario also has to involve things like maybe the current president, you know, supporting the coup and therefore like pressuring the military not to intervene or some other source of legitimacy for for the coup beyond the kind of the the AI controlled drones. And then there's also kind of more kind of typical types of backsliding um you know like like has already been happening in the US that I think you know could be exacerbated through AI enabled surveillance and you know AI kind of increasing state capacity in other ways and again you know that that that backsliding doesn't require you know super powerful AI you know you could probably do a lot of monitoring a lot of kind of content moderation on the internet, a lot of surveillance with today's systems. Um, it doesn't get you all the way to one person having complete control where they can just quash any resistance with a robot army and replace everyone in their job with a with an AI and so no one has any leverage. So I think you know to get to that real intense this is like the most intense form of concentration of power via AI that requires really powerful AI. but to just kind of significantly exacerbate existing trends in in political backsliding and you know to make it easier to do a military coupe I think you know more limited systems um would suffice. Yeah, we discussed earlier the the possibility of of one country or one company outgrowing the rest of the world and kind of concentrating power into into those entities. Now you mentioned one person. Do do you think that's actually a plausible scenario in which you have say one CEO of one company being being the person in in control of the world by a concentration of power and then a coup 100%. Yeah. I mean you the story I told earlier about secret loyalties you know meaning that now we backdoor wide range of military systems meaning that you can seize power. That's that's one route. And then you know again there's this other with the company masses you know masses amounts of economic power by kind of having a monopoly on AI cognitive labor and then you know leveraging that leveraging that to get more economic power more political influence. Yeah I mean I do I I do think it's possible you know again there's this big shift once AI can fully replace humans where today no one person can never have absolute power. They have to rely on others to implement their will. And this is what makes kind of current exist currently existing dictatorships unstable where there's always a threat of kind of internal revolt or outside factors threatening the dictatorship. But this this could potentially change. Yeah. Yeah. There's always a threat a revolt and then to to guard against that threat, the dictator needs to share their power to some extent. Has to compromise. But yeah, you could get it all concentrated in one person with sufficiently powerful AI. Do you think we we move through a period of increased threat of AI enabled coups and then reach some kind of stable state or do you imagine that there is a a a constant kind of risk of AI enabled coups in the future? I think we move through it. Yeah, it's it's it's this point about once we have deployed AI across the whole economy, the government, the military, if those AIs are maintaining balance of power, then we could fully eliminate the risk of a enabled coup. You know, it would just be as if, you know, our whole population was just, you know, so committed to democracy, would never seek power, you know, never help anyone else who who who wanted to undermine any democratic institution. Um you you know you could we already have strong norms you know favoring democracy but you know they're far from perfect and they have been eroded over recent decades but you could you could just get rocksolid norms that they're programmed in you know they cannot be removed except by the will of the people. I mean there's a there is a bit of a question because you still want to give the human population the ability to change the AI's behavior and its rules. So the human population could always choose to move to an autocracy. Um so I suppose I shouldn't say that we could fully eliminate the risk because you know we we we could we could we will always have that you know democracy there's always the this point that democracy could you know vote to stop being a democracy but I do think you know we could get to a point where it absolutely cannot happen without most people wanting it to happen. And so so you would get to a point in which kind of future AI enhanced societies, you could say, are more stable than than current democracies. And they're more they're less at risk of of coups than or democratic backsliding than than current democracies. Much much more. Yeah, you could get much more robustness there. I mean, just there there's this constant dynamic in today's societies where people care about democracy, but they also care about a host of other things. you know, their own achievements, various other ideological commitments. And so, you know, depending on how dynamics play out, depending on how technology evolves and what people's incentives are, sometimes people push against democracy. You know, that's what the Republican party's been doing in some ways. That's what Democratic Party has done as it's kind of increasingly put, you know, pretty ideological people in in powerful institutions. So with AIS you can you get much more control over those dynamics because you can just you know you can just make it much more the case that that that democracy is not being compromised. Are there any ways for us to are there any kind of risk factors we can look at if we're interested in in in predicting coups? Do you think there's something we can we can measure or something we can track to see whether we are at risk of of of an AI enabled tool? It's a great question. I don't think I have an amazing answer, but some things that are coming to mind the the capabilities gap between top AI labs and then the gap again with open source. the degree to which AI companies are sharing their capabilities with the public and if not with the public then with you know multiple other trusted institutions you know like sharing their strategy capabilities with kind of US political parties and parts of government um the the extent of economic concentration you How much how much what are the revenues and net worth of particular companies you know particularly AI companies. It's another one. What is the extent of government automation and military automation by AI systems? And when it when that automation is happening, how robust are the guard rails against breaking the law and guard rails against other forms of illegitimate power seeking? um how much transparency does the public or the judiciary or the Congress have into how dangerous AI capabilities are being used by by AI companies and by the executive branch. So you know take the example of military R&D capabilities that is you know really smart AIs that can design super powerful weapons. It's scary if companies can just use those military R&D capabilities without anyone knowing. It's also scary if a small group of people from the executive branch can use those capabilities without anyone else knowing how they're using them because they could be designing powerful weapons and making them loyal to a small group, you know, so transparency in into these like high stakes capabilities and how they're being used by a by a broad group. Doesn't have to be public. um probably shouldn't be public, but you know, we have checks and balances already. So you know another another kind of question is as as as these high stakes use cases start occurring or they become possible do do we know that there's transparency requirements in place you know as we as we increasingly see AI companies contracting with Palunteer and other military contractors we can kind of begin to see they're making increasingly powerful um weapons. Is there is there a process of oversight? Do we know that if someone was trying to, you know, make AI military systems loyal to them that they would be spotted? That that that's another indicator we we can look at. You can look at, you know, all the kind of standard democratic resilience indicators that kind of the social scientists have come up with. There's various things about kind of free and fair elections, about civil society, about freedom of press that, you know, have been getting worse recently in the US. Um, but there there's various indicators here. You can look at the degree of government censorship of over freedom of speech or what's on the internet and the degree of surveillance that the government's doing. If you if you take all of these things into account, yeah, how do you think about the the risk of an of an AI enabled coup in the next, you know, 30 years, say? Next 30 years, I I think it's high. I think the risk is high. I would guess it's 10% or something. And that, you know, to be clear, you know, if if it was just existing political trends ignoring AI, I'd be, you know, maybe a few percentage, maybe on like 2% or something. You know, there's definitely a risk of that. And I'm thinking about the US here. A big a big part of my my current worries are not about the indicators but it's about my expectation that AI capabilities will keep will keep increasing quickly and even more quickly and then the kind of absolute lack of interest in regulating AI companies right now in the US and the the difficulty that we will have of constraining the executive under the current situation where you know president is using you know sophisticated legal strategies to increase their own power and is is you know succeeding on many fronts. Um you know the US is is not doing a great job at constraining the executive. So you know companies are unconstrained executive is poorly constrained you know those are the key threat actors here. So, you know, with fast air capabilities progress plus that lack of constraint, lack of transparency, you know, the default is that a lot of those indicators I said get worse and you know, none of the indicators get better like transparency and so that that that makes me think this is this is very plausible. Yeah, I mentioned 30 years, but what about five years? Five years. That's tough, isn't it? It's really tough. I mean, yeah, I I think there's a risk. I I wouldn't think there is a risk if it wasn't for the AI research causing an intelligence explosion angle, but AIS are a lot better at coding and cognitive kind of research related tasks than they are at, you know, for example, you know, controlling robots and stuff. And so even if the F ultimately comes through robots or comes through, you know, crazy levels of persuasion, it's just you you really can't rule out a scenario where yeah, AI research is is is automated in three years time. Then in four years time, we've got super intelligent AI controlled by a few few people. Maybe it's got secret loyalties. Maybe it's being deployed in the government and being made overtly loyal to the president. And then, you know, a year later, you know, it's it's backsliding or it's political capture or it's, you know, robot soldiers. Yeah. How how do you think about the badness of the outcomes here? How how much does the badness depend on the ideologies of the people who are who are conducting the coup or Yeah. What should we look out for if because I mean I guess we can rank cues by badness which is not an exercise I think we should actually attempt but we can we can kind of talk about the factors involved about what would be the worst kind of coup and what would be a slightly better kind of coup slightly less bad kind of coup. Yeah. So we could you know we let's imagine it's one person that sees power actually know that's the first distinction to draw. If there's a group then even 10 people is better than one person and and why and why is that? Yeah. So so 10 people you get a diversity of perspectives. So more kind of moral views represented and there's more kind of room for compromise between those perspectives. there's more room for kind of reasonable positions to win out as there's kind of some some deliberation as as actions are decided upon. There's slightly less intense selection for psychopaths than if it was just one person. So yeah, if it's just one person that's bad, that's particularly bad. 10 people still very bad. You know, 100 people still pretty bad, but you know, it's that that there's big differences there. big differences. If if if we're now just thinking about, you know, one person or or the average person in a group, then we could think about how competent they are and then we could say something about kind of how how kind of virtuous their motivations are. Um, well, I do think competency is important. Like I think it's probably underrated in most political discussions how important it is to just be really, really, really competent. You know, thinking about something like responding to co or thinking about something like, you know, trying to deescalate a conflict, Russia, Ukraine, or trying to deescalate Israel conflict, like actually just being very competent and very good at getting things done is important. And as we as we mentioned, if you're just willing to rely on AIs and you, you know, align those AIs in the right way, anyone could be really competent, but you know, that's not guaranteed. You know, people, you know, may may really want to to cling to their current views without without changing their mind. You know, let's let's take the example of Donald Trump. If a really smart AI system told him, look, tariffs are definitely bad for the US economy. they're definitely bad and won't give you what you want. Would would he change his mind? I would I would guess no. So, you know, lots of lots of smart people have already been saying that and you know, him and his supporters. I I don't actually know like the economic details here, but like my understanding is that most people think that they're pretty bad. And, you know, it'll still be the case that, you know, Trump will be able to find people telling him that what he thinks is is good and he'll be able to program his AIS to keep telling him that if he wants to. So there's no guarantee that that that he will become super competent or that whoever sees power become super competent. So there's there's this kind of like there's a there's a form of loyalty that actually undermines competence just because you're you're not you're loyal to such an extent that you're not providing feedback that's that's useful because you know negative feedback feels bad to receive and so there's there's there's that kind of loyalty. I mean, maybe this is a bit contrived, but do you think there's a sense in which in the singular loyalty scenarios, the AIS could be so loyal that they are they're kind of undermining the competence of of the person that is that that they're singularly loyal to? Yeah, it's a really great question. I haven't thought about this but yeah in a way the most extreme version of singular loies will just agree with whatever the most recent thing that you know the the the dictator has said it's like you know it's a version of sick fancy which we already see without questioning and and we'll do that even when it's not in that person's interests um because that's the kind of type of loyalty that's demanded where there's a more kind of sophisticated type of loyalty where you're still completely loyal but you're also willing to challenge them when when you think it's in their in their best interests. So that that that's a really nice distinction. And yeah, I suppose one way of you know thinking about competence is thinking about what kinds of loyalties the dictator would demand from their AI systems. Um another another way of thinking about is how much they would listen to the AI advisor. like even if the AI has the kind of sophisticated type of loyalty and is trying to tell the dictator what to do the dictator could just ignore them and you know you see that again you know AI is a fairly sickantic they will also challenge you sometimes and then you know it's up to you whether you listen so that's that's all the confidence bucket which I think is really important and I do think there are differences between potential coup instigators on that on that front which could be significant where yeah I guess My expectation would be that that kind of lab CEO crews would be would be more competent than than heads of state. But you know even even within lab CEOs there are some that are more dogmatic than others. And I think that dogma would get in the way of competence. That's competence. The other thing I mentioned was kind of kind of broadly what are your goals? What are your values or or kind of more character? And here, you know, what I one thing I think is really important is being open-minded, being willing to bring in lots of different, you know, diverse perspectives into the discussion and empower them to, you know, really represent themselves and grow and flourish. Um so I think a very bad thing would be you know a particular person becomes a dictator they implement their vision for society end of much better would be you know empower all the different kind of ideologies and ideas to kind of you know become the best versions of themselves and then you know we can kind of collectively grow and improve our understanding of how to how to run society. Um, so you know, sometimes people focus when they're thinking about values on like, okay, are you are you a this type of utilitarian or oh no, I hope you're not, you know, deontologist or, you know, it can get very kind of specific and fingerpointing. You know, my view is more that, you know, we don't really know what the right answer is. And the most important thing is is is being pluralistic and, you know, letting a thousand flowers bloom. M H so we we discussed the possibility of getting to stable state in which we've avoided an AI enabled coup and now we have say we have aligned super intelligence kind of that where the risk of coup is is very low do you think this is something that happens for one country and then that one country is is is in control of the world to such an extent that it's that it's this is not a process that other countries are undergoing to be more concrete here. For example, if the US goes through a risk of of cues of AI enabled coups but manage to to kind of stay to to remain a stable democracy. Is it the case that Russia or China will go through a similar period of risk of of cues? It's a great question and it will depend on the US's kind of posture towards the rest of the world geopolitically and it will also depend on you know whether the US has gained a huge military and economic advantage you know like outgrowing the world or just developing powerful military technology as we were discussing previously but you know you can imagine one scenario where you know the US isn't that that much more powerful than the rest of the world yet and isn't that like kind of inclined to to intervene which has been kind of the recent trend and then you know China developed some really powerful AI a few years later and using Ping uses it to cement his control over China. So then you now have one kind of AI enabled dictatorship that is extremely robust and then you have the kind of US which you know has has avoided that risk and now they're kind of you know maybe they're competing against each other and kind of you know the cold war I'm trying to kind of cold war sorry trying to outgrow the world or may maybe they're striking deals because you know they they recognize this it's it's not good to compete and you know China just kind of indefinitely remains a dictatorship and you know that that's just a permanent loss for for the world but but you could also imagine a different scenario where the US is very far ahead and maybe you know it just wants to really secure its position geopolitically and so it you know it in instigates air enable coups in other nations where it's really putting kind of US representatives up on top of those those nations. That could be through secret loyalties. It could sell systems to, you know, sell sell AI systems, let's say, to to India that are secretly loyal to to US interests or it could give give some particular politicians in India accessive access to to super intelligent AI to help help them gain power. So, you could, you know, you could apply those same threat models we've discussed, but with the kind of US pulling the strings. Or or you could have the US just kind of taking control of other nations in more traditional ways. Um you know just you know military conquest and kind of really leaning heavily on kind of extracting economic value out of other countries as they go the world. So you know yeah kind of wide range of options here really. Yeah. Yeah. As a as a final topic here, perhaps we can talk about what listeners can do if they want to help try to prevent AI enabled coups and specifically where to position themselves. Should should they be in AI companies? Should they be in governments? Should they be in perhaps email organizations? Where's where where's the position of most leverage? Great question. I think being at a lab is a great place to be. I talked about system integrity kind of robustly ensuring that AI don't have secret loyalties and behaviors intended. That's something that companies need to implement. So if you have interest or expertise in sleeper agents or back doors to AI models or cyber security then I think being part of a lab and helping them achieve system integrity is an amazing way to reduce this risk. Um, another thing you can do at labs if if you you know if you're interested in if if you're worried about kind of the the risk of you know government you know heads of state deploying law AIS and seizing power is you can help labs develop terms of service where when they sell their AI systems to governments they have certain mitigations against misuse um that maybe you know one way to frame this is look we're we're you're using really powerful AIs and we can't guarantee the safety of those AI systems unless we have some degree of monitoring to ensure that the AI systems aren't doing anything. that monitoring could then be sufficient to allow for the prevention of of of coups because you been monitoring not only for kind of accidental misaligned AI behavior but you know that will also thereby mean you're monitoring for you know a bad human actor giving them illegal instructions so that you know labs will be drawing up contracts with governments terms of service they will be thinking about you know the the guardrails any that go on are placed on the systems that they that they sell to governments. But I think you know there's very careful work to be done thinking through okay how can we structure those guard rails how can we explain them in a way which is you know very unarguable and it doesn't seem like we're kind of trying to like you know constrain the government you know private companies don't you know it's not really legitimate for them to kind constrain the government but you know I do think there's an important an important thing to be done here in preventing air enabled coups so kind of threading that needle there's another thing you could do in a government but you could in a lab but you also do that kind of work for a think tank or for a research organization that's kind of interlin with government like like Rand I think could potentially do some of this kind of work thinking about you know what should you know be being be in the terms of services between labs and governments let me think I I think another big thing is that for for for this system integrity yes we want labs to implement it but we also want there to be some external organization that can certify that currently no external or organization is working on this meter is not working on it Apollo's not working on it I don't think any evaluations organization is but there will be at one point in time a very strong demand for for for guarantees that the systems that these companies have developed don't don't have seer agents because if you're deploying a system in you critical infrastructure in in you know important government roles in the military, you want to know that it does what it says on the tin. Um and so I think that you know a company can can make a wise investment by developing that capability, the ability to understand how to insert secret loies, how to detect them, being able to assess a company's process to see whether it's sufficient. How how close would that work be to kind of interpretability work? Isn't isn't there overlap there kind of like looking into what is the system actually doing you know find kind of reverse engineering some some process to see how things are how things work is detecting sleeper agent like a specific constraint problem yeah it's a great question it's a specific type of problem so you could use internals work like interpretability to detect steep agents. That's not the only method you could use. You could use behavioral tests. You could look at the training data. I think a mixture of those is useful. It's also it's a specific adversarial setup. So in interpretability, we're just often interested in all kinds of kind of facts about the the model's behavior and its cognition. With with secret loyalties, we're we're specifically concerned with one red team trying to produce a secret loyalty. and then a blue team trying to detect it. And so it's it's adversarial in that you you know the red the blue team has a tough task. The red team will be purposely trying to hide the sleeper agent. And so that that that you know that will raise a whole host of new considerations and important things to dig into. So you can imagine being like allowing the red team you know full control over all aspects of the development process. Can they create a secret loyalty? Maybe that maybe they can. Maybe it's pretty hard to test. Okay. But now you only allow them to intervene at one part of the development process. Or you allow the the blue team to look at all the data that they use for training. Or you allow the blue team to look at all of the code that they wrote and you could get potentially quite a sophisticated understanding of the conditions under which the blue team win, conditions under which the red team win. This research doesn't need to be done in a lab. It it could be done by an external organization. I think it's, you know, big big missing missing focus of of of today's technical work. Um, and ultimately that that would inform the assessments of of the lab's attempts to do system integrity. So for technical researchers out there, I' I'd really highlight that possibility. Another another another kind of piece of work for for the right person would be beginning to understand the kind of existing military thinking around autonomous autonomous systems that this is already obviously a live issue for militaries. They are increasingly deploying AI. It would be nice to marry up that existing expertise with these kind of risks about more powerful systems enabling coups and kind of get to a consensus within that military community of basic principles like law following like distributed control over military systems and you know figure out a kind of a military procurement process which is both practical but also robustly prevents this kind of stuff. Um, so you know, if there's anyone listening that that hasn't has a way in, I think I think that's that's potentially pretty pretty valuable. Although there's also risk of poisoning the well if it's done badly. So, you know, see with some with some care. Yeah. Yeah. Perfect. Thanks for chatting with me, Tom. It's it's been great. Yeah. Real pleasure. Thanks so much, Gus.

Related conversations

AXRP

3 Jan 2026

David Rein on METR Time Horizons

This conversation examines core safety through David Rein on METR Time Horizons, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med 0 · avg -0 · 108 segs

AXRP

7 Aug 2025

Tom Davidson on AI-enabled Coups

This conversation examines core safety through Tom Davidson on AI-enabled Coups, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med 0 · avg -5 · 133 segs

AXRP

6 Jul 2025

Samuel Albanie on DeepMind's AGI Safety Approach

This conversation examines core safety through Samuel Albanie on DeepMind's AGI Safety Approach, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med 0 · avg -4 · 72 segs

AXRP

1 Dec 2024

Evan Hubinger on Model Organisms of Misalignment

This conversation examines technical alignment through Evan Hubinger on Model Organisms of Misalignment, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med -6 · avg -7 · 120 segs

Counterbalance on this topic

Ranked with the mirror rule in the methodology: picks sit closer to the opposite side of your score on the same axis (lens alignment preferred). Each card plots you and the pick together.

Mirror pick 1

AXRP

3 Jan 2026

David Rein on METR Time Horizons

This conversation examines core safety through David Rein on METR Time Horizons, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Spectrum vs this page

This page -10.64This pick -10.64Δ 0

This pageThis pick

Near you on the spectrum — often same shelf or editorial thread, different conversation. Mixed · Technical lens.

Spectrum trail (transcript)

Med 0 · avg -0 · 108 segs

Mirror pick 2

AXRP

7 Aug 2025

Tom Davidson on AI-enabled Coups

This conversation examines core safety through Tom Davidson on AI-enabled Coups, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Spectrum vs this page

This page -10.64This pick -10.64Δ 0

This pageThis pick

Near you on the spectrum — often same shelf or editorial thread, different conversation. Mixed · Technical lens.

Spectrum trail (transcript)

Med 0 · avg -5 · 133 segs

Mirror pick 3

AXRP

6 Jul 2025

Samuel Albanie on DeepMind's AGI Safety Approach

Spectrum vs this page

This page -10.64This pick -10.64Δ 0

This pageThis pick

Near you on the spectrum — often same shelf or editorial thread, different conversation. Mixed · Technical lens.

Spectrum trail (transcript)

Med 0 · avg -4 · 72 segs