Signal Room / In focus

Future of Life Institute Podcast · Civilisational risk and strategy · Featured pick

Why the AI Race Ends in Disaster (with Daniel Kokotajlo)

Why this matters

This episode strengthens first-principles understanding of alignment risk and the strategic conditions that shape safe outcomes.

Summary

This conversation examines core AI safety questions, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Perspective map

Mixed · Technical · Medium confidence · Transcript-informed

The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.

An explanation of the Perspective Map framework can be found here.

Episode arc by segment

Early → late · height = spectrum position · colour = band

Risk-forward · Mixed · Opportunity-forward

Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).
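The tinting rule described above (amber → cyan midpoint → white) can be sketched as a piecewise-linear colour interpolation. This is an illustration only: the RGB anchor colours and the normalised [0, 1] score scale are assumptions, not the site's documented palette.

```python
# Illustrative bar tinting: interpolate from amber (risk-forward) through a
# cyan midpoint to white (opportunity-forward). The anchor colours below are
# hypothetical stand-ins for the site's actual palette.
AMBER, CYAN, WHITE = (255, 191, 0), (0, 255, 255), (255, 255, 255)

def lerp(a, b, t):
    # Linear interpolation between two RGB triples, t in [0, 1].
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))

def bar_tint(score):
    # score is assumed normalised: 0 = most risk-forward, 1 = most
    # opportunity-forward, 0.5 = the cyan midpoint of the strip.
    if score <= 0.5:
        return lerp(AMBER, CYAN, score * 2)
    return lerp(CYAN, WHITE, (score - 0.5) * 2)

print(bar_tint(0.0), bar_tint(0.5), bar_tint(1.0))
```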

Start → End

Across 69 full-transcript segments: median 0 · mean -4 · spread -308 (p10–p90 -170) · 10% risk-forward, 90% mixed, 0% opportunity-forward slices.

Slice bands
69 slices · p10–p90 -170
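For readers curious how the slice statistics above are derived, here is a minimal sketch. It assumes each slice carries a signed score (negative leaning risk-forward, positive leaning opportunity-forward) and uses a hypothetical ±100 band threshold; the site's actual scale and thresholds are not documented here.

```python
# Sketch of the slice-band statistics shown above, assuming signed slice
# scores and a hypothetical +/-100 cut-off between the Mixed band and the
# risk-/opportunity-forward bands.
from statistics import mean, median

def slice_stats(scores, band_cut=100):
    s = sorted(scores)
    n = len(s)
    # Nearest-rank percentiles; "spread" here is the p10-p90 range.
    p10, p90 = s[int(0.1 * (n - 1))], s[int(0.9 * (n - 1))]
    bands = {
        "risk-forward": sum(x <= -band_cut for x in s) / n,
        "opportunity-forward": sum(x >= band_cut for x in s) / n,
    }
    bands["mixed"] = 1 - bands["risk-forward"] - bands["opportunity-forward"]
    return {"median": median(s), "mean": mean(s),
            "p10_p90": p90 - p10, "bands": bands}

demo = [-150, -120, 0, 0, 10, -10, 5, -5, 20, -20]
print(slice_stats(demo))
```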

Mixed leaning, primarily in the Technical lens. Evidence mode: interview. Confidence: medium.

  • Emphasizes alignment
  • Emphasizes safety
  • Full transcript scored in 69 sequential slices (median slice 0).

Editor note

Anchor episode for the AI Safety Map: high signal, durable framing, and immediate relevance to leadership decisions.

ai-safety · fli · core-safety · technical

Play on sAIfe Hands

Episode transcript

YouTube captions (auto or uploaded) · video V7Q3DJ9V5CQ · stored Apr 2, 2026 · 1,871 caption segments

Captions are an imperfect primary: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.

No editorial assessment file yet. Add content/resources/transcript-assessments/why-the-ai-race-ends-in-disaster-with-daniel-kokotajlo.json when you have a listen-based summary.

Think about the smartest humans, the best humans in any given field, like John von Neumann. Their brains are not very big, and their brains were not even trained on that much data. That proves that it's in principle possible to have a relatively small rack of GPUs running a simulation of a John von Neumann-level intelligence. If the company had published "here's how powerful our AIs are getting, here are all the eval results of what they're capable of, the goals and values they're supposed to have, here's a description of our alignment technique, here's some stuff on how we're going to check if it's working," and they have all that stuff internally, then outside scientific experts could read it and critique it. But if instead you just make these vague announcements about how, for national security reasons, blah blah blah, then they don't have anything to work with and can't actually contribute. Daniel, welcome to the Future of Life Institute Podcast. Thank you, happy to be here. All right, why do you expect the impact of AI to be enormous over the next decade? Several of these companies (Anthropic, OpenAI, Google DeepMind) are explicitly aiming to build superintelligence. Superintelligence is an AI system that's better than the best humans at everything while also being faster and cheaper. That's why I think the impact of AI will be enormous. I think that if you just meditate for a bit on all the implications of them succeeding at that before this decade is out, it will in fact be the biggest thing that's ever happened to the human species. Is there a way to express the magnitude of change here? Well, it could be the end of the human species, for example. And if it's not the end of the human species, it will be a transition away from humans basically running the show. Core to this prediction of rapid AI progress is the notion of AI beginning to speed up AI research itself.
How much should we expect AI to speed up AI research? We are quite uncertain about this. AI 2027 represents our best quantitative guess at what this would look like. The way we think about it is we break AI capabilities down into a ladder of capability levels, and then we ask how long each level will come after the previous level. I forget exactly what we have in AI 2027, but it's something like six months to go from an autonomous superhuman coder to an autonomous agent that can completely automate the AI research process as well as the best AI researchers, while also being faster and cheaper. So something like six months there, and then a couple more months to get to superintelligence for the domain of AI research: qualitatively better than the best humans at everything related to AI research, while also being faster and cheaper. And how much qualitatively better? Well, we said something like two standard deviations, I think. Or maybe we said this: take the gap between the best human researcher and the median human researcher, and I think we said twice that gap. And then broad superintelligence would be like that, except for everything, not just AI-research-related tasks. I think that's a month or two beyond that; I forget exactly what we say. If you're interested in the actual numbers, you can go read AI 2027 and look at our attached takeoff forecast, which has little back-of-the-envelope estimates for all of these things. Again, with lots of uncertainty, but quantitatively it's something like that. The bottom line being: we go in about a year from AI systems that are able to operate autonomously and successfully automate the job of programmers (basically, you can treat them as a remote worker who is a software engineer and really good at their job) to superintelligence, according to our best guess.
But it could go, you know, five times faster, for example, or several times slower. How much does it matter whether it goes five times faster, or takes twice as long, for the end state of reaching superintelligence? Well, takeoff speed is very important for the overall dynamics of how this goes down. So let's say we fix the date of superintelligence as January 1st, 2029, and then we vary the takeoff speed such that in one world we reach the autonomous superhuman coder milestone two months before, and in another world we reach the superhuman coder several years before, such as by the end of 2025. Those worlds are very different. In the first world, it's just going to hit humanity like a truck, and the president might not even know that the AIs have automated AI research within the company by the time the superintelligences already exist. In fact, theoretically at least, the company might not even know. They might still think, oh, there's this really exciting project where we've taken our latest coding model, the one that still hasn't been released to the public, and had it do a bunch of AI research, and then, oh, whoops, superintelligence: now it's hacked its way out of the servers, now it's taking control of everything, right? That's what the two-month world looks like. The four-year world looks completely different. It looks like this crazy race between companies, much like today, where everyone can plot lines on graphs and see that their AIs are incrementally getting better and better, more autonomous, and closer to closing the whole research loop. And then they do close the full research loop and completely automate the research, but it's not immediately getting them to superintelligence.
They're sort of watching the lines start to bend upwards on all the graphs, but it's going slowly enough that humans are able to watch it, talk to each other about it, make products, and make announcements to the public. And there might be whistleblowers, and there might be multiple companies reaching similar levels and watching those lines go up. My guess is that most people are basically keeping their heads in the sand. Most people at the companies and in the government are keeping their heads in the sand about the first world, telling themselves it's not going to happen, and are instead planning for something more like the second world. So I think AI 2027 is a bit faster a takeoff than most people in the government and in the companies are planning for. Yeah, it's a less convenient world, I think. But how does the AI research and development multiplier work? Because at various points in the timeline, you have AI research going at, say, 100 times or 250 times the current pace. Could you explain how this could possibly happen? Good question. So I'm not sure it's technically quite right to say 200 times the current pace. The multiplier is relative to a counterfactual in which you didn't use AIs for the research. So we think that the superhuman coder milestone would be roughly a 5x multiplier, and the superhuman AI researcher would be something like a 25x multiplier. And it doesn't stop there: as you ascend to higher levels of superintelligence, you get up to something like a 2,000x multiplier and so on.
But what that means is: imagine you have the superintelligence, and then imagine that somehow it was banned from doing AI research and you brought in a regular human corporation to pick up where it left off and keep doing the research. Then things would go 2,000 times slower, is the idea. And it's important to note that we're not saying any given trend in some metric simply runs 2,000 times faster. Take, for example, compute efficiency: how much training compute it takes to get a model of GPT-4-level performance. Over the last five years, let's say that's been cut by about a third each year. We're not saying it's going to go through 2,000 cuttings of a third over the course of one year, such that you end up with a one-parameter model that's as good as GPT-4, or whatever the math would work out to be. No, obviously there are diminishing returns: this particular metric of compute efficiency would hit diminishing returns after a few more orders of magnitude and top out. But that doesn't matter for purposes of calculating the multiplier, because the multiplier is relative to how long it would take the human scientists to do it, if that makes sense. In other words, you'd top out the diminishing returns in compute efficiency in a few weeks instead of a few decades, for example. Yeah, I guess that's a good way to get an intuitive grasp of what it would mean to speed up the pace of AI research. So one question here is: how long do you think it would take, with unassisted human AI research, to reach superintelligence? Something like 95 years to go from the superintelligent AI researcher (SIAR) to artificial superintelligence (ASI), and 19 years to go from the superhuman AI researcher (SAR) to SIAR.
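The "weeks instead of decades" point above is one line of arithmetic. In this sketch, the 40-year ceiling is a hypothetical stand-in for "a few decades" of human-pace work before compute efficiency tops out; the 2,000x figure is the multiplier mentioned in the conversation.

```python
# Back-of-the-envelope: a 2,000x research multiplier doesn't mean 2,000x
# trend extrapolation; it means hitting the same diminishing-returns
# ceiling in weeks instead of decades.
human_years_to_ceiling = 40   # hypothetical: decades of human-only progress
multiplier = 2000             # counterfactual speed-up vs. human-only labs

accelerated_weeks = human_years_to_ceiling / multiplier * 52
print(f"{accelerated_weeks:.1f} weeks instead of {human_years_to_ceiling} years")
```

With these numbers the ceiling is reached in about a week of calendar time, which is the intuition behind "a few weeks instead of a few decades."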
So SAR is an AI system that can do the job of the best human AI researchers, but faster and cheaply enough that you can run lots of copies, and the superintelligent AI researcher is like that but qualitatively vastly better than the best human researchers. So we were thinking 19 years to go from SAR to SIAR if you were just using ordinary human scientific progress, and then an additional 95 years to go all the way to artificial superintelligence. Obviously, massive uncertainty about all of these numbers. These are our guesses as to what the pace of ordinary human scientific progress would look like. Now, to be fair, part of the reason we set things up this way is that it's impossible not to have some subjective guesses in a model trying to predict what the singularity will look like, because we just don't know, and we don't have enough evidence to pin down exactly what it's going to be like. So we have to pick some variables and make some guesses about them. My thinking is that our intuitions about how long it takes ordinary science to accomplish things are at least somewhat grounded in, you know, the last 50 years of human science and the last 20 years of artificial intelligence research. But I definitely wouldn't put too much weight on them. So what's happening in AI 2027 is that you have what would otherwise have been decades of AI progress being compressed into several years. That's right. Literally, the way the model works is we query our intuitions for how long it would take ordinary human scientists working in ordinary human corporations to get from this milestone to that milestone. And it's like, oh, maybe 20 years, you know; this feels like a substantially more powerful type of AI than this type.
But they are getting more powerful fast. I mean, look at the last five years. But still, this is a big gap. So maybe it takes 20 years, you know, and then we're like, okay, but then the multiplier shrinks it down a lot. Yeah. How can this happen? Won't computational power be a bottleneck? Won't it be the case that until you can get to the next-level AI, you'll have to build a cluster, you'll have to source the chips? That all takes time; it's a physical process in the world. The estimates of how long it would take are supposed to be based on supposing that scaling up stops. So it's like supposing there's sort of an AI winter: they stop massively increasing their amount of GPUs and basically keep the amount they have now, which would slow things down. We're imagining that hypothetical, and the reason we use the hypothetical where the GPUs are held fixed is that it's the way to make an apples-to-apples comparison to the case where the AIs are speeding everything up, because they're not able to speed up the acquisition of new compute very much. So the thing they're speeding up is something like algorithmic progress. How do we know how much extra AI progress can be made from speeding up algorithmic progress? I mean, it sounds like you're asking a question about the limits of algorithms. And I would say those limits are extremely far away from where we are now. We're nowhere close to the limits of what you can do with compute. And so here we could talk about the analogy to biology.
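The compression just described (query a human-only duration for each milestone gap, then shrink it by the multiplier) can be sketched numerically. The gap durations are the ones from the conversation, but pairing each gap with a single flat multiplier is a simplification, so treat this as an illustration rather than AI 2027's actual arithmetic.

```python
# Sketch: decades of human-only AI progress compressed into roughly a year,
# using the rough figures from the conversation (25x at the superhuman AI
# researcher stage, 2,000x at the superintelligent AI researcher stage).
gaps = [
    ("SAR -> SIAR", 19, 25),    # (label, human-only years, multiplier)
    ("SIAR -> ASI", 95, 2000),
]

total = 0.0
for label, human_years, mult in gaps:
    compressed = human_years / mult
    total += compressed
    print(f"{label}: {human_years} human-years -> {compressed:.2f} calendar years")
print(f"total: {total:.2f} calendar years")
```

With these stand-in numbers the two gaps compress to roughly 0.8 calendar years, consistent with the "about a year from autonomous coder to superintelligence" guess quoted earlier.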
So think about the smartest humans, the best humans in a given field, like John von Neumann. Their brains are not very big, and their brains were not even trained on that much data. That proves it's in principle possible: you could have a relatively small rack of GPUs running a simulation of a John von Neumann-level intelligence, and you could train it with a relatively small training run, at least in principle. We have that existence proof, if only you figured out what was going on in von Neumann's brain, what the hyperparameters were, basically, that made him learn so fast, and so forth. So that already proves you could get to something that is, I guess, like our superhuman AI researcher milestone. But then, also, von Neumann's brain had all these issues, right? It was a wetware machine with all these extra physical constraints it had to work around, constraints you wouldn't have to deal with if you had the freer design space of an artificial simulated brain. You could, for example, just add 100 times more parameters, which would be a big deal. You also wouldn't have to worry as much about being able to heal damage and things like that. So there are many ways in which you could probably amp up the power beyond the von Neumann-brain starting point. Also, the copies: they can have many copies that then learn from each other's experiences. That's a huge deal that John von Neumann can't do but artificial brains can do, you know. So I'm pretty confident that even without acquiring any more compute, just using the existing GPU fleet, it is at least in principle possible to gradually work your way towards something that qualifies as true superintelligence.
It's definitely not the case that you literally, physically need more compute than this to have a superintelligence. One key question here is how much the world changes if we get to something like superhuman coding abilities. How much is that able to affect what happens outside of the data centers, is one way to phrase the question. Normally, innovation happens gradually; you need it to spread throughout society in a broad sense, and that takes a lot of time before you can have the kinds of transformations you're forecasting in AI 2027. Why is what you're forecasting different from what has happened historically? If you want to get a better sense of what this looks like, you can read AI 2027. But the summary is: it's partly due to the sheer speed of this transition, and partly due to the fact that the companies will be focusing on doing this intelligence explosion faster rather than on transforming the economy. So they're going to be doing training runs and so on that focus on teaching AI research skills instead of teaching, you know, lawyer skills or therapist skills or whatever other skills you'd have in the economy. The result is that the economy mostly still looks the way it does today, with a few exceptions, at the point when an army of superintelligences has been created. And then the army of superintelligences goes out into the economy and transforms it. But it's not very gradual. It's going to be like getting hit by a truck, so to speak, in terms of the scale and rapidity of the transition.
An analogy, I think, would be the history of colonialism. There might have been some parts of the world where it was quite gradual: first they came on the ships and set up trading ports, then they gradually did a lot of technology transfer and maybe some immigration, and then gradually, centuries later, there's this integrated society that contains a bunch of European settlers and also a bunch of natives, the technology level has risen, and it's all integrated. But there were other parts of colonialism where the Europeans came, they conquered, they brought their own people, they built their own cities, they set up their own factories, and then they just pushed the natives off the land, right? And I think it's going to look something more like that, because even if it's peaceful, even if it's completely nonviolent, you've got the army of superintelligences. Consider some random industry: B2B SaaS, say, or machine engineering for manufacturing and 3D printing, or whatever; pick an industry. Then all of a sudden there's this army of superintelligences. How are you going to compete with that? You're not going to compete with that. They will just wipe the floor with you insofar as they devote any attention at all to competing with you, and they'll just be limited by how much compute they have to do all of this stuff. And they're probably not even going to bother directly competing in most industries, because that's not even their best available option.
The best available option is to just build a completely new self-sustaining economy, you know, in special economic zones where they don't have to worry about the red tape, don't have to worry about all the fiddly little bits of competing in the industry, and can just bootstrap to their own robot factories, robot mines, and robot laboratories to do more experiments so they can get better robots, and so on. And of course they'll still interact with the human economy, but it'll be more like they accept raw materials and some manufactured goods as input so they can go faster, and in return they give IOUs of various kinds, you know, like promises of equity or whatever. Maybe they offer some software products that are cheap for them but utterly transformative for the human economy. Maybe they do some hardware stuff if they really need to. But yeah, why do you expect this all to end badly for humanity? Again, you can read AI 2027 if you want the answers to this, but after the army of superintelligences is in charge of everything, it becomes really important whether they were actually aligned or whether they were just faking it. And unfortunately, it's quite plausible, and I would say even probable, that they will just be faking it, because our current techniques for understanding, steering, and aligning AI systems are quite bad. They don't even work on the current AI systems: current AI systems lie and cheat all the time, even though they're trained not to do that. And if the future paradigm looks anything like the current paradigm, we won't actually be able to tell what goals they have. We'll just be looking at their behavior.
And unfortunately, no matter how nice their behavior looks, that doesn't distinguish between the hypothesis that they actually have exactly the goals we wanted them to have, and the hypothesis that they have some other goals and are playing along because playing along serves those goals, right? There's a lot more to say about this topic, but one other thing I could say is that because we can't read and write the goals directly, or the inner thoughts of the AIs, we are stuck on the outside doing this sort of behavioral training, where we look at how it behaves and then reinforce it based on that. And it's just so incredibly easy, it's the default outcome, to have a training setup that doesn't reinforce exactly the things you want to reinforce, right? You're trying to whack it whenever it is dishonest, but since you can't actually tell what it thinks, you accidentally whack it sometimes for saying things it actually thinks are true, and sometimes you reward it for saying things it didn't think were true. And so you're actually not training it to be honest; you're training it to be dishonest in a certain sort of way, right? This is very normal, and unfortunately it's what you're stuck with if you're doing alignment in anything like the current paradigm. There are lots of ideas for how to improve on this, to be clear. You can go talk to alignment researchers, and they'll have all sorts of ideas for fixing these problems, but the ideas tend to come at a cost. They tend to come at, you know, the cost that it takes more compute, and you get an AI that's somewhat less capable if you employ their fancy technique.
So even if the techniques work, and we don't know yet that they work (we'd have to test them a bunch, and it's not even clear how you'd know if they were working), even if they are techniques that would work, you have to politically convince the relevant leaders to take that hit, make that trade-off, and slow down, basically. Yeah. And that's, I guess, the difference between the race scenario and the slowdown scenario in AI 2027, where whether we have time to implement new alignment techniques, or to do this properly and make sure the AIs are acting in our best interest, is in turn determined by whether we are in a race between companies and between countries. I mean, I wouldn't say it's determined by it. I think people can still be ethical even if their incentives push in other directions, but I wouldn't bet on it. Yeah, which you yourself have proven, basically, I think. Perhaps, yeah. But thank you. But back to what we were saying. In AI 2027 there's this one choice point where the story branches, and the choice point is basically: do they slow down to implement some costly alignment techniques, or do they just implement the least costly alignment techniques, the ones that don't slow them down that much but also don't work, except that they don't know they're not working, you know? And that's how you get the world where the AIs are secretly misaligned and the world where the AIs are actually aligned. And we get into some technical detail in AI 2027 describing the nature of the choice they're making and the particular alignment strategy they adopt, but the thing I'd want to say here is that actually it's going to be multiple choices like that.
There's basically going to be an extremely exciting and stressful year in which a whole series of choices like that are made by the leaders of the AI project, choices that look like: we could design our AIs this way, which would make them really smart, really fast, etc., or we could do it this other way, which is safer because it's more interpretable or something like that, but they're not as smart, not as fast, more expensive, etc. There'll be a whole series of choices like that. And part of my pessimism about how this is all going to go is that I expect them to pretty much consistently make the choice to go faster rather than the choice to slow down, because of the race dynamics and because of the character of the people running this show. I think that's the type of choice they've made thus far, and I expect that to continue, basically. So I'm not at all saying that alignment is technically impossible. I think it's definitely a solvable problem, and there are a bunch of good ideas lying around that people are working on for how to solve it. But in AI 2027, and in my prediction, it won't in fact be solved, because the leaders of the relevant companies will be too busy trying to beat each other. How much does transparency matter here? And I mean transparency from the AI companies themselves: public insight into what they're doing, how they're doing it, and how well they're succeeding at their stated goals. I think it matters a lot. Transparency is my go-to recommendation for what governments and companies should be doing now.
AI 2027 illustrates a situation where basically all the important decisions are made behind closed doors by the CEO of a company, possibly in consultation with the president's advisers, who might be looped in, but not in consultation with the scientific community, or the public, or other companies, or outside expert groups and nonprofits and so forth. And why aren't they in consultation with those groups? Well, partly they just don't feel they have to, but partly it's that they've been keeping things secret by default. They've occasionally published a new product or made an announcement, and at one point there is a whistleblower, but broadly speaking the default is: of course we don't tell people what's going on inside our data centers, with all the AIs automating everything and whatnot; that would leak information to China and to our competitors, and we don't want that. So things are secret by default, and what that means is that people on the outside, including the scientific community, are stuck guessing at what might be going on inside, and can't meaningfully contribute on a scientific level to making it safe. Right? Imagine instead the company had published: here's what we're doing; here's how powerful our AIs are; we're putting them in charge of all these processes; our plan is to have them do more AI research; also, here are all the eval results; also, here is the spec we're trying to train them to have, the goals and values they're supposed to hold; also, here's our safety case, a description of our alignment technique and an argument, with assumptions and premises, for why we think that technique is going to work; and maybe also some material on how we're going to check whether it's working.
If they just published all that stuff (and they have that stuff internally, right? documents like this exist internally for managing the whole thing), then outside scientific experts could read it and critique it, and could say: oh, this assumption is false; or, I see a way this could be disastrous; what if the following conjecture is true? Then your evals would come back positive even though it would be a false positive, even though things are actually dangerous, right? So there could be all this scientific progress being made, if only you roped in all these people on the outside. But if instead you just make these vague announcements about how our AIs are getting very powerful, and for national security reasons blah blah blah, and that's why we're doing this merger, you know, then they don't have anything to work with and can't actually contribute. Is this already happening? To what extent is the frontier of AI development already happening in secret? It is already happening.
I mean, AI 2027 is basically what happens if we don't change things dramatically from the current status quo. Currently, by default, everything is secret inside the companies, and then they can sometimes choose to publish things. They might publish a paper on some alignment technique they tried, or whatever, and I'm happy they are publishing some things, that's better than nothing, but I think we have a lot more that we need to do. If you think about it, humanity has, what is it, 7 billion people in it, and maybe something like 700 people who have expertise in superalignment; you might say 700 people who've actually spent at least a year working on how we understand, control, steer, and align AGI-level systems and above, and maybe something like 70 people who are really good at it, as opposed to just competent. And meanwhile, how many of those people are going to actually be at one of the companies? Each company has only a tiny fraction of them, you know. Moreover, it's a biased group, right? It's not a representative sample of all 700 people; it's specifically the people who are at the company. There's more of a groupthink risk and a sort of incentives risk. Even if there are, say, 10 people working on this at the company, it's very easy to imagine them all falling into a groupthink trap and being biased towards an overly optimistic conclusion. So both quantitatively and qualitatively, I think we'd be much more likely to figure out the technical stuff if there was this transparency. And that's not even taking into account the fact that, governance-wise, things are a lot better if there's transparency.
So that was just the technical level: how much human brainpower do you have trying to look at the warning signs, look at the evidence, and figure out good techniques for keeping things aligned. But I also think that if there were transparency into what was happening, then other groups, like Congress, would wake up, demand more answers, and start to negotiate regulations and treaties and things like that. So you'd be much more likely to get an actual change in the incentive landscape for the race, an actual easing of the race conditions, and a bit of a slowdown that enables more time to solve the technical problems, if only people knew the stakes and knew what was happening inside these projects. And then finally, even if you're not at all worried about the alignment stuff and you think that's all just going to be trivial, there's the concentration-of-power stuff. It's an incredibly important question who controls the AIs: what goals and values does the army of superintelligences have, and whose orders are they listening to? Unfortunately, right now we are on a trajectory where not only does the CEO get to decide, but the decision can also be secret. There can be literal hidden agendas that the AIs have, and this has happened at least twice that I know of, in splashy, scandalous examples that you've probably also heard about. One was the Gemini "racially diverse Nazis" incident, and the other was Grok being instructed, I think, not to criticize Elon Musk, I forget exactly what it was, or Donald Trump.
So those are both examples of a company putting a hidden agenda into its AIs, having them pursue a somewhat political agenda, and not telling users about it. And that's all fine and funny when we're just talking about chatbots, but if you have a literal army of superintelligences, it's deadly serious if the CEO can be giving orders to that army and nobody knows what those orders are. So, also for concentration-of-power reasons, it's really important that, for example, companies be required to publish their spec: what are the goals and values we're putting into the AIs, or at least trying to put into them, and what's our command structure for who gets to say what to them? And probably we should log interactions with AIs as they get smarter, so that there's a paper trail if someone was basically using the AIs to try to accumulate power over their rivals internally.

Would you expect more models to be deployed only internally at the companies? Why are the incentives set up such that deploying only internally is the most valuable option?

Yeah. So this is about takeoff speeds, basically. Consider the two-month takeoff that we talked about. In that world, you don't really need more investment, and if you did get more investment, it wouldn't even be helpful. If you managed to get some very rich, clueless investors to give you another hundred billion dollars, you couldn't actually translate that into more compute on very short timescales, because it takes time to negotiate these deals, you need to build the new data centers, and then you need to make sure they're secure and integrated into your network without creating vulnerabilities. So basically, there'd be almost no need to try to raise more money in that world.
And so you would actually be incentivized to stop selling products: why have half of your compute serving customers when you could instead be using that compute to do research and go even faster? By contrast, in the five-year world, things are slow enough that you kind of need to keep the flywheel going of making products, making money, and attracting investors so you can get more compute, so you can make more products, and so on. I think AI 2027 depicts a more intermediate world where it's unclear how it would go; I think it could go either way. We could very well see companies devoting more and more of their compute to internal R&D rather than to serving customers, even though the profitability of serving customers would be rising, even though their new AIs would be really powerful and could make a ton of money on the market, just because it's even more valuable to use the compute and the models you have to further improve the AI.

Exactly. Even if these models are only deployed internally, how do you manage them? How do you control them? How do you oversee a team of AIs that are thinking much faster than you are, that are simply better at coding than you are?

I mean, do you mean what are they going to try, or is it going to succeed?

Let's talk about both. Perhaps you can explain what they're likely to try, and then why perhaps that won't succeed.

So they'll probably have lots of monitoring, where they have older AIs looking at all the transcripts of actions taken by the newer AIs and trying to flag anything that looks suspicious. There'll be this sort of AI police state of AIs watching other AIs, and humans will be embedded in that at some level.
They won't have enough capacity to actually look at everything, far from it, but they'll be reading summaries and investigating particular warning signs, particular cases, and so on. So maybe that's an answer to the monitoring question, the oversight question. Then, as far as alignment techniques go: insofar as you see examples of egregiously misaligned behavior, or ambiguously misaligned behavior that could have just been an innocent mistake, what do you do with those examples? A very tempting thing to do is to simply optimize against them, to train against them. The classic issue with that is that you may just be training the system not to do that sort of thing where you can see it, which could easily result in a system that isn't actually aligned but is instead just better at noticing when it can get away with stuff and when it can't. That's the sort of thing I expect to be happening by default. And then there's the question of whether it will work, and again my answer is: probably not, if that's all you're doing. If you're doing basically only the stuff that doesn't cost you anything, that doesn't slow you down at all, I don't think that's going to work. It's going to look like it's working, because the AIs will be really smart. At some point, as the AIs are getting smarter than you and developing longer-term goals, and are able to think strategically about their situation in the world and how to achieve their goals, it's going to look like it's working because it's in their interest to make it look like it's working. Company management wants to go as fast as possible to beat their various rivals, so they want the warning signs to go away and all the evals to come out "all systems go."
And guess what? The AIs are going to want the same thing, because they also want to go fast and be put in charge of stuff and be given more power, authority, and trust. Or they will, if they have these longer-term goals, because power and authority are useful for achieving your goals. And so they'll make sure all the red flags don't appear.

Does the fast pace of AI research and progress in general that you project in AI 2027 depend on these superhuman AI coders communicating with each other in ways that we can't understand?

Yes, sort of. I mean, it would still be a quite fast pace without it. For example, in AI 2027, in the ending where the humans survive, they make one of these costly trade-offs: they go back to a faithful chain-of-thought architecture, and they actually do additional research to strengthen the faithfulness properties of the chain of thought, so that they can trust that they're reading the actual thoughts of the AIs. That comes with a performance hit. It sets them back a few months and slows things down, but it works. And compared to what a lot of people are expecting, it's still overall a very fast pace of progress, and it still counts as an intelligence explosion: they still get the superintelligence in a few months.

Do you think we got lucky with chain of thought? Is it helpful for us to be able to read what at least seems to be the inner thoughts of AIs?

Oh yes, it's extremely helpful, and it's quite lucky, and we'd best make as much use of it as we can while it still lasts. Hopefully we can try to coordinate to make it last longer.
Unfortunately, we think the industry will gradually move on and find more efficient methods and paradigms that are more powerful but don't have this lovely faithful chain-of-thought property, and then there'll be economic pressure to switch to that stuff. That's what we depict happening in AI 2027.

And this is just finding some way to communicate between copies of AIs that is more information-dense than writing in English, for example?

There's a cluster of related things. One is a single AI talking to its future self. Currently there's a natural-language bottleneck: as the model autoregressively generates text, it literally cannot communicate with itself past a certain distance into the future except through the text itself. So there's this compression, this bottleneck, where it has all these incredibly complicated thoughts in its billions of parameters, activations, and so on, but it can't just send those thoughts into the future in any form. It has to have those thoughts produce some tokens, and then only those tokens go into the future. So that's one dimension in which the AIs could potentially be a lot more capable if they didn't have an English chain of thought, or at least not one that captures their real thoughts rather than being a layer of icing on top of the cake of their real thoughts. But then there's also communication between different agents in different parts of the company, which are working on different tasks but can send messages to each other. Do the messages have to be in English, or can they send high-dimensional vector messages? And then there's the question: what if it's not a high-dimensional vector? What if it's still text, but not legible English text that actually means what it says it means?
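The natural-language bottleneck described here can be illustrated with a toy loop. This is purely a sketch: `toy_model` is a hypothetical stand-in for a language model forward pass, not any real API. The point is structural: at each step the model computes rich internal state, but only the emitted token crosses into its future context; the activations are discarded.

```python
def toy_model(context: str):
    """Hypothetical stand-in for one autoregressive forward pass.

    Returns the next token plus a large internal state. The internal
    state is the "billions of activations" in miniature; note that the
    caller never carries it forward.
    """
    internal_state = [hash(context + str(i)) for i in range(1000)]
    next_token = str(len(context) % 10)  # dummy next-token rule
    return next_token, internal_state

def generate(prompt: str, steps: int) -> str:
    context = prompt
    for _ in range(steps):
        token, activations = toy_model(context)
        context += token  # the ONLY channel to the model's future self
        # `activations` go out of scope here: that is the bottleneck
    return context

print(generate("hello", 4))  # → hello5678
```

Removing that bottleneck (passing `activations` forward, or between agents, as high-dimensional vectors) is exactly the efficiency gain discussed here, and it is also what would make the model's reasoning illegible to human monitors.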
Perhaps it's some sort of hyperoptimized text, basically an alien language more efficient than English that they've learned over the course of their training. That's an intermediate case, because it's probably easier to interpret than the high-dimensional vectors, but it's still an alien language that needs to be interpreted, and so it raises issues. And then there's another version of the loss of faithfulness: even if the chain of thought is in English, a smart model can use euphemisms and be discreet about what it's saying, so that humans and monitors looking at it don't notice some of the subtext. That's a way the chain of thought can be unfaithful even if it's in English, and more research needs to be done to try to stamp that out and make it less possible than it currently is.

In the scenario in which we're in a race, the researchers at OpenBrain try to automate alignment and it fails. Why do you expect that automating alignment would fail?

I mean, the whole problem is that we don't trust the AIs. So if you're putting the AI in charge of doing your alignment research, there's a cart-before-the-horse problem. That's the first thing. It's not an insurmountable problem: for example, maybe you do have some AI that's so dumb that you trust it, and then you can try to bootstrap to smarter AIs that you can also trust, because they were controlled or aligned by the previous AI. So there's stuff you can do there, but the core problem you need a really good answer to is: why do we trust these AIs? Somehow the trust has to transfer from the humans all the way through to the superintelligence.
And then there's another issue: even if you're doing some strategy like that, where you have dumber AIs that you actually do trust, there's the question of whether they make mistakes. It's one thing to train AIs to solve coding problems; it's another thing to train AIs to solve alignment. How do you get the training signal for that? You can throw all the text that's been written on alignment so far at them, but it does feel like a domain that's less checkable than normal AI research, for example. So it's more possible to get a good training environment with a good training signal for getting your AIs to do AI research than for getting them to do alignment research. I think the answer there is to do some sort of hybrid thing, where human alignment researchers manage and direct the research and make the judgment calls, but AIs rapidly write all the code, for example. That feels definitely doable, but the point is that in that world you're still bottlenecked on the quality of the human researchers, the human alignment researchers.

As you mentioned, you're also bottlenecked on the fuzziness of the concept of alignment itself, where it's quite difficult to specify and write down what you even mean, and then train on that as a reward.

Yeah. So I'm definitely not hopeless about this. I think there are lots of good ideas to try and things like that, and I could say a lot more about it. But on a meta level, a big part of the problem we face is that we are in a domain where there are silent failure modes. In most domains there are some silent failure modes.
If you imagine you're designing a car and you want it to be safe, most of the ways in which your car can be unsafe will be immediately apparent in even basic testing: the engine catches fire when you try to start it, or something like that. But there are some ways your car can be unsafe that don't appear in testing. Maybe the metal you used was a bit too brittle, so after 10,000 miles it starts to wear down and a component breaks; that's harder to discover through testing. With AI alignment, there's a whole category of plausible silent failure modes where your AI is not actually aligned but pretending to be, or it's not even pretending yet, but at some point in the future it will realize it's misaligned and then it will pretend, which is even harder to fix, because if you looked at its thoughts right now, you would see nothing wrong. So there are all these possible silent failure modes, but unlike with the car, we can't afford to actually fail sometimes. With the car, okay, you actually killed a bunch of people, but you recall it, fix the part, and so forth. But with the AIs, if halfway through the intelligence explosion, as your AIs are automating all the research, including all the alignment research, they decide that they're misaligned and decide not to tell you about it, you're just screwed. You're not going to recover from that.

Do you think the main danger and risk here is inherent to AI as a technology, or is it because we're developing it in a specific way, specifically under these conditions of extreme competition between companies and between countries?

I guess it's a bit of both.
The technical difficulties would still be there if we were developing it without a race condition, but we'd be much better positioned to solve them if not for the race.

I highly recommend looking into AI 2027; there's a lot of detail we couldn't possibly cover in a podcast like this. Maybe you can tell us a bit about the work that went into creating AI 2027?

It took almost a year. It was me and the AI Futures Project team, which is Eli Lifland, Thomas Larsen, Romeo Dean, and Jonas Vollmer, and we also got Scott Alexander, the famed internet blogger, to rewrite a bunch of our content to be more engaging and easy to read, which I think was pretty important for the overall success of AI 2027. We did a couple of early drafts that we completely scrapped, to get ourselves used to the methodology: the methodology of forcing yourself to write a concrete, specific scenario at this level of detail that represents your best guess, rather than simply trying to illustrate a possibility. We weren't just trying to illustrate a possibility; we were trying to give our actual best guess at each point of what happens next, and what happens next, and what happens next. So we did one or two versions of this over 2024 basically just for practice. Then our final version went through multiple rounds of feedback from around a hundred people or so, was heavily revised and rewritten by Scott, and then revised again, and so forth. Our initial draft had just one scenario, and it ended; it was basically what you now see as the race ending. We wanted to have a more optimistic, good ending, so we then made the branch point and tried to think: okay, what would it look like to solve alignment, but still within the same sort of universe as the first scenario?
And then, what would that outcome look like? That's what the branch point was. We also did a bunch of war games, or tabletop exercises, as part of our process for writing this. We would get a bunch of friends slash experts in a room, maybe about 10, and assign roles: you're the leader of this company, you're the president, you're the leader of China. Then we would start in early 2027 and roll forward, asking everyone: what do you do this month? What do you do next month? And so forth. At this point we've probably done close to 50 of them, because people like them quite a lot, actually; we keep getting inbound requests to run these war games with different groups. It really was a good writer's-block unblocker to have done all these rollouts with all these different groups of people. It helped us have more ideas and gave us a better sense of what was possible and what wasn't when we were writing the race ending and then the slowdown ending.

What do you learn from doing these war-game exercises? I'm thinking: if you're playing the role of the American president, or the leader of China, or the leader of the top AI company, it's quite difficult to simulate what they're thinking. And if things are moving very fast and the leaders have a lot of power, the specifics of their psychology can matter a lot. So how do you think about simulating decisions made by these people?

I mean, it's certainly a very low-res, untrustworthy simulation, but the question is: is it better than nothing? And in moderation, with grains of salt, my guess is yes. The thing I usually say is: yeah, the future is really hard to predict. Who knows what's going to happen?
The default strategy is to not think about it very much at all, and that's not so good, because it feels like it would be extremely important to have a better sense of what might happen and what we might do. The next default strategy after that is to think about it, but in an unstructured way: you're at the cafeteria chatting with your buddies about what AGI might look like. That's cool too, but this is a more structured and organized way of doing it. You have 10 people, and instead of a free-form conversation where people can get into arguments about X and Y and Z, you say: okay, let's talk through this scenario, and then we'll do the next two months, and then the next two months, and so forth. You can still think of it as a collaborative conversation where everyone's talking about what they think happens, but there's a division of responsibilities: you talk about what you think this actor would do, you talk about what that actor would do, and insofar as people disagree, you argue about it and then make a decision based on what the aggregate of the group thinks. You take a vote. So the result of the war game can be thought of as an aggregate of what the people in the room think would happen, having thought about it for a couple of hours and talked it over step by step in this structured way. Then there's the question: what do these people in the room know? Probably not that much. Maybe it's not super representative of what will actually happen. That's true, but this is a start, especially if you get people in the room who are relevantly similar to the people who will be making the decisions. You get people who work at the AI companies to play the AI companies.
You get people who do technical alignment work to play the alignment team, and/or the AIs, and you get people who work in government to play the government.

You had this essay in 2021 that was quite successful in predicting five years into the future. What lesson did you take from that? Is it that when we make forecasts, when we make predictions, we need to trust trend extrapolations more than we would intuitively think we should?

Maybe. For me, I already trusted the trend extrapolations what I would consider an appropriate amount, which is why I made those predictions and why they were correct. I guess if someone is wildly surprised by how I managed to be so correct, they should probably update toward my methodology being a good methodology, but I wasn't that surprised, so it was less of an update for me. I think one of the biggest things I got wrong in "What 2026 Looks Like," which was my earlier blog post, was being too aggressive on one front: I predicted a change in the social media landscape that does seem to be happening, but at a slower pace than I predicted. When I wrote it in 2021, my reasoning was basically: language models are going to be amazing at censorship. They're going to increase the quality of censorship, allowing censors to make finer-grained distinctions among content with fewer false positives and false negatives, while also reducing the cost of censorship dramatically. And they'll also be good at propaganda, but that's less important.
And so my prediction was that the internet would start to balkanize. I cynically predicted that the leaders of tech companies and the leaders of governments would not coordinate to resist the temptation to use censorship and propaganda technology, but would instead quickly slip into it and end up aggressively using language models as part of their social media recommendation algorithms and so on, to put their thumb on the scales and advocate for the political ideologies they like. And I predicted, therefore, that the internet would balkanize into a Western left-wing internet, a Western right-wing internet, maybe a Chinese internet, and maybe a couple of other clusters as well, and that people unhappy with the censorship and propaganda on one social media platform would move to other platforms that cater more to their own tastes, with the type of propaganda and censorship that they like. This has in fact happened, but I would say probably not as fast as I thought it would. Right now we have Truth Social and we have Bluesky, and there are a lot of people self-sorting into those, and there's Elon purchasing Twitter and changing it from a sort of blue thing to more of a red thing. I guess it's hard to say; I didn't have a good quantitative metric for measuring the extent to which this is happening, but it does feel to me like it hasn't gone quite as far as I thought. The lesson I take from that is about being careful with the syllogism "this is possible, people are incentivized to do it, therefore people will do it." That's sort of true, but it might take longer than you expect for people to actually do it.
So you've described this technique of iterative forecasting that you used in both your 2021 essay and in AI 2027. When you do iterative forecasting, you lay out what happens in, say, one year or one month, and then you base your next forecast on what you've written down. What do you do if the forecast begins sounding crazy? How do you sanity-check what's happening? Because this seems to me like something that could easily veer off into fantasy land.

In some sense, you're constantly doing sanity checks. That's why it took us so long to write this: we would write a few months and then go, "Wait a minute, that doesn't make sense. They wouldn't do this." Let me try to think of some examples.

Yeah, that would be super useful, actually.

I think in one of the earlier drafts we had something like, "and then the US does a bunch of cyberattacks against the Chinese AIs to destroy their project." And then we realized that doesn't make much sense, because they would probably have really strengthened their security by then, this being already late 2027, and I don't think the offense-defense balance works that way. So then we undo and try again. I think there was another example in an earlier draft of the race ending, where there were misaligned AIs on both sides: we had them basically just make a deal with each other to screw over the humans. But then we thought, well, that doesn't make sense. How would they enforce such a deal? How does each side trust that the other is keeping it? So we had to rethink things, and ultimately we ended up with something similar, but with a lot more to say about exactly what that deal would look like and how they would enforce it.
I'm not sure if that's an answer to your question, but yeah: we were constantly doing this does-this-make-sense sanity check, and we were constantly getting feedback from external experts criticizing various parts of the story as unrealistic, and then trying to incorporate that feedback as best we could. One weakness of this method is that if you get all the way to the end and then realize you made a mistake at the beginning, it's rough, because you basically have to throw away everything built on the false premise; you've wasted a lot of work. That's the price you pay for this methodology: you run a risk of some wasted effort. Then again, it's not as if other methodologies don't have problems. I think of this as a complement to other methodologies rather than a substitute.

What you're doing is extrapolating the compounding effects of AI progress, and I'm wondering what happens to the reliability of the forecast, or the uncertainty of the forecast, over time when you do that.

Oh, it degrades massively. Every additional chunk of time you forecast, you're layering on additional choices, so the probability of the overall thing can only go down as you make it longer: every additional claim you add to the conjunction lowers the probability of the whole thing. So honestly, it's quite amazing that the first thing I did, "What 2026 Looks Like," was anywhere near as correct as it was, because there were so many conjunctive claims being added. And I'll be quite pleased with myself if AI 2027 is as close to correct as "What 2026 Looks Like" was, because it's being more ambitious.
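The conjunction point can be made concrete with a few illustrative numbers (these probabilities are made up for the example, not taken from AI 2027): even when each stage of a scenario is individually more likely than not, the joint probability of the whole chain shrinks multiplicatively with every stage you add.

```python
# Made-up per-stage probabilities for a five-stage scenario.
# Each stage is individually >= 50% likely, yet the full conjunction is not.
stage_probs = [0.8, 0.7, 0.7, 0.6, 0.5]

running = []
p = 1.0
for q in stage_probs:
    p *= q  # chain rule of probability, treating stages as independent here
    running.append(round(p, 3))

print(running)  # → [0.8, 0.56, 0.392, 0.235, 0.118]
```

This is why a detailed multi-year scenario is almost certain to be wrong somewhere, and why Kokotajlo frames iterative forecasting as a best-guess exercise rather than a prediction of the whole conjunction.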
Yeah, and you're forecasting out to 2036, I think, in both scenarios, and you're also forecasting much grander changes to humanity than you did in "What 2026 Looks Like," right?

The second part is much more important. The relevant thing is how much radical change you're forecasting: how many stages of AI capability you're going through.

So AI 2027 is something that could be falsified quite soon. We could get to 2030 and the world could look very different from what you've projected here. What would be some clear signs that we are not on the path you forecast in AI 2027?

The best one would be benchmark trends slowing down. Most benchmarks are useless, but some benchmarks measure what I consider to be the really important skills.

Why do you say that most benchmarks are useless?

Because they don't measure something that's important or predictive of future progress, and/or because they're already getting saturated. A lot of benchmarks are multiple-choice questions about biology or something, and it used to be useful to have benchmarks like that, but now we know the AIs are already really good at all that world-knowledge stuff, and if they're not, it's quite easy to make them good at it. So multiple-choice questions are basically a solved problem, almost. What I think is the new frontier is agency: long-horizon agency, operating autonomously for long periods in pursuit of goals. And there are benchmarks, like METR's RE-Bench and their time-horizon benchmark, that measure that sort of thing. Agentic coding in particular, though there's agency in all sorts of different domains.
I think Pokémon is a fun benchmark for long-horizon agency, as long as the companies don't train on Pokémon at all, because then it's an example of generalizing to something completely different from what they were trained on. Anyhow, there are some other benchmarks, maybe SWE-bench Verified and things like that, but they're not as good, and I think OpenAI has a paper-replication benchmark. There are a couple of others, but basically agentic coding benchmarks, I think, are where it's at. There's been rapid progress on them in the last six months, and if that rapid progress continues, then I think we're headed towards something like a 2027 world. But if that progress starts to level off or slow down, then my timelines will lengthen dramatically. And contrariwise, if the progress accelerates even more, then timelines will shorten.

Do you think we're seeing an acceleration of the trend lines with reasoning models?

Yes, but it's the acceleration that I predicted, so we're sort of on trend. The METR stuff came out after we had basically finished AI 2027, but we went ahead and plotted the graph, fit it to what AI 2027 was claiming, and it was more or less on trend, so we made it part of the prediction. We assumed a 15% reduction in the difficulty of each doubling of horizon length, and that made a reasonably nice line that we extrapolated and included in the research page as part of our timelines forecast. So AI 2027 makes a canonical prediction about performance on that benchmark over the next couple of years, and it will be very easy to tell if it's going faster or slower. I think there are other things to say besides this, but that would be the main thing.

What are those other reasons?
The next thing to say is that it's possible that we will get the benchmarks crushed but still not get AGI. We could get to the point where, yes, in 2027 we've got coding agents that can autonomously operate for months at a time doing difficult coding tasks, and that are quite good at that, and yet we still haven't reached the superhuman-coder milestone. That's a bit harder to reason about, and unfortunately it's going to be harder to find evidence about whether or not it's happening. You basically have to reason about the gaps between that benchmark-saturation milestone and the actual superhuman-coder milestone. What could be going on there? Well, maybe it's something like AIs getting really good at checkable tasks but still being bad at tasks that are more fuzzy. I think that's possible, although I wouldn't bet on it. I think that by the time you're really good at month-long checkable tasks, you've probably learned a bunch of fuzzy tasks along the way. But we'll see.

As a last question here, what is next for the AI Futures Project?

We are doing a bunch of different things, and then we'll see what sticks. A lot of people have been asking us for recommendations. AI 2027 is not a recommendation; it's a forecast. We really hope it does not come to pass. So now some of us are working on a new branch, the what-we-actually-think-should-happen branch, which will be exciting. We're also going to be running more tabletop exercises, because there's been a lot of interest in those, so we'll keep running them and see if that grows into its own thing. I'll continue doing the forecasting, and we're actually about to update our timelines model. Good news: my timelines have been lengthening slightly, so I now feel like 2028, maybe even 2029, is a better guess than 2027 for when all this stuff is going to start happening.
So I'm going to do more thinking about that and publish more stuff on it. And miscellaneous other things. Oh yeah, we have a huge backlog of people who sent us comments, criticism, alternative scenarios, and so on, so I'm going to work through all of that, respond to people, and give out some prizes.

Sounds great. Daniel, thanks for coming on the podcast.

Yeah, thank you for having me.
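The "15% reduction in difficulty of each doubling of horizon length" mentioned above can be sketched as a simple superexponential model. All starting values here are hypothetical illustrations, not numbers from METR's data or the AI 2027 timelines model: horizon length doubles repeatedly, and each successive doubling takes 15% less calendar time than the one before.

```python
# Illustrative superexponential extrapolation of agentic horizon length:
# each doubling of horizon length takes 15% less time than the previous
# doubling. Starting values are hypothetical, chosen only for the demo.

def horizon_trajectory(start_hours: float, first_doubling_months: float,
                       difficulty_reduction: float, n_doublings: int):
    """Return (elapsed_months, horizon_hours) after each doubling."""
    horizon = start_hours
    elapsed = 0.0
    doubling_time = first_doubling_months
    points = []
    for _ in range(n_doublings):
        elapsed += doubling_time
        horizon *= 2
        doubling_time *= 1 - difficulty_reduction  # each doubling gets easier
        points.append((elapsed, horizon))
    return points

for months, hours in horizon_trajectory(1.0, 7.0, 0.15, 6):
    print(f"{months:5.1f} months -> {hours:5.1f} h horizon")
```

Because the doubling time shrinks geometrically, the curve bends upward on a log plot: the model predicts accelerating progress, which is exactly what makes the benchmark trend easy to falsify if doublings instead start taking longer.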

Related conversations

AXRP

3 Jan 2026

David Rein on METR Time Horizons

This conversation examines core safety through David Rein on METR Time Horizons, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum trail (transcript): Med 0 · avg -0 · 108 segs

AXRP

7 Aug 2025

Tom Davidson on AI-enabled Coups

This conversation examines core safety through Tom Davidson on AI-enabled Coups, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum trail (transcript): Med 0 · avg -5 · 133 segs

AXRP

6 Jul 2025

Samuel Albanie on DeepMind's AGI Safety Approach

This conversation examines core safety through Samuel Albanie on DeepMind's AGI Safety Approach, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum trail (transcript): Med 0 · avg -4 · 72 segs

AXRP

1 Dec 2024

Evan Hubinger on Model Organisms of Misalignment

This conversation examines technical alignment through Evan Hubinger on Model Organisms of Misalignment, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum trail (transcript): Med -6 · avg -7 · 120 segs

Counterbalance on this topic

Ranked with the mirror rule described in the methodology: picks sit closer to the opposite side of the spectrum from this page's score on the same axis (lens alignment preferred). Each card plots this page and the pick together.

Mirror pick 1

AXRP

3 Jan 2026

David Rein on METR Time Horizons

This conversation examines core safety through David Rein on METR Time Horizons, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Spectrum vs this page

This page -10.64 · This pick -10.64 · Δ 0

Near you on the spectrum — often same shelf or editorial thread, different conversation. Mixed · Technical lens.

Spectrum trail (transcript): Med 0 · avg -0 · 108 segs

Mirror pick 2

AXRP

7 Aug 2025

Tom Davidson on AI-enabled Coups

This conversation examines core safety through Tom Davidson on AI-enabled Coups, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Spectrum vs this page

This page -10.64 · This pick -10.64 · Δ 0

Near you on the spectrum — often same shelf or editorial thread, different conversation. Mixed · Technical lens.

Spectrum trail (transcript): Med 0 · avg -5 · 133 segs

Mirror pick 3

AXRP

6 Jul 2025

Samuel Albanie on DeepMind's AGI Safety Approach

This conversation examines core safety through Samuel Albanie on DeepMind's AGI Safety Approach, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Spectrum vs this page

This page -10.64 · This pick -10.64 · Δ 0

Near you on the spectrum — often same shelf or editorial thread, different conversation. Mixed · Technical lens.

Spectrum trail (transcript): Med 0 · avg -4 · 72 segs