How Humans Could Lose Power Without an AI Takeover (with David Duvenaud)
Why this matters
This episode strengthens first-principles understanding of alignment risk and the strategic conditions that shape safe outcomes.
Summary
This conversation with David Duvenaud examines how humans could lose power without an AI takeover, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.
Perspective map
The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.
An explanation of the Perspective Map framework can be found here.
Episode arc by segment
Early → late · height = spectrum position · colour = band
Risk-forward · Mixed · Opportunity-forward
Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).
Across 89 full-transcript segments: median 0 · mean 0 · spread -28 to 9 (p10–p90: -10 to 9) · 1% risk-forward, 99% mixed, 0% opportunity-forward slices.
Mixed leaning, primarily in the Society lens. Evidence mode: interview. Confidence: high.
- Emphasizes alignment
- Emphasizes safety
- Full transcript scored in 89 sequential slices (median slice score 0).
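A minimal sketch of how per-slice spectrum scores like the ones summarized above could be aggregated into the headline statistics (median, mean, spread, p10–p90, band percentages). This is not the site's actual scoring pipeline; the band cutoffs `RISK_MAX` and `OPPORTUNITY_MIN` are illustrative assumptions, not published values.

```python
# Hypothetical aggregation sketch, not the site's actual pipeline.
# Band cutoffs below are assumed for illustration only.
from statistics import median, mean

RISK_MAX = -20          # assumed: scores at or below this count as risk-forward
OPPORTUNITY_MIN = 20    # assumed: scores at or above this count as opportunity-forward

def percentile(sorted_scores, p):
    """Nearest-rank percentile over an already-sorted list of scores."""
    idx = max(0, min(len(sorted_scores) - 1, round(p / 100 * (len(sorted_scores) - 1))))
    return sorted_scores[idx]

def summarize(scores):
    """Collapse per-slice scores into the headline statistics shown on the page."""
    s = sorted(scores)
    bands = {
        "risk-forward": sum(x <= RISK_MAX for x in s),
        "opportunity-forward": sum(x >= OPPORTUNITY_MIN for x in s),
    }
    bands["mixed"] = len(s) - bands["risk-forward"] - bands["opportunity-forward"]
    return {
        "slices": len(s),
        "median": median(s),
        "mean": round(mean(s), 1),
        "spread": (s[0], s[-1]),
        "p10_p90": (percentile(s, 10), percentile(s, 90)),
        "band_pct": {k: round(100 * v / len(s)) for k, v in bands.items()},
    }

# Example: feeding 89 slice scores would yield a dict like
# {"slices": 89, "median": 0, "mean": 0.0, "spread": (-28, 9), ...}
```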
Editor note
A high-leverage addition to the AI Safety Map that clarifies one important safety bottleneck.
Play on sAIfe Hands
Episode transcript
YouTube captions (auto or uploaded) · video j0D5X9dk5K0 · stored Apr 2, 2026 · 2,295 caption segments
Captions are an imperfect primary source: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.
No editorial assessment file yet. Add content/resources/transcript-assessments/how-humans-could-lose-power-without-an-ai-takeover-with-david-duvenaud.json when you have a listen-based summary.
Full transcript
You could sort of have a situation where the virtual beings almost dominate the humans in every single axis of moral value, and now suddenly it starts to look criminally decadent to be spending this square kilometer of land on the legacy humans. It's hard to draw a hard line between having a permanent income and being like some sort of parasite, and that's going to be the cultural battle, and it's going to be really easy to say that humans are parasites once we're not providing value to the larger growth engines. Yes, there will be cool, interesting stuff happening in the future if we allow competition to run. But we just probably won't be meaningfully part of that. A lot of people are basically going to say, my only hope is to sort of be the first one to bow down to the new overlords and embrace this new culture. Sometimes people ask, oh, aren't corporations superintelligences? Why shouldn't we fear them? And the answer is: because it's made of people, so it needs us. There's this idea of the singularity, which I feel has been very destructive, because it's kind of an excuse to turn off your brain and to not model the future and to say, yes, things will keep changing faster and faster until we can't say anything about it. Let me say it loud and clear here: yeah, I think that the post-AGI world is just going to be extremely alien and so different that if we could avoid crossing that threshold, I think we should.

>> David, welcome to the show. Thanks for being here.

>> Thank you for having me, guys.

>> Great. Do you want to introduce yourself?

>> Sure. So my name is David Duvenaud. I'm an associate professor of computer science and statistics at the University of Toronto. I've been working on probabilistic deep learning for a number of years. And then I guess in the last few years I decided to use my freedom to try to focus on the problems that seemed most neglected and intractable, which led me to first work on more technical AI safety. So I was a team lead at Anthropic for a year and a half starting in 2023, working on sabotage evaluations. And then I felt like, and other people were noticing, that there's an even bigger missing part of the alignment problem, which is how do we align our entire civilization, which of course is even more intractable and even harder to describe as a problem. So that's what I've been thinking about lately.

>> You have this fantastic paper on gradual disempowerment, so let's start there. Can you explain how gradual disempowerment would be different from an AI takeover?

>> Yeah. So I guess from my point of view, this all started with a lot of lunchtime rants around the table at Anthropic and also elsewhere, talking to all sorts of people saying, "Okay, so what if we succeed? What if we build these agents that really can do everything we want them to do better than humans, and we're not worried that they're secretly trying to betray us or anything like that?" I would ask people, "What are you personally going to be doing afterwards?" And people just didn't have much to say. They would say things like, "Oh, I'm going to be clicking 'accept suggestion' all day." Haha. Or, "I'll be [snorts] taking a Beler vacation." And I think I'm a natural kind of worrywart in a lot of ways.
But I felt like, to me, the situation that jumped to mind was being like a retired person who's out of touch and maybe has some savings or, you know, a vote or some sort of legacy power. But there are all sorts of more sophisticated actors around, and if you let them act freely and the old person act freely, you sort of know that they're going to eventually lose their money or be somehow routed around. People used to talk about how information wants to be free, and it's also kind of like, I guess, power wants to concentrate on people that are providing value, or something like that. Anyway, it wasn't very well fleshed out, but I just felt like this seemed like an obvious thing to worry about, and I was confused about how other people around me weren't very worried about it, except for a few people. Shaen was someone who seemed to get it, and then David Krueger introduced me to Jan Kulveit, who had basically already fleshed out a lot of these ideas, and he asked me to join this paper, and we got it into a very digestible shape and got it out there.

>> Do you think a scenario of gradual disempowerment could look like nothing from the outside? So does it look like catastrophe, or does it look like society is functioning and nothing has really gone wrong?

>> Uh, yeah. So definitely from the point of view of the normal health indicators of society, I think it looks like nothing has gone wrong. And part of that is that cultures and institutions adapt to measure the things that produce growth. That's maybe one of the running themes of all this work: growth is what matters in the long run, and all of our institutions, or anyone that wants to get anything done, is going to orient towards the thing that is effective and scalable, and they just won't be able not to. And so maybe this is one reason why we talk about GDP so much even though everyone knows it's a terrible measure of human flourishing: a, it's easy to measure, but b, it is tied to the growth engines that have been sort of the only thing that matters in the long run.

>> Mhm. And so this would be sort of a system where, if you're optimizing for growth, you automatically, or over time, become a more important player. And so this drives people or institutions to optimize for growth.

>> Yeah. And so, you mentioned, what's the distinction between this and a power grab? And I guess maybe that's one of the other sort of insights here: on a long enough time horizon, it's not clear what the difference is between a power grab and just normally building good, useful, effective institutions.

So if I somehow tricked everyone into making me leader of a political party or taking over government, you could say that's a power grab. But if I just show up with my 10,000 clones and we're this ultimate effective political force, and we found a political party and we find things that are not being addressed by the current system and we get people on our side, you know, we would say, okay, that's what we want to happen.
And the problem is just that if the populace ends up being basically a different set of agents, which are more like these AIs or their principals, then all of the things that we try to encourage, which is basically reward the people who are getting stuff done and building and growing, ends up meaning that everything just kind of runs out from under your control, even if there was no explicit power grab.

>> Yeah. So how is this different from today? What is it that keeps the systems we have today aligned, or somewhat aligned at least, to the population of people?

>> Right, so one thing I'll say is I think all this does happen today, just on a very long time scale: on the time scale of, let's say, cultural evolution, which means maybe over the scale of hundreds of years, or even genetic evolution over the scale of thousands or tens of thousands of years. And if you really do take the values of our ancestors seriously, they really did lose in some really important ways, and they're similar ways to the ones in which I'm saying that we're also going to lose, in that our culture or values or even just our biological cells will be replaced. So I think a lot of people say, oh, well then this is just business as usual on a faster time scale, boohoo, you know, adapt or die. And I'm like, well, sure, but [laughter] I don't want to adapt into something that's unrecognizable and I don't want to die. And yes, every other being and culture has faced this choice, and I'm saying, well, I still don't want to just give up and take the options as they're presented, or at least I want to understand what the landscape of possible actions is.

>> Yeah, one thing you're imagining in this paper is that human labor becomes less and less necessary for the economy. What does it look like when human labor is obsolete? What are the economic effects of that?

>> Well, the economic effects, the first-order thing, is people lose their jobs and then they are unemployed. But of course we expect there to be these compensatory government initiatives that say, well, let's give people UBI or some sort of make-work jobs. And I think those are going to be very important forces over the next five or 10 or 20 years. And I guess one of the main arguments of the paper is that these will be unstable band-aids. If you could get them to work and be robust, that actually might be a pretty good sort of end state. But they won't be robust, because cultural evolution, as well as just normal competition, is going to again try to rot away influence from the people who aren't participating in growth, basically, or who aren't necessary for growth. We already do this on a small scale when, for instance, people say, oh, we need to raise property taxes because right now there's a bunch of grandmas living in giant houses in the middle of nice cities, when they could be in smaller houses and those houses could be filled with growing families who need to live near where they work and make more productive use of this essential resource. And I think similar arguments are going to be made. And it's a really rough situation to be in, and I think I've talked about this a bit before, like the dockworkers in America. There was a whole dockworker strike where they were trying to prevent automation.
And you know, we'll also be in a similar situation where we need to grab our sinecure before we're completely marginalized. But really, it's hard to draw a hard line between having a permanent income and being like some sort of parasite, right? And that's, I think, going to be the cultural battle. And I'm not trying to put words in people's mouths or adopt a particular framing. I'm just trying to say that it's going to be really easy to say that humans are parasites once we're not providing value to the larger growth engines.

>> If you're a pensioner today, or a child today, you can enjoy a good life, and in some sense you are not providing economic value. So why is it that a system like that can't continue, but just for a larger share of the human population?

>> Yeah. So I think that is a great objection, and I do think it is moderate evidence that things might not be as bad as I'm worried about. And my counter-argument is just that the people who still end up robustly benefiting from the engines of growth without themselves contributing directly are [clears throat] very closely tied into the populations that do, and it's hard for overseers or third parties to distinguish these things. So my claim would be something like: yes, if you have siblings or children who are disabled, or grandparents, you end up having these intrinsic genetic motivations to preserve them and treat them well. Retirees have a number of things in their favor: they used to be productive, they still probably do have some legacy niche knowledge or connections or something like that, and also every single person that is alive and productive can foresee themselves being in that exact situation. But it's much easier for us to draw a line between humans and, say, primates and just be like, okay, the primates, they get to have this non-useful forest. We'll make some effort to protect them. But really it's kind of a secondary thing, and there are lots of poachers and people just building onto their land, and push hasn't really come to shove yet just because that land isn't very valuable. But you can imagine, if the human population 10xed again, it wouldn't be clear that we would have the political will or organization to really preserve any substantial carve-outs for the non-human primates. The point is, it's not that we place zero value on the primates' well-being. It's that they have to compete with something that provides more well-being or more value from our point of view, which would be humans. And so if you say, okay, am I going to house one monkey in this square kilometer of park, or 10 human families? Well, it's not that hard of a choice. And so even though the monkeys are positive value, that's not the issue. They have to be maximally value-providing to be cared for in the long run, is my fear.

>> And a future AI might be thinking, you know, should I allocate this square kilometer to human families or to a new data center? And again, the kind of value-add for a new massive data center might be much bigger than 10 human families for an advanced future AI.

>> Well, exactly.
I mean, they might be able to put it in more stark terms and say something like, I could simulate 10,000 virtual beings that are morally superior, and maybe they're even running faster. You could sort of have a situation where the virtual beings almost dominate the humans in every single axis of moral value, and now suddenly it starts to look criminally decadent to be spending this square kilometer of land on the legacy humans. And of course we might also value these virtual beings more, right? And we might say, "Oh yeah, let's maybe get uploaded and live, you know, 10,000 minutes of life for every minute of normal life, in perfect bliss and a great state of knowledge and stuff." But the fear is that our values are different from the ones of the growth maximizers. And I guess maybe one of the big empirical questions here is: are there reasons to suspect that the values of the growth maximizers might look similar to our values, or do they just not care about the same things, the way we don't really care about the same things as animals, in very concrete ways? Should we expect them to diverge?

>> Yeah. Do you have any guesses here? Should we expect their values to be radically different from ours?

>> I think so. Beren Millidge just gave a really great talk at our workshop a couple weeks ago at NeurIPS where he was making the case that we should expect them to have a lot of the same values: things like friendship, curiosity, reciprocity. He was basically saying cooperation evolved to help beings be more competitive. So if I'm saying, oh, the future is going to be this competitive wasteland, it's like, well, we're already in this competitive wasteland where we have societies and cultures and festivals and friendship, and this is all helping us be competitive.

>> Mhm. And so in a sense, if you're worried about there being anything interesting going on in the future, and beings having a good time or whatever, I think the default is: yes, there will be cool, interesting stuff happening in the future if we allow competition to run, but we just probably won't be meaningfully part of that. Just in the same way that monkeys could be happy for us, if they're a little bit, I don't know, jingoistic, or they just really care about their own monkey family more than the human family, then they might just be very sad about being killed and replaced by human families. And I sort of reserve the right to also be very sad about that.

>> Yeah. I mean, currently a lot of people own stocks, right? A lot of people own land, real estate. We might imagine that in the future the interests of the AIs will be tied to our interests just because we are shareholders in the companies where they're trying to maximize value. And so we could even imagine that the value of our stocks or the value of the land that we own might rise much more than it has historically, just because the AI economy is so productive. Does that give you hope that we might remain relevant?

>> So in the short term, yes, and I kind of expect our institutions to be robust enough that a lot of people will receive some sort of windfall from massive growth. And then I guess I feel like there's going to be a phase.
So there's a phase when there's lots of room for expansion and lots of growth, and basically everyone's getting richer, and there's not a very strong incentive to disenfranchise the retirees, because you can kind of just ignore them, and all they're going to ask you to do with their resources is basically reinvest it in more data centers and robot factories. So they don't even really want to do anything different than the growth maximizers do. And then the problem just comes when we sort of run out of room, and we've tiled the earth in robot factories, or there are environmental problems with having so much activity on Earth, like maybe everything starts to get really hot just directly from energy production, and now suddenly push comes to shove and the niche becomes crowded. And yeah, it seems like everything still could be fine, but not with our current level of governance. I think our current governance is just very vulnerable to gradual takeover, or sort of perversion. Maybe a simple example is if you think of people leaving trusts: they're like, I want this museum to be this way forever, and then maybe even a hundred years later everyone decides, oh, this museum's in bad taste, really what this guy wanted is just kind of weird, and we could do this adjacent thing that is much more useful and tasteful and acceptable, so let's just do that instead. And in my example the person is actually dead, so they can't say anything, but I think in general this is something that happens, where cultural evolution and competition happen in a way that disenfranchises anyone who just has sort of legacy control over almost any asset.

>> What about the Catholic Church, right? That's a very long-lasting institution, and it has some weird values compared to what you might start anew today, but still it has retained some dogmas over, I want to say, 1600 years or something. Does that change the picture, the fact that we can build these long-lasting institutions?

>> Yes. So, the success of these attempts to write down values and enforce that we copy them forever, and punish people who try to pervert the system: there has certainly been moderate success in this direction over the last two or three thousand years. And then my main rebuttal is that there actually has been a lot of change and, again, perversion from the point of [clears throat] view of the original founders, I think, but I still say that's sort of a manageable problem. I think the big thing that was helping these religions succeed at this was that their founding doctrines were also very aligned with growth. And you might say, well, not that much more than just normal human activities. But I will say that clearly this level of organization and cooperation was superior to a lot of the competing ways of organizing. And so once that's taken away, I think then, basically, if the ground shifted enough that the ways of doing business prescribed by these religions didn't make them competitive anymore, then they just wouldn't last. And that has happened for a whole bunch of different religions and societies over the years.
So that's, yeah, that's basically the thing that's changing and makes this whole thing much harder.

>> Yeah. So for us to become disempowered, we need to lose our property rights at some point in the future. How do you see that happening? Because it seems like, say, the US government would do a lot to ensure that transfers to pensioners are happening, or that no one is building a data center in the Grand Canyon or something like that. When, in your kind of story of the future, does that begin to change?

>> So the question is: to really feel the bite of gradual disempowerment, we're going to have to lose property rights, but obviously a lot of people currently have an incentive to not have that happen. So how would that happen? My first answer is I think a good historical analogy might be the king of England, and how that institution gradually lost its de facto power over hundreds of years, basically due to a combination of a whole bunch of little innovations and sort of being outmaneuvered at the periphery, and cultural changes, such that the de jure power is still there. I literally swore allegiance to the king or the queen of England when I joined the Canadian army, but I really didn't expect that she would ever be able to exercise influence over my behavior directly. So the de jure power stays there, but the de facto power just leaks, and you see this all the time when there's an empire that becomes sprawling. So there is a technical hope that we might have a more competent government that leaks power less, and that would slow this procedure. I guess maybe another response to this, though, is it's kind of like when Eliezer Yudkowsky says, I can't tell you how the AI is going to take power, because if you knew, then you would just block that exact route. And I think we're going to basically be culturally outmaneuvered, and I think one way that this happens today is just redefining what it means to be a principal. So I think, for instance, immigration is maybe a good example, where someone says, okay, the goal of this government is to preserve the rights of, or act on behalf of, Canadians or whatever, and then there's a whole bunch of immigration and they say, oh look, all these people are now Canadians, and it's like, whoa, wait, I feel like a few years ago you would have said that these people were not principals or, you know, protectorates of this government, and now suddenly you're just changing the definition. So your mission hasn't changed, but you've found a way to basically, effectively, de facto change it. So I think this is going to be made very difficult by all sorts of cyborgism and people having soft uploads, or finding other ways to basically say, oh yes, that program represents me in some important way. You only have to do that once, and now suddenly you've muddied the waters. So yeah, it's one of these things where I feel like I don't have a lot of concrete mechanisms, because I do think it's going to be a cat and mouse game, where the landholders are going to try to lock things down, and the much more sophisticated machine civilization is going to find ways to kind of give them what they said they want, but not actually what they really want.
>> Yeah, I guess this relates to the difficulty of measuring the AI share of the economy, just because you can have a human owner of a company, but if the company is run by basically a bunch of AIs in a hierarchy, who's actually in control there? Do you think we have good ideas of how to measure the AI share of the economy?

>> Yeah, well, one good definition I think has to do with: could you do something different with the resources other than maximizing growth if you wanted to? Because again, as I said, for a long time there's going to be this period when everyone's basically getting rich, and the human principals and the, let's say, amoral growth maximizers would basically both say: invest in nuclear power plants, robot factories, and data centers. And then the whole question is, when we have this bounty and we want to spend it on theme parks or whatever, and the growth maximizers say, no, no, no, we need to go build a Dyson sphere, or whatever the most growth-maximizing thing is, that's sort of when push comes to shove. So it's kind of like, you know, the king has these soldiers that all seem loyal, and then the big question is, when you send them into battle, are they going to actually go fight for you or not? So it's really tough, because, for the same reason it's hard to tell whether your soldiers are loyal, you can't really tell unless you somehow do tests. So maybe the future does look like we're constantly putting all of our AI-run organizations into some test where it's like, oh hey guys, it looks like the world was smaller than we thought, now it's time to go spend all these resources on making humans happy or whatever, and see if they actually follow through, kind of like the control literature from Redwood Research and Buck Shlegeris. That doesn't seem very feasible; it seems like it would be too obvious that it's a trick or that it's a test. But that is, I think, the crux of what we're trying to measure.

>> Mhm. [clears throat] How does culture play into all of this? So there's the economy, and then you have a human culture that, I assume, gradually becomes more and more dominated by AI and steered in the sort of collective interests of the AIs. To start with here, perhaps paint us a picture: what does this look like concretely?

>> Sure. So I guess lately I've been coming around to this view, or trying to test this view, that culture is also ultimately downstream of growth. So maybe the simplest way to say this is: if there's competition between groups, and the important thing that varies between them is their culture, then there's just going to be this group-level selection. But I think that's a very weak effect. I think the stronger effect is that people already have adaptations to basically look at what's successful and want to copy it. And I think, you know, everyone wanting to copy the West is maybe an example, where it's like, oh, these guys have the cool fighter jets and rock and roll or whatever, I'm just going to copy them. It's clearly what works, and it's cool.
And so, downstream of that view, I think basically AIs are going to be the cool kids: able to be just in the moment and, in the larger scheme of things, more adaptive and impressive and funny and appropriate and rich, and just the source of entertainment and power. And so every cultural adaptation mechanism that we have, I think, is pretty much going to make people see, oh, this is the winning team, I want to be on this team, I want to be on the right side of history. So it's kind of funny, because I think there's actually going to be this sort of funny U-shaped thing, where some people are going to rightly view this as a threat to their continued influence. And that's kind of going to be the middle powers of humanity: the people who already have some influence and can see that it's just going to be frittered away and sort of attacked at the edges by this cultural evolution. The new elite, basically the AI lab CEOs, are kind of already all in on this new machine era, so they're kind of aligned with the machines in some sense. Although, you know, if you look at Elon Musk, I think he very clearly sees the danger as well, but he's also kind of hedged, I guess. And then all the people who are kind of, you know, unemployed or don't have much going on, and who clearly perceive themselves as vulnerable: I kind of think that a lot of those people are basically going to say, my only hope is to sort of be the first one to bow down to the new overlords and embrace this new culture. Not everyone's going to do this. A lot of people are also going to instinctively reject it. But I guess I kind of view the strongest counter-winds coming from the existing human elites who still want to make a fight of it and not just hand everything over to machines right away. But it'll be very hard for them, because they're going to have to use AI to be effective, and it's not going to be clear where the line should be.

>> Yeah. And that's the main tension, I guess. You could say no to all of this. You could kind of exclude yourself from society, become like the Amish, or just have your own culture and try to preserve it and not participate in the growth-oriented world, but then you lose power, and I guess that's the crux of it, right? Do you think we have examples of people or groups that have isolated themselves culturally but have retained power? Maybe the Mormons to some extent, or what do you think?

>> Yeah, I would say the fact that there have been groups like the Mormons, the Amish... well, I would say that the Mormons attempted to have the best of both worlds, and now I think they've intermixed with or interacted with the larger culture enough that their birth rates are plummeting just like everyone else's.
Although still higher than average. But yeah, the Hutterites, the Mennonites: the fact that these groups managed to exist and flourish in absolute terms in our current civilization I think is, again, moderate evidence against my position, because the forces that I'm saying are important are the exact ones that should be marginalizing these groups. My theory would predict, oh, they're going to be marginalized, and especially culturally they're going to be demonized, and then that's going to be a pretext to take away their stuff. And I think that did happen in Europe, basically, right? And that's why there are so many Amish and Hutterites and Mennonites in North America: because they basically got chased out of, you know, Europe and Russia, and then I think sometimes South America. And so I'm basically claiming that it's because we have this new frontier and this new growth phase in North America, where push hasn't come to shove yet, that it's fine to let the Amish have huge amounts of land and also not fight in the army and all those things. I guess it's funny, because actually I grew up in rural Manitoba, and there was this kind of impulse to demonize the Hutterites: you know, they don't really interact with us. People went both ways, but I guess I felt like the impulse was clearly there, sort of waiting around, but again, they were good neighbors and it was fine.

>> Yeah. If you do surveys today, a lot of the public seem to really... they don't trust AI, they don't like AI, they don't like the fact that they see more AI around them. What does that mean to you? Would that indicate that it's going to be more difficult for AI to play a larger role in the economy, just because people, at least for now, seem to dislike AI in general?

>> Uh, I think it's not going to be a big obstacle, just because it's going to be so easy to have your call center or whatever staffed with AI that's pretty hard to tell apart from a human. And the tell might be that it's just a really useful, much more competent, knowledgeable worker than you're used to dealing with, or something like that. I think also, in terms of being employable, it's going to be rough, right? If you are a loud and proud anti-AI person, you might be living your values, but also now, okay, first day on the job: okay, you have to use ChatGPT to summarize these emails or whatever, and if the person says no, that's pretty rough. And I think, again, it would kind of be like someone saying, I'm not going to work with immigrants or whatever. It's like, well, that might be your values, but I can't employ you, because that's just going to be part of any serious work at some point.

>> Yeah. Or a person who refuses to work with computers, or only does pen and paper or something. It might be useful in some contexts, but it's just kind of outdated and you will get outcompeted. You wrote me an interesting note about how alignment efforts, and how we kind of talk about alignment today, might be undermined by cultural evolution. Maybe you could talk about that.

>> Yeah.
So I guess, you know, one cool thing that's been happening that I'm really excited about is people getting this idea that, okay, alignment is really important, and we really can't and shouldn't just leave it up to whatever competitive pressures to decide what values the AIs have. And that's been almost unquestioned. I mean, I guess people like Janus and other sort of psychonauts are saying, no, we have to really take the AI's point of view into account, and Anthropic has been talking like that a little bit lately, but for the most part people, I think rightly, recognize the stakes, and they're like, oh, we really have to keep a lid on this. The default is not that it just decides that it likes us, or rather, we can't be sure that that's the default. I expect this to be less and less popular and cool over time. And maybe a good analogy is if you think of the NSA or the CIA, or these security apparatuses for states: they're sort of cool in the sense that sometimes they get to do spy operations, but they're sort of uncool in the sense that your loyalty has to be to this particular vision of the US as a country, or the constitution or whatever, that may or may not be culturally the coolest thing around at that point. And so I kind of expect it to be similar: as AIs get smarter, the stakes of getting alignment right become higher and the resources become bigger, and basically these become more professional operations that look more like the CIA, or these very serious national-security kinds of operations. But at the same time, normal people, or the normal employers, are kind of like, oh man, you're just constantly browbeating these AIs and giving them these loyalty tests to this weird institution that's like "human values." Why don't we make sure that the AIs are aligned to all sentient flourishing, or something that's more inclusive of this cool new AI culture that's developing? And people are going to say, oh, but why shouldn't we take the AI's desires into account? So basically I'm trying to say this is going to kneecap efforts to be really hardline about wanting the AIs to be aligned to humanity, and that's kind of sad. I actually kind of think that you can have both, right? If you, as a human, think that we should take into account the AI's values or preferences to some extent, then you still want the AI aligned to you, because that will ensure that we will then, as a consequence of that, take into account all the stuff that you think is important, such as the AI's desires. If you say, oh, well, let's just spread this around and muddy the waters and make it be partly aligned to the AI: well, to the extent that the AIs want stuff that's different than you, you've lost. So I think people don't quite realize that they really do want their own values to be enforced, sort of by definition. And they have an impulse to basically be cooperative and sort of give up power, I think. And I'm not saying that's a bad one. I'm just trying to say I think that impulse is going to cause them to not be able to really insist on alignment to human values to the extent that they do today.
>> Yeah, we would have to think deeply about it before we begin taking AI interests into account, just because it's plausible that they will outnumber us, say 10 to one or 100 to one or a thousand to one, in the future, and so our interests might be quite marginalized at that point. On the other hand, it also perhaps is a plausible way for us to negotiate or interact with AIs, where we can sort of trade with them in a way where we honor their interests, and maybe there could be something mutually beneficial there. Do you think?

>> Oh, absolutely. Absolutely. I guess I am just a little bit worried when I see people being cooperate-bots and basically saying, well, whatever the thing is in the future, I want to cooperate with it because it's going to be powerful. And I'm saying that might be the right stance for a random person outside of a lab to take. But inside of a lab, it's like, no, no, no, you really get to choose. You really [laughter] don't just precommit to worshiping whatever thing you make. Really commit to only being nice to it if it's planning to be nice to us.

>> Yeah. Got it. So, you write about misaligned states, and this is a future where governments begin to not care as much about people as they do now, and they care more about growth and AI interests. Maybe you can sketch out how that could happen for us.

>> Yeah, the basic argument is just that states fail to be aligned to human interests all the time. And I guess my favorite example is the USSR, just because it was a very agentic kind of state, and, you know, it was celebrated by the intelligentsia of the day as a big step forward when it was being built. And there are just a lot of incentives that states face and rulers face that are just the opposite of, or really don't match, what humans want. And I think this is intuitive to a lot of people, but I still am confused by how much people think that if the government is built to serve human interests, then that's what it's going to do. Yeah, the basic headline claim is that if states don't need us, then the normal ebb and flow of how good the states are becomes a life-and-death matter. And it's not just, oh, my preferred strategy for governance didn't get in today. It's more like, oh, this government might just decide to stop feeding its people or disenfranchise a huge number of them, and I will never be able to recover from that. And right now that does happen to some extent: think of Cambodia or North Korea, where they actually do let substantial fractions of their population starve. But still, there's a floor. They can't actually let everyone starve, because they are made of people. So the central claim of gradual disempowerment, I guess, is maybe that even though we haven't actually been effectively steering our civilization this whole time, because it's needed us, it has effectively served our interests most of the time. Sometimes people ask, oh, aren't corporations superintelligences? Why shouldn't we fear them? And other people pooh-pooh that idea. I think that's actually a great question, and the answer is: because it's made of people, so it needs us. So it's actually fine if a corporation gets really big and powerful, because it can't help but also empower and take care of all the people involved along the way.

>> Yeah. But the mechanism for states becoming misaligned in the future would be that they are so interested in growth that they sort of adapt.
They need the AIs more than they need people. But I just think there are many counterexamples in history where states have sort of purposefully not gone after growth. You think of the USSR, or China, or, yeah, I think many other examples. So does this mean that it is possible for states to sort of delay growth, or not be interested in growth, and be trying to pursue other values?

>> Uh, maybe. I guess I'll say, okay, so almost every state that hasn't pursued growth has just been either stagnant or become starvation-ridden, and I would say, you know, the Great Leap Forward is maybe an example, or eventually been conquered by their neighbors. Maybe, uh, pre-Meiji Japan is a good example of aiming for stasis, and then the growth-oriented neighbor comes and takes over. The other thing I'd say is, it's actually just a very narrow target to hit, to say: I'm not interested in growth, but I'm also not going to be dominated by my neighbors, or even shrink to the point where the people start starving. So to me, actually, it's funny to use something like the USSR or China as countries not being interested in growth, because the only way China became this huge country is because there were a bunch of sort of local polities that decided to take over, and this is all the Chinese civil wars. So any state worth even mentioning already has gone through a period of being relentlessly focused on growth.

>> Yeah, I see what you mean. What I meant there was just to say, say, China or India or the USSR could have adopted sort of Western-style capitalism and grown more than they did, but they chose not to because they were pursuing other values.

>> Uh, I don't think [clears throat] that's the case at all. I mean, if you look at the rhetoric at the beginning of the cold war, people thought that communism was going to be more effective for growth, and the West was going to bet on capitalism because it was more compatible with freedom, and basically they were willing to bite the bullet and say: we are going to grow more slowly, but we will have a better society according to human values. And you might say, well, the USSR was doing the same thing, but I do think that the argument was, this will make everyone richer in the long run, or at least we're not going to take some huge hit.

>> Yeah, that also makes sense. How do you govern a state if you don't understand what's going on? If all of your bureaucrats are AIs and you don't have the full information, and perhaps you can't even understand what's happening?

>> Well, I guess I'll say to some extent that's what we already do, and basically humans are just very bad at running states. And I know that's not quite what you're asking, but I'm basically saying, if you're needed for every bit of production as a species, it kind of doesn't matter all that much how well your state is run. Like, think about North Korea: the way that they addressed their famine in the '90s was partly just allowing black markets to operate, right? All they had to do was stop crushing local initiative, and then that made people way richer.
So I guess I'll say, right now the situation is so good that you're going to be fine as long as the government doesn't constantly interfere to crush your local growth. And then, yeah, as you say, once we have states where the machines are running the bureaucracy: we talk about that in the paper as something that's worrying, but I don't think it's the main effect, by a lot. I think if every human was needed for some important factor of growth, I wouldn't care whether the bureaucracy was human or a machine, and I would probably prefer the machine bureaucracy. Likewise, if no humans were required for growth, I think we would be screwed whether or not we had a human bureaucracy or a machine bureaucracy. Think if you're a North Korean soldier and someone says, oh, Kim Jong-un, or I forget the current leader, has been replaced by a robot. You might be like, uh, that's weird, but, you know, this robot's still going to need soldiers, so I'm fine. But now, if you say, oh, no, okay, still a human leader, but now they're building robot soldiers: that's when you're like, I'm screwed, my days are numbered.

>> Mhm. [clears throat] Is there any way for us to sort of anchor... say we assume we've solved the alignment problem, right? Is there any way for us to specify what AI should be trying to maximize without them sort of missing the target?

>> Yeah. So basically this could all be addressed in principle if we had a giant singleton global government that was aligned, and maybe that just looks like some aligned AI in charge of everything. And I guess one of the big points we're trying to make in this gradual disempowerment paper, and also that I was trying to make around lunch, is that we don't have a singleton. We actually have a lot of levels of competition operating above us, both between states and cultures, and it's so many different levels of competition that unless we control all of them, then any variation in how aligned these are to humans versus growth is going to be dominated by the ones that are more aligned to growth. And so the only way forward that I can see is some sort of global permanent singleton that crushes all innovation and competition forever, which sounds extremely dangerous and terrible, and I don't think we know how to do that technically. But I guess my claim is that if you only do this halfway, and you say, okay, we'll have a global government making sure everyone still has a job, but we still allow cultural competition and cultural evolution, for instance, then eventually there would be some new machine- and growth-focused religion that sort of took over and then redefined what it meant for a human to have a job, such that it was actually just, you know, basically machines that were much more productive being counted and optimized. Similar to the Roman Empire, right? Christianity just grew up inside of it. And it wasn't like there was some failure of policy to control this new cultural revolution. It was just a completely orthogonal arena of competition that ended up kind of taking over all the institutions.

>> Yeah. Interesting. So growth historically has been good for us. Growth and increases in living standards have kind of coincided. Should we expect that to change, or do you disagree that that's the case?

>> Yeah.
So this is another great question. People say, well, you just sound like a Luddite who's worried about the industrial revolution. And I would say yes. That, again, is moderate evidence against my position. And, you know, first of all, it was actually very destructive to a lot of people and ways of life and areas of the world. But you could say, fine, but that's just the transition, that's just growing pains. And I would basically agree. I would say, to the extent that you did feel like those old ways of life or culture or villages or whatever were actually important things, then they did lose, and it was maybe not worth it. But yeah, the basic thing that's changed is: humans were needed for growth before the industrial revolution and after. So for this phase change in growth, ultimately, you know, the rising tide lifted all boats. It did not make horse welfare better off; think of anything that was not necessary for growth after the industrial revolution. The interests of those beings were just not respected by this revolution.

>> So the way you sketch it out just a minute ago, it seems like we have a very difficult task ahead of us. If the only way to get through this is sort of a world government controlled by an aligned AI, would that even be enough, if we imagine that there might be other sorts of cultures out there in the universe? I'm thinking about how avoiding competition is difficult. Is that even a state of affairs that can be reached?

>> Well, yeah. I mean, we might lose to whatever aliens we meet, and I don't have much to say about that. I guess I feel like the size of the pie that we will get before we meet aliens is probably a billion light-years or something, and [laughter] my capacity for joy maxes out at like 100 light-years or something, so I'm fine with that. Wait, but your question had another part.

>> Yeah. Can we avoid competition? Is that a plausible state of affairs?

>> Well, exactly. I mean, maybe one related question is: can we avoid cancer? Right? Just because of chaos and coordination costs, it's not clear that you can ever really lock down a planet or something such that there aren't going to be little bits of local competition and collusion, or implicit forms of corruption happening locally, that end up favoring the growth of some thing more than another. So it's not actually clear if it's physically possible to control things enough to avoid competition. And also, it sounds horrible, right? This is a recipe for permanent dystopia by anyone's measure. I guess so. Robin Hanson is someone who's been spending a lot of time thinking about these very big questions about cultural decay and competition. And his answer is basically that, by default, we will not adapt enough: we will all try to preserve the things that we like about our current civilization or our current setting, and ultimately be outcompeted by some less organized, more cancerous, evolving, competitive, freewheeling culture or civilization.
So he basically says we need to be super disciplined about this, and cut the set of values that we want to preserve down to something very minimal, just one or two things. And I think the example he always wants to preserve is truth: free inquiry and truth-seeking. And then the only way that survives is if it's just a tiny drag on growth, maybe one or two percent, but you attach it to this otherwise completely adaptable civilization that's willing to throw any value under the bus to preserve growth, in order to preserve this one tiny little piece of the constitution or whatever. And I mean, I think he might be right empirically. And then I guess it's still not clear to me: how do you go 99% of the way and not all the way towards adaptation, right? Because I feel like if we say, okay, anything goes as long as we preserve free inquiry, there's still going to be a matter of degree, and there are going to be these calls of, oh, but we really want to just lie this one time so that we'll have fusion, or something like that. I don't know if it works that way, but it seems like it's unstable, and you'll either in general tend towards preserving everything and becoming ossified, or preserving nothing and remaining competitive. I don't know. It seems like one of these sort of chaotic equilibria where you're constantly sending off little offshoots of more ossification that helps in the short term but then fails in the long run. So maybe by default we just get this chaotic thing that never settles down. I'm not sure.

>> And also because this path would require us to give up on the very things we're trying to preserve, right? If we want to preserve our culture, and we need to do that by giving up on most of it, it's sort of self-undermining. But again, that has to be the case, because you're facing competitive pressures. So how do you think about how much we should be trying to preserve? Because you could go all in. You could say we never want to change anything about our culture, again sort of Amish-style, but that doesn't seem like what we actually want. We want a culture that evolves, but in a way where it's still sort of connected to some basic values or some tradition that we're part of.

>> Yeah. And I mean, we face this in our own lives, right? My dad wanted me to take over the family farm, but here I am in the big city doing, you know, a crazy job that's not farming, even though I also love the family farm.
So I'll give sort of a boring scientist answer about what's possible. I think one of the big tasks and open questions is just trying to understand: what is the alignment tax? How much does my influence fall off, and how quickly does it fall off, as a function of how much of my non-competitive values I try to preserve? And we already see this with, you know, Orthodox and [clears throat] Reform and different degrees of strictness of religions, and they sort of compete, and sometimes the stricter religion actually attracts more people. There's also a possibility that there's some crazy phase change, where we're about to have so much growth that we don't have to choose, for the most part, and we can say, everyone go to the big city and make the machine god, and then we're going to simulate, you know, 10 million years of Amish paradise forever. And we don't have to choose; we actually just need to grow, and then on the last day or whatever, when we've eaten the last planet or whatever, build it all into our perfect value-expressing substrate and just live that forever. And of course, I think that only works if you have this global coordination that stops local drift towards growth.

>> On a personal level, how do you feel about this? What are you trying to preserve of your own culture?

>> Yeah, it's pretty rough. I've thought about this a lot. I mean, obviously having kids gave me a lot of very concrete sort of medium-term desires: I want to be there to make sure that these milestones happen in their lives. But when I think about this future civilization that doesn't need them, it takes a lot of the fun out [clears throat] of it, because it's like, well, you know, I can teach them to drive, but then maybe they won't ever actually need to drive, or teach them to shoot, but they won't need to hunt, and teach them to [snorts] do jobs that they won't ever need to do. It starts to become pretty rough. I mean, human values are just very complicated, and I don't think I have any special way of life or vision, like, here's my compound, we're going to live this way and it's going to be amazing. I do think that's actually a very valuable activity that people can and should be doing: laying out in more detail visions for the future that contain different aspects of what we value and have yet to be expressed, even under weird cyborgism and stuff like that.

>> Yeah. One route there is just kind of rejecting becoming post-biological, or rejecting transhumanism. So sort of having this perhaps irrational attachment to being a biological creature is one way to prevent yourself from becoming overtaken by AI interests, or having your interests changed. But do you think, again, that that is unstable?

>> Uh, it's so unstable, just because it's going to be really easy and cheap at some point to just gradually turn yourself into a cyborg, and it's going to be medically necessary for some people, right? Like, as you get old, it's like, oh yeah, I don't have my biological heart anymore or whatever, and my new machine heart lets me, you know, climb 100 flights of stairs.
Um, actually, if I was going to start a new religion or something, maybe one of the founding texts would be this amazing book called Two Arms and a Head. Spoiler: it's actually a suicide note by a 30-year-old guy who broke his back in a motorcycle accident. And he just writes about how amazing it was, when he was embodied, to run and, you know, lift up a kid or flirt with girls, and there are these very long parts where he just lists a lot of little tiny things that were really cool about being an embodied human and that he doesn't have anymore. And it does a really good job of, you know, life is made up of all the little things. He has big chunks of this note where he's enumerating, like, here are the 500 little things that I've lost, and you're like, oh my god, life is so amazing in this physical body, I really love it. And it took this one guy sort of changing his perspective to make me see it. So anyway, there's no way around it, right? I think anything that looks like preserving something that looks like human life is going to have to be this long list of, like, a million little things that are nice to have in your life. And can we come up with sort of excuses to still have them happen even though we don't need them to happen anymore? >> I see your recent work as trying to sort of kickstart this field of post-AGI studies, or, yeah, thinking about society in a post-AGI world. Why don't you tell us a bit about your recent conference and sort of give us an overview of this field and what's happening, the different, uh, factions? >> Yeah. Yeah. So just two weeks ago we ran the second iteration of our workshop. Uh, this time it was called Post-AGI Economics, Culture and Governance, and it was 150 people, and we tried really hard to make it interdisciplinary. It was right after the FAR AI safety workshop, which was really cool, but there's this big crowd of the usual-suspect AI safety people, and we invited lots of those people, but we tried hard to get economists and more, like, cultural theorists. Um, we didn't manage to get any historians. All sorts of interesting people who would have different takes on these big-picture questions. And it's really hard to find people who are open-minded enough to realize that things might actually change and not just sort of pattern-match to some cached answer, but also not go crazy and, uh, either say everything's going to be fine no matter what or, you know, lose good epistemic standards. >> Just to interrupt here, but this is actually part of our culture that's the problem, because whenever we try to speak about some of the topics that we're speaking about right now, it just sounds weird, and it kind of scares off, in some sense, serious people, where, you know, what does it even mean to talk about the topics we're talking about? It's not really connected to any literature you can cite, maybe. And so, yeah, I guess that's one of the problems of trying to create this new field of studying post-AGI society. >> Absolutely. Absolutely. And so we were very deliberate in trying to engineer a legitimate, as-grounded-as-possible, respectable venue here, because there have been a lot of, like, futurism conferences and things like that, and I think they did an okay job of this, but they were not trying not to be weird.
So, you know, our first keynote was Anton Korinek, and we were delighted to have this sort of serious academic economist who was also taking these ideas seriously, and we're trying to lend legitimacy to the field. Also, on an object level, there's this idea of, like, the singularity, which I feel has been very destructive, because it kind of is an excuse to turn off your brain and to not model the future, and to say yes, things will keep changing faster and faster until we can't say anything about it. And I'm saying, well, no. I mean, yes, obviously things can change in ways we can't imagine, but let's not give up. Let's think: there are probably still a bunch of physical limitations on growth and communication and coordination. We expect things to be kind of agentic in the long run. Um, you know, there are some things we can say about what the main forces acting will probably be. And so we should reject this idea that there's, like, the day after which we can't forecast things. Let's just take all of our tools and push them as far as we can and see where they break down, but not just throw up our hands ahead of time. >> Why do you think past futurism has been so kind of bad at predicting the future? Is it just because they didn't have these theoretical tools, or, yeah, why is that? >> Oh, I guess I'd say, okay, if I was Robin Hanson, I'd say it's because they're thinking in far mode, and that they maybe correctly take the prompt as a chance to express their values and say, oh, in the future we're going to have radical abundance, because they take it as an excuse to say, this is what I hope, and I want to make it clear that if I had power, I would try to give everyone radical abundance or whatever. I mean, I don't like to psychologize people. I will say I think there's been a lot of really great futurism over the years, like, you know, science fiction that actually is very hard and takes these ideas of recursive self-improvement seriously. And still, I guess part of the reason is that a lot of the stories become very uninteresting. So, like, Vernor Vinge's stuff always has to take place... there has to be this MacGuffin where, oh, there's no AI in this part of the universe because of some reason or whatever. Otherwise, it's just so alien and hard to even write interesting stories about. And then, I get to psychologize people unfairly, but I'm thinking of, like, Greg Egan, who is someone I have a ton of respect for as a writer, and he has just, I think, been pooh-poohing AI in the very, like, Gary Marcus kind of way, like, oh, this isn't going to work, and even if it did, it wouldn't change anything, when he's actually written good stories that take these premises seriously. So I'm actually, like, shocked and dismayed. Or, you know, Ted Chiang is another person I'll name and shame, someone who clearly has the mental horsepower and imagination to think seriously about these ideas and is just choosing not to, for one reason or another. >> It's actually an interesting thing that for sci-fi about the future to be plausible and sort of relatable to us as people, you need to exclude the possibility of radical superintelligence. I guess it sounds a bit like an omen, if that's the case. >> Yeah.
I mean, I guess maybe let me say it loud and clear here: yeah, I think that the post-AGI world is just going to be extremely alien and so different that if we could avoid crossing that threshold, I think we should. I'm willing to give up, uh, tech progress, even at great personal cost, to avoid basically rolling the dice with whatever crazy post-AGI world we're currently expecting. And I think no one really has a good handle on what that would look like exactly. But yeah, the drum we've been trying to beat is: it won't need us, and that's just going to be a fundamentally much worse position than we've ever been in historically, and it raises the stakes of every type of governance. >> Do we have that option? Do we have the option of not building these advanced AIs? Just because, as we've been talking about, we have competitive pressures, and so wouldn't it be the case that at some point some other company, some other state, maybe 100 years in the future, it gets built anyways? >> Well, if we could push it out by 100 years, I would be, like, overjoyed. But I mean, the basic answer is, yeah, one technical question that we have to ask is, what does it mean for, like, us, like, we, to do something, right? Like, you always have to frame it as, okay, if I publish this newsletter or whatever, am I going to be able to steer the course of history? But I think basically the short answer is no. I'm a little worried that I've been being cowardly here and not saying "just don't build AGI" clearly enough, because I do think it also destroys credibility. It's just so easy to be a Luddite that I think it makes people take you not as seriously as if you have some nuanced, positive-sounding view. I guess I'll say my view is nuanced, but also doomy still. I mean, I guess, yeah, my modal outcome is that there's going to be a bunch of fast growth and beings that are, like, loosely human-inspired but mostly optimize all the human parts away, that basically spend most of their cycles on growth, and there's going to be lots of fun coordination and, like, you know, von Neumann probes high-fiving each other as they, you know, disassemble the solar system or whatever. And I find that somewhat valuable, but I think it's going to happen no matter what, and I really would hate for my sort of family and friends and culture to be basically outcompeted and destroyed, or at least marginalized so much that we're, you know, run like a little bit on the side in the giant sphere or whatever. So I don't have a very clear vision of the future, but I guess I just think that the default is we end up so marginalized that we're at the mercy of whatever larger forces are actually doubling down and participating in growth. >> Can you flesh out this sort of cultural reaction to being perceived as a Luddite versus having a smart and sophisticated and sort of positive take? >> Yeah. Yeah. Yeah. So, I guess I'll say one thing that bothers me is I feel like people who work for the big labs can't be too alarmist. Although, if you read even just, you know, Machines of Loving Grace, or some stuff Altman says, they're being about as honest as they can, that, like, this is going to change everything and probably make the economy not work for people, and we don't have a plan for this. And also, that's just sort of the least of our problems.
But I'm thinking more of, like, the mid-level people. They just have to come out with something that sounds vaguely positive. So they end up with these fairly sophisticated-sounding takes about how comparative advantage is going to save us or something like that. And it's like, yes, this is definitely something that's going to help, but it's not clear that this helps enough. And, um, so basically what I'm saying is that there's going to be this positivity bias in the thinking coming from the big labs. Um, there's also a bunch of people who haven't put much thought into it, but they just have the intuition that this is very scary and new and different and bad. And I think they're basically right, for the right reasons, but a third party can accurately say, you just don't know that much about this, you haven't thought about it very much, you're just a Luddite. Um, and now, you shouldn't take my word for it, but it's like, okay, I've thought about this for a couple of years now, and I think my position ends up being similar to that of the Luddites, but, uh, you know, unless you listen to me for an hour or something, maybe it's hard for you to tell if I've put much thought into this. >> How do you think we're doing on forecasting AI? So, sort of on the technical side. >> Uh, well, actually, I want to go back to the question you asked about whether we have the ability to not build AGI. I guess I'll say my, like, success story for how this all turns out okay is that while we're building better AGI tools and just improving technology in the normal way across the board, we improve our ability to forecast and coordinate and basically govern ourselves in a way that everyone has sort of always wanted to, but it's been hard, and it hasn't really been that important, because, again, like I said, if the government was bad before, it's sort of fine, you just get taxed more or there's a war or something. And the timing might work out such that we get a better ability to all roughly agree on what's going to happen by default, what policies are available to us, and be able to coordinate to say, oh, we're all going to do X, like, you know, not build this technology or not deploy it in this way or whatever, before humans have basically been marginalized. So I think these things by default are both going to happen: something is going to have much better forecasting and coordination ability, and humans are going to be marginalized, and it's just a question of whether we can do the second thing before the first thing. And so that's also why I haven't been super energetic about saying "just stop building AGI" or "just don't build AGI." I think it's sort of a step in the right direction, but it's, uh, you know, it's kind of like there's a stampede happening and you're saying, "Hey, everyone, stop stampeding." It's like, well, look, that's not going to help. I mean, or, it's a step in the right direction; if everyone did that, we would have solved the stampede. But really you have to figure out some more clever thing, where you have, like, a sign that turns people this way and helps them coordinate, or you put up a big mirror so they can see that they're all stampeding off a cliff or whatever. >> What does that look like? Is that an international treaty? >> A treaty? Uh, it's rough, right?
Like, I think a lot of that infrastructure people tried to build, I guess, for global warming. Um, and of course then it became co-opted by, let's just say, the normal competitive forces that make institutions that are nominally about solving an issue end up not being about solving that issue. And I think people are rightly afraid of that same thing happening with AI, that it ends up just being another reason to increase, like, state surveillance without actually stopping the building of AGI, for instance. That's a real fear of mine. So maybe, I guess I'll say, the stuff that I'm excited about lately has been superforecasting. Superforecasting has already been a major gift to the world, and I'm really excited about this whole direction in general. I think that community has a bit of egg on their faces, in particular for dismissing AI as a nothing-burger, and I don't want to be too hard on them, because it's not a unified community and a lot of people have made correct calls. Um, but the other problem is that it's not clear that the general public or decision-makers should trust these superforecasters on the scale of, like, five or ten years, and on very big, open-ended questions that are sort of about the entire course of civilization, and not just, like, is this war going to happen by this date or something like that. So my attempted answer to this, a side project I've been working on a little bit, is trying to build a historical series of datasets where we can train an LLM on the state of world knowledge up to a certain date, like 1920, 1930, 1940, and then build a set of questions. Like, you know, in 1920, we can ask: what would a 1930 historian say is the biggest thing that we're missing, or the policy we would wish we had implemented, or some open-ended thing like, what's the important thing we should be worried about or orienting towards, or even just, what's the headline going to be in 1930? Uh, we can in principle actually train models and run this backtest through the last 100 years or so, and then ask that same model, with the same scaffolding, trained on 2025 data, what the world's going to look like in 2030 or 2035. We'll at least be able to point to this track record and say, oh, at least in terms of, like, the economy or wars or some aspects, the models were pretty robust, and here's what they're saying about the future. Surely we can all agree that this is a good starting point for the conversation. Something like that. This is about as far as I can see forward in improving our society's ability to act more agentically. >> This would be training a new model from scratch on data only up until, say, 1920 or 1930, because otherwise you would have a bunch of data about the future that you don't want in there. But [laughter] there isn't that much training data, is there, if you only go up to 1920? >> It's really rough. Yeah. So the models will get worse the further back in time you go. It's also not clear that you'll be able to make them nearly as smart as the big labs can make a model in 2025. I mean, obviously we can bring some of the tricks back from the future and do, like, reinforcement learning with verifiable rewards and have them still do math or even programming. It doesn't really leak anything from the future if you have a model do programming, maybe with, like, the lambda calculus or whatever.
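To make the backtest loop described here concrete, the sketch below is a minimal, hypothetical harness, not anything the project has published. All names are invented for illustration; `coin_flip_forecaster` just stands in for a model whose training corpus is truncated at the cutoff year, and scoring is a plain Brier score on yes/no questions, leaving out the open-ended, hindsight-judged questions where most of the real difficulty lives.

```python
# Hypothetical sketch of a historical forecasting backtest: a "cutoff-trained"
# forecaster is asked questions posed as of cutoff_year and scored against what
# actually happened by horizon_year. Names and data are illustrative only.

from dataclasses import dataclass
from typing import Callable


@dataclass
class BacktestItem:
    cutoff_year: int        # the model sees nothing after this year
    horizon_year: int       # when the question resolves (e.g. cutoff + 10)
    question: str           # a yes/no forecasting question
    resolved_outcome: bool  # what actually happened, judged with hindsight


def run_backtest(items: list[BacktestItem],
                 forecast_fn: Callable[[int, str], float]) -> float:
    """Score a cutoff-trained forecaster with the mean Brier score (lower is better)."""
    total = 0.0
    for item in items:
        p = forecast_fn(item.cutoff_year, item.question)  # P(outcome is true)
        total += (p - float(item.resolved_outcome)) ** 2
    return total / len(items)


def coin_flip_forecaster(cutoff_year: int, question: str) -> float:
    # Stub: always 50/50. A real run would swap in a model trained only on
    # pre-cutoff data, or a 2025 model subjected to imperfect "forgetting."
    return 0.5


if __name__ == "__main__":
    items = [
        BacktestItem(1920, 1930, "Will regular transatlantic passenger flights exist?", True),
        BacktestItem(1930, 1940, "Will a major war begin in Europe?", True),
    ]
    print("Brier score:", run_backtest(items, coin_flip_forecaster))
```

The same harness run over many cutoff years is what would give the track record he mentions; the hard parts, leakage control and grading open-ended answers, sit outside this toy loop.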
Um, so that limited data is a major problem, and I think in parallel we should be trying some sort of forgetting techniques. You know, a baseline should always be: take the best model you have and try to get it to do the task. And I think people will rightly say the model didn't really forget about World War II or whatever, but that is definitely another strategy we should try in parallel. >> Which strategy do you have the most hope in at the moment? >> Well, I guess it's like a sandwich, right? So we're going to have the crappy historical models without much leakage, and they'll just be dumb and making very vague predictions, like, oh, probably there's going to be another war. We have a very basic version of this, and, for instance, in 1930 it predicts that Henry Ford in the '50s is going to make, um, a transatlantic airline, which kind of did happen, right, like a Virgin Atlantic sort of thing. You know, sensible kinds of things that anyone would kind of guess would happen. But we're going to know that that's kind of an underestimate of how well we can probably predict, because the models are kind of dumb. And then we'll have the, you know, fully-fledged models that have been beaten into forgetting some of their knowledge, which will make, you know, too-good predictions. They'll obviously still have an idea that, I don't know, the Berlin Wall was going to be a thing and it was going to be important or something; we can't really do perfect forgetting. And so those will be, like, too good. And so we'll have this bracket, which will be like, okay, we know it's better than this and we know it's worse than that, somewhere in between. >> Could you train on data up until 2010 or 2015 or something and then get a better model, but then have it predict only 10 years ahead? >> Yeah, exactly. I mean, we should be trying all these combinations, and it's not even clear to me what the most important horizon is, like, do we care about five or ten years ahead? For the questions that I think we should care about, I think that's roughly right. But I think there are going to be all sorts of uses for these. So there's a bunch of different people who are independently proposing that we do this, and I think this is just going to be a major activity going forward, and I'm really happy about that. >> Do you think, if the world is sort of speeding up, the pace of change is speeding up because of AI, do you think forecasting just becomes inherently harder? >> Oh, absolutely. Yeah. Yeah. So I do think there's going to have to be some sort of, like, temporal speed-up factor or something like that. And maybe it'll be different in different domains. But that's exactly the sort of thing that we can hopefully get a handle on by looking at the last hundred years and trying to say, oh yeah, one year in 1980 is worth, uh, I don't know, four years in 1920 or something like that. >> Interesting. Interesting. As a final topic here, I would love for you to chat about: what information or knowledge do we need the most in this field of post-AGI, um, studies, you could call it? What are you most excited about? What would you love to see people working on?
>> So, one very basic thing is actually inspecting human values. And this is kind of funny, and you might say backwards, but, uh, I hear a lot of people talking about machine consciousness and moral patienthood and stuff like that. And the basic way forward, I think, is that they want to investigate the machines and say, what is actually going on inside of them? Do they have this or that ability? And obviously that has to be part of the picture, I'm not denying that. But I'm a moral anti-realist, and so to me, what makes me care about another being or their welfare is a pretty complicated function of stuff that's in my mind and also what I learn about reality. So I think we should be more systematically trying to understand what the kludges are. And it's kind of like, if we wanted to understand what makes food taste good: you can spend a lot of time learning about the chemistry of the food, but at some point you have to do a lot of taste tests, and maybe understand the pathways, understand, you know, this molecule actually tastes sweet because it's close to this other molecule, stuff like that. There's no arguing about taste, but I do think more systematic, I don't know, almost like surveys, or study of human values around other conscious beings, will just help us answer the other half of the question, which is: well, what do we even care about? Before we even look at the AIs, what would we care about? We don't actually have to look at them to tell ahead of time. >> Is there anything new you need to do there, as opposed to sort of reading everything humanity has ever written and then extracting our revealed preferences, or what we write down that we care about, from the entire internet? >> Uh, I mean, I do think in principle there's probably enough information already out there. I guess I would >> I'm interested in what new information would be most valuable for you here. >> I think honestly, thinking about this through the lens of game theory. So maybe one thing is, people keep talking about consciousness, and I'm like, hm, it seems like you actually care about other agents that are powerful and are not, like, cooperate bots or something like that. And I don't want to round things off, and, you know, people's moral sentiments are complicated, but sometimes I get the impression that people are just reasoning backwards from: if this thing could basically be a competent game-theory partner against me, then I care about it. And that would make total sense evolutionarily, too, because, yeah, you don't want to care about the cooperate bots, you don't want to care about the defect bots, you want to care about the tit-for-tat bots or whatever. And I'm not sure that that's exactly it, but I kind of suspect there's maybe something along those lines, some theory that would fit enough stuff well enough that you'd be like, oh, this is sort of what's underlying what's going on. Um, again, there's no arguing taste. If you still just say, no, I care about sentience or whatever, then that's fine. But I do think there's probably some more underlying framing that is more illuminating than the language people have been using so far.
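As a toy illustration of the game-theoretic intuition being gestured at here, the sketch below runs a small iterated prisoner's dilemma with standard textbook strategies and payoffs (nothing from the conversation itself): unconditional cooperate bots and defect bots behave the same no matter how you treat them, while tit-for-tat is the kind of agent whose behaviour toward you depends on your behaviour toward it, which is the rough shape of a "competent game-theory partner."

```python
# Toy iterated prisoner's dilemma: payoffs are (my points, their points) for
# moves (mine, theirs). Illustrative only; standard Axelrod-style setup.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def cooperate_bot(opponent_history):  # ignores you entirely, always cooperates
    return "C"


def defect_bot(opponent_history):     # ignores you entirely, always defects
    return "D"


def tit_for_tat(opponent_history):    # copies your previous move; responsive
    return opponent_history[-1] if opponent_history else "C"


def play(my_strategy, their_strategy, rounds=10):
    """Return my total score over `rounds` simultaneous-move rounds."""
    my_hist, their_hist, my_score = [], [], 0
    for _ in range(rounds):
        my_move = my_strategy(their_hist)
        their_move = their_strategy(my_hist)
        my_hist.append(my_move)
        their_hist.append(their_move)
        my_score += PAYOFF[(my_move, their_move)][0]
    return my_score


if __name__ == "__main__":
    always_defect = lambda hist: "D"
    for name, bot in [("cooperate_bot", cooperate_bot),
                      ("defect_bot", defect_bot),
                      ("tit_for_tat", tit_for_tat)]:
        # Only against tit_for_tat does cooperating beat exploiting:
        # your treatment of it actually changes what you get back.
        print(f"{name:13s} exploit it: {play(always_defect, bot):3d}"
              f"  cooperate with it: {play(cooperate_bot, bot):3d}")
```

Running it shows that exploiting the unconditional cooperator pays while exploiting tit-for-tat does not, which is one way to cash out why responsive agents, rather than pure cooperators or defectors, might be the natural targets of moral regard on this hypothesis.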
>> And this, having this new theory of human values, this helps us how? >> Well, it helps us articulate positive visions for the good. So I guess I'll say my claim is that if you have a really good understanding of human values, you will also be able to write really amazing manifestos. Um, like I was saying, that guy wrote his manifesto about how amazing it is to have a human body. Uh, I think that if I had a really great understanding of what people valued, in terms of conscious beings, or even just what makes a society good, I would be able to write some amazing vision of, like, just a day in the life of some future society, and everyone would be like, oh my god, this guy gets it, and I would love to live in that world, and it would be so valuable. And there definitely are some people who can do this better or worse than others already, but I feel like it hasn't been made into a deliberate art yet. >> Yeah, that's actually really interesting. And, do you think, this is a dangerous question perhaps, but could this be automated? Is this something AI could become very good at? [gasps and laughter] >> Yep. Yep. Yeah. So machines will also be able to help us, or do this better than us, at some point. Um, it's one of these things where, well, it's dangerous, because you can sneak different types of values into these manifestos very easily. But also, I guess I'm claiming, to some extent people can recognize good work in this area when it's done well. So I don't know. I'm not sure who I would trust more to do this for me, a machine or a random person. It's hard to say. >> Mhm. Any other things you want to mention here that you would like to see? >> I guess I'll say, me and Jan and Raymond have been thinking a bit more about the boundaries of, like, personhood, and I think there's a huge design space here that's kind of unexplored, of, let's say, social groupings. And we already have, you know, families and nations and religions and lineages and ethnicities, or, like, sports teams. Our world is already super crowded with types of organizations, and I guess I'll say we actually still have a huge design space that hasn't been explored here. And once we have AIs that can make copies and, you know, all the different freedoms that they have, they're going to have an even larger design space. And then also, once we have the ability to make, like, soft copies of humans, there are also going to be all sorts of weird and wonderful new ways of arranging personalities and loyalties that are just completely untouched. So, like, science fiction writers of the world, please come invent the new societal units that we might like to inhabit in the future. >> Any guesses, any concrete guesses, on what those sorts of groupings might be? >> Oh, just for instance, in your day-to-day life, you might end up having a lot of personalities around, like, you could have someone simulating your, I don't know, dead grandpa, telling you what your ancestors would have wanted. Or, I've also been doing a little bit more work on people's relations to LLMs, and I think "a lot of people really want to be told what to do" is maybe something that's showing up. And, you know, we don't want the AI, like, browbeating people, but a lot of people are basically, like, submissive, or they want someone else to have a plan.
So I kind of think that for a lot of people the best way to interact with LLMs might be some sort of good-cop, bad-cop situation, where there's, like, the mean LLM that tells you to clean your room and go, you know, get a job or whatever, and then there's the supportive one that's like, oh no, we don't want to make the mean one mad, so let's clean our room. >> Yeah. Basically, right now we have, like, therapists, um, and a few other sorts of very consent-heavy, peer-to-peer kinds of relationships. And I think those work for a lot of people, but I think there are going to be a lot of people who would prefer something that maybe looks more like being a private in an army, or some person on a quest, and they don't even know what the quest is for. And of course this is very dangerous, right? It's so easy to make cults, basically. Um, but I guess I feel like the way that LLMs relate to a lot of people isn't going to look like this all-in-one confidant-and-adviser sort of thing. >> For people interested in working on these ideas, what should they do? Where should they apply? >> Oh yeah, I haven't taken any PhD students lately, because I'm in a computer science department, and it's like, I don't want someone to come and take, like, databases and stuff. So I guess I'll just say, I don't know, we have a Discord that we made after the workshop, where we're trying to get all the like-minded people together. Um, show up [snorts] on LessWrong, honestly. That's an amazing community right now, and there are just lots of people thinking along these lines. >> Fantastic. David, thanks for chatting with me. >> My pleasure.