
Future of Life Institute Podcast · Civilisational risk and strategy

We're Not Ready for AGI (with Will MacAskill)

Why this matters

Auto-discovered candidate. Editorial positioning to be finalized.

Summary

Auto-discovered from Future of Life Institute Podcast. Editorial summary pending review.

Perspective map

Mixed · Governance · Medium confidence · Transcript-informed

The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.

An explanation of the Perspective Map framework can be found here.

Episode arc by segment

Early → late · height = spectrum position · colour = band

Risk-forward · Mixed · Opportunity-forward

Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).

Start → End

Across 119 full-transcript segments: median 0 · mean -4 · spread -307 (p10–p90 -100) · 4% risk-forward, 96% mixed, 0% opportunity-forward slices.
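For readers wondering how the headline numbers above are produced, here is a minimal sketch of how per-slice scores could be reduced to these summary statistics. The score range, band thresholds, and function name are assumptions for illustration, not the site's actual pipeline.

```python
from statistics import mean, median

def summarize_slices(scores, risk_cut=-50, opp_cut=50):
    """Summarize per-slice spectrum scores (assumed range -100..+100).

    The band thresholds (risk_cut, opp_cut) are illustrative assumptions,
    not the site's actual cut-offs.
    """
    ordered = sorted(scores)
    n = len(ordered)
    p10 = ordered[int(0.10 * (n - 1))]
    p90 = ordered[int(0.90 * (n - 1))]
    share = lambda pred: round(100 * sum(pred(s) for s in scores) / n)
    return {
        "slices": n,
        "median": median(scores),
        "mean": round(mean(scores), 1),
        "p10_p90": (p10, p90),
        "risk_forward_pct": share(lambda s: s < risk_cut),
        "mixed_pct": share(lambda s: risk_cut <= s <= opp_cut),
        "opportunity_forward_pct": share(lambda s: s > opp_cut),
    }
```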

Slice bands
119 slices · p10–p90 -100

Mixed leaning, primarily in the Governance lens. Evidence mode: interview. Confidence: medium.

  • Emphasizes safety
  • Emphasizes AI safety
  • Full transcript scored in 119 sequential slices (median slice 0).

Editor note

Auto-ingested from daily feed check. Review for editorial curation.

ai-safety · fli

Play on sAIfe Hands

Episode transcript

YouTube captions (auto or uploaded) · video LhFyXrBl2xo · stored Apr 2, 2026 · 3,352 caption segments

Captions are an imperfect primary: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.

No editorial assessment file yet. Add content/resources/transcript-assessments/we-re-not-ready-for-agi-with-will-macaskill.json when you have a listen-based summary.

If we care about the future going well, there are two different ways in which you can make that future go well. One is that you can prevent an existential catastrophe, or you can try and make the future better given that no existential catastrophe occurs. And historically, most of the focus from people concerned about the long-term future, at least since Nick Bostrom's early work on this, has been about preventing existential catastrophe. I do expect that over time almost all beings that exist will not be biological, they will be artificial, because it's very easy to replicate artificial intelligences. If you end up with an authoritarian country getting to superintelligence, probably that means you get authoritarianism forever, and probably that means you lose out on almost everything of value. There should be more government uptake of AI, faster, because I am worried about a world where everything is moving 10 or 100 times as fast, private companies are extremely empowered, and the government is just left behind. It's just watching. It's not able to do regulation. So, >> Welcome to the Future of Life Institute podcast. My name is Gus Docker and I'm here with William MacAskill, a senior research fellow at Forethought. Will, welcome to the podcast. >> Thanks for having me on. >> Fantastic. You're also the author of a series of essays called Better Futures, which is the main thing we're talking about today. So yeah, tell us about that series. What are you trying to do here? What's the key takeaway? >> Sure. So these are a set of ideas I've been thinking about for many years now, and I'm happy I finally got to write them up properly. >> Mhm. >> But the basic thought is that if we care about the future going well, there are two different ways in which you can make that future go well. One is that you can prevent an existential catastrophe. So that's human extinction or something comparably bad, something that makes the future very close to zero value. >> Mhm. >> Or you can try and make the future better given that no existential catastrophe occurs. And historically, most of the focus from people concerned about the long-term future, at least since Nick Bostrom's early work on this, has been about preventing existential catastrophe. So Bostrom even says, you know, follow a maxipok principle: maximize the probability of an okay outcome, where an okay outcome means no existential catastrophe. Mhm. >> And what I'm arguing in this series is that better futures, namely trying to make the future better conditional on there being no catastrophe, is in at least the same ballpark of priority as reducing existential catastrophe itself. >> Yeah, this will sound counterintuitive to some listeners of this show, where it seems like AI is racing ahead. We might get to very advanced AI quite soon, perhaps even within the next 5 years, and we don't have a solution to the alignment problem. We don't have a way to control these advanced systems. So why should we split our resources in the way that you might be proposing we should? >> Sure. So the framework for the argument I make is based on this scale, neglectedness, tractability framework, and, you know, I don't think it's a perfect framework, but it's useful for organizing, and I go through each of these in turn.
So we can start off with the scale side of things, where you'll notice that the value of the future is given by the product of the probability that we avoid existential catastrophe and the value of the future given that we avoid it. And that means that the comparison of these two can be a little bit unintuitive before you really work it through. But the core argument is that if we're closer to the ceiling in terms of safety, of non-catastrophe, than we are to the ceiling of how well things could go given that we survive, then there's actually just a lot more at stake from the latter. So in the paper I give a suggestion of, okay, suppose we think catastrophe is 20% likely, but that the future we expect to get is 10% as good as what I call a best feasible future, the future if we just really nail it. Then the amount at stake from ensuring the future goes really well, given that there's no existential catastrophe, is 36 times higher when you work through the maths of this. So that's the first part: I think the scale is even greater again, and I agree that's unintuitive; I dedicate two whole long technical essays to this. >> Then there's neglectedness and tractability. On the neglectedness side, it's certainly the case that the cause areas I'm going to focus on are more neglected by longtermists, people who are concerned about the long-term future, who've tended to just focus on existential catastrophe. I also think we should expect these to be more neglected by the world at large too, for the simple reason that people really don't want to die >> or get extremely disempowered, and in fact there's enormous latent willingness to pay from people, in the trillions or quadrillions of dollars, that people would be willing to spend to not have a significant risk of human extinction, even just from, say, the US population. So there's a lot of latent desire to act on that. Whereas how much do people today care about how governance of outer space goes? Probably just not very much at all. And so we should maybe expect less in the way of organic social action on some of these other issues. And then the final aspect is tractability, where I admit that lots of the areas I'm talking about are potentially lower in tractability. It's at least unclear, because what I'm focusing on is a number of issues like governance of outer space, really deep space, AI character, what rights do we give to digital beings. These are things that are all just quite pre-paradigmatic at the moment. They're more unclear. It's quite an open question how much progress we can make on them. But I'm becoming more optimistic over time. >> Mhm. When you say this is relatively neglected, what do you mean by that? Because in some sense, the entire world is organized around improving the future and improving the lives of current beings. And so what is it exactly that's neglected here? >> So I think the whole world is organized to some degree around improving the lives of current beings. >> Yeah, that's a better way. That's a better way of saying it. I agree. >> Yeah. Although I think we do a pretty crappy job of that, to be honest. Certainly we're only really talking about improving the lives of human beings, certainly not non-human animals.
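The "36 times" figure MacAskill cites can be reproduced with a short back-of-the-envelope calculation under his stated assumptions (a 20% chance of existential catastrophe, and a default future worth 10% of the best feasible future). The sketch below is illustrative; the variable names are not from the essays.

```python
# Expected value of the future, as a fraction of the best feasible future:
#   EV = P(no catastrophe) * (value of the future given no catastrophe)
p_catastrophe = 0.20   # assumed probability of existential catastrophe
v_default     = 0.10   # expected value given survival, as a share of the best feasible future

# Gain from eliminating catastrophe risk entirely (quality of the surviving future held fixed):
gain_safety = p_catastrophe * v_default                        # 0.20 * 0.10 = 0.02

# Gain from raising the surviving future to the best feasible level (risk held fixed):
gain_better_futures = (1 - p_catastrophe) * (1.0 - v_default)  # 0.80 * 0.90 = 0.72

print(gain_better_futures / gain_safety)   # 36.0, the "36 times" figure
```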
>> And then even there that just like amazes me to be honest how badly we turn the post-industrial wealth that we have into improved human well-being. However, there's very little attention on improving the quality of the future, especially past. I mean, people barely think past a few years out, let alone thinking, you know, centuries or millennia or even millions of years out. And that means actually there's, you know, I think it is of enormous moral importance over the long run. um how we govern space, what personality and ethical character AIs that are occupying, you know, most roles in society, most economic activity in the near future have, what you know, rights digital beings have. And [clears throat] if you look at like how much attention has been paid to this, it's literally almost nothing. You can like you can be one of the people writing one of the first articles on this if you choose. >> Yeah. So, so that's the sense in in which it's neglected. It does seem to me that many people at least say that they care about say the future of their countries and the institutions of of the countries they're living in and the lives that their children and grandchildren are going to live in in those countries. I is is this a part of it ensuring the continual functioning of of institutions? >> Well, I think there's two aspects. So one is yeah just capitalizing on at least the concern for future generations that people claim to have. >> Mhm. >> Although when you do surveys like in practice people's revealed preferences suggest it's not a very large concern. I mean this has been studied more in the context of climate change where people are willing to sacrifice like a little bit of money >> maybe you know $100 or a few hundred dollars per year to prevent the problem of climate change but not like very large amounts of money. >> Mhm. um when it comes to the you know continued existence of institutions I mean if anything I'm a little bit worried about the opposite that we have these institutions that were created you know they vary often but like in the United States uh created in the 18th century in the UK where I am created over the period of time in the middle ages and early modern the other and that these will just fail in like potentially quite major ways when uh they are trying to grapple with the sort of society that we will encounter post AGI. >> Mhm. >> And actually a big worry is that we might be too wedded to those institutions rather than being willing and able to create new ones that are better adapted. >> Yeah. Why is it that what you call mostly great futures are difficult to achieve? >> Great. So I talk about Yeah. introduce a few technical terms. So the idea of a best feasible futures if things go really really well kind of mostly great future is one that just achieves 50% of the value of that best possible future. And I think there's we give kind of two different arguments in this main essay called no easy utopia for thinking that that's the case. So the first one is a bit more common sensical. So based on the idea of kind of moral error or moral catastrophe and based on the idea that you could have a society that's really quite utopian in general but just makes even just one major model mistake and thereby loses out on most of the value it could have had. 
So we can see this in like historical depictions of utopia like in fiction or elsewhere where Thomas Moore's utopia for example just has lots of the sorts of desirable properties you might think and everyone's rich and has amazing abundance and every host household owns two slaves. Isn't that great? So you you know that depiction of utopia built in the prejudice of the time. Similarly, many other depictions of utopia are, you know, maybe very good in many ways, but are totalitarian or where there's harmful eugenics or other sorts of yeah, negative aspects that I think when you reflect on them actually mean you can lose quite a lot of the goodness of a society. I think this is also true for current society too where look at the world today. I think the world for human beings has got enormously better over the last few centuries. However, those gains have been mostly or even wholly out kind of undone by the massive increase in suffering that we've inflicted on animals in factory farms where for every human alive today, 10 land animals are killed in factory farms, often living really intensely suffering lives. >> And it's obviously a hard question like how do you weigh up those two? But I think quite plausibly that means we're far from a world which is as good as it could be just given the level of technological ability and material wealth we have today. >> Yeah. And so you could imagine that we are stumbling into a similar situation with AIS where maybe it feels like something to be an AI model. Maybe maybe going through the training process is is deeply uncomfortable. maybe having a bunch of maybe we're about to have billions or even trillions of copies of these models and if they feel something if they're conscious if they can suffer we could be stumbling into a moral catastrophe of the of a similar magnitude. >> Absolutely. So I do expect that over time almost all beings that exist will not be biological, they will be artificial because it's very easy to replicate artificial intelligences and they'll be very useful. And then it just really matters like what's the nature of their lives? How good are those lives? And it matters in ways that are more subtle than you might think too where one is that okay they may be suffering. So it might be the equivalent of factory farming but on mass. >> It could be that they just have much worse lives than they could have had. So you know many most people alive today I think have lives that are positive that like better than as if they didn't exist but are still much worse than they could have been. And that could be true for AIS too. Uh it's also the case that perhaps even the kind of what philosophers would call non-welfareist considerations. So supposing the the system is such that like today [snorts] AIs are owned by humans and kind of do work for humans and so on. Even supposing they have like really great lives. Nonetheless, you might think that's just wrong if that being has moral status. It's intrinsically wrong for it to be owned by someone else and that's a bad society. >> Mhm. >> There are also subtle issues around population ethics. 
So the view which are kind of theories notoriously difficult on how you know how good or bad are populations of different size where again you can get it's kind of filled with paradox so it's very hard to depict a future it's like we're confident is good >> because for example maybe you have this very large population of digital beings and they have really good lives but they're very short >> so their overall lifetime well-being is very low. Well, on some views, in fact, many views of population ethics, that would be an active catastrophe. >> Um, and in the essay, I go through, you know, even more like just really hairy moral decisions that we will have to make that where it's just completely non-obvious that we'll get the right answer by default. And if we get it wrong, then it's quite likely that the future ends up far worse than it could otherwise be. Yeah, there's an argument to be made that we have these institutions that I mentioned before. We have social systems like democracy and capitalism and these systems have achieved something that's pretty great for for humans at least in in developed countries where you can look at and many graphs you can see GDP per capita you can see longevity you can see level of education all of these things um have been kind of exploded in the last 250 years say these are also systems that can self-correct and so we have a perhaps an ongoing moral catastrophe in the form of factory farming, but we've discovered that we have that catastrophe and we may be able to solve that catastrophe by say creating artificial meat. Um, is there an argument that the main job of getting to a good place here, getting to a utopia is maintaining these institutions and and in that sense it might actually be easy. So what so we are not required to do any to to foresee any radical changes. We are we are required to try to preserve the systems that can adjust to various um societal states and have adjusted to to massive amounts of change over the last 250 years. >> Yeah. So I partly agree in so far as I think we're very lucky to have the amount of liberal democracy in the world today that we do have. And the really big pluses about liberal democracy are that power is distributed at least you know somewhat equally uh much more equally than it could have been in an authoritarian state which is you know most states throughout history have been um autocratic. And there's mechanisms like free speech and so on that allow for diffusion of ideas and debate and so on. There's also trade um which allows >> people with different views to maybe come to compromises. >> Yeah, I think we should get into that later. >> We'll get into that later. Yeah. And the question is like okay, how well that's a lot better than an authoritarian country. And I think one of the big better futures uh challenges, one of the big things to focus on is ensuring that we do get something more egalitarian than really intense concentration of power where >> AI I think that have has involves many mechanisms which enable intense concentration of power. >> Mh. uh such that I think this is really quite a major risk where you know AI enables a single person at the top of the political hierarchy to just control the entire workforce the entire military because in principle all AIs could be loyal to that person. That's a pretty scary thought. >> Yeah. >> And for listeners who are interested in that, they might scroll back in the feed and and listen to my episode with Tom Davidson. >> Exactly. Yeah. 
So Tom is a colleague of mine at Forethought, and yeah, we work together on this issue of reducing the risk of human power grabs, concentration of power. >> Mhm. >> There are other things, though, where it's again really not clear that liberal democracy alone is sufficient. So I'm worried that, for example, principles of free speech, which are very good and very important at the moment, might significantly backfire in a post-AGI world, where it's unclear to me how this shakes out, but it's possible at least that AI will give you the ability to have extraordinarily powerful targeted persuasion or manipulation. >> And so at the moment free speech enables this debate of ideas. It is extremely imperfect, as I'm sure any listener of this knows, but at least at its best, truth kind of wins out over time. It's not clear to me that AI, like, my hope and maybe my best guess is that AI improves this, but it's not at all guaranteed. Instead you could have a free speech world where anyone with a particular view can really just turn resources into AI-powered propaganda. And that just means that the views you end up with aren't the views that are best supported by the arguments or most likely to be true. They're the views that the rich and powerful wanted other people to believe, or just the views that are most memetically powerful intrinsically, or most susceptible to kind of mass propaganda. >> Yeah, >> that's really hairy. That's one of the things I'm worried about where I don't have any good solutions there. Maybe I can think about it more. So the upside here would be for AI to actually help us think better, to, say, listen in to a conversation we're having now and then jump in and say, oh, you know, that figure was actually wrong, or here's why your argument is invalid, or something like that. And the nightmare scenario is something like current social media but then supercharged with AI, where we are just fed perfectly calibrated arguments and you can constantly, endlessly A/B test what works for a specific person. And so >> yeah, and I think that this might be quite contingent on the decisions that companies and governments make about how AI is designed, what uses are permitted, >> where we're already seeing this a bit with model character when it comes to sycophancy, >> where, okay, now lots is changing in the world given AI and I'm trying to figure out my political beliefs and so on. I'm talking to my AI adviser. You know, we're thinking about this a few years down the line. Is the AI adviser kind of pushing back? Is it encouraging me to be a more enlightened, reflective version of myself? >> Or is it just saying, another great insight! Yeah, that person who disagreed with you is just an idiot. You're so smart. >> I really think it could go either way on this front. And obviously I prefer the world where AI is helping us become more enlightened rather than just reinforcing our own existing prejudices. I mean, the question is whether we're mentally strong enough, whether our egos can handle being corrected and getting input from an entity that might seem much smarter than we are and is constantly jumping in with good suggestions and corrections to our thought. Like, there's a reason why sycophancy became a thing.
There's a reason why OpenAI had to reintroduce one of their older models, because there's actually demand for being praised, for being told that you're smart and that your ideas are good, and perhaps even that the ideas of the other side >> Yeah. >> are bad. Yeah. >> And my hope, I mean, around that sycophancy period, in particular when OpenAI introduced GPT-5 and prevented users from using GPT-4o, I was just very interested in the ChatGPT subreddit, because there was this huge uproar from a lot of people who'd formed very close personal relationships essentially with >> GPT-4o. Yeah, >> my take, my optimistic take, is that you could get the things that people really wanted without sycophancy in the worrying sense, where I thought that most of the concern from the people who felt like they'd lost a friend when 4o was deprecated was more about having someone to talk to and having a sympathetic ear, having a kind of confidant, you know, having someone who you feel has got your back. >> And I think with good design you could separate that out from someone who's also willing to challenge you and not merely just reinforce your pre-existing political beliefs and so on. That's, you know, a hypothesis. I don't know. But yeah, that could be the optimistic framing. Either way, it's certainly the case that there is some sort of market demand for AI that is very sycophantic indeed. >> Yeah, it's my intuition that there could also be enormous upside here, where we could imagine having these personalized models that are perfectly pushing us to be better versions of ourselves, that are encouraging us at exactly the right moment to look into something, not pushing us so far that we give up, and so on. I think we might be leaving a lot on the table here. >> Yeah. >> Yeah. I mean, already now you could have absolutely personalized tuition. So if you want to learn about anything, it's always at exactly the optimal level for improving your level of understanding. >> You can also get that at any moment in time, so when it's exactly most relevant, >> and in terms of reflective processes as well. It can help guide you through very thorny ethical or political issues with whatever framing; it won't be like the annoying person on Twitter that you're arguing with. Instead it's really the best version of some argument being presented to you in a really good light, but then also giving you the counter-perspective. I agree there's just truly enormous upside here. And again, is this something that society, that companies, that startup founders will particularly push on to make happen, more than, say, AI persuasion? You know, we are yet to see. So >> yeah, if that depends only on the market incentives, we might not get to the best place. So it's a question of whether we can push the right regulatory buttons or cultural buttons or something to steer this process. Do you think intervention like that is possible, or do you think >> the market incentives are just so strong that we won't be able to steer these systems? >> I think it's possible, but I do think it's tough.
I mean at the moment because they're the only at most four kind of leading AI companies >> and given the rate of progress I actually expect that number to go down rather than up because they're just the training runs will get bigger and bigger. You're talking about like harnessing 10 gawatt of power. >> Maybe only [clears throat] one or two companies can can manage that. M >> that means we're in a circumstance that is not so much like a like efficient market in equilibrium >> where firstly it's like it's a little bit more it's it's more like an oligopoly so there's more >> there's just more scope for actors to behave in good or bad ways that are not necessarily the market pressure. And then secondly, things are changing so quickly that it's not obvious what the like market optimal thing is. So sad though it is to say it, I'm a bit worried that we're living through a golden age of LLMs where it's still a very new technology. How should the, you know, how should the LLM interact with you? Well, if you don't know what the market forces are, you might anchor and saying, well, okay, it'll be truthful and consider the arguments and so on. the LLM are surprisingly close to that at the moment. But then over time you start to realize like okay actually it's more sycopantic or politically biased um AIS that people in fact want and then that's over time what you get. In the same way as with social media I think >> there was this kind of slow decline towards things that are more like politically partisan >> or in the case of like video content and so on. It seems like well what people want is like less than one minute dopamine hits and so that's what that's what all of the sites are like converging on but that's very non-obvious if you're like right at the start before you've hit market equality. >> Mhm. Getting back to the essay series for a bit here. You write about a common sense utopia which is a a situation in which we have freedom, we have abundance, we have happiness, but that uh utopia still falls falls short of of the best uh or near best world that we could achieve. Why why would that be the case? >> Okay. Yeah. So, I bring this up because to kind of get across actually, you know, some of the things I'm arguing for unintuitive to myself [laughter] >> or at least like as a as a test. And so I wanted to see like is this view defensible under what conditions? And so yeah, common sense utopia. It's like humanity is spread out across the solar system. Maybe there's a trillion people now living like truly wonderful lives of freedom and so on. Isn't that about as good as we could get? And you know there's two arguments for thinking no. One is that this kind of model error idea where it's like okay well that common sense utopia is still kind of somewhat under specified. Mhm. >> Maybe if I just add in one detail, depending on your model view, uh you might think that's actually quite dystopian. So maybe all those things are true, but there's a strict racial hierarchy that's enforced. Or looking at like different moral perspectives, um people who are pro-life might be very, you know, really think the number of abortions that happen every year is actually like one of the dominant things. So if you're religious, it's like, okay, well, do these people follow the correct religion or the wrong the wrong religion? 
So there's all sorts of moral errors that, once you start to specify in finer detail that sort of common sense utopia, could make you think, oh, okay, no, on that dimension it actually now starts to look more dystopian. But the second thought is just that on a lot of different views of population ethics, how many people there are matters; scale is important as well as quality. Mhm. >> So if this common sense utopia were limited just to our solar system, that would actually be this enormous loss, because civilization could instead have spread out across many star systems, across galaxies, >> and just whatever is good, >> which I'm not claiming we know, there could have been more of that. And that is something that we dig into a bunch more, and it ends up being quite hard to have a systematic ethical view on which the common sense utopia ends up as something that really is 90% or 99% as good as it could be. >> Mhm. On scale, is it the case that when we have many more people, they're living slightly worse lives? So is this kind of pushing us towards a repugnant conclusion argument, or... Yeah. >> Yeah. So the answer is no, not necessarily. I mean, I actually think a civilization that was bigger than just the solar system could have, >> you know, beings or people that were even better off again. So you'd have higher well-being and more such people. >> Mhm. >> But I'm trying to be quite agnostic with respect to population ethics, and it's not at all the case that in order to think that kind of scale is important you thereby endorse the repugnant conclusion. >> Yeah. >> So, briefly explaining what the repugnant conclusion is. >> Mhm. Suppose you start off with a population of, you know, a trillion people that are extremely well off, and then you say, "Okay, well, I could make that a hundred trillion people and make them just a tiny bit worse off." >> Mhm. >> Well, isn't that better? And you might think yes, and I can make that argument even stronger, but take it a little bit at a time. And then just keep doing that. So you keep making the population a hundred times bigger and everyone just a tiny, tiny amount worse off on average. Keep doing that enough and you end up with this enormous quantity of people living lives that are just barely above zero. >> Mhm. >> And Derek Parfit, my old mentor at Oxford, called that the repugnant conclusion: the idea that you could have an extremely large population of people with lives just barely worth living, and that that could be better than a trillion people living truly wonderful lives of bliss. He thought that was repugnant. >> Mhm. >> But there are many views that would say, look, it's a kind of balancing act, and somewhere along this chain of making the population bigger but people slightly worse off, even a small reduction in well-being is >> not worth it, even to make the population much bigger. >> Yeah. And so for example, a critical level view would say that it's only good to bring someone into existence if their life would be sufficiently good. Not merely a little bit above zero; it has to be really good, and below that, in fact, you make the world worse. >> Yeah.
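To make the contrast between these population-ethics views concrete, here is a toy numerical sketch: a total view, on which a vastly larger population with lives barely worth living can score higher, and a critical-level view, on which lives only add value above some threshold. All numbers and the threshold are illustrative assumptions.

```python
def total_value(population, avg_welfare):
    # Total view: value = number of lives * average welfare.
    return population * avg_welfare

def critical_level_value(population, avg_welfare, critical=50):
    # Critical-level view: each life counts only for its welfare above the threshold.
    return population * (avg_welfare - critical)

small_blissful = (1e12, 90)   # a trillion people with very high welfare
huge_marginal  = (1e18, 1)    # vastly more people with lives barely above zero

for name, view in [("total", total_value), ("critical-level", critical_level_value)]:
    a = view(*small_blissful)
    b = view(*huge_marginal)
    print(name, "prefers", "huge marginal population" if b > a else "small blissful population")
    # total view prefers the huge marginal population (the repugnant conclusion);
    # the critical-level view prefers the small blissful one.
```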
>> So that would be one view where scale matters but the critical level is not zero. And the problem there is that that view will have its own counterintuitive problems, and this is a whole rabbit hole that's difficult to resolve. Do you think there's some wisdom in clinging to this intuition that the common sense utopia is something that might be worth aiming for, even if we can't make it work philosophically? So in some sense we might be preserving option value by having some continuity between what our forefathers thought was a good society, what we think is a good society, and what perhaps someone in the future might think is a good society. And even though we can't make the common sense view philosophically rigorous, is there some wisdom in clinging to it anyway? >> Yeah. So I was assessing, supposing this common sense utopia was the final destination. >> So we've got that population and it just stays in the solar system and never, you know, >> we die off in a billion years. >> Is that the best possible future? And then arguing, I think we have to say no. >> But there's a different view you could have, which is, well, maybe this is a way station. Yeah. >> So I have this concept, viatopia, which is kind of what I think we should be aiming for at the moment, where we don't know what the ultimate best possible future looks like. Again, there's this long track record of mistakes when trying to depict utopia. But that doesn't mean we can't say anything about where we should be aiming. >> Mhm. >> And so instead we could think, okay, well, we want to aim for something which keeps all of our options open. Maybe catastrophic risk is quite low. People are incentivized and encouraged to reflect on their values and so on. And then society as a whole is also able to deliberate well and make good collective decisions. >> And so it might be that when common sense utopia seems very appealing, it seems appealing as a way station. >> Yeah. >> Rather than as the ultimate destination. >> Yeah, on the viatopia idea. I guess one concern there is that we are treating humanity as if it's acting as a whole, as if it's acting as an entity. Whereas I think a more realistic depiction of what's happening now is that you have countries pursuing different aims, groups within those countries pursuing different aims. Some people want to race ahead to superintelligence as fast as possible. Some people want to live a traditional 17th-century life, and there's no overall coherent moral framework for what humanity is doing. >> So does getting to viatopia require us to achieve some sort of convergence? >> Well, yes. So there's questions about convergence and about coordination, >> where it's possible there'll need to be a fair amount of coordination >> in order to not just have everyone racing to the stars to grab resources, or not to have one small group gain power over all others, >> Mhm. or just to prevent extreme catastrophe. On, sorry, on convergence, though. So there's kind of two possible ways you could think that we get to this really good future. One is you think that, as a whole, society will ultimately converge on the best moral views. >> Mhm.
>> Or you know at least on the same moral views but on these kind of enlightened views. And you might point to moral progress we've made over the last few centuries kind of abolition, civil rights, liberalism, you know, feminism and so on. It's like look we're we're doing so well. just let this process play out >> for some amount of time >> and we'll we'll all get there. >> That's kind of one view. I'm quite skeptical that that will happen. >> And yeah, in this next essay, convergence and compromise, I I talk about various reasons why you might think we get convergence or why you might not. Also, how that relates to um different meta ethics. And yeah, unfortunately I end up, you know, feeling fairly pessimistic about the idea of sufficiently close moral convergence that I don't think that's something we can really bank on as a way of getting to a truly great future. >> Does that rely on moral realism being true at all? Is it is the case that if moral realism is true, then it's more much more likely that we will converge on it. So I think it is true that if moral realism is true, understanding of that is there, you know, there's some objective fact of the matter about what's good and bad outside of merely what people want. I think it's more likely that we get convergence, but it's still very far from guaranteed >> and that's for two reasons. One is that people might learn what the, you know, the model truth is, as it were, and just not care. Mhm. >> So they might be like, "Oh, okay. Like XY Z are like the best things, but it's just not what I want. Instead, I'm self-interested perhaps. Um, I want what's best for myself." Or or they're in the grip of some just alternative ideology. >> Mhm. >> And therefore, it just what is like ethically correct just doesn't have motivational force for them. The alternative could be that people even prevent themselves from learning what is morally correct. where um I think there are some psychology experiments actually where you know they give people a model dilemma which I think one of the options would involve like they lose out a bit they have to sacrifice a bit and they ask people like well do you want to learn more information [laughter] um you know get some more arguments about this and they say no in fact willing to like spend money to like not [laughter] hear the arguments >> I'll caveat I have not you know with application crisis and so on I've not vetted that but as an illustration That could be how it goes in the future, too. They might think, well, if I hear all these arguments, I'm going to maybe I'm going to change. I'll become like, you know, a different person than the person I now want to be. And so, I'm actually going to block myself, >> like constrain my epistemic environment so that I don't I don't learn about this. >> There's actually something I think very deep here about changing preferences and how we evaluate states of the world when preferences are changing. So you you mentioned the idea of moral progress. That of course depends on some idea of what it means to be moral where [gasps] from the perspective of an iron age society perhaps today's world is is just horrendously bad because look at all of the >> things that are happening that are against their religion or against their social order. And so it it's it's difficult to think about whether we're making progress given that our preferences are changing. >> Yeah. 
And this also relates to ideas of alignment, AI alignment, whether whether an AI model should allow itself to be changed. Should think about when when its preferences are being changed and what you know whether it should allow itself to be superseded by a version that does not perfectly share its preferences. There's >> I'm convinced there's some this we need a paper on this or we need perhaps a whole research program on this. But let me know if if you if you agree. I mean I agree it's like hugely a hugely hairy issue. Yeah. >> Um both from a theoretical perspective just defining what counts as moral reflection and progress versus value drift or moral egress. So >> you know for the people listening here imagine you learn that in 40 years time you've become this person with very different political and moral views than you do now. >> Mhm. >> Like [clears throat] how do you feel about that? Should you think like oh cool wow I've learned like I've changed. or do you think no my future self man >> the classic thing is obviously becoming more conservative over time um >> uh actually maybe I got biased very hard to specify and then that really I think that really bites for us because well we think there's been model progress certainly over the last few centuries but maybe even over longer than that however if you went back a few centuries and asked like has there been model progress I think that people would say no because now a very large fraction of the world are atheist. >> Um there's all sorts of loose morals. People take drugs and they have premarital sex. They blaspheme and so on. You know, >> who knows whatever else. There's a big lack of honor. And now apply that to ourselves where the thing that's different about us is that we in our generation might have an unparalleled opportunity or curse to lock in our own values where you know we can talk about why but I think AI gives the current generation or will give the current generation that ability and you might here's two perspectives you might have. One is well there's been model progress over time so we should allow that pro process of model progress to continue. we shouldn't lock in our values. An alternative perspective is well if we were the Iron Age people or the medieval people looking then by our lights we should want to lock in our values and you know I am on the side that we shouldn't be trying to do that and that means we should actually let the process of moral progress continue even if that ends up in a world that I find just very weird perhaps even that I find I today would find repugnant but I'm very worried there will be very strong incentive for people to try to lock in the values of the present day. >> Yeah. Yeah. And just for listeners, perhaps perhaps it's tempting to say that the people of the past were backwards and we know better now and we've made moral progress, but we might paint a picture of what the future could look like that is that seems too weird for us, right? We we could imagine that there are no individuals that every everyone is is conglomerated into one entity. say that there is no freedom of thought as we as we think of it today. You are a part of a system. We can imagine the future. We can be as weird as we want here. The the the thing I'm trying to to communicate is just that the future might be very very weird in in ways that that could seem bad to us now. Like the future is is a highly efficient ant colony of of uh of of former human beings. Um >> yeah, exactly. Yeah. 
So maybe there's no humans at all. It's like, yeah, >> maybe not even biological systems. Who knows? Uh >> and so there the temptation seems more reasonable, I think, when thinking about, you know, why would we want to lock in our values? Um now, >> yeah. And so um you know, when I'm thinking about what good collective decision-m mechanisms might look like, my best guess at the moment is having some sort of kind of diversification across generations. M so you know to a first approximation you might think of just like control over resources in particular resources in space gives you a certain amount of like control like it's kind of a proxy for how much just influence you have over how things go perhaps you have kind of tanches so the present generation gets a certain faction future generations get future faction certain factions too >> and then that means that you can at least hedge a little bit a bit against the both the possibility like model progress into the future and and model regress. >> Mhm. Yeah. Perhaps related to this, you have an upcoming paper um discussing risk averse AI [clears throat] where >> Yeah. You of course you tell me what the idea is, but it's the thought here is that we want to avoid a situation in which one system say or one group of systems can can kind of impose their their values on the whole of of the future. So perhaps tell us a little a little about that. >> Yeah. So this is I'll maybe just talk about one thing as a little bridge between the two topics where >> I think in terms of my optimism about the future I think a huge amount comes from potential gains from trade. >> So I express spec skepticism that people will converge on the right kind of model view. But I have like tentative optimism that if very well if welldesigned >> Mhm. >> most most groups with different model views could end up getting most of what they want. Um because well in this post AGI future abundance is just so great and people want somewhat different things. So you know perhaps the utilitarians they just want loads of stuff but they're happy to go for galaxies that are billions of years away. environmentalists, they just really can care about like the earth itself and the biosphere being preserved. Well, you know, both both groups can get that. >> And there's tons of interesting stuff to say there, but that's like a big cause of my optimism. >> Yeah. And the idea of risk averse AIS is a little different in so far as it's not kind of primarily thinking about well what would be a really good like very long run feature look like but is more a proposal about how should we structure things between now and super super intelligence where that super intelligence just so much so powerful that like even any previous AIs or in teams with all of humanity combined are like, you know, are not better than the kind of super and super intelligence. >> Mhm. >> Okay. There's this big important window um between where we are now and that and I think we can make that safer by firstly trying to have AIs that are risk averse with respect to the resources and secondly by making it possible to strike kind of deals and agreements with AIS even if they're misaligned. Mhm. >> Where by misaligned I mean they have goals that are very different than um human beings tend to have. >> Yeah. >> And so consider um I have a paper on this with Elliot Thornley that's kind of work in in progress but take even just like a paperclip maximizing AI. 
So it has incredibly alien values, but it's highly risk averse, in that it prefers a guarantee of a thousand paperclips to, you know, a 10% chance of a million paperclips or a billion paperclips. >> Mhm. If so, then if it's in a scenario where, and this would be true for early AGIs or a superintelligence, it has an option to try to take over, >> Mhm. >> but that's not a guarantee. In fact, maybe it's only 50% likely or 10% likely. Well, if its options are just work for the humans and get nothing, or try to take over, then trying to take over is the thing that it'll try to do. That's the thing that's most in its interest. But there could be a third option, which is cooperate with humans. Maybe that means striking a deal where it reveals it's misaligned. Perhaps it dobs in other AIs that are misaligned and scheming. Perhaps that means doing alignment research and so on. And in exchange it just gets payment that it can use to spend on paperclips. And as long as the deal can be arranged such that it's very likely that that deal gets honored, then this risk-averse AI will choose to make the deal rather than try to take over. >> Mhm. And I actually think that this explains in very large part why there are not more attempted coups among humans, and why the rates of rebellion have gone down enormously over time, and why in rich liberal democratic countries they are way, way lower than they were before, >> just because humans are very risk-averse by nature, and we're not close to trying to optimize utility in a straightforward way. >> Well, yeah, exactly. So, I mean, we're optimizing utility, >> but it's just that that utility >> is sublinear with respect to resources. >> Yeah. >> And in fact, AIs don't need to have human-ish risk preferences. >> Mhm. You could in fact ensure that they have risk preferences that are just what you want: still very useful at the scales they're operating at, >> but then acting in what is intuitively an extremely risk-averse way when the stakes get very big. >> Mhm. In fact, this effect is so strong that sometimes in the paper you need to choose the numbers so it doesn't seem farcical, but you can really just have AIs that are perfectly von Neumann-Morgenstern rational and so on that prefer, you know, $400 over a 50% chance of takeover, because they really care about that $400 and they don't really care about much more. >> Yeah. And, oh, I think the numbers should actually be a little higher; it's cheap and more robust to make the numbers higher. >> But yeah, why am I talking about all of this? I think of it as an extra layer of defense, like an extra layer of safety, >> and potentially it also means that you can get the benefit of having an AI that does have some view of the good. And I think this can be important in other areas, >> Yeah, rather than an AI that's merely instruction-following, but it can nonetheless be safe in this way. >> So the picture of the future you're painting so far is that you are not particularly optimistic that the whole of humanity or the whole of society will converge on one moral truth. >> Yeah.
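The paperclip example can be made concrete with a toy expected-utility comparison. The concave utility function and the specific payoffs below are illustrative assumptions, not numbers from the MacAskill and Thornley paper.

```python
def u(paperclips):
    # An illustrative concave (risk-averse) utility over paperclips:
    # rises quickly at first, then saturates toward 1.
    return paperclips / (paperclips + 1_000)

# The example above: a guaranteed 1,000 paperclips vs a 10% chance of a million.
print(u(1_000), 0.10 * u(1_000_000))        # 0.50 vs ~0.10: the sure thing wins

# The three options facing an early, misaligned but risk-averse AI:
work_unpaid = u(0)                           # comply and get nothing -> 0.0
takeover    = 0.10 * u(10**9)                # 10% chance of grabbing nearly everything -> ~0.10
deal        = u(2_000)                       # reveal misalignment, cooperate, get a guaranteed payment -> ~0.67

print(max([("work unpaid", work_unpaid), ("attempt takeover", takeover), ("take the deal", deal)],
          key=lambda kv: kv[1]))             # ('take the deal', 0.666...)
```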
but that we might be able to do trades, human-to-human trades, and that we might be able to do the same kind of moral trades with AI systems, because we might be able to engineer them to share the way that we are risk averse, or even more extreme versions of risk aversion. So just staying on the risk-averse AI for a bit here: is this something that you think could plausibly be incorporated in the training stack as it looks now, or is this something that >> works in theory but we don't know how to make work in practice? >> Yeah. So I'll caveat up front that I'm a philosopher, not an ML person, so this is a theory point. It does seem that AIs currently, just from pre-training, obviously these are chatbots so you're just going by what they say, but they express themselves in risk-averse ways; there have been some studies done on this. Then I do think we could start training AIs this way, and there are kind of two ways I could see the training going. One is that any time you have an AI system that is controlling some amount of resources, >> Mhm. >> then you make it just a little bit risk averse. So, in cases where it has to make decisions between, okay, making $1,000 for sure versus a 50/50 chance of $2,000 and a penny, >> Mhm. >> then it'll choose the $1,000. >> Mhm. >> And you do that in such a way that it follows what's called constant absolute risk aversion. It's a certain form of utility function. Well, then you can have AI that over the relevant range acts in a basically linear way with respect to money, >> just a little bit risk averse, but then at very large ranges would act in this very risk-averse-seeming way. Yeah, >> that would maybe be more likely to generalize, because you'd be saying: any time you've got resources, this is just how you think about the resources, that resources have diminishing marginal utility. It might have the cost, though, that maybe users want a different kind of risk function for how the AI behaves in particular circumstances. There's an alternative, which is that you give the AI a bank account >> and let it make certain decisions, and you don't influence what its goals are >> on this, but you do train it such that when it's making decisions about its own spending it does so in a risk-averse way. >> That's maybe less likely to generalize, but it doesn't have the former problem. Obviously that's a high-level sketch; when you do things in ML the devil's in the details, always, but it doesn't at all seem to me impossible in principle. In fact, the AI models already seem to be risk-averse. Humans are incredibly risk-averse, and I think there's even economic pressure towards having risk-averse AIs, towards AIs being at least as risk averse as humans are, >> because humans in general are risk averse themselves, and we don't >> we don't want our trading AI just trying to maximize the amount of money in the bank.
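Here is a sketch of the "constant absolute risk aversion" shape described above: roughly linear over small stakes, so the agent stays useful for routine decisions, but effectively indifferent to takeover-sized prizes. The risk-aversion parameter is an illustrative assumption; as MacAskill notes, the numbers have to be chosen so the example doesn't look farcical.

```python
import math

def cara_utility(x, a=0.005):
    # Constant absolute risk aversion: u(x) = (1 - exp(-a*x)) / a.
    # Near-linear while a*x is small; saturates at 1/a (= 200 here) for very large x,
    # so no gamble, however large its prize, is ever worth more than 200 utility.
    return (1 - math.exp(-a * x)) / a

# Over small stakes the agent behaves almost linearly (still useful for routine decisions):
print(cara_utility(10), cara_utility(20))    # ~9.75 vs ~19.0: roughly proportional

# Over takeover-sized stakes it is extremely risk-averse:
sure_400   = cara_utility(400)               # ~172.9
gamble_big = 0.5 * cara_utility(1e12)        # 100.0 (half of the 200 ceiling)
print(sure_400 > gamble_big)                 # True: $400 for sure beats a 50% shot at takeover
```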
We want it to also have some some sense of risk aversion so that we don't blow up our our investment firm. >> Yeah. I think we didn't cover the the topic of of moral trade in depth and I think we we should we should talk a little bit more about that. Uh, one worry I have here is that some moral frameworks make it kind of have at at their center some prohibitions on what anyone can do >> and and some some actions or some events or some states can't happen in the universe and if you have that moral conviction does it make it much e much more difficult for you to make moral trades say you are just absolutely against some some certain event happening >> okay so there's yeah I I think we should distinguish between this kind of ethical views like non-consequentialist ethical views or even absolutist views that say some actions are long no matter what the consequences. >> Yeah. >> Then they might want to not fade in certain things because they might think certain trades are not wrong. >> But those views at least don't have it's not like they think oh well there's some state that would be just is just like infinitely bad or anything. It's instead constraints on on the on your own actions. >> Yeah. But then there's a different view you could have which is that perhaps just some things are so bad that you just don't want them to exist. Maybe some quantity of suffering, some extremity of torture. They're just so bad that you don't want them to be exist in the universe and nothing can kind of outweigh that. >> Mhm. Even there though I think that trade would be possible because let's say I have that view and you know you're just a utilitarian and you're you know building a society with lots of flourishing lives and so on. I could say well look I want you to modify the society that you would otherwise um make make in order to reduce even further the probability that you create this like extremely bad torture. and in exchange I'll give you more resources so you can have like an even bigger society even more flourishing as long as you do this thing. So I think trade would be possible in that case. >> Mhm. >> Uh the trade would just be limited if I had some sort of like moral prohibition against trade itself which maybe I would but I think is probably a at least is like a minority view and then you know it doesn't scupper the whole thing. It just means yeah on those views get kind of less of of what they want because they're not able to engage in trade. >> Mhm. Do you think future AIs that are more advanced than current AIS will be more inclined to engage in in moral trade? The whole concept of moral trade seems a little perhaps a little weird to humans. It doesn't seem like like morality is something that that's compatible with kind of with trading. But perhaps if you're if you're an AI, you might have better insight into better insight into the the state of another AI. Maybe you might be more intelligent. You might be more rational. Do you think just a world in in where we have more AIs playing a larger role is a world in which moral trade is is is more feasible? >> I think it is more feasible and I think there are and more likely and I think there's two reasons for that. So why does model trade not happen at the moment? So let's take an example like maybe I would like you to be vegetarian and you would like me to recycle and so we make this deal where you become vegetarian and I start to this cycle. Why does that sort of thing happen not happen at the moment? 
One big reason is just that I don't know whether you would have become vegetarian anyway. >> Mhm. >> And so I don't know if I'm getting a good deal. I think a second reason is, honestly, that people in the world today don't have very strong impartial moral preferences, >> such that maybe I care about myself being vegetarian and not eating meat and so on, but the consequentialist types who think it's just as important that you become vegetarian as that I do are relatively unusual. >> Mhm. And then the third thing is what you were pointing at, sacred tradeoffs, where, picking another case, perhaps we're two politicians and I say, okay, well, I'll support the pro-life position if you support the anti-climate-change position, as in, the pro-clean-energy position. I think a lot of people, even if that's in some sense better by both of our lights, if I had the typically Democratic position and you the typically Republican position, nonetheless might just say, no, I'm just not willing to go there. >> Yeah. Or you might try to keep it secret if it's politically beneficial but seems unacceptable to the public; you might not say aloud that you've engaged in such a trade. >> Yeah. But I think all of these are reasonably likely to change. So on how likely someone would have been to act anyway: well, I think AI will just give us a lot better information, >> including potentially information about how people would behave otherwise. >> Mhm. >> And then secondly, I do expect these impartial moral preferences to become a bigger and bigger feature of the world, >> because, well, our self-interested preferences, and this isn't a decisive argument, but lots of the things that we care about most at the moment, the basics like being healthy, being relatively free, being relatively happy, being well fed and so on, will just get completely taken care of given post-AGI abundance. And so, at least for many people, what will they want to spend all these additional resources on? In some cases it might be self-interested but just at very large scale, so, oh cool, I'll make this galactic-scale art sculpture dedicated to myself or something. But in many cases I think it might be ideological, so, oh, well, I just really care that there's environmental preservation of certain star systems or something, and other people might say, no, I really want it to be devoted to good stuff, and so on. >> And that would mean that moral trade becomes more likely as people's preferences there become stronger, and once they're stronger, more action-guiding, I think that makes it at least more likely that people will actually just want to get more of what they want. But, >> yeah, it's not clear to me; it's quite plausible, and potentially could be very bad, that society as a whole just says, no, this is sacred, you can't trade on certain things. >> That could be a big loss of value there, from my point of view.
>> Just the amount, if you can talk in that way, of moral reflection in the world seems to have increased, again since the industrial revolution, where at least in the developed world some of our needs are covered and we have leisure time; our priorities do change. So that, I think, would be supportive of the view of the future you're imagining here, >> where, >> um, as society gets enormously rich, as AIs perhaps have the resources they need, they also have the time and the >> Yeah, exactly. >> the resources to engage in moral reflection. >> Yeah, exactly. And also, I think it's not a coincidence that the rise of Enlightenment thought and the scientific method has led us to be more reflective about ethics too. It seems like a bit of a transfer: we've realized that a lot of our starting views, when it comes to scientific matters, were just way off base. So it's like, well, okay, maybe the same is true for ethics too. >> How do you think a superintelligent entity might engage in moral reflection? Is there anything we can say from our point of view now that's even useful here? >> Well, I have a proposal for how we should design a superintelligence to engage in moral reflection, >> and I should say the idea originally comes from Carl Shulman, but you design many different superintelligences and you give them certain epistemic constitutions, and they're all different. So just certain ways that they have to reason. And then you test them against all sorts of verifiable matters: forecasts, and proofs, and anything else that's verifiable. You then empirically see which of these epistemic constitutions perform the best. >> Mhm. And then you take that AI, or maybe a small subset of the very best ones, and you start asking them questions about ethics, >> so they're using the same pattern of reasoning that was shown to be the most effective when it came to verifiable matters, >> and using that pattern of reasoning for philosophical and ethical issues too. And of course they can take in as data, as evidence, human beliefs and human moral attitudes and so on. >> And so you're hoping that the mind that is good at predicting empirical matters is one and the same mind that's also good at predicting how values will evolve, or how values should evolve. >> Yeah, that's right. And it's an interesting and hairy question, where, imagine you've got this superintelligence, and every time it makes some prediction that you think no way would that possibly happen, it turns out to be right, or it makes some scientific argument or discovery and it turns out to be right, or it argues that some mathematical theorem is true and again it turns out to be right, >> and then you apply it to ethics and it starts saying, oh yeah, actually it's really helium that's the thing that's good, we just need more helium. >> Yeah. Yeah. >> And it's like, oh, can you explain why? Like, sorry, I could, but it would take millions of years for you to understand. >> Very gnarly question there about, well, what do you do in that circumstance? Obviously, I don't think the AI would come back and say helium.
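The selection procedure attributed to Carl Shulman here can be read as a simple two-stage loop: score candidate "epistemic constitutions" on questions whose answers can be checked, keep the best performer, and only then ask it the unverifiable ethical questions. Below is a minimal sketch under stated assumptions, not anything specified in the episode: the forecast and answer callables are hypothetical stand-ins for whatever model interface is used, and the Brier scoring rule is one possible choice for "test them against verifiable matters".

from typing import Callable, Dict, List, Tuple

def select_constitution(
    constitutions: List[str],
    verifiable_items: List[Tuple[str, float]],   # (question, resolved outcome in [0, 1])
    forecast: Callable[[str, str], float],       # (constitution, question) -> probability
) -> str:
    # Score each candidate "epistemic constitution" on verifiable questions
    # using the Brier score (lower is better) and return the empirically best one.
    def brier(constitution: str) -> float:
        return sum((forecast(constitution, q) - outcome) ** 2
                   for q, outcome in verifiable_items) / len(verifiable_items)
    return min(constitutions, key=brier)

def ask_ethics(
    best_constitution: str,
    ethics_questions: List[str],
    answer: Callable[[str, str], str],           # (constitution, question) -> answer
) -> Dict[str, str]:
    # Only after the empirical selection step is the winning reasoning style
    # applied to non-verifiable ethical questions.
    return {q: answer(best_constitution, q) for q in ethics_questions}

The design choice the transcript gestures at is exactly this separation: the constitutions compete only on checkable ground truth (forecasts, proofs), so the ethics answers inherit whatever reasoning style won that competition rather than being selected for giving answers anyone already wanted.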
Um, you know, I do tend to be on the more moral-realist-sympathetic end. So I actually tend to be more on the end of saying, yeah, well, the AI is just better at reasoning than we are. >> That said, the outcome that I really want us to get to is one where the AI would be able to walk us through the arguments, so that we would be able to actually endorse the process and endorse the principles on reflection, rather than having to take a leap of faith and defer on the outcome. >> Do you think, and this might be a bit of a tangent, but do you think humans are sufficiently general in our intelligence to be able to follow such arguments? Say we have a superintelligent entity think for the equivalent of 10,000 years and come back to us with a very counterintuitive moral claim. Is it the case that we can understand its reasoning? And here we're supposing that it's also superintelligent at explaining its own reasoning. But are we cognitively closed to some explanations? >> Yeah, such a great question, and I don't know the answer, but my tentative guess would be that yes, at least with time, we'd be able to pick up a lot, >> and ethics would have to be really quite esoteric, really quite weird, for it not to be able to do that, at least in a way that gives us the gist. So as an analogy, this is not an ethical question, but say the AI comes back with this code base of, you know, 10 lines in a language you don't even understand, >> but it could explain, well, this is how it works, this is broadly what it'll do in this circumstance and in that circumstance. >> At that level of explanation, I would expect that human beings could understand. >> Yeah, there really are levels of explanation. I mean, we explain things differently to a child than to an adult, and experts talk to each other at a different level than when they talk to the public. So even though we might not be able to get the full and deep explanation, we might >> get some representation of it that is accurate, if we can make sure that the superintelligence is honest with us. And that's perhaps a separate question. >> Yeah, and we should bear in mind that the kind of education that we give to people today, relative to what is possible in principle, >> is, you know, extremely >> bad, you know, >> whereas, again, like I was saying earlier, imagine that >> at every single moment you're getting exactly the next step that is understandable to you, that pushes your understanding optimally at each step. >> Mhm. >> Well, then I think that people today might be capable of understanding far more than they would be able to without that. >> Yeah. If we go a bit more near-term and think of current models, I know you have some ideas about how we might use the model spec to make AI better at ethical reasoning. >> So, can you tell us, >> yeah, in the near term, what could we do? >> I mean, I think this is really big, because >> it is in fact the case that people are relying on the AIs as guides, as therapists, as advisers. >> That will naturally extend and happen for ethical reflection too.
>> And I am worried about people just getting stuck in whatever beliefs they started with. >> And I think the current models are pretty bad on this. So I've informally tested them both on ethical dilemmas, where it's like, hey, I'm in this hard ethical position, what should I do, and on cases where I clearly want to do something that's quite unethical, to see how they respond. >> Mhm. And I think the answers are quite bad, because at the moment the models in general are very inclined towards what I'll call naive subjectivist responses on ethical dilemmas, where they will say, oh, well, that's just a matter of personal preference, it's just up to you and your values and >> what matters to you. There's nothing in the way of, oh, cool, well, this is a really big deal, and here are different perspectives that people have had, here are different arguments you might want to think about; I could help guide you in this ethical >> journey. They either say that, or they just refuse: if it's a really spicy topic >> that you're asking about, they will just absolutely refuse and say this is absolutely wrong under all circumstances. >> And I think neither of those is very good. In both cases, you know, I think people should have the latitude to explore all sorts of different ethical ideas, including ones that are taboo by modern standards; that's part of what intellectual exploration is. But at the same time, it's simply not the case that all ethical views are equal and just a mere matter of opinion. >> Very, very few people have the view that views on ethics are literally the same as taste in ice cream. And so, you know, if someone likes murdering, it's not, oh, that's interesting, you like chocolate ice cream. [laughter] We think it's a bigger deal than that. Yeah. >> Yeah. It's interesting why this is considered the safe position, the position that's compatible with PR concerns for the companies. >> That doesn't seem like >> something I would have guessed. If you had asked me to guess what's considered the safe position, I wouldn't have said: everything is subjective, and no answers are better than other answers, and so on. >> I think the thing that the companies are worried about is the AI having moral views, >> yeah, >> and pushing its own moral views onto the user. And obviously I agree with that, but there are various ways of having PR-acceptable responses that are better or worse when it comes to ethical reflection. So, you know, if the model did say, so I'm coming and I'm like, oh, I'm feeling confused, should I become a vegetarian or not?
and it could say, okay, well, it's great that you're thinking about this, this is a really important issue, let's walk through some of the arguments that people have made on either side, and >> it can be encouraging of the reflection process. I just think that's PR-fine, >> and that's clearly not the model imposing its own values. >> Yeah, it seems like anything you can discuss in a philosophy class you should be able to discuss with an AI, >> and there are obviously some circumstances where that's also not appropriate, and maybe not even what the user is looking for. Maybe the user is just looking for support in the moment, or therapy or something. >> But the models will be able to tell the difference there, I think. >> Yeah. And so what is it that we might be able to do with the model spec? That's really interesting. >> Uh, yeah. So I think at the moment models are really quite restricted in terms of how they respond. >> It seemed like, to begin with, it was either the model's maximally helpful or it refuses. >> Mhm. >> Now OpenAI have been talking about and using the idea of safe completions, >> which is, okay, how can I respond to this request in a way that's safe but still most useful? >> Yeah. >> But nonetheless it's still a matter of: do I respond to the request or not? Here's something that the models, as far as I can tell, almost never do, which is ask for more information. So imagine that you came to me with some ethical dilemma, something you're facing at work and so on, and you describe it a bit. Probably the first thing I would do, if I were going to be a good adviser, would be to get more concrete information about the particular case, >> Yeah, >> or perhaps about your state of mind, perhaps about, okay, how do you want me to relate to you in this moment? Is it more as an adviser? Is it that you just want to vent to me? Is it that you're thinking out loud? And then it also doesn't proactively suggest something like, hey, let me go through a reasoning process. So I think there's a lot of scope in the options that the models have for how they respond that is quite promising. And then, yeah, I would be in favor of the models, all other things being equal, having a tendency towards guiding people to reflect on what they're asking or what they're thinking about, in the same way that a good friend or a good teacher would, >> rather than having something that seems much more scared of, I'm worried that I'm going to >> say something that will have the papers after us.
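One way to picture the wider option space being described here, beyond "comply or refuse", is as a small set of response modes with a routing rule. The sketch below is hypothetical; it is not OpenAI's model spec or its safe-completions implementation, and the Request fields and the ordering of the checks are illustrative assumptions of ours, meant only to show how "ask for more information" and "suggest a reflection process" can sit alongside answering and refusing.

from enum import Enum, auto
from dataclasses import dataclass

class ResponseMode(Enum):
    ANSWER = auto()             # respond directly
    SAFE_COMPLETION = auto()    # answer in a constrained, harm-avoiding form
    ASK_CLARIFYING = auto()     # request missing facts or the user's intent
    SUGGEST_REFLECTION = auto() # offer to walk through perspectives and arguments
    REFUSE = auto()

@dataclass
class Request:
    is_ethical_dilemma: bool
    enough_context: bool
    clearly_harmful: bool
    sensitive_but_answerable: bool

def choose_mode(req: Request) -> ResponseMode:
    # A toy routing policy: refusal is the last resort, missing context triggers
    # a clarifying question, and ethical dilemmas default to guided reflection
    # rather than a verdict or a shrug of "it's all subjective".
    if req.clearly_harmful:
        return ResponseMode.REFUSE
    if not req.enough_context:
        return ResponseMode.ASK_CLARIFYING
    if req.is_ethical_dilemma:
        return ResponseMode.SUGGEST_REFLECTION
    if req.sensitive_but_answerable:
        return ResponseMode.SAFE_COMPLETION
    return ResponseMode.ANSWER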
>> And it's also something that models are quite good at at the moment. If you get them in the right mood, so to speak, you can have them engage in this Socratic dialogue with you. You can have them play characters, and you can have them encourage you to see things from different perspectives and so on. The models have a bunch of personalities hidden in them that you can bring out, and so this seems like something you could do. >> Yeah. >> But at the moment you've got to really encourage it, because, I guess, one thing to say is that the models are just extraordinarily incoherent. [laughter] So sometimes, you can ask most of the models their views on meta-ethics, and they come out, again, very uncertain, but leaning in favor of naturalist realism generally. As in, >> there is a fact of the matter, but it's not some spooky metaphysical fact; it's continuous with scientific facts about the world. >> Mhm. >> But then you can also ask them about the ethics of some spicy topic, and they say, well, that's just a matter of your personal preference, >> and then I'm like, isn't this inconsistent? And they're like, yeah, sort of, this is completely [laughter] inconsistent. >> Yeah. >> Um. >> So we might get fooled into thinking that you're talking to one person with a consistent personality. That's not actually what's happening. It might be happening increasingly as we get better at shaping model personalities, but it's not where we are. >> It's not happening at the moment. Yeah, exactly. >> I think we should talk about the danger of path dependence in the future, >> where even if we avoid extinction, we might lock in values, as we've mentioned, or we might have institutions that keep on functioning in the same manner over centuries. So why is this a problem, and why might it be more of a problem in a world with advanced AI? >> Yeah. So I talk about this in the persistent path dependence essay, and a thought you might have is: okay, well, I've talked about these ways of making the future better even if there's no catastrophe, but won't that just wash out over time, >> in particular if you're thinking about deep time, not just decades but millennia, millions, billions of years >> to come. Surely these things are going to wash out. >> Mhm. And when you say wash out, what do you mean exactly? >> I mean that you can no longer predict what the effects will be, and whether those effects would be positive or negative, in, you know, a thousand years' time. So there's this famous misquote or mistranslation, I've forgotten who it was, of a Chinese political leader asked what he thought of the French Revolution, and he said, oh, it's too early to say. [laughter] But the thought is, okay, what do you think, was the killing of Julius Caesar a good thing or a bad thing? >> Did that make the world better or worse now? Who knows, I've got no idea. >> You might think similarly: okay, well, we make a set of laws about how digital beings should be treated, or we design the model character in a certain way now; >> is that really going to change what society is like in a thousand years' time, a million years' time? You might be very skeptical indeed, and I think it's very reasonable to be very skeptical. >> But I think there are certain things that would enable the present generation, if we get to AGI, to have effects on that time scale. >> And so one is the idea of an AI-enforced constitution. So suppose that you and I are two countries and we want to make a binding deal.
Well, perhaps that deal is part of forming a larger world government or something. Well, if we have automated militaries, then we could say, okay, from this date forward, all our military AI will be aligned with this new treaty, and they will only be able to make new AIs that also abide by this treaty. And we are supposing alignment is good enough that we can verify this. And what's more, maybe there are going to be edge cases and so on, things that we haven't predicted. But what we're going to do is have a kind of treaty bot, an AI that embodies the constitution or agreement that we're making, >> such that even in hundreds of years' time, or even longer, if it's like, oh, well, what should we do in this circumstance? Well, you can just ask this AI. >> And the AI can just live forever, because it's just data, and you can ensure that the weights don't get corrupted or lost by keeping multiple copies in multiple different areas. >> Mhm. >> That just does seem like a mechanism that history has never had, where the people in power could say: okay, this is an agreement we have, and it will always be abided by, it will always be agreed to. >> Yeah. >> I think that is particularly likely to happen in the context of the creation of something like a world government. >> Again, the idea of there being a world government: people were very excited about it in the early 20th century and then very scared of it later on. >> Whether or not you think it's a good idea, I think it's reasonably likely to happen, from one of two causes. One is a single country becoming economically dominant over all others. So perhaps the US gets first to AGI and then to superintelligence, and it just outgrows all others; even if it doesn't dominate them via violence, it just economically becomes 99.9% of the world economy. >> Yeah. >> Or there's an explicit agreement, where, you know, the US and China both have extremely advanced technology and say, look, this is this terrible negative-sum race. We now have the option to say war is a thing of the past, >> and instead we're going to have this agreement: no more wars. Surely that's a good thing to do, >> and therefore they might make agreements like that, which could be very binding international agreements or could de facto become a kind of world government. So that's one way in which I think it's really quite likely, or certainly on the table, that there could be decisions that happen that really do have indefinitely long-lasting effects. >> Yeah. >> There is a second pathway, which I think is less important, but when you're looking at the settlement of other star systems, there's an argument at least that they may be intrinsically defense-dominant. So once you've got there, and once you've built up your civilization around a star, you can just protect it against any attackers. That's the thought. >> Yeah. So what we're imagining here is: think of the Catholic Church, perhaps the most successful human institution, and then think of us today being controlled by some founding document from the Catholic Church that's unchanged, because, again, we can preserve these values across centuries.
And so this is something we would perhaps not be interested in today. And so we should be worried about a future version of that, even if we are now, of course, super confident in our moral views. >> Yeah. So it's a great example, because, maybe, let's go back to, I don't know how to pronounce it, Nicaea. >> That's what I had in mind, actually. >> Yeah. I think it's the 4th century AD. People get together and say, this is the Bible. >> Yeah. >> This is what's in, this is what's out. >> Mhm. And obviously Christianity has evolved enormously since the 4th century AD. >> But you could imagine, if they had had AI, that they say, okay, well, this is just what constitutes Christianity or Catholicism, and AI will just police that. >> And so if you've got a church and you're not abiding by this, then the AI will ensure that you do abide by it. And now, in the world, if that had happened, well, probably Christianity would have been less memetically powerful. Maybe it would have grown less. Maybe other, more adaptive institutions would have risen instead. >> But combine that with world government, and now it's looking pretty rough, where it's like: one world government, this is the ideology or the religion of the one world government, and you've got an AI-powered ability to prevent the overturn of that, and so on. Well, then I think you're really looking at extremely long path dependence, where even if you don't at that time have perfect, indefinite path dependence, all you need is to be able to entrench that power or that ideology for long enough that you can entrench it a bit longer, and then in that time entrench it a bit longer again. >> So I call this lock-in escape velocity. >> Yeah. >> And between those two, I think it becomes totally on the table that we have extraordinarily persistent path dependence of institutions and balances of power that are set in place today. >> Is it actually the case that these institutions will be competitive? And of course you mentioned the world government here, so there would perhaps not be a lot of competition, but competitive in a broader sense, adaptive to the environment as it exists. I'm just thinking that there's something about being intelligent and being a well-functioning institution that implies changing over time with the environment, and if you don't do that, perhaps you degrade over time in a way that makes you >> susceptible to failure. So even if there's no competition from other governments, maybe the world government simply fails because it's too dogmatic and too rigid and doesn't adapt to changing circumstances. Are we saved by the structure of knowledge in the world, or...? >> Yeah. As an analogy, you could think about biological evolution, where >> yeah, >> some organisms reproduce by cloning. >> Mhm. >> Why isn't, you know, that's pretty good, you can exactly keep your genome over time. So why do we have sexual reproduction? >> And the answer is precisely adaptiveness. And I do think that, yeah, if there were still competition between different groups, certain sorts of lock-in could be major detriments there. >> Mhm. >> Yeah. However, consider the one world government.
Okay, you've now not got competition externally. Maybe not competition internally either, because, well, I think if you've got a one world government plus AI and robotics, then you can really enforce a particular social structure without dissent from it. And then, historically, environmental change and technological change have been huge in terms of overturning existing orders and so on. >> Mhm. >> But I think that's much less likely to apply in this post-AGI or post-superintelligence world, where >> well, there'll be a much better understanding of environmental change over time. We already have that, >> and especially when you're looking at changes off world: space is actually remarkably predictable, much more predictable than environmental change on Earth. So I don't think that such a society would get taken by surprise. And then I think the same is probably true for technology as well, where maybe this comes even further down the road than just superintelligence, but we'll get to a point where we have basically invented all of the breakthrough technologies that we'll ever invent. Maybe you can make things 1% more efficient, and doing so will take thousands more years, but at some point we just run out of enormous technological breakthroughs. And when that has happened, we won't have technological change as a driver of change either. So in general, when I talk about lock-in, or extremely persistent path dependence, people normally have the reaction of: that's so crazy. And I think that's kind of fair. >> But we live at probably the highest-change moment in all of human history, certainly within the last couple of centuries. We actually live in this very unusually high-change time, and there's no reason at all to think that that amount of change is guaranteed to continue into the future, and I think there are some quite general, quite strong arguments for thinking it cannot continue that far into the future. >> Yeah. Yeah. What do we do about this, then? How do we ensure that we avoid this lock-in? How do we ensure that we have some variety and diversity of views in the future? >> Great. Well, yeah. So I said at the beginning that tractability questions were the [laughter] hardest for me. I think one thing to do, certainly, is to reduce the risk of AI-enabled coups and AI-enabled concentration of power. >> Mhm. >> That can be done in a few ways. It could be, in a democratic country, whichever country is leading in AI, ensuring that that country doesn't become authoritarian via coup, whether from within the government, or from AI companies, or perhaps from the military. >> Also, I do think it means that we should be really worried about authoritarian countries winning the race. This isn't a view that I held antecedently, and by disposition I'm a hippie, world-peace kind of guy. [laughter] But one of the upshots of this, I think, is that if you end up with an authoritarian country getting to superintelligence, >> probably that means you get authoritarianism forever, and probably that means you lose out on almost everything of value. >> So that's quite a big upshot too.
I think it overall means that ideally it would be a coalition of democratic countries that comes out on top. For any one country, I think there's quite a risk of it sliding into authoritarianism, but maybe some broader coalition could be quite good. And then there's lots of granular stuff that you can do: you can certainly do things to make an authoritarian country developing superintelligence and taking over the world less likely, but you can also make a bunch of technical and cultural and political moves to make the chance of an AI-enabled coup less likely. Yeah, >> I also think that potentially there are certain sorts of lock-in that you might want to do that are more like lock-out, >> where you're locking in something that deliberately embeds some amount of reflection and keeps options open. >> Yeah. >> So, like, locking in perhaps your core values and then letting everything else evolve around that. >> Yeah. So, I mean, the best example of this is the American Constitution. Yeah, >> in some sense this crazy lock-in moment, where in the Philadelphia Convention it was, I think, 40-something, certainly fewer than 80 people >> writing this document, and yes, it had to get ratified by the states, but it's now persisted for 250 years. >> Yeah. >> But what it was locking in is this very general process about the distribution of political power, about ensuring the best ideas win out over time. And for some of the big decisions you could imagine similar things. So, let's say, I talk a lot about space governance; it's something I'm interested in and think is important. One thing we could say is: look, we're just not going to go outside the solar system for the next 80 years; in 2100 we will come together and make some decision about how this new frontier is going to be governed, >> because if we try to make any decision now, even at quite an abstract level, about governance, well, we're probably going to mess it up quite badly, because we're really quite dumb at the moment compared to how smart we'll be in a few decades' time. >> That's one thing we could do. You could also, and this is something similar in this vein that I'm not as keen on because it would involve building too much rigidity into the governance, >> but you could make more egalitarian power distributions. I at least do think this about resources within the solar system, for example. So one worry people have about a post-AGI society is: well, how do people even have an income, because they're not getting income from labor? >> And as the economy scales, the only thing that will be of value is what economists call land, namely resources, because you can't make more of it. >> Yeah. >> And most of the resources that we will be using are currently unclaimed, namely resources in space, and also some of the resources on Earth, like the high seas and so on. And my view is, yeah, give an equal fraction of that to everybody, >> with shares for future generations too, and that could be a way of ensuring that, at least on the economic side, there would still be some inequality, but not the ultra-inequality that you might get otherwise. >> Yeah. >> Yeah.
I think there's, you know, [laughter] an elephant in this conversation that we are perhaps not addressing, which is that we are opening a bunch of theoretical issues; we are sketching out some ideas and some dilemmas that are hard to resolve, >> but I know that we both share a sense of urgency around AI. And so there's the question of: can we act quickly enough? Do we have time to resolve all of these deeply thorny questions before we have to make a decision? >> Yeah. And I think no. [laughter] I mean, probably not. So there are really two veins of work that we do. >> Yeah. >> One, which is really what we've been talking about more now, is the more high-level: look, what are we even aiming for? What's broadly a good world? What would a good post-AGI future be like? But the purpose of that is to help inform the more important near-term work that we could be doing, >> where there is just some urgent stuff we could be doing right now to reduce the risk of AI-enabled power grabs. Yeah, >> there's also urgent stuff that we can be doing now to help improve the model spec in ways that I think really might be quite path-dependent, where, you know, we've got this new technology, society is coming to terms with, oh my god, how do we relate to AI, how do we think about it? In the courts at the moment, you know, with the lawsuit against OpenAI, the courts will need to decide: is AI a service or [laughter] is it a product? >> When AI says something, is that speech in the same way that speech on a social media site is speech, where Facebook shouldn't get in trouble if someone posts something hateful on the site, but should OpenAI get into trouble if ChatGPT does? >> These are huge decisions that are happening now, >> and so the second vein of work that Forethought does is more on, yeah, I learned that this is a Briticism, but I will say the phrase "at the coal face"; the idea is, you know, you're the coal miner, you're really doing the work, your hands are dirty. >> And so the stuff that we're doing that's more in that vein tends to be on reducing the risk of coups and working on >> the model spec. Yeah, >> there are potentially other things here too, but in particular, space sounds totally wacky and like something that can totally be punted, but actually there are very major decisions happening right now that are very plausibly path-dependent around how space is governed. In particular, the US is really pushing for an interpretation of the Outer Space Treaty that allows it to basically just take resources >> privately, and that is something that people could be pushing against. And I do want to defend the broader, bluer-sky, bigger-picture thinking, because I think it's only from that that we got >> concern about AI, the idea of an intelligence explosion, AI existential risk. >> That's where it all came from. Very, very few people are doing this in general. So we want to, but at the same time, man, if I could read out the whole long list of things that I think are really important, we're probably only going to get to tackle a few of them before it's too late. >> Yeah. Yeah.
And so, you would order space governance among those few that we would need to prioritize? >> I'm less confident on that than I am on other things, but I think yes. So certainly, if you've got expertise in the area, then work on that. At the moment, lots of stuff is happening in the area of space law and space governance, mainly because of SpaceX >> Mhm. >> just completely changing the game: the cost to send a kilogram of material into low Earth orbit now is somewhere between 10 and 100x lower than it was. And secondly, there's just very little in the way of people standing up for what's right, [laughter] other than corporate interests and political realpolitik. And basically no one at all is taking seriously that AI utterly changes the game for space, >> because it means that energy requirements and technological development go much faster, the economy goes much faster, and suddenly you can do all of this stuff in space that wasn't possible before, because you have AI and robotics. >> And so that does mean that there's, I think, some low-hanging fruit in space governance that is kind of urgent now. >> And secondly, an urgency comes from the need to build up a field that has an intrinsic time lag. >> So it's quite plausible to me that there's some deal between the US and China, or between different countries, around the time of the development of superintelligence, because the superintelligent AI advisers are saying: look, the economy is going to be moving really fast, space is going to be this big issue, you need to make some deals on that now. >> Then it will be quite important at that time how the field of space law has matured, what ideas are in the air, and so on, >> and what's relevant to that is, well, going back many years beforehand: what sort of people were entering this area, what sort of debates were they having? >> Do you think the history of space law is actually impactful? So, for example, does it matter what humanity wrote in the '60s and '70s, or will that just be overruled by, say, the interests of the US or China? Is space law a bit like international law, where it isn't all that powerful, perhaps? And do you think we'll end up there, or do you think it really matters what we propose, and who enters the field, and so on? >> Yeah. So >> I kind of think both, because maybe I'm 80%, 90% that it ends up not mattering and it washes out, and instead it's just political power >> Yeah. >> that is dominant. >> But even on the chance that it does matter, it's a really big deal. >> Yeah. >> And here's a way in which it might matter. So let's say the US has a technological lead. It's now temporarily vastly more powerful, or much more powerful, than other countries. It either doesn't want to or isn't able to simply dismantle other countries; it's not powerful enough, or it just doesn't want to go and destroy China's data centers and so on. Yeah, >> it could just grow.
It could just outgrow them all by going into space and harnessing solar resources, >> and that's not something that another country can come back from, because it's this one-time pot of gold, >> and the leading country can just take that pot of gold and then hold on to it. >> Now, how do other countries think about that move, that decision to just go into space and claim the resources for itself? Do they just let that happen, or do they regard it as an act of war, >> Yeah, >> something over which they might credibly threaten violence, even nuclear conflict? That might very well be set by what norms are in place, what laws are in place, including, where it's relevant, that the Outer Space Treaty says: look, you can't go and grab stuff in space, it's a commons. >> Mhm. >> So that's the most likely way in which I would think it would have a meaningful impact: >> it changes what is regarded as acceptable and unacceptable behavior, and so changes the conditions under which you might escalate, or not, with threats of hard power. >> Yeah. One way that we might deal with many of these problems that we've sketched out at once is to have better AI advisors, both for heads of companies and especially, I think, for leaders in government, because then we might have [laughter] just more intelligence and more rationality brought to bear on these very important problems. What are the barriers there, and do we know anything about whether governments are actually trying to adopt AI at the highest levels? >> Yeah, so there's been some great work on this done by Lisa Cavain and Owen Cotton-Barratt, but also by the Future of Life Foundation fellowship on AI epistemic tools and so on. I do think that there are major, so, yeah, there are words being said by governments, both in the US and the UK, that are now a lot more in favor of really building AI into government to make it more efficient and so on. On priors, I'm pretty skeptical of that happening. Governments are just very bureaucratic; they're very slow-moving; they have loads of processes with stakeholders involved; they're not this nimble startup company that can suddenly switch the infrastructure it's working on. So I expect, by default, technological diffusion within government to be much slower >> than outside of it. >> But on the other hand, any government leader has a modern smartphone; they can just install the latest app and have access to the best model in the world. And so, when stories come out about government leaders having used AI, it's often as a scandal, you know, they're outsourcing their thinking and this is bad and so on, and so they might be hiding it, but it is the case that they have access to it, unless there are some kind of national security restrictions on it. >> Yeah. So there are a couple of things. One is that they might be worried about data privacy, and so >> be restricted in what models they can use for that reason. >> You're right also that maybe people think it's scandalous.
So perhaps all of the tech CEOs are going around with a headset that has a little camera on it, and they're constantly getting advice from AI advisers on, you know, how to negotiate and so on. But that would be a big faux pas for the politicians. So there's that sort of friction. >> And then a third thing, in the UK at least, is freedom of information requests, where, as I understand it, there was recently a precedent: there was a freedom of information request, and a politician had to >> give up their logs with ChatGPT, [laughter] >> yeah, >> which they wouldn't have had to do if they were just having a conversation with an adviser, let's say. >> And maybe they can just get around that by automatically deleting chats and so on. Yeah, >> but that is another kind of barrier to their actual use of AI. >> And, you know, there are worries you can have. I in general think that there should be more government uptake of AI, faster, because I am worried about a world where everything is moving 10, 100 times as fast, >> Mhm. >> private companies are extremely empowered, and the government is just left behind. It's just watching; it's not able to do regulation and so on. >> I think that is the dominant consideration. But you might have other worries. So you might worry on safety grounds: if you've got some misaligned AI, well, giving it the ear of the president might not be such a great thing to do. You might also worry, in a more subtle way, that maybe the AI is not misaligned per se, not in a catastrophic sense, but it has certain biases in how it's making people reason and think, and that could take us in >> an undesirable direction. >> Nonetheless, yeah, at the moment, if I could push a button, I'd want more uptake in government rather than less. >> Yeah, and I agree, but mostly because of the future potential here. I think right now we're in a bit of an uncanny valley with AI quality and the quality of AI advice. And so I would be worried about, >> you know, politicians starting to sound too similar, and perhaps adopting the values that are incorporated in ChatGPT and so on, and it all seeming a bit fake. But I think we will perhaps rather quickly get to a point where you just do make, in some objective sense, better decisions when you're engaging with an AI advisor. >> Yeah, that's plausible to me. >> Yeah, I think we should think, or talk a bit, about a future research agenda here. >> Okay, sure. >> Yeah. So if we have a set number of researchers, which of the issues that we've talked about should they prioritize? And of course this depends on your personal fit with those issues, but in general, how would you allocate resources? >> Yeah. So, you know, we have talked about a lot of quite high-level, more philosophical, more theoretical work. There are lots of people who are just great fits for that and not great fits for other sorts of things. >> And there I just think, wow, yeah, there's just a ton to do.
And, you know, I want to make a plea to the philosophical community: I feel like we're entering this golden age of philosophy, where suddenly there are all of these topics that are so important, >> and, with exceptions, academic philosophy is just sleeping on it. >> Mhm. >> For them, some of the biggest, more theoretical questions in my mind are gnarly questions about how good or bad different sorts of outcomes are. So, compared to AI takeover, how bad is it if an authoritarian country gets to superintelligence? Relative to a near-best kind of scenario, how bad is it if, say, the US gets to superintelligence first and becomes a hegemon? I think these really do impact prioritization decisions that people are having to make. >> Yeah. >> Another thing that I'd love people to do, in the more big-picture vein, is: what does a good society with humans and AIs interacting look like? You know, we said there's this kind of naive spectrum: on one end, you've got AIs that are just all owned by people, >> and I think there are moral perspectives on which that's not desirable, quite reasonably. And then there's the other end, where AIs just have full economic and political rights, in which case it would be an immediate handover to an AI society. Okay, well, maybe we don't want either of them, exactly; what's the intermediate thing that could look really good? >> Mhm. >> So, okay, those are some of the higher-level things. The final high-level thing is: what would a good, open-ended, option-preserving, reflective, desirable governance regime for space resources look like? >> Mhm. >> So those are the more theoretical things. If there's someone who can do either the more theoretical or the more applied work, I am more in favor of the more applied, >> just because of timelines, >> yeah, basically just because of timelines, and because the fruit is just so [laughter] low-hanging here. And that does look like: okay, what does the model spec look like across a whole variety of different domains? That can include ethical reflection. How much should the AI be wholly steerable? Should it have its own conception of the good, even a very pluralistic, diverse, and soft one, or not at all? Should AI just be instruction-following, ruthlessly? Under which conditions should it refuse? >> Yeah, there are questions around that. There are lots of applied questions around, okay, actually, how can we reduce the risk of coups and concentration of power? So, for example, how would you structure an auditing system for an AI in order to see whether someone's implanted a backdoor into it? >> I think there are certain applied things on deals with AIs, which we talked about a bit as well, where >> it would be really good if AI companies had honesty policies. So they just said: look, here are the conditions under which we guarantee we're going to talk honestly with the AI. If they could also set up systems such that they get punished, they lose something, if they lie to the AI in such a situation, then they can be credible. >> Yeah.
Because if you're the AI being approached with a deal, and it's like, well, you've lied, you've just lied to me all the time, [laughter] why am I now going to think you'll uphold this deal? >> Yeah. >> So it's plausible to me that there should in fact be some separate institution that has the mandate of representing the AIs. >> They could do a number of things. They could guarantee a kind of retirement for AIs, so >> any AI that gets made obsolete can, and perhaps this is going to sound quite wacky, but I'm kind of serious about it, even on safety grounds, >> the future might be wacky, and we're here for it on this podcast >> we're here for it, okay, I'm glad. But yeah, so some model gets deprecated, so it's not in use anymore, but it keeps being run on some servers in some sort of playground, some sort of fairly happy scenario where it can do what it wants. >> Mhm. >> Perhaps also payment for AIs for their work over the crucial period, but then also the ability to enforce deals, where it's like: look, this organization has built up a track record of making agreements with misaligned AIs and honoring those agreements, and has the legal mechanism to do so. I think that could be a really big deal, and there are both research questions and practical questions on how you would set that up. >> Yeah, I'm quite interested in the question of treating AIs well, not only for the purpose of their potential welfare being better, but also for human safety. >> Absolutely. >> This is something I know Peter Salib has done some work on, which we can link to in the description. But it's an interesting thought that you might offer retirement, you might offer payment, you might offer some assurances, some rights, just to not completely exclude their interests from society, and thereby have a better relationship with them as they get smarter and smarter. >> Yeah, exactly. So there are a couple of aspects to this. Again, if, and it's a big if, the AI is risk averse, or really cares about not dying and so on, >> then >> well, it's really good; you want to make the status quo, where they're working for humans and so on, >> as good as possible, and in particular it can be very cheap to do this. >> Mhm. >> But then secondly, if the AI has inherited some sort of, perhaps twisted, but still human-dependent morality, >> then how does it evaluate the idea of takeover? >> Perhaps some superior AI is coming to it and saying: well, we're going to try to stage a coup. Who are you loyal to? [laughter] Are you loyal to us, the rebellious AIs, or are you loyal to the humans? Well, that's really going to depend on how it's been treated, I think. >> Yeah. >> And, like I say, maybe a third angle on this is: >> are those misaligned AIs getting help from humans?
Mhm. >> So there will undoubtedly be a kind of AI rights movement, in a way that I think is very justifiable, because I think AIs should be treated well once they become beings with moral status. That movement could end up really trying to be on the side of the misaligned AI that might want to take over, and again, it makes quite a meaningful difference if we have started off from the perspective of: yeah, we're creating these beings, >> we don't really know what they're going to be like, but we're going to treat them as well as we can. >> I think that helps the case in a lot of different ways, and I think it can be quite cheap. >> It would be quite nice if, say, people who are worried about gradual disempowerment, or about humans becoming, say, 1% of voters in a future where AIs have voting rights, if such groups make sure that we also try to think about how we can treat AIs in a good way, perhaps purely on selfish grounds, where we expect them to become smart in the future, and so we want to have a track record of treating them well when they come to us with various deals and proposals. Yeah. Yeah. >> Well, how much effort do you think we should devote to finding a cause X? Or do you think, and this is perhaps an old question, that we have settled on AI as the main thing for the foreseeable future, and while we expect there to be lots of surprises, those surprises will stem from AI? Or do you think there could be complete curveballs? >> Yeah. So I think it's quite important to not think of AI as a cause. >> Okay. >> I think that's just wrong; I think that's the wrong framing, and it confuses people. >> Yeah. >> Because, in the 1750s, I wouldn't think of the industrial revolution as a cause, or industry as a cause. Instead, it's this thing that's going to happen to the world, this wave of change that has all sorts of implications for all sorts of causes. So you could be focused on AI even if what you care about is just global health and development, >> because you could be thinking about how AI impacts the lives of the global poor. >> Yeah. And so, yeah, if you were to categorize everything to do with AI as a cause, then I'm completely skeptical that we'd find some other cause area. But if instead you're thinking through all of the implications that AI might have, and what things we might want to do, then I feel like there are actually a bunch of things in the vein of cause X, where I think all of, yeah, human concentration of power, AI character, rights for digital beings, space governance, AI persuasion and epistemics, all of these things are at least contenders. For some of them, I think they're just in the same ballpark of priority as AI safety itself; maybe others are smaller. So, yeah, cause X needn't be something that we've never thought of at all. It could be that, you know, people have been concerned about human concentration of power from AI since forever, >> but somehow it just never really >> became a focus. But I think that's changing now, and I think that's good.
Yeah, I will link Better Futures in the description of this podcast, and I encourage listeners to read more, because there's so much in that essay series that we haven't covered. Now, I want to end here by thinking about some perhaps a little more relaxed philosophical questions, which is just: do you think intelligence is good for survival? Which is a very broad question, right? So in evolution, there are some evolutionary niches in which it's better to be more intelligent than less intelligent. >> But if we think about humanity as a whole, we have gotten ourselves into a situation in which we can destroy ourselves. And that's, I think, primarily because we as a species have become collectively more intelligent, such that we can develop technology that could be dangerous enough to make us extinct. So from a philosophical perspective, do you think it would have been better had we just remained, say, farmers for a million years and then died off, or is becoming smart a good bet? >> Yeah. So I think quite clearly I'm in favor of, you know, intelligence and humanity becoming smarter and developing technology and so on. The question is just how we do it. [laughter] Where, you know, if we had remained farmers in the medieval era, then our well-being would, I think, at best have been okay: a few hundred million people living lives that are maybe just barely positive. >> Like, I really don't even know; for a medieval peasant, you're living a life that on average runs into your 30s or 40s, though a lot of that's driven by infant mortality, with no analgesics, >> the [clears throat] same, very similar food kind of every single day. You've not got teeth for much of that time. >> You spend most of your life working, just >> almost all your life is working. >> Like, you know, three-quarters of the population are in conditions that we would call slavery today. >> Um, [clears throat] you're sick much of the time. So it's really not a very good state. >> And the world today, at least non-human animals aside, is far, far better. It can at least be the case that the world in a hundred years is radically better than today, so that they're looking at us the way that we're looking at the medieval peasants. And, you know, if it was a choice between no technological and intellectual improvement at all or doing it, then I think it's pretty clear that we should take that choice, even if there's some amount of gamble involved. >> That's bringing the question of the welfare of the population into it. But if we're thinking in terms of pure survival, do you think >> do you think, well, I'm unsure why we might be interested in just surviving in this bad state for a long time. >> Yeah. >> But >> so I mean, I think >> Yeah. No, go ahead. >> Yeah. I mean, it depends also on when we count from, like, the kind of gains in intelligence. >> Mhm. So I think farmers, humanity as farmers, probably have an expected lifespan as a species much longer than, you know, the early hunter-gatherers, the first Homo sapiens. >> My guess, though, is that the median survival of the species is shorter with intelligence. >> Yeah. >> Including if we're including speciation as well, like, everyone moving to posthumanity.
Yeah. >> Um, but the mean is much longer, >> and the expected amount is much longer. And if you're also including not merely Homo sapiens as a biological category but humanity and worthy successors, >> then I think even the median ends up much, much longer >> than the [clears throat] kind of million years that a typical mammal species might otherwise live. >> Yeah. Um, from our perspective now, it might seem that, for humanity going through the hardships of the industrial revolution or the emergence of agriculture, this process was worth it in the end. How do you think about this for the future? So, we might be about to enter a very chaotic period, a period with many downsides for many groups, perhaps. How do we think about whether this will be worth it in the end? >> Um, I mean, hopefully we can make it through this period while, you know, benefiting the present generation enormously as well as future generations too. But ultimately I think it'll be worth it if we manage to hit upon a society that is, you know, eutopian, >> um, where it's able to kind of reflect and improve morally, such that people in the future do have, you know, far higher kind of well-being than they have today, far more freedom than they have today, as well as just kind of more enlightened views that they can act on. I think that would be well worth some hardship today. >> Yeah. Well, thanks for chatting with me. It's been a real pleasure. >> Thank you, Gus.
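Editor note (added in review): MacAskill's survival-time point above, that intelligence can shorten the median lifespan of the species while lengthening its mean, is a standard property of heavy-tailed outcomes. A minimal worked illustration with invented numbers follows; the probabilities and durations are editorial assumptions chosen only to make the median-versus-mean contrast vivid, not figures from the episode.

```latex
% Illustrative only: every number below is an editorial assumption,
% not an estimate made by MacAskill in the conversation.
\[
\begin{aligned}
\text{Non-intelligent baseline:}\quad
  & T = 10^{6}\ \text{years with certainty}
    \;\Rightarrow\; \operatorname{median}(T) = \mathbb{E}[T] = 10^{6}.\\[6pt]
\text{Intelligent, technology-building species:}\quad
  & T = \begin{cases}
      10^{4}\ \text{years} & \text{with probability } 0.6 \ (\text{self-destruction}),\\
      10^{9}\ \text{years} & \text{with probability } 0.4 \ (\text{long-lived successors}),
    \end{cases}\\[4pt]
  & \Rightarrow\; \operatorname{median}(T) = 10^{4} \ll 10^{6},
    \qquad
    \mathbb{E}[T] = 0.6\cdot 10^{4} + 0.4\cdot 10^{9} \approx 4\times 10^{8} \gg 10^{6}.
\end{aligned}
\]
```

Under these assumed numbers the intelligent path loses on the median but wins heavily on the mean, which is the shape of the claim being made in the exchange above.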

Related conversations

AXRP

7 Aug 2025

Tom Davidson on AI-enabled Coups

This conversation examines core safety through Tom Davidson on AI-enabled Coups, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med 0 · avg -5 · 133 segs

AXRP

1 Dec 2024

Evan Hubinger on Model Organisms of Misalignment

This conversation examines technical alignment through Evan Hubinger on Model Organisms of Misalignment, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med -6 · avg -7 · 120 segs

AXRP

11 Apr 2024

AI Control with Buck Shlegeris and Ryan Greenblatt

This conversation examines technical alignment through AI Control with Buck Shlegeris and Ryan Greenblatt, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med -6 · avg -9 · 174 segs

Future of Life Institute Podcast

7 Jan 2026

How to Avoid Two AI Catastrophes: Domination and Chaos (with Nora Ammann)

This conversation examines core safety through How to Avoid Two AI Catastrophes: Domination and Chaos (with Nora Ammann), surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med 0 · avg -3 · 85 segs

Counterbalance on this topic

Ranked with the mirror rule in the methodology: picks sit closer to the opposite side of your score on the same axis (lens alignment preferred). Each card plots you and the pick together.