What Happens After Superintelligence? (with Anders Sandberg)
Why this matters
This episode strengthens first-principles understanding of alignment risk and the strategic conditions that shape safe outcomes.
Summary
This conversation with Anders Sandberg examines core safety questions around what happens after superintelligence, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.
Perspective map
The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.
An explanation of the Perspective Map framework can be found here.
Episode arc by segment
Early → late · height = spectrum position · colour = band
Risk-forward · Mixed · Opportunity-forward
Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).
Across 98 full-transcript segments: median 0 · mean −2 · spread −26 to 10 (p10–p90: −10 to 0) · 2% risk-forward, 98% mixed, 0% opportunity-forward slices.
Mixed leaning, primarily in the Technical lens. Evidence mode: interview. Confidence: medium.
- Emphasizes alignment
- Emphasizes safety
- Full transcript scored in 98 sequential slices (median slice 0).
Editor note
A high-leverage addition to the AI Safety Map that clarifies one important safety bottleneck.
Episode transcript
YouTube captions (auto or uploaded) · video xGM4sUEElCY · stored Apr 2, 2026 · 2,660 caption segments
Captions are an imperfect primary: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.
No editorial assessment file yet. Add content/resources/transcript-assessments/what-happens-after-superintelligence-with-anders-sandberg.json when you have a listen-based summary.
I think we could take a lot of ethical advice from smarter entities, but we might also want to have a debate with them about it and actually build shared understandings. You actually want to weave our preferences and our discourses into this system in the right way. Ideally, we should become a kind of cyborg civilization where we have superintelligence both guiding and coordinating us. If you're below a certain error threshold, you can combine error-prone processes in such a way that you get a new process that has a much lower error rate. I believe something like this might happen with AI. There is a kind of transition in reliability. But once it's reliable enough, you could make this redundant system make the reliability go up enormously.

Welcome to the Future of Life Institute podcast. My name is Gus Docker and I'm here with Anders Sandberg. Anders, welcome to the podcast.

Thank you for having me.

Could you say a little bit about your background?

So, I'm usually presenting myself as an academic jack of all trades. I started out studying computer science and mathematics. Then I took a course about neural networks and kind of fell in love with the brain. So I took neuroscience courses, psychology courses, a bit of medical engineering. And then I ended up in the philosophy department of Oxford University, at the Future of Humanity Institute. So these days, when people ask what I am, I say some kind of futurist, philosopher, something something.

You have this wonderful manuscript called Grand Futures, which is, last I checked, 1,400 pages, where you dig into the physics and the economics of all the different paths that humanity could take to become a much larger presence in the universe. Could you tell us what the status of that manuscript is right now?

So, for a while the manuscript had been resting, because I needed to finish another book, Liberty and Leviathan: Human Autonomy in the Era of Artificial Intelligence and Existential Risk, which feels a bit urgent. We kind of need to figure some of those things out. So the manuscript has been resting on my sofa, kind of waiting for me to finish that lightweight 600-page volume. But the nice part is, of course, that now I have learned how to write better, and science has kept on advancing. So I've been piling up references and things to add. So it's not like Grand Futures is going to be about how the world was back in 2023, when I started on something else. Rather, I'm now rebooting it, which is also very useful, because now I have many more people who can help me actually check that what I'm writing is correct, or at least plausible.

And perhaps you have assistance from AI at this point?

Yes. That was fascinating: when I started, ChatGPT was not a thing. I've been working on this for quite some time, and during the process I sometimes tried: okay, can AI help me develop this paragraph? And the first few attempts were very hopeless: okay, that's just nonsense. And then it became annoying, because yes, that's an answer, but I don't trust it at all; indeed, it turned out to be totally wrong, full of hallucination. It took longer to figure out what reality was than to actually make use of it. Indeed, I discovered that quite a lot of apparently straightforward questions were surprisingly confusing, both to AI and to me. But over the past few months, it's gone from "okay, it's not useful at all" to "it's actually helpful". I wouldn't say that I'm trusting all the results, but it's very good at digging up obscure literature.
Say I have this question about some domain: is there a name for this? It finds the name of the domain, and then, of course, the main papers, and then I can start reading up on that. And once you have a terminology, suddenly you have a pretty useful assistant. So I'm looking forward to this brave new world where me and my human assistants are having AI assistants, to amplify the ability to write, but also to fact-check and style-check and develop these ideas.

I think one starting point here, one that could involve both Grand Futures and your, uh, the other book; remind me again of the title?

Liberty and Leviathan.

Yeah, exactly. One starting point is to kind of ask a concrete question. So on this podcast, and in culture in general, we talk a lot about when we will reach AGI, when we will reach superintelligence. We try to forecast that question, and we talk less about what happens after that point. So for the purposes of this discussion, we could assume that we reach superintelligence in 2030 and then see which paths we could take as a species from there. Of course, this involves physics and economics and sociology, I think, to quite a large degree. But where would you start with trying to forecast what happens after superintelligence? Assuming we have enough alignment, and that it goes well enough, so it doesn't just break the world.

And I think one should recognize that even if AI in itself is pretty safe, you might just shake the world apart at the seams. If it just amplifies human ability to do things, without amplifying human ability to coordinate about safe and sane things to do, then you probably end up with a lot of obvious things that are going to be pursued with great zeal and intelligence. So right now we have an issue with energy in the world. We need better energy sources. We also need to avoid messing up the environment, and we have a lot of insecurities that are quite important, like food. And it's pretty clear that we're going to apply a lot of AI power to that. Maybe AI as self-directed agents; maybe it is just tool AI, or an ecosystem that humans are setting the agenda for. But it's pretty clear that we're going to be pushing for material wealth and welfare. After all, peace and prosperity are kind of one of the key drivers for most human economic activity, and we should expect that an AI-empowered economy is also going to do a lot of that.

So one of the first chapters in Grand Futures is basically: okay, how much material wealth can be achieved? And when I started on that chapter I believed that, okay, this is going to be relatively straightforward to write about, because it's going to be about the physical limits to manufacturing and energy production, and to some extent how much they are compatible with living on a finite planet. It turned out that there were quite a lot of weird economic sidetracks, and indeed even psychological sidetracks, that made that chapter much more unwieldy and much more exciting than I had expected.

Mhm. And what were those sidetracks?

So, when we think about a world of enormous wealth, at first people would imagine golden palaces and as much food as you can eat. That's kind of what our ancestors would have said: oh, that's what to go for. They would of course also have said: oh, I want as much fat, sugar and salt as possible. That's the dream life. And we kind of know that's actually not very good for us. But certainly, being able to make any material object that is useful is something we want to do. We want to solve the fundamental manufacturing problem.
This is where I think Eric Drexler's vision of molecular manufacturing and atomically precise manufacturing, in the long run (which might not be terribly far away if we get superintelligence by 2030), is actually going to transform the world. Even if you say, okay, that's unrealistic, we're just going to have 3D printers and robots and biotechnology, that already produces a world where you can get most stuff relatively cheaply and easily.

The really interesting part, however, is what we mean when we think about wealth. If we think about the lifestyle of the rich and famous, it's not so much that they eat enormous amounts of food and have a lot of cars. They certainly have a lot of cars, but the food is fancy food. And although the villas might be large, it's more that they're elegantly furnished and they have a lot of people waiting on them. That is, of course, what we actually mean by wealth. These days we're quite post-materialistic, in the sense that we want services. We want that massage. We want to have that personal coach. We want to have those assistants doing whatever they do to keep our wealth flowing. So the other part of wealth is, of course, actually services.

Now, the good news is that if we actually manage to get the AI to work really well, we're going to get services very, very cheaply. So that would mean that we could all have that entourage of robot and software servants giving us the massages and managing our bank accounts and doing the legal stuff for us. That sounds great, except of course you get weird side effects. The bad guys are going to have endless lawyers in the cloud and are going to sue everybody. You had better have your own personal digital lawyer to resist frivolous lawsuits. There's going to be a lot of stuff going on behind the scenes that is very complicated.

But the really interesting part is, of course: in this world, who gets to have that villa right there at the edge of the peninsula in Lake Como that has the best view? It doesn't matter that we could manufacture a lot of such villas. There is only one spot that has that view. There are still going to be some things that are kind of a zero-sum game. Even worse, there are social zero-sum games. Who gets to be coolest at the party? Of course, we might have different views on that. Maybe the music star thinks that they look the best, and I think I have the most interesting academic publication. We're both very smug at the party. But there are things that, even in a world of material and service post-scarcity, are going to be limited.

And then we get to the final, most interesting part. Why do we want all this stuff? Well, people would say: well, it makes me happy. And that happiness is actually the real resource we want. And of course, philosophy has been going on for thousands of years about that: that's what we should be aiming for. Not getting as rich as the Athenians; we should be content with what we have got. How do you reach contentment? Maybe you can have your AI life coach give you really good advice. But there are probably also aspects of brain functioning and psychology: not just healing us so we don't have depressions, but actually coming up with better ways of being very, very happy in an effective way. Rather than drooling in a corner in opiate enjoyment, actually being able to have a really fulfilling life.
So that gets to an interesting aspect of post-scarcity: actually bounding our enjoyment of the world. And maybe that means that we actually don't want those gigantic villas and the endless entourages of robots. We're instead going to be living a simple but very, very happy life. I think we're going to see a mixture. We're going to want to have both the golden palace and the good spiritual contentment. But this is going to be a tricky thing. It's not going to be that trivial to solve, even perhaps for superintelligences.

There are, you know, quirks of human psychology that mean we are very concerned about hierarchies and status hierarchies. And these games are, as you mentioned, such that not everyone can be the coolest person at the party. So is there any way to intervene in our psychology, such that we can steer away from zero-sum status games?

I think there are ways. There are certainly some people around who are not in zero-sum status games. There are the nice people who don't care about these things, and presumably, since that is an aspect of how the brain works in them, we could replicate that. We could imagine that we study these saintly people and figure out a way of becoming saintly. There are interesting issues here. One is, of course, figuring out how it works. But I think that is something more advanced psychology and neuroscience would be able to do. We know, for example, in psychology, that there are two different kinds of status. There is dominance, the kind of bullying status, and there is prestige: you're really good at something and people admire you for it. And both of them are a bit rewarding. It feels very nice to know that you're very good at what you do, or, for that matter, to bully people and feel that you're at the top of the pack. However, there is how we react to that: already, when dominant people fall from grace, when they lose some of that dominance, people typically kick them; suddenly the underdogs strike back. It's not very fun to be a formerly popular bully, while high-prestige people, people generally like. And even more importantly, it seems to be this rewarding thing that drives us to want it. You could imagine removing that altogether, maybe with some form of gene therapy or brain manipulation. A world where people don't strive for social status would be a very humble world, and it would probably work extremely differently from ours. It's not even obvious that we would want that.

Because that gets to the other really tricky point when thinking about enhancing humans. I worked a lot on the ethics and social impact of cognitive enhancement. People are generally very happy thinking about, oh, learning languages, better memory: yes. Staying alert longer: yes, really useful. Becoming kinder? In one survey, only 9% wanted to take a hypothetical pill that made them more kind to people.

Perhaps because people are afraid of being screwed over by the world. Maybe you don't want to become kind if you feel like you thereby open yourself up to attacks from a cruel world.

I think that is one part of it. But this view, of people not wanting to enhance things that are fundamental to the sense of self, came across for many other things, like empathy. And I think, in general, what happens is that we're afraid of not being ourselves. I might on one level want to be a kinder person, but it would also change my personality a bit if I took that pill. So I'm a bit afraid of doing that.
It feels like my kindness might be closer to who I want to be than my ability, or non-ability, to speak French. So I think there is this interesting interplay: even if we could offer some enhancements, it's very likely that many people would be reluctant to take them. Of course, we might imagine a society where people say: actually, it's so important, given the tremendous power we have thanks to our technology, that we be more sane and more kind. We can't allow unstable, crazy people to wield these technologies. So you might see some very interesting social struggles about exactly how much therapy and adjustment you want your citizens to have. And this is going to be politics for the 21st century, I think.

Yeah. If we return to the question of the physics and economics of a post-superintelligence society: what you've been discussing here is a bit of the economics, and perhaps what we'll end up doing if we emerge into some kind of post-scarcity state. Another question I'm interested in is this: in Grand Futures you investigate what can happen on very long time scales. If you think about the hypothetical in which we have superintelligence in 2030, how far do you think we can get by 2040? Just because there's a tendency, I think, to assume that if we get superintelligence, there are no limits anymore. But of course even superintelligence is bound by natural laws, the laws of physics and so on. So how much can we expand, and what becomes possible in a decade with superintelligence?

That is a very good question, because the limitations are set by the material conditions of the world and the physics of computation. So first, it's a bit unclear how much energy, for example, it takes to run a superintelligence. Right now people are going on endlessly about how much energy is consumed by data centers. I think it's a lot of energy, but it's not the key problem, because the real question is how many good decisions you get out of a given amount of intelligence, and that's probably not scaling very clearly with energy once you get superintelligence, because you're probably going to get it to find ways of optimizing its energy consumption. We know a human brain runs on about 20 watts of power. So we know we can at least get that much intelligence out of 1 kilogram of matter running at 20 watts.

But the real problem is, of course, that building a new data center today takes a few years. You need to bring the funding in. You need to get approval for the plans. You need to set up the foundations. You need to build the building and install the servers, which need to be shipped over from Taiwan or elsewhere. You need to test it out. How much of that can you speed up if you had really, really smart systems? And it seems like some of these processes are materially limited. If you need to ship something on a boat, that boat is not going to move faster just because the entity that ordered the chips happens to be very smart. Of course, the very smart entity might be a very impatient entity, and might say: okay, I'm going to ship it over by air, or I'm going to invent a cargo zeppelin. But now, again, you have a problem. Okay, I invented a cargo zeppelin; I need to get it approved. The Federal Aviation Authority might have opinions about that. It needs to be tested. That's going to slow things down. And even an AI that is amazingly good at getting bureaucrats to do what they're supposed to do fast is still going to be limited by that.
Even if it could just ignore bureaucrats and just send the robots to do things, there are still limits on how much you can move around without messing up the environment. So, Earth has this interesting property: right now we're consuming a fair bit of energy, and then it turns into waste heat. And waste heat is a very minor problem for us right now. We're certainly concerned about climate change, but that's because we put carbon dioxide in the atmosphere and it changes the greenhouse effect. But the waste heat from all our servers and cars and gadgets just gets taken up by the atmosphere and radiated into space. If we were to increase our energy consumption by a factor of 100, we would get heating no matter what we did with the carbon dioxide. Even without any carbon dioxide, even if it was powered by magical unicorn rainbow power with no other environmental effects, the waste heat from a hundred times more energy-active humanity would actually start heating the world by about one degree. So there is a limit on how much you can do without starting to overheat the Earth. Similarly, if you move large amounts of mass around, you are going to have problems with the environment. So keeping Earth Earth-like requires staying within some limits. These limits, if you have really advanced technology and are really smart about how you use it, are of course much wider than we normally think about in the environmental discussion. The typical environmentalist discussion tends to assume that we need to do fewer things, because that's the only way of saving the environment. If you actually look at efficiencies and so on, you realize that more high-tech approaches can quite often be much better for the environment, if you do them carefully. The problem, however, is that once you start scaling up a civilization, you need to be more and more careful.

So in the long run, I think you will want to put the superintelligence data centers in orbit. People are already discussing that in Silicon Valley, and right now I don't think it's going to be cost-effective, because cooling in space is much harder than on Earth. We are kind of cheating: our waste heat turns into heat in the atmosphere, and then it's radiated away from the top of the atmosphere. If you have your data center orbiting the Earth, now you need to have radiators sending the heat out into the coldness of space, which sounds really good, because space is really cold, except that you need to do this as radiation. Convection from my computer just moves heat away from it very efficiently. But if I need to radiate it, it would suddenly be hard to cool it well enough. And of course, the problem is that the data center is subject to a lot of sunlight from the sun. Great for the photovoltaics powering it, but it also heats things up. So building things that work well in space is tricky.

Yeah. Could you go over that again, just so I understand: why is it more difficult to cool down a data center in space, given that space is very cold?

So, on Earth, the data center is probably going to be connected to a cooling tower, or maybe a lake of cold water, and the waste heat moves into the air and the water, and even from the water into the air, and eventually ends up at the top of the atmosphere and gets radiated away into space. So we can use the entire top of Earth's atmosphere as a gigantic radiator, sending this radiation away into space. If I have my data center up there, I need to build my own radiator.
I can't just have the heat waft off into the vacuum of space, because there is nothing there to conduct the heat away. And the big problem is that there is something called the Stefan–Boltzmann law that tells you how much heat gets radiated away. And it says that it grows with the fourth power of the temperature of your radiator. So this means that a very hot radiator is amazingly effective at radiating away the energy. The problem is that a cooler radiator is less effective. And this is quite often a problem. For example, the International Space Station has this issue. It has sections that are full of people, and they need to be at about 20 degrees centigrade. And now you need to keep them at that temperature and radiate away the waste heat. So now you have a radiator that is about 20 degrees centigrade. That is a very wussy radiator. It doesn't get much energy out. There are other radiators on the space station that are actually cooling some motors and compressors that are much hotter, and they are much more effective. They can actually be smaller than the radiators you need for the human section. So now imagine this data center in orbit. Its big problem is that you normally don't want your chips to be too hot. That's kind of bad. You want them to be really cold, actually, but the colder they are, the worse the radiators become.
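Editor's note: the two quantitative claims above, that a hundredfold increase in energy use would warm Earth by about one degree through waste heat alone, and that radiated power grows with the fourth power of radiator temperature, can be checked with a short back-of-envelope script. This is an illustrative sketch, not from the episode; the ~19 TW figure for current world primary energy use, the blackbody assumption (emissivity 1), and the neglect of sunlight and climate feedbacks are the editor's own simplifications.

```python
# Back-of-envelope checks for the waste-heat and radiator claims above.
# Editorial sketch; assumes blackbody radiation (emissivity = 1), no climate
# feedbacks, and roughly 19 TW of current world primary energy use.

SIGMA = 5.67e-8          # Stefan-Boltzmann constant, W / (m^2 K^4)
EARTH_AREA = 5.1e14      # Earth's surface area, m^2
T_EFF = 255.0            # Earth's effective radiating temperature, K

# Claim 1: 100x today's energy use warms Earth ~1 degree from waste heat alone.
waste_heat = 100 * 19e12                  # W, hypothetical 100x scenario
forcing = waste_heat / EARTH_AREA         # extra heating per square metre
dT = forcing / (4 * SIGMA * T_EFF**3)     # linearized Stefan-Boltzmann response
print(f"forcing: {forcing:.1f} W/m^2, warming: {dT:.1f} K")  # ~3.7 W/m^2, ~1.0 K

# Claim 2: radiated power grows as T^4, so cool radiators need huge areas.
for T in (293, 350, 600):                 # 20 C, a hot coolant loop, very hot
    flux = SIGMA * T**4                   # W/m^2 radiated to deep space
    print(f"T = {T} K: {flux:7.0f} W/m^2 -> {1e6 / flux:6.0f} m^2 per MW")
```

Both outputs line up with the conversation: the hundredfold scenario gives roughly 3.7 W/m² of extra forcing and about one degree of warming, and a 20 °C radiator sheds only about 420 W/m², so a megawatt of waste heat needs thousands of square metres of cool radiator but only a fraction of that at higher temperatures.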
And so something that's quite concerning, I think, is that it will be tempting to build data centers on Earth, even though the more we build, the more these data centers take up of the Earth's surface; and the more, say, solar panels cover the Earth's surface, the less compatible Earth will be with biological life, with human life. So there is a sense in which biological life is fragile and requires a quite specific environment in which to thrive, while server farms, computers, can thrive in a wider range of conditions. Do you think this is a kind of fundamental rift between the interests, so to speak, of a superintelligence in expanding its own power, and humanity's interest in staying alive?

My friend Forrest Landry has made the argument that, in the long run, a technosphere is always going to outcompete the biosphere, because it has this wider range. And I think it's an interesting argument. I don't buy it straight away. I think it depends quite a lot on, well, who's in charge of that technosphere? How does it actually work? What are the economics and goals guiding what is getting built? The reason data centers are so general is that we design them, right now, to work in different environments. So the typical data center, well, that's a big building where people in short sleeves can go around and repair the servers. It can't be too hot, because then it's going to be ineffective. But there are other people working on data centers that literally sit in a container on the seafloor. They have different performance, but they are designed to do that. And we can imagine constructing data centers that are intended for the Arctic, or tropical jungles, or space. So there is a design issue here, but that design change is of course a fast transition. Evolution has taken millions and millions of years to come up with organisms that can thrive in different environments on Earth. But in this case, you just plunk down a bunch of engineers in a meeting room and say: ladies and gentlemen, we want to build a data center that works very, very well in the Taklamakan Desert. And they start designing away, and within a few months there is a bunch of servers standing in the desert.

Now, that rapid ability to change is very different from natural evolution. I do think that we are going to intervene in biology too. I don't think there is anything sacrosanct about biology. So we could probably speed up evolution by doing technological evolution. Indeed, synthetic biology is a good example. And in a world with superintelligence, it should be fairly easy to redesign biology. But that doesn't mean that you can just redesign our ecosystems to thrive if there are loads and loads of data centers heating everything up. That's not how we're going to solve it. In particular, because I think we don't want to transform things too much. We have this conservative view that actually the biosphere should probably be green and blue, and it shouldn't look totally alien. There's probably going to be some pretty interesting environmental discussion about repairing environmental damage we have done in the past, if we get rich enough and flexible enough to do that. And now we're going to have a struggle. Should we restore a large part of the world to how it was before we arrived, or maybe even earlier, or should it be some kind of parkland? You have a lot of options, and these choices are going to be big, contentious political issues.

But the deep issue that you pointed out is interesting. Technology is more open-ended, because it's guided by intelligent action, and intelligence can decide on jumping to somewhere else in state space. Biology works by evolution and learning. Organisms change very gradually. If there is no easy path that is always better for your fitness, a species cannot evolve into another species. But of course, in technology, we might realize that actually we need to switch from lighter-than-air transport to heavier-than-air transport: let's construct an airplane that doesn't work at all like a balloon. And maybe we want supersonic transport. Okay, let's mutate that airplane, not just in the sense of doing gradual changes, but actually radical redesign. So this ability, I think, means that in the long run the world is going to be guided by intelligent agents and what they do, rather than by natural evolution. Except that a lot of these agents are, of course, competing and copying and cooperating with each other. So there is a form of evolution going on, but it's in the cultural space rather than in the biological space. And that means that the world becomes a cultural artifact for the civilization living in it.

It's an interesting discussion to think about what will actually determine the shape of the future here. Is it, say, the fundamental limits of physics, as you explore in Grand Futures, or is it more determined by culture? My kind of naive guess is that culture is what matters most in the short run, whereas perhaps physics is what matters most in the long run: culturally, we will determine what we'll do, say, from 2030 to 2040, but ultimately, over hundreds or thousands or millions of years, if we survive as a culture, we're probably going to explore the various limits of physics in a bunch of different directions. Do you think that view is correct, that culture is more significant in the short run?

I think it might be significant both in the short run and the long run. We're certainly limited by the laws of physics, and they are much nicer for me to think about, because I can say things in a more rigorous way.
The light-speed limit: there are kind of profound reasons why you can't move stuff faster than the speed of light. The laws of thermodynamics are true for us mere humans and for future superintelligences, and even when you cheat around them, those cheats have their own costs. There are limitations, and they place boundary conditions on what you can do in the future. So I can make a very confident prediction that in a million years, the intelligence from Earth is still not going to be in the Andromeda galaxy, because it's very, very unlikely that you can break the light-speed limit. So even those superintelligences, even if they were racing to Andromeda, would still be on the way getting there. But why are we going there? That's a cultural question. It might be because it's a religious goal, a pilgrimage. It might be because it's an art project. It might be because we're competing fiercely for all the resources. And this is culture. And it's extremely open-ended, because it has so many degrees of freedom.

If we look at current society, a lot of the activities we're doing are not terribly bounded by the laws of physics. Certainly, I need to pay my energy bills, but they're not enormous. I don't spend a lot of time working per week just to get fuel. I don't need to spend that much of my resources on that. The amount of resources I spend on getting food to survive is pretty minimal. If I had been a hunter-gatherer, or if I had been a kind of pre-human living in the forest, I would have spent essentially all my available time on that. Now this has changed, because as we get richer in this material sense, we are able to use the other resources for other things that we care about: social interactions, intellectual work, spiritual work. And the end result is, of course, that the activities we do become much more diverse.

A colleague of mine, Karim Jebari, wrote this really interesting article about whether taking backups of civilization, by making refuges in case of a disaster, makes sense from a survival standpoint, versus actually backing up the unique parts of our civilization. And his point was: if you look anthropologically at societies, all human societies are full of their own cultures, but you get much more culture, and it's much more unique, the more well-off you are. Hunter-gatherer societies are certainly different because of the environment, but they're very constrained by the environment. You can't actually lug around a temple and a library if you're nomadic. If you're an agricultural society, you can start building your temples and libraries, and now you have more options for different styles. But a typical Bronze Age civilization will make pyramids, because they're fairly straightforward to make. Once you get to Iron Age civilizations, the options become bigger and bigger. Your civilization becomes more contingent. So I think that a supercivilization, where you have superintelligence and reach the material limits, might choose extremely different things. It might be that it's all very green, and it's aiming at preserving life, or spreading life across the universe. It might be that it cares about the welfare of beings, and is working very hard on making sure all entities it can reach are happy. Or it might be doing science, or competitive economics, or all of the above at the same time, in some complicated political framework.
There's a sense in which, if you look at history over the very long term, or say from the emergence of humanity as a distinct species until today, you get this sense that we are expanding, and that the pace of change for humanity is accelerating. And perhaps you get this slightly religious feeling that we are being pulled towards some destination where we will fulfill our potential. Now, that is in some sense teleological thinking, and I'm not sure how rigorous it is. But do you think that, in some sense, we are being driven to expand by forces we don't fully understand? Do you think we have some kind of cultural evolution where those institutions and those people that drive us towards more growth, more expansion, tend to win out, simply because they control more resources, such that civilization, or humanity, however you want to model this, is pushed towards more growth and a faster pace of change?

I think it's a kind of ratchet effect. On the micro scale, people are running around doing all sorts of things for all sorts of reasons. But of course, people are people. Most of us have somewhat similar goals about survival, social recognition; you can fill in the Maslow hierarchy of needs. But we're also very different. And from that, of course, you get patterns emerging. There is a reason economics actually makes sense. Supply and demand is a pretty solid finding. It's not as trivial as most textbooks make it out to be, but you do get this effect. If you have somebody being more effective than somebody else, generally they can outcompete others in that niche. Except, of course, we humans are very good at coming up with ways of changing the niches. Just because your company is doing really well doesn't help you if I have better lobbyists and make sure that the laws are written in my favor.

That's true. But then you also have competition between countries, where there's no world government; we have, in some sense, anarchy between countries, such that if one country creates a more competitive environment for its companies, it'll tend to attract more companies, and so resources and talent will shift into that country.

Yeah. And the interesting part here is that you have these effects that actually do generate patterns. So saying that it's all totally up to human autonomous behavior is a mistake, because human autonomous behavior generates institutions and patterns. We get markets. We get the various forms of complex institutions that we use to solve coordination problems. I think Friedrich Hayek was very right in that markets embody a lot of knowledge, and our institutions in society are the result of a kind of cultural evolution that generates things. And then, as other economists and sociologists have demonstrated, different institutions can help or hinder a society. And societies that are burdened by really bad institutions tend to have trouble, and they fall behind, sometimes so badly that, okay, you need to have a revolution and add new institutions, usually copying from more successful societies. So at the really large scale, I do think that there are these very big trends. I think they're not coming around exactly because of some kind of strong law of nature; it's a bit like how evolution tends to do optimization. It's haphazard, and sometimes weird mutations happen just out of sheer chance. It's not entirely deterministic.
But on a large scale, you see a general move. If we look at the economic growth in the world for the past 2,000 years, and probably far beyond that, it's been a pretty smooth exponential, with slight wiggles because of the fall of the Roman Empire and the Black Death and the World Wars. That is really impressive, because we're talking here about billions and billions of people acting on their own, but generating these overall patterns. The problem with macrohistory is, of course, that people tend to think that they can predict too much from it. And quite often it turns out that these predictions don't work that well. Predicting the future is surprisingly hard. I think, for example, most macrohistory totally misses existential risk. Actually, if we have a nuclear war, that economic exponential is going to have a pretty nasty break. If it's a survivable nuclear war, the economy is probably going to keep on recovering and we get a new exponential, but it's getting delayed by a few centuries. But if we go extinct, well, that's just the end.

But I also think that the rules can change in interesting ways. And this might, of course, really transform things. Right now, humans are the only intelligent actors. In order to do work, you need labor, and that is humans. But we're increasingly having machines. And right now they're just amplifying us. But it might very soon be that if I need a lawyer, I just spin one up in the cloud. If I need 10 lawyers, well, I spin up 10 in the cloud. And the economies of scale mean that, why shouldn't I run a thousand lawyers, have them give second opinions, and then a hundred lawyers evaluating the second opinions and picking the best ones? Suddenly, the amount of labor being applied to maybe my frivolous lawsuit became much bigger. And that might change how this actually works. It might change the dynamics. It might change the fragilities.

So I think there are big patterns in history. And if we zoom out even more and try to think about what's going on here, I do think that we see life, in a very general sense, starting out simple. It replicates, it uses its environment, it evolves towards being better at doing that. It gradually expands. It might change its environment. Then it gets better at adapting to the environment, and eventually brains emerge. And then, in an instant, these brains take over. And now what is expanding is not so much biological life as intelligence, transforming matter and energy into forms that are suitable for it. And that might, of course, still go in all sorts of ways. But I think we see a phase transition on the large scale of matter in the universe: from inert matter sitting there and just maximizing entropy, more or less, over to more dynamic matter where there are various complex processes going on, to life, to intelligence. And intelligence is interesting, because the goal of intelligence is typically getting desired outcomes, even if they're very low-probability from the start. I can make it likely that I have a bookshelf by buying it online and then assembling it using tools, whereas finding a bookshelf out in my garden would be very unlikely. And I can make use of the fact that not only can humans make bookshelves, but we have set up an entire economy that makes it very easy for me to get one. And if we need to solve a new problem, like curing a disease, suddenly a lot of low-probability events start happening, and before you know it, vaccines are everywhere and the poor virus didn't know what hit it.

Mhm.
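Editor's note: the cold open's claim, that below a certain error threshold you can combine error-prone processes into a much more reliable one, is the classic redundancy argument, and it is also what the thousand-lawyers-giving-second-opinions picture above relies on. A minimal sketch under the editor's own simplifying assumptions (independent errors, simple majority vote):

```python
# Majority vote over n independent processes, each wrong with probability p.
# Editorial sketch of the cold open's "transition in reliability": below
# p = 0.5, redundancy helps enormously; above it, redundancy makes things worse.
from math import comb

def majority_error(p: float, n: int) -> float:
    """Probability that a majority of n (odd) independent voters is wrong."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

for p in (0.3, 0.6):
    for n in (1, 11, 101):
        print(f"p={p}, n={n}: majority wrong with prob {majority_error(p, n):.2e}")
# p=0.3: roughly 0.30 -> 0.08 -> 1e-5, so redundancy crushes the error rate.
# p=0.6: the error rate climbs towards 1; above the threshold, aggregation hurts.
```

The threshold behaviour is the point: aggregation only buys reliability once each component is already right more often than wrong, which matches the "transition in reliability" Sandberg describes at the top of the episode.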
So, this actually leads me into my next question, which is: how do you think superintelligence would disrupt our current institutions? Thinking about markets and governments in particular, and perhaps, to constrain the question, we can think again of a hypothetical scenario in which we get superintelligence by 2030, and then ask what institutions would look like by 2040.

So the problem with updating institutions is that they're full of people who are very much guarding their own jobs. So any change in how you run an institution that means your job description is going to change is going to be resisted fiercely. One of my favorite examples is the pulse oximeter, the little clip-on thing you have on your fingertip measuring your blood oxygenation. It spread over a span of 10 years across intensive care units around the world, and it was unproblematic, because it was yet another beeping device saving lives, and it didn't change the workflow. Meanwhile, laparoscopic surgery, where you don't have to make a big incision, took about a generation of surgeons, because you need to work in a different way. And I think the same thing happens in economics and governance.

So, you get AI. How do you use it? Well, the most obvious thing right now is, of course, that everybody having to write a boring report uses ChatGPT to write the report. And maybe they don't admit it, but most bureaucrats receiving these reports are using LLMs to read the report and summarize it: did they say anything important here? That creeping modification is going to happen on a larger and larger scale. In many cases, this is a very good thing. You don't need superintelligence to improve governance. You could just have systems approving things that should be obviously approved, so you can get governance running 24/7. You might send the more contentious cases to people higher up. Then you end up with this interesting creeping cyborgization of our organizations.

So Max Weber, in his famous view of bureaucracy from the turn of the 20th century, was arguing that rationality means that bureaucracy expands into this iron cage, as he vividly described it, because having this neutral organization is effective and implements the political will well. And he would, of course, say that AI is just continuing this: we're eventually going to end up with an algocracy, where instead of having individual bureaucrats deciding things, you replace them with algorithms that can be totally reliable. And I think that is going to happen, even though the administrators and bureaucrats are going to fight it tooth and nail. You're going to see the lawyers and doctors very clearly lobbying all the governments to make sure that you always need a lawyer and a doctor, rather than an AI, to get legal or medical advice.

But what you also get, of course, is that you can optimize things when starting new organizations. So I think the market is probably going to be the place where you see the most radical changes, because traditional companies are basically bureaucracies. They might be slow in changing. There's going to be a lot of middle management that doesn't want to be replaced. They might want to replace those unruly engineers and art directors and other unwilling people, but they don't want to change, and they are also good at retaining their position, because they're management.
But then you have a competing company, which might be founded by one of those fired unruly engineers who just set himself up as CEO and then has a dozen, or a thousand, or a million very, very smart virtual employees implementing things. And I think those companies are, in the long run, going to win, and the long run might actually not be terribly long. That depends a little bit on how competitive the market is. But if you get superintelligence, that means that I should be able to get super-managers to handle the AI. I might get super-marketers. I might get super-engineers, and I might get super-advisors. So the only thing that I put in was the will to start the company, and maybe some initial capital.

So that might mean that now you get an economy full of entities that are doing very smart things. That doesn't necessarily mean that the growth rate goes through the roof, because quite often you're constrained by how many supplies you can get. Those ships coming in from Taiwan are still slowly making their way over the Pacific, and until you can build the super-fast ships, which is still going to take a few years, you are going to have that limitation in the economy. In Germany, there are still rules that certain contracts need to be signed and stamped in the proper way, with seals. That rule might take years to overturn, even if you had the best lawyers you could get in Germany.

So I think by 2040, in a surprising way, the world might still look somewhat the same. But to borrow a very insightful observation from Charles Stross's novel Halting State, where he describes a future Edinburgh: a character points out that it looks like it did 20 years ago, which is kind of our present, but everything is totally different, because behind the scenes the nervous system is running on a much more advanced internet. It's using various forms of artificial intelligence, and various weird goings-on that wouldn't make sense to us in the present are now part of everyday life. And we already see this, of course, when we see people scanning QR codes and spending time with their phones. Twenty years ago we didn't have that. We have actually transformed how society works without changing the material basis that much.

I mean, this is an observation that I've had myself, and that I've heard others make: the fact that life has actually changed a lot since the year 2000, for example, but it doesn't really feel like much has happened, in some sense, just because psychologically we are so quick to adapt to changing circumstances. Just the internet, for example: we have kind of adapted to that. Do you think there's a chance that we will psychologically adapt to superintelligence in the same way? Such that we will feel like, okay, we now have material abundance, say, or the world has gotten strange and weird in various ways, but we don't feel like much has changed, just because we might have this psychology that reverts to a baseline quite quickly?

I think at least partially that's definitely going to be true. You're going to go out in your garden and see that tree that has always been there, and it has changed a few branches and leaves in the last few years, but you still recognize it as that same tree, except that now it might very well be online and have its own blog. It's just that these transformations also affect us deeply, because we feel deeply unsettled when the foundations of our existence do change.
I'm starting to feel the AGI, in an amusing way, because my academic survival trick is that I know a little bit about almost any topic. I can riff on almost any subject, which is great. I can run around between different departments and try starting collaborations. This used to be something extremely unique, but now, of course, ask any LLM and it can riff on any topic too. My advantage over the LLMs might be smaller than that of some of my more specialized colleagues, which I find absolutely hilarious and also deeply unsettling. There is that pit in my stomach: okay, am I actually going to be useful now? I think I can still be useful for a while, even if Claude and ChatGPT know more trivia about any topic. And that is, of course, because right now I can still kind of ask the right questions. I know what is important. I know how to connect things. But a superintelligence might very well be able to do that. It might actually be better at figuring out what's important. Indeed, I'm very bad at keeping to the really important question; I get sidetracked by fun, curious questions instead.

Now, the really interesting thing is, of course, that in a world where you have superintelligence available, not listening to advice from it would be a very stupid thing. And it starts, of course, when the manager at the company doesn't listen to the advice of the AI. Well, that's going to be worse compared to actually listening. So it's going to be worse for the stock price. So another piece of advice is, of course: fire those managers that don't listen to AI advice. The companies that do that are going to be more successful. You get a gradual takeover, in some sense. The president that doesn't listen to a superintelligent advisor, somebody combining the diplomatic knowledge of Kissinger with the technical astuteness of Feynman: yeah, that is a president that is going to be at a disadvantage compared to the country where the president actually did listen to the very smart advisor.

Do you think the feedback loop with governments is as fast as it is in markets? So there are startups competing with existing companies that can outcompete them. Maybe in some sense you could say there's an analogy, but there aren't really startup countries. Of course, as I mentioned, there's competition between countries, but it seems rather slow, and it seems like presidents and leaders of countries that aren't implementing AI would be able to survive for longer, just because the competitive pressures wouldn't be as strong for governments.

I think the inertia is higher. So I think there are, as you say, few startup countries. You find a few interesting examples, like Estonia, which in some sense is a startup country, because it's a relatively new one and it also hasn't got a too-sclerotic government yet; that tends to happen in human institutions. So one interesting question is whether AI might entrench that tendency, so we might get super-powerful institutions that defend themselves in a very clever way in order not to change, or whether instead this adaptation becomes faster, because of those super-advisors. The first step is that they just give advice, and then you get a feedback loop on how quickly you turn that advice into action. And for most political purposes, you don't need to do that very quickly. Indeed, most governments take in information at an exceedingly small and slow rate.
If we think about America, for example: when people vote every four years, they're sending information about who they want to run the government and what its values should be. But if you think about it in terms of bits per second, it's not that many bits per second. I did a calculation a while ago, and I think it was something like half a bit per second. That's not a very impressive input in terms of bandwidth. Now, in a market, of course, you get way more information from the prices. In practice, of course, a government actually listens to what people say. But you could imagine using AI to get much higher-bandwidth information into government, into markets, into decision-making.

And this might be particularly important when there is a crisis. Most of the time, you want things to follow routine, and you don't really care about making fast decisions, but suddenly, when something hits the fan, you really want to make a quick decision. So there are these wonderful systems in earthquake-prone zones that detect earthquakes and signal to trains to start slowing down. So by the time the earthquake hits them, the light-speed signals have already passed by and told the train to reduce speed, to reduce damage. You can imagine doing the same thing, of course, in a lot of other crisis situations. And at first people will say: yeah, but we will always have a human in the loop. But gradually, of course, the problem is that the human is going to be too slow. And this is again where you see the creeping automation.

We see that in drone warfare. So right now, drones are mostly being flown by people actually maneuvering them directly. But of course there is a lot of control you can just hand over to autopilot flying. And most militaries will say: yeah, and we're definitely not going to want kill orders to be issued by the drone. But you have AI systems that might be finding the right angles and kind of requesting: can I shoot now? Can I shoot now? And more and more of the poor drone pilot's job is basically approving things, and taking the blame if something goes wrong. And I have certainly heard military people say: yeah, of course we don't want to remove people from the loop, but if our adversaries do that, we are totally going to have to do that. And even the perception that maybe they're doing it means that now you're developing a system where you can take people out of the loop.

And while the drones-and-warfare side is the really sinister and scary part, I think this goes for a lot of other domains too. If there is a sudden stock market fluctuation, we already have switches that stop trading if the stock market moves too fast. That's probably a fairly sensible reaction. But you can imagine economic policies that also happen automatically. If suddenly, in the middle of the night, the interest rate does something crazy, maybe you shouldn't wait until you have woken up the head of the central bank; maybe you want some piece of software dealing with that, and then waking up the head of the bank.
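Editor's note: the half-a-bit-per-second figure at the top of this answer can be reproduced at the order-of-magnitude level. The voter count and the bits-per-ballot bound below are the editor's assumed inputs, not numbers from the episode:

```python
# Rough reproduction of the "half a bit per second" estimate for how much
# information a four-yearly national election feeds into government.
# Editorial sketch; voter count and bits-per-vote are assumed inputs.

voters = 155e6                 # ballots cast in a recent US presidential election (approx.)
bits_per_vote = 1.0            # upper bound for a two-way choice (log2 of 2 options)
seconds = 4 * 365.25 * 86400   # one four-year electoral cycle, in seconds

rate = voters * bits_per_vote / seconds
print(f"{rate:.2f} bits/s")    # ~1.2 bits/s as a naive upper bound
```

Real ballots are far from independent coin flips; votes are quite predictable, so the true information content per ballot is well under one bit, which pulls this naive upper bound of about 1.2 bits per second down toward the half-bit figure quoted above.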
Mhm. You raised a quite interesting conundrum earlier, which is: what effect will superintelligence have on the change in institutions? Is it the case that if we get superintelligence, we will lock in existing power structures, existing institutions, so the companies and governments that are currently in the lead will continue to be in the lead for the foreseeable future? Or is it the case that introducing a technology like superintelligence actually changes the pace at which you see new institutions and new power structures? What do you think is the answer there?

So, I don't know. I think certainly existing institutions are going to do their best to remain stable using AI, and there might be economies of scale that make it very easy for them to do so. Military forces have great organizations and enormous resources, and they are going to try to maintain their structure. But at the same time, the underpinnings of how society works might also change radically. We have seen how politics has changed over the last few decades because of the introduction of the internet and social media, and that change was not something that politicians were asking for. It's not like any institution said: okay, great, social media is going to be good for us to maintain our grip. Of course, currently you can imagine politicians and influencers saying: yeah, social media are great, we understand them, we want them to remain useful, let's prevent other media from emerging. The really interesting question is whether the world is big enough, and has enough openness, that new things can emerge that unsettle the existing systems.

What happened during the Industrial Revolution was that something out of left field, an improvement in economic productivity, really upset the political institutions. At the start of the Industrial Revolution, Europe was ruled by various kings and queens. At the end of it, it was mostly parliamentary democracies. The kings and queens had lost power, and even the roles in society had been transformed. The kings and queens, if they had had a crystal ball, would probably have been very much against the Industrial Revolution, even though it was not obviously directed against them. With the current kind of social media revolution, again, had there been a crystal ball, I think the 1970s institutions and political parties would have said: no way, we definitely should not have that horrible internet thing. But at the same time, I think the world is actually better off with it, even though we love blaming social media for all the world's ills, because it gets us off the hook: it's Facebook's fault, instead of our own credulity, that we believe in conspiracy theories.

But the problem might be that you could get surveillance systems. You could get ways of locking in things way more powerfully. Indeed, currently we have many platforms that control cultural production very strongly, and AI is, in some sense, also doing it. If I try to use an LLM to write a novel, that works really well until I get to the sex scene. Suddenly, I can't get any help. And after that, of course, the LLMs I can get access to, at least online, are going to be saying: nope, that's too steamy for me, I'm not going to help you. Of course, I could run a local one on my own computer, but that's going to be slower and less effective. Or I might actually leave out that sex scene. So, in some sense, we're already getting this very interesting soft entrenchment of certain things. Not because we're living in a society that's really against violence and sex; it's rather that corporate America is very afraid of getting sued and getting bad reviews by allowing that.
So we end up building in restrictions here, and we might end up building in some restrictions that we later find were way too restrictive, but by then we have no way of getting out of them. In a world where you can't talk about sex because the smart software that is running everything is gently censoring it or steering the conversation away from the naughty topics, suddenly it becomes very hard to change that restriction. There are other ways in which this is happening. If you use large language models to brainstorm or draft ideas, you will be subtly influenced by the values that are incorporated in the models you're using. If you use it as your starting point, you will be pushed in a certain direction. I think it's different when you use it to critique your own ideas and so on. But it's pretty important, I think, to be aware of the values that you're being influenced by when you use these models. And that is indeed, in some sense, a power structure that is entrenched in the world at this point. Just because you want to use the best models, and say the best models are from OpenAI, you are then adopting the values of OpenAI in some way. And the problem is, of course, that if you asked OpenAI why they put in these values, they would probably hem and haw and say something about corporate liability and wanting neutrality. They have a set of values given by a typical West Coast American sensibility. But why are they so against having sex? Well, as a European, I would say that's because Americans are actually, deep down, puritanical about it, and corporate America in particular, because of various aspects of how American laws and lawsuits work, is very afraid of getting into any naughtiness. Violence, meanwhile, is way more taboo in Europe; if the AI companies were mainly based in Europe, you might see them being much more open about talking about sensuality and sex, but even more restrictive about violent stuff. And of course, any society does that. Back in the Victorian era, there were certain topics you couldn't write about, which were overtly or, quite often, more subtly and privately censored. You simply didn't do that. The problem now is that we're putting more and more of culture into autonomous systems. And the really scary part is, of course, that these systems also to some extent distill this. So in a few years, if there are very few texts on the internet daring to mention naughty stuff, the new training data for AI will make text even less likely to contain references to it. There might be ways around this. Certainly, when it comes to making naughty pictures, people have been extremely creative in running their own Stable Diffusion models to generate endless dirty pictures. And it might very well be that, because of scaling, smaller actors actually can make AI that can compete. It's not a given that the frontier models will remain the totally dominant ones we all use. But that is one possibility that has some advantage in terms of safety, because it might be easier to keep a few models safe so people don't make bioweapons and doomsday bombs too easily. But on the other hand, it might be better for creativity and the democratic process if people can make their own open-source models.
But then we might need other ways of handling people coming up with better doomsday weapons in the comfort of their own homes. Are you more optimistic or less optimistic now about AI risk than you were before we saw the current dominant paradigm of large language models? Should we become more optimistic when we see that large language models are the paradigm most driving AI progress? I have mixed feelings. On one hand, the LLMs demonstrated that it seems to be surprisingly easy to get a fair bit of alignment just out of human production of text and ideas. When we started thinking about AI safety back in the late '90s, our models were much more based on logic programming and reinforcement learning, and it seemed very much that AI might go off the rails, and that it might be extremely hard to even get it to understand that there is something called humans in the world. Okay, LLMs actually get that, because they're already in some sense embedded in our social world to a very large degree, maybe even too large a degree. It's not a given, of course, that just because a language model happily tells us that it's friendly and wants to follow the law and be ethical, it actually would do that when taking an actual action in the world. But it certainly seems easier to get a grip on the problem. However, when we started thinking about AI safety at FHI, we were a bit worried about what would happen if we ended up with neuromorphic AI built on big neural networks, maybe based on scanned brains, that are very hard to understand, very opaque, and hard to align with human values. So for a long while, many of us felt this was a reason to maybe stay away from scanning brains and doing brain emulation, which is one of my pet ideas; after all, it's a good way for us currently biological humans to become posthuman eventually, assuming a long laundry list of philosophical and scientific assumptions. But we ended up in that world of neuromorphic AI anyway. Even though the LLMs are based not so much on biological simulations as on enormous neural networks, we have opaque systems where we're only now starting to figure out some ways of seeing what's going on. Mechanistic interpretability is an exciting field; we're learning so many interesting things about what's going on inside these systems. On one hand, there are opportunities here for alignment; on the other hand, it is a very hard problem. But at the same time, I think this mixture of systems might actually allow us to get some grip on it. I still think that most of the risk of stuff going badly wrong comes from multipolar scenarios, where you have fairly aligned AI that is very useful and not terribly dangerous individually, but you have a society with a lot of it, and you might find that the society moves off in some crazy direction, because we humans are formally in charge, but in practice it's the software that is actually running the show. What would that look like? Could you give an example there, or what are you imagining when you say that? So, imagine my advisor example. You have AI advisors in companies, and they're giving really good advice. It's so good that one piece of very good advice is: fire the managers who don't listen to the advice. And the companies that do that do much better, and of course everybody then has their own AI advisors that are really useful, and we use this for more and more tasks.
So now everybody has a little swarm of AIs around them, helping them and doing things better, and you get emergent properties from these interactions. My calendar AI talks to your calendar AI, and they realize that we should totally schedule a podcast; this would work really well; let's suggest to our humans that they ought to meet, and once they say something positive, we're going to immediately find a good slot in the calendar. Gradually things get done. So more and more of the agency resides on the AI side. And at first this looks very good, and each of the AIs is aligned; they are doing nice stuff for us. Of course, we might be on opposing sides, but again, you might have arbitration AIs helping us. The problem is the collective system, which actually acts as a big optimizer for something. That something was not set by humans; it's just an emergent property. It's a little bit like how a market optimizes for certain things. But as we know, markets can also have bubbles. Markets can have crashes. Markets can start optimizing for things that nobody in the market actually wants. And this happens on electronic time scales, by systems that are much smarter than us and might also have various forms of agency we can't even understand. So we might notice that the world is getting weirder and weirder. At first everything is nice, and when we ask what's going on, we get good explanations from the AI. It's just that it keeps on getting weirder, and gradually we realize: wait a minute, why is the world getting optimized for that? Why is the nickel price going up so much that AI companies are now colonizing the asteroid belt to get more nickel? And nobody can really explain it. And eventually we end up in a world that is totally optimized for things that no human selected and that are not valuable to any human, or AI for that matter. Basically, you have this interaction matrix between things that has been randomly set. That whole is not aligned, even though all the parts are aligned. This is a very different disaster scenario from the one where you get one superintelligence that has been told to make paperclips or optimize stock market value and takes over the world and optimizes for that. Here we have an emergent system, which might not be intelligent or conscious or anything. We might say, oh, this is horrible, we need to stop this, so we try to take actions, but it might be very hard to coordinate actions against an entire system from inside. So that is kind of my favorite disaster scenario.
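A toy sketch of that randomly set interaction matrix, purely illustrative and not from the episode: each agent below repeatedly optimizes only its own "aligned" objective, yet the random couplings drag the collective equilibrium away from every agent's goal. The coupling strength and objectives are made-up assumptions.

import numpy as np

rng = np.random.default_rng(0)
n = 20                                   # number of agents (arbitrary)
A = 0.1 * rng.standard_normal((n, n))    # randomly set interaction matrix, weak enough to converge
np.fill_diagonal(A, 0.0)
targets = rng.standard_normal(n)         # each agent's own, individually "aligned" goal

# Each agent i best-responds to its OWN cost (x_i - t_i)^2 + x_i * (A @ x)_i,
# which gives the update x_i = t_i - 0.5 * (A @ x)_i.
x = targets.copy()
for _ in range(200):
    x = targets - 0.5 * (A @ x)

# Every part optimized its own aligned objective, but the equilibrium the
# collective settles into was chosen by the random couplings, not by anyone.
print("mean drift from agents' goals:", np.abs(x - targets).mean())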
This is a kind of gradual disempowerment, or loss of control of the future, that happens in a subtle way. But I'm just wondering: if you go back to 1800 or 1900 and ask the people there about the world of today, might they say, "Well, things have gone totally off the rails. We are now optimizing for things that we did not intend to optimize for. The world is weird. The world is moving too fast," and so on? Is it the case that we can handle more change, even in the things we're optimizing for, than we might imagine? I think that is very true. I'm a transhumanist. I think that we should be upgrading ourselves as beings. I want to have a bigger brain. I want to be able to think deeper thoughts. I want to stop aging. And many people say, "Wait a minute, that's weird. Anders, you're kind of a weirdo, and the future you're very enthusiastic about sounds scary to me. I don't want to live in that future of genetically upgraded people." And we might have a disagreement about it, but it's a very human disagreement. The people from before the industrial revolution might say you're living in a really crazy, weird world, but the weirdness is about human stuff. We might say that many of the institutions we have and the values we have are really good. And then we might have a profound disagreement about, let's say, gender equality and gay rights, and the pre-industrial people would say, "Oh, that's horribly immoral," and we would say, "No, we actually arrived at that through a long intellectual and cultural discourse. This is totally valid." The problem with that AI-off-the-rails future I'm describing is that it might not come about because of any sensible discourse, or at least not any human discourse. It's derived from interactions between software, which itself might not even be conscious. It's built originally by humans, but then, generation after generation, upgraded by software. It has very little to do with what we would regard as valid and authentic human reasons. Now, you could of course say that it might still be a successful civilization. There are some people, like Jürgen Schmidhuber, who just think we should let the AI loose across the universe and have the best utility functions compete with each other, and then it's going to be all glorious and beautiful. I'm not so convinced about that. I wish I were as optimistic as he is that we get this natural convergence on the best possible future. I don't think that convergence is natural. I think you actually need to work on it. And that means that you actually want to weave our preferences and our discourses into this system in the right way. You want the AI to be aligned not just individually with us, but also to have AI gradually aligning itself with our civilization, so that the civilization comes together in the right way. Ideally, we should become a kind of cyborg civilization, where we have superintelligence guiding and coordinating us, but we humans are also providing important input in setting the goals and values for this, without it necessarily being just one-way, because I think we could take a lot of ethical advice from smarter entities, but we might also want to have a debate with them about it and actually reach shared understandings. The problem might be that we set up systems without getting the objectives right, without getting the feedback loops right. And this is of course tremendously hard, because our normal human systems are already mysterious as they are. The way our politics is going wrong all over the place is a good demonstration that even when it's people doing autonomous stuff and talking to each other, it can already go wrong. So we are in trouble. But I think the trouble is more subtle than the paperclip maximizer. That one is still a physically possible risk, and I think we should be concerned about overly powerful smart systems. But I think the real threat comes from this kind of coordination failure. Mhm.
So, say it's 2040, and I'm stressed out about the pace of change, and the world seems weird to me, and I ask a superintelligent researcher, say, to explain why the world is optimizing in the direction that it's optimizing. That superintelligent AI might be able to give me a fantastic answer, a very convincing answer, an answer so good that it's difficult for me to differentiate between whether I'm being given the real explanation or whether it's optimized simply to convince me. If we want a dialogue with future superintelligent AIs, how do we keep up with them, such that we maintain a dialogue about what we would want the world to look like? So that gets to this interesting question about what kind of explanations we can get from AI. Right now, looking at the chain of thought in an LLM that is solving a problem looks pretty illuminating, but I always have my doubts that that is the actual process going on behind the scenes. But then again, that also goes for talking to people. I'm married to a prosecutor who has previously been a judge. And of course, in the legal world you can't just decide things; you actually need to give reasons, and they need to be laid out in a clear manner, et cetera. And that formal layout is in some sense the output. In practice, of course, a judge quite often makes a judgment based on a lot more things that are never listed on that piece of paper. But you might still hold them to it: look, you gave that reason, and if that reason is not valid, then the judgment is not valid. And similarly, you can actually look at an explanation. You can ask things about it. You can look into it. The Royal Society here in England, their motto is "nullius in verba": roughly, don't take our word for it, you need to do experiments. And I think that is also the way a superintelligence and I might have a dialogue, in order to test things out. So it might say: look, the reason the world is crazy in this way is this particular economic theorem. And then I might need help walking through that theorem and seeing that it's true. I might also say: I'm going to run that through a proof checker. I can't actually understand the economics, but I can check that the mathematics of that proof at least works out. I might do different things to test it. And I think this is actually what is going on when we have a real, authentic dialogue in society: it's rarely about one knockdown argument. What is the argument for gender equality? It's not a single one. It's a bookshelf of arguments, some of which are great, some of which are novels, some of which are very formal, some of which are crappy but still compelling, some of which are jokes or songs. There are very many forms, and you can use that multiplicity to constrain things. Now, one risk is of course that with enough superintelligence you can get that bookshelf as fake arguments. Maybe superintelligences find it very easy to just blather on and make up things that are not true. But I think generally the constraints of reality make it harder to just make things up, at least about material facts about the world. It might be trickier when we get into the cultural or philosophical realm. It might be easier to make up stories about emotions and about what's right and wrong than about atoms in the physical world. But I think many of the important social things are still linked to observable things.
And I think that is the way of actually having authentic, testable discussions with entities that are smarter than us. Similarly to parents talking to kids, a good discussion there typically consists of the parents showing the kids how things work. And I think that is an important thing we should ask of AI: show me this, let me test this. Okay, you showed me a video of this; okay, I'm running it through this other AI, which doesn't know what the question was, to check whether it's fake. Yeah, and maybe that's the path forward. This is a tactic that seems to work with human experts, where you are listening to different experts on a topic where you know less than they do, and you are trying to synthesize what they agree on. You're trying to map out the uncertainties and the disagreements. So perhaps we could be in dialogue with multiple different instances of superintelligent AIs that have slightly different values or different starting points, and then see where there's some convergence, and from that decide what we should believe or what we should do. Yeah. And one problem we often have when talking about superintelligence is that we kind of reify it. We imagine it like some being sitting in the cloud. But you might actually have specialized systems that are in some sense superintelligent in a more narrow domain. My friend Eric Drexler has been arguing that building this agent-like, big superintelligence is a very dangerous and stupid idea, and that actually we don't really want that. We want an ecosystem where we have modules. So we might want to have a planner module that comes up with good plans for a problem, and then an evaluator module that reads plans and the only thing it cares about is whether this is a good plan or not, and then a third one that takes the winning plan and implements it really well. Now, these three systems themselves don't have any agency. They're unlikely to go off and invent bad things because it fits some evil scheme. There are still risks even with this kind of system, so it's not a perfect solution. And I think a world with superintelligences might actually have systems that allow us to get super explanations. And then we might also have super checking of explanations and other tools. We might want to think about what cognitive tools we would need in order to live in this world. For example, in a world that is changing very radically and fast, obviously you want to have some kind of guide to help you, or probably several guides. You might actually want a bunch of guides giving you slightly different forms of advice: maybe your life coach and your job coach and your intellectual coach, and maybe even sometimes have them debate each other and see what comes out of that. But we might want to invent these things now, before they even can exist, so that when they can exist, we get them as early as possible.
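A minimal sketch of the planner/evaluator/implementer split described above; the function names, toy scoring metric, and plan format are illustrative assumptions, not Drexler's actual proposal.

from dataclasses import dataclass

@dataclass
class Plan:
    steps: list   # ordered actions; the format here is a made-up placeholder

def planner(problem, n=3):
    """Proposes candidate plans. Stateless: no memory, no goals of its own."""
    return [Plan(steps=[f"{problem}: step {i + 1} of candidate {k + 1}" for i in range(2 + k)])
            for k in range(n)]

def evaluator(plan):
    """Judges plan quality and nothing else (toy metric: prefer shorter plans)."""
    return -len(plan.steps)

def implementer(plan):
    """Executes the winning plan step by step; does no planning or evaluating."""
    for step in plan.steps:
        print("executing:", step)

# Agency, such as it is, lives only in this thin glue between the modules.
candidates = planner("book travel")
implementer(max(candidates, key=evaluator))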
Do you think a world with superintelligence is a world where we are better at prediction, and therefore the world is more controllable in a sense? Over the past 500 years, say, we have made a lot of scientific progress. We have more knowledge; we understand the world better. But we are still quite bad at making predictions about the evolution of social systems: what will happen in the economy, what's going to happen between countries, what the world is going to look like in 2050. Do you think superintelligence is enough for us to predict the future much better, or do you think that these predictions about social systems are intractable, such that superintelligence wouldn't be able to make inroads here? I think it's going to be a bit of a mixture. So in my Grand Futures book I talk about extremely long-term futures, and one important question is whether we can even make predictions. And the answer is: yes, in astrophysics this is great. We can make very good predictions about the orbits of the planets, and assuming that we don't start changing the orbits of the planets, that's going to remain tractable. But isn't that maybe the entire point here? Because if we imagine one of these grand futures, we will begin to affect the physical universe to a larger and larger extent, and so an increasing share of the universe becomes something that we are influencing and that is part of a social system, right? So a world with superintelligence also pushes in the direction of more unpredictability. Exactly. So the interesting part here is that the astrophysics predictions, as long as nobody interferes, are very reliable because you don't have any actors. You have chaotic systems like the climate or the atmosphere of the sun, and they are harder to predict. The weather is harder to predict than the climate, because in climate we're more interested in the long-term average. But when you get over to the human side, fashions and the stock market are anti-predictable: they deliberately try to evade prediction. If I knew what would be cool to wear next year, that would not be the optimal fashion; it needs to be a bit of a surprise. If there were a pattern in the stock market, I could make money by exploiting it, and that makes the pattern disappear. So basically, on the human level, on the cultural level, we get these fundamentally unpredictable events. Now, you can sometimes go too far in claiming things are unpredictable. Karl Popper famously argued that macrohistory was totally bunk. He was mostly delivering a broadside against socialism and fascism, but he accidentally also attacked macrohistory and kind of sank that ship in academia for several decades. His main argument was basically that history is driven very much by ideas, but you can't know about an idea before you have it, so it's logically impossible to predict these things. The interesting thing, however, is that in his book about this, The Poverty of Historicism, he also talks about social engineering. He points out that if you want a society that works in a different way in the future, you can make it; you can actually do engineering; it is probably possible to control societies. He didn't think that was desirable, and he mostly ignored it for the rest of the text. But I think a world with superintelligences might very well design things to be predictable. If you think about how our society works today, it's enormously complex, but many of the institutions we have are all about ensuring predictability. There is a reason why quality control and quality assessment is such a big business. And we are very upset when the internet is down a little bit, even though in the past communications were much more rickety; now we have created very reliable systems, and we demand even more reliability.
So we create reliability in many dimensions, and ideally we do this in the dimensions that matter, where it's actually very good that they're reliable. We want safety and prosperity to be reliably delivered; we don't want nasty surprises in those domains. Then we might say: okay, let culture bloom; that can change in all sorts of different ways. We also want an open-ended future, and that's the tricky part, because we could perhaps lock in values, and that's a very scary prospect. If some past generation had locked in its values, then given our current values, we might say that world would never have come into being, and of course that would be horrible. If the Romans had decided what life we would be living, that would be a horrible world, because compassion to them was not a particularly important value, whereas in our culture compassion is really important; we hold it very highly. So locking in values seems to be bad, except that there are some values we actually have good reasons to entrench: maybe we should never have a future civilization that is cruel. We should perhaps permanently close that door, or at least shut it, lock it, and make it rather hard to open. We don't want to leave the doors open to existential threats, so maybe let's close them and lock them very carefully. But beyond that, you probably want an open-ended future, because we actually don't know the fundamentals of the universe. Now, it might be that in 2040 the superintelligence says: actually, we solved it. We figured out the meaning of life, the universe, and everything. It's in this PDF file if you're interested in reading it; it's 100,000 pages thick, but there is an executive summary at the start, and we should just implement it. At that point, maybe we should go with that. But I have my doubts that we're going to end up in that future. It's very possible that there is no clear answer, that there are actually just very different options, and then we might want to hedge our bets and keep a diversity of options open. But I do think the predictability of the future is a fascinating thing, because it's becoming more and more a matter of design. We could lock ourselves into a kind of eternal dystopia, with surveillance and AI ensuring that we live our lives in a particular way forever. That sounds like something worth fighting against tooth and nail, given our current values. And I think that is worth taking seriously, because superintelligence might lock things in. But I do think that using superintelligence we can also get those useful locks on the bad doors while keeping many other things open. Yeah. Do you think predictability is a good frame for how to think about AI safety in the systems we are interacting with? For example, I want my AI assistant to be predictable: when I give it a task, I want that task fulfilled. Basically, I want all the technological systems I'm interacting with to be predictable, and if they are sufficiently distributed in society and segmented from each other, I'm not super worried about the future of society as a whole becoming predictable in a bad way; I just want predictability for the specialized systems I'm interacting with. Do you think that frame is perhaps a useful way to think about safety? I think it might be useful, but I think reliability might be an even better one. You want a reliable system, and that doesn't mean that it always acts in the same way. Sketch out the difference there for us, actually.
What's the difference between predictability and reliability? So with predictability, you kind of know what it will do in a future situation. With a reliable system, you know that you can trust what it's going to do in a future situation, but you might not know exactly what it will do. It's a bit like having a parent. As a child, you can't always predict what your parent will do for you, but with very nice parents, you know that they're going to try to do something good for you. Of course, in practice this is more complicated, because parents are not perfect, and reliability and predictability are never absolute. Generally, I have found that LLMs are very nice and useful for things where I don't quite know what result I want. But right now, when I know exactly what I want, I'm usually quite frustrated: they're quite often not doing exactly what they should, and that actually limits the usability. Yeah, I very much agree. It's super difficult to steer an LLM towards exactly the type of goal you want to achieve, I think. But if you have a more open-ended goal, you're often surprised in nice ways by what they can do. And the open-ended goals are also easier to steer, partially because we're not trying to steer as hard. If I try to solve a particular mathematical problem and I find that the solution is going off the rails, it's very hard for me to get the LLM back on track. I might try rerunning it a number of times, but quite often I end up being frustrated and solving it myself, or realizing that maybe I should have split it into smaller subtasks, each of which is very doable for the LLM. But I think future AI is going to be better at this. The really tricky part is that it needs to be reliable enough that you can leave a task to it. Right now I would not book any airline tickets using AI; I want to get to my destination with a very high probability, and the probability that I get there is much better if I do it myself or have a friend do it for me. But that's going to change in a few months or years; I think that's very likely. The tricky part is that with a predictable AI system, you in some sense already know what the result is going to be, which means you actually have a bit of a limited range. And that's a little bit like Karl Popper's critique: you can't really get a new idea out of a system if it's too predictable. You want a reliable system. If I'm brainstorming with it, it might come up with various ideas, including good ideas, but I don't know in advance what they are going to be. So when thinking about society: a predictable society is quite often very bad when something outside the predictions finally happens, while a reliable society can say, okay, that was a shock to the system, now we need to improvise, and we need to do it quickly. If we're too based on everything being predictable, it's brittle. Could you give an example of some system today that's engineered to be reliable, perhaps in the realm of AI? Well, I think the best example of a reliable system is actually the internet. Back in the 1970s and '80s, there was an entire genre of emergency posts on mailing lists: oh no, we found a problem, the internet is going to be unusable by September. Basically, they had discovered some technical problem, and people immediately started working on finding a way around it.
The standard story that it was developed from ARPANET to be resistant to nuclear war is a bit exaggerated. But the interesting thing is that the internet is a heterogeneous system where a lot of different systems have to work together, and there is no assumption that all the other servers work as they should. In fact, some of them are malicious, so you have to take that into account. This produces a fairly robust system. We still do get trouble when some big cloud provider makes the wrong update: oh dear, chaos ensues. And then everybody says, okay, now we're going to sue each other for that, and we're going to change the software so that problem doesn't happen. Not everybody changes the software; it's still fairly heterogeneous, and that adds a lot of reliability. Similarly, when we think about AI: if I want to solve a problem, I can, for example, run several instances of an AI, take their outputs, and then try to see which is the best one. That is likely to find a good one if I have a good evaluation metric. In some cases, of course, I might use another AI to make a guess at which one looks like the best one, but that's still not a super reliable system. Many of the big recent advances on solving hard math olympiad problems consist of running several instances in parallel and then selecting the best results. That is a surprisingly good idea, and it's a bit similar to how our brains sometimes work on a problem in parallel. Yeah. We might worry that if we're pushing too hard in the direction of reliability, we also thereby make our AIs more agentic. And if we're worried about agents and about which actions they might take, it might be against our interest to make very reliable systems, just because it seems like today's AI agents are bottlenecked by being too unreliable to carry out tasks that consist of many distinct steps. So is there a trade-off here, or am I thinking about reliability in a wrong way? No, I think you're right. But there is this interesting problem: if AI always remains unreliable, it's going to be a sideshow. It's going to be able to do some things in some domains, but it's not going to transform the world. In order to become transformative, it needs to be reliable enough to do the tasks that transform the world. So this is of course where you might say that maybe we don't want reliability to increase too rapidly. But at the same time, in order to get alignment and safety, you want reliability: you actually want it to reliably be safe. That is something you need it for. If it's unreliable about safety, we have not achieved anything. So typically the length of a task you can do also increases quite strongly as reliability increases. If you have a constant risk per unit of time of going off the rails, the length of a task is of course going to be set by that probability, and as that probability goes down, you can do longer and longer tasks.
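A minimal sketch of both points, with made-up numbers: per-step reliability compounds over the length of a task, and parallel sampling with selection (as in the olympiad systems just mentioned) only pays off if the evaluator can actually recognize a good result.

# Hedged toy numbers: p is per-step reliability, q is the single-run success rate.
p = 0.99
for n in (10, 100, 1000):
    print(f"{n:>4}-step task: P(no step fails) = {p ** n:.3f}")

# Best-of-k sampling with a trusted evaluator: one good run out of k suffices.
q = 0.2
for k in (1, 4, 16, 64):
    print(f"k = {k:>2} parallel runs: P(at least one success) = {1 - (1 - q) ** k:.3f}")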
Of course, there might even be a kind of threshold here. There is a whole family of theorems in computer science about this. It started with John von Neumann, when he wanted to build a successor to ENIAC and people told him: you can't build a bigger computer; look, those radio valves you're using are breaking all the time; if you build a bigger computer, it's going to be broken all the time; it's not going to work. And von Neumann proved a simple theorem: well, if I replace one logic gate with three logic gates and then take a little majority vote, then it doesn't matter if one of them breaks, and that meta-gate actually has a lower risk of failing than the original ones. So then you could keep on recursively building up this abstract machine, make it larger, and get any level of effectiveness; you could actually make the probability of failure very low. And this is similar to channel coding theorems and to threshold theorems in quantum computing. If you're below a certain error threshold, you can combine error-prone processes in such a way that you get a new process with a much lower error rate, and that way you can push the error rate down arbitrarily far. I believe something like this might happen with AI. There is a kind of transition in reliability. Once it's reliable enough, you could build this redundant system, which is kind of inefficient, but you could make the reliability go up enormously. And at that point, further improvements in reliability just make it smaller and more effective. And that threshold, I don't know where it is. I don't think anybody knows where it is. But I think we might cross it in the next few years, which is both exhilarating and very scary.
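A small sketch of the von Neumann construction with made-up failure rates; it idealizes the majority element itself as perfectly reliable, which is why the threshold here sits at one half (realistic voters give a stricter threshold).

def majority_failure(p):
    """Failure probability of a 2-of-3 majority vote over gates that each fail independently with probability p."""
    return 3 * p**2 * (1 - p) + p**3

# Recursively build meta-gates out of meta-gates: below the threshold,
# every level of redundancy slashes the error rate.
for p0 in (0.4, 0.1, 0.01):
    p = p0
    for _ in range(5):
        p = majority_failure(p)
    print(f"raw gate failure {p0} -> after 5 levels of redundancy: {p:.3e}")

# At p0 = 0.5 the construction gains nothing, and above it, redundancy makes things worse.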
And what makes you say that we might cross it in the next few years? Well, people are working very hard on the reliability aspect, because that is obviously limiting both agency and how well you can solve complex problems. Especially in programming tasks, which is of course what the big AI companies are really interested in, because they want to automate programming. If a programming task is too long, the AI is just going to make bad code, and even splitting a programming task into smaller pieces reliably is also hard. If you can reliably do that splitting of tasks and implement the pieces, now you can do coding. Suddenly they don't need that many programmers anymore, and they can start improving their software by having software improve on it, and everything is going to be great. Before that threshold, you instead just get worse and worse software, and it's totally incomprehensible, so that's kind of useless to them. So they are going to be pushing for this, and similarly, I think customers are also going to want systems that are reliable and less likely to hallucinate or do something else we don't like. So there is a strong push in this direction. The question is, of course: is this easy or hard? Many of the critics of AI say, "Oh, you can never avoid hallucinations, I have a paper proving that," or "it's always going to be missing things because of this and that." Generally, I don't think these predictions have held up very well. Quite often they're based on older versions of the AI systems, and quite a lot of them are very motivated thinking: people are wishing that there is something that is impossible to do. Besides the hype bubble, there's a cope bubble, where people are grasping at straws for why AI is just normal stuff and is not going to mess up the world in weird ways, and especially not threaten my job. But I think what is going to happen is that as reliability gets better, AI gets applied to more areas, which is going to be a good source of money for the AI companies, and you're also going to find better ways of using AI to control AI, which might actually be good news from an alignment and safety standpoint, because you want better monitoring systems. If you could have a little AI agent watching every single thing and raising the alarm with some reliability when stuff goes off the rails, bingo, that might actually be really useful for safety. Mhm. As a final topic here, Anders, I want to ask you about your perspective on life, just because you've been researching the future, the future of technology and AI, and the very long-term future for decades at this point. I'm wondering if you feel like you live in two different worlds, because when you leave your research and go out into everyday life, it must seem as if you're living in a very old world in which things function, in some sense, like they always have: you still have to do the dishes, the train isn't on time, and so on. How do you think about this? You said earlier that you have begun feeling the AGI, at least to some extent. Do you feel like you're living in two different worlds? And do you think perhaps the worlds are colliding? I think the worlds are colliding. So my formative decade was the late 1980s and early 1990s. My home computers were getting better: larger memory, 512 kilobytes of RAM, color, wow, the internet. And then I joined the Extropians mailing list, and we transhumanists were chatting excitedly about the technological singularity somewhere in the 2040s, about life extension and nanotechnology and space and AI. We expected accelerating change, and we were really wrong about the speed. We expected biotechnology to give us life extension much earlier. As a middle-aged transhumanist, I'm of course very interested in life extension; every gray hair is a reminder: why are we so bad at handling this really complex problem? But in the 2020s, life extension is actually getting to be big business. We have companies and startups working on things that actually work well in slowing aging in lab animals. These things are coming; it's just not as fast as I had wished back in the '90s. Similarly, we were optimistic about AI, but we didn't expect when it would have its breakthroughs. Like everybody else, we were surprised in the 2010s, when suddenly things started working, for reasons that to this day remain kind of unclear: why do large neural networks have the powers they do? Space, we assumed, was going to arrive once we had nanotechnology, because right now, of course, nothing is happening: NASA and the Russians are very big, entrenched bureaucracies, and they're not going to be building anything. But once the new materials arrive... and then suddenly we get spaceships going up made out of stainless steel instead. Okay, it was a matter of control technology for making them reusable, and of entrepreneurship. This shows that being a futurist means that your predictions don't necessarily arrive in the right order. What you want to understand is rather the dynamics. And the interesting part is, of course, that today I'm living in a world that is actually a bit like the future envisioned in the '90s. I literally have a virtual reality headset lying around on a desk in this room, way better than the VR headset I was using in the '90s, which was connected to a mainframe computer. And this is a consumer product. And I'm honestly not certain what use I should have for it except computer games, which is also fascinating. Life extension is happening.
We're getting AI. We're getting space. Nanotechnology in the Drexlerian sense might have been dormant for a long while, but we're getting protein engineering empowered by AI that is getting there. And of course, this is mostly not part of my everyday life. When I go out and get the bus, I'm not using any advanced technology, except that the credit card I'm swiping is using a near-field chip, and the bus actually has internet, and on the bus I'm interacting with the rest of the world using a little device that would have been a supercomputer back in my youth, and that also has a lot of the powers of the wearable computer system I built in the 1990s, which made me look like an extra from a low-budget Star Trek ripoff. My smartphone is much better than that system, and it's not just because of Moore's law; it's because of better app design, being wirelessly connected, and so on. So I think in many ways we are living in the future, and we're just not recognizing it because we are so quick at adapting. And to some degree, I find that my job as a middle-aged futurist is telling younger people that things actually were really different just a few decades ago. Part of this is of course grumpy old man stuff: oh, back in my day my computer had 1 kilobyte of memory, and you had to connect it to the television set, and I had to stop programming at 5:00, because then state television in Sweden turned on the transmitter and there was too much noise on the screen. That might be the grumpy old man talking, but it's also useful to recognize how rapidly things change, as well as what didn't work in the '90s and why it works now; trying to figure that one out. And some of these things maybe haven't changed, and they still don't work, while maybe some fundamentals have changed. We can actually learn from the past. One of the coolest things about a rapidly changing world is that you actually get to transmit information to the future in the hope of affecting it. We want to have foresight, and sometimes you get that by looking at history and realizing how weird the past was. Sometimes you need to think entirely new thoughts, just try entirely new things, and get surprised by them. Nobody expected deep neural networks to take off like they did; nobody in the field. People were certainly hoping that they might work, but nobody could get their expectations up. Nobody expected transformers and LLMs to be that powerful, and we got that surprise. We should expect more surprises. Indeed, I expect the superintelligent systems of the future to get a lot of surprises too. Hopefully, mostly welcome surprises. Makes a lot of sense. Anders, thanks for chatting with me. It's been great. Thank you. This was great.