Library / In focus

Back to Library
80,000 Hours Podcast · Governance, institutions, and power

Lennart Heim on the compute governance era and what has to come after

Why this matters

Governance capacity is now part of the technical safety stack; this episode helps translate technical risk into policy that can actually be implemented.

Summary

This conversation examines governance through "Lennart Heim on the compute governance era and what has to come after", surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Perspective map

Mixed · Governance · Medium confidence · Transcript-informed

The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.

An explanation of the Perspective Map framework can be found here.

Episode arc by segment

Early → late · height = spectrum position · colour = band

Risk-forward · Mixed · Opportunity-forward

Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).

Start → End

Across 53 full-transcript segments: median 0 · mean -1 · spread -18 to 17 (p10–p90 -9 to 5) · 2% risk-forward, 98% mixed, 0% opportunity-forward slices.

Slice bands
53 slices · p10–p90 -9 to 5

Mixed leaning, primarily in the Governance lens. Evidence mode: interview. Confidence: medium.

  • Emphasizes governance
  • Emphasizes policy
  • Full transcript scored in 53 sequential slices (median slice 0).

Editor note

Strong bridge between technical risk and institutional action.

ai-safety · governance · 80000-hours · policy

Play on sAIfe Hands

Uses the global player with queue, progress, speed control, and persistent playback.

Episode transcript

YouTube captions (auto or uploaded) · video 7EwAdTqGgWM · stored Apr 8, 2026 · 1,509 caption segments

Captions are an imperfect primary: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.

No editorial assessment file yet. Add content/resources/transcript-assessments/lennart-heim-on-the-compute-governance-era-and-what-has-to-come-after.json when you have a listen-based summary.

Show full transcript
Host: All right, welcome to the Future of Life Institute Podcast. I'm here with Lennart Heim. Lennart, could you please introduce yourself?

Lennart: Sure, thanks for having me. As you said, my name is Lennart Heim. I'm a researcher at the Centre for the Governance of AI, in short GovAI, as we usually say. GovAI's mission, roughly, is to build a global research community that helps humanity navigate the transition to advanced AI. What I'm mostly doing is working on a research stream we call compute governance. I'm thinking about computational infrastructure: is compute a promising node of AI governance, what are the sub-nodes of compute we can use to achieve beneficial AI outcomes, and what hardware mechanisms can we use to support these regimes? So, everything compute in general, though over time this has become more narrow. My background is in hardware engineering: I studied computer engineering in school and spent a lot of time figuring out how computers actually work, and now I'm trying to build on that knowledge and use compute to steer towards more beneficial outcomes.

Host: Fantastic. What I imagined us talking about here is how we can forecast AI progress, specifically by looking at compute. But before we get there, we should probably introduce the key factors driving AI progress, which you call the AI triad. What is the AI triad, and what are the factors involved?

Lennart: The AI triad is three factors. Another way to think about this, a concept I sometimes use, is to call it the AI production function: a function where we have certain inputs and certain outputs, and the question is what these inputs are. One way to think about the inputs is to split them into three components, which we describe as algorithms, data, and compute. What do I mean by algorithms? When we think about AI nowadays we mostly talk about machine learning; within machine learning we talk about deep learning; and even within that you can go deeper into the specifics of these algorithms, be it the transformer architecture, how you train these systems, all of these kinds of things. They're all important for the eventual output. The next thing is the data. These systems do machine learning, so there is some kind of learning: we give them data to learn on. Nowadays that means big datasets, be it text, be it images, whatever we're training these systems on. That's another input. And lastly we have the factor which actually enables all of this, which is compute, or computational infrastructure. This is the physical infrastructure we need to train these systems and, later, to execute them. Sometimes people also think about talent as another factor. I would usually describe it as a secondary input: talent helps with better algorithms, maybe with acquiring more data, maybe with making compute better. But those three are fundamental if you want to break things down and think about how much they matter for AI progress, what the output is, and how they have developed over time.
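Editor note: to make the production-function framing concrete, here is a toy sketch in Python. The Cobb-Douglas form and every exponent are our own illustrative assumptions, not anything Heim proposes; the point is only that the three inputs partially substitute for one another.

```python
# Toy "AI production function": capability as a function of the triad.
# The functional form and all exponents are editorial assumptions,
# chosen only to illustrate that the three inputs trade off.

def capability(algorithms: float, data: float, compute: float) -> float:
    """Hypothetical capability index from the three triad inputs."""
    return (algorithms ** 0.3) * (data ** 0.3) * (compute ** 0.4)

print(capability(1, 1, 1))    # baseline              -> 1.0
print(capability(1, 1, 8))    # 8x compute            -> ~2.3
print(capability(16, 1, 1))   # 16x better algorithms -> ~2.3 (same gain)
```

Under these made-up exponents, a 16x algorithmic improvement buys the same capability gain as 8x compute; the real substitution rates are an open empirical question, as the next exchange makes clear.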
Host: Does it make sense to separate these factors and talk about how much each contributes to AI progress, or are they so interrelated that it doesn't make sense to separate them out like that?

Lennart: That's definitely an ongoing question. I think it makes sense to try to separate them. I'm claiming I do compute governance, and maybe in the future people will be claiming they do algorithm governance or data governance. There are always downsides to putting things into boxes, but sometimes it helps to have these kinds of models. If we try to separate them, what I've historically been interested in is their role over time, and in general we can say they all trade off with each other: if I want to build an AI system, I can spend a lot of time figuring out better algorithms, get more data, or just throw more compute at the problem, and it will eventually turn out to be a better AI system with better capabilities. What we've seen over time with data is that we simply acquire more of it: roughly, it doubled every 1.4 years. That 1.4 years was for text data; it would look different for image data or voice models, but text data is what we're currently thinking about when we talk about large language models. In contrast, for compute, what I mean is the compute used for training these systems: how many floating point operations were needed to train the system so that it is eventually finished and can be run. We did an analysis on this, and we roughly saw that training compute is doubling every six months. So that is faster progress, a faster doubling, compared to data.

Host: And how has this doubling rate evolved over time? Has compute always been doubling every six months, or has the rate of doubling itself increased?

Lennart: The first ones to do this kind of investigation on compute were actually OpenAI. They had a blog post called "AI and Compute", I think in 2018 or 2019, and what they found was that AI training compute had been doubling every 3.4 months, twice as fast as I just said. We ran the same analysis with way more ML systems at the beginning of 2022 and added some newer systems, and if we look at the trend now, it's actually doubling every six months. This is partly because they picked a cutoff point just when new, pretty compute-intensive systems like AlphaGo came out, which sat right at the top of the trend and skewed the whole trend upwards. If we looked at the data right now, which everybody can do by going to epochai.org, I think the doubling time would sit somewhere around 6.5 or seven months. We haven't added GPT-4 yet, because nobody has told us how much compute it used; if we knew, we might have better insights there.
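Editor note: a quick arithmetic sketch of what the doubling times quoted above imply, assuming the trends simply compound (the horizons are arbitrary):

```python
# Compound growth implied by the doubling times quoted in the episode:
# training compute ~every 6 months, (text) dataset size ~every 1.4 years.

def growth_factor(years: float, doubling_time_years: float) -> float:
    """How much a quantity multiplies over `years` given its doubling time."""
    return 2.0 ** (years / doubling_time_years)

for years in (1, 5, 10):
    compute = growth_factor(years, 0.5)   # 6-month doubling
    data = growth_factor(years, 1.4)      # 1.4-year doubling
    print(f"{years:>2} yr: compute x{compute:,.0f}, data x{data:,.1f}")

#  1 yr: compute x4,         data x1.6
#  5 yr: compute x1,024,     data x11.9
# 10 yr: compute x1,048,576, data x141.3
```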
Host: Okay, so if we were to put some numbers on it, how much of AI progress would you say is attributable to compute?

Lennart: I don't feel like I would put numbers on it. Maybe we can describe it with what Richard Sutton calls the Bitter Lesson: people have been trying to develop new fancy algorithms, trying to learn from the brain, but what we've historically seen over the last years is that basically we have search and learning algorithms and we just throw more compute at them. And of course, you're talking to a guy who claims he's doing compute governance, so I think compute is an important node. I'm not saying 50% of the progress in AI is due to compute, or 90%; I can't tell. There might be more analysis on this in the future; that's up to the economists. I'm saying it seems like an important node, and it has some unique properties, independent of how important it is for AI, which you can use to achieve better and more beneficial outcomes.

Host: I think it's important for us to talk a little about how modern advanced chips are produced. Perhaps you could talk about where they are produced, how difficult it is to produce them, who the key players are, and so on.

Lennart: The disclaimer here is that chips, or integrated circuits, are probably the most complex product humankind has ever produced, so take whatever I say with a grain of salt; I'm explaining at a really high abstraction layer. It is basically an effort of all humankind that we have these kinds of chips; it's a global supply chain. One useful way of thinking about it is as three processes. First there's the design phase: we have all of these transistors, so how do we arrange them so that the chip actually does useful work? I think that's something most people don't really appreciate: us talking right now, everybody having a smartphone, relies on a device made of switches, with billions of them on a chip. Who designs these chips? Apple, for example: they say, here's our new M1 or A17 or whatever, which they put in their MacBooks or iPhones. Once they've designed such a chip, it's essentially a piece of code describing how the chip will eventually look. Then we need to fabricate the chip. This is where the chip is etched, and there are years of history in how that's done. The important actors to know here are TSMC, Samsung, and Intel; those are the ones leading cutting-edge chip production. Another important actor is a company called ASML, which people might have heard about: this obscure company sitting in the Netherlands which sells the machines to TSMC, Intel, and Samsung, who then produce the chips. And the last thing we need to do is assemble the chip: we test and package it. This is sometimes done at another provider, or directly at the fab. So the important thing is that Apple sells these chips but does not actually produce them; they think about how to design them and then send the design off elsewhere. It's like printing a t-shirt for your local football club: you're not going to produce the shirt yourself, you send it off to someone who produces a bunch of t-shirts and prints your design on one. That's what Apple is doing, and eventually they get the chips back and sell them. So those are the three steps: design, fabrication, and lastly assembly, testing, and packaging. That's the chip supply chain.
Host: And what about bottlenecks in these supply chains? What would happen, for example, if ASML ceased to exist, or TSMC ceased to exist? How much depends on specific companies here?

Lennart: As I just said, it's a really complex product, the effort of the whole world coming together in a global supply chain. The examples of ASML and TSMC are interesting. ASML is almost the only company in the world producing the EUV machines which are used for producing cutting-edge chips. So if ASML ceased to exist, there would be a big shortage and we would probably hit a recession or something along those lines. We could keep using the machines we currently have to make today's chips, but we want to keep going; that's the history of computing, making these chips smaller and smaller over time. ASML is the strongest case: we literally have one company producing these EUV machines. But even if we move to fabrication, where we have TSMC, Intel, and Samsung for cutting-edge chips, TSMC makes roughly 70 to 80% of the whole revenue in this domain. So TSMC is a really important actor there, producing the chips in all of our iPhones and MacBooks and whatever kinds of chips we use. This might look different for the chips sitting in your dishwasher or your car: those are usually older-node chips, so they wouldn't be hit directly. So we have what people call bottlenecks, or what others call choke points, which I guess we'll talk about later, and which you can then use to achieve certain goals. Beyond that, these chips are just really complicated to produce, they cost a lot of money, there's a limited number of fabs, and it's really, really hard to produce them. The stories I've heard about how hard it is, the obscure things that happened which reduced the yield, meaning how many good chips you get out of the process, are crazy.

Host: How far are the nearest competitors behind the very cutting-edge companies here?

Lennart: Intel might be the interesting case to look at: once the leader, then overtaken by TSMC. Intel was the company that designed the chips, produced them, and packaged them, doing everything across the whole chain, whereas at some point AMD came along and revolutionized this: hey, we just design the chips and somebody else will produce them. How far behind are they? As I said, TSMC makes up the majority of the share of cutting-edge chips and is leading the field. There are some forecasting questions, and it looks like TSMC and Samsung will achieve three nanometers this year and start mass-producing it; Intel probably won't achieve this until next year. So Intel is maybe one year behind, but that only describes the node, the transistor size, the fabrication process; it does not describe how much they actually produce, and there's still a big difference there: TSMC has way more fabs and can simply produce much more. For ASML, I guess it's trickier: a bunch of the competitors essentially gave up, and are hoping that while they might not be able to produce EUV machines, maybe they can make the next thing that comes after EUV.
I haven't seen a good analysis of how far the others are behind. I think it's fair to say it's just really, really hard. Certain countries are trying very hard right now to produce these kinds of machines, and also the fabs which produce the chips, and I guess it will be hard for them, maybe up to impossible, if no new paradigms come along.

Host: Is there an interesting difference between computer chips in general and the cutting-edge chips used specifically for AI, in terms of production and supply chain? Or is it simply the same companies leading both?

Lennart: It's useful to think again about our three steps: design, fabrication, and packaging. The design companies are different for AI chips, at least the AI chips I mostly think about when I think about AI governance: AI chips ending up in data centers, AI accelerators mostly produced by Nvidia, where the A100 and H100 are the names usually referred to, but also Google with their TPUs, their tensor processing units, which they keep in-house. There's an equivalent at Apple: Apple also puts AI core processors on their smartphone and laptop chips, but those are not the chips I'm talking about when we talk about training large-scale systems. When we talk about training GPT-4, you're not using your smartphone or your laptop; it's different. So the design side is different, but the fabrication phase is the same: they all send the design off, eventually, to TSMC or another fab, which produces it for them, and the same goes for the packaging process. So if we talk about cutting-edge chips, the three-nanometer chips, they're sitting in your smartphone, they're sitting in your laptop, but they're also sitting in these AI accelerators, for example GPUs, which then go to data centers. The difference is the design: the chips eventually turn out physically different and implement certain functions. A smartphone chip is still a general processing unit, a CPU; it can do a bunch of things. Whereas with AI accelerators we move along the spectrum towards a more specialized chip, one that is really, really good at solving parallel problems.

Host: So we've learned that chip production is extremely advanced and extremely complex, and that it plays a key role in driving AI progress. Let's talk about what we can learn if we try to extrapolate progress in compute, and what this can tell us about progress in AI in general. This is what you might refer to as AI timeline forecasting: based on what we know about compute, data, and algorithms, when might we expect, for example, artificial general intelligence to arise? What can we say specifically by thinking about compute?

Lennart: It's a question many people are thinking about: when is this artificial general intelligence, transformative AI, human-level AI, whatever you name it, coming? And there we already get into the details: it's really important for people to actually say what they mean by these terms, and not just say "my AGI timelines are X, Y, Z."
Independent of that, some reports are trying to operationalize this and forecast when it will happen, and compute plays an important role there. I just described the AI production function: we have these inputs, we have the production function, and we get certain outputs. So I'm asking: when the output is AGI, what does this mean about my inputs? There are two famous approaches. One of them is actually from Ajeya Cotra: the biological anchors report, which I think has been discussed on this podcast before. This is a compute-centric framework where she asks: if I had to rerun evolution, how much compute is that, how many floating point operations? If I had to rerun a child's development, how much compute is that? So we have these different compute milestones. The next step is to say: if I have a deep learning model which uses roughly as much compute as these milestones, it might have similar capabilities. Then she tries to forecast the compute available in a given year: what is the price performance of compute, which is roughly looking at Moore's law, how many chips will be produced, how many FLOPs can they calculate? Then you also look at how much you are actually willing to spend; that's an important question, whether you actually want to spend that much money on one training run to achieve a certain milestone. And the last thing is that you need to adjust this for algorithmic efficiency: over time you need less compute to achieve the same capabilities, so you discount the requirement over time. That's one way of thinking about it, and that's one way compute forecasting feeds directly into trying to forecast transformative AI.
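Editor note: a minimal sketch of the compute-centric forecasting loop just described: grow affordable compute via price performance and spend, shrink the requirement via algorithmic efficiency, and find the crossover. Every constant below is a placeholder assumption, not a parameter from the biological anchors report.

```python
# Sketch of a bio-anchors-style crossover calculation. Every number below
# is a placeholder assumption, not a value from Ajeya Cotra's report.

MILESTONE_FLOP = 1e30          # hypothetical compute milestone (FLOP)
START_RUN_FLOP = 1e24          # assumed largest training run today (FLOP)
PRICE_PERF_DOUBLING = 2.5      # years per 2x FLOP-per-dollar
SPEND_DOUBLING = 2.0           # years per 2x willingness to spend (assumed)
ALGO_EFF_HALVING = 1.0         # years per 2x less compute needed (assumed)

def first_crossover_year(max_years: int = 50):
    """First year in which affordable compute meets the discounted milestone."""
    for t in range(max_years + 1):
        affordable = START_RUN_FLOP * 2 ** (t / PRICE_PERF_DOUBLING
                                            + t / SPEND_DOUBLING)
        required = MILESTONE_FLOP * 2 ** (-t / ALGO_EFF_HALVING)
        if affordable >= required:
            return t
    return None

print(f"crossover in ~{first_crossover_year()} years under these assumptions")
# -> crossover in ~11 years under these assumptions
```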
Host: Given that people use different definitions of, for example, human-level AI, transformative AI, or artificial general intelligence, is there something useful to be gained from aggregating predictions across a number of reports, when those reports use slightly different definitions of the thing we're trying to predict? Or does this introduce too much noise for us to say anything useful?

Lennart: I think it's still useful to look at them and see what the different forecasts say, while thinking about whether they are actually forecasting the same thing and what the different biases in play are. It's hard to just throw a survey together with the bio anchors report, but if you have two empirical approaches, the bio anchors report and maybe another one, it's fair to look at them side by side. So something like a survey of all the different methods, which you can then weight to your own needs. What I'm more interested in, though: it's cool to have these timelines, when is this thing going to happen, but I'm excited about the intermediate inputs to these kinds of models. If somebody is thinking about AI timelines, and that's one way to think about them, they figure out what the growth of compute is. This is important, for the things we've just discussed. For example, there's a guy like me who looks at compute growth and thinks: this looks interesting, maybe you can do compute governance, maybe you can use this to achieve something. So you get better timelines, but you also get intermediate outputs which are useful, and that's a big part of why I'm excited about AI timelines research. Again, an empirical model like the bio anchors report has far more of these intermediate inputs compared to a survey. In a survey, all I have is maybe a qualitative entry from some researcher; they tend to be biased, and most of the time you just get a number, so I don't know what credence to put on it.

Host: So in the process of trying to forecast transformative AI, we might learn something that turns out to be super useful. Let's get your take on the whole issue: based on all the reports you've read, and your deep dives into compute, how would you think about when we might expect artificial general intelligence or transformative AI?

Lennart: For what it's worth, I don't think there's anything special about Lennart telling you his timelines or his numbers; the intermediate inputs are more interesting. But how would I describe my views? I do think there is a significant chance that AI turns out to be a big deal; you can define it as transformative AI or AGI or something like that. I think there's a 50% chance that within the next 20 years or so there might be something we would call an AGI or a transformative AI. What do I mean by this? Maybe we can measure it on benchmarks: there's the famous MMLU benchmark, so something which scores 95% on it. Maybe the system would also pass a really long Turing test, with somebody really drilling down on it. Maybe the system also wins a Math Olympiad, and is able to control a robot which takes care of my dishes; I think that's the typical example. These are ways of trying to operationalize it, for example within Epoch, when thinking about forecasting AI timelines. If I take that operationalization: yes, I guess within the next 20 years there's a 50% chance this happens. And then I have a really long tail: if it doesn't happen soon, I don't know, maybe there's some magic sauce to this and it will just take a longer time; we'd need to reinvent the whole thing again to eventually get to AGI.

Host: Do you think robotics is in the same category as the more cognitive labor performed by AI? It seems to me, just looking at the landscape, that we have amazing developments in what you could call cognitive tasks, but less progress in robotics. So if we define AGI or transformative AI to include robotics, that might significantly delay our prediction of when it will arrive.
Lennart: That's actually part of the key thing: maybe we have these really smart systems that can do a lot of things, but they just have really bad hands. And me, as a regular engineer who did some robotics, I find it really disappointing how slow progress is there. It's still progress, don't get me wrong: everybody watches these Boston Dynamics videos from time to time and sees those robots doing crazy stuff. But moving physical things is still hard. Our hands are pretty good at a bunch of things, and it turns out this is really, really hard to get onto a computer. So I'm fairly confident there might right now be a system which knows how to take care of my dishes, and might even know how to do it efficiently, but the sensitivity and the logistics of moving hands fast enough and precisely enough will remain a really hard thing. Actually, I think robotics is something people sometimes anchor on too much, thinking AI is only dangerous if it can move or has legs. No: our lives are so digital anyway, and most of our critical infrastructure is controlled by computing systems. You don't need hands to do a lot of things. Us having this conversation right now has nothing to do with us having physical hands; this could happen entirely with us being AI simulated and the whole conversation being made up.

Host: True, true. So how much does robotics depend on compute? Is there a compute bottleneck holding robotics back, or is it more about designing accurate hands, or accurate sensors, or something else?

Lennart: I think it's mostly about the latter. Funnily enough, today's systems use more and more compute, and you also have the problem that these systems are big: deep learning systems with billions of parameters, and if you want to run them locally, you sometimes can't, because they're too big and too computationally intensive. If I put a GPT-4 in a Boston Dynamics robot, that thing is going to be out of battery rather quickly, because you need something like four A100s to run the system at a speed that is good enough. We've seen this in papers from Google: they used a really small model because they couldn't use a bigger one. But I guess most of it still boils down to how good robotic hands and similar hardware are: you might be good enough at controlling them, but they're simply not sensitive enough and not precise enough. I do expect AI can help here, in that AI can accelerate research. There is research right now on how to make better robotic hands and how to build better robots in general, and researchers now use AI tools to summarize their work or to bring in new ideas; maybe in the future we even have AI systems that come up with completely new ways of building robots. Evolution figured out good movement way earlier than good brains: we could, well, probably not walk, but at least crawl and jump from tree to tree before we could think really well. So maybe we have some reason to believe this is actually not such a hard problem, and I'm a bit confused about why we haven't figured it out yet.

Host: Maybe we have the option of simulating physical environments and learning how to do robotics
in simulation before deploying systems to the real world. That would make robotics progress more dependent on compute: if we could run huge simulations, they would be extremely computationally intensive, but that is a possible path forward. I don't know whether it actually gets us all the way to real-world robotic interaction. What do you think; do you have any insight here?

Lennart: That seems right; it can definitely help. But we generally have this sim-to-real problem: our simulations are complex, but reality seems to be ever more complex, and I guess this will continue to be some kind of barrier. I'm definitely excited about people thinking more about this: how does it actually translate? If I have a robot moving in my simulation, can it actually also move in the world? It might be easier for your Roomba, just driving around in what is effectively a 2D space, but a walking robot with legs seems much harder.

Host: Okay. We talked about the AI triad of data, algorithms, and compute, and perhaps one useful exercise is to talk about how these can be traded off against each other, because that might tell us something about how important compute is. What do you think would happen, for example, if we had the perfect dataset? How much would that mean for AI progress, combined with the compute and algorithms available today?

Lennart: I would love to know the answer, starting with: what is the perfect dataset? One thing we ran into when we tried to collect the data trends, how much data we've been using, is that data has different dimensions. I can measure data as how many data points, how many tokens, how many pictures, how many gigabytes. But we all know there are different qualities of data, and now we get into the tricky question of how to measure quality. One way you can see that data quality matters: take the Chinchilla paper, a paper by DeepMind. You have a big dataset of text and you train the system on it, and usually the system sees the text only once; it has one epoch per text. Interestingly, for Chinchilla, because it needed a lot of data (they had found new scaling laws), they actually ran twice over the Wikipedia text. Wikipedia is really high-quality data, maybe closer to the truth than Reddit or a bunch of other stuff on the internet, so they trained the system twice on it because it's better. And we have good reasons to believe, looking at humans, that data efficiency can be far better. I'm claiming I'm more data efficient than GPT-4, at least for the moment: I haven't read as much, but on some things I can maybe still outpace GPT-4. So there's a lot to gain there. And if you use less data, this also means smaller training runs to some degree, unless you just show the data many more times, so it's not clear how it translates directly. But in general, more high-quality data is probably useful, probably better than low-quality data, though I don't know how much better, and less data also means less compute for these systems. So there's definitely progress to be made over time, and this is partly where we've seen progress: when more and better data became available.
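Editor note: the Chinchilla result mentioned here is often summarized as two rules of thumb: compute-optimal training uses roughly 20 tokens per parameter, and training compute is approximately C ≈ 6·N·D FLOPs. A back-of-the-envelope calculator under those assumptions (the model sizes are arbitrary examples):

```python
# Back-of-the-envelope Chinchilla-style calculator.
# Rules of thumb from the DeepMind paper, as commonly summarized:
#   compute-optimal tokens  D ≈ 20 * N      (N = parameter count)
#   training compute        C ≈ 6 * N * D   (FLOPs)

def chinchilla_optimal(params: float) -> tuple:
    tokens = 20 * params
    flops = 6 * params * tokens
    return tokens, flops

for n in (1e9, 70e9, 1e12):
    tokens, flops = chinchilla_optimal(n)
    print(f"N={n:.0e} params -> ~{tokens:.1e} tokens, ~{flops:.1e} FLOP")

# N=7e+10 params -> ~1.4e+12 tokens, ~5.9e+23 FLOP
# At ~1.4 trillion tokens for a 70B-parameter model, a few trillion tokens
# of high-quality text is no longer a comfortable margin: one concrete
# sense in which "running out of data" bites.
```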
Host: My impression is that we are actually using something close to all the data available online to train the biggest models. Is that also your impression? Are we reaching the limits of how much data we have available online?

Lennart: I think there is more data out there. We tried to look into this, how much more text data exists, and you start getting a bit creative. We don't only produce text data; we produce voice data, and guess what, we can run another AI system on that to produce more text later. I don't think this podcast should be the thing they train on, or maybe it should, I don't know, but there is YouTube out there. Take all the YouTube videos in the world and transcribe them: here we go, way more text. Is it high-quality data? Probably not; most of it is probably not that good, some of it might be better, and there are some pretty good podcasts out there. So there is more data to be acquired, and we tried to forecast this: how much video is being produced, and if we transcribe it, how much more data do we get out of it? And it looks like at some point we might run out of data, not because there isn't lots of it out there, don't get me wrong, but because we're also using lots of it. We try to predict this from scaling laws, which describe, for a network of a given size, how much data and how much compute we need to train it close to optimally. It's unclear whether these scaling laws continue at the scales we've explored so far, but there is definitely a point where you might lack high-quality data. I think reinforcement learning from human feedback is already an example of quality mattering: human feedback is actually far more useful than training on a bunch of scrappy data from the internet. So I'm more interested in this question: there's more data out there, but you want high-quality data, so how do you get more high-quality data for the things you actually care about, for the things you actually want the system to do?

Host: When I've played around with GPT-4, some of the output text is very clever, and we might be tempted to use that output text to train new language models. This sounds like some kind of Ponzi scheme, or some kind of magic where we make up data out of thin air. Do you think it would work? Would output from previous language models be interesting as training data for new language models?

Lennart: I don't know if it will work. My guess is that somebody will figure this out, because the incentives are definitely there: this is the easiest way to acquire more data, and I guess companies will look into it. Personally, I'm probably not excited about it. AI systems feeding into AI systems: I think this is how failure could look. You have these loops of AI systems and barely any idea where things went wrong, and we end up with
two layers of black boxes. Congratulations, we've made it even harder to understand everything. But I guess people will definitely go for this, trying to use this synthetic data. An example is actually the LLaMA model which came out of Meta, where Stanford then made Alpaca out of it. If I understand it correctly, they fine-tuned it based on feedback from ChatGPT. There we go: you don't need human labor anymore, you can skip your Amazon Turkers and just use ChatGPT, which is way cheaper. It turned out to be cheap and pretty good, and the model performs really well. So I expect it will be useful at least for fine-tuning, but I think we should be really conscious there: black boxes feeding into black boxes is not an ideal scheme from my lens of wanting to understand these systems and make sure they are actually well understood.

Host: What about trading off algorithmic progress against data or compute? Imagine we had optimal algorithms; how much would that matter? For example, I saw DeepMind recently made an advance in algorithmic matrix multiplication, which is a core process of machine learning. How much does this matter? Could it make compute less relevant, because algorithms would be so efficient that they need less compute to run?

Lennart: There's definitely history there. If we go back to the AI triad, algorithms are one of the inputs, and the question is how algorithmic efficiency has developed over time. The problem is that it's really hard to measure. The best way of measuring it right now is to look at a benchmark, and people have used ImageNet, and ask: how much compute do I need to achieve capability X on this benchmark over time? What we've historically seen is that the compute needed is halving roughly every nine months to a year. So a year from now I can achieve the same capability for half the compute of the previous system. That's a big deal. But compute trends have been a bit faster, because ultimately it's not about achieving capability X, it's about pushing the frontier. Every percentage point of accuracy, every percentage point of better capability, is actually useful, because we're now entering this new era where we're trying to make money with these systems, and sometimes single-digit percentage points just matter: they can be the difference between all of us having autonomous cars and us not having any autonomous cars right now, because we have really high standards for these systems. So algorithmic efficiency definitely reduces the compute used; the question is where we continue pushing our AI systems. Historically we've seen bigger and bigger systems, with gains due to algorithmic efficiency, but also because we throw more compute and more data at the problem.
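Editor note: a sketch of how the two trends quoted here compound, taking the episode's rough figures as assumptions: training compute doubling every six months, and the compute needed for a fixed capability halving roughly every year.

```python
# "Effective compute" sketch: physical training compute grows while the
# compute needed for a fixed capability shrinks (algorithmic efficiency).
# Doubling/halving times are the rough figures quoted in the episode.

COMPUTE_DOUBLING_YEARS = 0.5   # training compute: 2x every 6 months
ALGO_HALVING_YEARS = 1.0       # compute for fixed capability: 2x less / ~year

def effective_compute_multiplier(years: float) -> float:
    physical = 2 ** (years / COMPUTE_DOUBLING_YEARS)
    algorithmic = 2 ** (years / ALGO_HALVING_YEARS)
    return physical * algorithmic

for years in (1, 3, 5):
    print(f"{years} yr: ~{effective_compute_multiplier(years):,.0f}x "
          f"effective compute")

# 1 yr: ~8x, 3 yr: ~512x, 5 yr: ~32,768x
```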
Host: You have this graph with different eras of compute usage in machine learning systems, starting with the pre-deep-learning era, then deep learning, and then large-scale systems. Could you describe this graph for us? What are the key lessons?

Lennart: This comes from our paper on the three eras of compute usage in machine learning, or something along those lines. The key thing is that we looked at training compute, which we've just discussed, and tried to figure out how it has developed over time. We started in 1958 or so, when the first advances in AI were happening; that's what we call the pre-deep-learning era. The deep learning era roughly emerged between 2010 and 2012. Within the pre-deep-learning era, we roughly see training compute doubling every two years. This reminds us of another law a lot of people might know, called Moore's law. Strictly, Moore's law describes the transistor density on chips, but in this case we can roughly say the price performance doubles every two years: every two years you get a chip that is twice as good for the same price. This basically means that in the pre-deep-learning era people always spent a roughly constant budget on training these systems; maybe that constant budget was simply the processor sitting right at the researcher's desk. But at some point in 2010 to 2012 the deep learning era emerged, famously with AlexNet, maybe with other systems before it. They did one new thing: they used GPUs, graphics processing units, which are really, really good at computing parallel problems, at matrix multiplication, which is the key operation when we train these AI systems. With this era, people started building bigger and bigger systems, deeper systems, which is what deep learning is about, and compute growth just skyrocketed: doubling every six months from around 2010 to 2012 onwards.
That's a big deal, because if you double something every six months, it cannot go on forever.

Host: Let's talk about this question of whether it can go on, because so far we've talked about companies now reaching three nanometers, which is extremely tiny, probably close to the physical limits, you tell me. What does this mean? Are we running out of the possibility of creating denser computer chips?

Lennart: Let's take a step back: compute doubling every six months, can this go on forever? When we say we spend more compute on these systems, what are the factors enabling us to do so? Well, computers get better over time. How much better? Moore's law says they double every two years. A recent investigation by my colleagues Tamay and Marius looked at the price performance of GPUs and roughly found that it doubles every 2.5 years: every 2.5 years you get double the number of floating point operations for the same cost. When training compute then doubles every six months, this basically means we just spend more money. That's what we do: we spend more money on these systems, and it looks like people did it because it paid off in capabilities. But there are limits to how much money we can spend if something doubles every six months. If you crunch the numbers, and roughly assume compute doubles every six months while price performance doubles every 2.5 years (so your money buys you more over time), at some point you hit the level where you are spending one percent of US GDP, or amounts similar to the Apollo project, on these training runs. Is that likely? I don't know. It depends on the economic returns of these systems, on whether people are actually incentivized to do it. It also depends on whether you actually have that much compute out there: do we have data centers big enough that you can even burn so much money? It's not as if compute appears the moment I wave money around; maybe a couple of years after I wave the money, it appears.
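Editor note: crunching the numbers the way Heim describes, under illustrative assumptions (a $100M frontier run today, roughly $25T US GDP): compute doubling every six months against price performance doubling every 2.5 years implies the dollar cost of the frontier run doubles about every 7.5 months.

```python
import math

# How long until the cost of a frontier training run hits ~1% of US GDP,
# under the trends quoted in the episode. Starting cost and GDP figure
# are illustrative assumptions, not figures from the conversation.

COMPUTE_DOUBLING_YR = 0.5      # training FLOP doubles every 6 months
PRICE_PERF_DOUBLING_YR = 2.5   # FLOP per dollar doubles every 2.5 years
START_COST = 1e8               # assumed frontier run cost today: $100M
BUDGET_CAP = 0.01 * 25e12      # ~1% of an assumed $25T US GDP = $250B

# cost(t) = START_COST * 2**(t / 0.5) / 2**(t / 2.5)
doublings_per_year = 1 / COMPUTE_DOUBLING_YR - 1 / PRICE_PERF_DOUBLING_YR
years = math.log2(BUDGET_CAP / START_COST) / doublings_per_year
print(f"cost doubles every {12 / doublings_per_year:.1f} months; "
      f"hits ~1% of GDP in ~{years:.1f} years")
# -> cost doubles every 7.5 months; hits ~1% of GDP in ~7.1 years
```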
Lennart (continued): And, as you noted, I've been saying price performance has been doubling every 2.5 years; will that just continue? Now we get into the weeds that every electrical engineer hates: will Moore's law continue, is it even true? Honestly, I don't care about Moore's law as such; I care about price performance and whether it will continue. My rough guess: there are roadmaps out there, and we're probably fine for at least another five years of making transistors smaller. At some point we hit new barriers to how small we can make these transistors, but to be honest, human ingenuity has been great; we've just kept doing this. We throw exponentially more money at it, but we also get exponentially more gains, so I guess there is more to be done. And even if Moore's law stops, an important concept to understand is that right now we have really short R&D cycles: people want a new iPhone every year, every year we want a new processor, so the R&D spending on these chips is immense. If instead we had longer R&D cycles, we might get better economies of scale: while the performance per chip stays the same, you can sell the same chip over five years instead of the usual one. So the performance stays the same, but the price might drop, which means price performance might still go up over time: you simply get more for your money. But it's a key question for people to figure out, and a really hard one, because you're basically trying to forecast the future of computing, and historically, Moore's law was pretty good at that; I don't know about the other forecasts.

Host: Do you think there's some other paradigm coming after this continual attempt to squeeze more and more transistors onto a chip? Some quantum phenomenon, or 3D computing, or anything you could see as an interesting heir to the present paradigm?

Lennart: I guess in the beginning we'll have hybrid approaches. Our chips are hitting the limit of how small we can make them, but maybe we can stack them, or put them together, like what Apple did with the M1 Ultra: we can only make a chip this big, but what if we put two next to each other with a high-performance interconnect? Or what AMD does: chips only get so good, so let's put four next to each other and try to connect them. So there are a bunch of hybrid mechanisms for working your way around this.

Host: Putting more chips next to each other doesn't sound to me like the same kind of progress as making transistors smaller and fitting more of them on a chip.

Lennart: That seems right; it's definitely a different sort of progress. But what I'm saying is that there are different ways around this, hybrid techniques or new techniques, and there are also people starting to think about completely new computing paradigms. In the beginning we had those big relays, and then we eventually moved to integrated circuits; while integrated circuits have been powering us for the last 60 or 80 years, the question is what comes next. You mentioned quantum computing. I'm no expert there; whenever I talk to people, the sense is that it's probably overhyped, as everything is, and I think it's trying to solve different problems, but I'm the wrong person to ask. What else is out there? There's neuromorphic computing, more analog, brain-inspired computing. I expect us to make more progress there, and it's particularly interesting for AI use cases. My rough guess right now, at least from what I've seen, is that it's pretty interesting for inference but not for training systems yet. The other thing is that it's still produced on silicon, and I don't know if you need cutting-edge chips for it; I would rather call it, to some degree, design innovation: you produce different types of chips, but they're still silicon chips. Then there are other things like optical computing. I have no clue whether optical computing is promising or not; people will figure this out over time, and at some point it might become cost-competitive. I'm pretty confident it won't happen in the next eight to ten years, but ask me again then and we can see where we are.
Host: To what extent, if we're trying to predict what will happen with AI progress based on compute, should we take into account these kinds of wild cards, or black swan events, or whatever you want to call them, where suddenly we get much more intense progress in compute from something like optical computing or quantum computing, an advance that's not part of the trend line we're extrapolating?

Lennart: I would put relatively low probability on this. It's not the case that nobody is trying to build a better processor right now: there's money to be made if you figure out something better than what TSMC is doing. If you can replace this billion-dollar industry with something cheaper that you can build in your garage, congratulations, you've just won the lottery. A bunch of people are trying this, and the same goes for algorithmic innovation and everything else around computation. A lot of this stuff eventually boils down to this exponential growth. Maybe I'm like the economist who says it's always three percent GDP growth, that's just what's been happening: it's always Moore's law, this self-fulfilling thing people keep working towards. So do I put a lot of probability on these kinds of events? Mostly not, because what I care about is price performance. You might tell me the unit won't be FLOPs in the future, it might be something else, light-based computing, whatever; that might be the case, but ultimately I care about the output, and that's the thing we've got to watch. Compared to that, if you look at the AI triad: do I expect major breakthroughs in computing performance? Probably not. Do I expect major breakthroughs in algorithms? There I have a higher probability, because we have systems like our brain. There is some good algorithm out there: it's running right here, right now, and it's pretty energy efficient, more energy efficient than GPT-4. So something better is physically possible; there I have some proof, so I put a higher probability on those kinds of things happening.

Host: Tell us the difference between AI timeline forecasting and AI takeoff speeds.

Lennart: Timelines try to figure out the point in time when transformative AI, AGI, however you want to define it, emerges. So it's a point in time, and usually people put probabilities on it, some kind of distribution around it. Takeoff speed, by contrast, is a time duration: how long does it take to go from A to B? And then we immediately get into the weeds of how you define A and B. One way to go about it, which a recent report by Tom Davidson tries, is to think about when we can automate most human labor in the world, most cognitive tasks. The takeoff is then, and I'm not sure these are exactly the numbers, something like going from 20% to 100% of all human labor eventually being automated.
Other people think about takeoff differently: takeoff is the period from the first artificial general intelligence, defined by some metric, to the point where this thing takes over the world, something along those lines. So takeoff is just a duration from A to B, and everybody has different ideas about what the As and Bs are. I think all of them are somewhat useful, but all of these models end up looking different, and some are more tractable than others.

Host: So we could have two people with the same AI timelines, say they both predict AGI by 2050, but different takeoff speed predictions: one person predicts a slow takeoff, with progress beginning in the 2030s and being steady all the way up to 2050, while the other predicts much slower progress until 2048, from which point we get extreme progress. That's the difference. Is there anything to learn from studying compute specifically about takeoff speeds? Here I'm thinking, and this is just my uneducated guess, that if we know something about how compute is produced and how much compute is available, this puts a limit on how fast a takeoff could be. Is there something to that?

Lennart: Absolutely, I think there's something to it. Look at compute and what it means for takeoff. Maybe we can go back to the bio anchors model: you might say takeoff runs from the child-development milestone to the evolution milestone, since both describe the amount of labor you could automate, and both are defined by compute; here we go, compute is useful. But I think it's more interesting to look at compute as a limiting factor for takeoff. We need to deploy these systems, we need to train more systems, so do we actually have enough compute for this? A lot of people say: if current trends continue and we have another ten years of this progress, we will be using this much compute. Then you ask: do we actually have that much compute? How many chips is TSMC actually producing? What are the production capacities of the fabs? How many chips will be out there? That matters directly for these questions. It also matters what else is being produced: at the moment TSMC mostly produces chips for MacBooks and smartphones, and AI chips are just a small minority. Will AI be economically important enough, and will people believe in it enough, that they actually re-steer production towards fewer smartphone chips and more AI accelerators? Or do we just build more fabs? There are limits to that too: if the US is trying to build fabs right now, it takes a long time, it's really hard, and it costs a lot of money. All of this can inform how you feel about takeoff, with compute as a bottleneck. And important geopolitical events matter too: if you don't have access to compute, or if you think about TSMC sitting in Taiwan and a China-Taiwan invasion, this plays a really important role in how many AI chips are out there and whether they're accessible. That then feeds into whether we are actually doing these large training runs and running enough AI models for certain percentages of the economy to be automated.
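Editor note: one way to make "compute as a limiting factor" concrete is chip-count arithmetic for a hypothetical frontier training run. Every input below is an assumed round number, not a figure from the episode.

```python
# How many accelerators does a hypothetical frontier run tie up?
# All inputs are illustrative round numbers, not episode figures.

RUN_FLOP = 1e26            # assumed total training compute for the run
CHIP_PEAK_FLOPS = 3e14     # assumed peak throughput per accelerator (FLOP/s)
UTILIZATION = 0.3          # assumed sustained fraction of peak
RUN_DAYS = 90              # assumed wall-clock duration of the run

seconds = RUN_DAYS * 86_400
flop_per_chip = CHIP_PEAK_FLOPS * UTILIZATION * seconds
chips_needed = RUN_FLOP / flop_per_chip
print(f"~{chips_needed:,.0f} accelerators for the whole run")
# -> ~143,000 accelerators: a number you can compare directly against
#    annual AI-accelerator production when asking whether fabs would
#    re-steer capacity away from smartphone chips.
```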
Host: Perhaps a very fast takeoff would require something we haven't seen before in terms of algorithmic efficiency, to overcome this compute bottleneck. We've talked about what we can learn from trying to forecast AI progress, because whether we have transformative AI by 2050 or 2055 might seem like it doesn't matter that much, but you've argued there are important insights to be gained from trying to forecast AI, and the thing you've landed on is the importance of compute. So perhaps you could introduce this notion of compute governance that you've landed on.

Lennart: As you just said, compute is one of these inputs, and it looks like an important one. So what can we do, and what is my definition of compute governance? When we go back to the AI production function, we have these inputs: algorithms, data, and compute. My claim is: if I wiggle this compute node, I can do something about AI downstream, the deployment of AI systems, the training of AI systems, and eventually these systems being used in a beneficial way. That's the rough claim; that's what I describe as compute governance. What the outcomes are, which different things you can eventually achieve with this, is what I'm trying to figure out.

Host: Fantastic.

Related conversations

AXRP

15 Feb 2026

Guive Assadi on AI Property Rights

This conversation examines governance through "Guive Assadi on AI Property Rights", surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med 0 · avg -1 · 136 segs

AXRP

28 Jun 2025

Peter Salib on AI Rights for Human Safety

This conversation examines governance through "Peter Salib on AI Rights for Human Safety", surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med 0 · avg -3 · 196 segs

AXRP

27 Nov 2023

AI Governance with Elizabeth Seger

This conversation examines governance through "AI Governance with Elizabeth Seger", surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med -7 · avg -8 · 110 segs

AXRP

28 Jul 2024

AI Evaluations with Beth Barnes

This conversation examines technical alignment through "AI Evaluations with Beth Barnes", surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med 0 · avg -4 · 120 segs

Counterbalance on this topic

Ranked with the mirror rule in the methodology: picks sit closer to the opposite side of your score on the same axis (lens alignment preferred). Each card plots you and the pick together.