Suing Labs for AI Risk with Gabriel Weil
Why this matters
This episode strengthens first-principles understanding of alignment risk and of one strategic lever for shaping safe outcomes: legal liability for AI harms.
Summary
This conversation examines core safety themes through Suing Labs for AI Risk with Gabriel Weil, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.
Perspective map
The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.
An explanation of the Perspective Map framework can be found here.
Episode arc by segment
Early → late · height = spectrum position · colour = band
Risk-forward · Mixed · Opportunity-forward
Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).
Across 111 full-transcript segments: median -7 · mean -8 · spread -31 to 5 (p10–p90 -18 to 0) · 17% risk-forward, 83% mixed, 0% opportunity-forward slices.
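To make the summary line above concrete, here is a minimal sketch of how per-slice scores could be reduced to these statistics and band shares. The band cutoffs, the `classify_band` and `percentile` helpers, and the example scores are illustrative assumptions for this sketch, not the site's actual scoring pipeline.

```python
from statistics import mean, median

# Illustrative band cutoffs (assumed for this sketch; the real thresholds may differ).
RISK_CUTOFF = -10
OPPORTUNITY_CUTOFF = 10

def classify_band(score: float) -> str:
    """Map a slice score to the page's lexicon: risk-forward / mixed / opportunity-forward."""
    if score <= RISK_CUTOFF:
        return "risk-forward"
    if score >= OPPORTUNITY_CUTOFF:
        return "opportunity-forward"
    return "mixed"

def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile over a sorted copy of the values."""
    ordered = sorted(values)
    idx = round(p / 100 * (len(ordered) - 1))
    return ordered[idx]

def summarize(scores: list[float]) -> dict:
    """Compute the headline statistics shown for a scored transcript."""
    bands = [classify_band(s) for s in scores]
    return {
        "median": median(scores),
        "mean": round(mean(scores), 1),
        "spread": (min(scores), max(scores)),
        "p10_p90": (percentile(scores, 10), percentile(scores, 90)),
        "share_risk_forward": bands.count("risk-forward") / len(bands),
        "share_mixed": bands.count("mixed") / len(bands),
        "share_opportunity_forward": bands.count("opportunity-forward") / len(bands),
    }

# Example with made-up slice scores; the real page aggregates 111 scored slices.
print(summarize([-31, -18, -12, -7, -7, -4, 0, 5]))
```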
Mixed leaning, primarily in the Technical lens. Evidence mode: interview. Confidence: medium.
- Emphasizes alignment
- Emphasizes safety
- Full transcript scored in 111 sequential slices (median slice -7).
Editor note
A high-leverage addition to the AI Safety Map that clarifies one important safety bottleneck: how to make labs internalize the catastrophic risks of the systems they train and deploy.
Episode transcript
YouTube captions (auto or uploaded) · video hdQDZo7jGBY · stored Apr 2, 2026 · 3,205 caption segments
Captions are an imperfect primary source: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.
No editorial assessment file yet. Add content/resources/transcript-assessments/suing-labs-for-ai-risk-with-gabriel-weil.json when you have a listen-based summary.
hello everybody in this episode I'll be speaking with Gabriel Gabriel is is an assistant professor at turo law whose research primarily focuses on climate change law and policy he's recently written about using Tor law to address catastrophic risks from AI which we'll talk about in this episode for links to what we're discussing you can check the description of the episode and you can read the transcript at ax R p.net well welcome to axer uh thanks for having me sure so um I guess we're going to talk about this paper you have it's called Tor law is a tool for mitigating catastrophic risk from artificial intelligence um and I think most of my audience won't be from a legal background and they might stumble over those first two words like what is tort law so torts is the law of basically anytime you do a civil wrong that isn't a breach contract so breach of contract is you promised somewhat in some legally enforceable way that you would do something and uh you didn't do that you breach the contract they sue you basically anything else that you're suing um a private party for it's not you're not suing the government um uh a civil lawsuit that is that is going to be a Tor so most commonly you think of things like you get in the car accident and you sue the other person that's a Tor um there's also intentional torts like battery so if I punch you in the face you cons sue me for the harm I did to you if I trespass on your property um or I you know damage personal property like your like I break your car window or I key your car right those are all those are trespasser torts that's called trespasser chadels um products liability is torts medical malpractice is T so that's that's the broad field of law we're talking about gotcha great so so yeah how I I guess you have some idea for how we're going to use Tor law to mitigate catastrophic risk from artificial intelligence in a nutshell what's the idea how do we do it yeah so in a nutshell the idea is that training and deploying these Advanced AI systems is a risky activity um it creates uh risks of what are called externalities and economic J right so it's harms to people um that are not internalized to any Econ economic transaction they harms to third parties right so so uh you know open AI makes a product someone buys that product and uses it but it risks harming someone uh be besides those two parties right and um therefore the the risk is not reflecting the price of the product right um and so in principle holding whoever is responsible for the harm whoever caused the harm liable for that making them pay for the harm they cause should result in them taking enough precaution uh to to sort of optimally balance the risk and reward right so in the same way that in your own life right uh when you're deciding uh you know whether to drive to work or walk to work or bike or whatever right there's there's various cost and benefits you're weighing those uh you know some of them Financial costs health benefits whatever time costs um and you're able to weigh those risks for yourself we generally trust that process to work pretty well because you fully internalize um the benefits and cost there you you might make some mistakes sometimes but like your incentives are pretty well aligned but if if there's some sort of externality to what you're doing so you driving to work uh creates pollution right um that's mostly most most of the harms that that are felt by other people um then we might need some legal response to account for that and so uh Tor law when we 
have works well for that when we have a concentrated harm so for pollution it wouldn't work so well necessarily for everyone to suit everyone that's harm by your pollution um to sue you may want some some other sort of policy tool like like a tax on gasoline or on on pollution um to um to toine your incentives right but if you hit someone with your car right and if there's a risk that when you take that trip you're going to hit someone uh making you pay for the harm you cause works pretty well uh to align your incentives to make sure that you you know you don't take the trip if it's not necessary um or and you drive with appropriate precaution you know text while you're driving you don't um you don't speed too much right um now our existing Tor system might not do that perfectly but in theory uh the prospect of liability should make you exercise the right amount of precaution so that's basically what I want uh companies that are building and deploying this AI system to do is to is to have the incentive to to exercise the right amount of precaution um I don't think our current torch system necessarily does that but I think it can with tweaks gotcha so so to summarize it sounds like um basically the fear is if you're an AI company there's some risk you destroy the world and that's like much worse for the world than is for you so you're just going to be like you know you're just going to be more relaxed about that risk than you would be if you like really felt the responsibility you know if you really like had the you know had had we're internalizing just how bad it would be if the world got destroyed or you know some other sort of catastrophic risk so in the limit that's definitely part of what I'm worried about but I think even for for Less catastrophic scenarios uh there's there's there's harms that wouldn't necessarily be internalized by these companies but yes in the limit I think it's definitely true now obviously if you destroy the world you kill yourself too so that's like bad for you but yeah it's not as bad you don't feel that as much as as killing you know 8 billion people and all future civilization right that and so um yeah I I think you're you want the law to sort of make them feel that right so it seems like the obvious difficulty here is that if uh you have like if you suppose you literally destroyed the world right it's sort of too late to sue yeah right um and this is sort of the first pass difficulty with like some sort of liability scheme so how like how do we deal with that okay so yes it is TR that there are certain class of harms of which uh existential harms are a subset um that that are what I would call Practically non-compensable right so you can't actually bring a lawsuit or you can't collect a judgment for it so I think that that includes Extinction risks it also includes um risk that of harmers that we sufficiently disruptive that the legal system would no longer be functioning and it also includes risk short of that that would just be financially uninsurable right so if you kill a million people uh the damage reward is going to be so large that uh that you're not going to be able to pay that out that no plausible insurance company is insurance policy is going to cover that and so it would put your company into bankruptcy and most of the most of that damage is not going to be recoverable um and so the normal type of Damages that we use in toret lawsuits which are called compensatory damages damages designed to make the plane of Hall to make them in theory 
indifferent between having suffered the injury and getting this pile of money or having it never happened uh those aren't going to work um to get at those risks we need some other tool and so the tool that I propose are what are called purative damages so these would be damages over and above the harm suffered by the plantiff at a particular case um that are that we're assigning because there was a risk that that things went much worse right so system does some practically compensable harm but it does in a way that it looks like it could have gone a lot worse say there was a one in 10,000 chance of killing a million people and a one in a million chance of of causing human extinction right then you would want to sort of figure out what the expected value of that harm is the the probability of the harm times the magnitude and say well we're going to pull forward that that that sort of counterfactual liability from the risk that you took but wasn't realized uh and allocated across the practically compensable cases gotcha so so the way I understand the policy is roughly like we're going to say that if you when you have some sort of AI harm if it's the kind of harm that's like really closely related to these sorts of harms that I couldn't see you for um like Extinction like you know some sort of catastrophe that would super disrupt the legal system or just a judgment that was that's like too big to collect on then basically like the punitive damages are going to say like okay you know how like what types of harms are that related to like how similar is this to those harms how how much would like dealing with the problem that you actually caused um fix those harms and basically the punitive damages are going to reflect those factors roughly right so I think we care about really two things uh how much uninsurable risk or not practically non-compensable harm uh risk was generated by deploying the specific system that you deployed in the way you did and then we care about this specific harm um if if you had a strong incentive to avoid this harm uh and and you took costly measures to prevent it or reduce its likelihood or severity to what extent would that tend to mitigate the other durble risks and and the more that's true the more punitive damage we want to load onto those types of cases right gotcha um this is a bit of a Tang but you mentioned that like you know at some level of harm like there's no reasonable you're not going to have the money to pay off you know to pay the damages for this harm even like getting an insurance policy no insurance company would be able to pay out how how much harm is that like like how much uh yeah what what are the limits of just like making people buy liability insurance for this kind of thing so my understanding is that current insurance policies tend to top out around $10 million uh in the paper I use a much higher threshold for an insurable I use a trillion um I think we would want to sort of push the limits of What's insurable um to really find out but I think that's a sort of open question that uh that needs to be explored before this is fully ready to implement gotcha I think you would want to start out by assuming the unability threshold is sort of on the lower end and then if they can prove well I can ensure for more than that then you would say Okay harms uh the expectation of harm or the the possibility the risk of harms below that we handle that with compensatory damages um since you've shown you you can you can take out a hundred billion dollar 
insurance policy then that's going to be the cut off risks of of harms above 100 billion those will be what are going into the per of damage calculation right and wait did you say 10 million or 10 billion for how much people 10 billion okay um and I guess to sort of bullar that uh like in the US the value of a statistical life that Regulatory Agencies use is like $10 million per life or something so 10 billion is like you can get insured for like the legal liability for killing a thousand people but like not 10,000 so that's the rate order of magnitude as I as I expl in the paper the the numbers that are used by regulatory agencies are a little bit different than what's used in torque judgments um those tend to be a little bit lower in large part because uh the way mortality harms are Valu don't actually value the the value of the person's life to them sort of weird Quirk of Tor W that I think should be fixed for other reasons and is more important in this context uh but that's sort of a tag it sure sure gotcha so so this is the scheme uh in in the paper you sort of focus on how it could be implemented in a US context um if people are listening from like other countries or other legal systems like how how widely applicable is this sort of change so I guess there's two ways of acing that I think the the beginning of the paper that lays out like what the ideal regime looks like I think that's true regardless of sort of what status quo legal system you're working from uh in terms of what doctrinal or what legal levers you'd have to poll to get to that outcome or to get to that regime that's going to vary across countries um I would say in any common law country so that's basically englishspeaking countries uh Tor law is going to be broadly similar there will be some uh detail difference particularly products liability regime since those came came later are going to be different uh but the basic structure of negligence law uh is going to be pretty similar across all all common law countries and so a lot of the same uh considerations are going to come into play but I I would invite Scholars working in other legal systems to sort of flesh out what this would look like uh and and and precisely sort of what levers you have to pull in their system gotcha so one somewhat basic question I have about your scheme is suppose that um you know party or company a uh makes this like really big uh super smart Foundation model um party B fine tunes it you know makes it really good at a certain task um party C then like takes that fine tuned model and you know sells it or deploys it or something um suppose uh the model that c deploys causes harm like who are we imagining suing for that so who's being who's suing is easy it's who ever harmed right I think you're asking who's sued yeah yeah yeah okay um so I think you could potentially Sue all of them right and so if you're talking about under my regime where you know my preferred regime where we're we're we're treating uh uh training and deploying these models as abnormally dangerous uh you would have to sort of figure out at what at what step and and which actors took undertook these abnormally dangerous activities right if all of them did then strict liability would apply to all of them uh if if the way this the application of the standard Works uh you would say well this this step in the chain wasn't Deb normally dangerous then that would be assessed under negligence principles and you have to be saying well did they breach some duty of care right but in 
principle there could be what's called joy and several liability where uh the playf can can sue all of them or pick which one they want to sue you can never collect more than the full uh damages but you can sort of um pick your uh pick your defendant um now different states have different liability regimes for that purpose so Most states follow what's called join and several liability which means you can collect the whole judgment from any one of them and then they can sue their joint Tor feaser they're called uh for what's called contribution basically for their fault abortion share uh then there's other states that that use what's called several liability with apportionment uh where you can only Sue each defendant for their fault a portion share of the liability um and fult DEP portion is just this idea that if you have U multiple defendants or multiple Tor freezes that are causally responsible for the same injury you do some sort of allocation based on how how faulty their contact was or in the concept of strict liability like how how liable they are that concept doesn't apply as well in the strict liability context uh but you would want to do a similar analysis okay but but broadly the sense is like well a court would just or you know the court would just decide like you know who's who was actually responsible for the really dangerous stuff that the AI ended up doing and they would be liable for the thing so I want to make a distinction there when you say courts I assume you mostly mean judges uh so there's different role that judges and juries have in this process judges resolve questions of law juries solve questions of fact is the high level distinction and so you know say in system breach of Duty is a is a question of fact but what the duty is is a question of law uh if we were talking about the threshold question of is this activity abnormally dangerous such that strict liability should apply that's a question of law that a judge would would resolve okay okay I think that makes sense um and so I guess the the final kind of framing question I want to ask about this is um it it seems like a lot of this would be implemented on the jury side right like a judge would tell a jury you know this is roughly how you should figure out the damages and you know go and deliberate and tell me what you decided is that right so certainly the actual damages calculations would be fact questions that would be decided by a jury in a in a typical case um that the way judges review those is is they if they conclude that no reasonable jury could have reached a given result and they can overturn it but juries are supposed to have pretty wide discretion um now whether punitive damages would be available at all is a legal question it's a legal question resolved by judges so under current law it's uh requires malice or recklessness as a sort of threshold requirement for for punitive damage to be applied there's also uh various limits under under the due process clause of the Constitution uh that limit sort of uh the ratio of of compensatory damages to punitive damages uh those questions would be resolved by judges and so Jes would sort of be operating in the confines of of those legal rules gotcha um I I guess my question is one could imagine that juries are like really good at assessing these kinds of things you know they're just very good calculators they like you know really figure it out one could also Imagine The juries just sort of roughly do what they feel is right and you know maybe they're like 
forced to be in a certain range by a judge but like maybe they're kind of random or maybe they like you know stick it to the bigger party or something like that um and in in the second world it seems like it's just going to be hard to implement this kind of scheme because maybe we just can't tell juries what to do so I guess my question is how good are juries at like implementing formulas and stuff that uh judges tell them to so most damages calculations are pretty black box right it's like well what's the pain and suffering for certain things we can we can assess better so like lost wages are easier to quantify right pain and suffering is inherently pretty hard to quantify and and that's regularly part of Damages award we just sort of deal with the fact that you know uh there's going to be experts to testify and then jury sort of come up with a number right uh in this context I think you would have dueling experts right where uh different experts are testifying and saying well this was the risk um you know obviously there is deep disagreement in people who think about AI W and AI safety about How likely uh these catastroph outcomes are now hopefully the context in which a system failed in a certain way where it looks like it could have gone a lot worse will be probative on the question we're not trying to ask the question of what's the global probability that human extinction will be caused by Ani system right we're trying to ask what is the probability that this system what what's the probability that the uh the people who trained and deployed it when they made that DEC those decisions what should they have thought the risk was right and we can update on the fact that it it failed in this particular way we don't want to over update on that because um in some sense like expost the risk of of a worse harm is zero right and they didn't know it would fail in this particular way but the fact that it failed in the way it did can reveal some things about what they knew or should have known at the time they deployed it right and so I think yeah jurries aren't going to do this perfectly but I also don't think they need to right so what really matters here is the expectation that if you if you take a risk you're going to have to pay for the expected harm arising from them this and so as long as juries aren't systematically biased in one direction of the or the other as long as they're very roughly noisily coming up with a number that that that tracks the the risk um that's going to do what you need to do in in terms of generating the expectations of liability so that's a failure mode I'm sort of less worried about than others okay and and so so just to like give color to this it sounds like maybe what would happen is like there's some sort of trial based on some like harm of you know resulted resulting from something kind of like Ai misalignment and then like you know the defendant expert the defendant's you know legal team brings up some expert to say like oh you know this wasn't that bad and it's not that related to really scary harms and the other expert says no it's like really bad and and you know somehow like the juries are picking between like different people suggesting different things about what the damages can be that are kind of guiding their assessments is is that a reasonable thing to imagine yeah it's how I would expect a trial like this to go or the damages phase of a trial like this to go okay great um I guess a question I want to ask is um if I think of most AI governance work I 
think that it it kind of operates in a framework of saying okay our plan is going to has two parts firstly there's just going to be some process where like the government or a bunch of smart scientists figure out what AI can do how scary it might be and make that that really legible to regulators and then secondly there's going to be some kind of law or some kind of regulatory body that says that if you make a really big scary AI we're going to set out some rules for you to follow and you just have to follow them and we're going to design the rules so that if you follow those rules thei should hopefully be safe um and this feels like your your proposal feels like kind of a different flavor um than these sorts of proposals so I guess I wonder like how you think these kinds of schemes compare okay so to my mind the key advantage of a liability framework is it doesn't require that you and the government know what what specific steps have to be taken to to make your system safe I don't think we know that right now maybe we'll get there at some point right but I don't want to rely entirely on government being able to spe specify a procedure that makes your AI system safe right and so what this does is it shifts what liability does is it shifts the onus to figuring that out onto the companies where you know the labs where most of this expertise resides right I think it's going be difficult for for government to sort of bring the kind of expertise in house um that gets them to where the leading labs are and so um and even the leading Labs don't really know how to build Safe Systems right now right so I want them to not only be like throwing everything they have in terms of like once they've made a decision to deploy making it safe but I want them to you know if they're not confident the system is safe if they think uh deploying a particular system uh given the current state of knowledge creates a one a million chance of human extinction I want them to wait six months until like better interoperability tools come around or whatever whatever the idea is I'm not a technical safety researcher right but I want them to be thinking like I need to to to be as cautious as I would be as if like I own the world basically right and if destroying the world was going to destroy all that value for me right and I um that's not to say that there's no role for what I would call prescriptive regulation of the kind you were describing um but I think what's really important in that context is that they don't is that those prescriptive rules don't preempt uh or displace the Tor liability right so uh sometimes judges interpret um uh regulatory schemes as uh having a pramp of effect either because uh they're they're viewed as conflicting uh W with with uh Tor law or they're they're viewed as occupying the the regulatory scheme is viewed as having occupied the field um and so impliedly preempting uh the Tor regime I think that would be a really bad outcome so you can avoid that pretty easily in any sort of legislation that's enabling a new regulatory program by including what's called a savings Clause that expressly disavows any preemptive effect um and once that in place I think we we can talk about Are there specific measures that would buy some safety if we require them I don't think those are necessarily bad ideas I think some are are more valuable than others um but I don't think we want to rely entirely on that yeah yeah I guess it's sort of um to me it seems like the distinction is uh the these sorts of rule 
making schemes they're sort of like like the the rules you know the stuff you have to follow it kind of comes like very early in time um maybe like before you know as much what's happening whereas if you can do it right something like a tort law scheme like it brings in the legal Force you know at a time where there's some like medium- range problem with your AI like in my mind it seems like the advantage is that it's sort of a more informed place to make these decisions um such that like you know AI companies optimizing for that are you know basically going to be doing better things than if they're just optimizing for like following like set rules does that right so they you know a rule that you you set to regulatory policy may not pass a cost benefit test right you know uh the government's going to like so you might have some unnecessary rules and there also might just be things you didn't think of to require or you decided not to require that like would have bought you a lot of safety right so if you get the rules perfect right if you require everything that passes a cost benefit test and you don't require anything that doesn't then maybe a regulatory regime is sufficient and better right but I don't have confidence in this domain uh that Reg waers are likely to approach that yeah I I guess there's a difficulty where like um on on some level you're sort of hoping that you have that developers are able to figure out like what the right cost benefit is for themselves to do but also there are kind of obvious problems with them like setting regulatory policy and yeah I guess I think of it is like just an interesting way to solve a knowledge problem um yeah it's also worth pointing out that some of that some forms of prescription break regulation work really well with liability so in particular there's proposals for these AI model licensing machines right and I think that would pair really well with a liability insurance requirement system right so instead of the decision being binary yes or no do you get a license or not what the regulator would do is decide here's how much liability coverage you need in order to deploy this system here's the the worst amount of harm we think you could do right and then you could deploy it if you can convince an insurance company to write you a policy you can afford and that's going to depend on you know U maybe there would be some some set of alignment evaluations or safety evaluations that that they rely on in order to do that underwriting process yeah so I think you want the decision about whether a system is deployed to depend on sort of whether its expected benefits for society are more than its expected costs and if if they have to buy insurance against uh against the worst case outcomes and convince an insurance company they can afford it that's a pretty good proxy for that whereas I think I'm I'm less trusting of a sort of a binary government decision like are we going to license this model or not yeah I guess yeah I actually want to talk about synergies because I think there's also Synergy like in the fork of the plan where you know like we're going to have nist figure out like what kinds of AI designs are more or less safe or figure out like ways of evaluating um AIS for danger it seems like this potentially has a Synergy with the torw plan I I think yes uh certainly this uh I guess there's two different ways that could work so one is if we're still in sort of negligence world if if my ideas don't take the World by storm and we don't have strict 
liability on MD Bo dangerous activities Theory then this promulgating these standards if you're not following those standard that's at least going to be evidence that evidence of negligence now there's there's a Doctrine called negligence per se that if you had actual regulatory requirements and and you don't meet those um then that would automatically be neged but if they're just sort of guidelines and this is issuing um that would be indication that you're not exercising reasonable care but it wouldn't be disp positive yeah I think there's I was imagining also like if if we do adopt your um proposal it seems like this kind of thing might be informative of like how risky that activity actually was um so how much uninsurable risk you took when you deployed it if you didn't follow the standard is that the idea yeah or like maybe it's not a standard maybe it's just like a you know um some kind of measurements of like oh we've you know there's some rule that you have to submit models to this organization and like this model got a like blaring red light and then it caused some sort of problem and you know that's like even more evidence that like there was something like pretty dangerous about it yeah so I definitely think there there's there's scope for more technical work in terms of you know evaluations of these models both in uh in deciding whether to deploy them and deciding how much insurance you have to take out and for these damages calculations if harm has happened um can we try to use uh sort of uh post Haw evaluations to try to figure out well could have gone a lot worse right what would that have looked like sure um I guess I guess the next thing I want to compare to is um the so in in your paper you cite this um blog post by Robin Hansen about F liability which to my knowledge I think is the only previous time people have talked about scheme roughly like this and he imagined uh proposal sort of similar to yours except there's kind of a fixed formula where they say like okay um you're going to assess punitive damages and the punitive damages are going to be based on like how many items on this checklist of like ways AI could kill everyone how many items did you check off and the more items on that list the worse the P of Damages are by kind of set formula um so your proposal I take the main difference to be that instead of being the strict formula you know you like people just have to figure out like how how much would um trying to prevent you know this certain harm that actually occurred how much would that prevent like really bad catastrophes that could have happened um so I'm wondering what do you think about kind of pros and cons of each one sure so I think I talked to Robin about this his motivation for having that formula was to Adit the discretion of judges and juries um I see that as not particularly viable in this context since his formula at least strikes me as fairly arbitrary it's taking it to the power of the number of of these different criteria that are checked off I think a lot of those criteria are not actually binary right so it's unclear how you would implement it in cases where it's like sort of kind of self-improving or something um uh so I think that's that's an issue I think weeding all these factors equally doesn't seem super persuasive to me um but I do see the value in sort of having a formula that said I I do actually I provide a formula for my formulation of period of Damages um there now there a lot of those those sort of variables in that formula are 
going to be difficult to estimate so that's a real challenge but I think the advantage of it is it's tracking the thing we ultimately care about right it's like how much harm did you risk and how sort of elastic is that with the with the harm that that we actually realized here right um yeah and so um I think there I would like to see a lot more work to sort of put me on the bones of how to estimate the parameters in that formula uh but in my mind you should be like aiming at the Target and doing as well as you can as opposed to sort of I think it's like At first blush it looks like straightforward to apply Hanson's formula and that when you when you really like unpack it I think there's still going to be a lot of discretion there and so I don't think it uh I think maybe it limits discretion a little bit but not as much as you'd like and it's not like reliably it's like Loosely correlated with the thing we care about um but it's not going to reliably track it in the way that that my preferred approach would does that make sense that that makes sense I I guess I could imagine suppose we don't use Hansen's formula in particular but suppose like what we do is like the world just spends the year we look at your formula and then we say like okay what's something kind of like the Hanson's formula that really would approximate um what your formula tells us to do but we're going to we're going to try and like have something that you know you can really nail down we're going to to the greatest extent possible we're going to have something where there's very little discretion on judges and juries so they can apply it like sort of automatically and we're going to lose something along the way it's going to be like a bit of a bad approximation but um hopefully it's going to be like really predictable and you know we're we're going to lose we're going to lose some value of aiming for the ideally optimal thing but we're going to gain some value of predictability and I'm wondering like how do you see that trade-off like how should we think about it so I think that would be really valuable um there's a question as sort of how that would be implemented right so one thing you could say is like look someone comes up with that formula you have your expert testify about it juries can incorporate that right it's the same way that we incorporate any sort of sort of scientific expertise in that domain right I think that is the most likely pathway if this is going to happen and sort of a common law judge way right I think it's unlikely that judges are going to like create a rule of law that say Jes have to follow the specific formula somebody came with on the other hand if this is done via legislation right um certainly legislators if they want to can sort of hardcode that formula into the statute right and then juries have to follow it right so if you if you have a formulate that you think is like really good and unlikely to be improved upon or if you think that okay if we ACC something better we can amend the legislation uh if it's good enough then I could see uh you know sort of compelling judges and juries to follow it um it would sort of just depend on sort of how good you think the formula is and how easy it is to estimate the parameters in it so if you have a really good formula that if you know the parameters I think I think my formula is totally determined if you know the parameters right the problem is estimating those is really hard right um if you have one that has like easily estimable parameters right 
and you're just saying jury you have this narrow task of of like coming up with good estimates at these parameters and then that's all we we're going to ask you and then like mechanically that will produce a damages award um I think that'll be great if you can do it I I I don't think technically we're there right now yeah I guess it also like one difficulty of this is like this seems sort of similar to formulas that get used in criminal law right like sometimes like uh legislators want to say okay we're going to have some sort sort of mandatory minimums or we're going to like you know we're going to just have some Rule and we're going to ask judges and juries to like or I I guess in criminal law it's judges but we're going to ask judges to like basically mechanically apply the rule and I get the sense that the legal profession kind of dislikes this or judges just kind of dislike this and F so firstly I'm wondering if uh you think I'm right and secondly like to what extent does that suggest that we should like be shy of implementing a like rigid formula here so I think mandatory minimums in particular are fairly crude right um and there's this General tradeoff in law uh what you may call rule standards uh discretion versus versus rules right there's this idea that the more sort of discretion you give judges individual cases the more you're going to be able to sort of accommodate details of cases that might uh not be captured by like an over broad rule on the other hand you're going to have like a lot more noise and potential for bias if you let judges and juries have more discre right and so there's this basic tradeoff um I think what's new in this context is there's like a competence issue right that it sounds like you don't totally trust juries to be able to evaluate these questions and so you want to like make their job a little easier right um and so you know I think we do sort of like have a way of dealing with that that you know you can be different people have different ways of of of judging sort of how well it works right of you know weding jur juries here you know we juries nonexpert juries hear from experts and then adjudicate The credibility of those experts and and then come to termination um but I think if we had again if we had a formula that was good enough I think it would probably want something better than just like you commit X felony you get a minimum of 10 years right I don't think something um of that level of Simplicity is gonna is going to work for estimating the uninsurable risk uh arising from an A system I don't know what that would look like um but I think if you had something sufficiently sophisticated where the parameters were easier for the jury to estimate again I don't I don't have a strong sense of what that would look like I think that could be really useful okay fair enough um so another kind of proposal I want to compare against is I thing you mentioned like very briefly early on something like pigui and Taxation where we say that like you know doing this kind of um activity is broadly dangerous and we're just going to say like whenever you make a model that's X big or maybe a model that like trips up you know X number of dangerous capabilities or something that's just like inherently risky and therefore or you have to like pay a certain fine or certain tax um you know kind of regardless what of what happens so similar to like a carbon taxation scheme um and these kinds of schemes are often like considered pretty desirable in the in the settings where 
like there are pretty broad-based um harms that could occur um so I'm wondering what you think about schemes like that so I'm a big fan of PAG taxes generally in my earlier life I was a carbon tax advocate a lot of my work is on is on climate change lawn policy um I think that there's two big differences between so say the climate context and the AI risk context um so one is uh if you are harmed by some climate outcome right um it would be really hard to come up with like how you can bring a lawsuit right um because everyone in the world contributed to that right to say that there's like a for cause of any particular ton of of carbon being emitted that that caused your injury that's going to be a total mess and you need basically need to sue the whole world right that's one right um the the so that's like the thing that makes climate change harder to to use this liability tool for conversely um it's really easy in the climate context to say okay we know what the unit of generating risk or harm is it's it's like a ton of CO2 equivalent right um and so we can say okay you know we might disagree exactly about you know how much risk or how much harm you generate by emitting a ton uh there's different estimates of like the social cost of carbon right but it it it's like we can we can measure like how much you did of that right we can come up with some tax and apply it right um I think both of those are flipped when we talk about the AI context right so AI systems are likely to harm specific people right and more importantly it'll be specific systems that harm them right so it's not like oh it's like the existence of AI is what is what har me no it's like some specific system was deployed and that harmed me that's who I know how to go Sue right and it's not as if you know so all tons of CO2 emitted to the atmosphere um do the same amount of damage right um now maybe like a marginal ton at some point is worse than others but like two different people admitting them at the same time right um you know me driving my car and you driving yours right are doing just the same damage right um and that's not true reliably for building an AI system of a certain size right you want to differentiate between companies that are or Labs that are are taking more precautions are are doing more you know doing more alignment research doing taking more steps to uh to make their system safer we don't want to just say we want we don't want to just like tax AI in general right we want it particularly we want to tax misalignment right um so one framing that I really like is people worry about you know alignment tax that it's costlier both financially and in terms of time other resources to build align systems right and so one thing you can think about AI liability doing is creating a misalignment tax right hopefully that's bigger than the alignment tax right and so um but if we if we could perfectly assess at a time a model is deployed how risky it is um then maybe that would work well but then if you could do that then you could just have you know some binary decision about whether you all to deploy right um maybe you might still want to tax because you because there's uncertainty about what the benefit of the system are right so you want to um uh but but I think we're not in that epistemic position right we don't have the ability to assess X an the how risk it system is um once it's done harm in particular ways I think we'll have more visibility into that and so that's why I think a liability regime works better in 
this context yeah that makes sense um I guess a final thing I want to ask is a thing that that it seems like this proposal is kind of really targeting is externalities from um from AI research so so it's sort of imagining a world where people who run AI companies they like basically know what's up they basically know like how risky systems are and you know what they would have to do to make it them less risky and the reason they don't is that they're like inadequately incentivized to because the world ending is like only so bad for them or you know really bad catastrophes they're only so bad for the AI company but they're much worse for the world and I think like it's not obvious to me like if this is the right picture to have right like we see like pretty different um assessments of like how you know what what are the chances that AI could cause really serious harm um you know like the world ending like like the really serious harms that like you know exis people talk about you know they're not fully internalized but they they're like quite bad um for the the companies involved what is Mega 2 West for them than for the world it's it's true but like I guess so if you have a one in a million chance of destroying the world but like a 50% chance of making hundred billion dollar right the calculation for you looks a lot different than the calculation for the world right that's a negative expected value bet for the world but a positive expected value bet for you I think I think that's right I think like on views where the probability of Doom is like way higher than one in a million like like I think a lot of people think that the probability of Doom is like higher than 10% from a specific system yeah maybe not from a specific system maybe maybe from AI development in general so I guess my question is like how how how do you think we should figure out like if if I'm assessing this how do I tell if like most of the risk is like externalities versus like individual irrationality or stuff like that okay so I think that's a fair critique that's say okay you know maybe the people who buy these x-r arguments the people at say anthropic or you know some of the people at open AI at least right um the deide right um are going to have even more incentive to be cautious but like you know meta Yun doesn't believe in X risk really right and so he's not going to be so worried about this and I think that's true if you only have the the liability part of my framework if you have the liability insurance requirement uh part of it then you have to convince the insurance company that's going to be a much more cautious actor um that you're not gener generating risk that's going to introduce that that more cautious decision maker into the loop and put a break on the process and so um I think I'm less worried about insurance companies that sort of their whole job is to be cautious right and to to avoid um you know writing insurance policies they're going to are going to pay out more than they cost in expectation right um I think that's going to be an important framework for for for the actors that that the main problem is sort of their um their assessment of the risk rather than their incentives to to account for it yeah I so I I think this works um for the problem of AI developers who have like abnormally low estimates of the risk so I I guess I'm I'm looking at a world where I feel like there's a lot of disagreement about AI risk and it seems like this kind of disagreement like on the one hand it's it seems like 
it's kind of the motivation behind like some sort of Tor law scheme rather than like well we just know what to do so we're going to make a law that says you have to do that but it seems like it causes some problems partly in that like uh AI developers or maybe even um you know insurance companies that AI developers have to buy legal liability insurance from they might know like they might not know what kinds of things to avoid um it also seems like it means that like these sorts of pervasive disagreements are going to make it really hard for jurries to assess like how big should the peo of Damages be so one might worry that like we we just have so much disagreement that this kind of liability scheme can't really help us um what do you think so I think I want to distinguish two different objections there so there's an objection that this disagreement makes it hard to implement the formula or makes it hard to implement the framework or that or that this agreement makes it so that if you could implement this framework it wouldn't buy us that much safety um and so I think um the concern that it's hard to implement I think is true right um I think a lot of technical work needs to be hashed out um needs to be done to to implement this framework reliably I think you can Implement in sort of Rough and Ready R way now that would still buy you a lot of risk mitigation um but there's a lot of refinement that could be done a lot of knowledge that could be built um consensus that could be built that would that would allow you to more reliably track what the risks that that these companies are taking are um and that would make the framework more valuable uh in terms of and and the other point I want to make on that is that whatever you think the sort of epistemic burins of implementing this formula they are lower this framework are they are lower than for implementing prescriptive regulations right because you not only have to know how big the risks are you need to know what to do about them right um yeah and so yeah I think if you're concerned is is our like poor epistemic position with regard to AI risk I think that tends to favor liability relative to other approaches not disfavor it then there's the question of is it g to like is it going to influence behavior in the right way because people might have different beliefs so I made the point already about liability insurance and how that introduces more cautious actors I think of what you're ultimately saying is like look um people building these labs are still going to make mistakes right they might deploy A system that you know BAS on everything anyone could know right looked like it was going to be safe and then it wasn't and then we're all dead and like who cares if in theory there should be liability for that and I think what I want to say is short of you know an outright ban on on building um Building Systems Beyond a certain level or certain kinds of systems I just think policy is not going to solve that particular scenaria right what we want from policy is aligning the incentives of these companies uh with with social welfare uh maybe we also wanted to subsidize alignment research a in various ways right um but there is a sort of irreducible technical challenge here right that that I think you're asking too much of policy if you wanted to solve all of that yeah I guess like it if I kind of think about the question like it makes it's most persuasive like the case for regulation is most persuasive in a world where like these AI labs they don't 
know what they're doing but I know what they should do right um and but in a world where like you know we're all in kind of similar epistemic positions then maybe like uh you know the sort of Tor law approach seems like it makes more sense or if the labs know better than so or if they know better I Me Maybe you like Daniel fad know better than uh than what the folks in open AI do I don't think Congress knows better right um so like I don't I think Congress is going to listen to a lot of experts but I I don't know if if you watch what goes out in DC right the idea that they're going to that they're going to write legislation that Regulators are going to come up with something that like reliably makes us safe I I'm just very skeptical I think they can do some things that are helpful um but that it's not going to be anywhere near sufficient I think some of the things they end up doing might be harmful right um I think po politics and Regulatory policymaking is is very messy um and so and so I think if you're relying on that to to make us like absolutely safe um yeah I I want to I want to pour some salt on that um also even if you're a is okay let's just like you know the thing I was throwing out there is the extreme position like let's ban outright ban development of systems Beyond a certain level I think that even if you could make the domestic politics in the US work which I don't think you probably can um and even if you thought that was desirable I think enforcing that globally is going to be extraordinarily difficult right now some of that you could apply some of that critique to to liability too I think that's a much easier left gotcha gotcha um I also want to ask for the approach to work it seems like we need in Worlds where the risk of misaligned AI causing like tons and tons of uninsurable damage in Worlds where that risk is really high we need there to be a bunch of like intermediate warning shots where there are problems that are kind of like you know uh really bad AI causing like untold amounts of harm um but they only caused like you know a few Million doar worth of harm so we can like SU about it and like actually have these cases come up um can you paint a picture of like what these kinds of cases would look like and yeah How likely do you think they are sure so before I do that I just want to offer some clarifications there on the framing of your question so I don't think we necessarily need a lot of them right we need um we need there to be a high probability that you get one for every for for a particular system before we get the catastrophe before we get the UN insurable catastrophe right um and we also you don't need like thousands of them right um You also um don't actually need them what you need more if you have the if you have the liability rule in place just say you've done this via legislation as opposed to sort of common law accumulation of cases then what you really need is the expectation that these cases are likely to happen you don't actually need them to happen because what you wanted yeah um you would ideally you know the companies are are expected to to be liable therefore they're like trying so hard to avoid these period of damage and judgment and so you get right um that you might worry about some good Harding problem there where they iron out all the um all the practically compensable cases without all actually solving the catastrophic risk that I think that is a failure mode I'm worried about um but um but I think this could work without ever actually 
forking over any money right if it's just like if if you just have that expectation right okay yeah now Al although it see it seems good if they're like well well firstly I guess like right now we're in this position where we're wondering how many how many of them to expect it also seems good if there are going to be 10 such cases because you know there's some uncertainty about whether people like get around to suing and like maybe the you know like you'd want to like average out some variance in like what juries are going to do to like make it a little bit more predictable like it it seems like like maybe you don't need a thousand but it seems like 10 would be much better at the risk of seeming callow us about the the like real people that would be harmed in these cerios yes I think from the perspective of catastrophic risk mitigation more of these is better and that you would want a few I'm just saying in principle right you don't need very many and in like in like you know if if you really take like my expectations argument seriously you actually don't need any you just need the expectation of some the expectation of a high probability of some um okay now to your question about what these look like um so the example I use in the paper is you you task an AI system with running a clinical trial for a risky new drug uh it has trouble recruiting participants honestly and so instead of reporting that to sort of the human overseers of the study uh it resorts to some combination of deception and coercion to get people to participate uh they suffer some nasty health effects that are the reasons it was the the was hard to recruit people in the first place uh and they sue right so it seems like here we have a misaligned system it was not doing what it's its deployers or or programmers wanted it to do right it wanted to to sort of honestly recruit people um but it sort of worn the goal of uh of successfully run the study right um it didn't worn the sort of like theological constraints on that right but it sort of so it seems like we have a misaligned system but it for whatever reason uh was willing to display its misalignment in this non-catastrophic way so maybe you know a few hundred people uh suffered health effects right um but this isn't the system trying to take over the world and now the system is probably going to be shut down trained whatever now that we know it has this failure V but probably the people who deployed it xane couldn't have been confident that it would fail in this non-catastrophic way right presumably they thought it was aligned or they wouldn't have deployed it right um but they probably couldn't have been that confident they couldn't have been you know more than um you know confident to to more than like one in a million uh you know uh that it wouldn't uh fail in a more in a more catastrophic way right and so that's the kind of case I'm thinking about okay so it seems like yeah the the general pattern is like AI does something like sketchy like it you know it like lies to people or it steals some stuff and it does it like pretty well but eventually like we catch it and because we catch it then like someone can sue um you know because we've like noticed these harms I wonder it seems like this applies to the kinds of AIS that are like nasty enough that they do like really bad stuff but also like not quite good enough to just like totally get away with it Without a Trace so not good enough to get away with it or not or like they're myopic in various ways right so maybe it's 
it's that the system doesn't want to take over the world; all it wants is the results of this clinical trial, and because that's all it cares about, it's willing to risk getting caught — maybe it doesn't mind at all, because by the time it's caught it has achieved its goal. And if the people who deployed it could show they were really confident that it was myopic in this way, or had narrow goals in this way, then maybe they didn't risk anything that bad, but I'm skeptical that in the generic case they can show that. Another scenario you might think about is a failed takeover attempt: you have a system that's scamming people on the internet to build up resources, it's doing various other things, maybe it even takes over the server it's on, but at some point we're able to shut it down, and it harmed some people along the way. I think that's another sort of near-miss case. There are different ways you can imagine this going, where it either has really ambitious goals but isn't capable enough to achieve them, or maybe it is potentially capable enough and we just got lucky. So there are different ways to think about this, because — when you think about Joe Carlsmith's work on this — these systems are facing trade-offs: they worry that their goals are going to be gradient-descended away, and so there's a trade-off between 'do you help with alignment research now, do you risk your goals being changed, or do you act now even though there's some possibility that you might fail?' So it could be that even for this particular system, if you had perfect knowledge about it, there would have been some non-trivial risk that this specific system would have caused an existential catastrophe — we just got really lucky, or we got moderately lucky, however much luck you needed in that situation. Theoretically it could be that it was a 50% chance, and then you're already in uninsurable-risk territory and the actual damages award is going to be too big. But certainly there are going to be cases where it's one in 100, one in a thousand, one in a million, where a reasonable person would have thought that was the risk. Okay, sure. So I guess my underlying concern is that this kind of scheme might under-deter — it might under-incentivize safety for really, really capable AI systems — perhaps because I'm imagining a binary where either it's fine or it does really nasty stuff and gets away with it. But I guess the thing you're suggesting is that even for those systems there's a good chance of failed takeover attempts, or maybe the system is just myopic, and like you said, if we only need a couple of those, even just in expectation, maybe that makes it fine and aligns the incentives. Yeah, so look, I want to be upfront about this: I think there is a worry that we don't get enough of the right kind of warning shots or near misses, even in expectation. And if there are certain classes of harms, or certain scenarios, for which there aren't near misses, and therefore this doesn't give the labs enough incentive to protect
against those, I think that's a real concern. I don't know what the shape of the harm curve looks like — how probable different kinds of harms are, and whether there are qualitatively different failure modes for a system, some of which aren't really correlated with near misses. It seems like there should be: if there are lots of different variables you can tweak about the system, maybe that particular system isn't going to have near misses, but a very near system would, and so the labs are still going to have an incentive to guard against that. But yes, I think that is a real uncertainty about the world, and if you think we can be confident ex ante that certain types of warning shots or near misses are unlikely, then you're going to want other policy tools to deal with those kinds of situations. I don't want to hide the ball on that. Fair enough. So next I want to talk about this proposal as law. The first question I want to ask is: the proposal in this paper is some amount of change to how liability law currently works, but I don't know that much about liability law, so I don't have a good feel for how big a change this is. Can you tell me? Sure. So there are a few different levers, as I've been saying, that I want to pull here. Some of these are pretty small asks, and at least one of them is really big. We haven't talked as much about the negligence versus strict liability question. Right now there are three kinds of liability that are typically going to be available: there's negligence liability; there's products liability, which is called strict liability but in some details has some very negligence-like features; and then there's abnormally dangerous activity strict liability, which I would call a more genuinely strict form of liability. Negligence liability is clearly available, but I think it's going to be hard to establish. Wait — before we go into this, what is negligence versus strict liability? What's the difference? So when you're suing someone for negligence, you're alleging that they breached a duty of reasonable care — they failed to take some reasonable precaution that would have prevented your injury. The general principle of negligence law is that we all have a general duty to exercise reasonable care not to harm other people, and when we fail to exercise that reasonable care and that causes harm, it can give rise to liability. So that's clearly available, but I think it's going to be hard to prove in this context, because you would have to point to some specific precautionary measure that a lab could have taken that would have prevented your injury. When we don't know how to build safe systems, it seems like it's going to be really hard to say 'if we'd done this, the system would have been safe.' Not impossible — it could be that there's some standard precautionary measure that, say, Anthropic and DeepMind are doing but Meta isn't, and then their system harms someone. I'm not saying there's never going to be negligence liability, but even if you had negligence liability for all harms where there's a provable breach of duty, that's unlikely to buy as much risk mitigation as we'd like. In
particular, that's because the scope of the negligence inquiry is pretty narrow. Say you're driving your car and you hit a pedestrian: we don't ask whether you should have been driving at all — say you're a licensed driver, but was the value of your trip really high enough that it was worth the risk to pedestrians that you might hit them? Or say you're driving an SUV: you weren't hauling anything, it was just you by yourself, you could have been driving a compact sedan — we don't ask whether you really needed to be driving that heavy a vehicle, which increased the risk that you would kill a pedestrian. We just ask things like: were you speeding, were you drunk, were you texting while driving? Those are the kinds of things that are in the scope of the negligence inquiry. And so in the AI context, I think you can't say 'you just shouldn't have built this system,' or 'you shouldn't have deployed a system of this general nature,' or 'you should have been building STEM AI instead of large language models.' Those aren't the kinds of things that are typically going to be part of the negligence inquiry. That's why I think negligence is not super promising. Then there's products liability, which is clearly available if certain criteria are met. First, the product has to be sold by a commercial seller, so if someone is just deploying an AI system they made for their own purposes, products liability isn't going to apply. There's also the question of whether it's a product or a service. Ultimately I don't think these distinctions matter a lot for the kinds of risks I'm worried about, because I think the test that ends up being applied is going to be very negligence-like. When you're in the products liability game, there are three kinds of products liability. There are what are called manufacturing defects — you ship a product off an assembly line that doesn't conform to the specifications, and that makes it unreasonably unsafe. That is more genuinely strict liability, in the sense that no matter how much effort you put into your QC process — say your QC process is totally reasonable, and it would be unreasonably expensive to spend more to eliminate the one-in-a-million products coming off the line unsafe — you're still going to be liable if that one in a million harms someone. But I don't think you're really going to have manufacturing defects in the AI context; that would be something like shipping an instance of the model with the wrong weights, and I just don't think that's a failure mode we're really worried about. So we're more likely to be dealing with what are called design defects, and there the test is whether there was some reasonable alternative design that would have prevented the injury. You can see, through the presence of the word 'reasonable', that you end up in a similar cost-benefit balancing mode as with negligence. Again, if we don't know how to build safe systems, it's hard to show — yes, you don't have to show that the humans behaved unreasonably, you have to show that the system was unreasonably unsafe, but I think that distinction doesn't end up mattering much, and in practice it's going to function a lot like negligence. There are also warning defects, which you could potentially have liability on, but even if you have all the
warnings in the world, I don't think that's going to solve the problems you're worried about. Okay, that leaves the third pathway, which is this abnormally dangerous activities idea. There are certain activities where we say: these are really risky even when you exercise reasonable care, and so we're going to hold you liable for harms that arise from the inherently dangerous nature of those activities. There's a sort of meta-doctrine as to what activities qualify as abnormally dangerous, which I go through in the paper. I think that, plausibly, under that meta-doctrine, training and deploying certain kinds of AI systems should qualify as abnormally dangerous, but I think courts are unlikely, on the status quo, business-as-usual path, to recognize software development of any kind as abnormally dangerous. I think it's clearly within their powers to do this — to treat training and deploying AI systems that have unpredictable capabilities and uncontrollable goals as an abnormally dangerous activity. I think it does meet the technical parameters there, but it would require an understanding of AI risk that courts have not currently been persuaded of. But I think this is a move they should make, and that they could make on their own. It would not be a radical departure from existing law; it would be consistent with this broad doctrine — it would just be recognizing a new instance in which it applies. So I think that's a relatively modest ask to make of courts, though again, I want to be clear, it's not the default that's likely. Okay, so that's one step, and that solves the question of whether you can get liability at all, for compensatory damages. Then there's the punitive damages piece, which is designed to get at these uninsurable risks, and there I think there's a much heavier lift. There's longstanding punitive damages doctrine that requires what's called malice or recklessness — reckless disregard for risk of harm. We talked before about how even provable negligence is going to be difficult in these cases; malice or recklessness is a step higher than that. You can think of it as basically gross negligence. Hang on, I don't know what gross negligence is. Really bad negligence: it was really, really unreasonable what you did — not just something a reasonable person wouldn't have done, but something even a normal unreasonable person wouldn't have done. It's a lot worse; the cost-benefit calculus is lopsided. Yeah, the image I have in my head is of someone saying 'I know this could be risky, but I don't care, I'm doing it anyway' — which seems like a pretty high bar. Yeah, it's a pretty high bar, and so I think courts are unlikely to take the step of reforming punitive damages doctrine in the ways that I would like, because it would be such a significant change. Now, I do want to point out that if you think about the normative rationales for punitive damages, the one I find most compelling, and that I think is the central normative rationale, is that compensatory damages would under-deter the underlying tortious conduct. That doesn't require malice or recklessness to be true; it requires something about the features of the situation suggesting that compensatory damages are not going to be adequate. It might be uninsurable risk; it might also be that most people who will suffer this kind of harm won't know that you caused it, or won't sue because the harm is small relative to the cost of proving it. So maybe only one in 10,000 people who suffer will end up suing, and maybe you should get punitive damages to account for the people who don't sue.
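As a back-of-the-envelope illustration of that under-deterrence rationale — my own hypothetical numbers, not figures from the paper — here is a minimal sketch of the damages gross-up implied by a low probability of suit:

```python
def grossed_up_award(harm_per_victim, p_successful_suit):
    # If only a fraction p_successful_suit of harms ever result in a successful
    # claim, the injurer's expected liability equals the harm it causes only when
    # each successful plaintiff's award is scaled up by 1 / p_successful_suit.
    return harm_per_victim / p_successful_suit

harm = 50_000        # hypothetical harm per victim
p_suit = 1 / 10_000  # only one in ten thousand victims sues and wins
award = grossed_up_award(harm, p_suit)
punitive_component = award - harm   # everything above compensatory damages
print(award, punitive_component)    # 500,000,000  499,950,000
```

The doctrinal question discussed next is whether anything like this multiplier can be awarded without a showing of malice or recklessness.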
But nothing about that requires malice or recklessness, and there is existing scholarship arguing for getting rid of this requirement, so it's not coming new in this AI context. But again, I think courts are unlikely to do this, and it would be a major doctrinal change. It would require state courts to change their punitive damages doctrine, and it would also require the US Supreme Court, in applying the due process clause, to say that applying punitive damages in this context without these threshold requirements doesn't violate due process — that the companies that deployed these systems were on adequate notice. I think that's not totally insuperable, but it's pretty unlikely as a matter of the natural evolution of the courts. Yeah, I actually want to ask about this, because your paper cites this work by Polinsky and Shavell, basically saying that punitive damages should just compensate for the probability of being caught — at least that's how I understood their abstract. That seems kind of intuitive to me, but my understanding is that this was published, what, 25 or 30 years ago, and apparently it still hasn't happened, so the fact that it hasn't happened makes me a little nervous about these changes. You should be. I think you should not expect that the courts are going to follow my advice here. They didn't follow Polinsky and Shavell's advice, and they're much more prestigious people than I am — they're at Harvard and Stanford, they're doing fancy economic models on this. I think you should not expect this change to happen from courts. I think they should do it, I think it's within their powers, I think we should try to persuade courts, I think litigants should bring these arguments and force them to confront them and ask courts to do it — all of that should happen, but I would not count on it happening. But again, I want to be clear, I really think people should try, both because you never know — and one state doing it would get you a lot of value — and because it would put the issue on the table politically; then you could say, look, we need legislation to overturn this. And at least with regard to the common law issues, with regard to what state tort law says, state legislatures clearly can change that requirement if they want to. There's a hierarchy of law in which statutes always trump common law, so if a state wants to pass a statute saying, either in this AI context or more generally, that punitive damages don't require malice or recklessness, that's clearly something the state legislature can do. There are still the constitutional issues, although I think if you have a statute putting the labs or the companies on notice, that might accomplish a lot of the due process notice function that the Supreme Court is worried about, and so it's not clear to me that there would be a constitutional barrier in the context of legislation.
Yeah, can I ask about the punitive damages change? This case for having punitive damages compensate for the probability of a case being brought — is the thing holding that up that legal scholarship broadly is not persuaded by it, or is it an issue where legal scholars are persuaded but judges aren't, or is there some other issue going on? The short answer is I don't know, but if I were to speculate: Polinsky and Shavell's argument is really persuasive if you're thinking about this in a law-and-economics frame, and if that's all you think tort law is about. And law and economics is basically thinking of law just in terms of economic efficiency and maximizing social utility — is that roughly right? Yeah. And that's the frame I tend to prefer, but it is not dominant, and there are other ways of thinking about what tort law is for, and particularly what punitive damages are for. There's an expressive function — expressing society's disapproval of the behavior — that maps more onto this recklessness or malice requirement. And so if someone is doing something that maybe isn't even negligent at all, or it's a strict liability tort, or it's ordinary negligence, the idea that we want to punish you over and above the harm you did doesn't sit right with some people. Honestly, I don't think courts have really revisited this issue in a long time; mostly what courts do is follow precedent unless they have some good reason to reconsider it. I think AI arguably should give them a reason to reconsider it: we have this pressing social problem that courts are well positioned to help solve. Maybe the fact that this requirement doesn't really make sense from a law-and-economics perspective hasn't been that big a deal in the past, because a lot of the problems you'd want punitive damages to deal with we've handled through other policy tools. For reasons we talked about earlier, I think there's reason to be skeptical that those policy tools are going to be adequate in this context — we need to lean more heavily on tort law — and that makes it really important to get punitive damages right from this law-and-economics perspective, so courts should reconsider it. Again, I don't think they're super likely to do that, but I think we should try, and I'm maybe talking myself into thinking there's a little bit of a chance that they would reconsider it in this context. Okay, I'm going to try to suggest some hopium and you can talk me out of it. I glanced at this Polinsky paper, because not knowing anything about the law, it seemed like a kind of obvious change, and I read the first page — it's got a table of contents and a little disclaimer at the bottom. I noticed that in the table of contents it says the award should be based on the probability of a lawsuit actually being brought, not on other things — in particular, it shouldn't be based just on the wealth of the defendant, because that's economically inefficient. And I
see the thing at the bottom saying this was sponsored by — was it Exxon, ExxonMobil or something? My understanding is that basically a big oil company paid them to write this paper, in the hopes that it would be friendlier to really big business. And I have this impression that people in the world kind of don't like the idea of changing the law to make it better for really big businesses — but this change, it seems, would make life kind of worse for really big businesses, and therefore maybe everyone's going to be a little more friendly to it, because people don't like the big guy. Does that sound right? Am I being too cynical? Okay, there are a few different ways I want to answer that. First of all, I think that's both good and bad. I'm talking to state legislators about legislation to implement different elements of this proposal, and a lot of them are afraid of doing anything that runs afoul of the tech lobby — or they at least want to neutralize it: it's okay if the companies aren't really supportive, but in a lot of states, having all the tech companies against your legislation is pretty bad. So that's one answer: it's not obvious that this is a net good. But yes, I do think there's a populist current — in certain circles at least there is a backlash against big tech — and so if your strategy is not the inside game, if it's making a big public case, then maybe that's helpful. I'll leave it to political entrepreneurs to make those judgments. Then there's the way this fits into the broader AI policy ecosystem, and for that purpose I think it's actually really valuable that this is a proposal that's relatively hostile to the incumbent big players — not as hostile as some of the AI pause stuff, but compare it to licensing regimes that have anti-competitive effects on some margins. I also think there's a strong case that we should have antitrust exemptions for the labs cooperating to share alignment research, or maybe to coordinate a slowdown on capabilities enhancements — I think there are reasons to believe that under current law that would violate anti-collusion principles, and there are good reasons for having exemptions to that — but those ideas are generally pretty friendly to the incumbent players. And there's an accusation that sometimes gets thrown around that AI safety is a psyop by the big tech companies to avoid competition, and so I think having some policy proposals in your package that are clearly not in the interest of those big companies is useful, at least rhetorically. So I think it does play that role, but I don't think 'it's bad for big tech, therefore it's automatically going to happen' — that's definitely not my model. Fair enough. Okay, all of this was sort of a tangent. I was originally asking how big a lift this is in terms of changes to tort law. You mentioned that you have to make this change to strict liability,
which is maybe not so big; you mentioned this change to punitive damages, which is kind of big; and I sort of interrupted you there, but I think maybe you were going to say more. Okay, yeah. So punitive damages is a pretty big lift — I think we've beaten that horse plenty. Then there are other things. Liability insurance requirements — courts just can't do that; it would require legislation. I don't think it's a big legislative lift, but it's just not something courts can do. And then there are other things we haven't talked about. There's this doctrine of what's called proximate cause, or scope of liability. Say I cut you off in traffic and you have to slam on your brakes, but we don't collide, you're fine, it just slows you down 30 seconds — and then two miles down the road you get sideswiped in an intersection, and you want to sue me and say, 'but for your negligence in cutting me off, I wouldn't have suffered that later injury, so you owe me money.' And I say no: it wasn't foreseeable, when I cut you off, that you would get in a collision two miles down the road — in fact, it's just as likely that I could have prevented a similar collision for you. And the courts are going to side with me there; they're going to say I'm not liable for that, even though I was negligent and my negligence did cause your injury. This is an independent element of the tort of negligence. And so in the AI context, the question is what it means for the injury to have been foreseeable. In some sense, misalignment causing harm is clearly foreseeable — Sam Altman talks about it; if his system does it, he's not going to be able to say he couldn't see this coming — but the specific mode of a misaligned system harming someone, the specific details, almost certainly won't be foreseeable. So it really depends on the level of generality at which that question is evaluated. There is this 'manner of harm' rule that says the specific manner of harm doesn't have to be foreseeable as long as the general type of harm is, and that helps a little, but there's still a lot of wiggle room in how this doctrine is applied. There's no high-level change to precedent that I can ask for, to say 'change this rule so that there will be liability in these cases' — it's really just that courts need to be willing to apply a relatively high level of generality in their scope-of-liability or proximate cause assessments for AI harms. How big of a lift is that? I think not a huge lift, but also not necessarily consistent across cases; you just want it to generally be fairly friendly to liability, but it's a pretty mushy doctrine in that sense. Then there's something we talked about earlier: the way mortality damages are dealt with under current law. There are two kinds of lawsuits you can bring when someone dies. There's what's called a survival action, which is basically all the torts that the decedent could have sued for the second before they died. So say I crash my car into you and you're in the hospital for six months and then you die: in those six months you racked up lots of hospital
bills, you had lots of pain and suffering, you lost six months' wages — you could have sued me for all that, and your estate can still sue for all those things after your death. That wasn't true at common law, but there are these survival statutes that allow the claims you had at the moment you died to be brought by your estate. Then there are what are called wrongful death claims, which are also creatures of statute, and say that designated survivors — this is no longer the estate suing, this is specific people with a specific relationship to you, say your kids or your spouse — can sue for harms they suffered because you died. So maybe your kid is suing because they were going to benefit financially from you, they were going to get caretaking services, whatever. In neither of those lawsuits is the fact that it kind of sucks for you that you're dead something that can be sued for, in almost every state. And so if you think about a quick and painless human extinction, where there are no survivors left to suffer from the fact that their relatives are dead, and you take this to its logical conclusion, the damages for that are zero: no lost wages, because you died quickly; no pain and suffering; no hospital bills; and no one is around — not only not around to sue, but there's no claim, because no one has suffered from your death, because they're dead too. Now, I don't think courts are likely to take that so literally. If they buy everything else in my paper and they're trying to assess damages for how bad human extinction would be, I don't think they're actually going to take it to its logical conclusion and say the damages are zero. But I think there's reason to worry that, in general, if most of the harm from AI misalignment or misuse comes in the form of mortality, those harms are going to tend to be undervalued. So that would require a statutory tweak in individual states to say that wrongful death damages should include the value of the person's life to them. We have ways of estimating that: as you mentioned earlier, regulatory agencies use a value of a statistical life on the order of $10 million for US lives, and I think that would be fine in the tort context. But that would require a statutory change — because wrongful death claims are statutory to begin with, I think it's very unlikely that courts would change that on their own. Sure. So I don't quite understand: it seems like the case we're really thinking about is that an AI causes an intermediate amount of harm, and we want to assess punitive damages for how bad it would be if some really bad catastrophe had happened. It strikes me as a priori possible that that kind of calculation could take into account the value of the lives — the value of the life-years not lived — but that could be different from actually suing for loss of life. Well, if your theory of punitive damages is that you can't sue for these compensatory damages if they actually arise, because they're practically non-compensable, then presumably the punitive damages should be pulling forward those hypothetical compensatory damages.
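To make the undervaluation worry concrete, here is a small illustrative sketch — my own stylized numbers and function names, not figures from the paper — contrasting what current wrongful-death doctrine would value in a no-survivor scenario with a valuation that incorporates a value of statistical life:

```python
VSL = 10_000_000  # order-of-magnitude value of a statistical life used by US regulators

def wrongful_death_recovery_current(survivor_financial_loss, survivor_services_loss):
    # Under current doctrine, recovery roughly tracks what designated survivors lost;
    # the value of the decedent's own life is not itself recoverable, so a quick,
    # painless catastrophe with no survivors yields (taken literally) zero.
    return survivor_financial_loss + survivor_services_loss

def catastrophe_valuation_with_vsl(deaths):
    # With the statutory tweak discussed above, each life lost is valued
    # to the person who lost it.
    return deaths * VSL

print(wrongful_death_recovery_current(0, 0))          # 0 for a no-survivor scenario
print(catastrophe_valuation_with_vsl(8_000_000_000))  # $80 quadrillion for 8 billion deaths
```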
And again, if you're just hyper-logical about this and apply existing compensatory damages doctrine in that scenario, the number you get is zero. Now, if courts are persuaded by all my other arguments, they'd be really dumb to go there — to say 'well, if it causes lots of pain along the way you can sue for that, but the actual human extinction counts for nothing.' That does seem crazy. I'm just pointing out that it is the logical entailment of the existing structure. Now, instead of changing the statute, you could say 'I'm going to have a slightly different conception of what these punitive damages are doing — they're not quite just pulling forward the compensatory damages.' I think courts could do that on their own. I just want to point out that I don't think this is at all the biggest obstacle to making my framework work; it just seems worth being transparent about this quirk of the way mortality damages work that, in theory at least, could cause a problem here — and if we can pass legislation fixing it, that would make things a lot simpler and more straightforward. Fair enough. So basically we've got this bundle of changes that you might hope courts or legislators make, and this bundle is some amount of a big ask. How often do those kinds of changes actually happen? If by 'those kinds of changes' you mean reforms to make liability easier, the answer is: legislatively, they almost never happen. Tort reform statutes — with the exception of some of the statutes I was talking about, like wrongful death and survival action statutes — are almost always statutes that limit liability. When people talk about tort reform, they tend to mean 'liability insurance is too expensive, it's making health care too expensive, we need tort reform,' and what they typically mean is making it harder to sue. So if that's the reference class you're drawing your base rate from, it doesn't look so attractive. Now, maybe you think AI is different enough that we shouldn't think of that as the right reference class — I think that's a plausible move — but if that's what you're thinking about, then you shouldn't be too optimistic. Courts, I think, are more inclined to make changes, and they're more symmetric in whether they make changes that are pro-plaintiff or pro-defendant. There have been changes like market share liability, and recognizing various forms of strict liability, that have been plaintiff-friendly, and so, as I said, I'm mildly optimistic about courts making the strict liability, abnormally dangerous activities change. Again, I think the punitive damages piece is too big a doctrinal move for courts, and we're probably going to need legislation there. The other thing I'd say in this context is: if you think about this as a tort reform problem, then you should think it's unlikely — but maybe you think there's going to be a lot of energy at some point, a lot of political will, to do something about AI law
and policy, and those things include some options that would be more draconian, like banning or creating moratoria on training models above a certain level. Saying 'you have to pay for the harm you cause' is a less extreme step than a ban, or than a pause, and so once you think those things are on the table, you should be more optimistic about ideas like my liability framework. So maybe you don't think this is likely to happen tomorrow, but if you think there's going to be a future moment when there's political will, I want this idea to be fleshed out and ready to go, so that states or the federal government can pass it. Fair enough. Another question I have about this: nominally this is about catastrophic risks from AI, but the 'from AI' part — I mean, you talk about some AI-specific things, but this seems like a fairly general type of proposal. Whenever there's a risk of some sort of uninsurable harm, we could have a pretty similar scheme. So I'm wondering what other effects these kinds of changes could have, and whether people have talked about these kinds of changes before. I'm not aware of anyone proposing to use tort law to deal with uninsurable risks before. Typically the way we handle uninsurable risk is through some kind of prescriptive regulation, and those regulations often preempt tort liability. Think of a nuclear power plant: there is some liability for nuclear power plants, but the goal of it isn't really to change the incentives — there are various government subsidies to make the insurance affordable — and we mostly rely, especially in the US, on regulating these things to death. It's nearly impossible to build a nuclear power plant in the US; it's easier in France, but even there they're relying on prescriptive regulations. And I think that's true broadly: if you think about biolabs doing gain-of-function research, it's hard to bring a tort lawsuit for that; we mostly rely on biosafety level certifications and prescriptive regulations. So generally the thought has been that it's hard for the tort system to handle this, and we should lean on other policy tools. I think it is harder for the tort system to handle this, but in the AI context it's even harder for other policy tools to handle it, or at least to handle it sufficiently. I'm not saying tort law should be the exclusive policy tool, but there are real limits to what you can do with prescriptive regulation in this context, and so I want to lean more heavily on the tort system than you otherwise would. If you made these doctrinal changes, the strict liability change would only really apply to AI; the punitive damages change in principle would be broader — it would be sort of weird to change it just for AI — but I think the implications of that might be pretty minor, since in a lot of the areas where these catastrophic risks exist, tort law is going to be preempted anyway. Sure. I guess one thing I imagine is: during the
Cold War, my understanding is that there were a bunch of near misses where we almost set off nuclear weapons but ended up not quite doing it — like the US Air Force accidentally drops a nuclear bomb on the US and it doesn't explode, but five of the six safeguards failed, or something. And my understanding is there's a thing you can bring called a Section 1983 lawsuit, where if a government official violates my constitutional rights I can sue them for the damages I faced. One thing I can imagine is: suppose the military accidentally drops a nuclear bomb, it doesn't detonate, but five of six safeguards are off, and it lands on my field and damages my crops a little bit. I can imagine a world in which I bring a 1983 lawsuit against the government, and not only do I sue for the minor damage to my actual property, I also try to sue for 'hey, you nearly set off a nuclear bomb, and that would have been super bad.' Does that strike you as a way this kind of change could be implemented? Maybe, but there are a lot of complications in that context. With Section 1983 there are lots of rules about when it applies and how this waiver of sovereign immunity works, but I think that lawsuit is going to be tough. I also think it doesn't necessarily make sense to me normatively in that context: the government's not a profit-maximizing actor in any sense, so is liability the right tool? Basically, the government paying means the public's paying — does that change the incentives of the military in the right way? Not obvious to me that it does. You can think of tort law generally as serving two functions: a compensation function and a deterrence function. In the context of suing the government, I tend to think the compensation function is a lot more important, whereas in the private context I tend to think the deterrence function is more important and the compensation is a happy byproduct of that. And punitive damages are really about deterrence, not compensation — there's even a proposal in the paper that maybe not all of the punitive damages should go to the plaintiff. So do I really think the government, the military, is going to be that much more careful with nuclear weapons if there's this liability? Maybe, but it's not obvious to me. Okay, fair enough. You could also imagine a similar sort of scheme for lab biosafety accidents — presumably some of those labs are run by private companies — maybe something could happen there. Yeah, I think to the extent that's not preempted by the regulations, that would be a benign effect of this. Maybe it would be really tough to insure a lab that's doing gain-of-function research, and maybe that would be okay — you make it a lot more expensive, and then you'd have to say, well, if the social value of this is large enough, fine, you can get a big enough grant or a big enough expected profit from doing this research,
and then okay; but if you can't, then that suggests this is not a socially valuable activity at the level of safety you're able to achieve, and you just shouldn't be doing it. Sure. Another question I have about the punitive damages change: is there anything like this in criminal law, where there are additional penalties for things the government might not have caught you doing — so if they didn't catch you, it's hard to know that you did them? You mean, punishing a crime more severely because we thought we might not have caught you? Certainly your past record, even if it's not a conviction, is taken into account in the sentencing context. But also, an example that might map onto this, that might be getting at the same sort of idea, is that attempted murder is different from assault. You get a longer prison sentence if you attack someone and you're trying to kill them, even if you don't, than you do if you just beat someone up with no indication that you were trying to kill them. I think there's a similar idea going on in that context. Interesting. Actually, just picking up on a thing you said earlier about the difficulties of applying this kind of liability scheme to state actors: sometimes people talk about the possibility that AI labs will get nationalized, or that somehow there's going to be pretty tight intermingling between the government and AI development. Would that pose difficulties for this sort of liability scheme? So, in a world where the government is taking over AI companies — there's something called sovereign immunity, so you can only sue the government when it waives that, when it allows you to sue it. I don't think it's super likely, as a predictive matter, that the government is going to want to expose itself to a lot of liability and punitive damages in that scenario. So one question is whether this would be likely to happen in that world, and then there's another question of whether liability is a useful tool in that world. I don't think the government responds to financial incentives in the same way private parties do. If we're in a world where they're nationalizing it, maybe because they're worried about risks, but also because they're worried about an AI arms race between us and China or whatever — is the risk of getting sued really going to change their calculations that much? It's not obvious to me that it has the same incentive alignment effects that it does in the private context. So in some ways you have lower risk in that world, but in other ways that's a more dangerous world; it's not obvious to me on balance whether I'd rather live in that world. You're moving all the key decisions into the political process, and part of the advantage of liability is that, yes, you need the political process to get the high-level decisions in place, but then you're shifting the onus onto these private actors who have, in theory at least, more aligned incentives, as opposed to trusting elections and regulatory processes and military decision-making to make
the right decisions. Fair enough. Okay, I'd like to change topic a bit now. Mostly when I see people coming into work on AI alignment, they're either an AI researcher who's come across alignment concerns in the course of being in the AI sphere, or one of these dyed-in-the-wool young EA professionals who got into AI that way. My understanding is that you have a background in environmental law — how did you come across AI stuff? So I've been loosely affiliated with the rationalist and EA communities for a long time, and I've been aware of the AI risk problem for over a decade, but until pretty recently I never really considered working on it professionally. It wasn't obvious what role I would play — I wasn't thinking in terms of what policy tools or legal tools would be relevant — so it seemed like, I'm glad some technical researchers are working on this, but I thought of it as a technical problem. In the last couple of years I started to reconsider that, and then last summer I did this fellowship called PIBBSS — Principles of Intelligent Behavior in Biological and Social Systems — that brings together, well, there were a couple of computer science and ML types there, but mostly people in the social sciences, economics, philosophy. I was the only lawyer in the program, which was about 15 or 20 people, and each of us was assigned a mentor. I was working with Alan Chan from Mila, and he was really helpful in getting me up to speed on the technical side of some of these questions. We all spent the second half of the summer in Prague, working out of a co-working space, and I got to learn from other people while doing a project in this area. So that's the causal story of how I got involved in this. Intellectually, I think it's not as big a departure from my work on environmental law and policy as it might seem. We talked earlier about how a carbon tax is sort of like AI liability, and I tend to approach my scholarship with a law-and-economics frame, so I'm thinking through a different context, but a lot of the issues involve principles I'm comfortable with from other contexts. I also teach torts, so it was natural to think about how that domain could be helpful for AI risk. Yeah, in some ways I feel like there's a strong through-line: some of your work is on liability, on changes to the liability system, and this seems of a kind with that work. Yeah, so I had a recent paper, which I think you read, on the Hand formula — the test for breach of duty in negligence cases — and ways in which it might fail. That was a more general critique of the way tort law works, and I think this paper implicitly has a lot of broad critiques. As you suggested, a lot of the things that I think should be changed in this context really are problems with tort doctrine generally, which the AI risk context is just pointing up and really exposing. So in principle you
could have written this paper without ever mentioning AI — though I think it's worth being explicit about why I care about it. Yeah. So suppose we really do want to make these kinds of changes: if the AI existential risk community wants to push for this kind of thing in the legal profession, or among legislators, what does that look like? I think there are a few different channels of influence. The one I have the most control over is convincing other legal scholars that this is a good idea; then you create a consensus around that, other people write articles saying it's a good idea, and then there's a body of articles for litigants to cite to judges when they try to persuade them to adopt it. That's one channel of influence: the academic pathway. Another is lawyers out there bringing these cases and trying to convince judges to adopt these doctrinal changes. You can push for strict liability even before you have a case where catastrophic risk is implicated, and then as soon as there's any case with a plausible argument about uninsurable risk, try to raise the punitive damages issue and get courts to consider it. And on a parallel track, I think we should be talking about legislation, both for strict liability and for punitive damages, and potentially for other things like liability insurance requirements and changing the way mortality damages work. Those are all things that could be done by state legislation, so for people interested in policy advocacy, that's definitely an avenue — as I said, I'm involved in talking to some state legislators, and I'd like to see more work on that. In some states this could be done via ballot initiative — certainly in California it's pretty easy to get an initiative on the ballot. I think strict liability is a pretty straightforward yes-or-no question that you could have a vote on; I think it would be a little tougher for punitive damages, but I wouldn't take that off the table. Liability insurance I think would be hard, but California seems to let lots of crazy stuff onto the ballot, so maybe. Yeah — I don't know if you're familiar with the state constitution of California, but every two years it gets amended with some random stuff that some ballot measure passed. I think the California state constitution includes text from a ballot measure that's basically a scheme to do some tricky accounting to maximize the amount of Medicaid money the state gets from the federal government. So yeah, a lot of stuff happens in California. I don't think California has adopted the optimal initiative process, but given that it exists, I think it should be used for this good purpose, and I'd be happy to advise on any project that wanted to pursue an initiative like that in California. Yeah, one thing I wonder: you mentioned that this is kind of a law-and-economics framing on the problem of AI risk, and my impression is that law and economics has some market share in the legal profession, but not so overwhelmingly that every idea the law-and-economics people think is good gets
implemented. I wonder if it makes sense to make a different kind of case for these kinds of reforms, one that looks less like law and economics and more like something else — but law and economics is the part of the legal profession I know the most about, so I don't actually know any examples of other ways of thinking. Yeah, I think there's a sort of critical, political-economy argument that is maybe more popular on the left. Law and economics tends to be right-coded — I don't think that's inherent in the paradigm, but it's because of the way political coalitions are, and because the people funding a lot of law-and-economics research tend to have more right-wing political goals. I don't think my proposal here is particularly right-wing, but a lot of the skepticism of law and economics tends to come from more progressive or liberal folks, so you would want framings that appeal more to them. I think this proposal is more susceptible to critiques from the right, since I'm arguing for more liability, for making life less friendly for these big companies — but there's also a lot of tech backlash on the right, so it's not obvious to me how that plays into the politics of this. So it depends whether you're asking how to convince fellow academics, and whether there should be people writing in other traditions providing a different case for this — I think there may be scope for that, though I don't know exactly what it would look like — and then certainly you're going to want to use different arguments in different contexts when you're trying to persuade a political audience. Fair enough. My next question is on the other side from advocacy: a bunch of people listening to this are technical researchers. What kinds of technical research would be good complements to this kind of proposal, that would make it work better? So I think in particular you want research that makes it possible to implement various aspects of the damages formula — the punitive damages formula — and also to implement the liability insurance requirements. I could see a role for model evaluations both in deciding what the coverage requirement is — a regulator could use a set of dangerous-capabilities evaluations to decide how much insurance you need to take out before you can deploy the model, or, if it's pre-training insurance, before you train the next model — and, similarly, insurance companies could use a slightly different set of evaluations in their underwriting process. And then, in the punitive damages context, we need to estimate the different parameters in my liability and damages formula. One thing we want to know is how much risk the trainer or deployer of this model should have known they were undertaking when they made the key decision, and I could see a role for technical research in trying to get a handle on that and reduce our uncertainty about it. There's also the question of how elastic the particular harm that was caused is with the uninsurable risk: for every unit of risk mitigation — say you spend a million dollars to reduce the risk of this particular harm by 20% — how much does that reduce the uninsurable risk? That's another key parameter, along with how much it does so relative to a generic harm this system might have caused. So I think work on trying to figure out how to estimate those parameters would be really useful. I have a blog post up on the EA Forum, which we can link to, that lays out the formula and the ways in which technical researchers can help solve these problems, so you can point people to that.
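As a rough, illustrative sketch — my own stylized rendering under stated assumptions, with hypothetical parameter names, not the exact formula from the paper or the EA Forum post — here is how a near-miss punitive award might be assembled from the quantities just described: the ex ante risk the developer should have recognized, the monetized value of the catastrophic outcome, the elasticity between preventing this particular harm and reducing the catastrophic risk, and the probability that a suit like this one is ever brought.

```python
def punitive_award(p_catastrophe,      # ex ante probability of the uninsurable outcome
                   catastrophic_harm,  # monetized value of that outcome (e.g. deaths * VSL)
                   elasticity,         # fraction of the catastrophic risk that precautions
                                       # preventing *this* harm would also have eliminated
                   p_suit):            # probability a near-miss like this one is litigated
    """Stylized near-miss punitive damages: the expected uninsurable harm that was
    risked, scaled by how much preventing this failure would have reduced that risk,
    and grossed up for the chance that no one ever sues."""
    expected_uninsurable_harm = p_catastrophe * catastrophic_harm
    return elasticity * expected_uninsurable_harm / p_suit

# Hypothetical inputs: a 1-in-100,000 chance of a catastrophe valued at $10 trillion,
# precautions against this failure would have removed half of that risk, and one in
# ten such near misses leads to a lawsuit.
print(punitive_award(1e-5, 10_000_000_000_000, 0.5, 0.1))  # 500,000,000
```

Each of these inputs is exactly the kind of quantity the technical research described above would need to pin down before a court or insurer could use a calculation like this.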
Sure, yeah. A lot of those seem like general ways in which researchers could help with any AI governance effort, but it strikes me that trying to get a sense, for any given failure, of how much mitigating that failure would have mitigated really bad catastrophic AI risks is perhaps a unique thing about your proposal that researchers might not already have been thinking about. Great. So before we wrap up, is there anything I didn't ask that you wish I had? I want to say something along the lines of: what decisions are you trying to influence. Okay, so the question is maybe: what decisions are you trying to influence with this? Yeah — what decisions are you trying to influence with it? Okay, so there are a few different ways you can think about how this would influence the behavior of AI labs. One scenario to think about is: say OpenAI trains GPT-6, they run it through a METR evaluation and it shows some dangerous capabilities, and maybe they've come up with some alignment evaluations and it looks like maybe this system is misaligned and you shouldn't deploy it. The question is what they do now, and there's a spectrum of options. There are cheap, dumb, easy things you could do that wouldn't really solve the problem — you could just run RLHF to iron out the specific failure mode you noticed, and almost certainly that wouldn't solve the underlying misalignment, but it would be really easy to do. There's some moderately costly thing where you roll it back a little and retrain; there's some somewhat more expensive thing where you do adversarial training, maybe roll it back further; and then there's a really expensive thing where you say none of the tools we have right now are good enough — we need to wait until we have better interpretability tools, or we make some fundamental breakthroughs in alignment theory, or whatever it is. You've got different actors within the lab, some more cautious, some less, and there's a debate about which of these options to take. I want to empower the voices for more caution: maybe they're motivated primarily by altruistic impulses or whatever, but I want to arm them with arguments saying that even if all you care about is the bottom line, we should do the thing that produces
And a lot of this is, I think, that you see a lot of these leading labs were founded with high ideals, right? So OpenAI was founded by people really worried about AI risk, and now there's a lot of criticism of them that they're moving too fast, that they're taking too many risks. Sam Altman was saying, well, we need to move fast so we don't have a compute overhang, but then he wants to raise seven trillion dollars to invest in improved compute, so there seems to be something a little confusing there. And obviously there was the whole kerfuffle with the board over him being fired. So we've seen that these internal governance mechanisms are not things we can totally rely on. I think similarly, even for a lab like Anthropic, which was founded by people who defected from the alignment team at OpenAI, there were statements like, well, we're not going to try to push forward the frontier on capabilities, we just want to have near-frontier models so we can do alignment research on them. And then Claude 3 comes out and there are these claims that, well, it's better on all these metrics than any model that's been out there before. So it seems like there are very powerful financial incentives, and other incentives, for these companies to build commercializable products and to push forward on capabilities. I think even people who are very well motivated are having trouble resisting those forces. And so I think having a liability regime that puts a thumb on the other side of the scale, that makes it in their narrow interest to do the thing they say they want to do, the thing that is in the interest of society at large, would be really valuable. However you want to think about it, whether in terms of competitiveness or alignment taxes, if we can tax misalignment effectively through this liability, I think that could be really valuable. And you don't have to think of it as being hostile to everyone in these AI labs. I think at least some people within these labs would welcome the fact that it empowers them to stand up for safety, so that it doesn't just seem like, oh, this is some altruistic concern: it's actually part of the interest of the company.
Gotcha. So I guess to wrap up, if people are interested in following your research on this or on other topics, how should they do so?
Sure. So you can find all my papers on SSRN once I put them up publicly, and we can include a link to that. There's only one on AI so far, but I expect to do more work on this in the future. They can follow me on Twitter, at Gabriel Weil. And then I've got a couple of posts on the EA Forum, one just providing a high-level summary of the paper, and another that I mentioned, explaining how technical AI safety researchers can help implement this framework, and so I would direct people to those. Dylan Matthews also did a write-up of the paper in Vox that we can link to. I think that's about it.
Gotcha. Well, thank you for coming on the show.
Thanks, this was great.
This episode was edited by Jack Garrett, and
Amber Dawn helped with transcription. The opening and closing themes are also by Jack Garrett. Financial support for this episode was provided by the Long-Term Future Fund and Lightspeed Grants, along with patrons such as Alexey Malafeev. To read a transcript of this episode, or to learn how to support the podcast yourself, you can visit axrp.net. Finally, if you have any feedback about this podcast, you can email me at feedback@axrp.net.