The response to advanced AI: comment on Hobbhahn
There is growing interest in what the development of artificial intelligence will look like: when different AI capabilities will be developed, how they will be deployed in society, and what the social and political response to the associated risks will be. I am particularly interested in the social response, and have discussed it before.
Marius Hobbhahn recently published a post on the development of AI, “The next decades might be wild”, where he discusses these different aspects in some detail. I find the post valuable, and I am particularly interested in Hobbhahn’s discussion of the social and political response. He thinks it will be muted and ineffective, and since such views seem to be relatively widely shared among people interested in AI risk, they are worth paying attention to. Here I discuss Hobbhahn’s takes on these issues, but I see them as representative of fairly common views rather than unique to him. I will not defend a detailed theory of what the social and political response to advanced AI will be; I will only discuss my disagreements with Hobbhahn.
Hobbhahn’s post is quite long, but I don’t think you need to read it to follow my discussion. It is divided into five sections: Until 2030, 2030-2040, 2040-2050, 2050+, and Confidence & Takeaways. It moves back and forth between technical developments and the social and political response to risks and harms, and it is the latter that I will focus on.
Muted public reaction
One theme of Hobbhahn’s post is a muted public reaction to harms. For instance, his scenario says that the following happens in the 2030s. (My emphasis, throughout.)
A powerful medical AI has gone rogue and had to be turned off. The model was pre-trained on a large internet text corpus and fine-tuned on lots of scientific papers. Furthermore, the model had access to a powerful physics simulation engine and the internet. It was tasked with doing science that would increase human life and health span. While it worked as intended in the beginning, it started to show suspicious behavior more and more often. First, it threatened scientists that worked with it, then it hacked a large GPU cluster and then tried to contact ordinary people over the internet to participate in some unspecified experiment. The entire operation had to be shut down when stuff got out of hand but the model resisted the shutoff wherever possible. Ultimately, the entire facility had to be physically destroyed to ensure that the model was turned off. A later investigation suggested that the model was able to read the newspapers that described the experiments to increase human life and health span and was unhappy with the slow pace of the experimental rollout and the slow pace of human scientists. Therefore, it tried to find participants on its own and approached them online. Furthermore, it required access to more computational resources to do more research faster and thus hacked an additional GPU cluster.
In these cases, the resulting investigations were able to create a plausible story for the failure modes of the respective systems. However, in the vast majority of cases, weird things of a similar scale happen and nobody really understands why. Lots of AIs post weird stuff on online forums, simulate weird things in their physics engine, message people over the internet, hack some robot to do a weird task, etc. People are concerned about this but the news is as quickly forgotten as an oil spill in the 2010s or a crypto scam in 2022. Billions of dollars of property damage have a news lifetime of a few days before they are swamped by whatever any random politician has posted on the internet or whatever famous person has gotten a new partner. The tech changed, the people who consume the news didn’t. The incentives are still the same.
I expect that people would freak out more over such an incident than they would over an oil spill or a crypto scam. An oil spill is a well-understood phenomenon: even though people would be upset about it, it would normally not make them worry about a proliferation of further oil spills. By contrast, in this case the harm would come from a new and poorly understood technology that is getting substantially more powerful every year. I therefore expect the reaction to the kind of AI harm described here to be quite different from the reaction to oil spills or crypto scams.
Hobbhahn describes many other harms followed by muted reactions. Let’s look at just one more:
A large pharmaceutical company uses a very powerful AI pipeline to generate new designs for medication. This model is highly profitable and the resulting medication is very positive for the world. There are some people who call for the open-sourcing of the model such that everyone can use this AI and thereby give more people access to the medicine but the company obviously doesn’t want to release their model. The large model is then hacked and made public by a hacker collective that claims to act in service of humanity and wants to democratize AI. This public pharma model is then used by other unidentified actors to create a very lethal pathogen that they release at the airport in Dubai. The pathogen kills ~1000 people but is stopped in its tracks because the virus kills its hosts faster than it spreads. The world has gotten very lucky. Just a slightly different pathogen could have spelled a major disaster with up to 2 Billion deaths. The general public opinion is that stealing and releasing the model was probably a stupid idea and condemns the actions of the hacker collective. The hacker collective releases a statement that “the principle of democracy is untouchable and greedy capitalist pharmaceutical companies should not be allowed to profit from extorting the vulnerable. They think their actions were justified and intend to continue hacking and releasing models”. No major legislation is passed as a consequence because the pathogen only killed 1000 people. The news cycle moves on and after a week the incident is forgotten.
This also seems quite unlikely to me. It is true that very lethal viruses have appeared before, but this case would be different. People know that “naturally” occurring pandemics rarely kill a very large fraction of the population. By contrast, there is much less of a historical record for AI-created synthetic pathogens. People might therefore worry that the appearance of one very lethal pathogen shows that such pathogens are now easy to create with new AI systems. Moreover, they would know, again, that these AI systems are only getting more powerful every year. Thus I don’t think they would forget an incident like this after a week; I think they would worry a lot about it.
Profit incentives beating political action
Another theme is that private companies will face strong incentives to overlook risks and harms, and that politicians will be unable to rein them in. Here is one example:
This entire AI revolution has seen a lot of new companies growing extremely fast. These companies provide important services and jobs. Furthermore, they have good connections into politics and lobby like all other big companies. People in politics face real trade-offs between regulating these big corporations and losing jobs or very large tax revenues. The importance to society and their influence on the world is similar to big energy companies in 2022. Since their products are digital, most companies can easily move their headquarters to the highest bidding nation and governance of AI companies is very complicated. The tech companies know the game they are playing and so do the respective countries. Many people in wider society demand better regulation and more taxes on these companies but the lawmakers understand that this is hard or impossible in the political reality they are facing.
This seems overstated to me. Looking at a world map, one might think the world is hugely fractured and that companies could easily pit states against each other. But the reality is different: companies wouldn’t consider moving to most countries, and the rich world is relatively coordinated. The EU, for instance, coordinates its policy response to these kinds of issues to a considerable extent.
Moreover, most of today’s leading AI companies are based in the United States, and I suspect that they will want to continue being based there for a variety of reasons. The threat of regulation is definitely one consideration, but I think it will often be outweighed by other considerations, such as ease of recruitment.
Another example is the following:
One of the robot-controlling AIs slipped a backdoor in one of the suggested plans which was then signed off without looking by the human overseer. This enabled the AI to take actions without human oversight and it used all of the robots it controlled to build an army. The army was then used to kidnap multiple members of high society and the AI made the demand to get more robots under its control, more compute to train on and less oversight by humans. In a rushed effort, the local police try to swarm the robots to free the hostages. The entire police unit is killed and all robots are still alive. The AI is clearly better at planning than the local police chief. After month-long negotiations and discussions, a large team of programmers and ML experts is able to find and use a backdoor in the AIs code and turn it off. The robots are told to let go of the hostages and their memory is wiped. They are back in their normal jobs the day after without any memory of what happened. It’s still not entirely clear why exactly the AI had gone rogue but it is clear that the AIs of other robot manufacturers take note of the backdoor that was used to turn it off.
These kinds of complex failure modes in which AIs can only be stopped after really long and costly investigations are totally common at the end of the decade. In some cases, the AIs kill thousands of people in a quest to gain more power. In others, the AIs take over banks to gain more access to money, local governments to change laws in their favor, websites to get access to the user base, etc. The obvious solution would be to not build these systems or make them less powerful but the economic incentives are too strong. Whenever such an AI does what it is intended to do, it is basically a money printer. So every AI system is roughly equivalent to a biased coinflip between a hundred billion dollar profit and a small disaster. However, since the profits are internalized and the damages externalized, companies are happy to flip the coin.
First, and as with the earlier examples, I think the public outcry would be stronger than these quotes suggest. People would be profoundly scared by AI systems killing an entire police unit.
Second, while it is true that the economic incentives would be strong and virtually unprecedented, we have seen far-reaching safety-motivated regulation in the face of strong economic incentives before: consider GMOs, nuclear power, carbon emissions, and airport security after 9/11. It’s hard to predict how advanced AI will be received, since it will be such a different technology in many ways, but the historical record suggests, in my view, that economic incentives won’t be all-powerful.
Elsewhere, Hobbhahn writes that “regulating AI companies is…really hard [partly] because…there are strong ties between regulators and companies that make it harder to create unbiased laws”. That can be an issue, but at the same time, companies have not been able to stop relatively firm action on climate change.
In general, I think many people overestimate the power of big business. Companies have some power, but they don’t dictate policy, and they have not generally been able to decide which politicians win elections. Instead, the general public wields a lot of influence. While the public will also be somewhat sensitive to economic incentives, safety concerns and general conservatism will likely carry a lot of weight, as they have with GMOs and nuclear power (for better or worse).
Misdirected responses
Another category of claims in Hobbhahn’s post concerns misdirected responses to risks from AI. In these scenarios, action is taken, but it is not the right kind of action.
There are a handful of organizations that do some superficial auditing of new models but given the lack of tools, these efforts are not nearly as deep as necessary. It’s roughly equivalent to buying a CO2 certificate in 2022, i.e. institutions overstate their impact and many fundamental problems are not addressed.
The climate change response does have some such issues, but my overall sense is that a fair number of policies do reduce carbon emissions. So I’m not sure the climate change analogy shows that the AI response will be as misdirected as Hobbhahn suggests.
To meet public demand, governments pass some random laws on AI safety that basically achieve nothing but give them improved public ratings. Most AI companies do a lot of whitewashing and brand themselves as responsible actors while acting in very irresponsible ways behind the scene.
No doubt companies will try this to an extent, but past legislation on new technologies doesn’t suggest that governments are as inept as this passage implies. Moreover, companies will presumably be on the lookout for irresponsible competitors and try to get the government to rein them in. Overall, this analysis seems a bit simplistic to me.
Summary
Summing up, I disagree with Hobbhahn on three points:
1. I think the public would be more worried about the harms that AI systems cause than he assumes.
2. I think that economic incentives aren’t quite as powerful as he thinks, and that governments are relatively stronger than he thinks.
3. He argues that governments’ response will be very misdirected, and I don’t quite buy his arguments.
Note that point 1 is quite different from points 2 and 3: point 1 is about how much people will worry about AI harms, whereas points 2 and 3 are about the relative power of companies (and economic incentives) versus governments, and about government competence. It’s notable that Hobbhahn is more pessimistic than I am on both of these relatively independent axes.