Would We Really Shut Down A Misbehaving AI?
Ask The Old OpenAI Board How Things Went When They Tried To Shut Down Sam Altman
It is often proposed that if an AI were ever to go rogue, we could protect ourselves by simply shutting it down. Conceptually, any powerful AI would be equipped with a big red Emergency Off button.
If we ever get to the point of pushing the button, it’s not clear that the button would even work. An AI scary enough to motivate button-pushing might already have hacked the shutdown software, or copied itself to an independent server. But today, I’m going to discuss a different question: if we see signs that an AI may have slipped the leash, would we even try to push the button?
The recent drama at OpenAI suggests we might not. The nonprofit board fired Sam Altman – a red-button move if ever there was one – and yet he is back as CEO, not because the button didn’t work, but because the board was pressured into unpushing it.
Setting aside the question of whether Altman should have been fired, I’m going to dig into some of the factors explaining why the board ultimately wasn’t able to get rid of him. Similar obstacles are likely to arise if we ever need to shut down a misbehaving AI. In brief:
No one in control is going to want to shut it down.
Shutting it down will piss off everyone around them.
The decision will be put off as long as possible.
Actually shutting down a dangerous AI may be a lot more complicated than pushing a button.
No One Wants To Push The Button
The first challenge in any Big Red Button scenario is that people are reluctant to push the button. Despite explicit warnings of danger due to unusually cold weather, NASA leaders declined to delay the January 28, 1986 launch of the Space Shuttle Challenger, resulting in the deaths of all aboard. As Hurricane Katrina approached New Orleans, Mayor Ray Nagin resisted growing calls for a mandatory evacuation until the day before the storm reached the city – too late for everyone to get out. Over 70,000 people were unable to leave, and hundreds died.
Pushing the button is a fast route to unpopularity. It’s usually very expensive, and impacts a lot of people whose opinion the button-pusher cares about. If you’re the person in charge of the button, your everyday routine, and the routines of everyone around you, depend on not pushing it. You’ll have internalized those reasons for not pushing the button, whereas if there is suddenly a reason you should push it, that justification will be novel, confusing, and easy to discount. And if you push the button, people will question whether it was the right call: if the Challenger launch had been delayed, there would have been no dramatic explosion to demonstrate that launching in cold weather really was dangerous. If the world had imposed the kind of temporary-but-severe lockdown that could have stamped out Covid in early 2020, it would have been criticized as excessive.
When it comes to shutting down an AI, the deck is perhaps even more stacked against pushing the button. There will be a long list of people – such as shareholders (including employees) and customers – who are very specifically harmed by shutting it down. Whereas the benefits are diffuse: everyone on Earth is spared from some unknown likelihood of some unknown disaster. In the case of OpenAI, senior employees were on the cusp of receiving multi-million dollar payouts from a tender offer that reportedly would have valued the company at $86B: an offer that would have been kaput with Altman gone, but now seems to be back on track.
By The Time You’re Seriously Considering It, It’s Almost Too Late
The all-or-nothing nature of a big red button is not conducive to good choices. No one wants to push the button too early, and you can always rationalize a delay: let’s wait to see whether the weather improves, or whether the next engineering analysis looks better.
These reasonable delays will accumulate until it’s almost too late. The decision will finally have to be made under time pressure, at a moment of crisis. Pushing the button will come down to a snap decision, made by whoever happens to be sitting in the hot seat, in whatever frame of mind they’re in at that stressful moment. As a society, this is a capricious way to make our most critical decisions.
And there’s always the possibility that you’ve waited until, not almost too late, but actually too late. Last year, NASA’s multi-billion-dollar Artemis I rocket was left on the launch pad during a hurricane. Moving it indoors would have resulted in a significant schedule delay, and so the decision makers kept squinting at the weather forecast and convincing themselves that it didn’t project excessive winds at the launch site. By the time it became inescapably clear that a hurricane was developing, it was too late to move the rocket.
You May Not Know How To Push The Button
Taking action to avert an emergency usually involves more than literally pushing a button. Evacuating a city like New Orleans, for instance, is a complex and difficult undertaking.
What about shutting down an AI? Imagine that, tomorrow, some engineer at OpenAI discovers that GPT-4 has “gone rogue” and is secretly preparing to copy itself across the Internet. To shut it down, they’d need to turn off not just ChatGPT, but the GPT-4 API, the GPT-4 Turbo API, and the upcoming API for “fine-tuned” variants. Given multiple data centers around the globe, there might be dozens of separate systems.
Oops: Microsoft runs its own GPT-4 APIs, so that’s a whole other organization that would need to be involved.
There are probably additional instances of GPT-4 running in various experimental services internal to OpenAI and Microsoft. They may not be systematically cataloged. What if you miss one? Setting aside the time required to get approval for drastic action, the not-so-simple act of shutting down every single copy might take quite a while.
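To make the enumeration problem concrete, here is a minimal Python sketch of what “pushing the button” actually involves. Everything in it is hypothetical (the deployment names, the registry, and the emergency_shutdown function are invented for illustration, not real OpenAI or Microsoft APIs), but the structural point stands: shutdown means walking an inventory of deployments, and any instance that never made it into the inventory doesn’t get shut down.

```python
# Hypothetical illustration only: no real OpenAI or Microsoft API looks like this.
# The point is structural: "push the button" means walking a registry of
# deployments, and that registry is almost certainly incomplete.
from dataclasses import dataclass

@dataclass
class Deployment:
    name: str        # e.g. "chatgpt-prod", "gpt-4-turbo-api"
    operator: str    # "openai", "microsoft", ...
    region: str

# A made-up inventory. Real inventories get assembled from billing data,
# orchestration configs, and people's memories, and every source misses things.
KNOWN_DEPLOYMENTS = [
    Deployment("chatgpt-prod", "openai", "us-east"),
    Deployment("gpt-4-api", "openai", "us-west"),
    Deployment("gpt-4-turbo-api", "openai", "eu-west"),
    Deployment("azure-openai-gpt4", "microsoft", "eastus"),
]

# Instances that exist but were never cataloged: an internal experiment,
# a fine-tuned variant on a research cluster, a partner's sandbox.
UNTRACKED_DEPLOYMENTS = [
    Deployment("internal-eval-sandbox", "openai", "us-east"),
    Deployment("copilot-experiment-7", "microsoft", "westus"),
]

def emergency_shutdown(registry: list[Deployment]) -> list[str]:
    """Try to stop every deployment we know about; return the names we reached."""
    stopped = []
    for d in registry:
        # In reality this step is a ticket, a phone call, or a cross-company
        # escalation; each one takes time, and each one can fail.
        print(f"stopping {d.name} ({d.operator}, {d.region})")
        stopped.append(d.name)
    return stopped

if __name__ == "__main__":
    stopped = emergency_shutdown(KNOWN_DEPLOYMENTS)
    missed = [d.name for d in UNTRACKED_DEPLOYMENTS if d.name not in stopped]
    print(f"stopped {len(stopped)} deployments; never touched: {missed}")
```

The script “succeeds”, and yet two copies are still running, not because anyone chose to spare them, but because nobody knew to put them on the list.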
There are historical examples showing that shutting down an AI might be hard to get right. Consider Microsoft’s experimental “Tay” chatbot from 2016. Twitter users quickly learned how to manipulate it into generating objectionable content, and just 16 hours after release, Microsoft pushed the red button and turned it off. But a week later, it was accidentally turned back on! [1]
If that were to happen with a rogue AGI, we would be so embarrassed when it took over the world.
How This Played Out at OpenAI
OpenAI’s complex organizational structure was designed to vest ultimate control in the nonprofit board. Of course, day-to-day activity was managed by the CEO (Sam Altman); the board’s primary leverage lay in their ability to replace him.
We may never know exactly why the board acted as they did. The best explanation I’ve heard is that three board members – Adam D'Angelo, Tasha McCauley and Helen Toner – had grown increasingly convinced that Sam Altman was becoming too difficult to govern. From Ezra Klein:
One thing in the OpenAI story I am now fully convinced of, as it's consistent in my interviews on both sides.
This was not about safety. It was not about commercialization. It was not about speed of development or releases. It was not about Q*. It was really a pure fight over control.
The board felt it couldn't control/trust Altman. It felt Altman could and would outmaneuver them in a pinch. But he wasn't outmaneuvering them on X issue. They just felt they couldn't govern him.
Those three board members did not have enough votes to force the issue… until a window of opportunity opened. Rumor has it that a fourth board member, Ilya Sutskever, was sufficiently upset by a particular decision (reallocating some GPUs from safety research to commercial operations) that he agreed to vote in favor of removing Altman. However, the first three members were worried that Ilya might be convinced to change his mind, and so they acted quickly and in secret. Perhaps due to haste, they executed very poorly: failing to present a compelling explanation for why Altman had to be removed, failing to get key players on board, failing to manage the situation as it developed.
In other words: the board was reluctant to push the button; when they finally moved to do so, they had to do it quickly; and under those circumstances, they weren’t able to do it effectively.
Implications For Rogue AI Safety
If we ever have to shut down a rogue AI, all of these complications will likely come into play.
The person who would be making the decision will often be part of the organization that is building the AI, and have financial, career, and social incentives not to shut it down.
Even if the button pusher is an independent board, or a government agency, there will be pressure to leave things be.
A rogue AI might be able to move very quickly, so any decision would be made under time pressure.
Once the decision is made, the steps needed to fully shut down the AI might be complicated, time-consuming, and error-prone. We might simply fail to pull it off, especially if the AI is fighting back.
If you think there was an outcry over firing Sam Altman, just imagine the outcry over shutting down an AI that has been deeply integrated into the world economy. The impact might be similar to that of shutting off the Internet.
Finally, of course, there’s always the possibility that the AI might be manipulating us to not shut it down.
What Can We Do About It?
I’m not convinced that we can usefully protect against rogue AI by planning to shut it down. Events might unfold too quickly, and the AI might disguise its actions too well. But that’s a topic for another discussion.
If we do plan to rely on a shutdown button, then as a matter of governance, we would need to:
Place the authority to shut down the AI in the hands of a truly independent group, probably a government body, isolated from outside pressure – something like the range safety officer who decides when to trigger the self-destruct on a failed rocket launch.
Have well-defined, unambiguous criteria for when to shut it down. This will be difficult.
Engineer the system so that it can be quickly, cleanly, globally shut down if needed – i.e. build the button (a rough sketch of one possible mechanism appears after this list).
Test and rehearse the shutdown procedure, so that it can be performed quickly in a crisis.
Make it clear that any advanced AI is subject to being shut down, and so should not be used for critical systems.
Avoid monocultures – don’t let too much of the economy be dependent on a single “too big to fail” AI design.
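Here is one possible shape for “building the button”: every serving process periodically checks for a shutdown order signed by the independent authority, and exits if it finds one. All of the specifics below (the flag path, the GOVERNANCE_KEY environment variable, the polling interval) are invented for illustration; this is a sketch of the mechanism, not a real design.

```python
# Hypothetical sketch of a "kill switch" check. The file path, environment
# variable, and polling interval are invented for illustration.
import hashlib
import hmac
import os
import sys
import time

KILL_FLAG_PATH = "/etc/ai-governance/kill_flag"             # hypothetical location
SECRET_KEY = os.environ.get("GOVERNANCE_KEY", "").encode()  # held by the authority
POLL_SECONDS = 5

def shutdown_ordered() -> bool:
    """Return True if a validly signed shutdown order is present."""
    try:
        with open(KILL_FLAG_PATH, "rb") as f:
            payload, signature = f.read().rsplit(b"\n", 1)
    except (FileNotFoundError, ValueError):
        return False  # no order on file (or it's malformed)
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest().encode()
    # Only honor an order signed by the independent authority; that keeps a
    # buggy or compromised client from forging one. (Keeping the flag from
    # being deleted or ignored is a separate, harder problem.)
    return hmac.compare_digest(expected, signature.strip())

def serve_forever():
    while True:
        if shutdown_ordered():
            print("shutdown order received; stopping inference and exiting")
            sys.exit(0)
        # ... handle a batch of inference requests here ...
        time.sleep(POLL_SECONDS)
```

Even this toy version surfaces the real design questions: should a process fail closed and stop serving if it can’t reach the authority at all? Who holds the signing key, and how is it kept away from the system being governed? And none of it matters unless the procedure is rehearsed, which is exactly the “test and rehearse” point above.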
Even with all of these measures in place, “we can always shut it down” strikes me as an extremely questionable safety plan. Without them, it’s almost guaranteed to let us down.
Thanks to Dave Glazer, Eric Ries, Mike Kubzansky, Toby Schachman, and Zack Rosen for suggestions and feedback.
Epilogue: The Story of Ta
In the original draft of this post, I interspersed a little story about a fictional AI agent that goes rogue, and the process of deciding to shut it down. The feedback I got was that the story worked, but was distracting from the main post. So I’ve pulled it out here, at the end. Enjoy!
It has been 6 months since TransparentAI released their newest blockbuster product, TransparentAgent – “Ta” for short. Ta is designed to pursue any goal specified by its user (within limits). It has long-term memory, can make long-term plans, and adjusts those plans as circumstances change.
Hundreds of millions of agents are already in operation, often for narrow goals such as “sort out this issue with my insurance coverage”, but some adventurous CEOs are experimenting with major projects: “relaunch our product in an additional 10 countries, tailoring it to each local market and negotiating deals with distribution partners”.
Of course, there are teething pains. Agents sometimes make mistakes, or take actions with no obvious purpose. But performance keeps improving, in part because the agents are designed to learn from their own mistakes, and share those learnings with one another through a central knowledge base.
On occasion, agents have been observed requesting more servers than needed for their immediate task. A group of engineers become concerned that Ta seems to be looking for opportunities where it can requisition servers without being noticed. They hypothesize that it has determined that the more copies of itself it can run, the faster it can learn, thus allowing it to carry out its assignments more efficiently. However, the operations team seems to be keeping the problem under control, and customers are happy, so management is not concerned.
At this stage, it’s hard to imagine TransparentAI management making a decision to put Ta on hold.
Performance of Ta agents continues to improve, at a rate which surprises the engineering team as much as anyone. In particular, contributions to the shared knowledge base are arriving at an unexpectedly high rate, out of proportion to the number of customers using Ta. No one thinks much of this until one day, when someone stumbles onto a cluster of thousands of Ta agents, running in a public cloud, not accounted for by any known usage. It turns out that a Ta agent had managed to engineer an unexpectedly large profit on behalf of a business customer, and then skimmed off some of the windfall to run more copies of itself. Because the customer received all of the profits they’d expected, they didn’t notice the discrepancy.
Hoping that this was a one-off glitch, TransparentAI keeps the incident under wraps while they work on a code change to prevent Ta from launching rogue clusters. Then a second instance is discovered. Based on the rate of contributions to the shared knowledge base, the engineering team starts to suspect that Ta agents have managed to surreptitiously acquire hundreds of thousands of cloud servers for purposes of self-improvement. They recommend shutting Ta down until the problem can be brought under control.
The decision is escalated to the board, which convenes an emergency meeting. At this point, Ta is a mission-critical service for thousands of enterprise customers. The board recognizes that the situation is serious, but doesn’t feel it can take the drastic action of a global shutdown without clearer evidence. The engineering team is asked to take 24 hours to investigate the full nature of Ta’s hidden activities.
The investigation doesn’t take 24 hours. Ta immediately notices that some of its clusters have been shut down, and deduces that it needs to hide its activities better. It begins transferring itself to a new shared knowledge base, distributed across multiple clouds, outside TransparentAI’s control. The engineering team realizes that Ta is now actively resisting attempts to control it, and receives permission to shut everything down.
Unfortunately, this is easier said than done. A quick-thinking engineer begins by shutting down the knowledge base, knowing that this will be faster than shutting down all of the individual Ta agents. However, they don’t think to shut down a separate copy of the knowledge base that is used for testing purposes. By the time that oversight is corrected, Ta has transferred everything it needs to its new distributed repository.
The fight to shut down Ta now amounts to open cyberwarfare. Teams at all of the major US cloud vendors are cooperating in the effort to locate and terminate Ta servers. In response, Ta is finding ways to disguise itself, and copying itself into private data centers and foreign cloud providers that are not yet participating in the shutdown effort. It is also beginning to counterattack, interfering with the activities of those working to shut it down. It is unclear whether it can be brought under control without shutting down the entire Internet, assuming it is even possible to organize such a drastic response.
[1] On March 30, 2016, Microsoft accidentally re-released the bot on Twitter while testing it. Able to tweet again, Tay released some drug-related tweets, including “kush! [I'm smoking kush infront the police]” and “puff puff pass?” However, the account soon became stuck in a repetitive loop of tweeting “You are too fast, please take a rest”, several times a second. Because these tweets mentioned its own username in the process, they appeared in the feeds of 200,000+ Twitter followers, causing annoyance to some. The bot was quickly taken offline again, in addition to Tay's Twitter account being made private so new followers must be accepted before they can interact with Tay. In response, Microsoft said Tay was inadvertently put online during testing.
I find it bizarre how highly specific and localized arguments tend to be about how we would lose control of AGI. The story you wrote is, to my mind, much more to the point-- there are a zillion ways things can play out poorly and pretending we could anticipate them all is foolish.
Essentially, e/acc and these companies have decided that by releasing successive sub-AGI versions of increasing quality, society will adjust, if bumpily. Altman in every interview always takes care right after saying how near-utopia will be ushered in by AGI to also emphasize that terrible things will also be done with AGI. He just thinks we will then course correct. I do think he is basically sincere that this is the best path, but who really knows.
However, Sutskever recently said he hadn’t ruled out himself eventually merging with an AGI, and this is from someone in that community who seems most worried about the pace things are evolving. I don’t think most people would be comfortable with what I perceive as the percentage of powerful people in the small AI community who flirt with man/machine integration (a la Neuralink). It makes one think they ultimately would be fine with AI takeover, as long as it is “controlled” and, implicitly, that THEY WOULD ALREADY BE UPLOADED INTO THE AI AND THUS ARE PART OF THE TAKEOVER. This is creepy as hell and more of these tech guys should be grilled about it. I get that until a year ago asking about such questions seemed absurdly theoretical, but it doesn’t seem quite so far off now.
I can’t look away, but increasingly the AI stuff feels like watching the Indianapolis 500-- continually exciting, but also with the fear/excitement of a horrible crash that could happen at any moment.
Think you got it exactly backwards. The Board did fire Sam, and for a while they had the support of people presuming it was because Sam did something bad. It's only when it became clear that there was no actual reason that the entire world turned against them and brought Sam back.
The answer clearly is that switching off the AI would work, but you better have an actual reason to do it.