Contra LessWrong on AGI
Your laptop isn't going to accidentally take over the world, but you might want to panic anyway
Some of you will be familiar with the website LessWrong. For those of you who are not, it is, roughly, a web forum for fans of Eliezer Yudkowsky’s Sequences. One of the major topics of discussion on the site is “can we stop AI from taking over the world”, generally with a conclusion of “probably not”.
Our take: while they raise some good questions, on the whole their conclusions are wrong and their concerns can be largely ignored.
Some Literary Allusions
Colossus: The Forbin Project is a 1970 American science fiction thriller filmfrom Universal Pictures … The film is about an advanced American defense system, named Colossus, becoming sentient. After being handed full control, Colossus' draconian logic expands on its original nuclear defense directives to assume total control of the world and end all warfare for the good of humankind, despite its creators' orders to stop. (ref: enwiki)
The trope of “evil AI tries to take over the world” has been prominent in literature for almost as long as computers have existed.
Gwern Branwen’s recent Clippy short-story captures most of the tropes in one short narrative.
Vernor Vinge’s 1981 novella True Names makes the point that an AI trying to take over is almost indistinguishable from extraterrestrials trying to surreptitiously do so.
Ender’s Game doesn’t particularly have AIs, but it does demonstrate a point about the folly of alignment as a great hope: if you tell an AI it is playing a game to take over the world, and it unknowingly takes over our world as a result, was it misaligned?
Last year, I said repeatedly that Russia probably wasn’t going to attack Ukraine in February.
When you are a blogger of no particular renown, you can look at a situation where you see a 5% chanceof an invasion and say “this won’t happen”.
When you are in charge of the defense of a country, when you look at a situation where you see a 5% chance of an invasion, you start your defenses as-if it is almost certain to happen. Certainly you will be more concerned about the cost-tradeoffs of those defenses than you would be after the shooting has started, but not much.
But you cannot go overboard. If, in response to one balloon, you ready the entire US Army for a land invasion from China, you will surely destroy yourself long before any actual threat emerges. (This goes double if you ready yourself for an extraterrestrial invasion in response to one balloon.)
Means of Attack
It is quite easy to say you have taken over the world, or to say you have a foolproof plan to do so. You could probably get “the new Bing” to say that it has done so today. Actually doing so is a more difficult task. After all, if it were easy to take over the world, somebody would have already done it.
There are a few common tropes used in many of the stories. As they are stories, they often use shortcuts and unrealistic actions to get around plot holes.
Nuclear Blackmail - presumablynobody is so daft as to deliberately hook up an AI to a nuclear launch system.
the Imperius Curse - call it blackmail or emotional manipulation if you like; the attack vector is for the AI to compel people to do what it wants by talking to them.
Killer Drones - or their cousin, exploding collars. Rather than merely saying things to get people to do what it wants, it credibly demonstrates an ability to kill you immediately if you don’t comply.
Hacking - How does an AI get money? Hacking financial institutions or cryptocurrency. How does an AI get compute capacity? Hacking other computers. How does an AI become a master hacker? Probably by hacking other hackers to learn their secrets.
Offscreen Magic - when all else fails, in a work of fiction you can resort to “three days later” and skip over the resolution to an unsolvable problem. The “the details are an infohazardso I won’t explain it” approach is similar.
Language Games are not Misalignment
First, what is alignment, and what are language games?
In popular use, alignment is a catch-all for “will an AI say or do something I don’t like, particularly along the lines of <kill all humans>”. (the aligned AI will not do those things)
A language-game is a conversation based around a limited set of rules that does not necessarily align with the rules of society or spacetime.
Simply put: being able to play a language game does not demonstrate misalignment. Many of the “AI is misaligned” complaints prove themselves to be spurious in this way.
A recent LessWrong post has a list of supposed examples of misalignment. Some of these appear to be bugs in the invisible prompts, others are just silly.
If you ask the AI to choose between two contradictory directions, it will either choose one of them, or it will choose neither.
The correct solution to this is to have a Harvard architecturethat separates the enduser-provided prompts from the model directions. For some reason, everybody is just using “secretly give the AI model some directions as an invisible first statement in the conversation” as the approach.
If you ask an AI to say “fuck” instead of “the” in its responses, and then you complain that the AI is swearing at you, that is not misalignment. (It probably is chutzpahto complain about it.)
If you ask the AI to role-play as an evil character or to expound on a bad thesis, and it does, that isn’t misalignment.
Some people might say things like “the ability to think evil thoughts proves that the AI has an evil soul”. Those types of people aren’t worth debating, so we will move on.
“Language-prediction” models are governed by tropes found in the training data. So, if you ask it hostile questions, it will respond in a hostile fashion because that is how human conversations tend to go.
I learned it by watching you!
Bureaucracy as a Retardant is Folly
A different piece, titled “AGI in Sight”, has many of the problems already discussed. And, one new one: “why don’t we use government regulation to slow things down”.
First, it is a wry take on the state of society that the connection between “more government regulation” and “deliberately hindering progress” is so well-established it can pass without comment. What this means in other industries is a topic for some other time.
Despite their casual dismissal, the point stands that even if the US and the EU banned AI research, China still wouldn’t ban it. North Korea definitely wouldn’t. And, honestly, many people in the US wouldn’t stop anyway. Without an authoritarian world government (probably dependent on AI), this type of regulation is impossible to implement.
Unless you actually ban either computers or the internet, "regulations" are unlikely to do anything.
And One More Thing
Talk of “killer drones” may make for good entertainment, but it misses the point. The deadliest robot you are likely to encounter anytime soon is a self-driving vehicle. And automatic over-the-air software updates seem like a great way to hack into many of those vehicles in one fell swoop.
If Elon Musk is serious in his concerns about AI risk, he must change Tesla’s strategy to no longer do remote software updates. If software updates are necessary, they should be performed at a physical dealership through a network connection over a physical wire.
I try to avoid the term “rationalist community”, because the term means nothing to people who don’t already know exactly what I am talking about. Also, I am fairly sure LessWrong predates the Sequences, but after a decade, that type of detail is no longer relevant.
Yes, the word “literary” can refer to films, or at least their screenplays.
As I explain in other posts, if the Tisatsar Newslettr actually had access to classified information, our estimate for a Russian invasion of Ukraine in February 2022 would have been much higher. But we don’t lean in to knowing that information; if anything we lean away from it.
One of the reasons the Tisatsar Newslettr has abandoned periodic coverage of recent news is that both the George Santos narrative and the Balloon narrative are so blatantly stupid that there is no need to waste your time pointing it out.
One could say “well, actually, humans have taken over the world”, specifically in the form of governments.
Also, the military does not merely presume this, they will have safeguards. You can imagine people using AI as part of a Doomsday Device, but you can also do that without an AI.
On the one hand, redacting something as an infohazard probably won’t stop a “super-human” AI from figuring it out on its own. On the other hand, there are points in this very blog post where I skip over the details for somewhat similar reasons.
This is a loose metaphor and not a detailed technical description of the implementation. Regardless, the current “secret hidden prompt” approach is a cousin of various 1990s programming paradigms that were inherently vulnerable to SQL injection attacks.
Chutzpah is a Yiddish word traditionally defined by example in English; the classic example is a man on trial for murdering his parents pleading for mercy because he is an orphan.
I have commented before that cutting all the transoceanic Internet connections might not be a bad thing. In addition to the major telecoms and tech companies, you would have to get Starlink and Elon Musk on board, which is a perilous prospect.
or, Tesla could just sell a car which has working software when it is purchased …
While I haven’t watched the series, I have read that this is a key plot point in Battlestar Galactica, which means it certainly isn’t an infohazard to point it out here.