
Hanoi Turned Upside Down



Prompted partly by Apple's paper about the limits of large language models ("The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity"), I spent some time playing with Tower of Hanoi. It's a problem I solved some 50 years ago when I was in college, and I haven't felt the desire or need to revisit it since. Now, of course, "We Can Haz AI," and all that means. Of course, I didn't want to write the code myself. I confess, I don't like recursive solutions. But there was Qwen3-30B, a "reasoning model" with 30 billion parameters that I can run on my laptop. I had little doubt that Qwen could generate a good Tower of Hanoi program, but I thought it would be fun to see what happened.

First, I asked Qwen whether it was familiar with the Tower of Hanoi problem. Of course it was. After it explained the game, I asked it to write a Python program to solve it, with the number of disks taken from the command line. Fine: the result looks a lot like the program I remember writing in college (except that was way, way before Python; I think I used a dialect of PL/1). I ran it, and it worked perfectly.
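Qwen's exact output isn't reproduced here, but the classic recursive solution looks roughly like this (a minimal sketch of the standard algorithm, not Qwen's actual code):

```python
import sys

def hanoi(n, source, target, spare):
    """Move n disks from source to target, using spare as the extra peg."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target)   # park the top n-1 disks on the spare peg
    print(f"Move disk {n} from {source} to {target}")
    hanoi(n - 1, spare, target, source)   # stack them back on top of disk n

if __name__ == "__main__":
    # Take the number of disks from the command line, defaulting to 3.
    disks = int(sys.argv[1]) if len(sys.argv) > 1 else 3
    hanoi(disks, "A", "C", "B")
```

Run it as `python hanoi.py 4` and it prints the 15 moves for four disks.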

The output was a bit awkward (just a list of moves), so I asked it to animate it on the terminal. The terminal animation wasn't really satisfactory, so after a couple of tries, I asked it to try a graphical animation. I didn't give it any more information than that. It generated another program, using Python's tkinter library. And again, this worked perfectly. It generated a nice visualization, except that when I watched the animation, I realized that it had solved the problem upside down! Large disks were on top of smaller disks, not vice versa. I have to be clear: the solution was completely correct. In addition to inverting the towers, it inverted the rule about moving disks, so that it never put a smaller disk on top of a larger one. If you stacked the disks in a pyramid (the "normal" way) and made the same moves, you'd get the correct result. Symmetry FTW.
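The symmetry is easy to check. At the peg level, the solution is just a sequence of "move the top disk from X to Y" steps, and the same sequence is legal whether you start with the largest disk on the bottom (and forbid putting a larger disk on a smaller one) or with the largest disk on top (and forbid putting a smaller disk on a larger one). Here's a sketch of my own, not Qwen's visualization code, that replays the moves under both rules:

```python
def moves(n, source="A", target="C", spare="B"):
    """Peg-level move sequence of the standard recursive solution."""
    if n == 0:
        return
    yield from moves(n - 1, source, spare, target)
    yield (source, target)
    yield from moves(n - 1, spare, target, source)

def replay(n, start_stack, legal):
    """Replay the moves from a given starting stack, checking legal(below, top) each time."""
    pegs = {"A": list(start_stack), "B": [], "C": []}
    for src, dst in moves(n):
        disk = pegs[src].pop()
        assert not pegs[dst] or legal(pegs[dst][-1], disk), "illegal move"
        pegs[dst].append(disk)
    return pegs["C"]

n = 5
# Normal puzzle: largest disk at the bottom; never put a larger disk on a smaller one.
print(replay(n, range(n, 0, -1), lambda below, top: below > top))
# Inverted puzzle: largest disk on top; never put a smaller disk on a larger one.
print(replay(n, range(1, n + 1), lambda below, top: below < top))
```

Both runs finish without tripping an assertion, and both end with the full stack on the target peg.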

So I told Qwen that the solution was upside down and asked it to fix it. It thought for a long time and eventually told me that I must be looking at the visualization the wrong way. Perhaps it thought I should stand on my head? Proving, if nothing else, that LLMs can be assholes too. Just like 10x programmers. Maybe that's an argument for AGI?

Seriously, there's a point here. It's certainly important to research the limits of artificial intelligence. It's definitely interesting that reasoning LLMs tended to abandon problems that required too much reasoning and were most successful at problems that only required a moderate reasoning budget. Interesting, but is that surprising? Very hard problems are very hard problems for a reason: they're very hard. And most humans behave the same way: we give up (or look up the answer) when faced with a problem that's too hard for us to solve.

But we also have to think about what we mean by "reasoning." I had little doubt that Qwen could solve Tower of Hanoi. After all, solutions must be in hundreds of GitHub repos, Stack Overflow questions, and online tutorials. Do I, as a user, care the least little bit whether Qwen looks up the solution in an external source? No, I don't, as long as the output is correct. Do I think this means Qwen isn't "reasoning"? Ignoring all the anthropomorphism that we're stuck with, no. If a reasonable and reasoning human is asked to solve a hard problem, what do we do? We try to look up a process for solving the problem. We verify that the process is correct. And we use that process in our solution. If computers are relevant, we'll use them rather than solving the problem with pencil and paper. Why should we expect anything different from LLMs? If someone told me that I had to solve Tower of Hanoi with 15 disks (2^15 - 1, or 32,767 moves), I'm sure I'd get lost somewhere between the beginning and the end, even though I know the algorithm. But I wouldn't even think of listing the moves by hand; I'd write a program (like the one Qwen generated) and have it dump out the moves. Laziness is a virtue; that's something Larry Wall (creator of Perl) taught us. That's reasoning: it's as much about looking for the easy solution as it is about doing the hard work.
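That move count is worth a quick sanity check. Each extra disk means solving two copies of the smaller problem plus one move, so the total for n disks is 2^n - 1. A two-function confirmation (my own arithmetic check, mirroring the structure of the recursion above):

```python
# Each level of the recursion solves two (n-1)-disk subproblems plus one move,
# so the total is 2**n - 1. For 15 disks that's 32,767.
def count_moves(n):
    return 0 if n == 0 else 2 * count_moves(n - 1) + 1

print(count_moves(15), 2**15 - 1)  # 32767 32767
```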

A blog post I read recently reported something similar. Someone asked OpenAI's o3 to solve a classic chess problem by Paul Morphy (probably the greatest chess player of the 19th century). The AI realized that its attempts to solve the problem were incorrect, so it looked up the answer online, used that as its answer, and gave a good explanation of why the answer was correct. That's a perfectly reasonable way to solve the problem. The LLM experiences no pleasure, no validation, in solving a difficult chess problem; it doesn't feel a sense of accomplishment. It's just supplying an answer. While it's not the kind of reasoning that AI researchers want to see, looking up the answer online and explaining why it is correct is a good demonstration of human-like reasoning. Maybe this isn't "reasoning" from a researcher's perspective, but it's certainly problem-solving. It represents a chain of thought in which the model decides that it can't solve the problem on its own, so it looks up the answer online. And when I'm using AI, problem-solving is what I'm after.

I want to make it clear that I'm not a convert to the cult of AGI. I don't consider myself a skeptic either; I'm a nonbeliever, and that's different. We can't talk about general intelligence meaningfully if we can't define what "intelligence" means. The hegemony of the technorati has us chasing problem-solving metrics, as if "intelligence" could be represented by a number. It's all Asimov until you need to run benchmarks; then it's reduced to numbers. If we know anything about intelligence, we know that it's not represented by a vector of benchmark results testing the ability to solve hard problems.

But if AI isn't the embodiment of some kind of undefinable intelligence, it's still the greatest engineering project of the 21st century. The ability to synthesize human language correctly is a major achievement, as is the ability to emulate human reasoning, and "emulation" is a fair description of what it's doing. AI's detractors ignore its tremendous utility (bizarrely, in my opinion), as if pointing to examples where AI generates incorrect or grossly inappropriate output means that it's useless. That isn't the case, but it does require thinking carefully about AI's limitations. Programming with AI assistance will certainly require more attention to debugging, testing, and software design: all themes that we've been watching carefully over the past few years, and that we're talking about in our AI Codecon conferences. Applications like detecting fraud in welfare applications may need to be scrapped or put on hold, as the city of Amsterdam found out, until we can build AI systems that are free of bias. Building bias-free systems is likely to be much harder than solving difficult problems in mathematics. It's a problem that may not be solvable; we humans certainly haven't solved it. Either worrying about or breathlessly anticipating AGI achieves little, other than diverting attention away from both the useful applications of AI and the real harms caused by AI.

