4 Comments

I continue to read and enjoy all of your posts-- thanks for laying things out so clearly.

One thing I can't quite get out of my mind, having followed computer chess since the early '80s, is how the response to the programs as they improved seems extremely similar to what we see with LLMs. At first, the programs made laughable moves and were seen as a novelty. Then they played moves about as good as an average club player's, and everyone quickly (and correctly!) pointed out that the average club player isn't doing much more than calculating combinations, and computers are understandably better at that.

Then the programs started playing at the master level, and they would occasionally make moves that a master, seeing them without knowing a computer had played them, would call creative or clever. But that was quickly dismissed as basically luck: the program had to pick some move, and the supposedly creative one was picked simply because it maximized the computer's advantage. And that, too, was true.

As computers progressed to the grandmaster level, the commentary about their play started to change. The fact that computers started playing undeniably clever and creative moves on a regular basis was attributed to their doing millions of computations a second and to their having such clear goals of material gain and checkmate. No question that that was still true! At the same time, a kind of cynicism about human grandmasters became popular: all but the top 100 or so weren't really very creative; they were just reusing known patterns from previous games in different ways. And since computers were often hand-coded to recognize many of these patterns, it wasn't surprising that computers, being faster in the obvious ways, did better.

Which brings us to what I see as the key analogy with today's LLMs. Because once computers got to strong grandmaster level, lots of chess people began saying that chess programs would probably beat all but the best players, but that the programs had not shown anything original. They weren't going to be capable of new ideas. The car could beat the human in a sprint, but nothing was being learned.

What happened, of course, is that as computers got faster and the programs got stronger, they simply invented techniques that humans hadn't previously seen. They were still just trying to maximize their advantage, but came up with "new" ideas as a by-product of seeing deeper. Especially in the realm of speculative attacks and robust defense, humans learned there were new possibilities in positions they had previously ruled out. It is true they often could not copy the computer's precision, and thus couldn't always utilize this new knowledge, but they knew the ideas were sound, and it changed expert humans' approach, especially to the opening and endgame.

And once AlphaZero and other neural nets tackled chess-- not being programmed with human knowledge to build upon, but learning and teaching themselves-- they introduced other new ideas that, this time, human experts could emulate a bit more easily. In Go, even more so-- the human game has been revolutionized by the new ideas AlphaGo demonstrated.

So while I (think I) understand your point that the real world-- writing a novel, solving an original programming problem, writing a non-generic analytic essay that has original insights-- doesn't have clear ways to evaluate quality and to learn and improve, and may thus be non-analogous to chess/Go/etc., I do wonder how much a sheer increase in scale may be the big difference after all, a la "The Bitter Lesson".

Put another way, sometimes I feel like we are looking at GPT-4 much like we might look at a chimpanzee. Surprisingly smart, but limited in important ways. But how is our brain fundamentally different from theirs? Perhaps, to keep the analogy going, our brains have one or two advances, like transformers with LLMs, that allowed us to develop abstraction and language far beyond the chimps. Or did those advances just "appear" because of increased scale or development through more "training data"? I wish I knew the slightest thing about such topics.

OK, thanks again. I'm looking forward to reading your next entry on Memory!

author

Great questions, as always. Here is my best take. Of course the answer to any of these questions is at least 50% "who the hell knows", so take with a grain of salt.

First of all, to be clear and as you know from things I've written previously, I do think AIs will eventually equal and then surpass us at pretty much everything, including "creativity". I just don't think we'll get there through sheer scale alone.

It's also worth noting that, while chess apps basically had one job, we're asking LLMs to perform a stupendous variety of tasks, from summarizing documents to composing dad jokes to, literally, playing chess (you've probably seen that GPT-4 can do that, though not very well). They'll progress at different rates on different tasks, and for many of these tasks, it's hard to quantitatively measure progress. All of that clouds the issue.

My knowledge of the history of chess algorithms is shallow, but I think Deep Blue (the first computer to defeat the reigning world champion in a match) was fundamentally pretty similar to earlier chess AIs, just running on bigger hardware? Whereas AlphaGo was *not* similar to earlier Go (or chess) AIs; it dispensed with hand-coded heuristics for evaluating board positions, replacing them with a neural net, as well as (IIRC) introducing a second neural net to decide which game-tree branches were worth exploring. In one case, "just throw more hardware at the problem" worked; in the other, it did not. So there is precedent for new algorithms being needed. Transformers are another example of a change in algorithm being (so far as we know) necessary before just throwing scale at a problem would yield results.
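To make that distinction a bit more concrete, here's a toy sketch-- mine, not anything from Deep Blue or AlphaGo; the board encoding, the stub network, and all the function names are invented for illustration. The point is just structural: the search machinery can stay the same while the position evaluation is either a hand-written heuristic or a learned function.

```python
# Toy illustration of "hand-coded evaluation" vs. "learned evaluation".
# A board here is just a list of piece letters: uppercase = our pieces,
# lowercase = opponent's. Everything is invented for the example.

PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}

def hand_coded_eval(board):
    """Deep Blue-style: a human-designed heuristic (here, bare material count)."""
    score = 0
    for piece in board:
        value = PIECE_VALUES[piece.upper()]
        score += value if piece.isupper() else -value
    return score

def learned_eval(board, value_net):
    """AlphaGo-style: evaluation delegated to a trained network.
    Here value_net is any callable standing in for that network."""
    return value_net(board)

def best_move(board, legal_moves, apply_move, evaluate):
    """A (trivial, one-ply) search: works unchanged with either evaluator."""
    return max(legal_moves, key=lambda move: evaluate(apply_move(board, move)))

if __name__ == "__main__":
    sample_board = ["K", "Q", "P", "P", "k", "r", "p"]
    print(hand_coded_eval(sample_board))        # 5: up a queen and a pawn for a rook
    stub_net = lambda b: 0.62                   # stand-in for a trained net's output
    print(learned_eval(sample_board, stub_net)) # 0.62
```

The second (policy) network I mentioned above isn't shown; the point here is just the evaluation swap.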

Concretely, I think that complex problem solving requires an exploratory, iterative approach, and that our current approach to training LLMs won't produce models capable of that, even with greater scale. I can point to some specific reasons for believing this:

1. Why is an exploratory approach required? <waving hands> I believe there is a fundamental computational complexity to solving complicated problems. Perhaps GPT-7 will be able to independently prove Fermat's Last Theorem, or resolve the Riemann Hypothesis, or whatever... but I would be shocked if it didn't need to spend a healthy amount of time, certainly hours, plausibly months or years, to do so. That means it needs some way of organizing its activity during that extended period of time. Continuously appending tokens to the end of a single, unstructured chain of text, forming an undifferentiated linear thought process, doesn't seem like a very promising way to organize that activity.

2. Why won't our current approach to training LLMs suddenly, given larger scale, develop the ability to break out of the linear-thought-process model? Because the "token buffer" architecture fundamentally constrains them to that model, and because all of the data we use to train them reflects that model. (The toy sketch below tries to make this linear-vs-exploratory contrast concrete.)
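Here is a toy sketch of the two control structures I have in mind-- my own framing, nothing to do with how any real model or training pipeline is implemented; next_step, expand, promising, and done are placeholder callables:

```python
# A single append-only chain of "thought" vs. a search that can branch,
# abandon dead ends, and come back to older, more promising partial attempts.

def linear_process(problem, next_step, done, max_steps=1000):
    """Append-only: one undifferentiated chain; no way to revisit earlier forks."""
    trace = [problem]
    for _ in range(max_steps):
        if done(trace[-1]):
            return trace
        trace.append(next_step(trace))       # can only ever extend the one chain
    return None                              # ran out of budget

def exploratory_process(problem, expand, promising, done, max_nodes=100_000):
    """Best-first search over partial attempts: many branches stay alive at once."""
    frontier = [problem]
    for _ in range(max_nodes):
        if not frontier:
            return None                      # exhausted every branch
        state = max(frontier, key=promising) # return to the most promising attempt
        frontier.remove(state)
        if done(state):
            return state
        frontier.extend(expand(state))       # branch into several continuations
    return None
```

The second loop is just ordinary best-first search; the point is only that a context window you can append to, but not reorganize, gives you the first shape and not the second.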

I guess that might leave me in the same position as past chess experts who said that superior chess requires creativity and that the then-current approach to chess AI wouldn't be able to develop any. Or it might leave me in the same position as a past Go expert saying that a new approach would be needed. We'll see!

Hey Steve, a friend of mine sent me this link-- I think you and your subscribers might find it interesting/amusing: https://nicholas.carlini.com/writing/llm-forecast/question/Capital-of-Paris

author

Thanks -- recommended! (I scored about .0001 above what you'd get by simply giving 50% odds to each question. I did somewhat better than 50% on the direction of my predictions, but almost cancelled that out by being overconfident.)
