Apr 14·edited Apr 14

> In any case, my point is that GPT-4 needs this verbose style to solve nontrivial problems.

We humans need it too! You probably can't multiply two 4-digit numbers in your head without pen and paper. And humans use dirty hacks and external modules even when we do stuff in our heads: how else would you describe the phonological loop?

I think that there's a common failure mode in identifying "fundamental" problems with LLMs: you notice that they don't have an internal monologue and so can't do multi-step reasoning or even long multiplication; they can't visualize a chessboard and so make silly mistakes; they don't have long-term episodic memory that they can explicitly check their reasoning against. In short, out of the box they are only good at single-step intuitive problem solving.

But if such problems can be solved by giving the LLM a virtual pen and paper (or even just a specific prompt), then they are not so fundamental after all. GPT has solved the actually fundamental problem of growing something that can be taught to use such semi-external crutches; providing them and teaching it to use them is a matter of time.
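To make the "virtual pen and paper" idea concrete, here is a minimal sketch of long multiplication done with an explicit scratchpad. The function name `long_multiply` and the step format are made up for illustration; this shows the verbose style of working, not anything about how GPT-4 computes internally.

```python
def long_multiply(a: int, b: int) -> tuple[int, list[str]]:
    """Multiply two non-negative integers the pen-and-paper way, logging each step."""
    steps = []
    total = 0
    # Walk the digits of b from least to most significant.
    for place, digit_char in enumerate(reversed(str(b))):
        digit = int(digit_char)
        # One partial product per digit, shifted by its place value.
        partial = a * digit * 10 ** place
        total += partial
        steps.append(f"{a} x {digit} x 10^{place} = {partial}; running total = {total}")
    return total, steps


product, scratchpad = long_multiply(1234, 5678)
for line in scratchpad:
    print(line)
print("answer:", product)
```

Each line of the scratchpad is a single easy step; the hard part is only keeping track of them, which is exactly what the paper (or the verbose output) is for.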

> GPT-4 Can’t Plan Ahead

It can. Otherwise you'd immediately notice that when it says "the answer is an apple", it picks between "a" and "an" at random and then has to commit to that choice. That's not what happens at all. You can ask it more complicated questions, like explaining a puzzle and then asking for the answer in Esperanto, or asking for the answer with every word replaced by "bleep!", and it's fairly obvious that it already knows what the answer is before it has even begun to generate the words "The answer is:" with whatever extra additions.
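A toy way to see the "a"/"an" argument: if the model already holds a belief about what the answer is, the probability of the next word being "an" is just the total probability of the answers that need "an". The numbers and the function `article_probabilities` below are invented for illustration, and the vowel rule is deliberately crude; this is a sketch of the argument, not a claim about GPT-4's internals.

```python
def article_probabilities(answer_probs: dict[str, float]) -> dict[str, float]:
    """Next-word distribution over 'a'/'an' implied by a belief over answers."""
    probs = {"a": 0.0, "an": 0.0}
    for noun, p in answer_probs.items():
        # Crude spelling-based rule; it gets words like "hour" or "university" wrong.
        article = "an" if noun[0].lower() in "aeiou" else "a"
        probs[article] += p
    return probs


# Hypothetical belief about the answer before any word of it is generated.
belief = {"apple": 0.9, "pear": 0.07, "orange": 0.03}
probs = article_probabilities(belief)
print(probs)
```

With a confident belief in "apple", nearly all the probability mass lands on "an", so the article is not a coin flip that the answer then has to follow.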

So the question is not whether it can plan ahead. It's how good it is at planning ahead, why it sometimes fails to, and whether it can be helped by asking it to plan ahead or by giving it more compute.


Very interesting point on LLM limitations. It would be interesting to see whether there are good implementations of architectures that go beyond predicting the next word. Barring that, I can imagine workarounds that might be somewhat useful for automated agents and decrease its error rate. Lots of people are already trying things like giving it an internal dialogue, long-term memory, and coding and calculation abilities via APIs. These could be combined with an approach where, instead of directly answering the question, it asks itself: what are the necessary (and maybe sufficient) conditions for a correct result? Is there a way for me to check my results, for example through an API call to Wolfram Alpha? Maybe it could ask these questions for each step it generates, and at each step only accept a result that satisfies the conditions. Or better yet, maybe it could choose previous steps based on the conditions as well, working backwards.
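The "only accept a result that satisfies the conditions" idea could look roughly like the sketch below. Everything here is a stand-in: `noisy_guesses` plays the role of the model proposing answers to "what is 17 * 23?", and `check` plays the role of an external verifier (in practice that might be a calculator or an API call, which is not shown).

```python
from typing import Callable, Iterable, Optional


def first_verified(candidates: Iterable[int], check: Callable[[int], bool]) -> Optional[int]:
    """Return the first candidate that passes the check, or None if none does."""
    for candidate in candidates:
        if check(candidate):
            return candidate
    return None


def noisy_guesses():
    # Hypothetical model outputs; the first two are wrong on purpose.
    yield from (381, 401, 391)


def check(x: int) -> bool:
    # Necessary and sufficient condition for x == 17 * 23, verified by division.
    return x % 17 == 0 and x // 17 == 23


answer = first_verified(noisy_guesses(), check)
print(answer)
```

The point of the design is that checking a candidate is often much easier than producing it, so even an unreliable generator becomes useful once its output is filtered.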


Pardon my ignorance - is there a private-message function in Substack? Had I found one, I'd have used it to ask for your permission to translate this article to Portuguese; may I? (Thanks for writing it.)


Hi, great article! Any thoughts on this approach? https://arstechnica.com/information-technology/2023/04/surprising-things-happen-when-you-put-25-ai-agents-together-in-an-rpg-town/ The team seems to be doing some smart things regarding long term memory and planning for AI agents.


What's most interesting to me is how this somewhat turns how we've historically used computers on its head. We've moved from "perform a single task and always be really quite precise/predictable" to "perform a plethora of tasks and it's okay to be imprecise".

As an example, no bank is clamoring to use an LLM to replace their financial software's mortgage interest algorithm. We've seen a slow march towards this so far with our voice assistants, etc., but it does seem like we've taken a step into what's next. As such, and even without any further innovation, I'm very interested in the use cases that will emerge as people begin to leverage this. It will be fun to see the successes and failures of the next few years as it gets integrated into more.

On a different note, I've thought similarly to you for a while, which has always made me wonder if LLMs can produce anything truly novel. My rough thinking is very similar to yours: "if it can only really parrot what it has previously seen, what are the limits of its creativity?" It seems like it would be good at quite a few creative tasks, since a lot of creativity is taking existing items and placing them together in new ways, but I believe it would always fall short on anything that is _truly_ unseen. It would never (I would assume) predict the next words to be something not in its dataset.


GPT-x == more and more polished mirrors of a reality that is being you == zero intelligence whatsoever.

GPT for AI is like Google for search engines in their early days:

It performs well only because of all the human intelligence, tricks, common sense and procedures it has read in quality content (Google relied on the quality of web pages as assessed by human users, not on any algorithmic quality assessment).

This is pure crap and certainly not true AI.


Really great article, Steve. I reposted it on my Facebook, of course giving you credit.

I'm still having difficulty with understanding how it is able to write a Shakespearean sonnet, or persuasive essay, as well as it does, on a random topic. If all it was doing was predicting word by word, it seems like it would never really know how it would end a sentence when it started one; it is like every sentence would be a "choose your own adventure", where it selected the next word from a list each time. And while I can accept that it is technically doing just that, in some sense isn't it picking word 3 of the sentence, say, because it is anticipating what other sentences like it will look like when they finish? In that sense, isn't it anticipating future words when predicting the current word?

I know that humans in many ways when speaking or writing just predict the next word, but we also have a sense of what we want the overall sentence to mean, and our brain takes care of what the individual words in the sentence will be. If we are asserting that LLMs don't have any sense of meaning-- which I accept-- then when they are predicting the next word, don't they have to at a minimum be anticipating what the rest of the sentence is likely to look like?

Otherwise, why wouldn't they, one time in 50, say, write themselves into a "corner", where the sentence they have written word by word has no meaningful end?
