Exploring the gaps between current LLMs and true general intelligence
Great article, Steve. Forty years as a software engineer is super impressive and beats me by around five years. I have lived the experiences in your essay many times over (and continue to do so).
I use GPT-4 daily as a coding, writing, and research assistant (and yes, it's impressive), but it has a long way to go before I'd promote it to Senior Software Engineer.
I think your article nicely demonstrates the hoops software engineers jump through on a daily basis, and that the devil is always in the details!
Also, quite an interesting article. I really liked all the specific examples from Writely: not too technical, but specific enough that I could follow it all.
I'm eager to hear you talk more, in a future post, about the following issue: three years ago, no one thought that LLMs would be able to do "X", because it would require a level of understanding/reasoning/real-world experience/sophistication/creativity/etc. that was supposedly impossible for LLMs. Then three years later it turns out that they can do "X", and everyone explains, with 20/20 hindsight, why in fact doing "X" didn't require anything special after all. So could many be making the same mistake now? For example, looking at how LLMs currently can't really reason about novel problems and assuming that's an intrinsic limitation, when in three years' time it might be fixed just by scale?
I'm looking forward to your future posts.
> One moment, it’s giving what appear to be thoughtful answers; then you point out a mistake, it responds by making the exact same mistake again...Last time, we saw GPT-4 blatantly hallucinating prime numbers. Today, in the “diff” coding task, it repeatedly failed to recognize that it was violating the constraint regarding examining a document one character at a time. I don’t know how well I can articulate the thing that is missing here, but it’s some very deep piece of intellectual mojo.
Given that GPT-4 is so good at fixing its own mistakes when they are pointed out, and that it is so good at memorizing things like lists of prime numbers, one should indeed be suspicious, and I would point out that both of these sound exactly like BPEs (https://gwern.net/gpt-3#bpes) and sparsity errors (https://old.reddit.com/r/slatestarcodex/comments/1201v68/10word_quote_a_short_and_simple_failure_mode_of/jdjsx43/), neither of which are deep or at all interesting. (You asked GPT-4 to *concatenate* numbers? Or to edit words *letter by letter*? Seriously? At least include a disclaimer about BPEs!)
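To make the BPE point concrete, here is a toy sketch of why a tokenized model struggles with character-level tasks. The merge rules below are invented purely for illustration (real tokenizers like GPT-4's learn tens of thousands of merges from data), but the mechanism is the same: the model never sees individual characters, only merged tokens, so "examine one character at a time" asks it to reason about units it cannot directly observe.

```python
def toy_bpe(text, merges):
    """Greedily apply BPE-style merge rules to a list of single characters.

    merges: list of ((left, right), merged_token) pairs, applied repeatedly
    until no rule matches anywhere in the token sequence.
    """
    tokens = list(text)
    changed = True
    while changed:
        changed = False
        for pair, merged in merges:
            i = 0
            while i < len(tokens) - 1:
                if (tokens[i], tokens[i + 1]) == pair:
                    tokens[i:i + 2] = [merged]  # fuse the pair into one token
                    changed = True
                else:
                    i += 1
    return tokens

# Hypothetical merge table, invented for this example only.
merges = [
    (("1", "2"), "12"),
    (("12", "3"), "123"),
    (("4", "5"), "45"),
]

# Concatenated digits arrive at the model as digit *groups*, not digits:
print(toy_bpe("12345", merges))  # -> ['123', '45']
```

So a prompt like "concatenate 123 and 45, then check it digit by digit" presents the model with two opaque tokens rather than five digits, which is exactly the failure mode the linked BPE writeup describes.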