As usual, a great post! I'm curious what you think about https://sakana.ai/ai-scientist/
Definitely a relevant question. I mentioned Sakana in a post last month (https://amistrongeryet.substack.com/p/ai-creativity). The upshot is that it appears to be a red herring: the papers it generated were essentially useless, and there's no particular reason to think it represents interesting progress.
As you might have seen, in the last few days there's been a lot of buzz about a different paper, describing significant productivity gains from the use of AI tools for discovering new materials: https://aidantr.github.io/files/AI_innovation.pdf. The buzz has been mostly very positive (e.g. https://twitter.com/ArnaudDyevre/status/1856074595025203485), but I haven't yet looked at it myself.
Your overall point is a fair one, but the specific examples don't serve it as well as they'd need to for it to be fully compelling. Most of the things you cite as unattainable by current AI - managing a school classroom, troubleshooting an underperforming team, reminiscing with an old friend - are not really limitations of AI or current models so much as other challenges we could likely solve functionally with other means: supporting technology, rule construction, access to existing data, grants of authority, and so on.
In a school context, you put a robot or terminal in the room and introduce it, likely a one-time necessary transition (you would probably want to introduce a new human teacher too): "Class, you need to obey the robot on this screen. We've installed cameras and microphones in hidden locations throughout the room; it knows what you're doing at all times and will instruct and discipline you." You might argue that the robot can't *do* anything to discipline, but a real teacher's options there are highly limited without corporal punishment too. In fact, an AI that the students know is recording the entire classroom might well be *better* obeyed than an average teacher. Regardless, little to none of the challenge in this scenario has to do with AI "intelligence" or advancement. We have highly sophisticated image recognition and analysis (see self-driving cars like Waymo), we can train on a library of videos of kids behaving and misbehaving, current tech can recognize kids by name (see the sketch below), it can communicate instructions, and so on. It might not go great, but it wouldn't fail on a fundamental level, and direct tuning of the AI for that scenario based on real-world performance would likely yield very good results within a generation or two.
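For what it's worth, the recognize-them-by-name piece really is off-the-shelf today. A minimal sketch, assuming the open-source face_recognition library and labeled reference photos (the roster, names, and file paths are all made up for illustration):

```python
# Minimal sketch: recognize students by name in a classroom camera frame.
# Assumes the open-source face_recognition library and pre-labeled reference
# photos; the roster and file paths are hypothetical.
import face_recognition

ROSTER = {"Alice": "refs/alice.jpg", "Bob": "refs/bob.jpg"}

# Build one reference encoding per student from a labeled photo.
known_names = []
known_encodings = []
for name, path in ROSTER.items():
    image = face_recognition.load_image_file(path)
    encodings = face_recognition.face_encodings(image)
    if encodings:
        known_names.append(name)
        known_encodings.append(encodings[0])

# Identify every face visible in a single camera frame.
frame = face_recognition.load_image_file("camera/frame_001.jpg")
for encoding in face_recognition.face_encodings(frame):
    matches = face_recognition.compare_faces(known_encodings, encoding)
    for name, matched in zip(known_names, matches):
        if matched:
            print(f"{name} is in view")
```

A real deployment would run this per video frame and track identities over time, but nothing in it requires a smarter model.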
Troubleshooting an underperforming team? Give the AI access to the email server and Slack history, identify the accounts/individuals on a given team (or let the AI analyze the provided info and figure that out itself; it probably could!), then prompt it to "determine the issues of communication, performance, etc. that are impeding progress toward goals and the overall success of the team", and it'd probably do just fine. I would in fact argue that due to (1) deep access, (2) the ability to operate at scale well beyond humans, and (3) a lack of emotional bias (it has other biases, but not *emotional* ones), it might actually do *better* than an average human manager. The intuition of a *skilled* manager might do better, but it's at least worth the challenge. 😁
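To make the plumbing concrete, here's a minimal sketch assuming the official slack_sdk client and OpenAI's chat API; the channel ID, model name, and prompt wording are placeholders, not a production design:

```python
# Minimal sketch: feed a team's Slack history to an LLM and ask for a
# diagnosis. Channel ID, environment variables, and the prompt are
# illustrative placeholders.
import os
from slack_sdk import WebClient
from openai import OpenAI

slack = WebClient(token=os.environ["SLACK_BOT_TOKEN"])
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

# Pull recent messages from the team's channel (would paginate in real use).
history = slack.conversations_history(channel="C0123456789", limit=200)
transcript = "\n".join(
    f"{m.get('user', 'unknown')}: {m.get('text', '')}"
    for m in history["messages"]
)

response = llm.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You are an experienced engineering manager."},
        {"role": "user",
         "content": "Based on this Slack history, identify issues of "
                    "communication and performance that are impeding the "
                    "team's progress:\n\n" + transcript},
    ],
)
print(response.choices[0].message.content)
```

Real use would need pagination, thread expansion, and chunking to fit context limits, but none of that is an intelligence problem.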
Reminiscing with an old friend? That's obviously not a technical limitation but one of history and information. But hey, let's see what we can do. We have to assume the AI needs to "know" the relationship in order to do this, so access to historical information is essentially a given (it's a necessary condition of the prompt, so we can fairly set aside the associated cultural/social discomforts). With that in mind, give the AI your complete SMS/iMessage/email/chat history with this person, along with any photos (AI can extract location and date, e.g. for travel together, pull context from visual analysis, use facial recognition to determine who else commonly appears with them, etc.). I bet it could do a decent job, especially with the latest voice synthesis tech.
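The photo-metadata piece, at least, is trivially mechanical. A sketch using Pillow's EXIF support to pull the date and GPS coordinates from a shared photo (the file path is hypothetical):

```python
# Minimal sketch: extract when and roughly where a shared photo was taken,
# using Pillow's EXIF support. The file path is hypothetical.
from PIL import Image
from PIL.ExifTags import TAGS, GPSTAGS

image = Image.open("photos/trip_2019.jpg")
exif = image.getexif()

# Map numeric EXIF tag IDs to readable names.
readable = {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}
print("Taken:", readable.get("DateTime"))

# GPS data lives in a nested IFD keyed by tag 34853 (GPSInfo).
gps_ifd = exif.get_ifd(34853)
gps = {GPSTAGS.get(tag_id, tag_id): value for tag_id, value in gps_ifd.items()}
print("Latitude:", gps.get("GPSLatitude"), gps.get("GPSLatitudeRef"))
print("Longitude:", gps.get("GPSLongitude"), gps.get("GPSLongitudeRef"))
```

Cross-reference that against the chat history and you have "that trip to Big Sur in 2019" for free.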
I'm not saying these things are trivial or would be done any time soon, or ever (consider being comfortable giving an AI full-time audio access to your conversations with friends, plus your past email and texts with them). But again, those are not really technical limitations so much as cultural/social ones. All of these things, and your other examples as well, are amenable either to non-AI-advancement solutions, or seem likely to be solved given simply the will and desire to do so and the cultural willingness to do what's necessary to accomplish it (e.g. giving an AI control of a classroom of kids). Once you've decided you're even *willing* to solve a problem with AI, you can then optimize for that, and current tech seems largely able to handle most of this to me. So I think your point is probably correct, but the case for it feels like it needs more development, particularly (for me) around compelling scenarios where AI intelligence is the limiting factor.
Yes, you are of course correct that challenges with things like access to data make it difficult or impossible to even attempt any of these things with AI.
That said, if those issues were removed, I am extremely skeptical that current AIs would "do just fine". I think they would get confused, go down ratholes, misunderstand context, and generally fail in either boring or spectacular ways. You say "I bet it could do a decent job"; I say I doubt it. Of course we won't really know until someone runs the experiment. But these tasks are very, very different from next-token completion on written text.
A fair point. I'll just note that next-token prediction has proven shockingly capable on a variety of tasks that seem at least similar to some you outlined, like "talking with an old friend". That in fact seems *ideally* suited to next-token prediction, provided you have enough good training data. The same could arguably be true of "troubleshoot an underperforming team", if you have good training material for both poorly performing and well-performing teams, although it's certainly a lot more complicated (involving human psychology, interpersonal dynamics, etc.). I'm also a proponent of the idea that humans may be a lot more "next-token" than we'd like to admit, so maybe I'm overreaching. 😄
One part of my intuition is that we probably *don't* have the right training data for a lot of these tasks.
Another is the long-term memory component: many real-world tasks involve an enormous amount of context, such as your past history with a friend / collection of students / co-workers, or on-the-job experience unique to a particular company+team+role. I don't think token buffers will be a good way of managing all that, and RAG is clumsy at best.
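To be concrete about what I'm calling clumsy: the standard RAG pattern embeds each stored "memory", retrieves the nearest ones at query time, and pastes them into the prompt. A minimal sketch (the embedding model and memory snippets are placeholders), where everything hinges on the similarity search surfacing the *right* memories:

```python
# Minimal sketch of the RAG pattern: embed stored "memories", retrieve the
# nearest one for a query, and paste it into the prompt. Model name and
# snippets are placeholders.
import numpy as np
from openai import OpenAI

client = OpenAI()

memories = [
    "2019: road trip to Big Sur, car broke down near Monterey",
    "2021: long argument about remote work over dinner",
    "2015: worked together on the billing-system rewrite",
]

def embed(texts):
    result = client.embeddings.create(
        model="text-embedding-3-small", input=texts
    )
    return np.array([item.embedding for item in result.data])

memory_vectors = embed(memories)

query = "Remember that trip where the car died?"
query_vector = embed([query])[0]

# Cosine similarity, then take the single best match for the prompt.
scores = memory_vectors @ query_vector / (
    np.linalg.norm(memory_vectors, axis=1) * np.linalg.norm(query_vector)
)
best = memories[int(np.argmax(scores))]
print("Retrieved memory:", best)
```

When the search surfaces the wrong snippet, the prompt quietly anchors the model on the wrong context; that brittleness is exactly the clumsiness I have in mind.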
Next-token prediction has indeed been shockingly good at many things, but that doesn't mean it will be good at everything; see my analogy to Olympic sports.