The Future Is Already Here, It’s Just Not Evenly Distributed
You Can Awe Some of the People Some of the Time
The title of this post is a quote from science fiction writer William Gibson, who appears to have first said something along these lines around 1990.
OpenAI just launched “o1-pro”, the latest upgrade to ChatGPT. How good is it? Apparently, good enough to almost make a cancer researcher cry. Derya Unutmaz, a professor at Jackson Laboratory, asked it for ideas to advance his work in cancer immunotherapy. His annotated writeup of o1-pro’s output contains phrases like “insanely good”, “left me in awe”, “a remarkably advanced and nuanced understanding”, “left me speechless”, “wow, this is an idea I will implement for sure!”, and the capper:
To me this is absolutely shocking level of insight, made me emotional reading it 🥺
William H. Guss, a research scientist at OpenAI, writes that “o1 pro just solved an incredibly complicated/painful rewrite of a file that no other model has ever gotten close to”, summing up with:
Dean Ball concurs, and also weighs in on another recent release, Google’s “Deep Research” tool:
And yet, so far as I know, AI has had no significant impact on the work of any of my close friends[1] or family. As for me, even though I’m in the field, the only time I’ve used ChatGPT in the last week was to clear up my confusion over the meaning of the word “scrapple”. Globally, about 300 million people now use ChatGPT at least once per week, which sounds impressive until you realize it means that in a typical week, 96% of the world is not touching it even once[2]. (And many of the remaining 4% are, like me, just asking about scrapple or something.)
If AI is becoming so incredibly powerful, why isn’t it having more impact?
AI Capabilities Are “Jagged”
The current wave of tools is astonishingly good at some things, and so-so or worse at others. If you have a math or science question whose solution requires an extended chain of straightforward reasoning, o1-pro may be a miracle tool. If you have a research task that doesn’t require insight, but does call for plowing through hundreds of sources, Deep Research may be transformative for you. And of course it’s now common knowledge that regular old ChatGPT is lightning-quick at answering an enormous range of factual questions (if you don’t mind the occasional wrong answer).
But as I recently discussed, there are plenty of tasks which AI is still unsuited for. These latest announcements don’t do much to change that. Zvi Mowshowitz reports that “Mostly my comments section was unimpressed with o1 and o1 pro in practice.” Timothy Lee, in agreement with OpenAI’s own assessment, points out that o1-pro’s capabilities are limited to specific domains:
o1 dramatically improves performance in certain domains, like math and coding, where answers can be checked automatically. But in a lot of other domains, o1 represents an incremental improvement at best.
If your work doesn’t call for the kinds of things AI is good at, it won’t have much to offer you. Even if you have an appropriate task, the details of how that task is presented may make a big difference. Timothy Lee again:
I also found that o1’s performance is sensitive to the way a problem is represented. For example, sometimes o1 is able to solve a problem when it’s described using words, but fails to solve the same problem if it’s presented as a diagram or photograph. This kind of brittleness could be a significant obstacle as people try to use these models to solve complex, real-world tasks.
The highly uneven nature of AI capabilities is an obvious explanation for AI’s scattershot impact. But there are other factors at play as well.
Threshold Effects
Dean Ball recently wrote about threshold effects: new technologies don’t take over the world when they first appear; adoption only takes off once some hard-to-anticipate threshold of usefulness is crossed. Cell phones were a clunky and expensive niche product, and then they were everywhere. Self-driving cars were a research curiosity for decades, and now suddenly Google’s Waymo service is doubling every three months.
AI will start to be widely adopted for a given task only when it crosses the threshold of usefulness for that task. This can happen fairly suddenly; the final step from “not good enough” to “good enough” isn’t necessarily very large. And it will happen at different times for different tasks in different contexts.
Often, getting value from AI requires some creativity on the part of the user. This leads to another source of uneven deployment.
We’re Holding It Wrong
I mentioned that Derya Unutmaz reported astonishing results when using o1-pro to develop new ideas for cancer research. Here’s the prompt he used (emphasis added):
I’d like you to focus on 3D bioprinted solid tumors as a model to address the T cell exhaustion problem. Specifically, the model should incorporate stroma, as seen in breast cancer, to replicate the tumor microenvironment and explore potential solutions. These solutions could involve technologies like T cell reprogramming, synthetic biology circuits, cytokines, transcription factors related to exhaustion, or metabolic programming. Draw inspiration from other fields, such as Battle Royale games or the immune system’s ability to clear infected cells without triggering autoimmunity. Identify potential pitfalls in developing these therapies and propose alternative approaches. Think outside the box and outline iterative goals that could evolve into full-scale projects. Focus exclusively on in vitro human systems and models.
A lot of work clearly went into this prompt. It’s worth reading his entire post, in which he explains the prompt in detail. For instance:
You might wonder why I referenced Battle Royale games. That’s precisely the point—I wanted to push the model to think beyond conventional approaches and draw from completely different systems for inspiration.
He clearly has invested considerable effort in learning to elicit creative work from AIs – he mentions “building on work I’ve previously done and tested with o1-Preview and GPT-4o”. Asking a model trained for scientific reasoning to develop cancer immunotherapy research ideas by drawing on a specific style of video game is inspired. It’s easy to imagine another researcher just asking “give me 10 ideas for novel cancer therapy research”, getting a bland answer, and dismissing the whole thing as useless. For the moment at least, getting amazing results from AI often requires a certain special form of creativity, as well as the willingness to spend time developing a feel for the tool and playing with different prompts.
Nabeel Qureshi, observing the reaction to o1-pro, also concludes that the impact you feel from AI is heavily dependent on your skill at using it:
o1-pro is the first time its felt plausible that *finding the right prompt* is our main bottleneck to genuinely novel scientific discoveries
This shows up across a wide variety of uses. Jeffrey Ladish notes that Palisade Research recently demonstrated a major step forward in the ability of AI to carry out cyberattacks simply by writing better prompts[3]:
The combination of "using simple prompting techniques" and "surpasses prior work by a large margin" is the most interesting part of this imo. Basically there is tons of low hanging fruit in capabilities elicitation.
So: AI has uneven capabilities, which only sometimes rise above the threshold of usefulness, and often require the user to have a knack for the tool. There is at least one further reason that serious adoption of AI is still limited.
Most People Are Barely Trying
It may still be the case that most Internet users have never even tried using a basic AI chatbot like ChatGPT, let alone more advanced tools like Deep Research. Certainly most people have not made a serious effort to learn how to use these tools to their best advantage. I include myself in the latter category; it’s hard to learn new habits, and I find the scattershot nature of current AI capabilities and prompting techniques to be frustrating. I gravitate toward highly predictable tools that reward careful reasoning and planning; this describes conventional computer programming, but does not in the least describe modern AI. If you’re uncomfortable with change, anxious about technology, like things that are predictable, or simply don’t have much need for the things that current AIs can do, then you may not find yourself adopting these new tools.
One result is that there are, to borrow a metaphor from economics, hundred-dollar bills strewn all over the pavement waiting for someone to bend down and pick them up. One plausible example popped up in my Twitter feed this morning. You might have seen the recent news reports that black plastic kitchen utensils are contaminated with dangerous levels of flame-retardant chemicals. These reports turn out to originate in an academic paper that contains a simple math error. At a key step in the analysis, the authors multiplied 60 by 7,000 and got 42,000 – but the correct figure is 420,000. Why do I mention this? Because Ethan Mollick asked OpenAI’s o1 model to “carefully check the math in this paper”, and it caught the mistake. I imagine this took him about a minute, including the 23 seconds the AI spent chewing on the question. He goes on to ask, “should AI checks be standard in science?” Probably they should! But I doubt most researchers are doing this yet.
NOTE: I’m inclined to try an experiment here – pick 1000 published papers at random, ask o1 or o1-pro to look for errors, and see what it turns up. If you’re interested in helping out in some fashion, drop me a line at amistrongeryet@substack.com. I imagine the most difficult part of the project might be finding people from various fields to double-check any errors that the model claims to find.
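To make the shape of that experiment concrete, here is a minimal sketch of the checking loop, assuming the papers have already been extracted to plain-text files and using the OpenAI Python SDK. The “o1” model name, the papers/ directory, and the exact prompt wording are illustrative assumptions on my part, not a finished protocol.

```python
# Minimal sketch of the proposed experiment: ask a reasoning model to look
# for errors in a batch of papers. Assumes the papers are already extracted
# to plain-text files in ./papers/, that the openai package (>=1.0) is
# installed, and that OPENAI_API_KEY is set. Model name, directory, and
# prompt wording are illustrative assumptions, not tested choices.
from pathlib import Path

from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Carefully check the math in this paper. List any numerical or logical "
    "errors you find, quoting the relevant passage for each one."
)


def check_paper(text: str, model: str = "o1") -> str:
    """Send one paper's text to the model and return its error report."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"{PROMPT}\n\n{text}"}],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    for paper in sorted(Path("papers").glob("*.txt")):  # hypothetical directory
        report = check_paper(paper.read_text())
        print(f"=== {paper.name} ===\n{report}\n")
```

Even with something like this in hand, the expensive step is the one noted above: having people who actually know each field double-check whatever errors the model claims to find.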
This Is Not Going to Settle Down
AI is transforming some people’s lives today, even as most people continue to make little or no use of it. That disparity is a function of jagged AI capabilities, scattershot interest in trying new tools, and varying ability to use the tools well.
(Incidentally, I suspect this helps explain why the people who work at companies like OpenAI seem to be especially optimistic about the pace of progress. They spend their days doing things that current AIs are good at, such as coding. They’re obviously going to be exposed to the latest tools. And they’re motivated to learn how to get the most value from those tools. Not to mention that if you’re optimistic about AI progress, you’re probably more likely to find yourself working at an AI lab.)
None of this is likely to change anytime soon. New tools like o1-pro and Deep Research will continue to appear faster than most people can keep up, so some folks will always be ahead of the curve while others fall behind. As the AIs themselves become more sophisticated, expertise in things like prompting techniques may cease to be a factor, but a knack for finding clever applications will continue to be important. Regulatory restrictions, corporate inertia, and other factors will mean that AI will show up sooner in some places than others. And just as Homo sapiens is better adapted to some tasks than others, AIs will always have uneven capabilities.
If AI progress eventually hits a wall, things might settle down. But it’s not clear this will ever happen, and if it does, it’ll be a long way off[4]. As people are constantly pointing out, even if development were halted today, we’d spend a decade just learning how to extract value from the models that have already been developed.
Two New Rules To Live By
All of this leads me to two principles for navigating the AI era:
Don’t read too much into any anecdote, no matter how startling. If someone announces that AI has solved climate change, revealed the meaning of life, or definitively settled the question of whether a hot dog is a sandwich[5], don’t assume it will stop falling short at other tasks. Conversely, just because someone reports failing to get AI to solve a problem, don’t assume it can’t solve that problem.
A primary source of opportunity will lie in spotting AI strengths, and ways of applying those strengths to important problems. If you can get AI to generate insightful ideas for investigation, or use it to plow through three days of research legwork in a few minutes, or figure out which code rewrites it can be trusted to handle on its own, you’ll have a big leg up on folks who are still chuckling about ChatGPT’s inability to count the number of “r”s in “strawberry”.
I’ll end with this tweet from Ethan Mollick, illustrating that even the AI companies are struggling to wrap their heads around the scattershot impact of AI:
So many weird contradictions in the pitches of AI companies right now:
“Corporations use our AI [to] summarize conference calls leading to 6% savings. Also, within two years we think AI will replace all organizations”
“Here is a tool to accelerate science. It also talks like Santa”
1. Except for a few who actually work in AI.
2. Of course ChatGPT is not the only offering, but it appears to be the most widely used AI productivity tool by a considerable margin.
3. Thanks to Zvi Mowshowitz’s weekly report for bringing this to my attention.
4. You may have seen the reports that simple scaling of training data and model size might be reaching diminishing returns. But as many have pointed out, this does not imply an end to AI progress; developers are finding other paths forward, as evidenced by o1.
5. Credit to Claude: I asked it to brainstorm for me, and it suggested the hot dog thing. Other selections from the list it generated:
Definitively answer whether Ross and Rachel were really "on a break".
Explain why one sock always disappears in the laundry.
Determine once and for all whether pineapple belongs on pizza.
Of course the fact that Claude considers that last one to be an open question shows the ongoing weakness of AIs.