I don’t think ML can’t be creative or make discoveries. I think creativity and discovery ultimately come down to thinking about the right seemingly-disparate concepts at the same time (whereas algorithmic thinking combines more obviously related concepts). If not an LLM, some other model can generate random ideas, rank them, then output the best.
But I think humans are better at it, while ML is better at algorithmic thinking. “Better” being more efficient and something we more enjoy doing; we can also more accurately judge what subjectively appeals to humans (i.e. taste).
I think ML should be optimized for tasks that require more generalization than programming, but are still mostly logic. Like software development, translation, and tools for art and discovery.
Hmm, maybe. A plain model sampling outputs obviously isn't doing discovery in the AlphaGo sense. But once you put the model in a loop with tests, feedback, tools, or even a human picking the good result, it starts to get much closer to the process he's describing.
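To make the "model in a loop" point concrete, here is a minimal sketch of a variation, evaluation, and selection loop in Python. Everything in it is an illustrative assumption: `propose` stands in for sampling from a model, `evaluate` stands in for an external hard check such as a test suite, and the numeric target is made up. It is not how AlphaGo or any LLM harness is actually implemented.

```python
import random

# Hypothetical target that the external evaluator knows about; the proposer does not.
TARGET = [3.0, -1.0, 2.0]

def propose(parent, rng):
    # Variation: randomly perturb the current best candidate
    # (a stand-in for sampling several outputs from a model).
    return [x + rng.gauss(0, 0.5) for x in parent]

def evaluate(candidate):
    # Evaluation: an external hard score, higher is better
    # (a stand-in for tests, a verifier, or a human picking results).
    return -sum((c - t) ** 2 for c, t in zip(candidate, TARGET))

def search(rounds=200, samples_per_round=8, seed=0):
    rng = random.Random(seed)
    best = [0.0, 0.0, 0.0]
    best_score = evaluate(best)
    for _ in range(rounds):
        candidates = [propose(best, rng) for _ in range(samples_per_round)]
        top = max(candidates, key=evaluate)
        top_score = evaluate(top)
        # Selective retention: keep a candidate only if it beats the current best.
        if top_score > best_score:
            best, best_score = top, top_score
    return best, best_score

if __name__ == "__main__":
    best, score = search()
    print(best, score)
```

The proposer here is pure noise; all the direction comes from the evaluator and the keep-only-improvements rule, which is roughly the distinction being argued about.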
There seems to be a problem with how he poses the problems AlphaGo and these GAI models face.
AlphaGo is given a hard evaluation externally. It did not come up with it itself.
When GAI models are given an external hard evaluation, they can also succeed in many different domains (that is one of the remarkable features, succeeding in many domains) ranging from simple programming tasks to frontier mathematics (disproving conjectures recently) to writing more optimized kernel code than before.
And there is plenty of RL especially in these fields where the solution may be extremely complex but eval is rather less complex. And even the discovery and the "evolution-like" trace-selection is also happening.
For this reason it seems strange to compare it to AlphaGo, as AlphaGo is given a hard eval independent of itself, from an external source (humans), in a narrow domain. If GAI is given such an eval, it can also show some remarkable results.
But what I find more strange is that innovation and moving forward in many many many cases does not require truly novel ideas but instead a high-quality execution of layering different methods, tactics, ideas on top of each other. Because in many domains our collective knowledge is incredibly sparse and complex, something being able to recombine tools, models, ideas in a high quality way (as he mentions being selective) I think is extraordinarily powerful.
And in such cases, with a finite exploration horizon (time, resources available), 1% "good choices" vs 3% "good choices" are worlds apart, incomparable.
Most importantly: none of the above is about intelligence, it's barren solution-farming for important, valuable problems we have. Most of the AGI and intelligence-related debate seems to miss this simple fact. (Insert the usual stuff: a plane being unable to fly like a bird, or a submarine not swimming, is totally irrelevant to it being useful.)
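A rough way to see why 1% vs 3% matters under a fixed budget, assuming each attempt is an independent try with a fixed chance of being a "good choice" (that independence is a simplifying assumption, not something stated above; the function names are made up for illustration):

```python
def p_at_least_one_good(p_good: float, attempts: int) -> float:
    # Chance that at least one of `attempts` independent tries is good.
    return 1.0 - (1.0 - p_good) ** attempts

def expected_attempts_to_first_good(p_good: float) -> float:
    # Mean of a geometric distribution: average tries until the first good one.
    return 1.0 / p_good

if __name__ == "__main__":
    budget = 50  # hypothetical exploration budget
    for p in (0.01, 0.03):
        print(p, p_at_least_one_good(p, budget), expected_attempts_to_first_good(p))
```

With a budget of 50 attempts this works out to roughly a 39% chance of hitting at least one good choice at 1%, versus roughly 78% at 3%, and about 100 expected attempts versus about 33.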
And then a final point: do we really think this thing is incapable of doing better on average on problems we average people face in our lifetime?
What should we think, how should we define human intelligence when we give out degrees in science or medicine for 60-70% exam results on problems considered to be generic in the field?
Unless I'm missing something, this argument seems to apply only to the original pretraining era (e.g. GPT 1-4). The post-training and reinforcement learning paradigms are clearly doing variation, evaluation, and selective retention, no?
The transcript does seem to overlook post-training steps like Reinforcement Learning with Verifiable Rewards (RLVR) (but I certainly won't claim that Rich Sutton is unaware of such things; RLVR has a very narrow set of evaluation approaches).
I wonder if this is a precursor to Keen Tech leaning into David Silver's Ineffable Intelligence approach.
I don't quite follow his point. Is it: a) that we need a new foundational algorithm that integrates a goal (one with "taste") directly into the training step, or b) that we need to point trained models towards goals as they iterate?
If it's a), he doesn't propose such an algorithm, and I don't know how you'd do it at such a low level because how do you quantify abstract goals? Did he suggest such an algorithm and I misread? If it's b), that already exists, see AlphaEvolve or any number of things he said. Or, to be a bit of a smart-ass, just type /goal and let it rip ...
I also think he's just categorically wrong that LLMs cannot do good and novel things. And if they can, then you could just say "well that's not novel, that's derivative". A simple example: if I make up a programming language with an LLM and it works well for my purposes, then is that not novel and good? I mean, is any language other than FORTRAN not novel?
Everything is derivative and you can put an LLM in a loop to evaluate LLMs trying things. I must be misunderstanding because he's too smart to be this wrong.
LLMs possess the map but are unable to discern fertile from barren ground. For instance: how does Anthropic's new model generate promising 'medications'? Because, beyond the knowledge embedded within the model, it has assimilated AlphaFold's reasoning paradigm. By itself, Claude would be incapable of engineering a protein analysis method.
"So that is my call to arms. If we want the full power of AI scientists, then we should share the goals with them so they can create, evaluate, discover, and in these ways fully participate in achieving the goals. Let’s be bold! Let’s fully automate Creativity and Discovery!"
Should we automate exercise and play as well? How about learning?
The machine didn't have a soul, so we donated ours.
I think it's worth emphasizing that his argument isn't completely against generative AI, but rather its environment. Although I don't see why it would be impossible for something like an LLM to learn some sort of self-play within its context window.
I think the variation, evaluation, and selection idea is a good, if not the only, way to do creative work.
I don't think I would attribute anything in that process that I would consider an AI to be incapable of.
The characterisation of variation like this would seem to rest on the same 'random but directed' crutch that some free will arguments rest upon.
There is no random but directed of course, there is random and there is caused, and there are things that use both as components, but the random remains wholly random, and the caused remains entirely deterministic.
I think there is a good case to say that, in many fields, AI is better than humans at evaluation.
To find avenues to consider, I'm not entirely convinced that human innovation is more than a heuristic that appears more chaotic by virtue of an inconsistent and opaque formulation.
Many aspects of ideas come from noting how some two things are different and then considering that axis of difference when applied to another thing.
The possibilities thrown up by this extremely simple method are vast enough to require multiple layers of evaluation. Most could be dismissed out of hand by a quick 'This is nonsense' check that I suspect people do so often, and at such a rate, that it wouldn't even rise to the level of consciousness.
Yes, the guy with a PhD in Machine Intelligence, co-author of Reinforcement Learning: An Introduction, which is universally considered the bible of the field, recipient of the AAAI fellowship award and the Turing Award, and the inventor of Temporal Difference Learning doesn't know what he's talking about.
Sure, but does that mean he's right all the time about all things, including everything in his own field?
He is saying no generative AI is going to produce output that is both good and novel because it is always derivative. And then adds a generative AI (Claude Code) into his list of AI that have produced output that he feels is good and novel, invalidating what he is arguing.
"...no matter how many instances of white swans we may have observed, this does not justify the conclusion that all swans are white."
“When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.”
I don't completely disagree, but it's worth noting how new a lot of the empirical evidence in favour of LLMs is, so it's not impossible to be a tad ignorant of the present.
"If an elderly but distinguished scientist says that something is possible, he is almost certainly right; but if he says that it is impossible, he is very probably wrong." Arthur C. Clarke
AlphaGo uses discovery when it evaluates potential moves and iterates.
Claude Code uses discovery when it generates a script and then evaluates whether it works or not.
He’s saying we need to allow AI systems to do the evaluation and iteration themselves for science and engineering the same way we do for code.
Basically, harness engineering for engineering.
https://youtu.be/ThFq87Rp21s?si=SrKj72_X8bjnB6ED
Around 35min mark
Eureka! My AI found it!
Best thing about nerds is watching them try and build frameworks and formulas for the creative act. Like a metronome trying to compose a symphony.
That contradiction kind of says he doesn't know what he's talking about.
https://en.wikipedia.org/wiki/Clarke%27s_three_laws