19 May 2025

Crossing the Threshold: Why AI that Explores is More Than Just Chat

Generative AI was the preview. AI that explores may be the main event. Here’s why the next breakthroughs - and risks - won’t come from chatbots, but from AI that can think beyond what we know.

Words by
David Sully

At Advai, we’ve always positioned ourselves as a rational counterweight in an AI landscape too often dominated by hype. That likely frustrates some. But in cutting through the noise, we can more clearly spot where the true breakthroughs lie - and where the risks are hiding in plain sight.

Even with that measured stance, in recent months I’ve been observing a shift that’s both deeply exciting and quietly unnerving. It’s not about clever demos or human-feeling chatbots. It’s about AI learning to explore. And that, I believe, changes everything.

The Limits of Generative AI

Since ChatGPT's release in November 2022, we’ve seen a remarkable pace of development from the likes of OpenAI, Google DeepMind, Anthropic, Mistral AI, DeepSeek, and national-scale efforts from the UAE, Singapore and others. The progress has been staggering.

These models are brilliant in the hands of individuals — for summarising documents, drafting text, or performing open-source research. They help people upskill and improve productivity.

But they hit a wall when scaled into complex organisational settings. Despite the hype, real-world adoption remains patchy. They continue to lack robustness, and for those of us who see the enormous potential, that’s frustrating. We’re not there yet.

One key limitation? Generative AI only works with what it’s seen before. It’s a ‘look-back machine’ — deeply impressive, but fundamentally retrospective. Even ChatGPT’s “deep research” feature is just a more thorough trawl through historical data. It doesn’t invent truly new solutions.

This also means these systems fail more often when you ask questions where there is little historical information. In our own experiments, for instance, GPT-4o-mini performs better as a healthcare assistant than as a financial one. That might seem counterintuitive, until you consider the training data. There’s a wealth of open, structured healthcare content — NHS guidance, clinical data, academic publications. In contrast, banks don’t publish internal conversations, so the AI has far less to learn from.

The result? GenAI is great at replicating patterns. But it struggles where there’s no precedent.

Where Things Get Interesting: AI That Explores

This is where a threshold is being crossed. We are starting to see evidence of AI reliably exploring new approaches and proposing viable solutions. And if this continues, it will change everything.

Reinforcement Learning (RL) agents enter the picture here. These are not agents in the everyday sense of answering customer queries or booking holidays; they are agents built to explore, including in fundamental science.

Unlike traditional generative models, these RL agents assess new strategies, test hypotheses, and iterate based on outcomes. Often, they arrive at solutions that are completely counterintuitive to human logic — and that’s precisely the point.
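To make that concrete, here is a deliberately toy sketch of the explore-test-iterate loop that sits underneath such agents. It is a minimal epsilon-greedy bandit, not anything DeepMind or any other lab has published; the strategies and reward numbers are invented purely for illustration.

```python
import random

# Toy illustration of the explore-test-iterate loop behind RL agents.
# The strategies and payoffs are made up; real systems explore vastly
# larger spaces with learned value estimates rather than a lookup table.

strategies = ["A", "B", "C"]              # candidate approaches the agent can try
value_estimate = {s: 0.0 for s in strategies}
tries = {s: 0 for s in strategies}
epsilon = 0.2                             # fraction of attempts spent exploring at random

def true_reward(strategy):
    # Hidden environment: the agent does not know these payoffs up front.
    return {"A": 0.3, "B": 0.5, "C": 0.8}[strategy] + random.gauss(0, 0.1)

for step in range(1000):
    if random.random() < epsilon:
        choice = random.choice(strategies)                 # explore: try something new
    else:
        choice = max(strategies, key=value_estimate.get)   # exploit: use the best so far

    reward = true_reward(choice)                           # test the hypothesis, observe the outcome
    tries[choice] += 1
    # Incrementally update the running average estimate for the chosen strategy.
    value_estimate[choice] += (reward - value_estimate[choice]) / tries[choice]

print(value_estimate)   # the agent converges on "C" without ever being told it was best
```

The point of the toy is the behaviour, not the numbers: the agent is never told which strategy is best, yet by spending a fraction of its attempts on exploration it discovers, and then exploits, an option a human designer might not have picked.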

This isn’t speculative. DeepMind has already shown what’s possible. AlphaGo beat the world’s best Go players using RL (Mastering the game of Go with deep neural networks and tree search | Nature). AlphaFold used similar principles to predict protein structures and revolutionised biology (AlphaFold - Google DeepMind).

Now we’re beginning to see what happens when these ideas meet large language models. Take AlphaEvolve, an internal DeepMind project that reportedly uses an LLM backbone to generate code, which is then scored and refined by RL techniques (Google DeepMind’s new AI agent cracks real-world problems better than humans can | MIT Technology Review). It throws out the bad, improves the good, and iterates towards optimal solutions. Google claims that implementing AlphaEvolve across their datacentres led to a 0.7% improvement in overall computing efficiency. At Google’s scale, that’s not a rounding error; it’s a seismic shift.
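To give a feel for that loop, here is a minimal generate-score-select sketch. It is emphatically not AlphaEvolve’s actual pipeline, which DeepMind has not released; the propose_variant and score functions below are hypothetical stand-ins, where the real system reportedly uses an LLM to rewrite candidate code and automated evaluators to grade it.

```python
import random

# Minimal sketch of a generate-score-select loop in the spirit of systems
# like AlphaEvolve. NOT DeepMind's pipeline: propose_variant() just perturbs
# a number, where a real system would ask an LLM to rewrite candidate code
# and score it against real benchmarks.

def score(candidate):
    # Placeholder objective: higher is better. A real evaluator might measure
    # datacentre scheduling efficiency, runtime, or correctness of a proof.
    target = 42.0
    return -abs(candidate - target)

def propose_variant(candidate):
    # Stand-in for "generate a modified version of a promising solution".
    return candidate + random.uniform(-5, 5)

population = [random.uniform(0, 100) for _ in range(20)]   # initial guesses

for generation in range(50):
    population.sort(key=score, reverse=True)
    survivors = population[:5]                               # throw out the bad, keep the good
    children = [propose_variant(random.choice(survivors)) for _ in range(15)]
    population = survivors + children                        # iterate towards better solutions

best = max(population, key=score)
print(round(best, 2))   # converges near the hidden optimum of 42
```

Even this crude loop shows the shape of the idea: keep the best candidates, generate variations on them, and let the evaluator, not human intuition, decide what survives.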

But AlphaEvolve — and others like it — won’t stop at infrastructure or code. These systems are being aimed at science, medicine, finance, engineering. They’re not just tools. They’re becoming automated idea and solution generators.

And that’s the threshold we’re now approaching: a world where the next great scientific discovery might not come from a human, but from an AI system that’s learned how to explore the problem space more effectively than we can.

The Light and the Shadow

Let me be clear — this is not Artificial General Intelligence. It’s not sentience. It’s not HAL 9000.

What it would be is a profound acceleration of our ability to understand and solve problems. And that has breathtaking potential: new medicines, materials science, new physics or maths, power generation. If there is a rule set behind a problem, these systems can learn it, surpass us in applying it and, what’s even bigger, identify what lies beyond it.

But as always, there’s no light without shadow.

The same capabilities that could transform medicine could also be used to identify new forms of biological or chemical weapons. The same reinforcement loops that create efficient solutions could also explore how to deceive, coerce or exploit populations. The same systems that could identify new avenues of maths or physics could also be weaponised. What would that look like? We have long worried about such systems in the abstract; now they look like near-term possibilities.

I worry that we are entering a period where capability is no longer the constraint. The constraint is how that power is directed - and by whom.

Whether these models remain closed within a few labs or are released widely will shape the risks we face. Closed models concentrate power. Open models risk misuse. There’s no easy answer - but the conversation is only getting more urgent. And this isn’t about the chat-based systems we’ve already had; it’s about what’s coming through now.

What Happens Next Is Vital

We are crossing a significant threshold in AI. It is not AGI — but it might be something just as impactful. A way to accelerate human understanding of the world, at a pace we’ve never seen before.

That prospect excites me. It also worries me. And it should do both.

Because what happens next depends not just on what these models can do - but on what those who have them choose to do.