Industry forecasts are sounding the alarm. Hundreds of billions of dollars will need to be invested in new data centers in the coming years to keep pace with AI’s insatiable computational needs.
Bain & Company projects that annual spend will reach $500 billion by 2030. Projects like the SoftBank-, OpenAI- and Oracle-led Stargate initiative are being set up to meet this demand.
Zuzanna Stamirowska, CEO and co-founder of Pathway.
More compute. More layers. More data. That is the logic we see and hear today, and it sets us on an unsustainable trajectory. The challenge is no longer tomorrow's concern: an AI energy crunch is a near-term inevitability if we stay locked into the path set by the prevailing wisdom of ever-upwards scale.
Being wedded to that trajectory also ignores a hard truth: unsustainable demand is hardwired into the unit economics of today's LLMs, and more specifically into the transformer architecture, which has high energy consumption baked into both training and inference.
The emergence of reasoning models, which produce more tokens of hidden “thinking” text before giving answers, adds to the problem. A 2025 study found that while a long prompt to GPT-4o consumes 0.42 Wh, a long prompt to reasoning models can consume as much as 33.634 Wh (DeepSeek-R1) and 30.495 Wh (GPT-4.5). That's enough energy to charge a smartphone, and then some, per prompt.
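For context, a typical smartphone battery holds roughly 12 to 19 Wh (around 3,000 to 5,000 mAh at 3.85 V), so a single 33 Wh reasoning prompt could, in principle, recharge one from empty with room to spare.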
Relying on spiraling token usage and inference compute as the only means of improving models puts a massive question mark over the current economic viability of the market in AI tools. We urgently need a paradigm shift to course-correct. That shift is already underway, and it begins with leaving the transformer behind. Welcome to the post-transformer era.
Consequences of chain of thought reasoning
Increased adoption of LLMs alone doesn't explain the runaway compute and power demands of AI companies. It's also a result of LLMs consuming more and more tokens over time.
The first generation of LLMs generated a few hundred tokens per response. In comparison, today’s ‘reasoning’ models burn through thousands of thinking tokens to map out step-by-step logic in chain-of-thought prompting.
But reasoning was bolted onto a core architecture that has remained untouched since the beginning: the transformer. Transformer-based models are incredibly resource-intensive.
They use massive amounts of compute because attention compares every token with every other token, so the work grows quickly as inputs get longer, and they depend on very fast memory, since a model's full set of parameters has to be read for every token it generates.
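To make that concrete, here is a minimal sketch of the attention step at the heart of the transformer, written in plain NumPy purely for illustration; it is not Pathway's code or any lab's production kernel, and the sizes are made up. Every token is scored against every other token, which is why compute and memory balloon as inputs get longer.

```python
# Illustrative sketch of scaled dot-product attention (not any vendor's kernel).
import numpy as np

def attention(Q, K, V):
    # Q, K, V: (n_tokens, d) matrices of query, key and value vectors.
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # shape (n, n): every token vs. every token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V                               # weighted mix of value vectors

n, d = 4096, 64                                      # made-up sizes for illustration
Q = K = V = np.random.randn(n, d).astype(np.float32)
out = attention(Q, K, V)
print(out.shape)                                     # (4096, 64); the hidden cost is the 4096 x 4096 score matrix
```

The 4096-by-4096 score matrix is the quadratic term: double the input length and the number of comparisons quadruples.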
Naturally, sticking to the transformer without meaningful efforts to introduce mechanisms for efficient reasoning has pushed the compute and energy requirements of LLMs dramatically upwards. Has that trade-off resulted in far more intelligent models?
The growing consensus is no. All the signs point to today's transformer-based models hitting a functional ceiling, with diminishing returns despite ever-growing model scale. GPT-5, which underwhelmed at launch even though it took over two years to train, is the prime example.
Recent launches from today's leading labs have generated excitement, but none has brought about the architectural shift the industry needs or a true step change in functionality. Outside of the scaling race, true alternatives are gathering steam.
Taking cues from biological systems
The answer lies in nature's most elegant design: the human brain, a shining example of a natural system that achieves advanced cognitive feats with remarkable energy efficiency. Inspired by this, a new wave of challenger AI companies is replicating how the brain processes information in new architectures for LLMs, aiming to close the efficiency and learning gap that separates today's AI from the brain's computational model.
The brain operates through under 100 billion neurons acting as computational units, linked by hundreds of trillions of synapses that store and transform information. The number of synaptic connections into each neuron varies: some neurons act as central hubs with countless connections, while others have far fewer.
This dynamic holds even as the brain grows and changes, making the brain the quintessential scale-free network. Remarkably, the brain runs this entire network on roughly 20 watts, about as much as a dim lightbulb and a drop in the ocean compared to the colossal energy demands of today's leading LLMs.
In 2025, we launched the Dragon Hatchling (BDH) architecture to close the gap between AI models and the brain by replicating neuron-synapse dynamics in LLMs. Where transformer-based models draw on static, pre-trained weights (the same fixed knowledge base every time) to predict outputs, BDH works differently.
Only the artificial neurons relevant to the task at hand are activated, and their connections strengthen or weaken based on experience garnered from an enterprise deployment. This is Hebbian learning in practice: 'neurons that fire together, wire together', combined with synaptic plasticity, just like in our own brains. The result is a model that builds intelligence through use, not just through training, via local interactions.
It has a very compact and efficient network-like structure, which keeps knowledge exactly in the spot where it’s being processed, and enables continuous learning without the regular retraining cycles that transformer-based models depend on.
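To illustrate the principle rather than the product, here is a toy Hebbian update in Python. It is a generic textbook-style sketch under simplified assumptions of my own (random sparse activity, a plain outer-product rule and uniform weight decay), not BDH's actual mechanism.

```python
# Toy illustration of "neurons that fire together, wire together" (not BDH's code).
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 8
weights = np.zeros((n_neurons, n_neurons))          # synapse strengths between neurons
learning_rate, decay = 0.1, 0.01

for step in range(100):
    # Sparse activity: only a few neurons "fire" on any given input.
    activity = (rng.random(n_neurons) < 0.3).astype(float)
    # Hebbian update: strengthen connections between co-active neurons...
    weights += learning_rate * np.outer(activity, activity)
    # ...while unused connections slowly decay (a crude stand-in for plasticity).
    weights *= (1 - decay)

np.fill_diagonal(weights, 0)                        # ignore self-connections
print(np.round(weights, 2))                         # pairs that often fired together end up with stronger links
```

The point of the toy is locality: each synapse changes based only on the activity of the two neurons it connects, with no global retraining pass.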
This is a much-needed alternative approach to improving LLM reasoning without burning through tokens. Enterprises can do far more with LLMs before hitting token limits, because the entire model doesn't have to run every time it's put to use: only the neuron connections relevant to the task at hand do.
And AI companies finally have a solution that both improves model reasoning capabilities and breaks from the transformer's unsustainable spiral of energy consumption. Crucially, BDH is compatible with the general-purpose hardware that has made the LLM boom possible.
Rethinking architecture is the only way to tackle root causes
An AI energy crunch is a bleak scenario, but it's not an inevitable one. Rather, it's a scenario based on architectural choices that can be undone. The transformer changed the world. Reasoning capabilities added to LLMs in recent years have succeeded in improving model functionality.
But the underlying unit economics aren't viable. Nature already offers a solution to the problem of efficient, adaptive intelligence, and the ethos of the post-transformer era lies in taking the right lessons from it. We can build AI systems that get smarter through use, in a sustainable way that doesn't depend on infinite upwards scale.
For enterprises, this isn't just an environmental story; it's an economic one. We're looking at a 10x or greater reduction in inference costs. AI that reasons efficiently, learns continuously, and requires a fraction of the compute isn't a future aspiration. It's what post-transformer architecture delivers today.