My Account List Orders

Compute Rush

Table of Contents

  • Introduction
  • Chapter 1 From Models to Markets: What’s Driving the Compute Curve
  • Chapter 2 Training vs. Inference Economics
  • Chapter 3 The Latency Imperative
  • Chapter 4 The Data Bottleneck
  • Chapter 5 Safety, Reliability, and SLAs
  • Chapter 6 GPU Nation
  • Chapter 7 Challengers and Alternatives
  • Chapter 8 The Foundry Funnel
  • Chapter 9 Export Controls and Industrial Policy
  • Chapter 10 The Networking Wildcard
  • Chapter 11 Site Selection 2.0
  • Chapter 12 Cooling Revolutions
  • Chapter 13 Hyperscalers, Colos, and AI Campuses
  • Chapter 14 Water, Community, and Permitting
  • Chapter 15 Real Estate, REITs, and Financing
  • Chapter 16 The Great Reforecast
  • Chapter 17 Wires, Not Just Watts
  • Chapter 18 Gas, Renewables, and Storage
  • Chapter 19 Nuclear’s Second Act
  • Chapter 20 Cities, States, and Carbon Accountability
  • Chapter 21 The Cost of Intelligence
  • Chapter 22 The Software Efficiency Renaissance
  • Chapter 23 On-Device AI and the Edge
  • Chapter 24 Social License and Externalities
  • Chapter 25 Scenarios 2025–2030

Introduction

The year AI became a household word wasn’t the year it learned to talk or paint. It was the year that “compute”—the raw ability of machines to read, write, and reason—hit the physical limits of our grids, factories, and cities. In boardrooms and government offices, from Silicon Valley to Seoul, from rural cloud towns to the halls of Washington and Brussels, a new question sounded: Can the world actually build enough chips, enough data centers, and enough power to fuel the next epoch of artificial intelligence?

Welcome to the Compute Rush.

In these pages, we step out from under the digital illusions of the AI boom—the chatbots and bots, the viral hype cycles—to examine the hard-edged, real-world race underway beneath them. What used to be sci-fi speculation has become the dominant force in technology, business, and policy: AI models are doubling in size every few months, every industry is experimenting with AI integration, and yet the supply of compute—the ability to run, train, and serve these models—has become the true bottleneck of progress. It’s no longer just about algorithms. The story now centers on the tangible foundations of the digital age: silicon, steel, and steam.

This book offers a guided tour through this compute revolution, breaking it down into three simple but interconnected parts:
Silicon stands for the chips—GPU accelerators and their rivals—whose mind-boggling parallelism powers modern AI. Steel stands for the data centers and their intricate architectures, the vast air- or liquid-cooled fortresses rising on city edges and rural grids. Steam is a nod to power: the gigawatts of electricity, the water for cooling, and the climate risks reshaping the infrastructure map of the future. The thesis is stark: wherever AI goes, these physical constraints—fabrication plants, buildings, and electricity—now define what is possible, what is profitable, and what is permitted.

Here, we’ll go deep but stay clear. Each chapter draws on firsthand interviews with executives, engineers, grid operators, policymakers, and community leaders. We’ll walk through semiconductor fabs and grid control rooms; we’ll meet the unsung builders, from network architects to permitting lawyers, whose quiet decisions shape the fate of trillion-dollar industries. Through a mix of recent case studies and analysis, you’ll see the global scramble for compute from every side: the hyperscaler giants and the insurgent startups, the local officials turning farmland into fiber-lit campuses, and the energy professionals tasked with preventing blackouts as demand triples.

As you read, you’ll come away not just with facts and figures, but with a framework for understanding what’s coming next. You’ll learn why AI’s success now depends on training and inference economics, grid constraints, supply chain pinch points, and the politics of land, water, and power. You’ll see how the rush for compute is redrawing technology roadmaps and even the geography of cities—and you’ll gain practical insights for innovators, investors, policymakers, and anyone who wants to track, or influence, the next phase of this transformation.

By the end of Compute Rush, you’ll know how compute is made, where it lives, what it consumes, and who controls it—and with that knowledge, you’ll see why this global race will reshape jobs, markets, and security for years to come. The future of AI won’t just be written in code. It will be built—one chip, one data center, one megawatt at a time. Let’s get started.


CHAPTER ONE: From Models to Markets: What’s Driving the Compute Curve

The year 2022 felt like a seismic shift. For decades, artificial intelligence had simmered, a promising technology often confined to academic papers and specialized enterprise applications. Then came a flood. ChatGPT, Midjourney, Stable Diffusion—suddenly, AI wasn’t just a concept; it was a tangible, interactive force in the hands of millions. It could write poetry, generate photorealistic images, and even debug code, all in a matter of seconds. This public unveiling ignited a profound curiosity, but beneath the dazzling demos lay a deeper truth: these seemingly miraculous capabilities were powered by an unprecedented hunger for computational muscle.

This hunger, this insatiable demand for "compute," is the driving force behind the global scramble we call the Compute Rush. It's not just about clever algorithms anymore. It's about the sheer scale of the digital infrastructure required to make AI tick. To understand why AI needs so much compute, we first need to grasp what’s changed in the world of AI models themselves.

For years, the conventional wisdom held that bigger models were better models. This wasn’t a philosophical stance; it was an empirical observation. As researchers added more parameters—the variables within a neural network that essentially define its knowledge and capabilities—the models got smarter. Early language models might have had millions of parameters; GPT-3, released in 2020, boasted 175 billion. The trajectory was clear: exponential growth. This wasn't just about making existing tasks slightly better; it was about unlocking entirely new capabilities. Models that could once only generate a few coherent sentences were now writing entire essays, summarizing complex documents, and even engaging in surprisingly nuanced conversations.

The concept of "context windows" is another critical factor in this escalating demand. Think of a context window as the model’s short-term memory—the amount of information it can process and refer back to within a single interaction. Early chatbots might forget what you said two sentences ago. Modern large language models (LLMs) can handle entire documents, lengthy codebases, or even hours of transcribed audio within their context window. As these windows expand, the model can understand more complex queries, maintain longer conversations, and perform more sophisticated tasks, but this expanded memory comes at a significant computational cost. Each additional token—a word or sub-word unit—fed into a larger context window requires more processing power.

The shift towards "multimodality" further amplifies this demand. Initially, AI models specialized in one type of data: text, images, or audio. Now, the cutting edge is about models that can seamlessly blend these modalities. Imagine an AI that can analyze a medical image, read a patient’s electronic health record, and then generate a spoken diagnosis. Or a design AI that takes a text description, creates an image, and then generates a 3D model with accompanying sound effects. This integration of different data types means the models themselves become far more complex, requiring more diverse training data and significantly more compute for both training and inference. Each new modality adds layers of computational burden, as the model must not only process each type of data but also understand the intricate relationships between them.

These advancements aren't just theoretical; they're driving "killer use cases" that are transforming industries. In healthcare, AI is accelerating drug discovery by simulating molecular interactions, analyzing vast genomic datasets, and even designing novel proteins. Financial institutions are using AI for fraud detection, algorithmic trading, and personalized financial advice, sifting through market data at speeds no human could match. Manufacturing is leveraging AI for predictive maintenance, optimizing supply chains, and designing new materials, leading to unprecedented efficiencies. Each of these applications, from the mundane to the miraculous, relies on ever-increasing computational horsepower. The demand isn't speculative; it's being driven by tangible business value and competitive advantage.

Of course, the economics of this compute curve are paramount. Two key metrics that businesses obsess over are "cost-per-token" and "latency." Cost-per-token refers to the expense of generating a single unit of output from an AI model. For a text-based LLM, this might be the cost of generating one word. As models grow and become more complex, the raw cost-per-token can rise. Companies are constantly seeking ways to drive this down, whether through more efficient hardware, optimized software, or strategic deployment decisions. A fraction of a cent difference per token can translate into millions of dollars in operational costs when scaled across billions of user queries.

"Latency" refers to the time it takes for an AI model to respond to a query. In many applications, especially those involving real-time user interaction, even a few hundred milliseconds can make a difference between a delightful experience and a frustrating one. Imagine a conversational AI that pauses for five seconds before every reply—it would quickly become unusable. Lower latency often requires more dedicated compute resources, strategically placed data centers, and optimized network infrastructure. For mission-critical applications like autonomous driving or industrial automation, latency isn't just a user experience issue; it's a safety and operational imperative. The race to minimize latency is a constant push against the physical limitations of speed and distance.

One of the less intuitive aspects of this compute explosion is the concept of "emergent capabilities." This refers to abilities that a model exhibits that weren't explicitly programmed or foreseen during its training. As models scale up in size and complexity, they sometimes develop surprising new skills. For example, a language model trained solely on text might suddenly demonstrate rudimentary reasoning abilities or the capacity to translate between languages, even if it wasn't specifically tasked with translation during its training. These emergent capabilities are exciting, hinting at a path toward more general artificial intelligence, but they also underscore the unpredictable nature of scaling and the immense compute resources needed to explore these new frontiers. It's a bit like discovering entirely new continents simply by building bigger, faster ships.

The fundamental shift, then, is that AI is moving beyond niche applications to become a pervasive layer across almost every digital interaction. Whether you’re asking your phone a question, generating an image for a presentation, or relying on intelligent systems to route your package, you’re touching a compute-intensive AI model. This ubiquity means that even marginal improvements in model capability—a slightly better translation, a more nuanced summary—can unlock enormous value across a vast user base. The cumulative effect of these small, incremental gains, powered by ever-larger models, is what truly drives the compute curve. It's a virtuous cycle: better models lead to more applications, which in turn demand more compute, fueling further research into even larger and more capable models.

Consider the ripple effect. A new, more capable AI model from a leading lab isn't just an academic achievement; it becomes a benchmark for the industry. Competitors immediately begin allocating vast budgets to acquire the necessary chips, expand their data center footprint, and secure the power supply to match or exceed that capability. This competitive dynamic is a powerful accelerant for the Compute Rush. No major tech company, no ambitious startup, can afford to fall behind in the AI arms race. The cost of inaction—of not having enough compute—is perceived as far greater than the cost of aggressive investment.

In essence, the drivers of the compute curve are a confluence of technological breakthroughs, market demands, and competitive pressures. The exponential growth in model parameters, the expansion of context windows, the push towards multimodality, and the emergence of unforeseen capabilities all contribute to AI's insatiable appetite for processing power. Simultaneously, the relentless pursuit of lower cost-per-token and reduced latency for killer use cases translates directly into a demand for more, and more efficient, compute infrastructure. This isn't a temporary trend; it’s a fundamental reorientation of the digital economy, where physical resources—the silicon, steel, and steam—are now the critical bottlenecks shaping the future of artificial intelligence.


This is a sample preview. The complete book contains 27 sections.