Big Data from Space: Processing and Applying Satellite Data for Business
Table of Contents
- Introduction
- Chapter 1 The Earth Observation Landscape: Sensors, Orbits, and Markets
- Chapter 2 Data Sources and Licensing: Open, Commercial, and Tasking
- Chapter 3 Spatial Fundamentals: Projections, Grids, and Datacubes
- Chapter 4 Standards and Metadata: STAC, Cloud-Optimized GeoTIFFs, and Interoperability
- Chapter 5 Architectures in the Cloud: AWS, Azure, and Google Cloud for EO
- Chapter 6 Storage Patterns: Data Lakes, Lakehouses, and Tiling Strategies
- Chapter 7 Ingestion at Scale: ETL/ELT, Event-Driven Pipelines, and STAC Indexing
- Chapter 8 Cleaning and Preprocessing: Radiometric, Atmospheric, and Geometric Corrections
- Chapter 9 Feature Engineering: Spectral Indices, Textures, and Spatiotemporal Aggregations
- Chapter 10 Machine Learning for Imagery: Classification, Segmentation, and Detection
- Chapter 11 Time-Series and Change Detection: Monitoring, Alerts, and Forecasting
- Chapter 12 Multi-Source Fusion: Optical, SAR, Thermal, AIS/ADS‑B, and IoT
- Chapter 13 MLOps for EO: Experiment Tracking, Model Versioning, and Continuous Delivery
- Chapter 14 Validation and Uncertainty: QA/QC, Benchmarks, and Ground Truthing
- Chapter 15 Performance and Cost Optimization: GPUs, Serverless, and TCO Management
- Chapter 16 Data Governance, Privacy, and Compliance for Satellite Analytics
- Chapter 17 APIs and Products: Tiles, Vectors, and Insights‑as‑a‑Service
- Chapter 18 Monetization Models: Pricing, Packaging, and Contracts
- Chapter 19 Case Study—Agriculture: Crop Type, Yield, and Field Health
- Chapter 20 Case Study—Insurance: Catastrophe, Exposure, and Parametric Triggers
- Chapter 21 Case Study—Logistics: Port Congestion, Routing, and Trade Flows
- Chapter 22 Enterprise Integration: Snowflake, Databricks, and BI Tooling
- Chapter 23 Go‑to‑Market: Pilots, Partnerships, and Cloud Marketplaces
- Chapter 24 Building the Team: Roles, Skills, and Operating Models
- Chapter 25 Ethics and Responsible Use: Bias, Transparency, and Societal Impact
Introduction
Earth observation has entered a new era. Constellations of optical, radar, and thermal satellites now revisit every corner of the planet daily—sometimes hourly—streaming a torrent of pixels and signals into the cloud. What was once the domain of space agencies and research labs has become an accessible, API‑driven data layer for business. This book is a practical guide to turning that raw, orbital exhaust into trustworthy analytics and products that create measurable value.
Big data from space is not just “big” because of its volume; it is big because of its complexity. Spatial projections, sensor physics, revisit cycles, cloud cover, and regulatory constraints all stand between an analyst and a reliable signal. Many teams discover that moving from a promising notebook to a product-grade pipeline requires new patterns: standardized metadata, cost-aware storage, scalable preprocessing, and repeatable machine learning workflows. We meet you at that inflection point and provide the architecture examples, governance advice, and operational checklists needed to cross it.
The emphasis throughout is on building in the cloud and automating the boring but essential parts of the pipeline. You will learn how to design ingestion flows that index scenes as they arrive, how to apply radiometric and geometric corrections at scale, and how to construct datacubes that power fast queries. We walk through model development—from classical feature engineering with spectral indices to deep learning for segmentation and change detection—then show how to productionize models with MLOps: experiment tracking, versioning, CI/CD, and safe rollouts. Along the way, we highlight performance and cost optimization techniques so your analytics remain both fast and financially sustainable.
Because the value of satellite data is ultimately realized in decisions, we anchor concepts with industry applications. In agriculture, we translate spatiotemporal signals into crop type maps, field health indicators, and yield forecasts that support lending, input optimization, and sustainability claims. In insurance, we build exposure and hazard layers, accelerate catastrophe response with rapid damage assessment, and design parametric triggers that settle fairly and fast. In logistics, we measure port congestion, monitor yard inventory, and fuse satellite observations with AIS/ADS‑B to illuminate global trade flows. Each case study includes data choices, modeling approaches, validation strategies, product packaging, and go‑to‑market lessons.
Governance and responsibility are first-class concerns, not afterthoughts. Satellite analytics can surface sensitive patterns, infer activity, or inadvertently codify bias. We address privacy, licensing, export controls, and data ethics with concrete practices you can adopt: clear provenance via STAC, auditable pipelines, uncertainty reporting, and human‑in‑the‑loop review. By integrating these safeguards from the outset, you reduce risk, build trust with customers and regulators, and create products that stand the test of scrutiny.
This book is for startup founders validating a new EO product, for enterprise teams modernizing geospatial stacks, and for analysts who want to move beyond prototypes into reliable services. If you can work with Python, SQL, and a cloud console, you have the prerequisites; the rest is pattern recognition—learning which building blocks fit together and why. By the end, you will be able to ingest, clean, analyze, and monetize satellite‑derived datasets with confidence, ship customer‑facing APIs and dashboards, and scale operations without losing scientific rigor.
Ultimately, our goal is empowerment. The techniques and case studies ahead do more than explain “how”; they show “how to decide.” With a firm grip on both the physics above and the business needs below, you will be ready to transform orbital data into ground truth for your organization—and to build product‑grade EO analytics that matter.
CHAPTER ONE: The Earth Observation Landscape: Sensors, Orbits, and Markets
Earth observation today moves at a tempo that would have astonished anyone working in the field only two decades ago. The cadence is no longer seasonal or episodic; it is rhythmic, and in the best cases relentless, as constellations of optical, radar, and thermal payloads stitch the planet with overlapping passes each day. What used to require planning, proposals, and patience now arrives via APIs and event streams before your coffee cools. This shift changes how questions get asked, how models are built, and how decisions are defended. Yet for all the speed and apparent ease, the underlying physics and geometry remain stubborn, and misunderstanding them remains the fastest route to expensive mistakes.
Satellites do not observe; they measure. A pixel is not a photograph but a bucket of photons or electrons collected through a filter at a particular instant, geometry, and spectral band, stamped with an orbit solution and a clock that may or may not agree with your servers. The leap from raw telemetry to business insight requires a working familiarity with the instruments that produce those buckets, the orbits that deliver them, and the markets that assign them value. This chapter is your field guide to that terrain, intended to save you from reinventing known limits and to help you spot opportunities that sit just beyond today’s operational habits.
Optical sensors remain the public face of Earth observation because their images look like the world we recognize. Multispectral and hyperspectral instruments sample reflected sunlight across visible wavelengths and into the near‑ and shortwave infrared, turning color into code about vegetation vigor, soil moisture, mineralogy, and urban materials. The trade-offs are predictable but unforgiving. Spatial resolution governs how small an object you can distinguish, spectral resolution determines how confidently you can identify it, and radiometric resolution controls how faint a signal you can trust. Swath width dictates how much area you cover per pass, and revisit time determines whether you see change or merely stages of it. Most importantly, optical sensors remain hostages to daylight and weather, a limitation that shapes architectures and value propositions in equal measure.
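The spatial-resolution trade-off above can be made concrete with a rule of thumb: an object generally needs to span several pixels, not one, before it can be reliably distinguished. A minimal sketch in Python; the `pixels_needed` factor of three is a common heuristic, not a hard optical limit, and the function name is ours:

```python
def min_resolvable_size(gsd_m: float, pixels_needed: float = 3.0) -> float:
    """Rough minimum object size (meters) distinguishable at a given
    ground sample distance (GSD). A single bright pixel may hint at an
    object, but identification usually needs it to span several pixels."""
    return gsd_m * pixels_needed

# Field-scale features only at 10 m GSD; vehicles separate near 0.5 m GSD.
print(min_resolvable_size(10.0))  # 30.0
print(min_resolvable_size(0.5))   # 1.5
```

The same arithmetic run in reverse is a useful procurement filter: if the objects that matter to your use case are 5 m across, a 10 m GSD archive cannot answer the question no matter how cheap it is.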
Where eyes fail, radar sees. Synthetic aperture radar instruments send pulses of microwave energy and listen for returns, constructing their own illumination and carrying a side of polarization information that reveals structure and roughness below the canopy. Because microwaves penetrate clouds and operate day or night, SAR fills critical gaps in any operational pipeline, yet it introduces its own dialect of complexity. Speckle, geometry, and interpretation require different muscles than optical workflows, and the gap between a backscatter number and a business insight is often wider and steeper. Still, for change detection over cloudy tropics, maritime surveillance, and ground motion monitoring, SAR is not a niche; it is an anchor.
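Speckle is one place where those different muscles show up immediately. The simplest mitigation, multilooking, averages adjacent single-look intensity samples, trading resolution for roughly a sqrt(N) reduction in speckle variation. A one-dimensional sketch for illustration; real processors work on 2-D images, often with adaptive filters such as Lee or Frost:

```python
import random
import statistics

def multilook(intensity, looks):
    """Boxcar-average `looks` adjacent single-look intensity samples.
    Averaging N independent looks cuts the speckle coefficient of
    variation by roughly a factor of sqrt(N)."""
    return [sum(intensity[i:i + looks]) / looks
            for i in range(0, len(intensity) - looks + 1, looks)]

# Single-look SAR intensity over homogeneous terrain is approximately
# exponentially distributed (coefficient of variation ~1); simulate it.
random.seed(42)
raw = [random.expovariate(1.0) for _ in range(4000)]
ml = multilook(raw, looks=4)

cv_raw = statistics.stdev(raw) / statistics.mean(raw)  # ~1.0
cv_ml = statistics.stdev(ml) / statistics.mean(ml)     # ~0.5 after 4 looks
```

The design point worth internalizing is that speckle is not noise to be deleted but a statistical property to be managed; every look you average away is resolution you cannot get back.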
Thermal and infrared sensors occupy a middle ground, sensing emitted radiation rather than reflected sunlight. From sea surface temperature to wildfire radiative power and urban heat islands, these bands detect energy states that optical instruments miss entirely. They trade spatial sharpness for insight into processes that drive risk and resource flows, making them quietly indispensable for insurance, agriculture, and energy markets. Like SAR, they do not care about daylight, but they care deeply about atmosphere, calibration, and time of observation, because temperature is a fickle variable that drifts with season, weather, and sensor age.
Orbits decide who sees what when, and perhaps more importantly, who pays for the privilege. Sun‑synchronous orbits keep instruments in near‑constant lighting conditions, crossing the equator at roughly the same local solar time each pass, a rhythm that simplifies calibration and change detection. These orbits favor low‑Earth‑orbit workhorses, typically flying at 500–800 km, that balance resolution, swath width, and revisit frequency, and they host the majority of commercial optical and radar constellations. Geostationary satellites, perched high above the equator, sacrifice spatial detail for cadence, staring at whole hemispheres with refresh rates measured in minutes, a capability that powers weather monitoring, maritime domain awareness, and rapid alerting for insurance and logistics. Lower, non‑sun‑synchronous orbits are increasingly used for tasking flexibility, stereoscopic collection, and calibration campaigns, but they impose irregular revisit patterns that complicate everything from tiling strategies to model training.
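The altitudes behind these regimes follow directly from Kepler's third law: a circular orbit's period is T = 2π·sqrt(a³/μ). A quick sketch showing why a sun-synchronous satellite near 700 km circles the planet 14 to 15 times a day, while a geostationary satellite near 35,786 km takes one sidereal day:

```python
import math

MU_EARTH = 3.986004418e14  # Earth's gravitational parameter, m^3/s^2
R_EARTH = 6_371_000.0      # mean Earth radius, m

def orbital_period_minutes(altitude_km: float) -> float:
    """Circular-orbit period from Kepler's third law: T = 2*pi*sqrt(a^3/mu)."""
    a = R_EARTH + altitude_km * 1000.0
    return 2.0 * math.pi * math.sqrt(a ** 3 / MU_EARTH) / 60.0

leo = orbital_period_minutes(700.0)      # ~98-99 minutes
geo = orbital_period_minutes(35_786.0)   # ~1436 minutes, one sidereal day
revs_per_day = 24 * 60 / leo             # ~14.6 orbits per day
```

Those 14-and-a-fraction daily orbits are why sun-synchronous ground tracks shift westward each revolution and only repeat after a multi-day cycle, which in turn is why "revisit" is a property of the constellation, not the satellite.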
Constellations rather than single satellites now define capability. The move from a few exquisite instruments to many good‑enough ones has redefined revisit from a calendar concept to a statistical one. With hundreds of sensors in various orbits, latency becomes a design choice rather than a fate, and coverage becomes a service level agreement rather than a hope. For businesses, this means that architecture must shift from planning around scarcity to designing around abundance, filtering redundancy, and handling heterogeneity. The challenge is no longer getting data; it is getting the right data, at the right fidelity, fast enough to matter.
Data markets reflect this abundance and the friction it introduces. Open government missions provide global baselines with liberal licensing, predictable processing, and long histories, but they lag in cadence, resolution, and tasking flexibility. Commercial providers sell resolution, recency, and responsiveness, often wrapped in cloud subscriptions, tasking queues, and analytics layers. Emerging brokers and aggregators sit between these worlds, normalizing formats, managing quotas, and offering pay‑as‑you‑go access to heterogeneous fleets. For product teams, the choice is rarely ideological; it is a negotiation among cost, latency, coverage, and legal constraints, with licensing and redistribution rights often dominating technical considerations in go‑to‑market plans.
Tasking introduces another axis of control and cost. Rather than accepting whatever a satellite happens to collect, customers can request specific areas be imaged at specific times, with specific angles and atmospheric conditions. This capability is seductive, but it carries operational overhead, minimum area commitments, and weather risk that many teams underestimate. For agriculture, tasking enables precise phenological captures; for insurance, it can mean the difference between a prompt loss assessment and a prolonged dispute; for logistics, it may be used to monitor specific ports or corridors on demand. Successful teams treat tasking as a premium feature, not a default path, and build fallback strategies that blend tasked and archival collections.
The vertical market lens clarifies why sensor choices matter beyond technical performance. Agriculture prizes frequent, cloud‑free views during growing seasons to map crop type, monitor field health, and forecast yield. Insurance values rapid, reliable change detection for catastrophe response and exposure monitoring, often fusing optical, SAR, and thermal to see through smoke, clouds, and darkness. Logistics leans on frequent wide‑area scanning to infer port and yard activity, integrate with shipping signals, and illuminate trade flows that move markets. Each use case favors different points in the sensor–orbit–cadence space, and each imposes distinct preprocessing, modeling, and validation patterns.
Spectral indices illustrate this divergence in practice. In agriculture, a handful of ratios derived from optical bands, such as those involving red and near‑infrared, can map vegetation vigor at scale and feed into credit and input decisions. In insurance, change indices that fuse pre‑ and post‑event imagery can flag damage quickly, but they must be robust to seasonality, weather, and sensor differences, often requiring radar support when clouds intervene. In logistics, texture and coherence features derived from SAR can reveal the presence and movement of containers or ships, complementing optical counts and AIS signals to build a more complete operational picture.
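The best known of those red and near-infrared ratios is NDVI. As a sketch, assuming surface-reflectance inputs that have already been atmospherically corrected:

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).
    Healthy, dense vegetation reflects strongly in the near-infrared and
    absorbs red light, pushing NDVI toward 1; bare soil sits near 0.1-0.2
    and open water is typically negative."""
    denom = nir + red
    if denom == 0.0:
        return 0.0  # guard for masked or no-data pixels
    return (nir - red) / denom

print(ndvi(nir=0.45, red=0.05))  # 0.8 -> vigorous canopy
print(ndvi(nir=0.12, red=0.10))  # ~0.09 -> bare or stressed ground
```

The normalization is the point: dividing by the sum suppresses overall brightness differences between scenes, which is what lets the same threshold behave similarly across dates and, with care, across sensors.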
Scale and cadence create engineering consequences that many teams discover only after they have models that work in notebooks but crumble in production. Daily global coverage sounds generous until you realize that each pixel may arrive with different projections, sun angles, cloud masks, and calibration states. Harmonizing these streams into a unified datacube or tiling scheme is not a one‑time effort; it is a core capability that determines how fast you can query, how cheaply you can store, and how reliably you can serve insights. Cloud‑native architectures increasingly treat metadata as first‑class data, indexing scenes as they arrive and enabling dynamic query and processing rather than bulk movement and static mosaics.
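Treating metadata as first-class data can start as simply as keeping scene records queryable by space, time, and quality. A toy sketch of the filtering a STAC catalog performs; the scene records here are invented and the field names only mimic common STAC properties, while a real system would query a STAC API with a bbox, datetime range, and property filters:

```python
from datetime import date

# Hypothetical in-memory scene index; field names mimic STAC conventions.
SCENES = [
    {"id": "scene-a", "date": date(2024, 6, 1), "cloud_cover": 12.0,
     "bbox": (-1.0, 51.0, 0.5, 52.0)},
    {"id": "scene-b", "date": date(2024, 6, 11), "cloud_cover": 68.0,
     "bbox": (-1.0, 51.0, 0.5, 52.0)},
    {"id": "scene-c", "date": date(2024, 6, 3), "cloud_cover": 5.0,
     "bbox": (10.0, 40.0, 11.0, 41.0)},
]

def bbox_intersects(a, b):
    """Overlap test for (min_x, min_y, max_x, max_y) axis-aligned boxes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(scenes, bbox, start, end, max_cloud):
    """Return scenes overlapping `bbox` within [start, end], under a cloud cap."""
    return [s for s in scenes
            if bbox_intersects(s["bbox"], bbox)
            and start <= s["date"] <= end
            and s["cloud_cover"] <= max_cloud]

hits = search(SCENES, bbox=(-0.5, 51.2, 0.0, 51.8),
              start=date(2024, 6, 1), end=date(2024, 6, 30), max_cloud=20.0)
```

The payoff of indexing like this at ingest time is that downstream jobs select exactly the scenes they need before moving a single pixel, which is the difference between querying an archive and mirroring one.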
Licensing and data rights are not footnotes; they are product constraints. Open data may be free to download but not free to redistribute, while commercial data may carry usage caps, attribution mandates, and export controls that affect both architecture and sales motion. Government restrictions on high‑resolution imagery and radar data can limit where you operate, whom you serve, and how you host data. For startups, these constraints shape minimum viable products and pilot geographies; for enterprises, they inform procurement strategies and compliance workflows. Ignoring them does not make them go away; it merely delays expensive rework.
The market side of Earth observation is maturing from experimental budgets to operational line items. Agriculture buyers care about risk reduction, input optimization, and sustainability reporting, and they expect analytics to integrate with farm management systems and financing workflows. Insurance customers want faster claims, better exposure management, and parametric products that settle automatically, which demands low latency, high reliability, and auditable uncertainty estimates. Logistics firms seek visibility into global trade flows to anticipate disruptions and optimize routing, which requires fusion across satellite, terrestrial, and signal data at near‑real‑time speeds. In all three, the value is not in pretty pictures but in decisions supported by timely, repeatable, and defensible signals.
Sensor fusion emerges as a strategic capability rather than a technical nicety. Few business problems are solved by a single band or instrument. Clouds block optical views, but SAR sees through them; optical resolves textures and types that SAR confuses; thermal detects states invisible to reflected light; AIS and ADS‑B anchor moving vessels to satellite detections; IoT and ground sensors calibrate interpretation and reduce false alarms. Architectures that assume single‑source purity are brittle; architectures that treat every source as a node in a graph of evidence survive the chaos of real operations and deliver higher signal integrity.
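One lightweight way to treat every source as a node in a graph of evidence is naive-Bayes fusion in log-odds space: each detector contributes its evidence relative to a shared prior. A sketch under the strong, and in practice only approximate, assumption that sources are conditionally independent:

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

def fuse(probabilities, prior=0.5):
    """Combine per-source detection probabilities assuming independence.
    Each source adds its log-odds relative to the prior; correlated
    sources would double-count evidence and need down-weighting."""
    total = logit(prior) + sum(logit(p) - logit(prior) for p in probabilities)
    return 1.0 / (1.0 + math.exp(-total))

# An optical detector at 0.7 and a SAR detector at 0.6 reinforce each other:
p = fuse([0.7, 0.6])  # ~0.78, higher than either source alone
```

This framing also makes disagreement informative: a confident optical detection paired with a skeptical SAR return pulls the fused probability back toward the prior instead of letting one source dictate the answer.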
Calibration and uncertainty are inseparable from value. A yield forecast or damage assessment that does not carry confidence bounds will not be trusted by lenders, underwriters, or operators. Ground truth campaigns, cross‑sensor validation, and uncertainty propagation must be designed into pipelines from the beginning, not bolted on before a launch. This requires discipline in metadata capture, versioning of models and corrections, and explicit handling of missing and low‑quality data. Teams that build this discipline early move faster later, because they avoid costly reprocessing and credibility gaps that stall sales cycles.
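Designing uncertainty in from the start can begin with something as simple as first-order (delta-method) propagation of band noise through a derived index. A sketch for NDVI with per-band uncertainties, assumed Gaussian and uncorrelated, which real sensor noise only approximates:

```python
import math

def ndvi_with_sigma(nir, red, sigma_nir, sigma_red):
    """First-order propagation of independent band uncertainties through
    NDVI = (nir - red) / (nir + red). The partial derivatives are
      dNDVI/d_nir =  2*red / (nir + red)^2
      dNDVI/d_red = -2*nir / (nir + red)^2
    and the combined sigma is the root-sum-square of each term."""
    s = nir + red
    value = (nir - red) / s
    d_nir = 2.0 * red / s ** 2
    d_red = -2.0 * nir / s ** 2
    sigma = math.sqrt((d_nir * sigma_nir) ** 2 + (d_red * sigma_red) ** 2)
    return value, sigma

value, sigma = ndvi_with_sigma(nir=0.45, red=0.05, sigma_nir=0.01, sigma_red=0.01)
# NDVI ~0.80 with sigma ~0.036: small band noise widens near the extremes.
```

Even this crude bound changes the conversation with a lender or underwriter: instead of asserting a field scored 0.80, the product can say 0.80 plus or minus 0.04, and the downstream decision rule can be written against that interval.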
The economic model of satellite analytics is shifting from data sales to insight subscriptions, usage‑based APIs, and embedded decision layers. This shift rewards architectures that can scale elastically, automate quality checks, and deliver reliable service levels without heroic effort. It penalizes workflows that require manual downloading, bespoke preprocessing, and fragile notebooks. The winners are often not those with the best algorithms alone, but those with the best pipelines, governance, and operational loops that turn satellite data into a dependable business input.
Finally, the landscape continues to evolve faster than any single chapter can capture. New sensors launch, new orbits fill, and new regulations emerge, sometimes between quarterly earnings calls. What does not change is the need for a clear mental model of what satellites can and cannot do, what orbits and instruments serve which decisions, and how markets translate technical capability into business value. With that foundation, you can evaluate tools, design architectures, and choose data strategies that endure beyond the hype cycle, and you can do so with enough confidence to ship products that matter.
The next chapter will turn from this landscape to the practicalities of data sources and licensing, laying out which contracts to read, which open archives to trust, and how to design acquisition strategies that keep your product both legal and competitive. For now, keep the sensors, orbits, and markets in mind as the board on which you will play, because every architectural choice downstream is a move on that board, and winning games requires knowing the rules and the pieces.
This is a sample preview. The complete book contains 27 sections.