Robotics Software Engineering with ROS 2

Introduction
Chapter 1 The ROS 2 Ecosystem and Core Concepts
Chapter 2 Workspace Layout, Packages, and Build Tooling
Chapter 3 Coding Nodes with rclcpp and rclpy
Chapter 4 Composition, Lifecycle Nodes, and Executors
Chapter 5 Topics, Services, and Actions in Practice
Chapter 6 Interface and Message Design Patterns
Chapter 7 Quality of Service: Reliability, Durability, and Deadlines
Chapter 8 DDS Middleware Choices and Network Topologies
Chapter 9 Time, Clocks, and Synchronization
Chapter 10 Configuration Management: Parameters and Launch
Chapter 11 Logging, Diagnostics, and Observability
Chapter 12 Real-Time Linux and Deterministic Control Loops
Chapter 13 ros2_control and Hardware Abstraction Layers
Chapter 14 Sensor Drivers, Buses, and Actuator Integration
Chapter 15 State Estimation and Transforms with tf2
Chapter 16 Navigation and Planning with Behavior Trees
Chapter 17 Manipulation with MoveIt 2 and Control Strategies
Chapter 18 Simulation Workflows with Gazebo and Test Harnesses
Chapter 19 Testing: Unit, Integration, System, and Launch Tests
Chapter 20 Continuous Integration and Continuous Delivery for ROS 2
Chapter 21 Security, Safety, and Fault Tolerance
Chapter 22 Performance Analysis, Profiling, and Latency Budgeting
Chapter 23 Multi-Robot Systems and Distributed Coordination
Chapter 24 Data Recording, Playback, and Telemetry with rosbag2
Chapter 25 Migration Strategies and Long-Term Maintenance

Introduction

Robotics Software Engineering with ROS 2 is a practical guide to building scalable, maintainable robot applications using ROS 2 best practices. Rather than treating ROS 2 as a collection of tools and APIs, this book approaches it as an engineering platform for delivering production-grade systems. You will learn patterns that help you design clear architectures, choose the right communication semantics, budget for latency, and keep your codebase healthy as complexity grows.

This book is written for software engineers, roboticists, and technical leads who are responsible for shipping reliable robots—whether in research labs, startups, or established industry programs. We assume you are comfortable with modern software development workflows and familiar with C++ or Python. No single robotics domain is presupposed; the techniques apply across mobile robots, manipulators, autonomous vehicles, and fixed automation alike.

Our focus is on decisions that matter in practice. We begin with core ROS 2 concepts and build up to architectural patterns that tame large systems: composing nodes to reduce process sprawl, structuring packages for longevity, and designing messages that can evolve without breaking consumers. You will see how to use QoS profiles to trade off reliability, latency, and bandwidth; how executors and callback design affect determinism; and how to reason about clocks and time synchronization across heterogeneous hardware.

Real-time control receives special attention. We discuss how to align operating system choices, middleware behavior, and ROS 2 executors to achieve predictable control loops. You will learn how to isolate critical paths, interact with hardware through ros2_control, and integrate sensors and actuators over common buses—all while keeping diagnostics, logging, and observability first-class so that issues are detectable before they become outages.

Quality is engineered, not inspected in at the end. Accordingly, substantial chapters are devoted to testing strategies and CI/CD. We show how to structure unit, integration, and system tests; use launch testing to validate end-to-end behavior; and incorporate simulation in the pipeline to catch regressions early. You will learn to build pipelines that create reproducible artifacts, run hardware-in-the-loop or simulation-based checks, and provide fast feedback to developers without sacrificing coverage.

Because every project lives within constraints, we address the trade-offs of middleware selections and network topologies, the practicalities of security hardening and safety cases, and the realities of performance tuning with profiling and tracing. We also provide migration strategies—both for teams moving from ROS 1 and for those navigating changes between ROS 2 releases—so that you can evolve without derailing ongoing work.

Throughout, case studies drawn from real industry projects illustrate how the patterns play out under pressure: what went right, what failed, and how teams iterated toward robust solutions. The goal is not to prescribe one “correct” architecture, but to equip you with a toolbox and the judgment to apply it in your context.

By the end of this book, you should be able to design ROS 2 systems that are modular, testable, observable, and ready for sustained development. You will know how to balance reliability with performance, when to favor composition over distribution, how to structure interfaces for long-term evolution, and how to support continuous delivery without compromising safety. Most importantly, you will have a set of practical patterns that make complex robots tractable to build—and maintain.

CHAPTER ONE: The ROS 2 Ecosystem and Core Concepts

When engineers describe ROS 2, they often start by calling it a "robot operating system." That phrase, while common, is slightly misleading. ROS 2 is not an operating system in the classical sense; it does not manage hardware resources or schedule processes at the kernel level. Instead, think of it as a flexible software framework—a structured collection of tools, libraries, and conventions—that runs atop a conventional operating system like Linux or Windows. Its true purpose is to provide a standardized way for the diverse components of a robot to discover each other, communicate, and coordinate their behavior. This makes ROS 2 an engineering platform for building complex robotic systems, much like a well-designed city grid that allows different neighborhoods to interact efficiently without dictating what each building looks like inside.

The original Robot Operating System, ROS 1, emerged from academia and proved tremendously successful for research and prototyping. However, its architecture carried inherent limitations that became apparent as robots moved from labs into factories, warehouses, and hospitals. ROS 1 assumed a single master process to coordinate communication, which created a single point of failure. Its communication layer was not designed for real-time performance, and it offered little in the way of security or guaranteed message delivery. ROS 2 was conceived from the ground up to address these shortcomings, adopting a design philosophy centered on production readiness, real-time capable communication, multi-platform support, and a modular, federated architecture.

At the heart of ROS 2's architecture is a move away from a centralized master. Instead, it employs a distributed discovery system, typically managed by the underlying middleware. This allows nodes to find each other dynamically on a network without a single coordinator. If one part of the system crashes, the rest can continue functioning. This fundamental shift makes ROS 2 systems inherently more robust and scalable. Furthermore, by building on top of the Data Distribution Service (DDS) standard, ROS 2 gains powerful features like configurable quality of service, network fault tolerance, and a degree of vendor neutrality, though it also introduces new complexities that we will navigate throughout this book.

To understand how a ROS 2 system is built, we must first understand its atomic unit: the node. A node is a single, purpose-driven software process that performs a specific function within the robot. One node might read data from a camera, another might calculate motor commands, and a third might run a localization algorithm. This modular design is powerful because it allows developers to work on individual components in isolation, testing and replacing them without rewriting the entire system. A collection of nodes working together forms the complete robot application, communicating through well-defined interfaces.

Nodes communicate with each other using a few primary mechanisms, the most fundamental of which is the topic. A topic is a named channel over which nodes exchange messages. A publishing node sends data to a topic, and any number of subscribing nodes can receive that data. This publish-subscribe model is asynchronous and decoupled—the publisher does not need to know who is listening, and subscribers do not need to know who is publishing. This is ideal for continuous data streams, like sensor readings or status updates, where timing flexibility is more important than guaranteed receipt of every single message.

For situations that require a direct, synchronous request-response interaction, ROS 2 provides services. A service is defined by a pair of messages: a request and a response. A client node sends a request message to a server node, which processes it and returns a response. Services are perfect for operations like triggering a specific action, querying the current state, or commanding a one-time configuration change. They are blocking from the client's perspective—the client waits for the server to reply—which makes them less suitable for high-frequency data but excellent for discrete transactions.

Sometimes, a robot needs to initiate a long-running operation that provides feedback as it executes, such as navigating to a waypoint or picking up an object. For these scenarios, ROS 2 offers actions. An action is built on top of topics and services, combining a goal request, a result response, and a feedback topic. A client sends a goal, the server acknowledges it, and then the server can periodically publish feedback messages while it works. Once complete, it sends a final result. This pattern is essential for tasks that take measurable time and where the caller needs to monitor progress or have the ability to cancel the operation mid-execution.

Beyond communication, nodes often need configurable parameters—settings that can be adjusted at runtime without recompiling code. ROS 2 has a global parameter system where each node can declare, get, and set parameters. Parameters can hold simple types like integers, strings, and booleans, or more complex structures. A node might expose parameters for its control loop frequency, a file path for a model, or tuning gains for a filter. This system allows for flexible deployment and tuning, and parameters can be set individually via command-line tools, launch files, or programmatically from other nodes.

The magic that enables all these nodes to communicate, whether on a single machine or across a network, is the ROS 2 middleware layer. This layer is an implementation of the DDS standard. DDS is a specification for real-time, scalable, and deterministic data exchange, used in industries from aerospace to industrial automation. ROS 2 does not implement DDS itself but provides a standardized interface, the RMW (ROS Middleware Interface), that abstracts over different DDS vendor implementations like eProsima Fast DDS, RTI Connext, or Eclipse Cyclone DDS. This abstraction gives developers choice and allows ROS 2 to leverage the robust networking features already proven in other mission-critical domains.

One of the most significant advantages DDS brings is its rich Quality of Service (QoS) policies. QoS settings allow you to fine-tune the communication characteristics for each topic. You can specify whether a topic should be reliable (guaranteeing delivery of every message) or best-effort (prioritizing low latency over completeness). You can set deadlines for message arrival, define the durability of messages for late-joining subscribers, and control the lifespan of data. Choosing the right QoS profile is not an academic exercise; it directly impacts system behavior, latency, and resource usage, and mismatched QoS settings are a frequent source of debugging headaches.

Discovery is the process by which nodes find each other and establish communication. In ROS 2, this is handled automatically by the DDS layer. Nodes announce their presence, the topics they publish or subscribe to, and their QoS settings. This discovery can occur on a single machine via shared memory for blazing-fast communication, or across a network using multicast or unicast UDP. The decentralized nature means there is no single point of failure, but it also means network configuration, firewalls, and domain ID settings become critical considerations for distributed systems.

The build system for ROS 2 is ament, a set of tools built on top of CMake for C++ and setuptools for Python. Ament introduces the concept of a package, the fundamental unit of software organization in ROS 2. A package contains nodes, libraries, configuration files, launch files, and tests. Packages declare their dependencies on other packages, and ament ensures everything is built in the correct order. This structured approach to dependency management is vital for creating maintainable systems where components can be reused and versioned independently.

Closely related to packaging is the concept of the workspace. A ROS 2 workspace is a directory containing a set of packages that you are developing or using together. The colcon build tool is the standard command-line utility for compiling all packages in a workspace. It understands the dependency graph between packages and orchestrates the build process efficiently. Managing workspaces correctly—understanding overlay and underlay concepts—is key to working with multiple projects or different versions of libraries without conflicts.

ROS 2 is not just a communication framework; it comes with a rich ecosystem of built-in tools and client libraries. The core client libraries are rclcpp for C++ and rclpy for Python, which provide the API for writing nodes, creating publishers and subscribers, and interacting with the ROS 2 graph. Alongside these are essential command-line tools: ros2 node for inspecting active nodes, ros2 topic for echoing and publishing messages, ros2 service for calling services, and ros2 launch for starting complex systems from configuration files. These tools are your first line of defense for debugging and introspection.

The launch system deserves special mention. While a single node can be started manually, a real robot system might involve dozens of nodes, parameter files, and remapping rules. ROS 2 launch files, written in Python, allow you to describe the entire system startup procedure declaratively. You can specify which nodes to run, set their parameters, define namespace rules, and even orchestrate conditional startup sequences. Mastering launch files is essential for moving beyond toy examples and managing the complexity of real deployments.

Another critical concept inherited from ROS 1 but significantly enhanced is tf2, the transform library. Robots are assemblies of moving parts, and tf2 provides a standardized way to keep track of the spatial relationships between coordinate frames over time. It maintains a dynamic, time-buffered tree of transforms—like "where is the robot's base_link relative to the world frame, and where is the camera relative to the base_link?" This library is indispensable for sensor fusion, navigation, manipulation, and any task that requires understanding where things are in space.

It's important to recognize that ROS 2 is a project with a defined release cadence, offering both rolling distributions for bleeding-edge development and Long-Term Support (LTS) distributions like Humble Hawksbill and Iron Irwini for stable deployment. Choosing a distribution is a strategic decision that balances access to new features against the need for stability and community support. The ecosystem also includes a vast repository of community-contributed packages, from driver libraries for specific hardware to complete navigation and manipulation stacks, which can dramatically accelerate development.

The philosophy underpinning ROS 2 is composability. The framework encourages you to break down your robot's software into small, cohesive nodes with clear responsibilities. This modularity facilitates testing, parallel development, and code reuse. However, it also introduces architectural questions: how many nodes should a single process contain? How do you manage their lifecycle? These topics, involving node composition and managed nodes, are advanced patterns we will explore later, but it's useful to know they exist as tools for optimizing performance and resource usage.

From an engineering perspective, adopting ROS 2 is about embracing a set of conventions and interfaces that promote separation of concerns. Your perception team doesn't need to know the intricacies of your motor controller, only the interface (the topic, service, or action) it exposes. This contractual approach to software design reduces coupling and makes the system easier to reason about, test, and evolve. It shifts the challenge from "how do I make these components talk?" to "what information do they need to exchange, and with what guarantees?"

As we proceed through this book, we will dive deep into each of these concepts. We will see how to structure packages for longevity, design robust message interfaces, tune QoS for different data types, and choose the right DDS implementation for our network. We will explore how to write real-time-safe code, integrate hardware through ros2_control, and build comprehensive test suites. Each chapter builds upon this foundational understanding, applying these core concepts to solve the practical engineering challenges of building robots that are not just functional, but reliable, scalable, and maintainable. The ecosystem is your toolbox; learning to use it effectively is the craft of robotics software engineering.

This is a sample preview. The complete book contains 27 sections.

Table of Contents

Robotics Software Engineering with ROS 2

Table of Contents

Introduction

CHAPTER ONE: The ROS 2 Ecosystem and Core Concepts