Hands-On Lab Manual: 50 OpenClaw Agent Projects
Table of Contents
- Introduction
- Chapter 1 OpenClaw Setup, Workbench, and Hello Agent
- Chapter 2 Prompting Patterns for Chat Agents
- Chapter 3 Tool Use and Action Design
- Chapter 4 Memory, State, and Context Windows
- Chapter 5 Retrieval-Augmented Agents with Local Docs
- Chapter 6 Evaluation Basics: Unit Tests for Agents
- Chapter 7 Multi‑Turn Dialogue and Persona Conditioning
- Chapter 8 Structured Outputs and Schema Enforcement
- Chapter 9 Deterministic Workflows with Programmatic Control
- Chapter 10 Sensor Streams: Ingest, Parse, and Simulate
- Chapter 11 Reactive Controllers for Robotics and IoT
- Chapter 12 Planning, Task Decomposition, and Executors
- Chapter 13 Multi‑Agent Collaboration and Roles
- Chapter 14 Vision‑Enabled Agents for Images and Video
- Chapter 15 Speech Interfaces: ASR, TTS, and Voice Agents
- Chapter 16 Reinforcement Signals and Online Feedback Loops
- Chapter 17 Safety, Guardrails, and Red‑Team Scenarios
- Chapter 18 Privacy, Security, and Data Governance
- Chapter 19 Federated Learning for Edge‑Deployed Agents
- Chapter 20 Continual Learning, Bandits, and A/B Loops
- Chapter 21 Benchmarking and Metrics at Scale
- Chapter 22 Observability: Logging, Tracing, and Telemetry
- Chapter 23 Packaging and Deployment to Cloud and Edge
- Chapter 24 CI/CD and Automation for Agent Pipelines
- Chapter 25 Production Playbooks, SRE, and Incident Response
Introduction
This manual was built for practitioners who learn best by doing. Inside, you will find 50 concise, reproducible projects that take you from first principles to production systems using OpenClaw. Each project focuses on a specific competency—chat agents, sensor‑driven controllers, federated learners, or end‑to‑end pipelines—so you can acquire skills in focused, testable increments. Whether you are an individual learner or an instructor planning a hands‑on course, the structure is intentionally modular: every chapter presents two projects that can be completed independently or sequenced into a coherent pathway.
We prioritize clarity and reproducibility. Every project supplies a short brief with learning objectives, a code sketch to get you unblocked quickly, and evaluation criteria so you know when you are “done.” The goal is not to hand you polished, opaque solutions; it is to give you runnable scaffolds that invite experimentation. You will see patterns repeat—prompt design, tool wiring, state handling, telemetry, and tests—so that the muscle memory of shipping agents becomes second nature.
Coverage follows a deliberate gradient. Early chapters center on chat agents and core mechanics: prompts, tools, memory, and structured outputs. Midway, we shift to perception and control, integrating sensors, vision, and audio while introducing planning, multi‑agent collaboration, and reactive loops. In the latter chapters, we emphasize real‑world rigor: safety and privacy, federated and continual learning, evaluation at scale, and the operational disciplines—observability, deployment, and incident response—that turn prototypes into reliable services.
You can approach the book linearly or à la carte. If you are new to OpenClaw, begin with setup and the “Hello Agent,” then progress through prompting, tools, and memory before tackling retrieval‑augmented tasks. If you are building embodied or IoT systems, jump ahead to sensor streams and reactive controllers. If your mandate is reliability and scale, prioritize chapters on benchmarking, telemetry, CI/CD, and SRE playbooks. Each project lists prerequisites and estimated effort so you can plan your learning sprints.
A few conventions keep the labs consistent. Code sketches favor readability over cleverness and highlight only the essential glue; you are encouraged to swap components to fit your environment. Evaluation rubrics include both functional checks (did the agent meet its objectives?) and operational checks (is it observable, testable, and safe?). Where trade‑offs exist—latency versus accuracy, autonomy versus control—we call them out explicitly and invite you to measure the impact with small, targeted experiments.
Finally, this is a manual for building judgment as much as code. Agents interact with people, data, and devices in ways that carry consequences. Throughout the projects you will practice setting boundaries, auditing outcomes, and documenting assumptions. By the end, you will not only have a portfolio of 50 working OpenClaw projects, but also a practical framework for deciding what to build, how to evaluate it, and how to run it responsibly in the real world.
Chapter One: OpenClaw Setup, Workbench, and Hello Agent
Welcome to the ground floor of OpenClaw! This chapter is your initiation into the practical world of building intelligent agents. We'll start with the absolute essentials: getting OpenClaw up and running on your machine, familiarizing ourselves with the development environment, and then, with a celebratory flourish, crafting our very first "Hello Agent." Think of this as laying the foundation before we start building skyscrapers of agentic intelligence. You wouldn't try to bake a cake without preheating the oven, would you? (Unless you're into raw dough, which, no judgment, but it's not what we're aiming for here.)
Our journey begins with setting up the OpenClaw development environment. While the exact steps might vary slightly depending on your operating system—whether you're a Windows warrior, a macOS maestro, or a Linux loyalist—the core principles remain consistent. We'll aim for a setup that is robust, easily reproducible, and ready to tackle the projects that lie ahead. Forget the days of arcane dependency management and cryptic error messages; our goal is a smooth, friction-free installation that gets you coding agents faster than you can say "artificial intelligence."
The primary tool in our arsenal will be the OpenClaw SDK. This comprehensive kit provides all the necessary libraries, tools, and utilities to develop, test, and deploy OpenClaw agents. We’ll walk through the installation process step-by-step, ensuring every component is correctly configured. This often involves downloading the SDK installer, running it, and then verifying the installation through a simple command-line check. It’s not rocket science, but paying attention to the details here will save you headaches down the line. We’re aiming for a pristine environment, free from the digital dust bunnies that can plague a developer's workspace.
Next, we'll configure our Python environment. OpenClaw agents are predominantly built using Python, so a well-managed Python installation is crucial. We'll recommend using a virtual environment to isolate our project dependencies. If you're new to virtual environments, think of them as sterile, self-contained bubbles for your Python projects. They prevent dependency conflicts between different projects and keep your global Python installation squeaky clean. This is a best practice that will serve you well, not just with OpenClaw, but with any Python development endeavor. We’ll show you how to create, activate, and manage these environments, making dependency management a breeze rather than a wrestling match.
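The usual approach is python -m venv from the command line; if you prefer to keep everything in Python, the standard library exposes the same machinery. The sketch below creates the environment; activation still happens in your shell (typically source .venv/bin/activate on macOS/Linux, or .venv\Scripts\activate on Windows).

# Create an isolated environment with the standard library's venv module.
# Equivalent to running: python -m venv .venv
import venv

venv.create(".venv", with_pip=True)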
Once Python is sorted, we'll install the OpenClaw Python package. This is typically a straightforward pip install command, but we'll also cover any specific requirements or optional components that might enhance your development experience. Sometimes, additional packages are needed for specific functionalities, like interacting with certain hardware or cloud services. We’ll make sure you have the core components ready to roll, and then highlight how you can extend your setup as your projects become more ambitious. No surprises, just clear instructions to get you from zero to agent-ready.
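Once installed, a quick import check confirms the environment is wired up correctly. We assume here that the package is importable as openclaw and exposes a __version__ attribute, as the sketches throughout this book do:

# Smoke test: verify the OpenClaw package is importable from the active environment.
# The openclaw name and __version__ attribute are assumptions used in our sketches.
import openclaw as oc

print(f"OpenClaw version: {oc.__version__}")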
With the core software in place, we'll turn our attention to the OpenClaw Workbench. This is your primary integrated development environment (IDE) for building and experimenting with OpenClaw agents. The Workbench provides a rich set of features, including code editing, debugging tools, agent simulation capabilities, and visualization dashboards. It's where you'll spend most of your time crafting agent logic, observing their behavior, and refining their intelligence. We’ll explore its interface, highlighting key areas and functionalities that will become your daily companions. Think of it as your agent control center, a place where you command and observe your digital creations.
Navigating the Workbench interface will be our next order of business. We’ll show you how to create new agent projects, open existing ones, and manage your project files. The Workbench typically organizes projects into a hierarchical structure, making it easy to keep your code, configurations, and data neatly arranged. We’ll also delve into the code editor itself, demonstrating features like syntax highlighting, autocompletion, and integrated documentation, all designed to boost your productivity. A well-configured editor is like a well-oiled machine; it simply makes everything run more smoothly.
Debugging is an inevitable part of software development, and agent development is no exception. The OpenClaw Workbench comes equipped with powerful debugging tools that allow you to step through your agent's code, inspect variables, and set breakpoints. We'll demonstrate how to use these tools effectively to identify and resolve issues in your agent's logic. Understanding how your agent thinks (or doesn't think, in the case of a bug) is crucial for building robust and reliable systems. Consider the debugger your magnifying glass, helping you pinpoint those elusive errors hiding in plain sight.
Beyond just coding, the Workbench also offers agent simulation capabilities. This allows you to test your agents in a controlled environment before deploying them to real-world scenarios. You can define various input conditions, observe your agent's responses, and analyze its decision-making process. This iterative simulation and testing cycle is fundamental to developing effective agents. It's like a sandbox where your agents can play and learn without causing any real-world mischief. We’ll run through a basic simulation to illustrate how it all works.
Visualizations are another powerful feature of the OpenClaw Workbench. As your agents become more complex, understanding their internal state and decision flow can be challenging. The Workbench often provides dashboards and visualizers that display key agent metrics, sensor readings, and interaction logs in an intuitive format. These visual aids can be invaluable for gaining insights into your agent's behavior and identifying areas for improvement. A picture is worth a thousand lines of log files, especially when you're trying to debug an inscrutable agent.
Now, for the main event: our "Hello Agent." Just as "Hello World" is the traditional first program in many programming languages, "Hello Agent" will be your inaugural OpenClaw creation. This project, while simple, will solidify your understanding of the basic agent structure, how to define an agent's intent, and how to get it to respond. We'll start with a minimalist agent that simply echoes back a greeting. This might seem trivial, but it establishes the fundamental communication loop that underpins all OpenClaw agents.
The core of our "Hello Agent" will involve defining a simple agent class. This class will inherit from a base OpenClaw agent class, providing it with the fundamental capabilities of an agent. Within this class, we'll implement a method that processes incoming messages or prompts. For our "Hello Agent," this method will be incredibly straightforward: it will take a user's input and construct a polite, pre-defined response. This is your first foray into giving an agent a voice, however small that voice may be.
Our code sketch for "Hello Agent" will be deliberately concise. We'll focus on the essential components: importing the necessary OpenClaw modules, defining our agent class, and implementing the response logic. You'll see how easy it is to instantiate an agent and interact with it. We'll also demonstrate how to run this agent within the Workbench, sending it a test message and observing its output. This immediate feedback loop is crucial for understanding how your agent behaves in practice.
import openclaw as oc

class HelloAgent(oc.Agent):
    def __init__(self, name="Greeter"):
        super().__init__(name=name)

    def process_message(self, message: str) -> str:
        if "hello" in message.lower() or "hi" in message.lower():
            return "Hello there! How can I help you today?"
        else:
            return "I'm a simple greeter agent. Try saying hello!"
After writing our "Hello Agent," we'll discuss evaluation criteria. For this initial project, evaluation is straightforward: does the agent respond appropriately to greetings? Does it handle other inputs gracefully? While seemingly basic, this exercise reinforces the concept of testing your agent's behavior. As we progress through more complex projects, our evaluation criteria will become more sophisticated, but the principle of verifying an agent's performance remains constant. You wouldn't launch a rocket without checking its trajectory, right?
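To make that functional check concrete, here is a minimal script-level test you can run alongside the sketch above (it assumes the HelloAgent class is in scope and that the oc.Agent base needs no extra setup):

# Minimal functional checks for the HelloAgent sketch above.
agent = HelloAgent()
assert "Hello there" in agent.process_message("Hi!")
assert "greeter" in agent.process_message("What's for lunch?")
print("HelloAgent behaves as expected.")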
Beyond just a functional check, we'll also touch upon basic operational considerations. For our "Hello Agent," this might involve ensuring the agent starts up without errors and that its responses are consistent. As we move into later chapters, these operational checks will expand to include aspects like logging, error handling, and resource utilization. The goal is not just to build agents that work, but agents that work reliably and predictably.
Troubleshooting is an inevitable part of any development process. We'll briefly discuss common issues you might encounter during setup or with your first "Hello Agent," such as missing dependencies, incorrect path configurations, or simple typos in your code. We'll provide tips on how to diagnose these problems and where to look for solutions, often pointing to OpenClaw's official documentation or community forums. Remember, every developer faces bugs; the mark of a good developer is how effectively they overcome them.
By the end of this chapter, you'll have a fully functional OpenClaw development environment, a solid understanding of the Workbench, and your very first agent up and running. This foundational knowledge will be your springboard for the more intricate and fascinating projects that lie ahead. You've taken the first crucial step into the world of agent development, and trust us, it only gets more interesting from here. Consider your mission, should you choose to accept it, successfully initiated. Now, let's get ready to build some truly intelligent systems.
Chapter Two: Prompting Patterns for Chat Agents
Having successfully ushered our "Hello Agent" into existence, we now stand at the threshold of a much more fascinating realm: enabling our agents to engage in meaningful conversations. This, my friends, is the art and science of prompting. Think of prompts as the whispered instructions you give to a highly intelligent but somewhat naive genie. The genie wants to grant your wish, but the clarity and precision of your request determine whether you get a palace or a pile of pickles. In this chapter, we'll transform our agents from simple greeters into conversationalists, focusing on the fundamental techniques that unlock their linguistic prowess.
Our journey into prompting begins with understanding the very nature of a "chat agent." Unlike our basic "Hello Agent," which had a single, hardcoded response, a true chat agent needs to understand intent, extract information, and generate relevant, coherent replies. This is where the magic of Large Language Models (LLMs) comes into play. OpenClaw agents leverage these powerful models as their linguistic brains, but these brains, while brilliant, still need careful guidance to perform optimally. It's like having a master chef at your disposal; you wouldn't just say "cook something," would you? You'd provide ingredients, desired flavors, and perhaps even a recipe.
At its core, prompting is about designing the input that you feed to an LLM to elicit a desired output. It's a delicate dance of instruction and context. The quality of your prompt directly correlates with the quality of your agent's response. A vague prompt will lead to a vague answer, a biased prompt to a biased answer, and a perfectly crafted prompt to something truly remarkable. We'll explore various strategies to make our prompts effective, ensuring our OpenClaw agents don't just talk, but communicate with purpose and precision.
One of the foundational concepts in prompting is the idea of "system messages" or "persona conditioning." Before an agent even processes a user's query, we can provide it with an initial set of instructions that define its role, its tone, and its overall behavioral guidelines. Imagine giving your agent a job description and a style guide. This system message sets the stage for all subsequent interactions, ensuring consistency and guiding the agent's responses within desired parameters. For instance, we could instruct an agent to always be helpful and polite, or perhaps to adopt the persona of a sardonic tea connoisseur. The possibilities, as you might guess, are endless and often quite amusing.
Let's dive into a practical application of system messages. We'll start by modifying our basic agent to include a system message that establishes a helpful assistant persona. This simple addition will dramatically change the agent's conversational style, moving it beyond mere pattern matching. Instead of just reacting to keywords, it will begin to understand its role and respond accordingly. This is a crucial step in moving from rudimentary agents to those capable of genuine interaction.
import openclaw as oc

class HelpfulChatAgent(oc.Agent):
    def __init__(self, name="HelpfulAssistant"):
        super().__init__(name=name)
        self.system_message = "You are a helpful and polite assistant. Always try to provide clear and concise answers."

    def process_message(self, user_message: str) -> str:
        # For now, we'll concatenate the system message with the user message.
        # In later chapters, we'll see more sophisticated ways to manage context.
        prompt = f"{self.system_message}\nUser: {user_message}\nAssistant:"
        # In a real OpenClaw agent, this would involve sending the prompt to an LLM.
        # For this exercise, we'll simulate a simple LLM response.
        if "hello" in user_message.lower() or "hi" in user_message.lower():
            return "Hello! I am a helpful assistant. How can I assist you today?"
        elif "time" in user_message.lower():
            return "I don't have access to real-time information, but I can tell you it's always a good time to learn about OpenClaw!"
        else:
            return "That's an interesting question! Could you please rephrase it, or tell me more about what you're trying to achieve?"
As you can see, even with a simulated LLM, the introduction of the system_message field in our HelpfulChatAgent immediately frames the agent's expected behavior. When run within the OpenClaw Workbench, you’ll observe that the agent's responses, while still rudimentary in this example, attempt to align with the "helpful and polite" directive. This is a powerful concept: guiding the agent's entire conversational approach from the outset.
Another critical prompting pattern is "few-shot prompting." LLMs are incredibly adept at learning from examples. Instead of just giving a general instruction, we can provide a few input-output pairs that demonstrate the desired behavior. Think of it as showing, not just telling. If we want our agent to summarize text in a particular style, we can provide a couple of examples of input text and their corresponding summaries in the desired style. The LLM then uses these examples to infer the underlying pattern and apply it to new, unseen inputs. It's like giving a student a few solved problems before asking them to tackle new ones on their own.
Let's craft an agent that uses few-shot prompting to categorize customer feedback. We'll provide a few examples of feedback and their corresponding categories (e.g., "Bug Report," "Feature Request," "General Inquiry"). This will enable our agent to classify new feedback with greater accuracy and consistency, even without explicit rules for each category. It's an elegant way to teach an LLM a specific task without having to write complex conditional logic.
import openclaw as oc

class FeedbackClassifierAgent(oc.Agent):
    def __init__(self, name="FeedbackClassifier"):
        super().__init__(name=name)
        self.system_message = "You are a customer feedback classifier. Categorize the user's feedback into one of the following: 'Bug Report', 'Feature Request', 'General Inquiry'."
        self.examples = [
            {"input": "The app crashes every time I open the camera.", "output": "Bug Report"},
            {"input": "I would love to see a dark mode option in the settings.", "output": "Feature Request"},
            {"input": "How do I reset my password?", "output": "General Inquiry"},
            {"input": "The login screen freezes sometimes.", "output": "Bug Report"},
            {"input": "Can you add support for more payment methods?", "output": "Feature Request"}
        ]

    def process_message(self, user_feedback: str) -> str:
        prompt_parts = [self.system_message]
        for example in self.examples:
            prompt_parts.append(f"User Feedback: {example['input']}\nCategory: {example['output']}")
        prompt_parts.append(f"User Feedback: {user_feedback}\nCategory:")
        prompt = "\n\n".join(prompt_parts)
        # In a real OpenClaw agent, this would involve sending the prompt to an LLM.
        # For this exercise, we'll simulate a simple LLM response based on keywords.
        user_feedback_lower = user_feedback.lower()
        if "crash" in user_feedback_lower or "bug" in user_feedback_lower or "freeze" in user_feedback_lower:
            return "Bug Report"
        elif "feature" in user_feedback_lower or "add" in user_feedback_lower or "option" in user_feedback_lower:
            return "Feature Request"
        else:
            return "General Inquiry"
In the FeedbackClassifierAgent, we’re explicitly constructing a prompt that includes our system message followed by several user_feedback and Category pairs. This effectively trains the simulated LLM (in a real scenario, the actual LLM) on the desired classification task. When you run this in the Workbench, observe how the agent's simulated response attempts to classify based on the patterns presented in the examples, even for new, unseen feedback. This "learning by example" is a cornerstone of effective LLM interaction.
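As a quick sanity check, you can exercise the classifier sketch directly on feedback it has never seen; the expected categories are noted in the comments:

# Exercise the FeedbackClassifierAgent sketch on unseen feedback.
classifier = FeedbackClassifierAgent()
print(classifier.process_message("The app freezes on startup."))       # Bug Report
print(classifier.process_message("Please add a CSV export option."))   # Feature Request
print(classifier.process_message("Where can I find my receipts?"))     # General Inquiry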
Beyond system messages and few-shot examples, another potent prompting technique is "chain-of-thought" (CoT) prompting. This involves guiding the LLM to articulate its reasoning process before providing the final answer. Instead of just asking for the solution, you ask the LLM to "think step by step." This often leads to more accurate and reliable responses, especially for complex tasks that require multiple logical steps. It's like asking a student to show their work on a math problem; it helps uncover any errors in their reasoning and ensures a more robust solution.
Consider a scenario where our agent needs to perform a multi-step calculation or analyze a short passage of text to answer a question. By asking the agent to break down its thought process, we not only improve its performance but also gain valuable insights into its internal workings. This transparency can be incredibly helpful for debugging and refining agent behavior. It's like peeking behind the curtain to see the wizard at work, rather than just accepting the magic at face value.
Let's build an agent that uses chain-of-thought to answer simple arithmetic problems. Instead of just giving the answer, it will show the steps it took to arrive at that answer. This demonstrates how CoT prompting can make an agent's reasoning more explicit and verifiable.
import openclaw as oc

class CoTMathAgent(oc.Agent):
    def __init__(self, name="CoTMathSolver"):
        super().__init__(name=name)
        self.system_message = "You are a math solver agent. For any arithmetic problem, first, explain your step-by-step reasoning, and then provide the final answer."

    def process_message(self, user_query: str) -> str:
        # In a real OpenClaw agent, this would involve sending the prompt to an LLM.
        # For this exercise, we'll simulate a step-by-step reasoning process.
        user_query_lower = user_query.lower()
        if "add" in user_query_lower and "and" in user_query_lower:
            try:
                # e.g., "add 5 and 3" -> ["5", "3"]
                parts = user_query_lower.split("add ")[1].split(" and ")
                num1 = int(parts[0].strip(" .?!"))
                num2 = int(parts[1].strip(" .?!"))
                steps = (
                    f"Step 1: Identify the numbers to be added: {num1} and {num2}.\n"
                    f"Step 2: Perform the addition operation: {num1} + {num2}.\n"
                    f"Step 3: Calculate the sum."
                )
                answer = num1 + num2
                return f"{steps}\nFinal Answer: {answer}"
            except (ValueError, IndexError):
                return "I'm sorry, I couldn't parse that addition problem. Please make sure to specify two numbers to add."
        elif "subtract" in user_query_lower and "from" in user_query_lower:
            try:
                # e.g., "subtract 2 from 7" -> ["2", "7"]
                parts = user_query_lower.split("subtract ")[1].split(" from ")
                num1 = int(parts[0].strip(" .?!"))
                num2 = int(parts[1].strip(" .?!"))
                steps = (
                    f"Step 1: Identify the numbers: subtract {num1} from {num2}.\n"
                    f"Step 2: Perform the subtraction operation: {num2} - {num1}.\n"
                    f"Step 3: Calculate the difference."
                )
                answer = num2 - num1
                return f"{steps}\nFinal Answer: {answer}"
            except (ValueError, IndexError):
                return "I'm sorry, I couldn't parse that subtraction problem. Please make sure to specify two numbers."
        else:
            return "I can help with simple addition and subtraction. Please try phrases like 'add 5 and 3' or 'subtract 2 from 7'."
In our CoTMathAgent, the system message explicitly instructs the agent to explain its reasoning. While our current simulation provides a hardcoded "step-by-step" output, a real LLM integration would generate this reasoning dynamically. This approach not only makes the agent's output more transparent but also often improves its accuracy on complex tasks by forcing it to organize its internal computations.
The nuances of prompting extend to controlling the "temperature" and "top-p" parameters of the underlying LLM. While we're simulating responses for clarity in these early projects, it's important to know these controls exist. Temperature, for instance, influences the randomness of the output. A high temperature makes the agent more creative and varied in its responses, while a low temperature makes it more deterministic and focused. Top-p, or nucleus sampling, is another way to control the diversity of the output by selecting from a smaller, more probable set of words. Understanding and experimenting with these parameters can fine-tune your agent's conversational style, making it more engaging or more factual, depending on your project's needs.
For our projects, we will often aim for a balance, allowing for some creativity while maintaining factual grounding. However, in applications where strict factual accuracy is paramount, such as a medical information agent, you would lean towards lower temperature settings. Conversely, for a creative writing assistant, you might crank up the temperature to encourage more imaginative outputs. The OpenClaw framework typically exposes these parameters through its LLM integration modules, allowing for granular control over the agent's "personality."
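The exact parameter names OpenClaw's LLM integration modules expose aren't shown in this preview, so treat the following as an illustrative convention that mirrors common LLM APIs rather than a documented OpenClaw schema:

# Illustrative sampling presets; the keys mirror common LLM APIs,
# not a confirmed OpenClaw configuration schema.
FACTUAL_SAMPLING = {"temperature": 0.2, "top_p": 0.9}    # focused, repeatable answers
CREATIVE_SAMPLING = {"temperature": 0.9, "top_p": 1.0}   # varied, exploratory answers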
Another practical prompting pattern is "constrained generation." Sometimes, you don't just want a free-form answer; you want the agent to adhere to a specific format or output type. For example, you might want the agent to always respond with a JSON object, or to select an answer from a predefined list of options. This is crucial for integrating agents into larger systems where structured data is expected. We can achieve this by explicitly instructing the LLM about the desired output format within the prompt itself.
Imagine an agent designed to book appointments. We wouldn't want it to respond with a poetic description of the booking process; we'd want a structured confirmation, perhaps including the date, time, and service. By adding instructions like "Respond only with a JSON object containing 'date', 'time', and 'service' fields," we can guide the LLM to produce the exact output format we need for seamless integration with other software components.
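Before the simpler comma-separated example below, here is a minimal sketch of the JSON-constrained variant of that booking agent: a format-constraining instruction plus a strict parser that rejects replies straying from the schema. The prompt wording and the helper are illustrative, not OpenClaw APIs.

import json

# Instruction that constrains the model's output format (illustrative wording).
BOOKING_SYSTEM_MESSAGE = (
    "You are an appointment booking agent. Respond only with a JSON object "
    "containing 'date', 'time', and 'service' fields."
)

def parse_booking_reply(reply: str) -> dict:
    """Parse a constrained reply, failing loudly if the schema is violated."""
    booking = json.loads(reply)
    missing = {"date", "time", "service"} - set(booking)
    if missing:
        raise ValueError(f"Reply is missing required fields: {missing}")
    return booking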
Let’s illustrate constrained generation by creating an agent that extracts key information from a user's request and formats it as a simple, comma-separated string. This is a basic form of structured output that can be easily parsed by other programs.
import openclaw as oc

class InfoExtractorAgent(oc.Agent):
    def __init__(self, name="InfoExtractor"):
        super().__init__(name=name)
        self.system_message = "You are an information extraction agent. Extract the user's name and their inquiry topic. Respond only with 'Name: [name], Topic: [topic]'."

    def process_message(self, user_request: str) -> str:
        # In a real OpenClaw agent, this would involve sending the prompt to an LLM.
        # For this exercise, we'll simulate extraction based on keywords.
        user_request_lower = user_request.lower()
        name = "Unknown"
        topic = "General"
        # Simple keyword-based name extraction (very basic for simulation)
        if "my name is " in user_request_lower:
            name_start_index = user_request_lower.find("my name is ") + len("my name is ")
            name_end_index = user_request_lower.find(".", name_start_index)
            if name_end_index == -1:  # No period found, assume name goes to end
                name_end_index = len(user_request_lower)
            name = user_request[name_start_index:name_end_index].strip().title()
        if "about an order" in user_request_lower:
            topic = "Order Inquiry"
        elif "technical issue" in user_request_lower or "problem with" in user_request_lower:
            topic = "Technical Support"
        elif "billing question" in user_request_lower or "invoice" in user_request_lower:
            topic = "Billing"
        return f"Name: {name}, Topic: {topic}"
The InfoExtractorAgent demonstrates how to explicitly instruct the agent on the desired output format using the system_message. Our simulated logic here directly constructs the string, but in a real OpenClaw setup with an LLM, the model would be encouraged to generate the output in the specified "Name: [name], Topic: [topic]" format, making it easy for downstream systems to consume.
It's also important to consider the "length" of your prompts. While LLMs have increasingly large context windows (the amount of text they can process at once), there's still a practical limit. Long, convoluted prompts can dilute the agent's focus or even exceed the model's capacity, leading to truncated or irrelevant responses. The key is to be concise and clear, providing just enough information for the agent to understand its task without overwhelming it. Think of it as writing a good executive summary: every word counts.
Furthermore, the order of information within your prompt can sometimes matter. Placing the most critical instructions or examples at the beginning of the prompt can help ensure the LLM prioritizes that information. It's a subtle but often effective way to guide the agent's attention, especially when dealing with complex or multi-faceted requests. Experimentation is your best friend here; sometimes a slight reordering can yield surprisingly better results.
Finally, a word on "negative prompting" or specifying what not to do. While generally less common than positive instructions, telling an LLM to avoid certain topics or response styles can be useful in specific scenarios. For example, you might instruct a customer service agent to "Never provide personal financial advice." This helps reinforce guardrails and ensures the agent operates within defined boundaries. It's like putting up a "Do Not Disturb" sign, but for conversational topics.
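In practice, a negative instruction is just another clause in the system message. A brief sketch of the bank example:

# A system message combining positive instructions with an explicit prohibition.
GUARDED_SYSTEM_MESSAGE = (
    "You are a customer service agent for a retail bank. "
    "Answer questions about accounts and branch services. "
    "Never provide personal financial advice; refer such questions to a licensed advisor."
)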
Throughout all these prompting patterns, the evaluation criteria remain paramount. For each agent we build, we need to ask: Does it meet the user's intent? Is the response accurate? Is it in the desired format? Is the tone appropriate? In the Workbench, you'll be able to manually test your agents with various prompts and observe their behavior. As we progress, we'll introduce more automated evaluation techniques, but for now, your keen eye and critical thinking are your most valuable tools.
By the end of this chapter, you won't just know how to type words into a prompt box; you'll understand the strategic thinking behind crafting effective prompts. You'll be able to shape your OpenClaw agents into articulate, purposeful conversationalists, ready to tackle a wide array of chat-based tasks. The journey from a simple "Hello Agent" to a sophisticated chat assistant begins with mastering these fundamental prompting patterns. So, let's keep those creative juices flowing and those prompts perfectly polished.
Chapter Three: Tool Use and Action Design
Our journey through the nascent stages of OpenClaw agent development has, so far, equipped us with the ability to set up our environment and converse with our creations. We’ve moved beyond mere “Hello World” to agents capable of understanding intent and responding with a semblance of purpose. But let’s be honest: talking is great, but acting is often better. Imagine a chef who can eloquently describe a gourmet meal but can't actually pick up a knife or turn on the stove. Such a chef, while charming, wouldn’t be very useful in a kitchen. This chapter is where our OpenClaw agents get their hands dirty, learning to interact with the world beyond their conversational boundaries by using tools and executing actions.
The concept of "tool use" for an AI agent is transformative. It's the bridge between understanding and doing. Up until now, our agents have been confined to generating text. While impressive, a purely conversational agent has limitations in a world that operates on data, APIs, and physical interactions. Tools empower agents to fetch real-time information, manipulate external systems, perform calculations, send emails, interact with databases, or even control robotic arms. Essentially, tools are the external functionalities that an agent can invoke to achieve its goals, extending its capabilities far beyond what a language model alone can accomplish.
Think of an OpenClaw agent as a highly intelligent executive assistant. They can understand your requests perfectly, but if they can't access your calendar, send an email, or look up a document, their utility remains constrained. Tools are precisely what grant that assistant the ability to perform those external tasks. In the OpenClaw framework, a tool is typically a function or an API endpoint that the agent can call. The agent learns when to use a specific tool, how to use it (what arguments to pass), and what to do with the results. This decision-making process is often guided by the underlying Large Language Model, which, through careful prompting and observation, understands the available tools and their purposes.
The design of effective tools is paramount. A poorly designed tool, much like a blunt knife, will hinder rather than help. Tools should be atomic, meaning they perform a single, well-defined operation. They should also be robust, handling expected inputs and gracefully managing errors. Clear documentation for each tool, describing its purpose, parameters, and expected output, is also crucial. This documentation is often provided to the LLM within the agent's prompt, allowing the agent to "read the manual" for its available functionalities. Without clear instructions, even the most brilliant agent might fumble with its toolkit.
Let's begin by introducing a simple tool: a "weather lookup" function. Our goal is to enable an agent to answer questions about the current weather in a given city. This requires the agent to do more than just generate a response; it needs to perform an external action – specifically, calling a function that simulates fetching weather data. This will be our first concrete step into the world of agents that can perform actions.
import openclaw as oc

def get_current_weather(city: str) -> str:
    """
    Fetches the current weather conditions for a specified city.

    Args:
        city (str): The name of the city for which to get weather.

    Returns:
        str: A description of the current weather and temperature.
    """
    city_lower = city.lower()
    if "london" in city_lower:
        return "London: Cloudy with a temperature of 10°C. Light drizzle."
    elif "new york" in city_lower:
        return "New York: Sunny with a temperature of 18°C. Mild breeze."
    elif "tokyo" in city_lower:
        return "Tokyo: Partly cloudy with a temperature of 22°C. Humid."
    else:
        return f"Weather data not available for {city} at this time."

class WeatherAgent(oc.Agent):
    def __init__(self, name="WeatherReporter"):
        super().__init__(name=name)
        self.system_message = (
            "You are a helpful weather reporting agent. "
            "You can provide current weather information for various cities by using the 'get_current_weather' tool. "
            "When a user asks about weather, call this tool and provide the result."
        )
        self.tools = {
            "get_current_weather": get_current_weather
        }

    def process_message(self, user_query: str) -> str:
        user_query_lower = user_query.lower()
        # Simple intent detection to decide if the tool should be used
        if "weather" in user_query_lower or "temperature" in user_query_lower:
            city = None
            # Very basic city extraction for simulation
            if "london" in user_query_lower:
                city = "London"
            elif "new york" in user_query_lower:
                city = "New York"
            elif "tokyo" in user_query_lower:
                city = "Tokyo"
            if city:
                # In a real OpenClaw agent, the LLM would decide to call the tool
                # and extract arguments. Here we simulate that decision.
                print(f"DEBUG: Agent decided to call get_current_weather for {city}")
                weather_info = self.tools["get_current_weather"](city)
                return f"Here is the weather for {city}: {weather_info}"
            else:
                return "I can report on weather for major cities like London, New York, or Tokyo. Which city are you interested in?"
        else:
            return "I am a weather agent. Ask me about the weather!"
In the WeatherAgent, we define a function get_current_weather that simulates an external API call. Crucially, we then register this function within the agent's self.tools dictionary. The system_message is updated to inform the agent about its new capability and the name of the tool. While our current process_message still relies on keyword detection to decide when to call the tool and what arguments to pass, in a real OpenClaw setup, the underlying LLM would dynamically make these decisions based on the user's query and the tool's description. The print(f"DEBUG: ...") line helps visualize the agent's internal "thought process" of deciding to use a tool.
The "action design" aspect comes into play with how we structure these tools. Each tool should have a clear name and a concise description of what it does, along with its parameters. This metadata is what the LLM uses to understand the tool's purpose and how to interact with it. OpenClaw typically provides decorators or specific interfaces to define these tools, making them discoverable and usable by the agent's reasoning engine. Without this structured definition, the agent wouldn't know what it has at its disposal, much like a handyman with a toolbox full of unlabeled gadgets.
Another fundamental aspect of tool use is the agent's ability to interpret the results. Once a tool is executed, its output needs to be fed back to the LLM. The LLM then integrates this information into its understanding of the conversation and formulates an appropriate response. This creates a powerful loop: the agent receives a query, decides to use a tool, executes the tool, receives the result, and then uses that result to generate a user-facing response. It's a continuous process of sensing, thinking, and acting, rather than just generating text in isolation.
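Schematically, that loop looks like the sketch below, where llm stands in for any callable that either requests a tool call or returns a final reply; the shapes of those return values are assumptions made for this sketch.

def tool_use_loop(llm, tools, user_query):
    """Schematic sense-think-act loop. `llm` is assumed to return either
    {"tool": name, "args": {...}} to request an action, or {"reply": text}."""
    decision = llm(user_query, tool_specs=list(tools))
    while "tool" in decision:
        result = tools[decision["tool"]](**decision["args"])  # act
        # Think again, now with the tool's result in hand.
        decision = llm(user_query, tool_result=result, tool_specs=list(tools))
    return decision["reply"]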
Let’s enhance our understanding by building an agent that can perform a simple calculation using a "calculator" tool. This will demonstrate how agents can offload computational tasks, which LLMs are not inherently great at, to dedicated tools. While LLMs can often "simulate" calculations, relying on a robust external calculator tool ensures accuracy and reliability, especially for complex or sensitive numerical operations.
import openclaw as oc

def calculate(expression: str) -> str:
    """
    Evaluates a simple arithmetic expression.

    Args:
        expression (str): The arithmetic expression to evaluate (e.g., "5 + 3", "10 * 2").

    Returns:
        str: The result of the calculation or an error message.
    """
    try:
        # Using eval() for demonstration, but in a production system,
        # a safer expression parser or dedicated math library would be preferred.
        result = eval(expression)
        return str(result)
    except Exception as e:
        return f"Error evaluating expression: {e}"

class CalculatorAgent(oc.Agent):
    def __init__(self, name="MathAgent"):
        super().__init__(name=name)
        self.system_message = (
            "You are a helpful math agent. "
            "You can perform arithmetic calculations using the 'calculate' tool. "
            "When a user asks you to perform a calculation, use this tool and provide the result. "
            "Always respond with the steps and the final answer."
        )
        self.tools = {
            "calculate": calculate
        }

    def process_message(self, user_query: str) -> str:
        user_query_lower = user_query.lower()
        # Simple intent detection and expression extraction
        if any(op in user_query_lower for op in ["add", "subtract", "multiply", "divide", "+", "-", "*", "/"]):
            try:
                # This is a very basic attempt to extract an expression.
                # A real agent would use more sophisticated parsing.
                expression = (
                    user_query_lower.replace("add", "+").replace("plus", "+")
                    .replace("subtract", "-").replace("minus", "-")
                    .replace("multiply", "*").replace("times", "*")
                    .replace("divide by", "/").replace("divided by", "/")
                )
                # Further simple parsing to isolate the actual math.
                # Example: "What is 5 + 3?" -> "5 + 3"
                # This is highly simplified for demonstration.
                parts = expression.split()
                math_expression_parts = []
                for part in parts:
                    if part.isdigit() or part in ["+", "-", "*", "/"]:
                        math_expression_parts.append(part)
                if not math_expression_parts:
                    return "I couldn't find a valid mathematical expression in your request. Please try again."
                math_expression = " ".join(math_expression_parts)
                print(f"DEBUG: Agent decided to call calculate with expression: {math_expression}")
                calculation_result = self.tools["calculate"](math_expression)
                # Combine the LLM's "reasoning" (simulated) with the tool result
                reasoning = (
                    "Okay, I understand you want to perform a calculation.\n"
                    f"I will use my calculator tool to evaluate the expression: {math_expression}.\n"
                    "Here's the result:"
                )
                return f"{reasoning}\nFinal Answer: {calculation_result}"
            except Exception as e:
                return f"I had trouble with that calculation. Error: {e}"
        else:
            return "I can help you with arithmetic. Try asking me to 'add 5 and 3' or 'multiply 10 by 2'."
The CalculatorAgent introduces another tool, calculate, which uses Python's eval() function to evaluate arithmetic expressions. We explicitly warn that eval() should be avoided in production for security reasons, since it will happily execute arbitrary Python; this emphasizes the difference between a pedagogical example and a real-world implementation. The system_message guides the agent to use this tool for calculations. Again, the process_message method contains simplified logic for intent detection and argument extraction. The output combines a simulated reasoning trace with the actual tool result, reinforcing the chain-of-thought pattern we discussed in the previous chapter, but now augmented with external action.
A crucial aspect of tool use is error handling. What happens if a tool fails? A robust agent should be able to detect tool failures, inform the user, and potentially suggest alternative approaches or retry the operation. This involves designing tools that return clear error messages and an agent's process_message or equivalent logic to gracefully handle these errors. Without proper error handling, an agent can become brittle and unresponsive when external systems misbehave, leading to a frustrating user experience. It's like having a helpful assistant who just stares blankly when the printer jams.
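A small sketch of that defensive pattern: wrap every tool invocation, retry once, and fall back to an honest user-facing message rather than silence.

def call_tool_safely(tool, *args, retries=1, **kwargs):
    """Invoke a tool, retrying on failure and returning an error string as a last resort."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return tool(*args, **kwargs)
        except Exception as exc:  # a real agent would catch narrower exception types
            last_error = exc
    return f"Sorry, that action failed after {retries + 1} attempts ({last_error})."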
Tool orchestration is another advanced concept. An agent might need to use multiple tools in sequence, or even make decisions about which tool to use based on the output of a previous tool. For example, an agent might first use a "search" tool to find relevant information, then a "summarize" tool to condense the search results, and finally an "email" tool to send the summary. This sequential execution, guided by the LLM's reasoning, allows for highly complex and sophisticated agent behaviors. This begins to touch on the idea of "planning," which we will delve into in more detail in a later chapter, but it's important to recognize that tool use lays the groundwork.
Consider a scenario where an agent needs to book a meeting. This might involve:
- Checking availability using a calendar_check tool.
- Suggesting times based on availability, using the LLM's reasoning.
- Confirming with the user and then booking the slot using a calendar_book tool.
- Sending an invite using an email_send tool.
Each of these steps involves a distinct tool, and the agent must orchestrate their use based on the flow of the conversation and the outcomes of previous tool calls. This multi-tool interaction is where agent intelligence truly shines, transforming a simple chatbot into a functional workflow automation system.
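Below is a linear sketch of that orchestration. The three tool names come from the list above; their signatures and return shapes are assumptions for illustration.

def book_meeting(tools, attendee: str, duration_min: int) -> str:
    """Sequential orchestration sketch; tool signatures are assumed."""
    free_slots = tools["calendar_check"](attendee, duration_min)
    if not free_slots:
        return "No mutual availability found."
    chosen = free_slots[0]  # in a real agent, the LLM proposes and the user confirms
    tools["calendar_book"](attendee, chosen)
    tools["email_send"](attendee, f"Invitation: meeting at {chosen}")
    return f"Booked {chosen} and sent the invite."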
When designing tools, it's also important to consider their granularity. Should a tool perform a very specific, small action, or a larger, more composite one? Generally, smaller, more atomic tools are preferable. They offer greater flexibility, allowing the LLM to compose complex workflows from simpler building blocks. A tool that "creates a user account" is better than a tool that "creates a user account and sends a welcome email and updates the CRM," as the latter combines too many disparate actions. The agent then has less control over each individual step.
Another important consideration is security and access control. If an agent has access to tools that can modify real-world systems (e.g., delete files, make financial transactions), robust security measures are absolutely essential. This includes authenticating the agent's access to external services, carefully scoping the permissions of each tool, and ensuring that the agent's decisions to use these tools are auditable and, where necessary, subject to human oversight. A powerful agent with unchecked access to critical systems can be a recipe for disaster.
For instance, an agent with a delete_file tool should ideally be restricted to a specific directory or require explicit user confirmation for sensitive operations. The OpenClaw framework often provides mechanisms for managing tool permissions and integrating with existing authorization systems, allowing developers to build agents that are both powerful and safe. Ignoring security considerations in agent design is akin to giving a child a loaded gun; it's irresponsible and potentially catastrophic.
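For example, a path-scoped delete_file tool might look like the following sketch (pathlib-based; is_relative_to requires Python 3.9+, and the sandbox location is an arbitrary choice):

from pathlib import Path

SANDBOX = Path("/tmp/agent_sandbox").resolve()

def delete_file(relative_path: str) -> str:
    """Delete a file only if it resolves inside the sandbox directory."""
    target = (SANDBOX / relative_path).resolve()
    if not target.is_relative_to(SANDBOX):
        return "Refused: path escapes the agent's sandbox."
    if not target.is_file():
        return f"Refused: {relative_path} is not an existing file."
    target.unlink()
    return f"Deleted {relative_path}."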
Evaluation of agents that use tools becomes more complex than simply checking conversational fluency. We now need to verify:
- Correct tool selection: Did the agent choose the right tool for the job?
- Correct argument extraction: Did it pass the correct parameters to the tool?
- Successful tool execution: Did the tool perform its intended action without errors?
- Correct interpretation of results: Did the agent correctly understand the tool's output?
- Appropriate response generation: Did it formulate a user-friendly response based on the tool's output?
This expanded evaluation matrix requires more sophisticated testing strategies, which we will explore in detail in later chapters. For now, manual inspection in the Workbench, carefully observing the agent's tool calls and their results, will be our primary method. The debug output, like the print(f"DEBUG: ...") statements we included, becomes invaluable for tracing the agent's decision-making process.
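Even before formal test harnesses, a couple of assertions catch regressions in tool selection and result interpretation. This check reuses the WeatherAgent sketch from earlier in the chapter:

# Smoke test: the agent should route a weather question through its tool
# and surface the tool's output in the reply.
agent = WeatherAgent()
reply = agent.process_message("What's the weather in London today?")
assert "London" in reply and "10°C" in reply
print("WeatherAgent tool-use check passed.")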
By the end of this chapter, you'll have moved beyond purely conversational agents to those that can perform real-world actions. You'll understand the fundamental principles of tool design, how agents select and use these tools, and the critical importance of error handling and security. This ability to integrate external functionalities fundamentally expands the scope and utility of your OpenClaw agents, transforming them from mere talkers into capable doers. Now, let's equip our agents with even more practical abilities, ready to tackle the complexities of dynamic environments.