Parallel Programming Patterns: Multicore and GPU Solutions
MTA
Techniques and patterns for exploiting parallelism on CPUs and GPUs to scale performance
2nd Edition
Modern parallel programming has transitioned from a niche optimization to a fundamental requirement for scaling performance on multicore CPUs and manycore GPUs. This book provides a pattern-driven framework for navigating this landscape, moving from basic concurrency models to advanced hardware-specific optimizations. It begins by establishing a foundation in parallel thinking, emphasizing work decomposition and the management of shared memory. By distinguishing between task-based, data-parallel, and pipeline models, the text equips developers to select the right abstraction for a given workload while navigating the complexities of cache coherence, memory ordering, and synchronization primitives like locks, atomics, and barriers.
The book places a significant focus on GPU architecture and its Single Instruction, Multiple Threads (SIMT) model, detailing how to leverage thousands of threads through frameworks like CUDA, HIP, and OpenCL. It demystifies hardware-specific concepts such as warps, occupancy, and coalesced memory access, while also introducing cross-platform layers like SYCL and Kokkos for performance portability. Practical data-parallel patterns—specifically Map, Reduce, Scan, and Stencil—are explored in depth, alongside strategies for handling irregular workloads like graph analytics and sparse matrix operations. This comprehensive technical coverage ensures that developers can translate massive hardware parallelism into real-world throughput.
Performance is presented as a discipline of measurement rather than intuition. The text guides readers through profiling techniques, hardware counters, and the "roofline model" to identify whether a program is compute-bound or memory-bound. Detailed chapters on memory optimization address locality, NUMA effects, and the critical need to overlap computation with data movement through streams and pipelining. By integrating these strategies with rigorous debugging for races, deadlocks, and "Heisenbugs," the book moves beyond mere execution speed to address the reliability and determinism required for production-grade software.
Finally, the book adopts a holistic view of software engineering by connecting parallel patterns to broader goals of portability, maintainability, and energy efficiency. Through scalability case studies ranging from desktop applications to distributed clusters, it illustrates how hierarchical parallelism (combining MPI, OpenMP, and GPU kernels) is used to solve large-scale problems. The ultimate goal is to cultivate a disciplined approach to parallel design, enabling developers to build software that is not only fast but also resilient and adaptable to the evolving landscape of heterogeneous hardware.
This book is for software developers, systems programmers, data scientists, and application developers who need to optimize code performance by leveraging multicore CPUs and GPUs. It is particularly beneficial for those looking to understand the fundamental principles, common patterns, and practical techniques required to write correct, scalable, and maintainable parallel programs in heterogeneous computing environments.
MixCache.com
January 14, 2026
68,253 words
4 hours 47 minutes
$6.99 USD