C++17 并行算法

C++17并行算法是C++标准库在2017年标准中引入的重要特性，它通过扩展现有的STL算法，允许开发者利用多核处理器的并行计算能力提升程序性能。本指南将详细介绍其核心概念、使用方法和实际应用场景。

概述[编辑 | 编辑源代码]

C++17在`<algorithm>`和`<numeric>`头文件中为69个标准算法添加了并行执行策略支持。这些策略通过`std::execution`命名空间指定，使得算法能自动利用硬件并行性，无需手动管理线程。

核心特性[编辑 | 编辑源代码]

执行策略：控制算法的并行行为
透明优化：保持原有接口不变，仅需添加策略参数
异常安全：并行执行时异常传播机制

执行策略[编辑 | 编辑源代码]

C++17定义了三种标准执行策略：

策略	说明	适用场景
`std::execution::seq`	顺序执行（默认）	需要严格顺序或调试
`std::execution::par`	并行执行	可并行且无数据竞争的操作
`std::execution::par_unseq`	并行+向量化	允许指令级并行的无锁操作

代码示例[编辑 | 编辑源代码]

基础用法[编辑 | 编辑源代码]

#include <vector>
#include <algorithm>
#include <execution>

int main() {
    std::vector<int> data = {5, 3, 1, 4, 2};
    
    // 顺序排序
    std::sort(std::execution::seq, data.begin(), data.end());
    
    // 并行排序
    std::sort(std::execution::par, data.begin(), data.end());
    
    // 并行+向量化查找
    auto result = std::find(std::execution::par_unseq, 
                          data.begin(), data.end(), 3);
}

性能对比示例[编辑 | 编辑源代码]

比较并行与串行版本的std::transform：

#include <vector>
#include <algorithm>
#include <execution>
#include <chrono>

void test_performance() {
    std::vector<double> vec(10'000'000, 0.5);
    
    auto start = std::chrono::high_resolution_clock::now();
    std::transform(std::execution::seq, vec.begin(), vec.end(), 
                  vec.begin(), [](double v) { return std::sqrt(v); });
    auto seq_time = std::chrono::high_resolution_clock::now() - start;
    
    start = std::chrono::high_resolution_clock::now();
    std::transform(std::execution::par, vec.begin(), vec.end(),
                  vec.begin(), [](double v) { return std::sqrt(v); });
    auto par_time = std::chrono::high_resolution_clock::now() - start;
    
    // 典型输出（4核CPU）:
    // Sequential: 120ms | Parallel: 35ms
}

并行算法列表[编辑 | 编辑源代码]

C++17中支持并行化的主要算法包括：

排序算法：sort, stable_sort
数值运算：reduce, transform_reduce
查找算法：find, count
修改序列：fill, generate
遍历操作：for_each

实际应用案例[编辑 | 编辑源代码]

图像处理[编辑 | 编辑源代码]

并行应用高斯模糊滤波器：

void apply_blur(std::vector<Pixel>& image, int width, int height) {
    std::vector<Pixel> temp(image.size());
    
    // 并行处理每一行
    std::for_each(std::execution::par, 
        counting_iterator(0), counting_iterator(height),
        [&](int y) {
            for (int x = 0; x < width; ++x) {
                temp[y*width + x] = calculate_blur(image, x, y, width);
            }
        });
    
    image = std::move(temp);
}

科学计算[编辑 | 编辑源代码]

并行计算矩阵乘法：

Matrix parallel_matrix_multiply(const Matrix& a, const Matrix& b) {
    Matrix result(a.rows(), b.cols());
    
    std::for_each(std::execution::par,
        counting_iterator(0), counting_iterator(a.rows()),
        [&](int i) {
            for (int j = 0; j < b.cols(); ++j) {
                double sum = 0.0;
                for (int k = 0; k < a.cols(); ++k) {
                    sum += a(i,k) * b(k,j);
                }
                result(i,j) = sum;
            }
        });
    
    return result;
}

注意事项[编辑 | 编辑源代码]

数据竞争[编辑 | 编辑源代码]

使用并行算法时必须确保操作是线程安全的。以下代码存在数据竞争：

std::vector<int> data = {1, 2, 3};
int sum = 0;
// 错误！sum被多个线程同时修改
std::for_each(std::execution::par, data.begin(), data.end(), 
             [&](int x) { sum += x; });

正确做法是使用std::reduce：

int safe_sum = std::reduce(std::execution::par, 
                          data.begin(), data.end());

性能考量[编辑 | 编辑源代码]

并行化带来的开销包括：

任务划分开销
线程创建/同步成本
缓存一致性维护

经验法则：

数据量 >10,000元素时考虑并行
避免在小循环中使用并行策略

进阶主题[编辑 | 编辑源代码]

自定义执行策略[编辑 | 编辑源代码]

可以通过实现execution::execution_policy接口创建自定义策略：

并行算法复杂度[编辑 | 编辑源代码]

对于长度为 $N$ 的序列和 $P$ 个处理器：

时间复杂度： $O (N / P)$ 理想情况
空间复杂度：可能需额外 $O (P)$ 存储

参见[编辑 | 编辑源代码]

C++17标准文档（ISO/IEC 14882:2017）
并行模式库（PPL）
OpenMP多线程编程

模板:C++特性导航