Python Parallel Processing in Practice: Speeding Up Computation with ProcessPoolExecutor

Updated: 2025-06-13 17:24:12  Author: engchina


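Introduction

Parallel processing is one of the main ways to improve program performance on modern multi-core hardware. Python offers several approaches, and ProcessPoolExecutor from the concurrent.futures module is a powerful, easy-to-use one. This article walks through a practical example of parallel processing with ProcessPoolExecutor and explains how the code works.

Complete code example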

import time
import multiprocessing
from concurrent.futures import ProcessPoolExecutor, as_completed
from typing import List


def process_numbers(chunk: List[int], factor: int) -> str:
    """
    Simulate processing a chunk of numbers by multiplying them by a factor.
    Takes a list of numbers and a factor, sums each number multiplied by the
    factor, and returns the result as a string.
    """
    result = sum(x * factor for x in chunk)
    time.sleep(0.1)  # Sleep to simulate work
    return f"Chunk sum: {result}"


def main(numbers: List[int] = None, num_chunks: int = 10, factor: int = 2):
    """
    Main function demonstrating parallel processing.
    Sets up logging, generates the list of numbers, determines the number of
    worker processes, splits the numbers into chunks, and processes the chunks
    in parallel with ProcessPoolExecutor.
    """
    import logging
    logging.basicConfig(level=logging.INFO)
    _log = logging.getLogger(__name__)
    # Generate a sample list if no numbers were provided
    if numbers is None:
        numbers = list(range(1, 101))  # The numbers 1 to 100
    total_numbers = len(numbers)
    _log.info(f"Starting parallel processing of {total_numbers} numbers")
    cpu_count = multiprocessing.cpu_count()
    _log.info(f"Detected {cpu_count} CPU cores")
    # Determine the number of worker processes
    optimal_workers = min(cpu_count, num_chunks)
    _log.info(f"Using {optimal_workers} worker processes")
    # Compute the chunk size
    chunk_size = max(1, total_numbers // optimal_workers)
    _log.info(f"Each chunk contains {chunk_size} numbers")
    # Split the numbers into chunks
    chunks = [numbers[i:i + chunk_size] for i in range(0, total_numbers, chunk_size)]
    _log.info(f"Generated {len(chunks)} chunks in total")
    start_time = time.time()
    processed_count = 0
    # Process the chunks in parallel with ProcessPoolExecutor
    with ProcessPoolExecutor(max_workers=optimal_workers) as executor:
        _log.info("Starting ProcessPoolExecutor")
        # Submit all tasks
        futures = [executor.submit(process_numbers, chunk, factor) for chunk in chunks]
        _log.info(f"Submitted {len(futures)} tasks")
        # Wait for completion and collect the results
        for future in as_completed(futures):
            try:
                result = future.result()
                processed_count += 1
                _log.info(f"{'#'*50}\n{result} ({processed_count}/{len(chunks)} total)\n{'#'*50}")
            except Exception as e:
                _log.error(f"Error while processing a chunk: {str(e)}")
                raise
    elapsed_time = time.time() - start_time
    _log.info(f"Parallel processing finished in {elapsed_time:.2f} seconds.")


if __name__ == "__main__":
    # Example run with the default list of numbers
    main()

Code walkthrough

1. Import the required modules

import time
import multiprocessing
from concurrent.futures import ProcessPoolExecutor, as_completed
from typing import List

These modules provide the parallel-processing tools and the type hints the example needs.

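2. Define the processing function

def process_numbers(chunk: List[int], factor: int) -> str:
    """
    Simulate processing a chunk of numbers by multiplying them by a factor.
    Takes a list of numbers and a factor, sums each number multiplied by the
    factor, and returns the result as a string.
    """
    result = sum(x * factor for x in chunk)
    time.sleep(0.1)  # Sleep to simulate work
    return f"Chunk sum: {result}"

This function simulates processing a list of numbers: it multiplies each number by the factor and sums the products. time.sleep(0.1) stands in for real work.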
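Because process_numbers returns a formatted string, the caller cannot combine the per-chunk results. The following is a minimal sketch of one alternative, assuming a hypothetical helper named chunk_sum (not part of the original code) that returns the raw integer. Running it sequentially first gives a baseline that a parallel run should reproduce.

```python
from typing import List


def chunk_sum(chunk: List[int], factor: int) -> int:
    """Hypothetical variant of process_numbers that returns the number itself."""
    return sum(x * factor for x in chunk)


numbers = list(range(1, 101))  # same sample data as the article: 1..100
chunks = [numbers[i:i + 10] for i in range(0, len(numbers), 10)]

# Sequential baseline: one partial sum per chunk, then the grand total.
partial_sums = [chunk_sum(chunk, 2) for chunk in chunks]
total = sum(partial_sums)

print(partial_sums[0])  # 110   (2 * (1 + 2 + ... + 10))
print(total)            # 10100 (2 * 5050)
```

Returning a number instead of a string is a design choice for this sketch only; formatting can then happen once, in the caller.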

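3. The main function

def main(numbers: List[int] = None, num_chunks: int = 10, factor: int = 2):
    """
    Main function demonstrating parallel processing.
    Sets up logging, generates the list of numbers, determines the number of
    worker processes, splits the numbers into chunks, and processes the chunks
    in parallel with ProcessPoolExecutor.
    """
    import logging
    logging.basicConfig(level=logging.INFO)
    _log = logging.getLogger(__name__)

The main function sets up logging, generates the list of numbers, determines the number of worker processes, splits the numbers into chunks, and hands the chunks to ProcessPoolExecutor. The excerpt above covers only the logging setup; the remaining steps are explained in the sections that follow.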

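4. Generate the list of numbers

    # Generate a sample list if no numbers were provided
    if numbers is None:
        numbers = list(range(1, 101))  # The numbers 1 to 100

If the caller does not pass a list, the function generates the numbers 1 to 100.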

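5. Determine the number of worker processes

    cpu_count = multiprocessing.cpu_count()
    _log.info(f"Detected {cpu_count} CPU cores")
    # Determine the number of worker processes
    optimal_workers = min(cpu_count, num_chunks)
    _log.info(f"Using {optimal_workers} worker processes")

The worker count is the smaller of the CPU core count and num_chunks, so the pool never starts more processes than there are cores. Note that num_chunks only caps the number of workers; it does not directly set how many chunks are created.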
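The min() rule above has one edge case: if num_chunks is 0, the worker count becomes 0 and the later chunk-size division fails with ZeroDivisionError. A minimal sketch of a more defensive version follows, assuming a hypothetical helper pick_workers whose cpus parameter exists only so the logic can be exercised with fixed values.

```python
import os
from typing import Optional


def pick_workers(num_tasks: int, cpus: Optional[int] = None) -> int:
    """Hypothetical helper: cap the worker count by both CPUs and task count."""
    if cpus is None:
        # os.cpu_count() may return None when the count cannot be determined.
        cpus = os.cpu_count() or 1
    # Never go below one worker; ProcessPoolExecutor rejects max_workers <= 0.
    return max(1, min(cpus, num_tasks))


print(pick_workers(10, cpus=4))   # 4: limited by the CPU count
print(pick_workers(3, cpus=16))   # 3: limited by the number of tasks
print(pick_workers(0, cpus=8))    # 1: clamped so a pool can still be created
```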

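6. Split the numbers into chunks

    # Compute the chunk size
    chunk_size = max(1, total_numbers // optimal_workers)
    _log.info(f"Each chunk contains {chunk_size} numbers")
    # Split the numbers into chunks
    chunks = [numbers[i:i + chunk_size] for i in range(0, total_numbers, chunk_size)]
    _log.info(f"Generated {len(chunks)} chunks in total")

The list is split into slices of chunk_size = total_numbers // optimal_workers items. Because the division rounds down, a remainder produces one extra, shorter chunk: with 100 numbers and 8 workers, chunk_size is 12 and the list is split into 9 chunks.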
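To make the rounding behavior concrete, the sketch below compares the article's floor-division chunking with a ceiling-division variant. Both function names are hypothetical, and the ceiling variant is an alternative suggested here, not something the original code does; it keeps the chunk count at or below the worker count.

```python
from typing import List


def split_floor(numbers: List[int], workers: int) -> List[List[int]]:
    """Chunking as in the article: floor division, so a remainder adds a chunk."""
    size = max(1, len(numbers) // workers)
    return [numbers[i:i + size] for i in range(0, len(numbers), size)]


def split_ceil(numbers: List[int], workers: int) -> List[List[int]]:
    """Hypothetical alternative: ceiling division gives at most `workers` chunks."""
    size = max(1, -(-len(numbers) // workers))  # ceil(len / workers) without floats
    return [numbers[i:i + size] for i in range(0, len(numbers), size)]


numbers = list(range(1, 101))
print(len(split_floor(numbers, 8)))  # 9 chunks: eight of 12 items plus one of 4
print(len(split_ceil(numbers, 8)))   # 8 chunks: seven of 13 items plus one of 9
```

With the floor variant, the ninth chunk runs after one of the eight workers becomes free, which adds a second round of work for a very small chunk.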

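7. Parallel processing

    start_time = time.time()
    processed_count = 0
    # Process the chunks in parallel with ProcessPoolExecutor
    with ProcessPoolExecutor(max_workers=optimal_workers) as executor:
        _log.info("Starting ProcessPoolExecutor")
        # Submit all tasks
        futures = [executor.submit(process_numbers, chunk, factor) for chunk in chunks]
        _log.info(f"Submitted {len(futures)} tasks")
        # Wait for completion and collect the results
        for future in as_completed(futures):
            try:
                result = future.result()
                processed_count += 1
                _log.info(f"{'#'*50}\n{result} ({processed_count}/{len(chunks)} total)\n{'#'*50}")
            except Exception as e:
                _log.error(f"Error while processing a chunk: {str(e)}")
                raise

Every chunk is submitted with executor.submit, which returns one Future per task. as_completed yields each future as it finishes, so results are logged in completion order, which may differ from submission order. If a task raised an exception, future.result() re-raises it in the main process, where it is logged and propagated.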
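When the caller needs results in the same order as the input chunks, one common pattern is to remember which index each future belongs to. The sketch below assumes two hypothetical names, chunk_total and run_in_input_order, and an executor_cls parameter added only so the same code can be tried with threads; it is an illustration of the pattern, not the article's code.

```python
from concurrent.futures import (
    Executor,
    ProcessPoolExecutor,
    ThreadPoolExecutor,
    as_completed,
)
from typing import Callable, List


def chunk_total(chunk: List[int], factor: int) -> int:
    """Stand-in worker (hypothetical); module-level so it can be pickled."""
    return sum(x * factor for x in chunk)


def run_in_input_order(
    chunks: List[List[int]],
    factor: int,
    max_workers: int = 2,
    executor_cls: Callable[..., Executor] = ProcessPoolExecutor,
) -> List[int]:
    """Submit one task per chunk and return results in the order of `chunks`."""
    results: List[int] = [0] * len(chunks)
    with executor_cls(max_workers=max_workers) as executor:
        # Remember which chunk index each future belongs to.
        future_to_index = {
            executor.submit(chunk_total, chunk, factor): i
            for i, chunk in enumerate(chunks)
        }
        # Futures are yielded as they finish, not in submission order.
        for future in as_completed(future_to_index):
            results[future_to_index[future]] = future.result()
    return results


# Quick check with threads: same Executor interface, no pickling or process start-up.
print(run_in_input_order([[1, 2], [3, 4]], 2, executor_cls=ThreadPoolExecutor))  # [6, 14]
```

To use the default ProcessPoolExecutor, call run_in_input_order from inside an `if __name__ == "__main__":` block, as the article's script does. executor.map is a shorter option that also returns results in input order, at the cost of not seeing each result as soon as it completes.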

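8. Measure the elapsed time

    elapsed_time = time.time() - start_time
    _log.info(f"Parallel processing finished in {elapsed_time:.2f} seconds.")

The elapsed time is taken after the with block exits, so it covers the whole lifetime of the pool, including worker start-up and shutdown, and is then logged.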
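time.time() follows the system clock, which can be adjusted while the program runs. For measuring durations, time.perf_counter() is generally the better fit. A minimal sketch, assuming a hypothetical helper named timed that is not part of the original code:

```python
import time
from typing import Any, Callable, Tuple


def timed(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Hypothetical helper: run func and return (result, elapsed seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


result, elapsed = timed(sum, range(1_000_000))
print(result)                    # 499999500000
print(f"{elapsed:.4f} seconds")  # value varies by machine
```

The same helper could wrap a sequential run and a parallel run of the same workload to compare the two timings.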

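Basic concepts and advantages of parallel processing

Parallel processing means executing multiple tasks at the same time to shorten the overall run time. Python's concurrent.futures module provides a high-level interface for running tasks asynchronously. ProcessPoolExecutor is one of its key classes: it runs tasks in a pool of separate processes, so CPU-bound work is not limited by the global interpreter lock (GIL).

The advantages of parallel processing include:

Shorter run time for work that can be split into independent tasks
Full use of the computing power of multi-core CPUs
Less complexity than managing threads or processes by hand, because the executor manages the pool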

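How to run and test this example

Save the code above as parallel_processing_example.py.
No extra packages need to be installed; the example uses only the standard library (the f-strings require Python 3.6 or later).
Run the following command in a terminal or command prompt: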

python parallel_processing_example.py

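You will see log output for each step and the result for each chunk.

Summary

This example showed how to use Python's ProcessPoolExecutor for parallel processing. Parallel processing is an important way to improve program performance, especially for large data sets or compute-intensive tasks. Hopefully the example helps you better understand both the concept and its implementation.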
