Distributed computing in Python with dispy: a detailed guide

Updated: 2019-12-22 15:17:16   Author: 振裕


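dispy is a framework for distributed and parallel computing, built on asyncoro.

The framework is also very lean. It has only four components, all of which are in its source folder: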

dispy.py (client) provides two ways of creating “clusters”: JobCluster when only one instance of dispy may run and SharedJobCluster when multiple instances may run (in separate processes). If JobCluster is used, the scheduler contained within dispy.py will distribute jobs on the server nodes; if SharedJobCluster is used, a separate scheduler (dispyscheduler) must be running.

dispynode.py executes jobs on behalf of dispy. dispynode must be running on each of the (server) nodes that form the cluster.

dispyscheduler.py is needed only when SharedJobCluster is used; this provides a scheduler that can be shared by multiple dispy users.

dispynetrelay.py is needed when nodes are located across different networks; this relays information about nodes on a network to the scheduler. If all the nodes are on the same network, there is no need for dispynetrelay: the scheduler and nodes automatically discover each other.
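
The rules above can be condensed into a small decision helper. `required_components` is a hypothetical function written only for illustration. It is not part of dispy, and it encodes nothing beyond the four descriptions above.

```python
def required_components(shared_cluster: bool, multiple_networks: bool) -> list:
    """Return the dispy components to run, following the rules above.

    Hypothetical helper for illustration; dispy itself has no such function.
    """
    # dispy.py is the client; dispynode.py must run on every server node
    components = ["dispy.py (client)", "dispynode.py (on every node)"]
    # SharedJobCluster needs a separately running, shared scheduler;
    # JobCluster uses the scheduler built into dispy.py instead
    if shared_cluster:
        components.append("dispyscheduler.py")
    # Nodes on other networks are not discovered automatically, so a relay
    # forwards information about them to the scheduler
    if multiple_networks:
        components.append("dispynetrelay.py")
    return components


# The case used in this article: JobCluster, all nodes on one network
print(required_components(shared_cluster=False, multiple_networks=False))
```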

In most cases, dispy (the client) and dispynode are all you need.

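Basic usage:

Server side:

Start dispynode on each server. It listens for incoming computation jobs, executes them, and returns the results to the client.

Open the python_home/Scripts folder. After dispy is installed, the four components mentioned above are there as .py files. You can also find dispynode.py in the dispy source folder. Then run the command on each node, with each command on its own machine: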

python dispynode.py -c 2 -i 192.168.138.128 -p 51348 -s secret --clean

python dispynode.py -c 2 -i 192.168.8.143 -p 51348 -s secret --clean

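Here 192.168.138.128 and 192.168.8.143 are the IP addresses of the compute nodes, given with -i. From each server's point of view, this is its own address, much like localhost. I started two nodes, and -c 2 makes each of them use 2 CPUs. One node runs in a virtual machine and the other on my local machine. -p 51348 sets the port the node uses.

-s secret sets the shared secret for communication. The client must supply the same secret to connect to the nodes. Any string will do.

--clean deletes the state left over from the previous run when the node starts. Without it, you may get an error saying the PID is already in use.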
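
When several nodes have to be started, it can help to generate the command lines. The sketch below rebuilds the dispynode command shown above as an argument list, which you could then pass to `subprocess.run` on the node itself. `dispynode_command` is a hypothetical helper, and it uses only the flags that appear in this article (-c, -i, -p, -s, --clean).

```python
import shlex


def dispynode_command(ip, cpus=2, port=51348, secret="secret", clean=True):
    """Build the dispynode command line used above as an argument list.

    Hypothetical helper: it reproduces only the flags shown in this article.
    """
    cmd = ["python", "dispynode.py",
           "-c", str(cpus),    # number of CPUs this node offers
           "-i", ip,           # IP address of this node
           "-p", str(port),    # port the node uses
           "-s", secret]       # shared secret; the client must use the same one
    if clean:
        cmd.append("--clean")  # discard state left over from a previous run
    return cmd


for ip in ["192.168.138.128", "192.168.8.143"]:
    # shlex.join (Python 3.8+) renders the list as a shell command string
    print(shlex.join(dispynode_command(ip)))
```

Building a list instead of one string avoids quoting problems if a secret ever contains spaces.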

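Client side:

On the client, note one rule about the function sent to the compute nodes. The modules it uses must be imported inside the function. Importing them at the top level of the .py file is not enough, because those imports exist only in the client process.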
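
Roughly speaking, a node does not receive your whole script. It receives the function's source code and executes it in a namespace of its own. The sketch below simulates that locally with `exec()` and a fresh dictionary. `run_like_a_node` is a hypothetical stand-in for illustration and is not dispy's actual node code.

```python
import math  # top-level import: visible only in this (client) module

# Source text of two versions of a computation, as a node would receive it
TOP_LEVEL_IMPORT_SRC = '''
def compute(x):
    return math.sqrt(x)          # relies on the client's top-level import
'''

LOCAL_IMPORT_SRC = '''
def compute(x):
    import math                  # imported inside the function
    return math.sqrt(x)
'''


def run_like_a_node(source, arg):
    """Hypothetical node stand-in: exec the source in a fresh namespace,
    then call compute(arg). Returns the result or a NameError message."""
    namespace = {}
    exec(source, namespace)
    try:
        return namespace["compute"](arg)
    except NameError as exc:
        return "NameError: %s" % exc


print(run_like_a_node(TOP_LEVEL_IMPORT_SRC, 16.0))  # NameError: name 'math' is not defined
print(run_like_a_node(LOCAL_IMPORT_SRC, 16.0))      # 4.0
```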

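Using functions from your own modules takes a little more work. First, bind the function to a top-level name in the client script. Then pass it to the cluster in `depends`. Only after that can the function running on the nodes call it.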
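
One rough way to picture what `depends` does for a helper function is this: the helper's source travels with the computation and is defined next to it on the node. The sketch below models that locally with `exec()`. `run_with_depends`, `COMPUTE_SRC`, and `MY_SUM_SRC` are hypothetical names for illustration, and dispy's real transfer mechanism may differ in detail.

```python
COMPUTE_SRC = '''
def compute(args):
    n, array = args
    return (n, my_sum(array))    # my_sum is not defined in this source
'''

MY_SUM_SRC = '''
def my_sum(array):
    total = 0
    for value in array:
        total += value
    return total
'''


def run_with_depends(compute_src, depends_srcs, args):
    """Hypothetical node model: exec each dependency's source, then the
    computation's source, in one shared namespace, and call compute(args)."""
    namespace = {}
    for src in depends_srcs:
        exec(src, namespace)
    exec(compute_src, namespace)
    return namespace["compute"](args)


print(run_with_depends(COMPUTE_SRC, [MY_SUM_SRC], (0, [1, 2, 3])))  # (0, 6)
try:
    run_with_depends(COMPUTE_SRC, [], (0, [1, 2, 3]))
except NameError as exc:
    print("without depends:", exc)  # without depends: name 'my_sum' is not defined
```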

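# Modules imported at the top level are available only to this script on the
# client; they are NOT shipped to the compute nodes
import time
import socket
import numpy
import datetime

# A function from a custom module. To call it inside the function that runs on
# the nodes, bind it to a top-level name here and list it in `depends`;
# elsewhere in this script it can be called directly
from my_package.my_model import get_time

# Bind the function object itself: no parentheses, otherwise we would call it
# and store its return value instead
now = get_time.now

# The computation: dispy sends this function, together with its arguments,
# to the server nodes. Several parameters must be packed into one tuple
def compute(args):
    n, array = args  # unpack the tuple of arguments
    # The modules needed for the computation are imported inside the function
    import time, socket
    time.sleep(3)
    host = socket.gethostname()
    # Helper defined in this file; shipped to the nodes via `depends`
    total = my_sum(array)
    # `now` comes from another module: bound at the top level above and
    # also shipped via `depends`
    now_time = now()
    return (host, n, total, now_time)

def my_sum(array):
    # Helper; the modules it needs are also imported inside it.
    # Named my_sum (not sum) so that it matches the call in compute()
    # and does not shadow the built-in sum()
    import numpy as np
    return np.sum(array)

def loadData():
    # Helper that generates test data: 20 rows of 20 random numbers
    import numpy as np
    data = np.random.rand(20, 20)
    data = [line for line in data]
    return data


if __name__ == '__main__':
    import dispy
    # The two compute nodes
    nodes = ['192.168.8.143', '192.168.138.128']
    # Create the cluster and talk to the nodes with the shared secret 'secret';
    # `depends` lists the functions that compute() relies on
    cluster = dispy.JobCluster(compute, nodes=nodes,
                               secret='secret', depends=[my_sum, now])
    jobs = []

    datas = loadData()
    for n in range(len(datas)):
        # Submit one job per row of data
        job = cluster.submit((n, datas[n]))
        job.id = n
        jobs.append(job)
    # print(datetime.datetime.now())
    # cluster.wait()  # block until all jobs have finished
    # print(datetime.datetime.now())
    for job in jobs:
        # Calling job() waits for that job to finish and returns its result
        host, n, total, t = job()
        print('%s executed job %s at %s with %s total=%.2f t=%s'
              % (host, job.id, job.start_time, n, total, t))
        # other fields of 'job' that may be useful:
        # print(job.stdout, job.stderr, job.exception,
        #       job.ip_addr, job.start_time, job.end_time)
    # Show cluster statistics
    cluster.stats()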
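
Before starting real nodes, you may want to check the computation's contract on a single machine. The contract is one tuple in and one tuple out, with every import inside the function. The sketch below swaps dispy for the standard library's `concurrent.futures.ThreadPoolExecutor`. That swap is an assumption made for illustration and is not part of dispy's workflow. `pool.submit()` stands in for `cluster.submit()`, and `future.result()` stands in for calling `job()`. `load_data` is a hypothetical pure-Python replacement for `loadData()`, so numpy is not needed.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def compute(args):
    # Same contract as the dispy version: one tuple in, one tuple out,
    # with every needed module imported inside the function
    import socket
    import time
    n, array = args
    host = socket.gethostname()
    total = sum(array)                     # built-in sum stands in for my_sum
    now_time = time.strftime("%H:%M:%S")   # stands in for the custom now()
    return (host, n, total, now_time)


def load_data(rows=20, cols=20, seed=0):
    # Pure-Python replacement for loadData(); seeded so runs are repeatable
    rng = random.Random(seed)
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    datas = load_data()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.submit() plays the role of cluster.submit(); each future is a "job"
        jobs = [pool.submit(compute, (n, row)) for n, row in enumerate(datas)]
        for job in jobs:
            # result() blocks until this job is done, like calling job()
            host, n, total, t = job.result()
            print("%s executed job %s total=%.2f t=%s" % (host, n, total, t))
```

If this runs cleanly, import-related mistakes in `compute` are less likely. It does not catch missing `depends` entries, though, because everything here shares one process.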
