The Complete Process of Using GPUs in Docker

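 Updated: 2024-01-09 15:16:25   Author: DripBoy
This article walks through the full process of using GPUs in Docker. It is offered as a reference; corrections and notes on anything overlooked are welcome.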

I. Three ways for Docker to use host hardware devices

  • Use the --privileged=true option to start the container in privileged mode
  • Use the --device option
  • Use the -v option to mount a volume into the container
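
The three options above could be sketched as follows. This is a minimal sketch rather than a recommended setup: the image name ubuntu:22.04, the device paths /dev/nvidia0, /dev/nvidiactl and /dev/nvidia-uvm, the host directory /usr/local/nvidia, and the helper name print_device_access_examples are all assumptions made for illustration. The function only prints the commands, so they can be reviewed and adapted before being run on a real host.

```shell
#!/bin/sh
# Print (not run) one example command per approach.
# IMAGE, the /dev paths and /usr/local/nvidia are assumptions; adjust to the host.
IMAGE="ubuntu:22.04"

print_device_access_examples() {
    # 1) Privileged mode: the container can access all host devices.
    echo "docker run --rm -it --privileged=true $IMAGE bash"
    # 2) --device: expose only the selected device files.
    echo "docker run --rm -it --device /dev/nvidia0 --device /dev/nvidiactl --device /dev/nvidia-uvm $IMAGE bash"
    # 3) -v: bind-mount a host path (here a hypothetical driver directory), read-only.
    echo "docker run --rm -it -v /usr/local/nvidia:/usr/local/nvidia:ro $IMAGE bash"
}

print_device_access_examples
```

Privileged mode is the broadest of the three, since it is not limited to the GPU; --device and -v narrow what the container can see to the listed paths.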

II. How Docker's GPU support has evolved

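When Docker uses the host's GPU, what really happens is that all of the device files the host itself goes through when using the GPU are mounted into the container.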
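
To make the idea of "mounting the device files" concrete, here is a small sketch that turns a list of device file paths into --device flags. The helper name build_device_flags and the example paths are assumptions, not something the tools in this article provide. It covers device files only; the user-mode driver components also have to be made available inside the container.

```shell
#!/bin/sh
# Build "--device" flags from the device file paths given as arguments.
# build_device_flags is a hypothetical helper written for this sketch.
build_device_flags() {
    flags=""
    for dev in "$@"; do
        flags="$flags --device $dev"
    done
    # Strip the leading space left by the first iteration.
    echo "${flags# }"
}

# Example paths are assumptions; on a GPU host one might pass /dev/nvidia* instead.
build_device_flags /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm
# prints: --device /dev/nvidia0 --device /dev/nvidiactl --device /dev/nvidia-uvm
```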

NVIDIA's approach has gone through three stages. Below is part of the description from the official site.

From <Enabling GPUs in the Container Runtime Ecosystem | NVIDIA Technical Blog>

NVIDIA designed NVIDIA-Docker in 2016 to enable portability in Docker images that leverage NVIDIA GPUs. It allowed driver agnostic CUDA images and provided a Docker command line wrapper that mounted the user mode components of the driver and the GPU device files into the container at launch.

Over the lifecycle of NVIDIA-Docker, we realized the architecture lacked flexibility for a few reasons:

  • Tight integration with Docker did not allow support of other container technologies such as LXC, CRI-O, and other runtimes in the future
  • We wanted to leverage other tools in the Docker ecosystem, e.g. Compose (for managing applications that are composed of multiple containers)
  • Support GPUs as a first-class resource in orchestrators such as Kubernetes and Swarm
  • Improve container runtime support for GPUs, esp. automatic detection of user-level NVIDIA driver libraries, NVIDIA kernel modules, device ordering, compatibility checks and GPU features such as graphics, video acceleration

As a result, the redesigned NVIDIA-Docker moved the core runtime support for GPUs into a library called libnvidia-container. The library relies on Linux kernel primitives and is agnostic relative to the higher container runtime layers. This allows easy extension of GPU support into different container runtimes such as Docker, LXC and CRI-O. The library includes a command-line utility and also provides an API for integration into other runtimes in the future. The library, tools, and the layers we built to integrate into various runtimes are collectively called the NVIDIA Container Runtime.

Since 2015, Docker has been donating key components of its container platform, starting with the Open Containers Initiative (OCI) specification and an implementation of the specification of a lightweight container runtime called runc. In late 2016, Docker also donated containerd, a daemon which manages the container lifecycle and wraps OCI/runc. The containerd daemon handles transfer of images, execution of containers (with runc), storage, and network management. It is designed to be embedded into larger systems such as Docker. More information on the project is available on the official site.

Figure 1 shows how the libnvidia-container integrates into Docker, specifically at the runc layer. We use a custom OCI prestart hook called nvidia-container-runtime-hook to runc in order to enable GPU containers in Docker (more information about hooks can be found in the OCI runtime spec). The addition of the prestart hook to runc requires us to register a new OCI compatible runtime with Docker (using the --runtime option). At container creation time, the prestart hook checks whether the container is GPU-enabled (using environment variables) and uses the container runtime library to expose the NVIDIA GPUs to the container.

Figure 1. Integration of NVIDIA Container Runtime with Docker
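
The environment-variable check described in the quote can be pictured with the simplified stand-in below. It is not the real hook code. It assumes the variable in question is NVIDIA_VISIBLE_DEVICES, and that an unset, empty or "void" value means the container is treated like an ordinary runc container; the function name is_gpu_enabled is made up for this sketch.

```shell
#!/bin/sh
# Simplified stand-in for the prestart hook's decision; NOT the real hook code.
# Takes the value of NVIDIA_VISIBLE_DEVICES as its first argument.
# Special values such as "none" are not modeled here.
is_gpu_enabled() {
    case "${1:-}" in
        ""|void) return 1 ;;   # unset, empty or "void": leave the container alone
        *)       return 0 ;;   # e.g. "all", "0", "0,1": the hook acts
    esac
}

for value in "" void all 0,1; do
    if is_gpu_enabled "$value"; then
        echo "NVIDIA_VISIBLE_DEVICES='$value' -> hook acts"
    else
        echo "NVIDIA_VISIBLE_DEVICES='$value' -> hook skips"
    fi
done
```

In practice the variable would come from the image or from an option such as -e NVIDIA_VISIBLE_DEVICES=0 on the docker run command line.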

1. nvidia-docker

nvidia-docker is a thin wrapper layered on top of docker.

Through nvidia-docker-plugin, it adds the parameters needed for the hardware devices to docker's launch command.

Ubuntu distributions 
# Install nvidia-docker and nvidia-docker-plugin 
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-rc/nvidia-docker_1.0.0.rc-1_amd64.deb 
sudo dpkg -i /tmp/nvidia-docker_1.0.0.rc-1_amd64.deb && rm /tmp/nvidia-docker*.deb
# Test nvidia-smi
nvidia-docker run --rm nvidia/cuda nvidia-smi 
 
Other distributions 
# Install nvidia-docker and nvidia-docker-plugin 
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-rc/nvidia-docker_1.0.0.rc_amd64.tar.xz 
sudo tar --strip-components=1 -C /usr/bin -xvf /tmp/nvidia-docker_1.0.0.rc_amd64.tar.xz && rm /tmp/nvidia-docker*.tar.xz 
# Run nvidia-docker-plugin 
sudo -b nohup nvidia-docker-plugin > /tmp/nvidia-docker.log 
# Test nvidia-smi 
nvidia-docker run --rm nvidia/cuda nvidia-smi 
 
Standalone install 
# Install nvidia-docker and nvidia-docker-plugin 
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-rc/nvidia-docker_1.0.0.rc_amd64.tar.xz 
sudo tar --strip-components=1 -C /usr/bin -xvf /tmp/nvidia-docker_1.0.0.rc_amd64.tar.xz && rm /tmp/nvidia-docker*.tar.xz 
# One-time setup 
sudo nvidia-docker volume setup 
# Test nvidia-smi 
nvidia-docker run --rm nvidia/cuda nvidia-smi

2. nvidia-docker2

sudo apt-get install nvidia-docker2
sudo apt-get install nvidia-container-runtime
sudo dockerd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime [...]
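
As an alternative to passing --add-runtime on the dockerd command line, the runtime can also be registered in Docker's daemon.json. The sketch below writes such a fragment to a temporary file so that nothing under /etc is touched; the helper name write_nvidia_runtime_config is an assumption, and the runtime path is the one used in the command above. The nvidia/cuda image name is taken from the earlier commands in this article; newer setups may need an explicit tag.

```shell
#!/bin/sh
# Write a daemon.json fragment that registers the "nvidia" runtime.
# The target path is an argument, so nothing under /etc is modified by this sketch.
write_nvidia_runtime_config() {
    cat > "$1" <<'EOF'
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
}

config_file="$(mktemp)"
write_nvidia_runtime_config "$config_file"
cat "$config_file"

# After merging this into /etc/docker/daemon.json and restarting Docker,
# a container would select the runtime like this (printed, not run):
echo "docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi"
```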

3. nvidia-container-toolkit

This applies to Docker 19.03 and later.

nvidia-container-toolkit wraps things one step further: passing --gpus "device=0" among the command arguments is all that is needed.
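
A few common forms of the --gpus option can be generated with the small helper below. The name gpus_flag is hypothetical. It prints the flag as text to paste into a docker run command; the extra single quotes keep the inner double quotes intact in the shell, which is needed when the device list contains commas.

```shell
#!/bin/sh
# Print a --gpus flag for the given GPU indices; with no arguments, request all GPUs.
# gpus_flag is a hypothetical helper written for this sketch.
gpus_flag() {
    if [ "$#" -eq 0 ]; then
        echo "--gpus all"
        return
    fi
    ids=""
    for id in "$@"; do
        # Join the indices with commas: 0 1 -> 0,1
        ids="${ids:+$ids,}$id"
    done
    echo "--gpus '\"device=$ids\"'"
}

gpus_flag        # prints: --gpus all
gpus_flag 0      # prints: --gpus '"device=0"'
gpus_flag 0 1    # prints: --gpus '"device=0,1"'
```

For example, the last form would be used as docker run --rm --gpus '"device=0,1"' nvidia/cuda nvidia-smi, with the image name again taken from the earlier commands in this article.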

Summary

The above is based on personal experience; I hope it serves as a useful reference.
