Version: v1.13.0 (latest)

Volcano Job Plugin -- SSH User Guide

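Background

The SSH plugin enables passwordless SSH login between the Pods of a Volcano Job, which distributed workloads such as MPI require. It is usually used together with the svc plugin.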

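Key Points

  • If ssh-key-file-path is configured, make sure the private key and public key already exist in the target directory. In most cases, keeping the default value is recommended.
  • If ssh-private-key or ssh-public-key is configured, make sure the values are correct. In most cases, keeping the default keys is recommended.
  • Once the SSH plugin is configured, a Secret named {job-name}-ssh is created. It contains authorized_keys, id_rsa, config, and id_rsa.pub, and is mounted as a volume at the specified path in every container of the Job, including initContainers.
  • By default, /root/.ssh/config lists every hostname in the Job; the file maps each hostname to its subdomain.
  • Once the SSH plugin is configured, you can log in to any other Pod in the same Job without a password by running ssh hostname.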
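
As a rough illustration of the Secret described above, the object created for a Job named mpi-job (the name used in the example in this guide) might have the following shape. The four key names come from this guide; the values are placeholders, and the Secret type and any labels or owner references that Volcano sets are not shown because this guide does not specify them. Volcano creates this object itself, so it is not something you apply by hand.

```yaml
# Illustrative sketch only: Volcano creates this Secret automatically.
apiVersion: v1
kind: Secret
metadata:
  name: mpi-job-ssh   # follows the {job-name}-ssh pattern for a Job named "mpi-job"
data:
  # Placeholder values; real entries hold base64-encoded content.
  authorized_keys: "<base64-encoded content>"
  id_rsa: "<base64-encoded content>"
  id_rsa.pub: "<base64-encoded content>"
  config: "<base64-encoded content>"
```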

Parameters

| No. | Name | Type | Default | Required | Description | Example |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | ssh-key-file-path | String | /root/.ssh | No | Path where the SSH private and public keys are stored | ssh: ["--ssh-key-file-path=/home/user/.ssh"] |
| 2 | ssh-private-key | String | DEFAULT_PRIVATE_KEY | No | Private key string | ssh: ["--ssh-private-key=-----BEGIN RSA PRIVATE KEY-----\nMIIEpAIBAAKCAQEAyeyZjWDx5Na9bw1f61M4s+QlLT/kyrB37AR2j5Sb/A9hvJak\nLNQQpNC+KVfYNl4jePG+6lwHqye//pcC9+0SWsHWwgaahjMLnAthR2k8JAakNA9x\nV/wHz0YU99OKEetaOuxXpWZPXCHX0zuQO87YbdKzRbgxACirM3Phkwr7XLtQtWZk\nyXG34CQXZQWgBIS1Fl+PlGOpVpOPnWoZPMpbAK74i/Tz4sP8Zhqc6dya1hrbUwY3\nYfMZNYXpaAw7wWVjq8grfs0+Fl3SxHrzTXge2m+eZAZ6iPJ8cX4uYKxi0ZmxpM/a\ngI6Mmjq0MU75Vxpq22LaUvHIpOfX5UxhkrsxlwIDAQABAoIBAQDGOuIb6zpNn4rl\nBMpPqamW4LimjX08hrWUHGWQWyIu96LJk1GlOKMGSm8FA1odNZm5WApG5QYaPrG7\na+DcJ/7G3ljIrdbxPBd/n6RmiKcj7ukwuqBY8fFwyKo5CZEYOmagRfldRO1P02Gf\n22+jZ1MNrbWVElf4gfRgVLj0s+lEhFkzhi+QGMmMpjEJnnG98xxVGEvWMw1rnKJm\n3Gi771Gltbg3GuEPs3IeoBgba3EaHmSxJnBivAL4zsO8UUCAXB13cUiXx8qO7y1e\nCSWSenRmK2ugbL6v0co12O0n0pxF9xlJ6fALdRWzpJsFlN3ttkY9N5GrQc/pVjOa\nvqa172RRAoGBAOSAIMNLT6QjgYDk5Z7ZxjNnxH/lMso+cx6bxk9YMKRrw0fDQh8m\ncBAihXhuntCPDGhrzQ+Anqx4jJVDFqac0xBck90a8LmmzD0q72eDTCYPouDWe6DL\nJQAc/HDmIC13sADEXmGW3c0Qn4hjBnMd89ouYj7ZajU2sED2irPPc/HLAoGBAOI5\nruL4Q0FarGrP3a9z9EDrVJsK2OfSTaJ7rhZ+uvB838svbHU+4mEYPhx4PCwvrYyi\nFn4hyau003ZmLc1qTABjmwcO/PPiYyoRHJDUIIhiIyIL+id/G53uG2eTzqYtU6uS\nnAIB2rKwwhU8ek+zbJBLu5uxuxlf4mdZITdkwtXlAoGBALH3RQ02A9JgQQYFwP2G\nucLhx/6goX05RGoLg1na4w+8Sr0Cy+X9BvzaFkAlUBY5w700cOLpFyxXO48pUGP1\n8sFkiVmFGQZPbfUaEpn5ff6K4R3ijyk97xR2fvrjkR44gOEoECZL3XZQwx/zmFti\nccF1rNksdnb5oC8IliDTq4cfAoGANyy6asECJj5nLuXju5ccS3kZ+XZ70I6KQMbJ\nftMJ5P2P146JdU8RB31SKL9qbZxzR4mA0uKKvUYtDQN+yErUnoOsm9wb9Z+RcAEc\nZnZWOO02hGdHa7qkkbAxHuH91KnZbk8jnZm2LT7PFz7Y1fd80vSlnSOL7nRkU7B5\nWXlJy8ECgYA4g0wc0Jq8c1Q0FulMkOQqYRDXaDo34987L+mZ70i/RtdkKjK/IKJ9\n18UDCyEaDPD0BWBJGPejZkY8UD6FBG/5k7wNIbT7hHLRSRlw4iRmVX2hRVXrXzD8\nvc86Qyg2iG0JqkMAvRdH40amPKp5bW4VcfcvQo4TSsI972u12rgwtg==\n-----END RSA PRIVATE KEY-----\n"] |
| 3 | ssh-public-key | String | DEFAULT_PUBLIC_KEY | No | Public key string | ssh: ["--ssh-public-key=ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDJ7JmNYPHk1r1vDV/rUziz5CUtP+TKsHfsBHaPlJv8D2G8lqQs1BCk0L4pV9g2XiN48b7qXAerJ7/+lwL37RJawdbCBpqGMwucC2FHaTwkBqQ0D3FX/AfPRhT304oR61o67FelZk9cIdfTO5A7ztht0rNFuDEAKKszc+GTCvtcu1C1ZmTJcbfgJBdlBaAEhLUWX4+UY6lWk4+dahk8ylsArviL9PPiw/xmGpzp3JrWGttTBjdh8xk1heloDDvBZWOryCt+zT4WXdLEevNNeB7ab55kBnqI8nxxfi5grGLRmbGkz9qAjoyaOrQxTvlXGmrbYtpS8cik59flTGGSuzGX root@aiplatform"] |

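Notes

  • DEFAULT_PRIVATE_KEY and DEFAULT_PUBLIC_KEY are too long to be shown in full in the table; refer to the examples.
  • Volcano does not validate ssh-key-file-path; make sure the path is correct yourself.
  • In most cases, leave these parameters unset and use the defaults; Volcano then generates a key pair automatically and completes the related configuration.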
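
For cases where the default path does not fit, the fragment below sketches how a plugin argument is passed in a Job spec. It reuses the --ssh-key-file-path example value from the parameter table; /home/user/.ssh is only a placeholder, the Job name and the other spec fields are borrowed from the example in this guide, and the tasks section is left out, so this is a fragment rather than a complete manifest.

```yaml
# Sketch only: shows where SSH plugin arguments go; not a complete Job.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: mpi-job
spec:
  minAvailable: 3
  schedulerName: volcano
  plugins:
    # Placeholder path; Volcano does not validate it, so it must be correct.
    ssh: ["--ssh-key-file-path=/home/user/.ssh"]
    svc: []
  # tasks: omitted here; define them as in the full example in this guide.
```

Leaving the list empty (ssh: []) keeps the default path /root/.ssh, which is what this guide recommends for most cases.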

Example

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: mpi-job
spec:
  minAvailable: 3
  schedulerName: volcano
  plugins:
    ssh: []  ## register the SSH plugin
    svc: []
  tasks:
    - replicas: 1
      name: mpimaster
      template:
        spec:
          containers:
            - command:
                - /bin/bash
                - -c
                - |
                  mkdir -p /var/run/sshd; /usr/sbin/sshd;
                  MPI_HOST=`cat /etc/volcano/mpiworker.host | tr "\n" ","`;
                  sleep 10;
                  mpiexec --allow-run-as-root --host ${MPI_HOST} -np 2 --prefix /usr/local/openmpi-3.1.5 python /tmp/gpu-test.py;
                  sleep 3600;
              image: lyd911/mindspore-gpu-example:0.2.0
              name: mpimaster
              ports:
                - containerPort: 22
                  name: mpijob-port
              workingDir: /home
          restartPolicy: OnFailure
    - replicas: 2
      name: mpiworker
      template:
        spec:
          containers:
            - command:
                - /bin/bash
                - -c
                - |
                  mkdir -p /var/run/sshd; /usr/sbin/sshd -D;
              image: lyd911/mindspore-gpu-example:0.2.0
              name: mpiworker
              resources:
                limits:
                  nvidia.com/gpu: "1"
              ports:
                - containerPort: 22
                  name: mpijob-port
              workingDir: /home
          restartPolicy: OnFailure

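Notes

  • This example creates an MPI Job with 1 master and 2 workers.
  • Because the svc plugin is enabled, every Pod can obtain the full list of hosts from environment variables; with the default SSH configuration, the host list is also available in /root/.ssh/config.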
[root@mpi-job-master-0 /]# cat /root/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
Host mpi-job-mpimaster-0
  HostName mpi-job-mpimaster-0.mpi-job
Host mpi-job-mpiworker-0
  HostName mpi-job-mpiworker-0.mpi-job
Host mpi-job-mpiworker-1
  HostName mpi-job-mpiworker-1.mpi-job
  • From the master Pod, you can log in to another host as follows:
[root@mpi-job-master-0 /]# ssh mpi-job-mpiworker-0
Warning: Permanently added 'mpi-job-mpiworker-0.mpi-job,X.X.X.X' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 3.10.0-1160.36.2.el7.x86_64 x86_64)

* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage

This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.
Last login: Thu Apr 14 07:19:05 2022 from 10.244.0.67
root@mpi-job-mpiworker-0:~#

Notes

  • Make sure the sshd service is available in every container.