Version: v1.13.0 (Latest)

MPI Plugin User Guide

Introduction

The MPI plugin is designed to improve the experience of running MPI jobs. It lets users write less YAML and also ensures that MPI jobs run correctly.

How the MPI Plugin Works

The MPI plugin does three things:

  • Opens the ports used by MPI on all containers of the job
  • Force-enables the ssh and svc plugins
  • Adds the MPI_HOST environment variable to the master pod. This variable contains the domain names of the workers and is intended to be passed to the --host parameter of mpiexec
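
To make the three effects above concrete, the following fragment sketches roughly how a job behaves once the plugin has been applied. It is an illustration, not output generated by Volcano. The job and task names are taken from the example at the end of this guide, and the exact format of the MPI_HOST value (comma-separated names of the form <job>-<task>-<index>.<job>) is an assumption.

```yaml
# Illustrative sketch only; the plugin applies these changes itself,
# so none of this needs to be written by hand.
spec:
  plugins:
    mpi: ["--master=mpimaster", "--worker=mpiworker", "--port=22"]
    ssh: []   # force-enabled by the MPI plugin
    svc: []   # force-enabled by the MPI plugin
  tasks:
    - name: mpimaster
      template:
        spec:
          containers:
            - name: mpimaster
              ports:
                - containerPort: 22   # the value of --port, opened by the plugin
              env:
                - name: MPI_HOST
                  # Assumed format: one domain name per worker replica.
                  value: "lm-mpi-job-mpiworker-0.lm-mpi-job,lm-mpi-job-mpiworker-1.lm-mpi-job"
```

The master container can then pass ${MPI_HOST} directly to mpiexec --host without listing the workers by hand.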

Parameters of the MPI Plugin

Key Points

  • If master or worker is configured, make sure that a task with the configured name exists and that the task's role matches the meaning of the parameter.
  • If port is configured, make sshd listen on the same port as the parameter value.
  • If the gang plugin is enabled, make sure that minAvailable equals the number of worker replicas.
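
As a sketch of how these points fit together, the fragment below uses a non-default port. The port number 5000 and the replica counts are illustrative assumptions. Only the worker side is shown, so how the ssh client on the master learns the port is not covered.

```yaml
# Hypothetical fragment: port 5000 and the replica counts are illustrative.
spec:
  minAvailable: 2          # equals the worker replicas; matters when gang is enabled
  plugins:
    mpi: ["--master=mpimaster", "--worker=mpiworker", "--port=5000"]
  tasks:
    - replicas: 1
      name: mpimaster      # must match --master
    - replicas: 2
      name: mpiworker      # must match --worker
      template:
        spec:
          containers:
            - name: mpiworker
              image: volcanosh/example-mpi:0.0.3
              command:
                - /bin/sh
                - -c
                - |
                  # sshd listens on the same port that was passed to --port
                  mkdir -p /var/run/sshd; /usr/sbin/sshd -p 5000 -D;
```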

Arguments

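ID | Name   | Type   | Default Value | Required | Description                        | Example
---|--------|--------|---------------|----------|------------------------------------|-------------------
1  | master | string | master        | No       | Name of MPI master                 | --master=mpimaster
2  | worker | string | worker        | No       | Name of MPI worker                 | --worker=mpiworker
3  | port   | string | 22            | No       | The port to open for the container | --port=5000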
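
Because every argument is optional, a job whose tasks already use the default names can probably leave the argument list empty. The fragment below is a minimal sketch under that assumption; the task templates are omitted.

```yaml
# Sketch: rely on the defaults (master=master, worker=worker, port=22).
# Assumes an empty argument list selects the defaults from the table above.
spec:
  plugins:
    mpi: []
  tasks:
    - replicas: 1
      name: master   # matches the default value of --master
    - replicas: 2
      name: worker   # matches the default value of --worker
```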

Examples

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: lm-mpi-job
spec:
  minAvailable: 1
  schedulerName: volcano
  plugins:
    mpi: ["--master=mpimaster","--worker=mpiworker","--port=22"] ## MPI plugin register
  tasks:
    - replicas: 1
      name: mpimaster
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  mkdir -p /var/run/sshd; /usr/sbin/sshd;
                  mpiexec --allow-run-as-root --host ${MPI_HOST} -np 2 mpi_hello_world;
              image: volcanosh/example-mpi:0.0.3
              name: mpimaster
              workingDir: /home
          restartPolicy: OnFailure
    - replicas: 2
      name: mpiworker
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  mkdir -p /var/run/sshd; /usr/sbin/sshd -D;
              image: volcanosh/example-mpi:0.0.3
              name: mpiworker
              workingDir: /home
          restartPolicy: OnFailure