Introduction
The MPI plugin is designed to improve the experience of running MPI jobs. It lets users write less YAML and makes sure that MPI jobs run correctly.
How the MPI Plugin Works
The MPI plugin does three things:
- Opens the ports used by MPI for all containers of the job.
- Forcibly enables the `ssh` and `svc` plugins.
- Adds the `MPI_HOST` environment variable to the master pod. This variable contains the workers' domain names and is meant to be passed to the `--host` parameter of `mpiexec`.
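The third step can be pictured with a short Python sketch that builds a comma-separated host list of the kind passed to `--host`. The sketch assumes that each worker pod is reachable as `<job>-<task>-<index>.<job>`, meaning the pod hostname plus a headless Service named after the job, which is assumed to come from the `svc` plugin. The exact string the MPI plugin produces may differ, and `build_mpi_host` is a hypothetical name.

```python
def build_mpi_host(job_name: str, worker_task: str, replicas: int) -> str:
    """Return a comma-separated host list for `mpiexec --host`.

    Assumption (not taken from the plugin source): worker pod <i> is
    addressable as <job>-<task>-<i>.<job>.
    """
    hosts = [f"{job_name}-{worker_task}-{i}.{job_name}" for i in range(replicas)]
    return ",".join(hosts)


# Job name, worker task name and replica count from the example job in this document.
print(build_mpi_host("lm-mpi-job", "mpiworker", 2))
# lm-mpi-job-mpiworker-0.lm-mpi-job,lm-mpi-job-mpiworker-1.lm-mpi-job
```

With two worker replicas, `mpiexec` receives two hosts. The example's `-np 2` uses the same count.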
Parameters of the MPI Plugin
Key Points
- If `master` or `worker` is configured, make sure the tasks they name exist and that those tasks actually play the corresponding role.
- If `port` is configured, make `sshd` listen on the same port as the value of the parameter.
- If the `gang` plugin is enabled, make sure that the value of `minAvailable` is equal to the number of `replicas` of the worker.
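The first and third key points can be checked mechanically before a job is submitted. The Python helper below is hypothetical (`check_mpi_job` is not part of Volcano). It takes the job's `spec` section as a plain dict, for example one loaded from YAML. It does not check the `sshd` port, because that setting lives inside the container command.

```python
def check_mpi_job(spec, master="master", worker="worker", gang_enabled=False):
    """Return a list of problems; an empty list means the checks passed.

    `spec` is the Job's `spec` section as a plain dict. The sshd port
    check (second key point) is not covered by this sketch.
    """
    problems = []
    tasks = {task["name"]: task for task in spec.get("tasks", [])}

    # First key point: the tasks named by --master and --worker must exist.
    for role, name in (("master", master), ("worker", worker)):
        if name not in tasks:
            problems.append(f"{role} task {name!r} not found")

    # Third key point: with gang scheduling, minAvailable must equal worker replicas.
    if gang_enabled and worker in tasks:
        replicas = tasks[worker].get("replicas")
        if spec.get("minAvailable") != replicas:
            problems.append(
                f"minAvailable={spec.get('minAvailable')} but {worker} replicas={replicas}"
            )
    return problems


# Task names and replica counts from the example job in this document.
example = {
    "minAvailable": 1,
    "tasks": [{"name": "mpimaster", "replicas": 1}, {"name": "mpiworker", "replicas": 2}],
}
print(check_mpi_job(example, master="mpimaster", worker="mpiworker"))  # []
print(check_mpi_job(example))
# ["master task 'master' not found", "worker task 'worker' not found"]
```

The second call shows why the task names matter. If the plugin arguments are omitted, the defaults `master` and `worker` apply, so the job must contain tasks with exactly those names.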
Arguments
| ID | Name | Type | Default Value | Required | Description | Example |
|---|---|---|---|---|---|---|
| 1 | master | string | master | No | Name of the MPI master task | --master=mpimaster |
| 2 | worker | string | worker | No | Name of the MPI worker task | --worker=mpiworker |
| 3 | port | string | 22 | No | Port to open for all containers of the job | --port=5000 |
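Each entry in the `plugins.mpi` list is a `--name=value` flag, and any flag that is omitted falls back to its default. The Python sketch below imitates that behaviour with `argparse`. It only illustrates the table and is not Volcano's own argument parser, which is written in Go. The function name `parse_mpi_plugin_args` is hypothetical.

```python
import argparse


def parse_mpi_plugin_args(args: list) -> argparse.Namespace:
    """Parse MPI plugin flags, applying the defaults listed in the Arguments table."""
    parser = argparse.ArgumentParser(prog="mpi-plugin")
    parser.add_argument("--master", default="master", help="name of the MPI master task")
    parser.add_argument("--worker", default="worker", help="name of the MPI worker task")
    # Kept as a string to match the Type column of the table.
    parser.add_argument("--port", default="22", help="port to open for the containers")
    return parser.parse_args(args)


opts = parse_mpi_plugin_args(["--master=mpimaster", "--worker=mpiworker"])
print(opts.master, opts.worker, opts.port)  # mpimaster mpiworker 22
```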
Examples
```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: lm-mpi-job
spec:
  minAvailable: 1
  schedulerName: volcano
  plugins:
    mpi: ["--master=mpimaster","--worker=mpiworker","--port=22"]  ## MPI plugin register
  tasks:
    - replicas: 1
      name: mpimaster
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  mkdir -p /var/run/sshd; /usr/sbin/sshd;
                  mpiexec --allow-run-as-root --host ${MPI_HOST} -np 2 mpi_hello_world;
              image: volcanosh/example-mpi:0.0.3
              name: mpimaster
              workingDir: /home
          restartPolicy: OnFailure
    - replicas: 2
      name: mpiworker
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  mkdir -p /var/run/sshd; /usr/sbin/sshd -D;
              image: volcanosh/example-mpi:0.0.3
              name: mpiworker
              workingDir: /home
          restartPolicy: OnFailure
```
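The plugin arguments must name tasks that exist, so generating the manifest can be safer than editing it by hand. The Python sketch below rebuilds the example job as a dict. The function `build_mpi_job` and its parameters are hypothetical and not a Volcano API. The sketch derives `--master` and `--worker` from the task names and sets `-np` from the worker count, as the example does with 2 and 2. `sshd` stays on its default port 22, which matches `--port=22`. The output is JSON, which `kubectl apply -f` accepts as well as YAML.

```python
import json


def build_mpi_job(name: str, workers: int, image: str = "volcanosh/example-mpi:0.0.3",
                  master: str = "mpimaster", worker: str = "mpiworker",
                  min_available: int = 1) -> dict:
    """Build a Volcano Job manifest with the same shape as the YAML example."""

    def task(task_name, replicas, script, policies=None):
        t = {"replicas": replicas, "name": task_name}
        if policies:
            t["policies"] = policies
        t["template"] = {"spec": {
            "containers": [{
                "command": ["/bin/sh", "-c", script],
                "image": image,
                "name": task_name,
                "workingDir": "/home",
            }],
            "restartPolicy": "OnFailure",
        }}
        return t

    # ${MPI_HOST} is left for the shell to expand inside the master pod.
    master_script = ("mkdir -p /var/run/sshd; /usr/sbin/sshd;\n"
                     f"mpiexec --allow-run-as-root --host ${{MPI_HOST}} -np {workers} mpi_hello_world;\n")
    worker_script = "mkdir -p /var/run/sshd; /usr/sbin/sshd -D;\n"
    return {
        "apiVersion": "batch.volcano.sh/v1alpha1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "minAvailable": min_available,
            "schedulerName": "volcano",
            # Derived from the task names so the plugin arguments cannot drift from them.
            "plugins": {"mpi": [f"--master={master}", f"--worker={worker}", "--port=22"]},
            "tasks": [
                task(master, 1, master_script,
                     [{"event": "TaskCompleted", "action": "CompleteJob"}]),
                task(worker, workers, worker_script),
            ],
        },
    }


print(json.dumps(build_mpi_job("lm-mpi-job", workers=2), indent=2))
```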