kube-batch has four actions (allocate, preempt, reclaim, and backfill) that work with the help of plugins such as conformance, drf, gang, and nodeorder. These plugins define the behaviour that governs how the scheduler makes its scheduling decisions.
As discussed in the Introduction, preempt is one of the actions in the kube-batch scheduler. The preempt action comes into play when a high-priority task arrives and the cluster has no free resources to satisfy its request. In that case, some running tasks must be evicted so that the new task can get the resources it needs to run.
The preempt action uses several plugin functions:
- TaskOrderFn(Plugin: Priority),
- JobOrderFn(Plugin: Priority, DRF, Gang),
- NodeOrderFn(Plugin: NodeOrder),
- PredicateFn(Plugin: Predicates),
- PreemptableFn(Plugin: Conformance, Gang, DRF).
In the priority plugin, TaskOrderFn compares the taskPriority set in the PodSpec of two tasks and returns the result of the comparison.
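The priority comparison described above can be sketched in a few lines of Go. This is a minimal illustration, not kube-batch's own code: the `Task` type and the `taskOrder` function are hypothetical names, and the convention of returning -1 when the left task should come first is an assumption.

```go
package main

// Task is a hypothetical stand-in for a scheduler task. Priority mirrors
// the priority value taken from the pod's PodSpec.
type Task struct {
	Name     string
	Priority int32
}

// taskOrder returns -1 if l should be handled before r, 1 if it should be
// handled after r, and 0 if both priorities are equal (assumed convention).
func taskOrder(l, r Task) int {
	switch {
	case l.Priority > r.Priority:
		return -1
	case l.Priority < r.Priority:
		return 1
	default:
		return 0
	}
}
```

A comparison function of this shape can be handed to a priority queue so that the highest-priority task is always popped first.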
In the priority plugin, JobOrderFn compares the jobPriority set in the Spec (using PriorityClass) of two jobs and returns the result of the comparison.
In the drf plugin, the job with the lowest share gets the higher priority.
In the gang plugin, a job that is not yet ready (i.e. fewer than minAvailable of its tasks are in the Bound, Binding, Running, Allocated, Succeeded, or Pipelined state) gets the higher priority.
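The readiness rule above can be illustrated with a short Go sketch. All names here (`TaskStatus`, `Job`, `ready`, `gangJobOrder`) are hypothetical and chosen for illustration only. The sketch assumes a job is represented simply as a list of task states plus a `MinAvailable` count, and that -1 means "the left job comes first".

```go
package main

// TaskStatus is a hypothetical enumeration of task states.
type TaskStatus string

const (
	Pending   TaskStatus = "Pending"
	Allocated TaskStatus = "Allocated"
	Pipelined TaskStatus = "Pipelined"
	Binding   TaskStatus = "Binding"
	Bound     TaskStatus = "Bound"
	Running   TaskStatus = "Running"
	Succeeded TaskStatus = "Succeeded"
)

// Job is a hypothetical job with a gang requirement.
type Job struct {
	Name         string
	MinAvailable int
	Tasks        []TaskStatus
}

// counted reports whether a state counts toward MinAvailable.
func counted(s TaskStatus) bool {
	switch s {
	case Bound, Binding, Running, Allocated, Succeeded, Pipelined:
		return true
	}
	return false
}

// ready reports whether at least MinAvailable tasks are in a counted state.
func ready(j Job) bool {
	n := 0
	for _, s := range j.Tasks {
		if counted(s) {
			n++
		}
	}
	return n >= j.MinAvailable
}

// gangJobOrder puts a job that is not yet ready ahead of a ready one.
// It returns -1 if l comes first, 1 if r comes first, and 0 on a tie.
func gangJobOrder(l, r Job) int {
	lr, rr := ready(l), ready(r)
	switch {
	case !lr && rr:
		return -1
	case lr && !rr:
		return 1
	default:
		return 0
	}
}
```

Returning 0 on a tie leaves the decision to the other plugins that register a job ordering function, such as priority and drf.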
NodeOrderFn returns the score of a particular node for a specific task by running through a set of priorities.
PredicateFn returns whether a task can be bound to a node by running through a set of predicates.
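To show how a predicate and a node score can work together, the Go sketch below first filters out nodes that fail a predicate and then picks the feasible node with the highest score. It is a toy model under stated assumptions: the `Node` and `Task` types, the single CPU-fit predicate, and the "most free CPU left" score are all hypothetical and are not the predicates or priorities that kube-batch actually runs.

```go
package main

// Node and Task are hypothetical, simplified types.
type Node struct {
	Name    string
	FreeCPU int // free CPU in millicores
}

type Task struct {
	Name   string
	ReqCPU int // requested CPU in millicores
}

// predicate reports whether the task fits on the node (toy predicate).
func predicate(t Task, n Node) bool {
	return n.FreeCPU >= t.ReqCPU
}

// score rates a node for a task: the more CPU left after placement,
// the higher the score (toy priority function).
func score(t Task, n Node) int {
	return n.FreeCPU - t.ReqCPU
}

// pickNode returns the name of the feasible node with the highest score.
// The boolean is false when no node passes the predicate.
func pickNode(t Task, nodes []Node) (string, bool) {
	best, bestScore, found := "", 0, false
	for _, n := range nodes {
		if !predicate(t, n) {
			continue
		}
		if s := score(t, n); !found || s > bestScore {
			best, bestScore, found = n.Name, s, true
		}
	}
	return best, found
}
```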
PreemptableFn checks whether tasks can be preempted and returns the set of tasks that can be preempted so that the new task can be deployed.
The conformance plugin checks whether a task is critical or running in the kube-system namespace. Such tasks are excluded when computing the set of tasks that can be preempted.
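A minimal Go sketch of this filter follows. The `Task` type and both function names are hypothetical. The sketch also assumes that "critical" means the pod uses one of the Kubernetes built-in priority classes `system-cluster-critical` or `system-node-critical`, which the text above does not spell out.

```go
package main

// Task is a hypothetical, simplified view of a running task.
type Task struct {
	Name              string
	Namespace         string
	PriorityClassName string
}

// isProtected reports whether a task must never be chosen as a victim.
// Assumption: "critical" means one of the built-in system priority classes.
func isProtected(t Task) bool {
	critical := t.PriorityClassName == "system-cluster-critical" ||
		t.PriorityClassName == "system-node-critical"
	return critical || t.Namespace == "kube-system"
}

// conformanceVictims returns the candidates that are allowed to be preempted.
func conformanceVictims(candidates []Task) []Task {
	var victims []Task
	for _, t := range candidates {
		if !isProtected(t) {
			victims = append(victims, t)
		}
	}
	return victims
}
```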
The gang plugin checks whether evicting a task would break gang scheduling in kube-batch. A task can be evicted only if, after the eviction, the number of running tasks of its job does not drop below the minAvailable requirement for gang scheduling.
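The gang check can be sketched in Go as follows. The types and the `gangVictims` function are hypothetical. The sketch makes one design assumption that the text does not state: when several candidates belong to the same job, each accepted eviction lowers the job's remaining count before the next candidate is examined.

```go
package main

// Job is a hypothetical job with a gang requirement.
type Job struct {
	Name         string
	MinAvailable int
	Occupied     int // tasks currently counted toward MinAvailable
}

// Task is a hypothetical task that refers to its job by name.
type Task struct {
	Name string
	Job  string
}

// gangVictims keeps only the candidates whose eviction leaves their job
// with at least MinAvailable counted tasks.
func gangVictims(candidates []Task, jobs map[string]*Job) []Task {
	// Track remaining counts separately so the input jobs are not modified.
	remaining := make(map[string]int, len(jobs))
	for name, j := range jobs {
		remaining[name] = j.Occupied
	}

	var victims []Task
	for _, t := range candidates {
		j, ok := jobs[t.Job]
		if !ok {
			continue // unknown job: do not evict
		}
		if remaining[t.Job]-1 >= j.MinAvailable {
			remaining[t.Job]--
			victims = append(victims, t)
		}
	}
	return victims
}
```

For example, a job with `MinAvailable` 2 and three counted tasks can give up exactly one task. A second eviction would leave it with one task, which is below the gang requirement.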
In the drf plugin, the preemptor can preempt other tasks only if the share of the preemptor is less than the share of the preemptee after recalculating the resource allocation of the preemptor and preemptee.
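The share comparison can be sketched in Go as shown below. Several things here are assumptions rather than statements from the text: the share is taken to be the dominant share (the largest fraction of any single cluster resource held by a job), only CPU and memory are modelled, "recalculating" is taken to mean adding the preemptor's request to its job and subtracting the preemptee's request from its job, and all type and function names are hypothetical.

```go
package main

// Resource is a hypothetical two-dimensional resource vector.
type Resource struct {
	MilliCPU float64
	Memory   float64
}

func (r Resource) add(o Resource) Resource {
	return Resource{MilliCPU: r.MilliCPU + o.MilliCPU, Memory: r.Memory + o.Memory}
}

func (r Resource) sub(o Resource) Resource {
	return Resource{MilliCPU: r.MilliCPU - o.MilliCPU, Memory: r.Memory - o.Memory}
}

// share returns the dominant share of an allocation: the largest fraction
// of any one cluster resource. It assumes every field of total is non-zero.
func share(allocated, total Resource) float64 {
	cpu := allocated.MilliCPU / total.MilliCPU
	mem := allocated.Memory / total.Memory
	if cpu > mem {
		return cpu
	}
	return mem
}

// canPreempt recalculates both shares as if the preemption had happened:
// the preemptor's job gains the preemptor's request and the preemptee's job
// loses the preemptee's request. Preemption is allowed only if the
// preemptor's share is still lower.
func canPreempt(preemptorJob, preemptorReq, preempteeJob, preempteeReq, total Resource) bool {
	ls := share(preemptorJob.add(preemptorReq), total)
	rs := share(preempteeJob.sub(preempteeReq), total)
	return ls < rs
}
```

Comparing the shares after the hypothetical move, rather than before it, avoids a preemption that would immediately leave the preemptor's job with a larger share than the job it took resources from.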