
Why is a scalable scheduler required
- Large-scale compute clusters are expensive, so it is important to use them well. Utilization and efficiency can be increased by running a mix of workloads on the same machines.
- This consolidation reduces the amount of hardware required for a workload, but it makes the scheduling problem (assigning jobs to machines) more complicated.
- Clusters and their workloads keep growing, and since the scheduler’s workload is roughly proportional to the cluster size, the scheduler is at risk of becoming a scalability bottleneck.
What is Omega
Omega is a parallel scheduler architecture built around shared state, using lock-free optimistic concurrency control to achieve both implementation extensibility and performance scalability.
Omega Details
- One important driver of complexity is the hardware and workload heterogeneity that is commonplace in large compute clusters.
- Omega uses a shared-state approach: it grants each scheduler full access to the entire cluster, allows them to compete in a free-for-all manner, and uses optimistic concurrency control to mediate clashes when they update the cluster state (see the sketch after this list).
- There is no central resource allocator in Omega; all of the resource-allocation decisions take place in the schedulers.
- Omega schedulers operate completely in parallel and do not have to wait for jobs in other schedulers, so there is no inter-scheduler head-of-line blocking.
- Different Omega schedulers can implement different policies, but all must agree on what resource allocations are permitted (e.g., a common notion of whether a machine is full) and on a common scale for expressing the relative importance of jobs, called precedence.
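
A minimal sketch of the shared-state, optimistic-concurrency idea described above. All names here (`CellState`, `try_commit`, `schedule`) are illustrative and not Omega's actual API: each scheduler works against a private copy of the full cell state, and a commit succeeds only if the machines it claims have not changed since the copy was taken; otherwise the scheduler refreshes its view and retries.

```python
import copy

class CellState:
    """Shared view of the cluster: free CPU per machine plus a per-machine
    version counter bumped on every successful commit (illustrative only)."""
    def __init__(self, machines):
        self.free_cpu = dict(machines)            # machine -> free CPU cores
        self.versions = {m: 0 for m in machines}

    def snapshot(self):
        # Each scheduler gets a private, resyncable copy of the full cell state.
        return copy.deepcopy(self)

    def try_commit(self, claims, base_versions):
        """Apply a scheduler's claims {machine: cpu} optimistically:
        fail if any claimed machine changed since the snapshot."""
        for m in claims:
            if self.versions[m] != base_versions[m]:
                return False                      # conflict: another scheduler got there first
        for m, cpu in claims.items():
            if self.free_cpu[m] < cpu:
                return False                      # no longer fits: treat as a conflict too
            self.free_cpu[m] -= cpu
            self.versions[m] += 1
        return True


def schedule(cell, job_cpu, max_retries=5):
    """One scheduler placing a single-task job; retries on conflict."""
    for _ in range(max_retries):
        view = cell.snapshot()
        base = dict(view.versions)
        # Scheduler-specific policy goes here; this one just picks the emptiest machine.
        machine = max(view.free_cpu, key=view.free_cpu.get)
        if view.free_cpu[machine] < job_cpu:
            return None                           # cell is full for this job
        if cell.try_commit({machine: job_cpu}, {machine: base[machine]}):
            return machine
    return None                                   # gave up after repeated conflicts


cell = CellState({"m1": 8, "m2": 16})
print(schedule(cell, job_cpu=4))                  # e.g. 'm2'
```

Note that there is no central allocator in this picture: conflict resolution happens entirely at commit time, which is what lets schedulers with different policies run fully in parallel over the same cell state.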
Lessons learnt from designing Omega
- Optimistic concurrency over shared state is a viable, attractive approach to cluster scheduling.
- Although this approach does strictly more work than a pessimistic locking scheme, since some work may need to be redone after a conflict, we found the overhead to be acceptable at reasonable operating points, and the resulting benefits of eliminating head-of-line blocking and better scalability often outweigh it (a conflict of this kind is illustrated below).
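
Continuing the illustrative sketch above (same hypothetical `CellState` names): when two schedulers race for the same machine, one commit fails, its placement work is discarded, and it must retry against a refreshed view. That retried work is the extra cost referred to here.

```python
# Two schedulers competing over the same single-machine cell (illustrative continuation).
cell = CellState({"m1": 4})

# Scheduler A and scheduler B both snapshot the cell and both pick m1.
view_a, view_b = cell.snapshot(), cell.snapshot()

# A commits first and wins the machine.
assert cell.try_commit({"m1": 4}, {"m1": view_a.versions["m1"]})

# B's commit fails because m1's version moved; B must redo its placement work.
assert not cell.try_commit({"m1": 4}, {"m1": view_b.versions["m1"]})
```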