2022-08-03

Ingo Molnar submitted the kernel scheduler updates for the next release, Linux Kernel 6.0, which is currently under development.

The kernel scheduler updates focus mainly on load-balancing improvements, along with ABI improvements, optimizations, and fixes.

Ingo Molnar also referred to the new version as Linux Kernel 6.0 rather than 5.20, in line with Linus Torvalds' earlier announcement.

Most of us expected the version to be named 5.20. However, in the release announcement for Linux Kernel 5.19, Linus Torvalds said that he was starting to worry about getting confused by big numbers again and decided to name the next version 6.0.

Load balancing improvements

Kernel scheduler updates for Linux Kernel 6.0 mostly focus on load balancing. One of them improves NUMA balancing on AMD Zen systems for affine workloads. It stems from an AMD patch that further tunes how the Linux kernel's scheduler handles NUMA imbalance in the find_idlest_group() function.

Another scheduler change, led by Intel, aims to make the search for an idle CPU more efficient under heavy system load. Other changes include improvements to the handling of reduced-capacity CPUs in load balancing, core scheduling, wake-up balancing, and the Energy Model, along with further optimizations and fixes. The Linux Kernel 6.0 merge window is still open, and you can take a look at the submitted scheduler changes in the pull request.
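To illustrate the general idea behind cheaper idle-CPU searching under load, here is a minimal user-space sketch in C. It is not the kernel's code: the function names `scan_budget()` and `find_idle_cpu()` and the linear budget rule are assumptions made up for this example. It only shows how capping the number of CPUs inspected, based on how busy the system is, bounds the time spent searching.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: how many CPUs to inspect for a given utilization
 * (in percent). The linear rule is an illustrative assumption, not the
 * formula used by the kernel. */
size_t scan_budget(size_t nr_cpus, unsigned int util_pct)
{
    if (util_pct >= 100)
        return 0;                      /* fully loaded: skip the scan */
    return nr_cpus * (100 - util_pct) / 100;
}

/* Return the index of the first idle CPU found within the budget,
 * or -1 if none is found before the budget runs out. */
int find_idle_cpu(const bool *idle, size_t nr_cpus, unsigned int util_pct)
{
    size_t budget = scan_budget(nr_cpus, util_pct);

    for (size_t cpu = 0; cpu < nr_cpus && cpu < budget; cpu++) {
        if (idle[cpu])
            return (int)cpu;
    }
    return -1;                         /* give up instead of scanning all CPUs */
}
```

With this rule, a lightly loaded system still scans every CPU, while a heavily loaded one gives up early and accepts that it may miss an idle CPU further down the list.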

Load-balancing improvements:

Improve NUMA balancing on AMD Zen systems for affine workloads.

Improve the handling of reduced-capacity CPUs in load-balancing.

Energy Model improvements: fix & refine all the energy fairness metrics (PELT), and remove the conservative threshold requiring 6% energy savings to migrate a task. Doing this improves power efficiency for most workloads, and also increases the reliability of energy-efficiency scheduling.

Optimize/tweak select_idle_cpu() to spend (much) less time searching for an idle CPU on overloaded systems. There are reports of several milliseconds spent there on large systems with large workloads.

Improve NUMA imbalance behavior. On certain systems with spare capacity, initial placement of tasks is non-deterministic, and such an artificial placement imbalance can persist for a long time, hurting (and sometimes helping) performance.

Improve core scheduling by fixing a bug in sched_core_update_cookie() that caused unnecessary forced idling.

Improve wakeup-balancing by allowing same-LLC wakeup of idle CPUs for newly woken tasks.

Fix a newidle balancing bug that introduced unnecessary wakeup latencies.
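The Energy Model item above mentions removing a threshold that required 6% energy savings before a task was migrated. The following C sketch contrasts the two decision rules in user space. The function names and the plain integer energy estimates are hypothetical, and reading the new behavior as "migrate on any saving" is an assumption drawn from the wording of the list, not from the kernel source.

```c
#include <stdbool.h>

/* Conservative rule (illustrative): migrate only if the candidate CPU
 * saves more than 6% of the energy estimated for the previous CPU.
 * Assumes small values so the multiplications cannot overflow. */
bool migrate_with_margin(unsigned long prev_energy, unsigned long best_energy)
{
    if (best_energy >= prev_energy)
        return false;
    /* saving * 100 > prev * 6  is the same as  saving / prev > 6% */
    return (prev_energy - best_energy) * 100 > prev_energy * 6;
}

/* Rule without the margin (illustrative): migrate whenever the candidate
 * CPU is estimated to use less energy at all. */
bool migrate_any_saving(unsigned long prev_energy, unsigned long best_energy)
{
    return best_energy < prev_energy;
}
```

For an estimated 3% saving, the first function declines to migrate while the second one migrates, which is the kind of small saving the margin used to discard.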

ABI improvements/fixes:

Do not check capabilities and do not issue capability check denial messages when a scheduler syscall doesn’t require privileges.

Add forced-idle accounting to cgroups too.

Fix/improve the RSEQ ABI to not just silently accept unknown flags.

Deprecate the (unused) RSEQ_CS_FLAG_NO_RESTART_ON_* flags.
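Rejecting unknown flags, as the RSEQ item above describes, usually comes down to masking the flag word against the set of known bits. The C sketch below shows that pattern with made-up flag names. `DEMO_FLAG_A`, `DEMO_FLAG_B`, and `demo_check_flags()` are hypothetical and are not the real RSEQ constants or kernel functions, and returning `-EINVAL` is an assumption of the sketch rather than a statement about how the kernel reports the error.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration; not the real RSEQ constants. */
#define DEMO_FLAG_A      (1u << 0)
#define DEMO_FLAG_B      (1u << 1)
#define DEMO_KNOWN_FLAGS (DEMO_FLAG_A | DEMO_FLAG_B)

/* Return 0 if only known bits are set, -EINVAL if any unknown bit is set. */
int demo_check_flags(uint32_t flags)
{
    if (flags & ~DEMO_KNOWN_FLAGS)
        return -EINVAL;    /* refuse rather than silently ignore */
    return 0;
}
```

Refusing unknown bits keeps them available for future extensions, because no existing caller can come to depend on them being ignored.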

Optimizations:

Optimize & simplify leaf_cfs_rq_list()

Micro-optimize set_nr_{and_not,if}_polling() via try_cmpxchg().
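The try_cmpxchg() pattern mentioned above has a close user-space analogue in C11's `atomic_compare_exchange_weak()`: on failure it writes the current value back into the expected variable, so a retry loop does not need a separate re-read. The sketch below uses that C11 call, not the kernel helper. The flag bit layout and the function name `set_resched_and_not_polling()` are hypothetical stand-ins chosen for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit layout for illustration only. */
#define FLAG_POLLING      (1u << 0)
#define FLAG_NEED_RESCHED (1u << 1)

/* Set NEED_RESCHED atomically. Return true if the target was NOT polling,
 * meaning the caller would still have to wake it up explicitly. */
bool set_resched_and_not_polling(atomic_uint *flags)
{
    unsigned int old = atomic_load(flags);

    /* On failure, the compare-exchange stores the current value in 'old',
     * so each retry recomputes the new value from fresh data. */
    while (!atomic_compare_exchange_weak(flags, &old, old | FLAG_NEED_RESCHED))
        ;

    return !(old & FLAG_POLLING);
}
```

Compared with a loop that reloads the value and compares the result by hand, this form lets the compiler reuse the status already produced by the compare-exchange, which is the kind of saving a micro-optimization like this is after.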

Misc fixes & cleanups:

Fix the RSEQ self-tests on RISC-V and Glibc 2.35 systems.

Fix a full-NOHZ bug that can in some cases result in the tick not being re-enabled when the last SCHED_RT task is gone from a run queue but there are still SCHED_OTHER tasks around.

Various PREEMPT_RT related fixes.

Misc cleanups & smaller fixes.