Meta's engineering team published a blog post about Transparent Memory Offloading (TMO), a new Linux kernel feature developed by the team and currently used in production on Meta servers. The team stated that the new feature saves 20-32% of memory per server.
In production for more than a year
TMO was designed for heterogeneous data center environments. The feature introduces a new Linux kernel mechanism that measures, in real time, the work lost due to resource shortages across CPU, memory, and I/O. This allows TMO to automatically adjust the amount of memory to offload to a heterogeneous device, according to the device's performance characteristics and the application's sensitivity to slower memory accesses.
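The measurement mechanism referenced here is exposed to user space through files under /proc/pressure on kernels with PSI enabled (Linux 4.20 or later). The following minimal Python sketch, which is illustrative and not part of TMO or Senpai, shows how the memory pressure file can be read and parsed:

```python
# Minimal sketch: reading the kernel's Pressure Stall Information (PSI)
# for memory from /proc/pressure/memory (available since Linux 4.20).
# The parsing helper is illustrative, not part of Meta's tooling.

def read_memory_pressure(path="/proc/pressure/memory"):
    """Return {'some': {...}, 'full': {...}} with avg10/avg60/avg300/total."""
    pressure = {}
    with open(path) as f:
        for line in f:
            kind, *fields = line.split()
            pressure[kind] = {
                k: float(v) for k, v in (field.split("=") for field in fields)
            }
    return pressure

if __name__ == "__main__":
    psi = read_memory_pressure()
    # 'some' means at least one task was stalled on memory;
    # 'full' means all non-idle tasks were stalled simultaneously.
    print("memory full avg10:", psi["full"]["avg10"], "%")
```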
The feature is capable of identifying offloading opportunities not only from application containers but also from sidecar containers that provide infrastructure-level functions. The team also stated that TMO has been running in production for over a year and has saved 20% to 32% of total memory across millions of servers. TMO consists of the following components:
- Pressure Stall Information (PSI): a Linux kernel component that measures, in real time, the work lost due to resource shortages across CPU, memory, and I/O. For the first time, it allows directly measuring an application's sensitivity to memory-access slowdown without resorting to fragile low-level metrics such as the page promotion rate.
- Senpai: a userspace agent that applies mild, proactive memory pressure to effectively offload memory across diverse workloads and heterogeneous hardware with minimal impact on application performance (see the sketch after this list).
- Swap algorithm changes: TMO performs memory offloading to swap at subliminal memory-pressure levels, with turnover proportional to the file cache. This contrasts with the historical behavior of swapping as an emergency overflow under severe memory pressure.
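To illustrate how a Senpai-like agent combines PSI with cgroup v2 controls, here is a minimal Python sketch that tightens a cgroup's memory.high limit while memory pressure stays below a target and backs off when pressure rises. The cgroup path, target pressure, step size, and interval are illustrative assumptions, not values from Meta's implementation:

```python
# Minimal sketch of a Senpai-style proactive-reclaim loop, assuming a
# cgroup v2 hierarchy mounted at /sys/fs/cgroup. Paths, thresholds, and
# step sizes are illustrative; the real Senpai agent is more careful
# about limits, backoff, and workload protection.
import time

CGROUP = "/sys/fs/cgroup/workload.slice"   # hypothetical target cgroup
PRESSURE_TARGET = 0.1                      # target 'full' avg10 pressure (%)
ADJUST_STEP = 0.001                        # shrink/grow limit by 0.1% per tick

def read_full_avg10(cgroup):
    with open(f"{cgroup}/memory.pressure") as f:
        for line in f:
            if line.startswith("full"):
                return float(line.split()[1].split("=")[1])
    return 0.0

def read_current(cgroup):
    with open(f"{cgroup}/memory.current") as f:
        return int(f.read())

def write_high(cgroup, value):
    with open(f"{cgroup}/memory.high", "w") as f:
        f.write(str(value))

while True:
    pressure = read_full_avg10(CGROUP)
    usage = read_current(CGROUP)
    if pressure < PRESSURE_TARGET:
        # Barely any stalls: tighten memory.high slightly so the kernel
        # reclaims cold pages and offloads them to swap or zswap.
        write_high(CGROUP, int(usage * (1 - ADJUST_STEP)))
    else:
        # Stalls above target: back off and let the workload grow again.
        write_high(CGROUP, int(usage * (1 + ADJUST_STEP)))
    time.sleep(6)
```

The key design point this sketch captures is that pressure, not a static size estimate, drives the limit: memory is squeezed out only as long as the application shows negligible stall time.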
The Meta engineering team said:
"Currently, we manually choose the offload back end between compressed memory and SSD-backed swap depending on the application's memory compressibility as well as its sensitivity to memory-access slowdown. Although we could develop tools to automate the process, a more fundamental solution entails the kernel managing a hierarchy of offload back ends (e.g., automatically using zswap for warmer pages and SSD for colder or less compressible pages, as well as folding NVM and CXL devices into the memory hierarchy in the future). The kernel reclaim algorithm should dynamically balance across these pools of memory. We are actively working on this architecture.

With upcoming bus technologies such as CXL that provide memory-like access semantics, memory offloading can help offload not only cold memory but also warm memory. We are actively focusing on that architecture to utilize CXL devices as a memory-offloading back end."
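As an illustration of the manual back-end choice described in the quote, the following sketch toggles a stock Linux host between a compressed-memory back end (zswap in front of an existing swap device) and plain SSD-backed swap using standard zswap module parameters; the compressor and pool-size values are assumptions for the example, not Meta's production configuration:

```python
# Illustrative sketch: switching the memory-offload back end between zswap
# (compressed memory in front of swap) and plain SSD-backed swap. Assumes a
# swap device is already active; values are examples, not Meta's settings.
ZSWAP = "/sys/module/zswap/parameters"

def write(path, value):
    with open(path, "w") as f:
        f.write(value)

def use_zswap(compressor="lzo", max_pool_percent="20"):
    # Compressible, latency-sensitive workloads: keep offloaded pages
    # compressed in RAM and only spill to the swap device on overflow.
    write(f"{ZSWAP}/compressor", compressor)
    write(f"{ZSWAP}/max_pool_percent", max_pool_percent)
    write(f"{ZSWAP}/enabled", "1")

def use_ssd_swap_only():
    # Poorly compressible or latency-tolerant workloads: bypass zswap and
    # let reclaimed pages go straight to the SSD-backed swap device.
    write(f"{ZSWAP}/enabled", "0")
```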