Handling Complex Memory Situations

Jérôme Glisse felt that the time had come for the Linux kernel
to seriously address the issue of having many different types of memory
installed on a single running system. There was main system memory and
device-specific memory, with associated hierarchies governing which memory
to use at which time and under which circumstances. This complicated
situation, Jérôme said, was now the norm, and it should be treated
as such.

The physical connections between the various CPUs, devices and RAM
chips—that is, the bus topology—was also relevant, because it could
influence the speed at which each of those components could reach the others.

Jérôme wanted to be clear that his proposal went beyond existing efforts
to handle heterogeneous RAM. He wanted to take account of the wide range of
hardware and its topological relationships to eke out the absolute
highest performance from a given system. He said:

One of the reasons for
radical change is the advance of accelerator
like GPU or FPGA means that CPU is no longer the only piece where
computation happens. It is becoming more and more common for an application
to use a mix and match of different accelerator to perform its computation.
So we can no longer satisfy our self with a CPU centric and flat view of a
system like NUMA and NUMA distance.

He posted some patches to accomplish several different things. First, he
wanted to expose the bus topology and memory variety to userspace as a
clear API, so that both the kernel and user applications could make the
best possible use of the particular hardware configuration on a given
system. Part of this, he said, would have to take account of the fact
that not all memory on the system would always be equally available to all
devices, CPUs or users.

To accomplish all this, his patches first identified four basic
elements that could be used to construct an arbitrarily complex graph of
CPU, memory and bus topology on a given system.

These included “targets”, which were any sort of memory; “initiators”,
which were CPUs or any other device that might access memory; “links”,
which were any sort of bus-type connection between a target and an
initiator; and “bridges”, which could connect groups of initiators to
remote targets.
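
As a rough illustration only, the following sketch models those four
elements as plain C structures and wires a single initiator to a single
target through a link. The names (hms_target, hms_initiator and so on)
are placeholders chosen for this sketch and are not taken from Jérôme's
actual patches.

    /* Hypothetical model of the four graph elements; not the kernel's
     * real data structures. */
    #include <stdio.h>

    struct hms_target {                /* any sort of memory */
        const char *name;
    };

    struct hms_initiator {             /* a CPU or any device that accesses memory */
        const char *name;
    };

    struct hms_link {                  /* bus-type connection between an initiator and a target */
        struct hms_initiator *initiator;
        struct hms_target    *target;
    };

    struct hms_bridge {                /* connects a group of initiators to remote targets */
        struct hms_initiator **initiators;
        unsigned int           nr_initiators;
        struct hms_target    **targets;
        unsigned int           nr_targets;
    };

    int main(void)
    {
        struct hms_target    ddr = { "system-DDR" };
        struct hms_initiator cpu = { "CPU0" };
        struct hms_link      l0  = { &cpu, &ddr };

        printf("%s -> %s\n", l0.initiator->name, l0.target->name);
        return 0;
    }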

Aspects like bandwidth and latency would be associated with their relevant
links and bridges. And the whole graph of the system would be exposed to
userspace via files in the sysfs hierarchy.
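
Because the graph would appear as ordinary sysfs files, a userspace tool
could in principle read attributes such as bandwidth and latency with
nothing more than standard file I/O. The sketch below assumes a made-up
path (/sys/devices/system/hms/link0) and made-up attribute names; the real
layout would be whatever the patches define, not what is shown here.

    /* Hypothetical userspace reader for per-link attributes in sysfs. */
    #include <stdio.h>
    #include <stdlib.h>

    static long read_attr(const char *dir, const char *attr)
    {
        char path[256];
        char buf[64];
        FILE *f;

        snprintf(path, sizeof(path), "%s/%s", dir, attr);
        f = fopen(path, "r");
        if (!f)
            return -1;
        if (!fgets(buf, sizeof(buf), f)) {
            fclose(f);
            return -1;
        }
        fclose(f);
        return strtol(buf, NULL, 10);
    }

    int main(void)
    {
        /* "link0" is a placeholder name for one link node in the graph. */
        const char *link = "/sys/devices/system/hms/link0";

        printf("bandwidth: %ld\n", read_attr(link, "bandwidth"));
        printf("latency:   %ld\n", read_attr(link, "latency"));
        return 0;
    }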
