Realm Machine Model
This example illustrates how to query the machine model from Realm. In addition, it outlines various types that Realm uses to define resources of underlying hardware that an application occupies, as well as affinities between these resources.
The covered topics are Machine, Processor, Memory, Affinity, and ID.
Machine
In Realm, a Machine is the top-level object of the machine model, representing all the compute nodes an application can occupy.
It can be retrieved using the Machine::get_machine() function,
which returns a singleton object.
The processors and memories of the Machine, as well as the affinity information between them, can be queried from the Machine object.
Processor
A Processor is a runtime object that represents any execution resource that can run a task, such as a CPU core or a GPU.
To retrieve a Processor, ProcessorQuery can be used. In this example, we use ProcessorQuery to iterate over all
enabled processors and print out their address space and ID (a unique identifier that Realm assigns).
for(Machine::ProcessorQuery::iterator it = Machine::ProcessorQuery(machine).begin(); it; ++it) {
...
}
The address space of a Realm resource, such as Processor and Memory (introduced in the next section), indicates the process/rank where it resides.
A Processor can also be queried by its kind and the runtime supports the following options:
LOC_PROC represents a latency processor, which is usually a CPU core. It can be specified by -ll:cpu.
TOC_PROC represents a throughput processor (GPU). Currently, Realm supports both NVIDIA and AMD GPUs. It can be specified by -ll:gpu.
UTIL_PROC represents a CPU processor that is designed for users to run their own background work. It can be specified by -ll:util.
IO_PROC represents a processor that is used for I/O, which is also a CPU core. It can be specified by -ll:io.
PROC_GROUP represents a group of processors.
PROC_SET represents a set of processors for OpenMP, Kokkos, etc. It can be specified by -ll:mp_nodes.
OMP_PROC represents an OpenMP thread pool. It can be specified by -ll:ocpu. The number of threads per OMP_PROC can be specified by -ll:othr.
PY_PROC represents a CPU processor that is used for the Python interpreter. It can be specified by -ll:py. Currently, only a single PY_PROC is supported.
For a list of all the processor kinds supported by Realm, please refer to Full Processor Kind [1].
Memory
A Memory describes where application data is located. MemoryQuery can be used to query
memories. For example, a MemoryQuery can be created with the condition has_affinity_to to return
all memories that have affinity to the given processor.
Machine::MemoryQuery mq = Machine::MemoryQuery(machine).has_affinity_to(p, 0, 0);
A Memory can also be queried by its kind and the runtime supports the following options:
GLOBAL_MEM represents CPU memory guaranteed to be visible to all processors on all nodes, e.g., GASNet global memory. GLOBAL_MEM is usually slow. It is only used by the MPI and GASNet1 modules, and it can be specified by -ll:gsize.
SYSTEM_MEM represents CPU memory visible to all processors on a node. It can be specified by -ll:csize.
REGDMA_MEM represents registered memory visible to all processors on a node, and can be a target of RDMA. It can be specified by -ll:rsize.
SOCKET_MEM represents CPU memory visible to all processors within a node. It is NUMA-aware, so it provides better performance for processors on the same socket.
Z_COPY_MEM represents zero-copy memory visible to all CPUs within a node and one or more GPUs. It can be specified by -ll:zsize.
GPU_FB_MEM represents framebuffer memory for a particular GPU. It can be specified by -ll:fsize.
GPU_MANAGED_MEM represents managed memory that can be cached by either the host or the GPU. It can be specified by -ll:msize.
GPU_DYNAMIC_MEM represents dynamically allocated framebuffer memory for a particular GPU. Its size is not fixed, but its maximum size can be specified by -cuda:dynfb_max.
DISK_MEM represents disk memory visible to all processors on a node. It can be specified by -ll:dsize.
HDF_MEM and FILE_MEM represent HDF and file memory visible to all processors on a node, respectively. They do not have memory space, so their sizes are always 0. These I/O-related memories allow users to create instances for I/O operations.
For a list of all the memory kinds supported by Realm, please refer to Full Memory Kind [2].
Affinity
Realm provides ProcessorMemoryAffinity and MemoryMemoryAffinity to describe the affinity
between a processor and a memory, and between two memories, respectively. In this example, we use ProcessorMemoryAffinity to retrieve
the latency and bandwidth between a processor and a memory.
std::vector<Machine::ProcessorMemoryAffinity> pm_affinity;
machine.get_proc_mem_affinity(pm_affinity, p, m, true/*local_only*/);
// the vector is empty when p and m have no affinity, so check before indexing
if(!pm_affinity.empty()) {
  unsigned bandwidth = pm_affinity[0].bandwidth;
  unsigned latency = pm_affinity[0].latency;
}
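One common use of the bandwidth and latency values is to rank the memories reachable from a processor. The following is a minimal sketch of that idea, not Realm code: AffinityEntry is a hypothetical stand-in that only mirrors the bandwidth and latency fields read above, memory_id is an illustrative identifier, and best_affinity is a made-up helper name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an affinity record; it is NOT a Realm type.
// Only the bandwidth and latency fields correspond to the snippet above.
struct AffinityEntry {
  unsigned long long memory_id; // illustrative identifier only
  unsigned bandwidth;
  unsigned latency;
};

// Returns the index of the entry with the highest bandwidth, breaking ties
// by the lowest latency. Returns entries.size() when the list is empty.
inline std::size_t best_affinity(const std::vector<AffinityEntry> &entries)
{
  std::size_t best = entries.size();
  for(std::size_t i = 0; i < entries.size(); ++i) {
    if(best == entries.size()) {
      best = i; // first entry seen becomes the initial candidate
      continue;
    }
    const AffinityEntry &a = entries[i];
    const AffinityEntry &b = entries[best];
    if(a.bandwidth > b.bandwidth ||
       (a.bandwidth == b.bandwidth && a.latency < b.latency))
      best = i;
  }
  return best;
}
```

In a real application, the entries would be filled from the vector returned by get_proc_mem_affinity, and the caller would compare the returned index against the size of the list before using it.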
ID
A Realm ID is a 64-bit value that uniquely encodes both the type of the Realm object it refers to and that object's identity. Once an ID is written as a hexadecimal number, its fields can be read off directly. The following is an example of a processor ID and a memory ID:
Processor ID 1d00010000000001 is CPU.
System Memory ID 1e00010000000000 has 0 KB, bandwidth 100, latency 5.
The highest two digits (8 bits) give the type of an ID, e.g., 1d represents a Processor and 1e represents a Memory.
The next four digits (16 bits) give the owner node of an ID, e.g., 0001 means the processor/memory is on
node 1. The last two digits (8 bits) of a Memory ID and the last three digits (12 bits) of a Processor ID give the
local index, e.g., 001 means the CPU index is 1, while 00 means the memory index is 0.
Besides processors and memories, the IDs of other Realm objects (e.g., Event) can also be decoded.
For a complete introduction to Realm IDs, please refer to the ID header file [3].
Realm provides is_TYPE functions to test the type of an ID. In this example, is_processor is used to check whether an ID
refers to a processor.
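The field layout described in this section can be sketched in plain C++ as follows. This is only an illustration based on the digits discussed above (8-bit type tag, 16-bit owner node, 12-bit processor index or 8-bit memory index); it does not use Realm, the names DecodedID, decode_id, looks_like_processor, and looks_like_memory are hypothetical, and the ID header file remains the authoritative definition of the encoding.

```cpp
#include <cstdint>

// Tag values taken from the two example IDs in this section.
constexpr std::uint64_t PROCESSOR_TAG = 0x1d; // top 8 bits of a Processor ID
constexpr std::uint64_t MEMORY_TAG = 0x1e;    // top 8 bits of a Memory ID

// Hypothetical container for the three fields described in the text.
struct DecodedID {
  std::uint64_t type_tag;    // highest two hex digits
  std::uint64_t owner_node;  // next four hex digits
  std::uint64_t local_index; // low 12 bits (processor) or low 8 bits (memory)
};

inline std::uint64_t type_tag_of(std::uint64_t id) { return id >> 56; }

// Illustrative counterparts of the is_TYPE checks; not Realm's own functions.
inline bool looks_like_processor(std::uint64_t id)
{
  return type_tag_of(id) == PROCESSOR_TAG;
}

inline bool looks_like_memory(std::uint64_t id)
{
  return type_tag_of(id) == MEMORY_TAG;
}

// Splits an ID into its fields. The local index is only meaningful for
// processor and memory IDs; other ID types use layouts not covered here.
inline DecodedID decode_id(std::uint64_t id)
{
  DecodedID d;
  d.type_tag = type_tag_of(id);
  d.owner_node = (id >> 40) & 0xffff;
  d.local_index = looks_like_processor(id) ? (id & 0xfff) : (id & 0xff);
  return d;
}
```

Applied to the example IDs above, decode_id(0x1d00010000000001) yields type tag 1d, owner node 1, and local index 1, while decode_id(0x1e00010000000000) yields type tag 1e, owner node 1, and local index 0.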
References
[1]: Full Processor Kind
[2]: Full Memory Kind
[3]: ID header file




