Realm Hello World
The tutorial begins with a simple “hello world” example that showcases the basics.
You can find the source code, together with the Makefile and CMakeLists.txt for building and running the application, in the tutorial/realm directory of the repository.
By going through these tutorial programs in detail, we demonstrate how to use the Realm C++ runtime API effectively.
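Here is a list of covered topics:
- Realm Namespaces
- Realm Runtime Startup
- Registering Realm Tasks
- Launching Tasks
- Launching Tasks without Using collective_spawn
- Shutting Down the Runtime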
Realm Namespaces
Each Realm class has its own C++ header file. For convenience, all of them are aggregated in realm.h, which an application can include instead of the individual headers. Every class is defined in the Realm namespace to avoid naming conflicts.
Realm Runtime Startup
The following code illustrates how to initialize the singleton Runtime object.
Runtime rt;
rt.init(&argc, &argv);
The initialization must be performed by every application process. Once initialization is complete, the runtime stays mostly idle (apart from system status checks) and waits for tasks to be launched.
Registering Realm Tasks
A Realm task is an asynchronous operation identified by a task ID and backed by one or more task bodies, which are the implementations of the task.
This tutorial contains two tasks: the main task and the hello world task.
We therefore use a two-member enumeration to define the IDs that the Realm runtime will associate with these tasks.
Application task IDs must be greater than or equal to Processor::TASK_ID_FIRST_AVAILABLE, because Realm reserves the IDs below it for internal tasks.
enum {
  MAIN_TASK = Processor::TASK_ID_FIRST_AVAILABLE + 0,
  HELLO_TASK,
};
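Only the first enumerator needs an explicit value; every following one is one larger than its predecessor, so new tasks can be appended without renumbering. The standalone sketch below checks this at compile time without Realm. kTaskIdFirstAvailable is a hypothetical stand-in for Processor::TASK_ID_FIRST_AVAILABLE, and its value of 100 is arbitrary, not Realm's actual constant.

```cpp
// Hypothetical stand-in for Processor::TASK_ID_FIRST_AVAILABLE.
// The value 100 is arbitrary and is NOT Realm's real constant.
constexpr int kTaskIdFirstAvailable = 100;

// Same pattern as the tutorial: only the first enumerator gets an explicit
// value, and each later enumerator is one larger than the previous one.
enum {
  MAIN_TASK = kTaskIdFirstAvailable + 0,
  HELLO_TASK,  // implicitly MAIN_TASK + 1
};

// Compile-time checks that the application IDs stay out of the reserved range
// and that the enumerators are consecutive.
static_assert(MAIN_TASK >= kTaskIdFirstAvailable, "MAIN_TASK would use a reserved ID");
static_assert(HELLO_TASK == MAIN_TASK + 1, "enumerators are consecutive");
```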
A task can have an implementation for each kind of processor, e.g., CPU, GPU, and OpenMP.
For example, main_task is the CPU implementation of the main task, while hello_cpu_task, hello_gpu_task, and hello_omp_task are the CPU, GPU, and OpenMP implementations of the hello world task, respectively.
It is worth noting that in Realm a CPU processor typically corresponds to a single physical CPU core, so the implementation of a CPU task is generally single-threaded.
A GPU processor has a CUDA/HIP context associated with it, which allows tasks running on it to launch CUDA/HIP kernels.
However, for the sake of simplicity, the GPU task in this tutorial does not launch actual CUDA kernels.
A task must be registered on processors before the Realm runtime can launch it.
Tasks are registered with the static method Processor::register_task_by_kind, as shown below.
Processor::register_task_by_kind(Processor::LOC_PROC, false /*!global*/,
                                 MAIN_TASK,
                                 CodeDescriptor(main_task),
                                 ProfilingRequestSet(),
                                 0, 0).wait();
This function takes several parameters:
- target_kind - the kind of processor on which the task will be launched. We will introduce the processor API in the next example.
- global - if set to true, the task is visible on all nodes. Note that in this example register_task_by_kind is called from the main function, which every application process executes when running with mpirun, so the task is visible on all nodes even though global is false. However, when register_task_by_kind is called from a single task, global must be set to true to make the task visible on all nodes.
- TaskFuncID - the task ID defined in the enumeration.
- CodeDescriptor - an object that describes a blob of code as a callable function. In this case, main_task is the implementation of MAIN_TASK.
- ProfilingRequestSet - profiling requests for the registration itself; an empty set is passed here.
- user_data and user_data_len - the data passed into the task (the 3rd and 4th parameters of the task implementation).
register_task_by_kind is asynchronous: the task registration is not guaranteed to be complete when the function returns.
For this reason, it returns an Event object that lets us wait for completion explicitly, which is what the .wait() call above does. The usage of events is introduced in the following tutorials.
As mentioned before, Realm allows a task to have multiple implementations. When the task is launched, Realm automatically selects the appropriate implementation based on the processor where it is being executed.
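Because the tutorial's task bodies need the Realm runtime, the standard-library sketch below models only the registration bookkeeping described above. ToyTaskRegistry, ProcKind, bytes_to_string, and the string-returning task bodies are hypothetical stand-ins, not Realm types. The registry stores one implementation per (task ID, processor kind) pair together with a copy of its user data, and launch picks the implementation by the kind of processor the task runs on, passing the launch arguments as parameters 1-2 and the registered user data as parameters 3-4. Real Realm task bodies return void and receive a Processor as their last parameter.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Realm's processor kinds (not a Realm type).
enum class ProcKind { CPU, GPU, OMP };

// Same argument order as a Realm task body (args, arglen, userdata, userlen,
// processor); the toy returns a string so results can be checked and takes
// the processor kind instead of a Processor handle.
using TaskBody = std::string (*)(const void *args, std::size_t arglen,
                                 const void *userdata, std::size_t userlen,
                                 ProcKind kind);

class ToyTaskRegistry {
 public:
  // Records one implementation of task_id for processors of the given kind,
  // keeping a private copy of the registration-time user data.
  void register_task_by_kind(ProcKind kind, int task_id, TaskBody body,
                             const void *user_data, std::size_t user_data_len) {
    const char *bytes = static_cast<const char *>(user_data);
    impls_[{task_id, kind}] =
        Impl{body, std::vector<char>(bytes, bytes + user_data_len)};
  }

  // Runs task_id as if launched on a processor of the given kind: the body is
  // chosen by processor kind, launch args become parameters 1-2 and the
  // registered user data becomes parameters 3-4.
  std::string launch(int task_id, ProcKind kind, const void *args,
                     std::size_t arglen) const {
    auto it = impls_.find({task_id, kind});
    if (it == impls_.end())
      throw std::runtime_error("no implementation for this processor kind");
    const Impl &impl = it->second;
    return impl.body(args, arglen, impl.user_data.data(),
                     impl.user_data.size(), kind);
  }

 private:
  struct Impl {
    TaskBody body;
    std::vector<char> user_data;
  };
  std::map<std::pair<int, ProcKind>, Impl> impls_;
};

// Copies n raw bytes into a string (empty string when n == 0).
std::string bytes_to_string(const void *p, std::size_t n) {
  return n == 0 ? std::string() : std::string(static_cast<const char *>(p), n);
}

std::string hello_cpu_task(const void *args, std::size_t arglen,
                           const void *userdata, std::size_t userlen, ProcKind) {
  return "CPU hello, args=" + bytes_to_string(args, arglen) +
         ", userdata=" + bytes_to_string(userdata, userlen);
}

std::string hello_gpu_task(const void *args, std::size_t arglen,
                           const void *userdata, std::size_t userlen, ProcKind) {
  return "GPU hello, args=" + bytes_to_string(args, arglen) +
         ", userdata=" + bytes_to_string(userdata, userlen);
}
```

Registering hello_cpu_task and hello_gpu_task under the same ID and then launching that ID on a CPU or a GPU processor selects the matching body; launching it on a kind with no registered body fails, which mirrors why each implementation must be registered before use.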
Launching Tasks
Before launching a task, we need to pick a processor. In this example, we select the first CPU, GPU, and OpenMP processors to run the CPU, GPU, and OpenMP implementations, respectively.
Selecting the first CPU core is shown below; the Machine API is introduced in the next tutorial.
Processor p = Machine::ProcessorQuery(Machine::get_machine())
.only_kind(Processor::LOC_PROC)
.first();
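A query of this form returns the first matching processor, and a machine may have no processor of a requested kind at all (for example, no GPU). The standard-library sketch below mimics only_kind(...).first() over a hypothetical processor list; ToyProc, ProcKind, and first_of_kind are invented for illustration. It returns nullptr where, as far as we know, Realm's query yields Processor::NO_PROC.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-ins for Realm's processor kinds and handles.
enum class ProcKind { CPU, GPU, OMP };

struct ToyProc {
  int id;         // unique processor ID
  ProcKind kind;  // kind of processor
};

// Plays the role of
//   Machine::ProcessorQuery(machine).only_kind(kind).first()
// and returns nullptr when the machine has no processor of that kind.
const ToyProc *first_of_kind(const std::vector<ToyProc> &machine, ProcKind kind) {
  auto it = std::find_if(machine.begin(), machine.end(),
                         [kind](const ToyProc &p) { return p.kind == kind; });
  return it == machine.end() ? nullptr : &*it;
}
```

Code that selects the first CPU, GPU, and OpenMP processor should therefore check the result before spawning on it; in Realm, the corresponding check is the processor's exists() method.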
In high-performance computing, most distributed programs start by running a main function concurrently in a number of parallel processes, which is known as the Single-Program-Multiple-Data (SPMD) execution model. The collective_spawn method is the most convenient way to bridge the gap between this SPMD-style execution model and the task-based model employed by Realm.
In this example, MAIN_TASK is launched with collective_spawn as follows:
Event e = rt.collective_spawn(p, MAIN_TASK, 0, 0);
Although collective_spawn is called by every process, it launches a single instance of the main task, so the main task is not SPMD-style: from this point on, the program follows the task-based model rather than the SPMD one. Realm also provides the collective_spawn_by_kind method, which can be used to launch an SPMD-style task, with one task per process.
Within the main task, we use the spawn method of the Processor object to launch HELLO_TASK on the selected CPU, GPU, and OpenMP processors. An example of spawning HELLO_TASK on the CPU processor is shown below:
Event cpu_e = cpu.spawn(HELLO_TASK, NULL, 0);
Like register_task_by_kind, spawn and collective_spawn are asynchronous functions that return an Event object. We can either invoke the event's wait method to wait for the task to complete, or pass the event as a precondition to other Realm operations. We will introduce Realm events in more detail in the following tutorial.
It is worth mentioning that there is no task hierarchy in Realm, so the completion of a Realm task does not imply that all of its subtasks have also completed. If this cumulative property is needed, users must implement it explicitly. For example, to ensure that all HELLO_TASK instances have completed before MAIN_TASK exits, the main task calls wait on the event returned by launch_task, a helper in the tutorial source that spawns the hello world tasks and returns an event that triggers once they have all finished:
launch_task(p).wait();
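Realm events need the runtime, but the ordering rule above can be illustrated with a small single-threaded model. The sketch below is hypothetical and uses only the standard library: ToyRuntime, its spawn and wait members, and the std::shared_ptr<bool> "event" are stand-ins invented for illustration, not how Realm is implemented. spawn queues a task and returns immediately with its event; wait runs queued tasks in FIFO order until the given event has triggered. In run_without_waiting, the main task's event triggers while its three children are still queued; run_with_waiting waits on each child's event inside the main task, so all three finish before the main task does.

```cpp
#include <deque>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Minimal single-threaded model of asynchronous launches (hypothetical).
class ToyRuntime {
 public:
  // An "event" is a shared flag that becomes true when its task finishes.
  using Event = std::shared_ptr<bool>;

  // Like spawn: queue the task and return its (untriggered) event at once.
  Event spawn(std::function<void()> body) {
    Event done = std::make_shared<bool>(false);
    queue_.push_back({std::move(body), done});
    return done;
  }

  // Like waiting on an event: run queued tasks until e has triggered.
  void wait(const Event &e) {
    while (!*e && !queue_.empty()) {
      Pending next = std::move(queue_.front());
      queue_.pop_front();  // pop first so nested spawn/wait calls are safe
      next.body();
      *next.done = true;
    }
  }

 private:
  struct Pending {
    std::function<void()> body;
    Event done;
  };
  std::deque<Pending> queue_;
};

// Main task spawns three hello tasks but does not wait for them.
int run_without_waiting() {
  ToyRuntime rt;
  int hellos_done = 0;
  ToyRuntime::Event main_done = rt.spawn([&] {
    for (int i = 0; i < 3; ++i) rt.spawn([&] { ++hellos_done; });
  });
  rt.wait(main_done);
  return hellos_done;  // main has finished; the children have not run yet
}

// Main task waits on every child event before it returns.
int run_with_waiting() {
  ToyRuntime rt;
  int hellos_done = 0;
  ToyRuntime::Event main_done = rt.spawn([&] {
    std::vector<ToyRuntime::Event> children;
    for (int i = 0; i < 3; ++i) children.push_back(rt.spawn([&] { ++hellos_done; }));
    for (const auto &c : children) rt.wait(c);  // explicit "wait for subtasks"
  });
  rt.wait(main_done);
  return hellos_done;
}
```

In a real Realm run the children would eventually execute concurrently rather than stay queued; the model only shows that the parent's event alone says nothing about them.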
Launching Tasks without Using collective_spawn
HELLO_TASK can also be launched from the main function without using collective_spawn.
To do so, we mimic the behavior of collective_spawn by explicitly picking one process, such as rank 0, to launch the hello world tasks with the spawn method.
Processor local_proc = Machine::ProcessorQuery(Machine::get_machine())
                           .only_kind(Processor::LOC_PROC)
                           .local_address_space()
                           .first();
if (local_proc.address_space() == 0) {
  Event e = launch_task(p);
  rt.shutdown(e);
}
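The rank-0 test above can be traced without a cluster. The sketch below is a hypothetical, standard-library-only model: ToyProc records which process (address space) owns each processor, first_local_of_kind plays the role of the only_kind(...).local_address_space().first() query, and launching_ranks runs the same main logic once per simulated process. Exactly one process passes the address_space() == 0 test, which is how this version obtains the single launch that collective_spawn would otherwise provide.

```cpp
#include <vector>

// Hypothetical stand-ins for Realm processors (not Realm types).
enum class ProcKind { CPU, GPU, OMP };

struct ToyProc {
  int id;             // unique processor ID
  ProcKind kind;      // kind of processor
  int address_space;  // process (rank) that owns this processor
};

// Stand-in for only_kind(kind).local_address_space().first() evaluated on
// process my_rank; returns nullptr when no such processor exists.
const ToyProc *first_local_of_kind(const std::vector<ToyProc> &machine,
                                   ProcKind kind, int my_rank) {
  for (const ToyProc &p : machine)
    if (p.kind == kind && p.address_space == my_rank) return &p;
  return nullptr;
}

// Runs the SPMD main logic once per simulated process and returns the ranks
// whose local CPU lives in address space 0, i.e. the ranks that would launch
// the hello tasks and call shutdown.
std::vector<int> launching_ranks(const std::vector<ToyProc> &machine, int num_processes) {
  std::vector<int> ranks;
  for (int rank = 0; rank < num_processes; ++rank) {
    const ToyProc *local = first_local_of_kind(machine, ProcKind::CPU, rank);
    if (local != nullptr && local->address_space == 0) ranks.push_back(rank);
  }
  return ranks;
}
```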
However, it is generally recommended to use collective_spawn to launch a main task and then spawn the other tasks from within it.
Shutting Down the Runtime
At the end of a Realm program, the runtime must be shut down using the shutdown and wait_for_shutdown methods.
In this example, we instruct the runtime to initiate shutdown as soon as MAIN_TASK finishes (when using collective_spawn) or as soon as all HELLO_TASK instances finish (without collective_spawn).
While wait_for_shutdown must be called by all processes, shutdown need not be.
However, if shutdown is called from all processes, the precondition event must be identical across all of them.
Therefore, this program calls shutdown from all processes when using collective_spawn, which gives every process the same event, but only on rank 0 without collective_spawn.
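The two shutdown patterns described above can be summarized as a small check. This is a hypothetical helper, not a Realm API: each entry of shutdown_event is the ID of the precondition event that one process passes to shutdown, or -1 if that process does not call shutdown, and every process is assumed to call wait_for_shutdown. It accepts only the two patterns used in this tutorial and does not try to describe every configuration Realm may allow.

```cpp
#include <vector>

// Hypothetical checker for this tutorial's two shutdown patterns.
// shutdown_event[r] is the precondition event ID passed to shutdown by
// process r, or -1 if process r does not call shutdown.
bool follows_tutorial_shutdown_pattern(const std::vector<int> &shutdown_event) {
  int callers = 0;
  for (int e : shutdown_event)
    if (e != -1) ++callers;

  if (callers == 0) return false;  // someone has to request shutdown
  if (callers == 1) return true;   // e.g. only rank 0 calls shutdown
  if (callers != static_cast<int>(shutdown_event.size()))
    return false;                  // partial participation: not covered here

  // All processes call shutdown, so their preconditions must be identical.
  for (int e : shutdown_event)
    if (e != shutdown_event[0]) return false;
  return true;
}
```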