Realm Profiling
Introduction
After developing a functional Realm application, it is often necessary to profile and tune its performance. Realm provides a built-in profiling mechanism allowing users to query profiling information about all Realm operations. This tutorial will explain how to profile a task, instance creation, and copy operation. In addition, we also provide a sophisticated use case of utilizing profiling results to make load balance decisions for task launchers.
Here is a list of covered topics:
- Creating Profiling Requests
- Adding Performance Metrics
- Collecting Profiling Results
- Advanced Use Case: Load Balancing Task Launcher
- References
Creating Profiling Requests
Realm profiling allows clients to profile most of the operations in Realm, such as tasks and
instance creation. To tell Realm which profiling information
we want to collect, we need to create a ProfilingRequestSet object. Then we can use the
add_request method to create a ProfilingRequest object and add it to the ProfilingRequestSet
created before.
ProfilingRequestSet task_prs;
task_prs.add_request(profile_proc, COMPUTE_PROF_TASK, &task_result, sizeof(ComputeProfResultWrapper))
Realm profiling adopts a callback design, where a profiling task is launched
to report profiling results when an operation is completed. Therefore, as shown on the code above,
we need to specify the task (COMPUTE_PROF_TASK) and processor (profile_proc) to be used to launch the task for the ProfilingRequest.
It is noted that we use utility processor to launch profiling tasks because we want
to leave the cpu processor for regular tasks. It does not mean that profiling tasks
can not be launched on cpu processors.
Adding Performance Metrics
Once the ProfilingRequest is created, we can add performance metrics that need to be profiled into it.
Realm supports a variety of performance metrics that can be specified by the add_measurement
method of the ProfilingRequest object.
task_prs.add_request(profile_proc, COMPUTE_PROF_TASK, &task_result, sizeof(ComputeProfResultWrapper))
.add_measurement<ProfilingMeasurements::OperationTimeline>()
The following metrics are used in the tutorial.
OperationTimelinetracks the timestamps of different stages of an operation. In this example, we use it for tasks and copy operations.OperationProcessorUsagecontains the processor where the task is launched.InstanceTimelinetracks the timeline of an instance, including when the instance is created, ready for use, and destroyed.InstanceMemoryUsageincludes the memory where the instance is located and the size of the instance.OperationCopyInfotracks the transfer details for copy and fill operations.OperationMemoryUsagetracks the memory usage for copy and fill operations.
Realm supports many other kinds of ProfilingMeasurements, and they can be found in the profiling header file.
Once the metrics are set, we can pass the ProfilingRequestSet object to Realm APIs that
support profiling. For example, the following code is an example of setting the ProfilingRequestSet for tasks.
worker_procs[0].spawn(COMPUTE_TASK, &compute_task_args, sizeof(ComputeTaskArgs), task_prs).wait();
Collecting Profiling Results
As mentioned earlier, after an operation is completed, a profiling task is launched where
profiling results can be collected. Now we demonstrate how to collect
the results of the task launched in the main_task.
- To gather profiling results of a task, a
ProfilingResponseobject is created by passing theargsandarglen.
ProfilingResponse resp(args, arglen);
It is worth mentioning that for each ProfilingRequest, there will be one and only
one ProfilingResponse reported back to the client, such that the client can use
this feature to make sure all the profiling responses are collected.
- The argument of the profiling task can be retrieved by the
user_datamethod of the ProfilingResponse object.
const ComputeProfResultWrapper *result = static_cast<const ComputeProfResultWrapper *>(resp.user_data());
- Then, the
get_measurementmethod can be used to obtain the results of performance metrics requested by theadd_measurementmethod described in the previous section.
ProfilingMeasurements::OperationTimeline timeline;
if(resp.get_measurement(timeline)) {
metrics->ready_time = timeline.ready_time;
metrics->start_time = timeline.start_time;
metrics->complete_time = timeline.complete_time;
}
Sometimes, we want to be notified when
the results are ready inside the task where the operation needing profiling is launched. For
example, in this tutorial, we want to print the results in the main_task.
To achieve this, we can create a UserEvent, pass it as the argument of a profiling task
via the add_request method,
explicitly trigger it inside the profiling task, and then we can either
wait for the event as seen in this example explicitly or push it into a CompletionQueue and query it later.
Advanced Use Case: Load Balancing Task Launcher
Real-world applications often have tasks with different costs, and the same task may run at different speeds on different processors. Therefore, a simple round-robin task scheduling may not be sufficient. This section explores the benefits of empirical task distribution using profiling results. We repeatedly launch a batch of tasks on a group of worker processors, where each task has a different cost. A standard approach is to profile the tasks offline to obtain their execution time and compute a static load-balancing strategy. In this program, we start with a round-robin strategy. Then we profile the tasks and use the profiling results to pick the best worker processor for future iterations dynamically.
Here is the performance comparison between round-robin and profiling guided scheduling:
$ ./profiling -ll:cpu 6
[0 - 7f59ba4f3800] 0.189922 {3}{app}: With round-robin 38358 us
[0 - 7f59ba4f3800] 0.212061 {3}{app}: With profiling 21299 us
It turns out the profiling guided one achieves better task load balance.




