Legion Runtime Class #5: Realm
TweetAbout Legion Runtime Class
These notes are closely based on the set of Legion Runtime Class videos produced by the Legion developers. They are my own notes and code walks, and any errors or things that are just plain wrong represent my own mistakes.
Today's notes are based on the following video:
Overview
The low-level runtime in Legion is an implementation of Realm. The low-level interface is primarily concerned with scheduling an asynchronous task graph that is built up by an application running above the low-level runtime. The interface that is exposed to the high-level runtime that we have been discussing in previous posts is defined in lowlevel.h
. There are several entities that are represented at the low-level:
Machine
is what we are running on (e.g. processors and memories)Event
UserEvent
Reservation
Memory
Processor
There are other entities as well. For instance, Barrier
, RegionInstances
and IndexSpace
. Note that field spaces are not represented as they are only defined as part of the high-level runtime. Other parts of the low-level runtime are spread across other files: lowlevel_gpu.[h,cc]
for CUDA stuff, lowlevel_dma.[h,cc]
for DMA infastructure, and activemsg.[h,cc]
for network messaging.
Everything in the low-level is node independent. That is, while raw pointers are used internally to a runtime, users of the low-level interact with portable handles. A handle is called an ID in Legion, and is either a 32-bit or 64-bit integer. Here is the Memory
object. Notice that the only instance method shown is an id_t id
ID value. Most objects are this light-weight.
class Memory {
public:
typedef IDType id_t;
id_t id;
bool operator<(const Memory &rhs) const { return id < rhs.id; }
bool operator==(const Memory &rhs) const { return id == rhs.id; }
bool operator!=(const Memory &rhs) const { return id != rhs.id; }
class Impl;
Impl *impl(void) const;
...
Given one of these ID structures, it can be turned into an implementation by calling impl
. Note that the exposure of the implementation is only historic, and will be removed in the future for full implementation hiding.
Each type of object will also have slightly different methods defined on it. For instance, applications can run tasks on processors by calling the spawn
method on a Processor
object:
class Processor {
public:
typedef IDType id_t;
id_t id;
Event spawn(TaskFuncID func_id, const void *args, size_t arglen,
Event wait_on = Event::NO_EVENT, int priority = 0) const;
...
Internally an ID is represented by class ID
, which is used to deconstruct the 32-bit or 64-bit integer ID values. An ID encodes several types of information in the current low-level distributed implementation. First, a few of the upper bits are used to encode the low-level object type that the ID represents. This isn't strictly necessary, but does make debugging and defensive programming easier. The ID also encodes the node in the distributed system that owns the object, or created the object, which is used for routing requests. Finally, there is an index component, that is decomposed into high and low parts. These are bits used in a type specific way, and we'll explore it later.
One of the goals of the low-level runtime is to provide the minimum functionality to support applications with as little overhead as possible. One method to do this is to avoid locks as much as possible. Many of the low-level data structures are monotonic supporting access without locks. An example is a a counter that only increments. Or a pointer variable that only transitions from null
to non-null
. This type of state can be examined without locks, and users can reason about any future state of that value. A typical pattern is to test a value without a lock, and only take a lock to avoid races.
One can imagine that a very common task is to convert an ID (e.g. an Event
) into a pointer to its local implementation on a node, effectively we need a map. However, it would be nice to exploit properties of the system to make this as efficient as possible.
ID / Implementation Conversion
The DynamicTable
class in Legion provides such a facility. This data structure is used to map an ID to a pointer in the tree, and it minimizes locking. It is a dynamic structure, but the elements themselves never move. I won't go into detail about this. A full post is needed to describe this data structure.
In the end, the DyanmicTable
returns implementations given an ID, where in the below code snippet n->events
is such a table:
Event::Impl *Runtime::get_event_impl(ID id)
{
switch(id.type()) {
case ID::ID_EVENT:
{
Node *n = &runtime->nodes[id.node()];
Event::Impl *impl = n->events.lookup_entry(id.index(), id.node());
return impl;
...
Event
The big thing in the low-level runtime is the Event
. An Event
effectively represents a point in time at which something has occurred. So, an Event
represent the completion of a task. Importantly, an Event
can be used as a pre-condition for other events or other work. Using this property a dependency graph can be constructed.
The only thing that an application can really do with an event is ask if it has triggered. The event Event::NO_EVENT
is a special singleton event that has always been triggered. If an application has many events, these events can be combined into a new event that reports that it has been triggered only after all its child events have triggered:
// creates an event that won't trigger until all input events have
static Event merge_events(const std::set<Event>& wait_for);
static Event merge_events(Event ev1, Event ev2,
Event ev3 = NO_EVENT, Event ev4 = NO_EVENT,
Event ev5 = NO_EVENT, Event ev6 = NO_EVENT);
The main thing that events are used for is as pre-conditions. Here we spawn a task on a processor, and have the option to pass a wait_on
event that will cause the task to not be spawned until the wait_on
event has triggered.
Event spawn(TaskFuncID func_id, const void *args, size_t arglen,
Event wait_on = Event::NO_EVENT, int priority = 0) const;
The runtime is responsible for marking an event as having triggered. However, the special UserEvent
is an event that can be used as a event whose triggering is controlled by an application.
UserEvent
has the UserEvent::trigger
method. In the high-level runtime, there is a notion of when a task is done. An example use of UserEvent
is wanting some things to happen contingent on a task completing. Instead of using the event that comes back from the spawn command, there might be more cleanup that needs to occur. A solution is to create a UserEvent
, merge a bunch of other events together, and use that combined event as a place holder for a collection of other operations as a pre-condition on the UserEvent
. Well, that wasn't very clear, but the picture should be clear: you can use events and pre-conditions to create dependency graphs of work.
Barrier
The Barrier
is a unique type of event. It is similar to an MPI barrier in that it represents an entity that threads arrive at. However, it has quite different semantics.
First, a barrier in Legion can be examined without actually being a thread that arrives at the barrier, thus threads can manipulate it through its API. One of the more interesting aspects of Legion barriers is that they are generational. There is a notion of threads arriving at barriers more than once, each time representing a new barrier. What's cool is that since arrival itself is asynchronous, one thread may arrive at the second generation of the barrier before all other threads have arrived at the first generation.
The final property is that the number of arrivals for a barrier can be changed dynamically. This is important because a task in Legion may choose to sub-divide its work such that a barrier that expected one arrival may need to wait on many sub-tasks to arrive. The only restriction on updating the arrival count is that a task originally involved with arrival must adjust the count before announcing its arrival (if applicable) to avoid race conditions.