About Legion Runtime Class

These notes are closely based on the set of Legion Runtime Class videos produced by the Legion developers. They are my own notes and code walks, and any errors or things that are just plain wrong represent my own mistakes.

Today's notes are based on the following video:

Overview

The CopyOp is an operation that copies data between two logical regions, most likely between different region trees. The motivation for this operation is that users had been writing their own routines for performing such copies, even though the runtime already has a lot of infrastructure for high-performance copying. Exposing that infrastructure as an operation lets users take advantage of the facilities found in the runtime.

The first difference to notice between the CopyOp and the inline mapping operation MapOp discussed last time is that the CopyOp extends the SpeculativeOp abstract operation class.

class CopyOp : public Copy, public SpeculativeOp {
  public:
    static const AllocationType alloc_type = COPY_OP_ALLOC;
  public:
    CopyOp(Runtime *rt);
    CopyOp(const CopyOp &rhs);
    virtual ~CopyOp(void);

The CopyOp begins its life at Runtime::issue_copy_operation. The first thing that happens is we grab a copy operation from one of the operation pools that is maintained by the runtime:

void Runtime::issue_copy_operation(Context ctx, const CopyLauncher &launcher)
{
  CopyOp *copy_op = get_available_copy_op();
  copy_op->initialize(ctx, launcher, false/*check privileges*/);

Initialization of the copy operation involves a lot of work. For instance, the region requirements of the source and destination regions are copied into the operation from the launcher object. Information is also passed along that the regions mapped by the copy operation won't actually be accessed by a task, and thus there is more flexibility over the memories that can be chosen. During initialization there is also a lot of error checking for compatibility and privileges, but I won't go into detail regarding that.

Once the operation is initialized we resume back in issue_copy_operation. The next thing we do is look for conflicts. Note that when we issue the copy operation the parent task may actually have active mappings. And if those mappings conflict with the requirements that the copy operation needs then something has to give. Note that the copy operation cannot wait on the parent to unmap the conflicting regions because the parent in turn is waiting on the copy operation to complete. This circular deadlock is prevented by the runtime by transparently unmapping the conflicting regions.

  std::vector<PhysicalRegion> unmapped_regions;
  ctx->find_conflicting_regions(copy_op, unmapped_regions);
  if (!unmapped_regions.empty())
  {
    // Unmap any regions which are conflicting
    for (unsigned idx = 0; idx < unmapped_regions.size(); idx++)
    {
      unmapped_regions[idx].impl->unmap_region();
    }
  }
  add_to_dependence_queue(proc, copy_op);
...

The unmap operations are also added to the pipeline. Note that they are added before add_to_dependence_queue is called for this copy operation, so the unmappings will complete first. If any regions were unmapped then these mappings must be restored before the parent resumes because it expects them to be present. This is done by looping over the regions that were unmapped and creating new inline mappings.

  // Remap any regions which we unmapped
  if (!unmapped_regions.empty())
  {
    std::set<Event> mapped_events;
    for (unsigned idx = 0; idx < unmapped_regions.size(); idx++)
    {
      MapOp *op = get_available_map_op();
      op->initialize(ctx, unmapped_regions[idx]);
      mapped_events.insert(op->get_completion_event());
      add_to_dependence_queue(proc, op);
    }
    // Wait for all the re-mapping operations to complete
    Event mapped_event = Event::merge_events(mapped_events);
    if (!mapped_event.has_triggered())
    {
      pre_wait(proc);
      mapped_event.wait();
      post_wait(proc);
...

Notice that as we create the inline mappings we collect all of their completion events and merge them into one. We then wait until all of the inline mappings finish before continuing. While we are idle, other tasks might be scheduled onto the CPU. Note that applications should generally try to avoid situations in which the runtime automatically performs these unmap and remap operations, because an application is often in a better position to reduce such conflicts. The runtime still ensures correctness, but the latency associated with waiting for the inline mappings to be recreated cannot be hidden.
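To make the unmap, copy, remap ordering concrete for myself, here is a minimal sketch. Everything in it is hypothetical: `Requirement`, `InlineMapping`, `conflicts`, and this `issue_copy` are toy stand-ins I made up, not Legion types, and the conflict rule (same region, at least one writer) is a deliberate simplification that ignores fields, sub-region aliasing, and coherence modes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the runtime's types.
enum class Privilege { ReadOnly, ReadWrite };

struct Requirement {
  int region;           // toy region identifier
  Privilege privilege;
};

struct InlineMapping {
  Requirement req;
  bool mapped;
};

// Assumed rule for this sketch: two requirements conflict when they name
// the same region and at least one of them may write.
bool conflicts(const Requirement &a, const Requirement &b) {
  return a.region == b.region &&
         (a.privilege == Privilege::ReadWrite ||
          b.privilege == Privilege::ReadWrite);
}

// Unmap the parent's conflicting mappings, enqueue the copy, then remap.
// Returns the order in which the steps were enqueued.
std::vector<std::string> issue_copy(std::vector<InlineMapping> &parent_mappings,
                                    const std::vector<Requirement> &copy_reqs) {
  std::vector<std::string> order;
  std::vector<InlineMapping *> unmapped;
  for (InlineMapping &m : parent_mappings) {
    if (!m.mapped) continue;
    for (const Requirement &r : copy_reqs) {
      if (conflicts(m.req, r)) {
        m.mapped = false;
        unmapped.push_back(&m);
        order.push_back("unmap " + std::to_string(m.req.region));
        break;  // one unmap per mapping is enough
      }
    }
  }
  order.push_back("copy");
  // Restore what we removed, since the parent expects its mappings back.
  for (InlineMapping *m : unmapped) {
    assert(!m->mapped);
    m->mapped = true;
    order.push_back("remap " + std::to_string(m->req.region));
  }
  return order;
}
```

With a read-write mapping of region 1 and a read-only mapping of region 2 in the parent, a copy that reads both regions only forces region 1 to be unmapped and remapped in this model. That is also the argument for using the weakest privilege that works on inline mappings.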

The first stop for an operation in the pipeline is dependence analysis. At a high level, dependence analysis for a copy operation is similar to that of the inline mapping we saw in the last class. Note that here we loop through all of the source requirements and perform analysis on each one. Not shown here is a second loop over all of the destination requirements. Next let's focus on register_predicate_dependence, which is an aspect of speculative operations that we haven't seen yet.

void CopyOp::trigger_dependence_analysis(void)
{
  begin_dependence_analysis();
  // Register a dependence on our predicate
  register_predicate_dependence();
  for (unsigned idx = 0; idx < src_requirements.size(); idx++)
  {
    runtime->forest->perform_dependence_analysis(parent_ctx->get_context(),
                 this, idx, src_requirements[idx], src_privilege_paths[idx]);
  }
...
  end_dependence_analysis();

Recall that the copy operation is an instance of a speculative operation:

class CopyOp : public Copy, public SpeculativeOp {

What this means is that the operation either executes or is skipped based on a predicate value. The predicate is set when we initialize the copy operation (note the last parameter to initialize_speculation):

void CopyOp::initialize(SingleTask *ctx, const CopyLauncher &launcher, bool check_privileges)
{
  parent_ctx = ctx;
  parent_task = ctx;
  initialize_speculation(ctx, true/*track*/, Event::NO_EVENT,
        launcher.src_requirements.size() + launcher.dst_requirements.size(), 
        launcher.predicate);

The register_predicate_dependence call in CopyOp::trigger_dependence_analysis is part of the abstract base class, SpeculativeOp. At this point the predicate for this CopyOp is already in the pipeline (for example, it may come from the Future of a previously launched task). What this call does is register the predicate as a dependence so that it becomes part of the overall dependence analysis.

Normally after dependence analysis an operation is ready to map. However, this may not always be true for speculative operations. One of the things that can happen is that the predicate that is being speculated on tells us that it has resolved its value. The interface for this is in the speculative operation class.

class SpeculativeOp : public Operation, PredicateWaiter {
  public:
    virtual void trigger_mapping(void);
  public:
    // Call this method for inheriting classes 
    // to indicate when they should map
    virtual bool speculate(bool &value) = 0;
    virtual void resolve_true(void) = 0;
    virtual void resolve_false(void) = 0;
  public:
...

Note that speculate, resolve_true, and resolve_false are implemented by the CopyOp. The SpeculativeOp class takes care of all the plumbing and logic that is shared among operations that are speculative.

How does mapping occur for the CopyOp (related: CopyOp doesn't implement trigger_mapping)? Note that trigger_mapping is called once dependencies are resolved. Instead of it being handled by the CopyOp, the speculative operation will take care of things in SpeculativeOp::trigger_mapping, and communicate the results to the sub-class via the resolve_true, resolve_false, and speculate pure virtual methods. The first thing we do is handle some special cases where there is no predicate, in which case we just examine whatever the state of speculation is and call down into the sub-class:

void SpeculativeOp::trigger_mapping(void)
{
  // Quick out
  if (predicate == NULL)
  {
    if (speculation_state == RESOLVE_TRUE_STATE)
      resolve_true();
    else
      resolve_false();
    return;
  }

But we might have a real predicate, and we may not have resolved its value yet. So we register ourselves as a waiter on the predicate. Several things happen here. We may try to speculate on the value (see discussion below), and then depending on the outcome of more logic in this function that has been omitted, we trigger the result of the predicate with the sub-class.

...
  bool value, speculated;
  bool valid = predicate->register_waiter(this, get_generation(), value);
  // Now that we've attempted to register ourselves with the
  // predicate we can remove the predicate reference
  predicate->remove_predicate_reference();
  if (!valid)
    speculated = speculate(value);
...
  if (continue_true)
    resolve_true();
  if (continue_false)
    resolve_false();
  if (need_resolution)
    resolve_speculation(); 
}

The mapper can be asked to speculate on the value, as mappers may be able to make a good guess. The example cited in the class is that of conjugate gradient in which convergence is a predicate. Since convergence occurs at the end, it's safe to guess that we haven't yet converged at any particular moment. When we speculate true, we start running the operation even though we don't yet know the final value. Another valid scenario is that the mapper has no guess. In this case the mapper can request that no speculation occur, and instead just block until the value is known.
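Here is a small sketch of how I picture the split between the shared plumbing and the sub-class hooks. The method names speculate, resolve_true, and resolve_false mirror the notes, but the classes (`ToySpeculativeOp`, `ToyCopyOp`), the `PredState` enum, and `on_predicate_resolved` are hypothetical, and the control flow is my own simplification rather than the omitted logic in SpeculativeOp::trigger_mapping. In particular, recovering from a wrong guess is not modeled at all.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only; not the runtime's real classes.
enum class PredState { Unknown, False, True };

class ToySpeculativeOp {
 public:
  virtual ~ToySpeculativeOp() {}

  // Shared plumbing, called once dependences are satisfied.
  void trigger_mapping(PredState predicate) {
    if (predicate != PredState::Unknown) {
      resolve(predicate == PredState::True);  // value already known
      return;
    }
    bool guess = false;
    if (speculate(guess))
      resolve(guess);   // run ahead on the guessed value
    else
      waiting_ = true;  // no guess: block until the value is known
  }

  // Called when the predicate finally resolves.
  void on_predicate_resolved(bool value) {
    if (!waiting_) return;  // already resolved, or we speculated
    waiting_ = false;
    resolve(value);
  }

 protected:
  virtual bool speculate(bool &value) = 0;
  virtual void resolve_true() = 0;
  virtual void resolve_false() = 0;

 private:
  void resolve(bool value) {
    assert(!resolved_);  // each operation is resolved at most once
    resolved_ = true;
    if (value) resolve_true(); else resolve_false();
  }
  bool waiting_ = false;
  bool resolved_ = false;
};

class ToyCopyOp : public ToySpeculativeOp {
 public:
  // mapper_guess == Unknown means "the mapper declines to speculate".
  explicit ToyCopyOp(PredState mapper_guess) : mapper_guess_(mapper_guess) {}
  std::vector<std::string> log;

 protected:
  bool speculate(bool &value) override {
    if (mapper_guess_ == PredState::Unknown) return false;
    value = (mapper_guess_ == PredState::True);
    return true;
  }
  void resolve_true() override { log.push_back("enqueue for execution"); }
  void resolve_false() override { log.push_back("complete without copying"); }

 private:
  PredState mapper_guess_;
};
```

In the conjugate gradient example, the predicate would be something like "not yet converged" and the mapper guess would be `PredState::True`, so the toy operation is enqueued right away instead of waiting for the convergence test to finish.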

So if all this is hidden from the copy operation, how does it actually start executing? It has implementations of speculate, resolve_true, and resolve_false. These were called by the SpeculativeOp class when it figured out what's going on with the speculation and the predicate.

void CopyOp::resolve_true(void)
{
  // Put this on the queue of stuff to do
  runtime->add_to_local_queue(parent_ctx->get_executing_processor(),
             this, false/*prev fail*/);
}

void CopyOp::resolve_false(void)
{
  // Mark that this operation has completed both
  // execution and mapping indicating that we are done
  // Do it in this order to avoid calling 'execute_trigger'
  complete_execution();
  complete_mapping();
}

Now that's cool. If the predicate was true we add ourselves to the queue for execution, and if the predicate was false then we just clean up and abort.

Execution of the copy operation in CopyOp::trigger_execution is similar to inline mapping in many respects. We premap all of the sources and all of the destinations. The mapper is consulted about which memories should be preferred, and then mapping is attempted. Failures might occur, in which case the operation can report this and be restarted later. After all the regions have been mapped and are ready to go, the copy_across method on the region forest object is invoked, which is a low-level operation that performs the actual copying.

As we call copy_across we collect the completion events:

for (unsigned idx = 0; idx < src_requirements.size(); idx++)
{
  // Now issue the copies from source to destination
  copy_complete_events.insert(runtime->forest->copy_across(src_contexts[idx],
           dst_contexts[idx], src_requirements[idx],
           dst_requirements[idx], src_ref, dst_ref, sync_precondition));

  ...

  // Launch the complete task if necessary 
  Event copy_complete_event = Event::merge_events(copy_complete_events);

Then we use the copy_complete_event that represents all of the issued copies, and check to see if the work is done. If it isn't, we will spin up a new task predicated on the copy event that will eventually finish execution for us:

// Handle the case for marking when the copy completes
if (!copy_complete_event.has_triggered())
{
  // Issue a deferred trigger on our completion event
  // and mark that we are no longer responsible for
  // triggering our completion event.
  completion_event.trigger(copy_complete_event);
  need_completion_trigger = false;

  DeferredCompleteArgs deferred_complete_args;
  deferred_complete_args.hlr_id = HLR_DEFERRED_COMPLETE_ID;
  deferred_complete_args.proxy_this = this;
  util.spawn(HLR_TASK_ID, &deferred_complete_args, 
         sizeof(deferred_complete_args), copy_complete_event);
...

The deferred task that runs will eventually call deferred_complete on this instance of the CopyOp operation (notice the proxy_this = this line). The deferred_complete method for the CopyOp will turn around and call complete_execution. Boom, done.
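The merge-then-defer pattern is easier for me to see in a toy event model. This is a single-threaded sketch under my own assumptions: `ToyEvent`, `merge_events`, and `ToyCopyOp` are hypothetical names, and plain callbacks stand in for the deferred task that the real runtime spawns with the merged event as its precondition.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy, single-threaded event; not the low-level runtime's Event type.
class ToyEvent {
 public:
  bool has_triggered() const { return triggered_; }

  // Run fn once the event triggers (immediately if it already has).
  void on_trigger(std::function<void()> fn) {
    if (triggered_) fn();
    else waiters_.push_back(std::move(fn));
  }

  void trigger() {
    if (triggered_) return;
    triggered_ = true;
    std::vector<std::function<void()> > ready;
    ready.swap(waiters_);
    for (std::size_t i = 0; i < ready.size(); i++) ready[i]();
  }

 private:
  bool triggered_ = false;
  std::vector<std::function<void()> > waiters_;
};

typedef std::shared_ptr<ToyEvent> EventPtr;

// The merged event triggers only after every input has triggered.
EventPtr merge_events(const std::vector<EventPtr> &inputs) {
  EventPtr merged = std::make_shared<ToyEvent>();
  if (inputs.empty()) { merged->trigger(); return merged; }
  std::shared_ptr<std::size_t> remaining =
      std::make_shared<std::size_t>(inputs.size());
  for (std::size_t i = 0; i < inputs.size(); i++) {
    inputs[i]->on_trigger([merged, remaining]() {
      assert(*remaining > 0);
      if (--(*remaining) == 0) merged->trigger();
    });
  }
  return merged;
}

struct ToyCopyOp {
  bool executed = false;
  void complete_execution() { executed = true; }
  void deferred_complete() { complete_execution(); }

  // Finish now if the copies are done, otherwise defer completion.
  // The op must outlive the event, since the callback captures `this`.
  void finish(const EventPtr &copy_complete_event) {
    if (copy_complete_event->has_triggered())
      complete_execution();
    else
      copy_complete_event->on_trigger([this]() { deferred_complete(); });
  }
};
```

With two copy events merged and handed to `finish`, the toy operation stays incomplete after the first event triggers and completes as soon as the second one does, without anything ever blocking on the events.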

Predicate Operations

Predicates are of type Predicate:

class Predicate {
  public:
    static const Predicate TRUE_PRED;
    static const Predicate FALSE_PRED;

Predicates can be created from boolean futures:

Predicate create_predicate(Context ctx, const Future &f);

A predicate can also be derived from a non-boolean future by using the deferred task execution model: launch a task that evaluates the future and returns a boolean future. Predicates can also be created by combining predicates using logic operators:

Predicate predicate_not(Context ctx, const Predicate &p);
Predicate predicate_and(Context ctx, const Predicate &p1, const Predicate &p2);
Predicate predicate_or(Context ctx, const Predicate &p1, const Predicate &p2);
...

All of these are themselves operations that are shuffled through the execution pipeline.

Predicate Runtime::create_predicate(Context ctx, const Future &f) 
{
  FuturePredOp *pred_op = get_available_future_pred_op();
  // Hold a reference before initialization
  Predicate result(pred_op);
  pred_op->initialize(ctx, f);
  add_to_dependence_queue(proc, pred_op);
...
}
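As a way to think about how combined predicates behave while their inputs are still unresolved, here is a sketch using three-valued (Kleene) logic. This is purely my own illustration: `PredValue`, `pred_not`, `pred_and`, `pred_or`, and `to_bool` are hypothetical, and the notes do not say whether the runtime resolves a combined predicate early when only one input is known.

```cpp
#include <cassert>

// Toy three-valued predicate; not Legion's Predicate class.
enum class PredValue { Unknown, False, True };

PredValue pred_not(PredValue p) {
  if (p == PredValue::Unknown) return PredValue::Unknown;
  return p == PredValue::True ? PredValue::False : PredValue::True;
}

// A single False input is enough to decide an AND.
PredValue pred_and(PredValue a, PredValue b) {
  if (a == PredValue::False || b == PredValue::False) return PredValue::False;
  if (a == PredValue::True && b == PredValue::True) return PredValue::True;
  return PredValue::Unknown;
}

// A single True input is enough to decide an OR.
PredValue pred_or(PredValue a, PredValue b) {
  if (a == PredValue::True || b == PredValue::True) return PredValue::True;
  if (a == PredValue::False && b == PredValue::False) return PredValue::False;
  return PredValue::Unknown;
}

// Only meaningful once the predicate has resolved.
bool to_bool(PredValue p) {
  assert(p != PredValue::Unknown);
  return p == PredValue::True;
}
```

For example, a loop guard such as "not converged AND under the iteration limit" is already False in this model once the iteration limit is hit, even if the convergence test has not finished.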

That's it for operations for today.