Legion Runtime Class #17: Logical Region Tree Traversal
TweetOverview
In this class we'll start to talk about logical region tree traversal.
About Legion Runtime Class
These notes are closely based on the set of Legion Runtime Class videos produced by the Legion developers. They are my own notes and code walks, and any errors or things that are just plain wrong represent my own mistakes.
Today's notes are based on the following video:
Logical Dependence Analysis
Today we'll start looking at dependence analysis. Recall that the first stage of the operational pipeline is dependence analysis, which registers dependencies on operations that were issued before it. We've seen before how that is invoked via perform_dependence_analysis on the region tree.
    void MapOp::trigger_dependence_analysis(void)
    {
      begin_dependence_analysis();
      runtime->forest->perform_dependence_analysis(parent_ctx->get_context(),
                                                   this, 0/*idx*/, requirement,
                                                   privilege_path);
      end_dependence_analysis();
    }
Looking at other operations we see that the same function is used by all operations:
    void CopyOp::trigger_dependence_analysis(void)
    {
      begin_dependence_analysis();
      // Register a dependence on our predicate
      register_predicate_dependence();
      for (unsigned idx = 0; idx < src_requirements.size(); idx++)
      {
        runtime->forest->perform_dependence_analysis(parent_ctx->get_context(),
                                                     this, idx, 
                                                     src_requirements[idx],
                                                     src_privilege_paths[idx]);
      }
      for (unsigned idx = 0; idx < dst_requirements.size(); idx++)
      {
        unsigned index = src_requirements.size()+idx;
        runtime->forest->perform_dependence_analysis(parent_ctx->get_context(),
                                                     this, index, 
                                                     dst_requirements[idx],
                                                     dst_privilege_paths[idx]);
      }
      end_dependence_analysis();
    }
So today we will look into dependence analysis and how that works. At a high-level perform_dependence_analysis traverses the region tree from where the parent task has privileges down to where the operation is requesting privileges. Whenever we find other operations that are interfering along this path dependencies are registered.
The first thing we do in the function is grab a reference to the parent node in the tree. The parent is set by the application when the region requirement is specified.
    void RegionTreeForest::perform_dependence_analysis(RegionTreeContext ctx,
                                                  Operation *op, unsigned idx,
                                                  RegionRequirement &req,
                                                  RegionTreePath &path)
    {
      // If this is a NO_ACCESS, then we'll have no dependences so we're done
      if (IS_NO_ACCESS(req))
        return;
      RegionNode *parent_node = get_node(req.parent);
Next we compute a field mask for the privileges being requested by the region requirement. Note that these fields are specified by an application using an STL structure, but the FieldMask here is computed which is a very efficient way to compare the set of fields against other field sets during the traversal for detecting interference.
      FieldMask user_mask = 
        parent_node->column_source->get_field_mask(req.privilege_fields);
Next we create a logical user structure that stores a bunch of information about whats going on. The RegionUsage structure is a compact form of the region requirement that only stores what is necessary for the analysis. The tracing is an optimization that is used to reuse analysis results, but we won't cover that here today.
      // Then compute the logical user
      LogicalUser user(op, idx, RegionUsage(req), user_mask); 
      TraceInfo trace_info(op->already_traced(), op->get_trace(), idx, req); 
Finally we begin the traversal. The traversal is handled by register_logical_node, and we'll look at this in detail next.
      // Finally do the traversal, note that we don't need to hold the
      // context lock since the runtime guarantees that all dependence
      // analysis for a single context are performed in order
      parent_node->register_logical_node(ctx.get_id(), user, path, trace_info);
      // Now check to see if we have any simultaneous restrictions
      // we need to check
      if (req.restricted)
      {
        RestrictedTraverser traverser(ctx.get_id(), path);
        traverser.traverse(parent_node);
        // Check to see if there was user-level software
        // coherence for all of our fields.
        FieldMask restricted_mask = user_mask - traverser.get_coherence_mask();
        // If none of our fields are still restricted
        // then we can remove the restricted field on
        // our region requirement.  Otherwise we keep
        // the restriction.
        if (!restricted_mask)
          req.restricted = false; 
      }
    }
Register Logical Node
We start traversing the tree from the parent's node (see above). Notice that this isn't specific to a logical region nor logical partition, but rather handles both cases. This is a recursive function that traverses the tree, and the first thing we do is see if we have reached the destination.
We'll dissect this function non-linearly to make it a bit easier to display. At a high-level we perform a set of actions on every node, take the recursive path if we haven't reached our destination, or perform an action at the target. First we check to see if we have arrived at the target node.
    void RegionTreeNode::register_logical_node(ContextID ctx,
                                               const LogicalUser &user,
                                               RegionTreePath &path,
                                               const TraceInfo &trace_info)
    {
      LogicalState &state = *(logical_states.lookup_entry(ctx));
      const unsigned depth = get_depth();
      const bool arrived = !path.has_child(depth);
Next we skip the work that is done on every node, and show the else case for if we have arrived at the target because its very simple. We just visit the child. There is an open-only optimization, but we'll discuss that another time.
      else // We're still not there, so keep going
      {
        Color next_child = path.get_child(depth);
        RegionTreeNode *child = get_tree_child(next_child);
        if (open_only)
          child->open_logical_node(ctx, user, path, trace_info.already_traced);
        else
          child->register_logical_node(ctx, user, path, trace_info);
      }
    }
Next we'll look at what happens to every node. The first thing is we check to see if any sub-trees are open, that is, they have operations registered on them. We have a single path to our target, but other operations may be using other regions and are interfering. When this happens we close those trees and do analysis.
The LogicalCloser object is a helper for close operations. The siphon_logical_children call examines what needs to be closed, and also reports if the open-only optimization can be used.
      // First check to see if we need to do any close operations
      // Close up any children which we may have dependences on below
      LogicalCloser closer(ctx, user, arrived/*validates*/);
      // There is one special case where we don't need to actually make
      // any close operations. In the case where we have arrived and we
      // have READ_WRITE privileges, we know we will always be the first
      // one to this node and will do our own physical close, so there is
      // no need to register a close operations here.
      bool open_only = siphon_logical_children(closer, state, user.field_mask,
                !arrived || IS_READ_ONLY(user.usage) || IS_REDUCE(user.usage),
                                        arrived ? -1 : path.get_child(depth));
If there were any fields that had to be closed then at this point we register close operations that will do their own dependence analysis. We'll come back to this later.
      // We always need to create and register close operations
      // regardless of whether we are tracing or not
      // If we're not replaying a trace we need to do work here
      // See if we need to register a close operation
      if (closer.has_closed_fields())
      {
        // Generate the close operations         
        int next_child = arrived ? -1 : path.get_child(depth); 
        closer.initialize_close_operations(this, user.op, 
                                           next_child, trace_info);
        // Perform dependence analysis for all the close operations
        closer.perform_dependence_analysis(user,
                                           state.curr_epoch_users,
                                           state.prev_epoch_users);
        // Now we can flush out all the users dominated by closes
        const FieldMask &closed_mask = closer.get_closed_mask();
        filter_prev_epoch_users(state, closed_mask);
        filter_curr_epoch_users(state, closed_mask);
        // Now we can add the close operations to the current epoch
        closer.register_close_operations(state.curr_epoch_users);
      }
The next thing we do is some dependence analysis. To understand what we are going to see next we need to explain a bit more about the logical state that is attached to nodes. The state associated with a node is recorded in LogicalState. In particular, we are interested in the current and previous epoch users:
      LegionList<FieldState,LOGICAL_FIELD_STATE_ALLOC>::track_aligned field_states;
      LegionList<LogicalUser,CURR_LOGICAL_ALLOC>::track_aligned curr_epoch_users;
      LegionList<LogicalUser,PREV_LOGICAL_ALLOC>::track_aligned prev_epoch_users;
In order to avoid an N^2 cost dependence analysis phase, we use recognize that we don't actually need to check the dependencies between the current operation and all previous operations. Rather, we can rely on transitivity properties. Based on the privileges being requested and the previous requests, a new operation joining the current epoch either causes all of the current epoch to replace the previous epoch, or it must join the current epoch and also perform analysis on the previous epoch.
The routine perform_dependence_checks does this moving. Effectively what happens is we consider the current epoch first, and decide if the new operation dominates, meaning that there is a full transitive property in affect. If the new operation replaces the current epoch and the current epoch becomes the previous epoch. If not then we join the current epoch and also examine the previous epoch.
      // We also always do our dependence analysis even if we have
      // already traced because we need to pick up dependences on 
      // any dynamic close operations that we need to do
      // Now that we registered any close operation, do our analysis
      FieldMask dominator_mask = 
             perform_dependence_checks<CURR_LOGICAL_ALLOC,false>(user,
               state.curr_epoch_users, user.field_mask, arrived/*validates*/);
      FieldMask non_dominated_mask = user.field_mask - dominator_mask;
      // For the fields that weren't dominated, we have to check
      // those fields against the previous epoch's users
      if (!!non_dominated_mask)
      {
        perform_dependence_checks<PREV_LOGICAL_ALLOC,false>(user,
            state.prev_epoch_users, non_dominated_mask, arrived/*validates*/);
      }
Now we just have a bit more work to do if we arrived at the destination node. If we had any fields in our dominator mask, then we can filter the epochs based on the new user coming in. Finally we set a mapping reference which is used to prevent an operation that has run and possibly even completed from being reclaimed if there are other references to it.
      if (arrived)
      {
        // If we dominated and this is our final destination then we 
        // can filter the operations since we actually do dominate them
        if (!!dominator_mask)
        {
          // Dominator mask is not empty
          // Mask off all the dominated fields from the previous set
          // of epoch users and remove any previous epoch users
          // that were totally dominated
          filter_prev_epoch_users(state, dominator_mask); 
          // Mask off all dominated fields from current epoch users and move
          // them to prev epoch users.  If all fields masked off, then remove
          // them from the list of current epoch users.
          filter_curr_epoch_users(state, dominator_mask);
        }
        // Here is the only difference with tracing.  If we already
        // traced then we don't need to register ourselves as a user
        if (!trace_info.already_traced)
        {
          // Register ourself with as a current user of this region
          // Record a mapping reference on this operation
          user.op->add_mapping_reference(user.gen);
          // Add ourselves to the current epoch
          state.curr_epoch_users.push_back(user);
        }
      }
Performing Dependence Checks
Now we'll check out perform_dependence_checks. We can use user, the current operation, and prev_users, which comes from the set of users in an epoch that we just saw. The check_mask limits the set of fields that analysis is performed on.
    template<AllocationType ALLOC, bool HAS_SKIP>
    /*static*/ FieldMask RegionTreeNode::perform_dependence_checks(
      const LogicalUser &user, 
      typename LegionList<LogicalUser, ALLOC>::track_aligned &prev_users,
      const FieldMask &check_mask, bool validates_regions,
      Operation *to_skip /*= NULL*/, GenerationID skip_gen /* = 0*/)
    {
So this function really represents the indivisible actions in dependence analysis. This function is pretty big, so I'll just toss in the most important part. We iterate over all of the previous users, and see if we need to register a dependence for our current user.
      for (typename LegionList<LogicalUser, ALLOC>::track_aligned::iterator 
            it = prev_users.begin(); it != prev_users.end(); /*nothing*/)
      {
Are there any overlapping fields? If there are is overlap, then we need to continue doing analysis. However, if there aren't, we can just all the way to bottom and go on to the next user (we'll come back to that). So, when we have overlapping fields:
        FieldMask overlap = user_check_mask & it->field_mask;
        if (!!overlap)
        {
Are there dependencies based on user privileges? If there aren't any dependencies, then we just update masks and move on.
          DependenceType dtype = check_dependence_type(it->usage, user.usage);
          bool validate = validates_regions;
          switch (dtype)
            case NO_DEPENDENCE:
              {
                // No dependence so remove bits from the dominator mask
                dominator_mask -= it->field_mask;
                break;
              }
Some other cases here, but we'll skip these today.
            case ANTI_DEPENDENCE:
            case ATOMIC_DEPENDENCE:
            case SIMULTANEOUS_DEPENDENCE:
            case PROMOTED_DEPENDENCE:
              {
                // Mark that these kinds of dependences are not allowed
                // to validate region inputs
                validate = false;
                // No break so we register dependences just like
                // a true dependence
              }
Otherwise we have a true dependence and we'll try to register the dependence on the previous operation.
            case TRUE_DEPENDENCE:
              {
                // Do this after the logging since we might 
                // update the iterator.
                // If we can validate a region record which of our
                // predecessors regions we are validating, otherwise
                // just register a normal dependence
                if (user.op->register_region_dependence(user.idx, it->op, 
                                                        it->gen, it->idx,
                                                        dtype, validate,
                                                        overlap))
                {
                  // Now we can prune it from the list and continue
                  it = prev_users.erase(it);
                  continue;
                }
                else
                {
                  // hasn't commited, reset timeout and continue
                  it->timeout = LogicalUser::TIMEOUT;
                  it++;
                  continue;
                }
                break;
              }
Check Dependence Type
Above we saw check_dependence_type which figured out how to region users interfered (if at all). The policy for that is implemented in legion_utilities.h. If both read-only, then no dependencies.
    static inline DependenceType check_dependence_type(const RegionUsage &u1,
                                                       const RegionUsage &u2)
    {
      // Two readers are never a dependence
      if (IS_READ_ONLY(u1) && IS_READ_ONLY(u2))
      {
        return check_for_promotion(u1, NO_DEPENDENCE);
      }
If both reduction with the same op, then they usually don't have a dependence:
      else if (IS_REDUCE(u1) && IS_REDUCE(u2))
      {
        // If they are the same kind of reduction, no dependence, 
        // otherwise true dependence
        if (u1.redop == u2.redop)
          return check_for_promotion(u1, NO_DEPENDENCE);
        else
          return TRUE_DEPENDENCE;
      }
      else
      {
Then all the cases for at least one write. All the cases are just enumerated and tested. Check out the logic.
        // If anything exclusive 
        if (IS_EXCLUSIVE(u1) || IS_EXCLUSIVE(u2))
        {
          return check_for_anti_dependence(u1,u2,TRUE_DEPENDENCE/*default*/);
        }
        // Anything atomic (at least one is a write)
        else if (IS_ATOMIC(u1) || IS_ATOMIC(u2))
        {
          // If they're both atomics, return an atomic dependence
          if (IS_ATOMIC(u1) && IS_ATOMIC(u2))
          {
            return check_for_anti_dependence(u1,u2,
                                             ATOMIC_DEPENDENCE/*default*/); 
          }
          // If the one that is not an atomic is a read, we're also ok
          else if ((!IS_ATOMIC(u1) && IS_READ_ONLY(u1)) ||
                   (!IS_ATOMIC(u2) && IS_READ_ONLY(u2)))
          {
            return check_for_promotion(u1, NO_DEPENDENCE);
          }
          // Everything else is a dependence
          return check_for_anti_dependence(u1,u2,TRUE_DEPENDENCE/*default*/);
        }
        // If either is simultaneous we have a simultaneous dependence
        else if (IS_SIMULT(u1) || IS_SIMULT(u2))
        {
          return SIMULTANEOUS_DEPENDENCE;
        }
        else if (IS_RELAXED(u1) && IS_RELAXED(u2))
        {
          // TODO: Make this truly relaxed, right now it is the 
          // same as simultaneous
          return SIMULTANEOUS_DEPENDENCE;
          // This is what it should be: return NO_DEPENDENCE;
          // What needs to be done:
          // - RegionNode::update_valid_instances needs to allow multiple 
          //               outstanding writers
          // - RegionNode needs to detect relaxed case and make copies from all 
          //              relaxed instances to non-relaxed instance
        }
        // We should never make it here
        assert(false);
        return NO_DEPENDENCE;
      }
    } 
Registering Region Dependencies
Now let's check out the code for when we register a dependence. Register a mapping dependence between this operation and another operation. There are some special cases at the beginning, but usually we make it to perform_registration. The big thing here is that registration is performed on the target operation. The key thing here is that we update the count of the number of outstanding mapping dependencies. When that count goes to zero, that's how we move onto triggering mapping.
    bool Operation::register_region_dependence(unsigned idx, Operation *target,
                                          GenerationID target_gen, 
                                          unsigned target_idx,
                                          DependenceType dtype, bool validates,
                                          const FieldMask &dependent_mask)
    {
        ...
        Event all_mapped = Event::NO_EVENT;
        prune = target->perform_registration(target_gen, this, gen,
                                                registered_dependence,
                                                outstanding_mapping_deps,
                                                outstanding_speculation_deps,
                                                all_mapped);
        if (all_mapped.exists())
          dependent_children_mapped.insert(all_mapped);
      }
      if (registered_dependence)
      {
        incoming[target] = target_gen;
        // If we registered a mapping dependence then we can verify
        if (validates)
          verify_regions[target].insert(idx);
      }
     ...
That's all pretty complicated. I think it's time to go back and look at a write-up about what is going on conceptually in the various publications associated with Legion, but hopefully this gives a decent overview of how to walk through the code and the parts that are important.