Overview

In this class we'll look at fields, bitmasks, and serialization and deserialization.

About Legion Runtime Class

These notes are closely based on the set of Legion Runtime Class videos produced by the Legion developers. They are my own notes and code walks, and any errors or things that are just plain wrong represent my own mistakes.

Today's notes are based on the following video:

Dynamic Tree

Recall from the discussion of the region tree that objects attached to a RegionTreeNode are used to track logical and physical state (see region_tree.h). In that post we saw that LegionStack is a fixed-size structure, so when things like new fields are allocated, all of these structures in the tree may need to be adjusted. That has now changed.

A dynamic table has been added to the high-level runtime that replaces the LegionStack for tracking logical and physical state, allowing for dynamic allocation. The implementation is in legion_utilities.h:

    template<typename ALLOCATOR>
    class DynamicTable {
    public:
      typedef typename ALLOCATOR::IT IT;
      typedef typename ALLOCATOR::ET ET;
      typedef DynamicTableNodeBase<IT> NodeBase;
    public:
    ...

It is very similar to the low-level implementation, but rather than allocating full structures at the leaves, the high-level runtime version stores pointers there, which saves space when the table is sparsely filled.

There are a lot of other helpful things that are located in legion_utilities.h.

Region Usage

The RegionUsage object stores information about privileges, coherence, and reductions, and is used during dependence analysis.

    struct RegionUsage {
    public:
      RegionUsage(void)
        : privilege(NO_ACCESS), prop(EXCLUSIVE), redop(0) { }
      RegionUsage(PrivilegeMode p, CoherenceProperty c, ReductionOpID r)
        : privilege(p), prop(c), redop(r) { }
      RegionUsage(const RegionRequirement &req)
        : privilege(req.privilege), prop(req.prop), redop(req.redop) { }
    public:
    ...

Dependence Checks

There are several static functions that define the checks made during dependence analysis, each encoding some policy, for instance check_dependence_type:

    static inline DependenceType check_dependence_type(const RegionUsage &u1,
                                                       const RegionUsage &u2)
    ...

AutoLock

We've seen the auto lock many times before. It basically implements the RAII pattern, but is a little more feature rich than your standard mutex.

    class AutoLock { 
    public:
      AutoLock(Reservation r, unsigned mode = 0, bool exclusive = true, 

Serialization

legion_utilities.h also contains helpers for implementing serialization and deserialization of data sent across the network.

    class Serializer {
    public:
      Serializer(size_t base_bytes = 4096)
    ...

    class Deserializer {
    public:
      Deserializer(const void *buf, size_t buffer_size)
        : total_bytes(buffer_size), buffer((const char*)buf), index(0)
    ...

Bit Mask

Another major utility component in Legion is the bit mask. It's pretty much just what it sounds like. It's used to describe lots of things, like sets of fields, processors, and nodes.

    template<typename T, unsigned int MAX,
             unsigned int SHIFT, unsigned int MASK>
    class BitMask {
    public:
      explicit BitMask(T init = 0);

It has a large interface that includes all sorts of set-like operations. The basic bit mask has a dense structure holding the bits. There is a two-level bit mask that includes a summary mask that makes certain operations and tests very fast. Additionally, there are SSE and AVX optimized versions. Logic for selecting between the different implementations is done at compile time:

    #if defined(__AVX__)
    #if (MAX_FIELDS > 256)
        typedef AVXTLBitMask<MAX_FIELDS> FieldMask;
    #elif (MAX_FIELDS > 128)
        typedef AVXBitMask<MAX_FIELDS> FieldMask;
    #elif (MAX_FIELDS > 64)
        typedef SSEBitMask<MAX_FIELDS> FieldMask;
    #else
        typedef BitMask<FIELD_TYPE,MAX_FIELDS,FIELD_SHIFT,FIELD_MASK> FieldMask;
    #endif
    #elif defined(__SSE2__)

    ...

There are similar bits for NodeMask, ProcessorMask, etc..., in addition to FieldMask.

Field Space Node

Now we'll take a look at how fields are allocated. An application will create a field allocator, and will allocate some fields by calling FieldAllocator::allocate_field.

    class FieldAllocator {
    public:
      FieldAllocator(void);
      ...
      inline FieldID allocate_field(size_t field_size, 
              FieldID desired_fieldid = AUTO_GENERATE_ID);
      ...

The size of the field is given, along with a desired field ID (though one can be auto-generated, too). Below, the local parameter is used to specify that the field won't escape the task's context.

    inline FieldID FieldAllocator::allocate_field(size_t field_size, 
                                FieldID desired_fieldid /*= AUTO_GENERATE_ID*/)
    {
      return runtime->allocate_field(parent, field_space, 
                                 field_size, desired_fieldid, false/*local*/); 
    }

Now we've made it into the runtime. First thing is to grab a unique field id if one wasn't provided. We've covered unique ID creation before (recall that it uses the node ID as a stride within a space). Then we ask the forest to allocate the new field.

    FieldID Runtime::allocate_field(Context ctx, FieldSpace space,
                                          size_t field_size, FieldID fid,
                                          bool local)
    {
      if (fid == AUTO_GENERATE_ID)
        fid = get_unique_field_id();
      if (local)
        ctx->add_local_field(space, fid, field_size);
      else
      {
        forest->allocate_field(space, field_size, fid, local);
        ctx->register_field_creation(space, fid);
      }
      return fid;
    }

So far so good. We'll cover the common case below which, after looking up the node, is to just call allocate_field on the node.

    bool RegionTreeForest::allocate_field(FieldSpace handle, size_t field_size,
                                          FieldID fid, bool local)
    {
      FieldSpaceNode *node = get_node(handle);
      if (local && node->has_field(fid))
        return true;
      node->allocate_field(fid, field_size, local);
      return false;
    }

Ok, now we get to more interesting stuff. First thing is to allocate an index. An index can be thought of as a bit in a bit mask. In particular, we are concerned with the FieldMask which is a bitmask. We create a mapping between field ID and bitmask index, and want to find an index that is free.

The field mask tells us what fields have been allocated. Legion supports parallel field allocation. We want to do parallel allocation without communication, so we'd like each node to be able to select an index without communication. For this reason, the same field ID may map to different index values on different nodes. To handle comparisons, permutations are computed when bitmasks are transported.

    void FieldSpaceNode::allocate_field(FieldID fid, size_t size, bool local)
    {
      AutoLock n_lock(node_lock);
      // Find an index in which to allocate this field  
      unsigned index = allocate_index(local);

      fields[fid] = FieldInfo(size, index, local);
      // Send messages to all our subscribers telling them about the allocation
      // as long as it is not local.  Local fields get sent by the task contexts
      if (!local && !!creation_set)
      {
        SendFieldAllocationFunctor functor(handle, fid, size, 
                                           index, context->runtime);
        creation_set.map(functor);
      }
    }

So how do we allocate an index? The goal parameter below is used as a preference to reduce the permutations between nodes: if the goal index isn't set in the allocated_indexes structure, we'll take it.

    unsigned FieldSpaceNode::allocate_index(bool local, int goal /*= -1*/)
    //--------------------------------------------------------------------------
    {
      // Assume we are already holding the node lock

      // We do something intelligent here to try and maintain
      // identity permutations as long as possible.  First, if there
      // is a target goal try and allocate it there since it has already
      // been assigned there on a remote node.
      if ((goal >= 0) && !allocated_indexes.is_set(goal))
      {
        unsigned result = goal;
        allocated_indexes.set_bit(result);
        return result;
      }
    ...

Here is the allocated_indexes mask in the FieldSpaceNode structure.

    class FieldSpaceNode {
    public:
    ...
      FieldMask allocated_indexes;
    ...

If we don't have a goal, then we need to find an open index. We won't cover the remainder of the allocate_index call; basically there is an optimization in which a unique ID is computed, and another in which a random guess is made.

The most frequently used method on the field space node computes a field mask from a set of field IDs. Recall that region requirements specify which fields they care about, so this translation from field IDs to the compact field mask is performed frequently.

      FieldMask get_field_mask(const std::set<FieldID> &fields) const;

For each field ID we look up the FieldInfo structure that we set when allocating the field, and grab the index out of there and set it in the bitmask that will be returned.

    FieldMask FieldSpaceNode::get_field_mask(
                              const std::set<FieldID> &privilege_fields) const
    {
      AutoLock n_lock(node_lock,1,false/*exclusive*/);
      FieldMask result;
      for (std::set<FieldID>::const_iterator it = privilege_fields.begin();
            it != privilege_fields.end(); it++)
      {
        std::map<FieldID,FieldInfo>::const_iterator finder = fields.find(*it);
        result.set_bit(finder->second.idx);
      }
      return result;
    }

Here is an example of where grabbing the field mask is used:

    void RegionTreeForest::perform_dependence_analysis(RegionTreeContext ctx,
                                                  Operation *op, unsigned idx,
                                                  RegionRequirement &req,
                                                  RegionTreePath &path)
    {
      // If this is a NO_ACCESS, then we'll have no dependences so we're done
      if (IS_NO_ACCESS(req))
        return;

      RegionNode *parent_node = get_node(req.parent);

      FieldMask user_mask = 
        parent_node->column_source->get_field_mask(req.privilege_fields);

Serialization

Now we'll look very briefly at region tree node serialization. Nodes are often sent around to other machines. It all starts with send_node.

    void FieldSpaceNode::send_node(AddressSpaceID target)
    //--------------------------------------------------------------------------
    {
      // See if this is in our creation set, if not, send it and all the fields
      AutoLock n_lock(node_lock);
      if (!creation_set.contains(target))
      {

The creation_set determines if we have already sent this node to a particular target in order to avoid sending it multiple times. The creation_set is of type NodeSet. A NodeSet is similar to the other mask structures we saw, but is efficient when sparse.

    public:
      NodeSet creation_set;
      NodeSet destruction_set;

If the target isn't in the creation set then we need to pack the node up and send it off. The Serializer object knows how to do that. So we pack up all the metadata for the object we are sending, and then fire it off.

        // First send the node info and then send all the fields
        Serializer rez;
        {
          RezCheck z(rez);
          rez.serialize(handle);
          rez.serialize<size_t>(semantic_info.size());
          for (LegionMap<SemanticTag,SemanticInfo>::aligned::iterator it = 
                semantic_info.begin(); it != semantic_info.end(); it++)
          {
          ...
        }
        context->runtime->send_field_space_node(target, rez);

After we've sent off the node to be created, we send off the field info. Note that the low-level messenger ensures that messages are received in the order they were sent, so we don't have to worry about the target getting a request to allocate new fields on a node it hasn't seen yet.

        // Send all the field allocations
        for (std::map<FieldID,FieldInfo>::const_iterator it = 
              fields.begin(); it != fields.end(); it++)
        {
          // No need to send it if it has been destroyed
          if (!it->second.destroyed)
          {
            context->runtime->send_field_allocation(handle, it->first,
                it->second.field_size, it->second.idx, target);
          }
        }

Now we record in the creation_set that we don't need to send this node again.

        // Finally add it to the creation set
        creation_set.add(target);
      }
      // Send any deletions if necessary
      if (!destruction_set.contains(target))
      {
        context->runtime->send_field_space_destruction(handle, target);
        destruction_set.add(target);
      }
    }

On the receiving side we unpack the bits for the FieldSpaceNode by deserializing them in exactly the same order. Note that the DerezCheck does some basic error checking. Recall the RezCheck call from above: these are matched up and serve as a very simple checksum that counts the number of bytes involved in the serialization step.

    /*static*/ void FieldSpaceNode::handle_node_creation(
          RegionTreeForest *context, Deserializer &derez, AddressSpaceID source)
    {
      DerezCheck z(derez);
      FieldSpace handle;
      derez.deserialize(handle);
      FieldSpaceNode *node = context->create_node(handle);

      node->add_creation_source(source);
      size_t num_semantic;
      derez.deserialize(num_semantic);
      for (unsigned idx = 0; idx < num_semantic; idx++)
      {
        SemanticTag tag;
        derez.deserialize(tag);
      ...
    }