The Messenger handles connections and generic messages. A message is dispatched to the registered dispatchers via the ms_dispatch virtual method on the Dispatcher interface. The OSD class implements the Dispatcher interface.
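The relationship between the Messenger and a Dispatcher can be sketched roughly as follows. This is a simplified illustration, not Ceph's code: ToyMessage, ToyDispatcher, ToyMessenger, and ToyOSD are hypothetical stand-ins, and the rule that delivery stops at the first dispatcher returning true is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a message; the real Message class carries far more state.
struct ToyMessage {
  int type;
  std::string payload;
};

// Simplified dispatcher interface: return true if the message was consumed.
struct ToyDispatcher {
  virtual ~ToyDispatcher() = default;
  virtual bool ms_dispatch(ToyMessage *m) = 0;
};

// Toy messenger that offers each message to its registered dispatchers in order.
class ToyMessenger {
 public:
  void add_dispatcher(ToyDispatcher *d) {
    assert(d != nullptr);
    dispatchers_.push_back(d);
  }

  // Assumption of this sketch: stop at the first dispatcher that accepts the message.
  bool deliver(ToyMessage *m) {
    for (ToyDispatcher *d : dispatchers_) {
      if (d->ms_dispatch(m)) return true;
    }
    return false;
  }

 private:
  std::vector<ToyDispatcher *> dispatchers_;
};

// Toy OSD that implements the dispatcher interface and records what it saw.
class ToyOSD : public ToyDispatcher {
 public:
  bool ms_dispatch(ToyMessage *m) override {
    handled.push_back(m->type);
    return true;
  }
  std::vector<int> handled;
};
```

Registering a ToyOSD with add_dispatcher and calling deliver on a message ends up in ToyOSD::ms_dispatch, which is the point where the trace below starts.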

Update 06/08/14: this post is out of date with the current source tree. See OSD Request Processing Latency

There are two high-level asynchronous traces described below. The first is the process of receiving, preparing, and queueing a request. The second is from the perspective of separate worker threads that dequeue requests to be processed.

Message Dispatch and Req Enqueue

The trace begins when a message is dispatched to the OSD. At a high level, the entry point is:

  • bool OSD::ms_dispatch(Message *m)
    • src/osd/OSD.cc:4720

There are two paths that can be taken, both of which will arrive at OSD::dispatch_op.

  • void OSD::_dispatch(Message *m)

    • Construct a new OpRequest
    • src/osd/OSD.cc:4937
  • void OSD::do_waiters()

    • Grab an existing OpRequest
    • src/osd/OSD.cc:4840

Both _dispatch and do_waiters will then process a request.

  • void OSD::dispatch_op(OpRequestRef op)

    • src/osd/OSD.cc:4857
  • void OSD::handle_op(OpRequestRef op)

    • src/osd/OSD.cc:7352
  • void OSD::enqueue_op(PG *pg, OpRequestRef op)

    • src/osd/OSD.cc:7546
  • void PG::queue_op(OpRequestRef op)

    • src/osd/PG.cc:1707

The request is now living on a queue waiting to be picked up by a worker.
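The shape of this first trace, two entry points converging on one dispatch function that enqueues the request, might look like the sketch below. PendingOp, PendingOpRef, and ToyFrontEnd are hypothetical names for illustration only; the real code also looks up the target PG and performs many checks that are omitted here.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <utility>

// Hypothetical stand-in for OpRequest, shared by reference like OpRequestRef.
struct PendingOp {
  int id;
};
using PendingOpRef = std::shared_ptr<PendingOp>;

class ToyFrontEnd {
 public:
  // Path 1 (like _dispatch): construct a new op for an incoming message.
  void on_message(int id) {
    dispatch_op(std::make_shared<PendingOp>(PendingOp{id}));
  }

  // Park an op that cannot be handled yet.
  void defer(PendingOpRef op) { waiters_.push_back(std::move(op)); }

  // Path 2 (like do_waiters): grab existing ops that were parked earlier.
  void do_waiters() {
    while (!waiters_.empty()) {
      PendingOpRef op = std::move(waiters_.front());
      waiters_.pop_front();
      dispatch_op(std::move(op));
    }
  }

  // Ops waiting to be picked up by a worker.
  std::deque<PendingOpRef> queue;

 private:
  // Both paths converge here; the op ends up on the queue.
  void dispatch_op(PendingOpRef op) {
    assert(op);
    queue.push_back(std::move(op));
  }

  std::deque<PendingOpRef> waiters_;
};
```

Whichever path an op takes, it ends up on the same queue, which is what lets the worker side treat new and previously deferred requests uniformly.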

Request Processing

  • struct OpWQ: public ThreadPool::WorkQueueVal<pair<PGRef, OpRequestRef>, PGRef >

    • src/osd/OSD.h:1101
  • void OSD::OpWQ::_process(PGRef pg, ThreadPool::TPHandle &handle)

    • src/osd/OSD.cc:7604
  • void OSD::dequeue_op(PGRef pg, OpRequestRef op, ThreadPool::TPHandle &handle)

    • src/osd/OSD.cc:7643
  • void ReplicatedPG::do_request(OpRequestRef op, ThreadPool::TPHandle &handle)

    • src/osd/ReplicatedPG.cc:1080
  • void ReplicatedPG::do_op(OpRequestRef op)

    • src/osd/ReplicatedPG.cc:1191
  • void ReplicatedPG::execute_ctx(OpContext *ctx)

    • src/osd/ReplicatedPG.cc:1706
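The worker side of the trace can be sketched as a queue of (PG, op) pairs from which a worker takes one item at a time and hands it to the PG. This is a minimal illustration under stated assumptions: WorkOp, WorkPG, and ToyOpWQ are hypothetical, and process_one stands in for the body of one worker-loop iteration rather than for the real ThreadPool machinery.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-ins for OpRequest and PG.
struct WorkOp {
  int id;
};
using WorkOpRef = std::shared_ptr<WorkOp>;

struct WorkPG {
  std::vector<int> done;
  // Stands in for do_request -> do_op -> execute_ctx.
  void do_request(const WorkOpRef &op) { done.push_back(op->id); }
};
using WorkPGRef = std::shared_ptr<WorkPG>;

class ToyOpWQ {
 public:
  void enqueue(WorkPGRef pg, WorkOpRef op) {
    assert(pg && op);
    std::lock_guard<std::mutex> guard(lock_);
    items_.emplace_back(std::move(pg), std::move(op));
  }

  // One iteration of a worker loop; returns false when the queue is empty.
  bool process_one() {
    std::pair<WorkPGRef, WorkOpRef> item;
    {
      std::lock_guard<std::mutex> guard(lock_);
      if (items_.empty()) return false;
      item = std::move(items_.front());
      items_.pop_front();
    }
    // Run the request outside the queue lock so other workers can dequeue.
    item.first->do_request(item.second);
    return true;
  }

 private:
  std::mutex lock_;
  std::deque<std::pair<WorkPGRef, WorkOpRef>> items_;
};
```

In a threaded setup, each worker would call process_one in a loop and block or sleep when it returns false; that waiting logic is left out to keep the sketch short.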

The following sub-trace shows the path taken to the actual logic behind a RADOS client write operation. All other client operations can be found down this path as well. For instance, CEPH_OSD_OP_WRITE is a sibling of all other client operations in a large switch statement in do_osd_ops.

  • int ReplicatedPG::prepare_transaction(OpContext *ctx)

    • src/osd/ReplicatedPG.cc:5055
  • int ReplicatedPG::do_osd_ops(OpContext *ctx, vector<OSDOp>& ops)

    • src/osd/ReplicatedPG.cc:2921
  • case CEPH_OSD_OP_WRITE

    • src/osd/ReplicatedPG.cc:3650
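The "large switch statement" structure can be illustrated with a toy version that walks a vector of operations and accumulates the mutating ones into a transaction. Everything here (ToyOpCode, ToyOsdOp, ToyTransaction, toy_do_osd_ops, and the string encoding of steps) is hypothetical; the real do_osd_ops handles many more opcodes and works on an OpContext.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical opcodes; the real list of CEPH_OSD_OP_* values is much longer.
enum class ToyOpCode { Read, Write, Truncate };

struct ToyOsdOp {
  ToyOpCode code;
  std::uint64_t offset;
  std::string data;
};

// Accumulates the mutations that will later be submitted as one unit.
struct ToyTransaction {
  std::vector<std::string> steps;
};

// Walk the ops in order; each case is a sibling in one switch statement.
int toy_do_osd_ops(const std::vector<ToyOsdOp> &ops, ToyTransaction *t) {
  assert(t != nullptr);
  for (const ToyOsdOp &op : ops) {
    switch (op.code) {
      case ToyOpCode::Read:
        // Reads do not add anything to the transaction in this sketch.
        break;
      case ToyOpCode::Write:
        t->steps.push_back("write@" + std::to_string(op.offset) + ":" + op.data);
        break;
      case ToyOpCode::Truncate:
        t->steps.push_back("truncate@" + std::to_string(op.offset));
        break;
      default:
        return -EOPNOTSUPP;
    }
  }
  return 0;
}
```

Adding a new client operation in this style means adding one more case to the switch, which matches how the write case sits next to the other operations in the trace above.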

The accumulated transaction is submitted in issue_repop, which then calls submit_transaction on the configured PGBackend (e.g. replication or erasure coding). The backend communicates with the replicas and also runs the transaction against the local object store.

  • void ReplicatedPG::issue_repop(RepGather *repop, utime_t now)

    • src/osd/ReplicatedPG.cc:6660
  • virtual void submit_transaction(

    • src/osd/PGBackend.h:490
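The hand-off from issue_repop to a pluggable backend might be pictured as below. This is only a sketch of the virtual-dispatch idea: RepTxn, ToyStore, ToyBackend, ToyReplicatedBackend, and toy_issue_repop are hypothetical, the parameter list does not reflect the real submit_transaction signature, and "sending to replicas" is reduced to a direct call on in-memory stores.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical transaction: an ordered list of already-encoded steps.
struct RepTxn {
  std::vector<std::string> steps;
};

// Hypothetical object store that simply records what was applied to it.
struct ToyStore {
  std::vector<std::string> applied;
  void apply(const RepTxn &t) {
    applied.insert(applied.end(), t.steps.begin(), t.steps.end());
  }
};

// Backend interface; replication and erasure coding would be separate subclasses.
struct ToyBackend {
  virtual ~ToyBackend() = default;
  virtual void submit_transaction(const RepTxn &t) = 0;
};

class ToyReplicatedBackend : public ToyBackend {
 public:
  ToyReplicatedBackend(ToyStore *local, std::vector<ToyStore *> replicas)
      : local_(local), replicas_(std::move(replicas)) {
    assert(local_ != nullptr);
  }

  void submit_transaction(const RepTxn &t) override {
    // Stand-in for messaging the replicas.
    for (ToyStore *replica : replicas_) replica->apply(t);
    // Run the same transaction against the local store.
    local_->apply(t);
  }

 private:
  ToyStore *local_;
  std::vector<ToyStore *> replicas_;
};

// The caller only knows about the interface, not the concrete backend.
void toy_issue_repop(ToyBackend *backend, const RepTxn &t) {
  assert(backend != nullptr);
  backend->submit_transaction(t);
}
```

Because toy_issue_repop sees only the ToyBackend interface, a different backend can be swapped in without changing the code that builds the transaction.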