In this post I’m going to demonstrate how to dynamically extend the interface of objects in RADOS using the Lua scripting language, and then build an example service for image thumbnail generation and storage that performs remote image processing inside a target object storage device (OSD). We’re gonna have a lot of fun.
Note that this is a re-post of the article appearing at http://ceph.com/rados/dynamic-object-interfaces-with-lua/ which was published on October 29, 2013.
RADOS Object Classes
One of the less publicized features of the RADOS object store is the ability to extend the object interface by writing C/C++ plugins that add new remote execution targets that may perform arbitrary operations on object data. The ability to add user-defined functionality to the OSD is a very powerful feature allowing applications to reduce network round-trips and data movement, exploit remote resources, and simplify otherwise complex interfaces by taking advantage of the transactional context within which remote operations execute. But that’s enough marketing—here is a very simple example that computes the MD5 hash of an object without transferring the object payload over the network.
Example: MD5 Hash of Object
The straightforward method for a client to compute the MD5 hash of an object is to first retrieve the entire object and then apply the MD5 hash function to the data locally. Using librados and the cryptopp library, this might look something like the following:
Here the client first reads the entire object over the network, and then computes the MD5 hash of the object data. However, transferring the entire object to the client can be avoided by introducing a custom object interface for computing the MD5 hash within the storage system. The following code snippet illustrates the basics of how an MD5 hash could be computed using the object class facility. Note that the following code would in practice be compiled into a shared library and loaded dynamically into a running OSD process, but we’ve omitted the deployment details to keep things simple (there are links at the end of this section to more information on getting started with object classes).
Before explaining the function `compute_md5`, let’s see how a client would invoke `compute_md5` to calculate the hash:
Here the client runs the librados exec method to invoke the `compute_md5` function remotely on the object named “myobj”. Note that “myhashclass” is a name that identifies the plugin (not shown in this tutorial), which may contain many functions that can be invoked remotely. Now, through the power of networking, and lots of hand waving, a client can invoke the `compute_md5` function above, which will run remotely on the OSD storing the target object (there are lots of gory details about how this actually happens that are beyond the scope of this document). When the remote method is executed, it performs a transaction that atomically reads the object payload and computes the MD5 hash, all within the OSD process, avoiding any network transfer of object data. At the end of the `compute_md5` function the digest is written into the out parameter that will be marshaled back to the client.
Now that is some pretty magical stuff right there. But there are situations where the overhead of compiling C/C++ into a shared library, potentially for multiple target architectures, is too heavyweight. It’d be nice if we could inject and alter object interfaces on the fly. To address this need, we’ve created a mechanism for defining new object classes using the Lua scripting language, which I’ll describe next.
Additional Resources: Object Class Development
While it was necessary to introduce the concept of object classes, a full tutorial on the subject is unfortunately beyond the scope of this post. A “Hello, World” example object class containing extensive documentation is available on GitHub. This resource is a good starting point, and if you have questions, please do not hesitate to ask on the Ceph mailing lists or IRC channels.
Dynamic Object Classes With Lua
In order to support dynamic generation of object interfaces, we’ve embedded the LuaJIT VM inside the OSD process. Why Lua, you may ask? The Lua language and its run-time are designed specifically for embedding, and when coupled with the LuaJIT virtual machine, near-native performance can be achieved. Briefly, the current implementation expects a Lua script defining any number of functions to be sent to the OSD along with a client request that specifies which function in the script to execute. Now let’s dig into the details.
A Lua object class is an arbitrary Lua script containing at least one exported function handler that a client may invoke remotely. By building up a collection of handlers, new and interesting interfaces to objects can be constructed and dynamically loaded into a running RADOS cluster. The basic structure of a Lua object class is shown in the following code snippet:
In the above Lua script any number of functions and modules can be used to support the behavior exported by the functions `handler1` and `handler2`. A client can remotely execute any registered function, provide an arbitrary input, and receive an arbitrary output.
Object classes written in Lua may have many functions, only a subset of which are handlers available to be directly invoked by a client. In order to make a Lua function available, the function must be exported by registering it. This is done using the `cls.register` function. The following code snippet illustrates how this works.
In the above example, `cls.register(thehandler)` exports the function `thehandler`, making it available for clients to call. A client that attempts to call the `helper` function (an unregistered function) will instead receive an error return value.
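To make the export rule concrete, here is a minimal, self-contained C++ sketch of a handler registry. It is only a model of the idea, not the Lua run-time inside the OSD: `HandlerRegistry`, `register_handler`, `call`, and the handler signature are hypothetical names made up for this illustration, and `-ENOTSUP` is an assumed error code for the unregistered case.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical handler signature: takes the client's input, fills the output,
// and returns 0 on success or a negative errno value on failure.
using Handler = std::function<int(const std::string& input, std::string& output)>;

class HandlerRegistry {
public:
    // Similar in spirit to cls.register: export a function under a name.
    void register_handler(const std::string& name, Handler fn) {
        assert(fn && "a handler must be callable");
        exported_[name] = std::move(fn);
    }

    // Dispatch a client request. Only exported names are reachable.
    // -ENOTSUP is an assumed error code chosen for this sketch.
    int call(const std::string& name, const std::string& input,
             std::string& output) const {
        auto it = exported_.find(name);
        if (it == exported_.end()) return -ENOTSUP;
        return it->second(input, output);
    }

private:
    std::map<std::string, Handler> exported_;
};

// Internal helper: never registered, so clients cannot invoke it directly.
inline std::string helper(const std::string& s) { return "[" + s + "]"; }

// Exported handler that relies on the helper internally.
inline int thehandler(const std::string& input, std::string& output) {
    output = helper(input);
    return 0;
}
```

Registering only `thehandler` and then calling `call("helper", ...)` yields the error code, while `call("thehandler", ...)` runs the handler and can still use `helper` internally.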
Error Handling Semantics
In the previous section we presented an example object class method written in C++ that calculated the MD5 hash of an object. Returning to this example, notice that each operation on the object is carefully checked for failure, and an error code is returned if any operation fails. When a negative value is returned from an object class handler the current transaction will be aborted, and the return value is passed back to the client. When the handler has completed successfully a return value of zero will commit the transaction. While in C++ we must perform these checks explicitly, in Lua this common pattern for handling errors can be fully managed. Take as an example the following C++ object class handler:
`handle1` will return `-EEXIST` if the object already exists (or any other error encountered when running `cls_cxx_create`), and return zero if the handler completes successfully. The same functionality can be constructed in Lua, but when error handling fits this common pattern of aborting automatically, the Lua object class run-time will automagically select the correct return value. For instance, in the following example `handle2` and `handle3` have identical semantics to `handle1` defined above in C++.
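The abort-or-commit rule can be modeled in a few lines of standalone C++. This is a toy simulation under stated assumptions, not OSD code: `ObjectState`, `run_in_transaction`, and `create_exclusive` are hypothetical names, and treating every non-negative return value as success is an assumption of the sketch (the text above only specifies negative values and zero).

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <string>

// Toy stand-in for an object's state; not a Ceph data structure.
struct ObjectState {
    bool exists = false;
    std::string payload;
};

// Run a handler against a scratch copy of the object. A negative return
// value aborts (the copy is discarded); otherwise the copy is committed.
inline int run_in_transaction(ObjectState& obj,
                              const std::function<int(ObjectState&)>& handler) {
    assert(handler && "a handler must be callable");
    ObjectState scratch = obj;
    const int ret = handler(scratch);
    if (ret < 0) return ret;  // abort: obj is left untouched
    obj = scratch;            // commit
    return ret;
}

// Same shape as the create handler discussed above: fail with -EEXIST
// if the object is already present, otherwise create it.
inline int create_exclusive(ObjectState& obj) {
    if (obj.exists) return -EEXIST;
    obj.exists = true;
    return 0;
}
```

Running `create_exclusive` twice through `run_in_transaction` returns 0 and then `-EEXIST`, and a handler that modifies the payload before returning a negative value leaves the original object unchanged.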
Some operations return error codes that we may want to handle directly. For example, when retrieving a value from the object map, `-ENOENT` is used to indicate that the given key was not found. If the handler code can deal with this case (e.g. creating and initializing a new key), then it is simple enough to just return all other error codes. This exact scenario is shown in the following C++ handler, in which we abort on any error code that is not `-ENOENT`:
The same handler can be constructed in Lua as follows:
The trick here is to call `cls.map_get_val` in protected mode via the Lua `pcall` function, which prevents any errors from being automatically propagated to the caller, allowing our handler to examine the return value.
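The pattern of tolerating `-ENOENT` while propagating every other error can be sketched in self-contained C++ as follows. The in-memory `ObjectMap`, the free function `map_get_val`, and the counter scenario in `bump_counter` are all hypothetical stand-ins invented for this illustration; they are not the object class API.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>

// Toy object map; stands in for the OSD key/value store in this sketch.
using ObjectMap = std::map<std::string, std::string>;

// Hypothetical lookup: returns 0 and fills `value`, or -ENOENT if the
// key is absent.
inline int map_get_val(const ObjectMap& m, const std::string& key,
                       std::string& value) {
    auto it = m.find(key);
    if (it == m.end()) return -ENOENT;
    value = it->second;
    return 0;
}

// Increment a counter stored under `key`, creating it on first use.
inline int bump_counter(ObjectMap& m, const std::string& key) {
    assert(!key.empty());
    std::string value;
    const int ret = map_get_val(m, key, value);
    if (ret < 0 && ret != -ENOENT) return ret;  // propagate unexpected errors
    // -ENOENT is handled here: a missing key starts from zero.
    const long count = (ret == -ENOENT) ? 0 : std::stol(value);
    m[key] = std::to_string(count + 1);
    return 0;
}
```

The first call to `bump_counter` on an empty map hits the `-ENOENT` branch and initializes the key; later calls take the normal path.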
An object class can write into the OSD log (e.g. /var/log/ceph/osd-0.log) to record debugging information using the cls.log function. The function takes any number of arguments which are converted into strings and separated by spaces in the final output. If the first argument is numeric then it is interpreted as a log-level. If no log-level is specified a default log-level is used.
Logging is useful in debugging script execution and can also be used to provide more detailed error information.
Object Payload I/O
The payload data of an object can be read from and written to using the `cls.read` and `cls.write` functions. Each function takes an offset and length.
A key/value store supporting range queries (based on Google’s LevelDB) can be accessed using the `cls.map_set_val` and `cls.map_get_val` functions. A key can be any string and a value is a standard blob of any size.
The Lua object class facility is not yet in the mainline Ceph tree. The feature is located in the cls-lua branch, and can be checked out from github:
git clone --branch cls-lua git://github.com/ceph/ceph.git
The normal procedures for building and installing Ceph from source apply, and the only dependency is that LuaJIT development libraries be installed. These dependencies are available on Ubuntu. In addition, more functionality than is listed in this post has been implemented, and a set of unit tests is available in the source tree demonstrating the full range of features.
Lua Client Libraries
Before we jump into the sample application, I’ll introduce two additional components that will make our life easier. The first is Lua bindings for librados, and the second is a Lua library that hides the details of serializing Lua scripts for execution within the OSD.
Lua bindings for the librados client library are available on github at https://github.com/noahdesu/lua-rados/. Here we will provide a brief overview for context. Please consult the full documentation for additional information. Ok, let’s jump right in. The following code snippet shows how to connect to a RADOS cluster:
Next, open a client I/O context for a particular pool:
Now the Lua client can interact with objects, such as setting an extended attribute:
Those are the basics of writing RADOS clients in Lua. Now, let’s run some remote scripts from a Lua client.
The protocol for sending a script to an OSD is fairly simple, but is easily wrapped up in a convenience library. The cls-lua-client library does just that, building on top of the lua-rados library described in the previous section. Assuming that we have connected to a RADOS cluster and constructed an I/O context object, a remote Lua script can be executed as in the following example. First, let’s create a Lua string containing the script we want to execute.
The script above will send to its output the string “Hello, world!” if the input is zero-length. Otherwise, it will reply with “Hello, <input>!”, where <input> is replaced by the input sent from the client. This can be remotely executed using the cls-lua-client library as follows:
Executing this would produce the output:
Hello, world!
Hello, John!
Great, now we have all the pieces to start building a sample application!
Example Application: Image Thumbnail Service
As a driving example we will construct a service on top of RADOS that stores and generates image thumbnails. The service is very simple, and has the following properties.
- Writing an image into an object sets the “base” or “original” image data.
- A thumbnail computed from the base image can be generated remotely inside the OSD.
- The original image and any generated thumbnail can be retrieved.
In the following examples I’ll demonstrate the core of the service. In practice these routines would be added to a larger project or executable, and of course made more robust against errors and edge cases. A fully functional example of this can be found in the cls-lua-client project on GitHub.
Storing an Image
To store an image in RADOS we first read it from a local file, and then write it to the object. In order to support storage and retrieval of different thumbnails, we record the location and size of an image blob in the object index under a key describing it. In this simple example writing an image sets its base image, so we store it under the key “original”.
In the previous example two round-trips were required to 1) set the object data and 2) update the index. These can be done atomically in a single round-trip by using a co-designed interface, demonstrated in the following script:
The script reads the image from the file and sends the image as the input to a script which executes on the OSD, taking care of the write and index update at the same time. Neat!
Retrieving an Image
To read a particular version of an image we need to look-up the offset and length for the target image blob stored in the object index. In the following example the index look-up and object read are performed remotely, and the image is returned to the client if it exists. In the next section I’ll show how the spec string is stored, but for context it describes the specification for creating a thumbnail (e.g. 500×400 pixels).
The image returned from the script is then written to the output file.
Thumbnails are generated using Lua wrappers to ImageMagick available on github at https://github.com/leafo/magick. A thumbnail is generated using the `magick.thumb` function, passing in an image blob and a thumbnail specification string (e.g. 500×300 pixels). The script that runs remotely first reads the original image, computes the thumbnail, appends the thumbnail to the object payload, and then records the offset and size of the thumbnail in the object index under a key equal to the specification string.
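The object layout used by the service (one payload holding several image blobs, plus an index mapping a key to an offset and length) can be modeled in standalone C++. This is a rough sketch of the bookkeeping only: `ImageObject`, `store_original`, `add_thumbnail`, and `get_blob` are hypothetical names, the thumbnail bytes are passed in already computed rather than produced by ImageMagick, and resetting the index when a new original is stored is an assumption of the sketch.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Toy model of one image object: a byte payload plus an index of blobs,
// where each entry maps a key to (offset, length) within the payload.
struct ImageObject {
    std::string payload;
    std::map<std::string, std::pair<std::size_t, std::size_t>> index;
};

// Writing an image sets the base image, recorded under the key "original".
// Assumption of this sketch: any previously indexed blobs are dropped.
inline void store_original(ImageObject& obj, const std::string& image) {
    obj.payload = image;
    obj.index.clear();
    obj.index["original"] = std::make_pair(std::size_t{0}, image.size());
}

// Append an already-computed thumbnail to the payload and record its
// offset and size under a key equal to the specification string.
inline int add_thumbnail(ImageObject& obj, const std::string& spec,
                         const std::string& thumb) {
    if (obj.index.find("original") == obj.index.end()) return -ENOENT;
    const std::size_t offset = obj.payload.size();
    obj.payload += thumb;
    obj.index[spec] = std::make_pair(offset, thumb.size());
    return 0;
}

// Look up a blob by key and copy it out; -ENOENT if the key is unknown.
inline int get_blob(const ImageObject& obj, const std::string& key,
                    std::string& out) {
    auto it = obj.index.find(key);
    if (it == obj.index.end()) return -ENOENT;
    assert(it->second.first + it->second.second <= obj.payload.size());
    out = obj.payload.substr(it->second.first, it->second.second);
    return 0;
}
```

After `store_original` and one `add_thumbnail` call with the spec `"500x300"`, both `"original"` and `"500x300"` resolve through `get_blob`, while any other spec returns `-ENOENT`. In the real service these steps would run inside the OSD so that the payload write and the index update happen together.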
And that’s it folks… on-the-fly custom RADOS object interfaces! Want to contribute? We are continually improving the Lua bindings and the internal Lua object class API and are always looking for feedback. Thanks for stopping by!