I am going to try publishing a recurring post on this blog, one that I put together whenever the number of tabs open in the browsers on my various devices becomes overwhelming. I'll post the tabs here with a brain dump of what I was thinking. This is the first such post.

In-Memory Computing

I'm very excited about the potential of various types of in-memory computing. Network performance continues to improve, and EC2 SR-IOV and Azure InfiniBand support (Windows-only) give some indication that RDMA is going to become available in the cloud. As locality continues to matter less, there are some interesting things that we can consider.

The most recent concept I've been pondering is the use of tmpfs for accelerating workloads. For instance, database storage can be placed in tmpfs to accelerate a set of queries, such as analytics. This is viable for many users even given the volatile nature of such storage. In fact, the RocksDB wiki contains in-memory benchmarks and some indication that they have been tuning the system for better in-memory performance.
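As a rough illustration of the tmpfs idea, here is a minimal sketch that puts a throwaway SQLite database on tmpfs and runs an aggregate query against it. SQLite stands in for whatever database you actually use, and the function name, table, and the choice of `/dev/shm` as the tmpfs mount are my own assumptions (it is the usual location on Linux, but not guaranteed).

```python
import os
import sqlite3
import tempfile

# Assumption: /dev/shm is a writable tmpfs mount (typical on Linux).
# Otherwise fall back to the default temp directory, which may be disk-backed.
_SHM = "/dev/shm"
TMPFS_DIR = _SHM if os.path.isdir(_SHM) and os.access(_SHM, os.W_OK) else tempfile.gettempdir()


def run_analytics_in_tmpfs(rows):
    """Load (kind, value) rows into a scratch SQLite file and sum values per kind."""
    # The directory and the database inside it are deleted when the block
    # exits, which mirrors the volatile nature of tmpfs-backed storage.
    with tempfile.TemporaryDirectory(dir=TMPFS_DIR) as workdir:
        conn = sqlite3.connect(os.path.join(workdir, "scratch.db"))
        try:
            conn.execute("CREATE TABLE events (kind TEXT, value INTEGER)")
            conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
            conn.commit()
            query = "SELECT kind, SUM(value) FROM events GROUP BY kind ORDER BY kind"
            return conn.execute(query).fetchall()
        finally:
            conn.close()


if __name__ == "__main__":
    print(run_analytics_in_tmpfs([("a", 1), ("b", 2), ("a", 3)]))
```

Nothing in the code changes when the directory is on tmpfs rather than disk, which is a large part of the appeal: the acceleration comes entirely from where the files live.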

One example of where this may work well is the use of RocksDB to accelerate sorting of genomic data. A fork of SAMtools has been created that replaces the custom external merge-sort algorithm in SAMtools with RocksDB, which is highly optimized for disk and SSD media.
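For anyone who hasn't run into external merge sort, here is a minimal sketch of the general technique: sort bounded-size runs in memory, spill each run to a file, then do a k-way merge of the runs. This is not the SAMtools implementation. It sorts plain strings rather than alignment records, and `external_sort` and `run_size` are names I made up for illustration.

```python
import heapq
import os
import tempfile


def _read_run(fh):
    """Stream one sorted run back from disk, without trailing newlines."""
    for line in fh:
        yield line.rstrip("\n")


def external_sort(lines, run_size=1000):
    """Yield newline-free strings in sorted order using bounded memory."""
    run_paths = []
    with tempfile.TemporaryDirectory() as workdir:
        run = []

        def flush():
            # Sort the in-memory run and spill it to its own file.
            run.sort()
            path = os.path.join(workdir, f"run{len(run_paths)}.txt")
            with open(path, "w") as fh:
                fh.writelines(item + "\n" for item in run)
            run_paths.append(path)
            run.clear()

        for line in lines:
            run.append(line)
            if len(run) >= run_size:
                flush()
        if run:
            flush()

        # k-way merge: heapq.merge pulls lazily from each sorted run.
        files = [open(path) for path in run_paths]
        try:
            yield from heapq.merge(*(_read_run(fh) for fh in files))
        finally:
            for fh in files:
                fh.close()


if __name__ == "__main__":
    print(list(external_sort(["d", "a", "c", "b", "e"], run_size=2)))
```

The run files are where storage speed matters: every record is written once and read once, so faster media (or RAM) under `workdir` speeds up both phases.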


In this case, using tmpfs will offer limited performance gains because the datasets are far larger than RAM capacity. However, we can consider constructing network-attached RAM disks. A related project in which remote RAM disks are mounted as a local file system is described here:


I think that this particular approach is interesting, and using iSER combined with RAID-0 may work very well. We can consider other approaches as well. The GASNet system allows remote memory to be accessed as a global address space. If we combine that (or maybe MPI) with sorting algorithms, can we provide extremely fast genomic data sorting?


When a KVM virtual machine is created, a user-space process provides some memory for the guest to run within. What happens if that memory is actually backed by low-latency, RDMA-accessible RAM?



I would like to do the following:

  1. Programmatically generate some LLVM IR that processes some memory
  2. Create a read-only memory map of a file
  3. Fork a worker to run the IR against the file
  4. Return the results and free the LLVM IR
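Steps 2 through 4 can be sketched without LLVM at all, which at least pins down the plumbing. The sketch below maps a file read-only, forks a worker that inherits the mapping, and sends an integer result back over a pipe. The "kernel" here is an ordinary Python function standing in for the generated IR, so step 1 is not covered, and `run_in_worker` and `count_newlines` are hypothetical names. It assumes a POSIX system (for `os.fork`) and a non-empty file.

```python
import mmap
import os
import struct


def count_newlines(buf):
    """Hypothetical stand-in for the generated kernel: count newline bytes."""
    count, pos = 0, buf.find(b"\n")
    while pos != -1:
        count += 1
        pos = buf.find(b"\n", pos + 1)
    return count


def run_in_worker(path, kernel):
    """Map path read-only, run kernel(view) in a forked child, return its int result."""
    with open(path, "rb") as fh:
        # Length 0 maps the whole file; this raises ValueError for an empty file.
        view = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: inherits the mapping, computes, reports, and exits.
            status = 1
            try:
                os.close(read_fd)
                os.write(write_fd, struct.pack("<q", kernel(view)))
                status = 0
            finally:
                os._exit(status)
        # Parent: read the 8-byte result, then reap the child.
        os.close(write_fd)
        with os.fdopen(read_fd, "rb") as pipe:
            payload = pipe.read()
        _, wait_status = os.waitpid(pid, 0)
        if wait_status != 0 or len(payload) != 8:
            raise RuntimeError("worker failed")
        return struct.unpack("<q", payload)[0]
    finally:
        view.close()
```

One nice property of forking is that anything the worker allocated, which in the real version would include the compiled code, goes away when the child exits, so the parent never has to clean up after it.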

It'd be cool to do that in Go, but at this point Rust or C seems like a better candidate. That said, using Go for coordination with some low-level C helpers might be the best solution:





Creating database systems is the cool thing to do right now. But there is a huge design space when it comes to database systems. One of the most obvious questions is how the hell I even get a query into my system. Here is an SQL template library for C++:


Well, I guess the DSL approach is different from actually producing a generic AST for a query. Maybe this does that too. More broadly, here is a paper about database architecture design trade-offs: