A challenge in designing systems for scientific data analysis is the lack of representative data sets and queries. In the world of relational database systems, the TPC benchmarks serve as a common tool for comparing performance. However, little work has been done to produce benchmarks representative of scientific data analysis workloads. One effort to fill this gap is the SS-DB benchmark. From the Science Benchmark (SS-DB) website:

SS-DB is representative of the processing performed in a number of scientific domains in addition to astronomy, including earth science, oceanography, and medical image analysis.

The SS-DB website links to the data generator tool used to produce SS-DB data sets, which are generated in a raw binary format. Systems such as SciHadoop, however, are designed to process NetCDF data using Hadoop, so they cannot read the generator's output directly. The following tool converts a raw SS-DB data set to a NetCDF file that can be used with existing tools, such as SciHadoop or the NetCDF Operator Suite.

Tool: ssdb_nc3_loader.c

The following Gist includes the ssdb_nc3_loader.c tool source. Building the tool requires the NetCDF development libraries. Example build and usage:

gcc -Wall -o tool ssdb_nc3_loader.c -lnetcdf
usage: ./tool [-c] -i <img path> -z <z-dim idx> -s <img size> -n <out.nc>
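
To give a feel for what the first half of such a conversion involves, here is a minimal sketch of loading a square grid of values from a headerless binary file into memory. It is not taken from ssdb_nc3_loader.c: the helper name read_raw_grid is hypothetical, and the layout it assumes (row-major, native byte order, one 32-bit integer per cell) is an illustration only, not the documented SS-DB raw format.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of ssdb_nc3_loader.c).
 *
 * Reads a size x size grid of 32-bit integers from a headerless binary
 * file. ASSUMPTION: cells are stored row-major, in native byte order,
 * one int32 per cell. The real SS-DB layout may differ.
 *
 * Returns a malloc'd buffer that the caller must free, or NULL if the
 * file cannot be opened, is too short, or the size is invalid.
 */
int32_t *read_raw_grid(const char *path, size_t size)
{
    /* Reject empty grids and sizes whose byte count would overflow. */
    if (size == 0 || size > SIZE_MAX / size / sizeof(int32_t))
        return NULL;

    size_t count = size * size;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    int32_t *grid = malloc(count * sizeof *grid);
    if (grid == NULL) {
        fclose(fp);
        return NULL;
    }

    /* A short read means the file holds fewer cells than requested. */
    size_t got = fread(grid, sizeof *grid, count, fp);
    fclose(fp);

    if (got != count) {
        free(grid);
        return NULL;
    }
    return grid;
}
```

A caller would invoke read_raw_grid(path, size) with the image path and edge length, check for NULL, and free the buffer when done. With the grid in memory, a converter could then define two dimensions and an integer variable through the NetCDF C API (nc_create, nc_def_dim, nc_def_var, nc_enddef) and write the buffer with nc_put_var_int before calling nc_close; whether the actual tool proceeds exactly this way is not stated here.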