A challenge in designing systems for scientific data analysis is the lack of representative data sets and queries. In the world of relational database systems, the TPC benchmarks serve as a common tool for comparing performance. However, little work has been done on producing benchmarks representative of scientific data analysis workloads. One effort to fill this gap is the SS-DB benchmark. From the Science Benchmark (SS-DB) website:
SS-DB is representative of the processing performed in a number of scientific domains in addition to astronomy, including earth science, oceanography, and medical image analysis.
The SS-DB website links to the data generator tool used to produce SS-DB data sets, which are generated in a raw binary format. Systems such as SciHadoop are designed to process NetCDF data using Hadoop. The following tool converts a raw SS-DB data set to a NetCDF file that can be used in existing tools, such as SciHadoop or the NetCDF Operator Suite.
The following Gist includes the source for the ssdb_nc3_loader.c tool. Building the tool requires the NetCDF development libraries. Example build and usage:
gcc -Wall -o tool ssdb_nc3_loader.c -lnetcdf
usage: ./tool [-c] -i <img path> -z <z-dim idx> -s <img size> -n <out.nc>
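To give a feel for what a converter of this kind has to do, here is a minimal sketch in C. It is not the code from the Gist. The helper names read_raw_image and image_hyperslab are hypothetical, and the layout it assumes (one native-endian 32-bit integer per cell, stored row-major in a square image) is an illustration only, not the documented SS-DB format. The second helper reflects a guess at how the -z and -s options relate to the output: one image of side length -s placed at index -z of a three-dimensional (z, y, x) variable.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a size x size image of 32-bit integers from a
 * raw binary file. The one-value-per-cell, row-major, native-endian layout
 * is an assumption for illustration, not the documented SS-DB format.
 * Returns a malloc'd buffer the caller must free, or NULL on any error. */
int32_t *read_raw_image(const char *path, size_t size)
{
    if (size == 0)
        return NULL;

    size_t count = size * size;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    int32_t *buf = malloc(count * sizeof *buf);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* A short read means the file does not hold a full image. */
    size_t got = fread(buf, sizeof *buf, count, fp);
    fclose(fp);
    if (got != count) {
        free(buf);
        return NULL;
    }
    return buf;
}

/* Hypothetical helper: describe where one image lands inside a (z, y, x)
 * variable, as the start/count vectors used for hyperslab writes. */
void image_hyperslab(size_t z, size_t size, size_t start[3], size_t count[3])
{
    assert(start != NULL && count != NULL);
    start[0] = z;  start[1] = 0;     start[2] = 0;
    count[0] = 1;  count[1] = size;  count[2] = size;
}
```

With the NetCDF C library, the buffer and the start/count pair could then be handed to nc_put_vara_int, which writes a hyperslab of int values into an already defined variable. Whether the actual tool works this way is not something this sketch claims.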