Publishing dynamic data on

February 16, 2014

Suppose you would like to publish some data, for example to accompany a journal article. One way would be to put a CSV file on your website and share the URL with your colleagues. However, CSV has many limitations: it only works for tabular structures, has limited type safety (pretty much everything gets coerced into strings), and can lose numeric precision.

There are many alternative data interchange formats, each with its own benefits and limitations. For example, JSON is widely supported and can be parsed in almost any language, but it can be verbose and slow. A binary format such as Protocol Buffers is more efficient, but many users might not know how to parse it. You could even use save or saveRDS in R to share the native R structures, but this limits your audience to R users.
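To make the CSV limitations concrete, here is a small sketch in base R that round-trips a toy data frame through write.csv and through saveRDS. The column names and values are made up for illustration. The CSV copy should come back with a rounded number and a date that is no longer typed as a Date, while the RDS copy is identical to the original.

```r
# Toy data frame; the columns and values are illustrative only.
original <- data.frame(
  x = 1 / 3,                      # needs about 17 significant digits to round-trip
  when = as.Date("2014-02-16")    # a typed Date column
)

csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")

# CSV: numbers are written with at most 15 significant digits, dates as text.
write.csv(original, csv_file, row.names = FALSE)
csv_back <- read.csv(csv_file)

# RDS: R's native serialization keeps types and full precision.
saveRDS(original, rds_file)
rds_back <- readRDS(rds_file)

csv_back$x == original$x          # FALSE: precision was lost
class(csv_back$when)              # no longer "Date"
identical(rds_back, original)     # TRUE
```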

Retrieving dynamic data

What we really need is a method to publish the data itself rather than some representation of the data in a particular format. With OpenCPU you can publish R objects (including datasets) in a way that lets the clients select the format and formatting options for retrieving the dataset. This is implemented using native R functionality to include arbitrary data/objects in packages, and standard R functions for exporting these data. For example, the CRAN package MASS includes a dataset called bacteria:
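A minimal way to inspect that dataset in a local R session, assuming the MASS package is installed (it is a recommended package and normally ships with R):

```r
# Load only the bacteria dataset from MASS into the current workspace.
data(bacteria, package = "MASS")

str(bacteria)    # column names, types and dimensions
head(bacteria)   # first six rows
```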


Via OpenCPU, the dataset can be downloaded by anyone in one of many formats:

Format              Export Function            URL (short)
text                print
CSV                 write.csv
TSV                 write.table
JSON                jsonlite::toJSON
Protocol Buffers    RProtoBuf::serialize_pb
RData               save
ascii R             dput

The client can also control formatting options by passing HTTP parameters. These parameters map directly to function arguments for the respective export function in the table above. Some random examples:

Output Format                                     Equivalent URL on Public OpenCPU Server
write.csv(bacteria, row.names=TRUE)
jsonlite::toJSON(Boston, digits=4)
jsonlite::toJSON(Boston, dataframe="columns")
jsonlite::toJSON(Boston, pretty=FALSE)
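The local equivalents of such requests can be tried directly in R. The sketch below calls the same export functions with different argument values, which is what the HTTP parameters are described as controlling. It uses only a few rows to keep the output short. The JSON part is skipped if jsonlite is not installed.

```r
data(bacteria, package = "MASS")
sample_rows <- head(bacteria, 3)

# Same export function, different argument values.
with_row_names    <- capture.output(write.csv(sample_rows, row.names = TRUE))
without_row_names <- capture.output(write.csv(sample_rows, row.names = FALSE))

with_row_names[1]      # header starts with an empty "" column for the row names
without_row_names[1]   # header starts directly with the first variable

# The JSON options work the same way, if jsonlite is available.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  data(Boston, package = "MASS")
  print(jsonlite::toJSON(head(Boston, 2), digits = 4))
  print(jsonlite::toJSON(head(Boston, 2), dataframe = "columns"))
}
```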

Creating a data package

To start publishing your own dynamic data, you need to put your data objects in an R package, following the standard guidelines documented in section 1.1.6 of Writing R Extensions. This might sound cumbersome, but once you get the hang of it, it only takes a few seconds. You’ll realize that packages are actually a beautiful, standardized and well-tested container format for R objects and much more. Have a look at the data folder in the opencpu/appdemo package for some examples.
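As a rough sketch of what such a package looks like on disk, the following base R code writes a bare-bones data package into a temporary directory. The package name mydata, the dataset measurements, and all DESCRIPTION fields are hypothetical placeholders, not taken from any real package.

```r
# Hypothetical package and dataset names, used only for illustration.
pkg_dir <- file.path(tempdir(), "mydata")
dir.create(file.path(pkg_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# 1. The object to publish, saved under data/ in a file named after it.
measurements <- data.frame(id = 1:3, value = c(0.12, 3.4, 5.6))
save(measurements, file = file.path(pkg_dir, "data", "measurements.rda"))

# 2. A minimal DESCRIPTION file (all field values are placeholders).
writeLines(c(
  "Package: mydata",
  "Title: Example Data Package",
  "Version: 0.1.0",
  "Description: Contains an example dataset.",
  "Author: Your Name",
  "Maintainer: Your Name <you@example.com>",
  "License: CC0"
), file.path(pkg_dir, "DESCRIPTION"))

# 3. A NAMESPACE without directives: this package exports no functions.
writeLines("# no exports", file.path(pkg_dir, "NAMESPACE"))

list.files(pkg_dir, recursive = TRUE)

# To install from source (not run here):
# install.packages(pkg_dir, repos = NULL, type = "source")
```

After installation, the dataset would be loaded with data(measurements, package = "mydata"), in the same way as bacteria from MASS.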

After creating and installing your package on your local R, test it using the OpenCPU single user server:
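One way to start the single-user server, assuming a 2.x release of the opencpu R package is installed (older releases used a different interface). Because ocpu_start_server() blocks the R session until it is interrupted, this sketch only starts it from an interactive session.

```r
# Assumption: opencpu 2.x, which exports ocpu_start_server().
can_start <- interactive() && requireNamespace("opencpu", quietly = TRUE)

if (can_start) {
  # Blocks until interrupted (for example with Ctrl+C or Esc).
  opencpu::ocpu_start_server()
}
```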


Publishing dynamic data on

To make your data available through the public OpenCPU server, all you need to do is put your package up on GitHub. OpenCPU requires the name of the GitHub repository to match the name of the R package it contains. Use devtools to test if your package is working:

library(devtools)
install_github("username/pkgname")

If this succeeds you’re good to go. Navigate to the corresponding URL on the public server, where username is your GitHub login. By default, the OpenCPU public server updates packages installed from GitHub every 24 hours. However, the GitHub webhook can be used to update the package immediately every time a commit is pushed to GitHub.

Publishing dynamic data on your own server

OpenCPU does not lock you into some commercial hosting service. Your data is stored on GitHub in a standard format under your control. The public server is there for your convenience. You can also install your own OpenCPU cloud server and publish data from there. In that case there is no need to put anything on GitHub: just install the package in R on the server.