Faster arrays and matrices in jsonlite 0.9.20

May 11, 2016


Yesterday a new version of the jsonlite package was released to CRAN. This update includes no new features, it only introduces performance optimizations.

Large Matrices

The jsonlite package was already highly optimized for converting vectors and data frames to json. However Gregory Jefferis and Duncan Murdoch had found that conversion of tall matrices as used by rglwidget was slower than expected.

It turned out this was indeed an edge case that I had overlooked. The new version of jsonlite fixes this problem and matrix conversion should be about 200 times faster than before. Technical details follow below; first a benchmark:

# Old version!
> system.time(j<-toJSON(matrix(1L, ncol = 3, nrow = 50000)))
   user  system elapsed
  4.715   0.015   4.729

# New version!
> system.time(j<-toJSON(matrix(1L, ncol = 3, nrow = 50000)))
   user  system elapsed
  0.022   0.002   0.023

This artificial example (every field has the number 1) highlights the improvement. The relative improvement might be less for matrices with actual data because of additional time spent on number formatting double/integer values (which was already optimized in jsonlite a while ago).

Technical Details

So what was the problem? The previous version of jsonlite had an elegant solution that would recurse through the dimensions of a matrix/array and apply json conversion on each of its elements. E.g. for a matrix (2D array) it would convert each row to json, and then combine the results. However it turns out that the apply call below is really slow.

# Technical example, don't use this code !
x <- matrix(1L, ncol = 3, nrow = 50000)
rows <- apply(x, 1, jsonlite:::asJSON)
json <- jsonlite:::collapse(rows, indent = NA)

The new version exploits the fact that matrices and arrays are homogenous (i.e. all elements have the same type). It first removes the dimensions from the array using c(x) and converts all of the individual elements to json with a single call to asJSON. This results in a significant speedup because asJSON is only called once rather than n times.

# Technical example, don't use this code !
str <- jsonlite:::asJSON(c(x), collapse = FALSE)
dim(str) <- dim(x)
rows <- apply(str, 1, jsonlite:::collapse, indent = NA)
json <- jsonlite:::collapse(rows, indent = NA)

Things get a bit more complicated for higher dimensional arrays, especially with toJSON(x, pretty = TRUE) but this illustrates the core issue.

You might be thinking: can we avoid apply alltogether? Yes! For the important case of 2 dimensional arrays jsonlite has a complete C implementation which makes toJSON on matrices is extra fast. For higher dimensional arrays it currently still uses the solution above, which performs quite well. We might be able to further optimize this case by porting this to C as well, but working with high dimensional arrays in C makes my head hurt.