jsonlite 0.9.13: high performance number formatting

October 25, 2014


The jsonlite package implements a robust, high-performance JSON parser and generator for R, optimized for statistical data and the web. This week version 0.9.13 appeared on CRAN, the third release in a relatively short period to focus on performance optimization.

Fast number formatting

Versions 0.9.11 and 0.9.12 had already introduced major speedups by porting critical bottlenecks to C code and switching to a better JSON parser. The current release focuses on number formatting and incorporates C code from modp_numtoa, which is several times faster than as.character, formatC or sprintf for converting doubles and integers to strings (your mileage may vary depending on platform and precision).
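To get a feel for the baseline that the new C code competes with, here is a minimal sketch that times the three base R formatters mentioned above on a vector of doubles. The vector size, seed, and four-decimal precision are arbitrary choices for illustration, not settings taken from jsonlite, and the timings will differ across machines.

```r
# Illustrative input: 100,000 random doubles (size and seed chosen arbitrarily).
set.seed(1)
x <- runif(1e5)

# Three base R ways to convert doubles to strings.
t_as_character <- system.time(s1 <- as.character(x))
t_formatC      <- system.time(s2 <- formatC(x, digits = 4, format = "f"))
t_sprintf      <- system.time(s3 <- sprintf("%.4f", x))

# Elapsed seconds per formatter; results depend on platform and precision.
timings <- c(
  as.character = t_as_character[["elapsed"]],
  formatC      = t_formatC[["elapsed"]],
  sprintf      = t_sprintf[["elapsed"]]
)
print(timings)
```

This only measures the base R functions; it does not call modp_numtoa directly, which jsonlite uses internally from C.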

library(ggplot2)
nrow(diamonds)
# [1] 53940
system.time(jsonlite::toJSON(diamonds, dataframe = "row"))
#   user  system elapsed
#  0.319   0.007   0.325
system.time(jsonlite::toJSON(diamonds, dataframe = "col"))
#   user  system elapsed
#  0.073   0.002   0.075

Using the same benchmark as in previous posts, the time to convert the diamonds data to row-based JSON has gone down from 0.619s to 0.325s on my machine (about a 2x speedup over jsonlite 0.9.12), and converting to column-based JSON has gone down from 0.330s to 0.075s (about a 4x speedup).
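For readers unfamiliar with the two layouts being timed, the following small sketch shows what row-based and column-based output look like. The two-row data frame is made up for illustration; the dataframe argument is spelled out in full here, whereas the benchmark above relies on the abbreviations "row" and "col".

```r
library(jsonlite)

# A tiny made-up data frame, just to show the two layouts.
df <- data.frame(x = c(1, 2), y = c("a", "b"), stringsAsFactors = FALSE)

# Row-based: one JSON object per record.
rows_json <- toJSON(df, dataframe = "rows")
print(rows_json)
# [{"x":1,"y":"a"},{"x":2,"y":"b"}]

# Column-based: one JSON array per column.
cols_json <- toJSON(df, dataframe = "columns")
print(cols_json)
# {"x":[1,2],"y":["a","b"]}
```

The row-based layout repeats every column name for each record, which is one plausible reason it takes longer to generate than the column-based layout.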

Comparing to other JSON packages

When comparing JSON packages, it should be noted that the comparison is never entirely fair because different packages use different settings and defaults for missing values, number of digits, etc. Both rjson and RJSONIO only support the column-based format for encoding data frames. Using their default settings:

system.time(rjson::toJSON(diamonds))
#   user  system elapsed
#  0.279   0.004   0.281
system.time(RJSONIO::toJSON(diamonds))
#   user  system elapsed
#  0.918   0.027   0.944

For this particular dataset, jsonlite is about 3.5x faster than rjson and about 12x faster than RJSONIO (on my machine) at generating column-based JSON. These differences are relatively large because 7 out of the 10 columns in the diamonds dataset are numeric.
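The caveat about differing settings can be made concrete with a short sketch. It uses only jsonlite's own digits and na arguments to show how much the output, and therefore the work done per number, depends on such options; the input values are made up, and the defaults of rjson and RJSONIO are not shown here.

```r
library(jsonlite)

# Number of decimal digits printed is configurable.
two_digits <- toJSON(pi, digits = 2)
print(two_digits)
# [3.14]

# Missing values can be encoded as JSON null or as a string.
na_null <- toJSON(c(1, NA), na = "null")
print(na_null)
# [1,null]

na_string <- toJSON(c(1, NA), na = "string")
print(na_string)
# [1,"NA"]
```

A benchmark that leaves such options at each package's defaults is therefore comparing slightly different outputs, which is worth keeping in mind when reading the timings above.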