jsonlite 0.9.12: now even lighter and faster

September 29, 2014

The jsonlite package implements a robust, high performance JSON parser and generator for R, optimized for statistical data and the web. This week version 0.9.12 appeared on CRAN which includes a completely rewritten json parser and more optimized C code for json generation. The new parser is based on yajl which is smaller and faster than libjson, and much easier to compile.

Error handling

My favorite feature of yajl is that it gives helpful error messages when parsing invalid JSON, for example:

fromJSON('[1,2,falsse,4]')
# Error in parseJSON(txt) : lexical error: invalid string in json text.
#                               [1,2,falsse,4]
#                     (right here) ------^

fromJSON('["foo", "bla\nbla"]')
# Error in parseJSON(txt) : lexical error: invalid character inside string.
#                            ["foo", "bla bla"]
#                     (right here) ------^

fromJSON('[1,2,3,4] {}')
# Error in parseJSON(txt) : parse error: trailing garbage
#                             [1,2,3,4] {}
#                     (right here) ------^

This makes debugging much easier, especially when dealing fast changing dynamic data from the web.

Unicode parsing

The yajl parser always correctly converts escaped unicode sequences into UTF-8 characters:

fromJSON('["\\u5bff\u53f8","Z\\u00fcrich"]')
# [1] "寿司"   "Zürich"

Escaped unicode was already supported in the previous version of jsonlite, however it was expensive and not enabled by default. With yajl we get this for free :-)

Integer parsing

Another cool feature is that yajl parses numbers into integers when possible:

class(fromJSON('[13,14,15]'))
# [1] "integer"

Performance

Performance of both parsing and generating JSON has again tremendously improved in this version. Some benchmarks:

library(jsonlite)
library(microbenchmark)
data(diamonds, package="ggplot2")
json_rows <- toJSON(diamonds)
json_columns <- toJSON(diamonds, dataframe = "columns")
microbenchmark(
   toJSON(diamonds),
   toJSON(diamonds, dataframe = "columns"),
   fromJSON(json_rows),
   fromJSON(json_columns),
   times=10
)
# Unit: milliseconds
#                                    expr      min       lq   median       uq       max neval
#                        toJSON(diamonds) 587.6984 591.3231 619.1590 630.3588  661.5118    10
# toJSON(diamonds, dataframe = "columns") 317.6793 325.3809 330.6444 339.9898  343.7466    10
#                     fromJSON(json_rows) 890.9832 899.3334 939.3230 979.6338 1059.9770    10
#                  fromJSON(json_columns) 188.5764 201.8463 238.1272 279.7607  293.1195    10

If we compare this to the previous blog post we can see that generating JSON to row-based data frames (the default) is approx 2x faster than the previous version. Parsing row-based json is about 2.5x faster, and parsing column-based json is almost 5x faster!

Streaming JSON

Version 0.9.12 introduces some cool streaming functionality. This is a topic in itself and I will blog about this later in the week. Have a look at examples from the stream_in and stream_out manual pages till then.