OpenCPU (https://www.opencpu.org/), by Jeroen Ooms, updated 2023-02-15
OpenCPU 2.1 Release: Scalable R Services (2018-11-22) https://www.opencpu.org/posts/opencpu-201
<a href="https://www.opencpu.org/posts/opencpu-201"><img alt="opencpu logo" src="https://www.opencpu.org/images/stockplot.png"></a>
<p>OpenCPU provides a mature and robust system for hosting R based services. The server exposes a simple <a href="https://www.opencpu.org/api.html">HTTP API</a> for calling R functions and scripts and for managing data. The cloud server is completely free and scales up to many concurrent users. This provides a reliable foundation for integrating R into any environment.</p>
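<p>As a rough illustration of that API, the sketch below builds the endpoint and form body for a remote function call from R. It assumes the URL scheme <code class="language-plaintext highlighter-rouge">/ocpu/library/{pkg}/R/{fun}</code> with a <code class="language-plaintext highlighter-rouge">/json</code> suffix to get the function's output back as JSON; the helper names <code class="language-plaintext highlighter-rouge">ocpu_call_url()</code> and <code class="language-plaintext highlighter-rouge">ocpu_form_body()</code> are hypothetical and not part of any package.</p>

```r
# Hypothetical helpers (not part of the opencpu package): build the pieces
# of an OpenCPU remote function call.

# Endpoint for calling `fun` from package `pkg`. The trailing format segment
# (e.g. "json") asks the server to return the function's output in that format.
ocpu_call_url <- function(server, pkg, fun, format = "json") {
  base <- sub("/+$", "", server)  # drop any trailing slashes
  paste0(base, "/ocpu/library/", pkg, "/R/", fun, "/", format)
}

# Arguments travel as HTTP form parameters; each value is an R or JSON literal.
ocpu_form_body <- function(args) {
  values <- vapply(args, function(v) {
    utils::URLencode(as.character(v), reserved = TRUE)
  }, character(1))
  paste(names(args), values, sep = "=", collapse = "&")
}

url <- ocpu_call_url("https://cloud.opencpu.org", "stats", "rnorm")
body <- ocpu_form_body(list(n = 3, mean = 10))
print(url)   # [1] "https://cloud.opencpu.org/ocpu/library/stats/R/rnorm/json"
print(body)  # [1] "n=3&mean=10"
```

<p>Against a running server, sending the same arguments with something like <code class="language-plaintext highlighter-rouge">httr::POST(url, body = list(n = 3, mean = 10), encode = "form")</code> should return the result of <code class="language-plaintext highlighter-rouge">rnorm(n = 3, mean = 10)</code> as JSON.</p>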
<p>The 2.1 branch is the new release of OpenCPU. The changes in this version are mostly internal and make the server a bit lighter and faster. The built-in CI system has switched to the lightweight <a href="https://cloud.r-project.org/web/packages/remotes/index.html">remotes</a> package for installing packages from GitHub. Moreover, the <code class="language-plaintext highlighter-rouge">opencpu-server</code> package has been tweaked to work better inside Docker, and server installations now target R 3.5.</p>
<p>The user facing features are unchanged; see the <a href="https://www.opencpu.org/posts/opencpu-2-0/">opencpu 2.0 announcement post</a> for a brief overview.</p>
<h2 id="upgrading">Upgrading</h2>
<p>Version 2.1.0 is available from <a href="https://cran.r-project.org/package=opencpu">CRAN</a>, <a href="https://www.opencpu.org/download.html">Launchpad</a>, <a href="https://hub.docker.com/u/opencpu">Dockerhub</a>, <a href="https://software.opensuse.org/download.html?project=home:jeroenooms:opencpu-2.1&package=opencpu">OBS</a> and the <a href="https://archive.opencpu.org/">server archive</a>.</p>
<p>The recommended platform for running the server is Ubuntu 18.04 or 16.04, which can be installed <a href="https://www.opencpu.org/download.html">directly from the PPA</a>. For Fedora and CentOS you can download installers from the <a href="https://archive.opencpu.org/centos-6/">server archive</a>. All binaries from the archive have been <a href="https://www.opencpu.org/posts/opencpu-with-docker/">built on dockerhub</a> and depend on the current version of R from <a href="https://apps.fedoraproject.org/packages/R-devel">Fedora / EPEL</a>.</p>
<p>The easiest way to get started is by deploying your packages on the <a href="https://www.opencpu.org/cloud.html">public cloud server</a> by enabling the opencpu webhook in your GitHub repository.</p>
<h2 id="docker">Docker</h2>
<p>Another easy way to get started is using Docker, which also runs on Windows these days. Images based on various platforms are published on <a href="https://hub.docker.com/u/opencpu">dockerhub</a>. The <a href="https://hub.docker.com/r/opencpu/rstudio">opencpu/rstudio</a> image is recommended for development: it runs both <code class="language-plaintext highlighter-rouge">opencpu-server</code> and <code class="language-plaintext highlighter-rouge">rstudio-server</code>, which make a powerful combination.</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Run server as executable</span>
docker run <span class="nt">--name</span> mybox <span class="nt">-t</span> <span class="nt">-p</span> 80:80 opencpu/rstudio
<span class="c"># OR: if port 80 is taken use port 8004</span>
docker run <span class="nt">--name</span> mybox <span class="nt">-t</span> <span class="nt">-p</span> 8004:8004 opencpu/rstudio
</code></pre></div></div>
<p>Now simply open <code class="language-plaintext highlighter-rouge">http://localhost/ocpu/</code> and <code class="language-plaintext highlighter-rouge">http://localhost/rstudio/</code> in your browser (add <code class="language-plaintext highlighter-rouge">:8004</code> after <code class="language-plaintext highlighter-rouge">localhost</code> if you used the second command). Log in to RStudio with user <code class="language-plaintext highlighter-rouge">opencpu</code> (password: <code class="language-plaintext highlighter-rouge">opencpu</code>) to build and install packages.</p>
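<p>If you prefer to check from an R session that the container is answering, a minimal sketch is to request <code class="language-plaintext highlighter-rouge">/ocpu/info</code> (the page that lists the server's package versions) and treat any failure as "not up yet". The helper name <code class="language-plaintext highlighter-rouge">ocpu_is_up()</code> is hypothetical, and the default base URL assumes the port 80 mapping above.</p>

```r
# Hypothetical helper: returns TRUE if an OpenCPU server answers on `base`.
ocpu_is_up <- function(base = "http://localhost", timeout = 5) {
  old <- options(timeout = timeout)  # seconds to wait for the connection
  on.exit(options(old))
  info_url <- paste0(sub("/+$", "", base), "/ocpu/info")
  # Any error or warning (e.g. connection refused) counts as "not up".
  tryCatch(length(readLines(info_url, warn = FALSE)) > 0,
           error = function(e) FALSE,
           warning = function(w) FALSE)
}

# For the port 8004 mapping:
# ocpu_is_up("http://localhost:8004")
```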
<p>To connect to a running container (e.g. for installing system libraries) get a root shell:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Replace 'mybox' with the container name or id</span>
docker <span class="nb">exec</span> <span class="nt">-i</span> <span class="nt">-t</span> mybox /bin/bash
</code></pre></div></div>
<p>Use the <a href="https://hub.docker.com/r/opencpu/base">opencpu/base</a> image for deployments. Also see the <a href="https://github.com/jeroen/opencpu-server/tree/master/docker#readme">docker readme</a>.</p>
Why Use Docker with R? A DevOps Perspective (2017-10-16) https://www.opencpu.org/posts/opencpu-with-docker
<a href="https://www.opencpu.org/posts/opencpu-with-docker"><img alt="opencpu logo" src="https://www.opencpu.org/images/stockplot.png"></a>
<p>There have been several blog posts going around about why one would use Docker with R.
In this post I’ll try to add a DevOps point of view and explain how containerizing
R is used in the context of the OpenCPU system for building and deploying R servers.</p>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Has anyone in the <a href="https://twitter.com/hashtag/rstats?src=hash&ref_src=twsrc%5Etfw">#rstats</a> world written really well about the *why* of their use of Docker, as opposed to the the *how*?</p>— Jenny Bryan (@JennyBryan) <a href="https://twitter.com/JennyBryan/status/913785731998289920?ref_src=twsrc%5Etfw">September 29, 2017</a></blockquote>
<h2 id="1-easy-development">1: Easy Development</h2>
<p>The flagship of the OpenCPU system is the <a href="/download.html">OpenCPU server</a>:
a mature and powerful Linux stack for embedding R in systems and applications.
Because OpenCPU is completely open source, we can build it and ship it on DockerHub. A ready-to-go Linux server with both OpenCPU and RStudio
can be started using the following command (use port 8004 or 80):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run -t -p 8004:8004 opencpu/rstudio
</code></pre></div></div>
<p>Now simply open <a href="http://localhost:8004/ocpu/">http://localhost:8004/ocpu/</a> and
<a href="http://localhost:8004/rstudio/">http://localhost:8004/rstudio/</a> in your browser!
Log in to RStudio with user <code class="language-plaintext highlighter-rouge">opencpu</code> (password: <code class="language-plaintext highlighter-rouge">opencpu</code>) to build or install apps.
See the <a href="https://hub.docker.com/r/opencpu/rstudio/">readme</a> for more info.</p>
<p>Docker makes it easy to get started with OpenCPU. The container gives you the full
flexibility of a Linux box, without the need to install anything on your system.
You can install packages or apps via RStudio Server, or use <code class="language-plaintext highlighter-rouge">docker exec</code> to get a
root shell on the running server:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Lookup the container ID
docker ps
# Open a root shell in that container (replace the ID with your own)
docker exec -i -t eec1cdae3228 /bin/bash
</code></pre></div></div>
<p>From the shell you can install additional software on the server, customize the apache2 httpd
config (auth, proxies, etc), tweak R options, optimize performance by preloading data or
packages, etc.</p>
<h2 id="2-shipping-and-deployment-via-dockerhub">2: Shipping and Deployment via DockerHub</h2>
<p>The most powerful use of Docker is shipping and deploying applications via DockerHub. To create a fully standalone
application container, simply use a standard <a href="https://hub.docker.com/u/opencpu/">opencpu image</a>
and add your app.</p>
<p>For the purpose of this blog post I have wrapped up some of the <a href="https://www.opencpu.org/apps.html">example apps</a> as docker containers by adding a very simple <code class="language-plaintext highlighter-rouge">Dockerfile</code> to each repository. For example the <a href="https://rwebapps.ocpu.io/nabel/www/">nabel</a> app has a <a href="https://github.com/rwebapps/nabel/blob/master/Dockerfile">Dockerfile</a> that contains the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>FROM opencpu/base
RUN R -e 'devtools::install_github("rwebapps/nabel")'
</code></pre></div></div>
<p>It takes the standard <a href="https://hub.docker.com/r/opencpu/base/">opencpu/base</a>
image and then installs the nabel app from the Github <a href="https://github.com/rwebapps">repository</a>.
The result is a completely isolated, standalone application. The application can be
started by anyone using, for example:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run -d -p 8004:8004 rwebapps/nabel
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">-d</code> flag runs the container in the background (detached mode) and <code class="language-plaintext highlighter-rouge">-p 8004:8004</code> publishes port 8004 on the host. Now open the app via: <a href="http://localhost:8004/ocpu/library/nabel">http://localhost:8004/ocpu/library/nabel</a>. Obviously you can tweak the <code class="language-plaintext highlighter-rouge">Dockerfile</code> to install whatever extra software or settings you need
for your application.</p>
<p>Containerized deployment shows the true power of docker: it allows for shipping fully
self-contained applications that work out of the box, without installing any software (other than Docker) or
relying on paid hosting services. If you do prefer professional hosting, there are
many companies that will gladly host docker applications for you on scalable infrastructure.</p>
<h2 id="3-cross-platform-building">3: Cross Platform Building</h2>
<p>There is a third way Docker is used for OpenCPU. At each release we build
the <code class="language-plaintext highlighter-rouge">opencpu-server</code> installation package for half a dozen operating systems, which
get published on <a href="https://archive.opencpu.org">https://archive.opencpu.org</a>.
This process has been fully automated using DockerHub. The following images automatically
build the entire stack from source:</p>
<ul>
<li><a href="https://hub.docker.com/r/opencpu/ubuntu-16.04/">opencpu/ubuntu-16.04</a></li>
<li><a href="https://hub.docker.com/r/opencpu/debian-9/">opencpu/debian-9</a></li>
<li><a href="https://hub.docker.com/r/opencpu/fedora-25/">opencpu/fedora-25</a></li>
<li><a href="https://hub.docker.com/r/opencpu/fedora-26/">opencpu/fedora-26</a></li>
<li><a href="https://hub.docker.com/r/opencpu/centos-6/">opencpu/centos-6</a></li>
<li><a href="https://hub.docker.com/r/opencpu/centos-7/">opencpu/centos-7</a></li>
</ul>
<p>DockerHub automatically rebuilds these images when a new release is published on GitHub.
All that is left to do is run a <a href="https://github.com/opencpu/archive/blob/gh-pages/update.sh">script</a>
which pulls down the images and copies the <code class="language-plaintext highlighter-rouge">opencpu-server</code> binaries to the <a href="https://archive.opencpu.org">archive server</a>.</p>
Announcing OpenCPU 2.0: Building and Deploying Scalable R Apps and Services (2017-07-14) https://www.opencpu.org/posts/opencpu-2-0
<a href="https://www.opencpu.org/posts/opencpu-2-0"><img alt="opencpu logo" src="https://www.opencpu.org/images/stockplot.png"></a>
<p>OpenCPU 2.0 provides the most robust system available today for building and deploying R based apps and services. The server exposes a simple <a href="https://www.opencpu.org/api.html">HTTP API</a> for calling R functions and scripts and for managing data, which provides a very solid basis for integrating R into any environment. The OpenCPU 2.0 cloud server naturally scales up to many concurrent users and is entirely available under the business-friendly Apache2 license, at no extra cost.</p>
<p>The 2.0 branch is the biggest upgrade to the system since the 1.0 release 4 years ago. The server API is backwards compatible so that existing clients and apps will keep working. Internals have been rewritten to make development easier and further enhance the performance and robustness of the server system.</p>
<p>Version 2.0.3 is available from <a href="https://cran.r-project.org/package=opencpu">CRAN</a>, <a href="https://www.opencpu.org/download.html">Launchpad</a>, <a href="https://hub.docker.com/u/opencpu">Dockerhub</a>, <a href="https://software.opensuse.org/download.html?project=home:jeroenooms:opencpu-2.0&package=opencpu">OBS</a> and the <a href="https://archive.opencpu.org/">server archive</a>. Below is a brief overview of improvements in OpenCPU 2.0!</p>
<h2 id="opencpu-apps">OpenCPU Apps</h2>
<p>The 2.0 version makes it even easier to build and deploy R webapps. An app in OpenCPU is simply an R package which may include a web frontend that interacts with R functions from the same package via the OpenCPU API. Because the R package format is used as the container for shipping web applications, OpenCPU apps natively support dependencies, namespaces, embedded data, documentation, etc.</p>
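<p>To make the "an app is just an R package" idea concrete, here is a minimal sketch that writes such a package by hand. It assumes the usual package layout, in which static files under <code class="language-plaintext highlighter-rouge">inst/www</code> are installed to <code class="language-plaintext highlighter-rouge">www</code>, which OpenCPU then serves at <code class="language-plaintext highlighter-rouge">/ocpu/library/helloapp/www/</code>. The package name <code class="language-plaintext highlighter-rouge">helloapp</code>, its <code class="language-plaintext highlighter-rouge">hello()</code> function and the author fields are hypothetical placeholders.</p>

```r
# Hypothetical example app "helloapp": an ordinary R package whose web
# front-end lives in inst/www (installed as www/ and served by OpenCPU).
make_app_skeleton <- function(path, pkg = "helloapp") {
  root <- file.path(path, pkg)
  dir.create(file.path(root, "R"), recursive = TRUE, showWarnings = FALSE)
  dir.create(file.path(root, "inst", "www"), recursive = TRUE,
             showWarnings = FALSE)

  # Minimal package metadata (author fields are placeholders).
  writeLines(c(
    paste("Package:", pkg),
    "Version: 0.1.0",
    "Title: Hello World OpenCPU App",
    "Description: Example app with a web front-end.",
    "Author: Your Name",
    "Maintainer: Your Name <you@example.com>",
    "License: GPL-3"
  ), file.path(root, "DESCRIPTION"))
  writeLines("export(hello)", file.path(root, "NAMESPACE"))

  # An R function the front-end can call through the OpenCPU API.
  writeLines('hello <- function(name = "world") paste0("Hello, ", name, "!")',
             file.path(root, "R", "hello.R"))

  # A static page; a real app would load opencpu.js and call hello().
  writeLines("<html><body><h1>helloapp</h1></body></html>",
             file.path(root, "inst", "www", "index.html"))
  invisible(root)
}

root <- make_app_skeleton(tempdir())
print(list.files(root, recursive = TRUE))
```

<p>After installing it with <code class="language-plaintext highlighter-rouge">install.packages(root, repos = NULL, type = "source")</code>, it should be possible to open it in the single-user server with <code class="language-plaintext highlighter-rouge">opencpu::ocpu_start_app("helloapp")</code>.</p>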
<p>Apps can be run or deployed in many ways.</p>
<ul>
<li>Run or develop locally using the single user server in R using <code class="language-plaintext highlighter-rouge">opencpu::ocpu_start_app()</code></li>
<li>Deploy for free on <code class="language-plaintext highlighter-rouge"><yourname>.ocpu.io</code> or <code class="language-plaintext highlighter-rouge">cloud.opencpu.org</code> using the <a href="https://www.opencpu.org/cloud.html">CI webhook</a></li>
<li>Host your own opencpu-server, either internally or on the internet</li>
<li>Ship and deploy apps in docker containers</li>
</ul>
<p>Several example apps are available from the <a href="https://github.com/rwebapps">rwebapps</a> GitHub organization. You can try each app on the public <a href="https://www.opencpu.org/apps.html">cloud server</a> or run it locally in R using the single-user server.</p>
<h2 id="single-user-server">Single User server</h2>
<p>The OpenCPU single-user server allows for running OpenCPU inside an interactive R session on any platform. To install the latest version in R:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">install.packages</span><span class="p">(</span><span class="s2">"opencpu"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>Version 2.0 has made it much easier to run and develop OpenCPU apps using the single user server. For example to run the <a href="https://github.com/rwebapps/stockapp">rwebapps/stockapp</a> app:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">opencpu</span><span class="o">::</span><span class="n">ocpu_start_app</span><span class="p">(</span><span class="s2">"rwebapps/stockapp"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><a href="https://rwebapps.ocpu.io/stockapp/www/"><img alt="stockplot" src="../../images/stockplot.png" class="img-responsive" /></a></p>
<p>Or try the very cool <a href="https://github.com/rwebapps/markdownapp">rwebapps/markdownapp</a>:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">opencpu</span><span class="o">::</span><span class="n">ocpu_start_app</span><span class="p">(</span><span class="s2">"rwebapps/markdownapp"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><a href="https://rwebapps.ocpu.io/markdownapp/www/"><img alt="markdownapp" src="../../images/markdownapp.png" class="img-responsive" /></a></p>
<p>Also try any of the other <a href="https://github.com/rwebapps/">rwebapps</a>. Each of these apps can also be used on <code class="language-plaintext highlighter-rouge">https://rwebapps.ocpu.io/<app></code>.</p>
<h2 id="cloud-server-and-ocpuio">Cloud Server and OCPU.IO</h2>
<p>The new version makes it super easy to publish your apps and packages on the public cloud server via the Github CI. All you need to do is set the <a href="https://www.opencpu.org/cloud.html">OpenCPU webhook</a> in your Github repository or Github organization.</p>
<p>Upon your next git push, your package will immediately become available on a personal subdomain <code class="language-plaintext highlighter-rouge">https://<yourname>.ocpu.io/<pkg></code> named after your GitHub username or organization.</p>
<p><a href="https://www.opencpu.org/cloud.html"><img alt="webhook" src="../../images/githook.png" class="img-responsive" /></a></p>
<p>Note again that in OpenCPU <strong>an app is just an R package</strong>. You can start deploying any R package on ocpu.io to call it remotely or just for fun, even if the package does not contain any special web front-end.</p>
<h3 id="dependency-remotes">Dependency Remotes</h3>
<p>Your app or package might depend on other CRAN packages as specified in the package <code class="language-plaintext highlighter-rouge">DESCRIPTION</code> file according to the standard R mechanics. However sometimes your package depends on an R package which is not on CRAN, for example from Github.</p>
<p>To deploy packages on OpenCPU which have non-CRAN dependencies, specify them in the <code class="language-plaintext highlighter-rouge">Remotes</code> field of the <code class="language-plaintext highlighter-rouge">DESCRIPTION</code> file according to the <a href="https://cran.r-project.org/web/packages/devtools/vignettes/dependencies.html">devtools vignette</a>. Internally the OpenCPU webhook simply uses <code class="language-plaintext highlighter-rouge">devtools::install_github()</code> to install your package, so it supports everything that <code class="language-plaintext highlighter-rouge">install_github</code> does.</p>
<p>You can even pass custom arguments to <code class="language-plaintext highlighter-rouge">install_github</code> by adding them to the webhook URL as http parameters.</p>
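<p>For example, a <code class="language-plaintext highlighter-rouge">DESCRIPTION</code> for a package that depends on a GitHub-only package could look like the one generated below. The package names <code class="language-plaintext highlighter-rouge">myapp</code> and <code class="language-plaintext highlighter-rouge">somepkg</code> and the GitHub path <code class="language-plaintext highlighter-rouge">someuser/somepkg</code> are placeholders: the dependency is listed under <code class="language-plaintext highlighter-rouge">Imports</code> as usual, and <code class="language-plaintext highlighter-rouge">Remotes</code> tells <code class="language-plaintext highlighter-rouge">install_github()</code> where to find it.</p>

```r
# Placeholder package metadata with a GitHub-only dependency.
desc <- data.frame(
  Package     = "myapp",
  Version     = "0.1.0",
  Title       = "Example App",
  Description = "Example app with a non-CRAN dependency.",
  License     = "GPL-3",
  Imports     = "somepkg",
  Remotes     = "someuser/somepkg",  # GitHub "user/repo" of the dependency
  stringsAsFactors = FALSE
)

# Write it in the DCF format used by DESCRIPTION files and show the result.
path <- file.path(tempdir(), "DESCRIPTION")
write.dcf(desc, file = path)
cat(readLines(path), sep = "\n")
```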
<h2 id="improved-data-interchange">Improved Data Interchange</h2>
<p>The most difficult part of building R apps and services is data interchange: getting complex structures efficiently and reliably in and out of R. A lot of energy in OpenCPU 2.0 has gone into further optimizing this critical part of the system.</p>
<p>The three <a href="https://www.opencpu.org/api.html#api-arguments">major data formats</a> in OpenCPU are now fully implemented by myself in highly optimized C/C++ packages:</p>
<ul>
<li><strong>json</strong>: opencpu uses <code class="language-plaintext highlighter-rouge">jsonlite::fromJSON()</code> for reading and <code class="language-plaintext highlighter-rouge">jsonlite::toJSON()</code> for writing json.</li>
<li><strong>protobuf</strong>: opencpu uses <code class="language-plaintext highlighter-rouge">protolite::serialize_pb()</code> and <code class="language-plaintext highlighter-rouge">protolite::unserialize_pb()</code> to convert between objects and protocol buffers.</li>
<li><strong>multipart/form-data</strong>: (POST only) opencpu uses <code class="language-plaintext highlighter-rouge">webutils::parse_multipart()</code> for parsing multipart.</li>
</ul>
<p>Obviously these packages are not limited to OpenCPU; they may be used by other systems as well.</p>
<h3 id="dataframes">DataFrames</h3>
<p>A special role in R is reserved for data frames, the common data structure for storing tabular data sets. OpenCPU adds extra output types for retrieving data frames, for example in NDJSON, SPSS, SAS or Stata format.</p>
<p>For example, the following URLs retrieve the “diamonds” dataset from the “ggplot2” package in various formats:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://cran.ocpu.io/ggplot2/data/diamonds/csv
https://cran.ocpu.io/ggplot2/data/diamonds/json
https://cran.ocpu.io/ggplot2/data/diamonds/ndjson
https://cran.ocpu.io/ggplot2/data/diamonds/pb
https://cran.ocpu.io/ggplot2/data/diamonds/feather
https://cran.ocpu.io/ggplot2/data/diamonds/rda
https://cran.ocpu.io/ggplot2/data/diamonds/rds
https://cran.ocpu.io/ggplot2/data/diamonds/spss
https://cran.ocpu.io/ggplot2/data/diamonds/sas
https://cran.ocpu.io/ggplot2/data/diamonds/stata
</code></pre></div></div>
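<p>Because the output format is simply the last path segment, requesting a dataset in several formats is just string construction. A minimal sketch with a hypothetical helper <code class="language-plaintext highlighter-rouge">ocpu_data_url()</code>; the commented-out line assumes network access to <code class="language-plaintext highlighter-rouge">cran.ocpu.io</code>.</p>

```r
# Hypothetical helper: URL of a package dataset in a given output format.
# Vectorized over `format`, so it returns one URL per requested format.
ocpu_data_url <- function(pkg, dataset, format,
                          server = "https://cran.ocpu.io") {
  paste(sub("/+$", "", server), pkg, "data", dataset, format, sep = "/")
}

formats <- c("csv", "json", "ndjson", "rds")
urls <- ocpu_data_url("ggplot2", "diamonds", formats)
print(urls)

# With network access, the CSV variant can be read directly:
# diamonds <- read.csv(urls[1])
```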
<p>This also shows an additional use case for OpenCPU: publishing datasets in a format-agnostic way using the “lazydata” feature of the R package format.</p>
<p>It is completely valid to create an R package which contains only a dataset (no functions) and deploy it on OCPU.IO to make it available in a dozen formats at once!</p>
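<p>A minimal sketch of such a data-only package, written from R: the dataset is saved as an <code class="language-plaintext highlighter-rouge">.rda</code> file in <code class="language-plaintext highlighter-rouge">data/</code>, and <code class="language-plaintext highlighter-rouge">LazyData: true</code> in the <code class="language-plaintext highlighter-rouge">DESCRIPTION</code> enables lazy loading. The package name <code class="language-plaintext highlighter-rouge">mydata</code>, the dataset <code class="language-plaintext highlighter-rouge">scores</code> and the author fields are hypothetical.</p>

```r
# Hypothetical data-only package "mydata" containing one dataset, "scores".
root <- file.path(tempdir(), "mydata")
dir.create(file.path(root, "data"), recursive = TRUE, showWarnings = FALSE)

writeLines(c(
  "Package: mydata",
  "Version: 0.1.0",
  "Title: Example Data-Only Package",
  "Description: Ships a single data frame and no functions.",
  "Author: Your Name",
  "Maintainer: Your Name <you@example.com>",
  "License: GPL-3",
  "LazyData: true"
), file.path(root, "DESCRIPTION"))
writeLines("", file.path(root, "NAMESPACE"))  # nothing to export

# The dataset itself: saved under the name it will have in the package.
scores <- data.frame(id = 1:3, score = c(7.5, 8.1, 6.9))
save(scores, file = file.path(root, "data", "scores.rda"))
```

<p>Once deployed, the dataset should be reachable with the same URL pattern as the diamonds example, with <code class="language-plaintext highlighter-rouge">mydata/data/scores/csv</code> (or any other format) as the path on your subdomain.</p>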
<h2 id="server-binaries">Server Binaries</h2>
<p>OpenCPU 2.0 has further improved <code class="language-plaintext highlighter-rouge">opencpu-server</code>, the highly configurable multi-user server implementation, to run on various distributions as well as docker. This makes installing (and uninstalling) an opencpu production server easy for users or system administrators.</p>
<p>The recommended platform is still Ubuntu 16.04 (Xenial) because it supports AppArmor. This is also the platform we use to host <a href="https://cloud.opencpu.org">cloud.opencpu.org</a> and <a href="https://cloud.opencpu.org">ocpu.io</a>. Installation is easy:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Requires Ubuntu 16.04 (Xenial)</span>
<span class="nb">sudo </span>add-apt-repository <span class="nt">-y</span> ppa:opencpu/opencpu-2.0
<span class="nb">sudo </span>apt-get update
<span class="nb">sudo </span>apt-get upgrade
<span class="c"># Installs OpenCPU server</span>
<span class="nb">sudo </span>apt-get <span class="nb">install</span> <span class="nt">-y</span> opencpu-server
<span class="c"># Optional: installs rstudio in http://yourhost/rstudio</span>
<span class="nb">sudo </span>apt-get <span class="nb">install</span> <span class="nt">-y</span> rstudio-server
</code></pre></div></div>
<p>New in version 2.0 is that we provide binary installation packages for Debian 9, Fedora 25, CentOS 6 and 7. These binaries are built on <a href="https://hub.docker.com/r/opencpu/rstudio/">dockerhub:opencpu</a> and can also be downloaded from <a href="https://archive.opencpu.org/">https://archive.opencpu.org</a>.</p>
<h2 id="docker">Docker</h2>
<p>We now provide several docker images for running opencpu-server for both development and deployment. The <a href="https://hub.docker.com/r/opencpu/rstudio/">opencpu/rstudio</a> docker image runs both opencpu-server and rstudio-server, which is nice for development. To start the docker container on port 80 with name “mybox” you would run:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run <span class="nt">--name</span> mybox <span class="nt">-t</span> <span class="nt">-p</span> 80:80 opencpu/rstudio
</code></pre></div></div>
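<p>If port 80 is taken on your machine, you can use port 8004 instead (<code class="language-plaintext highlighter-rouge">-p 8004:8004</code>) and add <code class="language-plaintext highlighter-rouge">:8004</code> to the URLs below. Once the container is running you can navigate to <a href="http://localhost/ocpu">http://localhost/ocpu</a> and <a href="http://localhost/rstudio">http://localhost/rstudio</a> in your browser to get started. You can log in to RStudio with username/password: opencpu/opencpu.</p>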
<p>To get a root shell on the server (for example to install system libraries needed by certain R packages) simply run:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Replace 'mybox' with the --name above</span>
docker <span class="nb">exec</span> <span class="nt">-i</span> <span class="nt">-t</span> mybox /bin/bash
</code></pre></div></div>
<p>From the shell you can easily install R packages or <code class="language-plaintext highlighter-rouge">apt-get install</code> system libraries or modify the server configuration in <code class="language-plaintext highlighter-rouge">/etc/opencpu</code>.</p>
<h2 id="roadmap">Roadmap</h2>
<p>OpenCPU 2.0 server is a major step forward towards a robust system for building and deploying R based apps and services. We will keep improving the server implementations based on our experiences and feedback from users and developers.</p>
<p>Next up is updating the documentation to explain some of the powerful new features that were introduced in the 2.0 branch. We will also be updating the <a href="https://github.com/opencpu/opencpu.js">opencpu.js</a> JavaScript client and building some cool new R webapps, which is what OpenCPU was built for in the first place!</p>
New in jsonlite 0.9.22: distinguish between double and integer (2016-06-15) https://www.opencpu.org/posts/jsonlite-0-9-22
<a href="https://www.opencpu.org/posts/jsonlite-0-9-22"><img alt="opencpu logo" src="https://www.opencpu.org/images/mariokart.jpg"></a>
<p>Today a new version of the <a href="https://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html">jsonlite</a> package was released to CRAN. This update includes a few internal enhancements and one new feature.</p>
<h2 id="doubles-vs-integers">Doubles vs integers</h2>
<p>The new <code class="language-plaintext highlighter-rouge">always_decimal</code> parameter forces doubles to be formatted in decimal notation, that is, with at least one digit to the right of the decimal point. This allows us to distinguish them from integers, if you need this.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="m">5</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">as.numeric</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="n">json_x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">jsonlite</span><span class="o">::</span><span class="n">toJSON</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">always_decimal</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">))</span><span class="w">
</span><span class="c1"># [1,2,3,4,5] </span><span class="w">
</span><span class="p">(</span><span class="n">json_y</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">jsonlite</span><span class="o">::</span><span class="n">toJSON</span><span class="p">(</span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">always_decimal</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">))</span><span class="w">
</span><span class="c1"># [1.0,2.0,3.0,4.0,5.0] </span><span class="w">
</span></code></pre></div></div>
<p>Formatted this way, doubles naturally get parsed back into doubles, so we can roundtrip numbers between R and JSON without losing their type:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">identical</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">jsonlite</span><span class="o">::</span><span class="n">fromJSON</span><span class="p">(</span><span class="n">json_x</span><span class="p">))</span><span class="w">
</span><span class="c1"># TRUE</span><span class="w">
</span><span class="n">identical</span><span class="p">(</span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">jsonlite</span><span class="o">::</span><span class="n">fromJSON</span><span class="p">(</span><span class="n">json_y</span><span class="p">))</span><span class="w">
</span><span class="c1"># TRUE</span><span class="w">
</span></code></pre></div></div>
<p>You should only use this if you really need it. The JSON format itself does not specify number types, hence there is no guarantee that an arbitrary JSON parser will distinguish between integers and doubles. Indeed, many JSON parsers simply parse every number into a double, which is perfectly valid as well.</p>
<p>Also setting <code class="language-plaintext highlighter-rouge">always_decimal = TRUE</code> introduces some performance overhead.</p>
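<p>The reason the trailing <code class="language-plaintext highlighter-rouge">.0</code> helps is that a type-aware reader can then decide the type from the token alone. Below is a minimal sketch of such a rule in base R; the function names are hypothetical, this is not how jsonlite is implemented, and it only handles flat arrays of numbers.</p>

```r
# Hypothetical type-aware rule: tokens containing a decimal point or an
# exponent become doubles; plain integer tokens become integers when they
# fit in R's 32-bit integer range.
parse_json_number <- function(token) {
  value <- as.numeric(token)
  if (grepl("[.eE]", token) || abs(value) > .Machine$integer.max) {
    value
  } else {
    as.integer(value)
  }
}

# Parse a flat JSON array of numbers such as "[1.0,2.0]".
parse_json_numbers <- function(json) {
  stripped <- gsub("[[:space:]]|\\[|\\]", "", json)
  tokens <- strsplit(stripped, ",", fixed = TRUE)[[1]]
  unlist(lapply(tokens, parse_json_number))
}

str(parse_json_numbers("[1,2,3,4,5]"))            # int [1:5] 1 2 3 4 5
str(parse_json_numbers("[1.0,2.0,3.0,4.0,5.0]"))  # num [1:5] 1 2 3 4 5
```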
<h2 id="numbers-in-mongodb-and-mongolite">Numbers in MongoDB and Mongolite</h2>
<p>The main motivation for this feature was to insert data from R into MongoDB using the <a href="https://cran.r-project.org/web/packages/mongolite/vignettes/intro.html">mongolite</a> package. Several users of mongolite had <a href="https://github.com/jeroenooms/mongolite/issues/38">requested</a> that number types be retained, especially when reading the data from MongoDB back into a strongly typed language such as C++.</p>
<p>The latest version of <code class="language-plaintext highlighter-rouge">mongolite</code> automatically takes advantage of this feature:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Get latest mongolite</span><span class="w">
</span><span class="n">devtools</span><span class="o">::</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"jeroenooms/mongolite"</span><span class="p">)</span><span class="w">
</span><span class="c1"># Assuming you have a local `mongod` running</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">mongolite</span><span class="p">)</span><span class="w">
</span><span class="n">df</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">as.numeric</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">5</span><span class="p">))</span><span class="w">
</span><span class="n">m</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">mongo</span><span class="p">(</span><span class="s2">"testnum"</span><span class="p">)</span><span class="w">
</span><span class="n">m</span><span class="o">$</span><span class="n">insert</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="w">
</span><span class="n">out</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">m</span><span class="o">$</span><span class="n">find</span><span class="p">()</span><span class="w">
</span><span class="n">identical</span><span class="p">(</span><span class="n">out</span><span class="p">,</span><span class="w"> </span><span class="n">df</span><span class="p">)</span><span class="w">
</span><span class="c1"># TRUE</span><span class="w">
</span></code></pre></div></div>
<p>This makes it even more seamless to use MongoDB as a backend for storing data frames in R!</p>
OpenCPU release 1.6 (2016-05-20) https://www.opencpu.org/posts/opencpu-1-6
<a href="https://www.opencpu.org/posts/opencpu-1-6"><img alt="opencpu logo" src="https://www.opencpu.org/images/stockplot.png"></a>
<p>Following a few weeks of testing, OpenCPU 1.6 has been released. OpenCPU is a production-ready system for embedded statistical computing with R. It provides a neat <a href="https://www.opencpu.org/api.html">API</a> for remotely calling R functions over HTTP via e.g. JSON or <a href="https://gist.github.com/jeroenooms/1984c784a6eff71f508f">Protocol Buffers</a>. The OpenCPU server implementation is stable and has been thoroughly tested. It runs on all major Linux distributions and plays nicely with the RStudio server IDE (<a href="https://youtu.be/kAfVWxiZ-Cc?t=847">demo</a>).</p>
<p>Similarly to shiny, OpenCPU can run as a single-user development server within the interactive R session, and as a multi-user (cloud) server for deployments on Linux. Unlinke shiny however, the cloud server comes at no extra cost. On the contrary: you are encouraged to take advantage of the cloud server which is much faster and includes cool features like user libraries, concurrent sessions, continuous integration, customizable security policies, etc.</p>
<h3 id="improvements-protolite-and-feather">Improvements: protolite and feather</h3>
<p>The OpenCPU API has not changed since the 1.4 and 1.5 branches. The version bump indicates that this version targets R 3.3 and supports the new Ubuntu 16.04. Furthermore, the underlying stack of bundled R packages has been upgraded. Navigate to <a href="https://cloud.opencpu.org/ocpu/info"><code class="language-plaintext highlighter-rouge">/ocpu/info</code></a> on your OpenCPU server to inspect the exact versions of all packages used by the system.</p>
<p>This version introduces two major improvements for binary data interchange. First, the RProtoBuf dependency has been replaced by the much smaller <a href="https://cran.r-project.org/web/packages/protolite/index.html">protolite</a> package, which has an optimized implementation of protobuf object serialization. OpenCPU already had an API for exporting data to Protocol Buffers; it’s just much faster now.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">httr</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">protolite</span><span class="p">)</span><span class="w">
</span><span class="n">req</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">GET</span><span class="p">(</span><span class="s2">"https://demo.ocpu.io/ggplot2/data/diamonds/pb"</span><span class="p">)</span><span class="w">
</span><span class="n">mydiamonds</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">unserialize_pb</span><span class="p">(</span><span class="n">content</span><span class="p">(</span><span class="n">req</span><span class="p">))</span><span class="w">
</span></code></pre></div></div>
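<p>The same message format can be tried locally, without a server. The sketch below assumes only the protolite package: it serializes a data frame into a protobuf message (a raw vector) with <code class="language-plaintext highlighter-rouge">serialize_pb</code> and parses it back with <code class="language-plaintext highlighter-rouge">unserialize_pb</code>, the same function the client above uses to decode the server response.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(protolite)

# Serialize a data frame to a protobuf message (a raw vector)
buf <- serialize_pb(iris)

# Parse the message back into an R object
out <- unserialize_pb(buf)

# The round trip should reproduce the original data frame
identical(iris, out)
</code></pre></div></div>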
<p>New in this version is the <code class="language-plaintext highlighter-rouge">feather</code> output format which can be parsed/generated with the new <a href="https://cran.r-project.org/web/packages/feather/index.html">feather</a> package.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">curl</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">feather</span><span class="p">)</span><span class="w">
</span><span class="n">curl_download</span><span class="p">(</span><span class="s2">"https://demo.ocpu.io/ggplot2/data/diamonds/feather"</span><span class="p">,</span><span class="w"> </span><span class="s2">"diamonds.feather"</span><span class="p">)</span><span class="w">
</span><span class="n">mydiamonds</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">read_feather</span><span class="p">(</span><span class="s2">"diamonds.feather"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
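<p>A feather file can also be written and read back locally. This is a minimal sketch that assumes only the feather package and a temporary file; note that <code class="language-plaintext highlighter-rouge">read_feather</code> returns a tibble rather than a plain data frame, so the comparison below is done column by column.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(feather)

# A small data frame with integer, double and character columns
df <- data.frame(x = 1:5, y = c(0.5, 1.5, 2.5, 3.5, 4.5),
                 z = letters[1:5], stringsAsFactors = FALSE)

# Write it to a temporary feather file and read it back
path <- tempfile(fileext = ".feather")
write_feather(df, path)
out <- read_feather(path)

# Compare column by column, since the result is a tibble
identical(out$x, df$x)
identical(out$z, df$z)
</code></pre></div></div>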
<p>Both <code class="language-plaintext highlighter-rouge">pb</code> and <code class="language-plaintext highlighter-rouge">feather</code> are binary alternatives to the text-based <code class="language-plaintext highlighter-rouge">json</code> format:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">curl</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">jsonlite</span><span class="p">)</span><span class="w">
</span><span class="n">con</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">curl</span><span class="p">(</span><span class="s2">"https://demo.ocpu.io/ggplot2/data/diamonds/json"</span><span class="p">)</span><span class="w">
</span><span class="n">mydiamonds</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fromJSON</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<h3 id="installation-and-upgrading">Installation and upgrading</h3>
<p>The <a href="https://www.opencpu.org/download.html">download</a> page has instructions for installing the OpenCPU server on various distributions, either from source or using precompiled binaries. To upgrade an existing installation of OpenCPU on Ubuntu, simply run:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">sudo </span>add-apt-repository ppa:opencpu/opencpu-1.6
<span class="nb">sudo </span>apt-get update
<span class="nb">sudo </span>apt-get dist-upgrade</code></pre></figure>
<p>Note that this will also upgrade R to version 3.3.0 (if you have not already done so), which might require reinstalling some of your R packages.</p>
<p>You can also install opencpu-server on any version of Debian/Ubuntu/Fedora/CentOS/RHEL by building the deb/rpm installation package from source. This is really easy: see the readme for <a href="https://github.com/jeroenooms/opencpu-server/tree/master/debian#readme">deb</a> or <a href="https://github.com/jeroenooms/opencpu-server/tree/master/rpm#readme">rpm</a>.</p>
<h3 id="getting-started">Getting started</h3>
<p>For those completely new to OpenCPU, there are several resources to get started. The <a href="https://youtu.be/kAfVWxiZ-Cc">presentation</a> from last year’s useR conference gives a broad overview of the system, including some basic demos. The <a href="https://www.opencpu.org/apps.html">example apps</a> and <a href="http://jsfiddle.net/user/opencpu/fiddles/">jsfiddle scripts</a> show how to use the <a href="https://www.opencpu.org/jslib.html">opencpu.js</a> JavaScript client. The <a href="https://opencpu.github.io/server-manual/opencpu-server.pdf">server manual</a> contains documentation on configuring your OpenCPU cloud server (although installation should work out of the box).</p>
<p>Finally, this <a href="http://arxiv.org/abs/1406.4806">paper</a> from my thesis describes more generally the challenges of embedded scientific computing, and the benefits (both technical and human) of decoupling your statistical computing from your front-end or application layer.</p>
<h3 id="the-public-demo-server">The public demo server</h3>
<p>To deploy your OpenCPU apps on the public server, simply push your R package to GitHub and configure the <a href="https://www.opencpu.org/api.html#api-ci">webhook</a> in your repository. Whenever you push an update to GitHub, the package is reinstalled on the server and can immediately be used remotely by anyone on the internet. You can use either the full URL or the <code class="language-plaintext highlighter-rouge">ocpu.io</code> shorthand URL:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">https://cloud.opencpu.org/ocpu/github/{username}/{package}/</code></li>
<li><code class="language-plaintext highlighter-rouge">https://{username}.ocpu.io/{package}/</code></li>
</ul>
<p>These URLs are fully equivalent. Simply replace <code class="language-plaintext highlighter-rouge">{username}</code> with your GitHub username and <code class="language-plaintext highlighter-rouge">{package}</code> with your package name. Note that the package name must be identical to the GitHub repository name (as is usually the case).</p>
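<p>Because the two URL forms are interchangeable, both can be generated from the same two values. The helper below is hypothetical (it is not part of any OpenCPU package) and only illustrates the substitution; <code class="language-plaintext highlighter-rouge">someuser</code> and <code class="language-plaintext highlighter-rouge">mypkg</code> are placeholder names.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: build both equivalent public-server URLs for a package
ocpu_urls <- function(username, package) {
  stopifnot(is.character(username), length(username) == 1,
            is.character(package), length(package) == 1)
  c(full  = sprintf("https://cloud.opencpu.org/ocpu/github/%s/%s/", username, package),
    short = sprintf("https://%s.ocpu.io/%s/", username, package))
}

urls <- ocpu_urls("someuser", "mypkg")
urls[["full"]]
urls[["short"]]
</code></pre></div></div>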
<h3 id="on-writing-packages">On writing packages</h3>
<p>One prerequisite for using OpenCPU is knowing how to create an R package. There is no way around this: packages are the natural container format for shipping and deploying code, data and manuals in R, and the OpenCPU API assumes this format. Luckily, writing R packages is super easy these days and can be done in less than <a href="https://youtu.be/kAfVWxiZ-Cc?t=847">10 seconds</a> using, for example, RStudio.</p>
<p>The good thing is that once you have passed this little hurdle, the full power and flexibility of R and its packaging system become available to your applications and APIs. Hadley’s latest <a href="http://r-pkgs.had.co.nz/">book</a> on writing R packages gives a nice overview of package development, and the OpenCPU API provides an easy HTTP interface to all of these features.</p>
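<p>Besides RStudio, base R can generate a package skeleton with <code class="language-plaintext highlighter-rouge">utils::package.skeleton()</code>. The sketch below creates a minimal package in a temporary directory from a single function; the names <code class="language-plaintext highlighter-rouge">hello</code> and <code class="language-plaintext highlighter-rouge">hellopkg</code> are placeholders, and the generated help files contain placeholder text that should be edited before publishing the package.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A function to ship in the package, kept in its own environment
env <- new.env()
env$hello <- function(name = "world") paste0("Hello, ", name, "!")

# Generate DESCRIPTION, NAMESPACE, R/ and man/ in a temporary directory
path <- tempfile("pkgs")
dir.create(path)
utils::package.skeleton(name = "hellopkg", list = "hello",
                        environment = env, path = path)

# Inspect the generated files
list.files(file.path(path, "hellopkg"), recursive = TRUE)
</code></pre></div></div>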
Faster arrays and matrices in jsonlite 0.9.20 · 2016-05-11T00:00:00+00:00 · https://www.opencpu.org/posts/jsonlite-0-9-20
<a href="https://www.opencpu.org/posts/jsonlite-0-9-20"><img alt="opencpu logo" src="https://www.opencpu.org/images/mariokart.jpg"></a>
<p>Yesterday a new version of the <a href="https://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html">jsonlite</a> package was released to CRAN. This update includes no new features; it only introduces performance optimizations.</p>
<h2 id="large-matrices">Large Matrices</h2>
<p>The jsonlite package was already highly optimized for converting vectors and data frames to JSON. However, Gregory Jefferis and Duncan Murdoch found that conversion of tall matrices, as used by <a href="https://cran.r-project.org/web/packages/rglwidget/index.html">rglwidget</a>, was slower than expected.</p>
<p>It turned out this was indeed an edge case that I had overlooked. The new version of jsonlite fixes this problem and matrix conversion should be about 200 times faster than before. Technical details follow below; first a benchmark:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Old version!</span><span class="w">
</span><span class="o">></span><span class="w"> </span><span class="n">system.time</span><span class="p">(</span><span class="n">j</span><span class="o"><-</span><span class="n">toJSON</span><span class="p">(</span><span class="n">matrix</span><span class="p">(</span><span class="m">1L</span><span class="p">,</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="n">nrow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">50000</span><span class="p">)))</span><span class="w">
</span><span class="n">user</span><span class="w"> </span><span class="n">system</span><span class="w"> </span><span class="n">elapsed</span><span class="w">
</span><span class="m">4.715</span><span class="w"> </span><span class="m">0.015</span><span class="w"> </span><span class="m">4.729</span><span class="w">
</span><span class="c1"># New version!</span><span class="w">
</span><span class="o">></span><span class="w"> </span><span class="n">system.time</span><span class="p">(</span><span class="n">j</span><span class="o"><-</span><span class="n">toJSON</span><span class="p">(</span><span class="n">matrix</span><span class="p">(</span><span class="m">1L</span><span class="p">,</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="n">nrow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">50000</span><span class="p">)))</span><span class="w">
</span><span class="n">user</span><span class="w"> </span><span class="n">system</span><span class="w"> </span><span class="n">elapsed</span><span class="w">
</span><span class="m">0.022</span><span class="w"> </span><span class="m">0.002</span><span class="w"> </span><span class="m">0.023</span><span class="w">
</span></code></pre></div></div>
<p>This artificial example (every element is the number 1) highlights the improvement. The relative improvement might be smaller for matrices with actual data, because additional time is spent formatting double/integer values (which was already optimized in jsonlite a <a href="https://www.opencpu.org/posts/jsonlite-release-0-9-13/">while ago</a>).</p>
<h2 id="technical-details">Technical Details</h2>
<p>So what was the problem? The previous version of jsonlite had an elegant solution that would recurse through the dimensions of a matrix/array and apply JSON conversion to each of its elements. For example, for a matrix (2D array) it would convert each row to JSON and then combine the results. However, it turns out that the <code class="language-plaintext highlighter-rouge">apply</code> call below, which calls <code class="language-plaintext highlighter-rouge">asJSON</code> once for every row, is really slow.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Technical example, don't use this code !</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">matrix</span><span class="p">(</span><span class="m">1L</span><span class="p">,</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="n">nrow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">50000</span><span class="p">)</span><span class="w">
</span><span class="n">rows</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">apply</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">jsonlite</span><span class="o">:::</span><span class="n">asJSON</span><span class="p">)</span><span class="w">
</span><span class="n">json</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">jsonlite</span><span class="o">:::</span><span class="n">collapse</span><span class="p">(</span><span class="n">rows</span><span class="p">,</span><span class="w"> </span><span class="n">indent</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NA</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>The new version exploits the fact that matrices and arrays are homogeneous (i.e. all elements have the same type). It first removes the dimensions from the array using <code class="language-plaintext highlighter-rouge">c(x)</code> and converts all of the individual elements to JSON with a single call to <code class="language-plaintext highlighter-rouge">asJSON</code>. This results in a significant speedup because <code class="language-plaintext highlighter-rouge">asJSON</code> is only called once rather than <code class="language-plaintext highlighter-rouge">n</code> times (once per row); the remaining <code class="language-plaintext highlighter-rouge">apply</code> only collapses strings that have already been formatted, which is much cheaper.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Technical example, don't use this code !</span><span class="w">
</span><span class="n">str</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">jsonlite</span><span class="o">:::</span><span class="n">asJSON</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="n">x</span><span class="p">),</span><span class="w"> </span><span class="n">collapse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="nf">dim</span><span class="p">(</span><span class="n">str</span><span class="p">)</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">dim</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w">
</span><span class="n">rows</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">apply</span><span class="p">(</span><span class="n">str</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">jsonlite</span><span class="o">:::</span><span class="n">collapse</span><span class="p">,</span><span class="w"> </span><span class="n">indent</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NA</span><span class="p">)</span><span class="w">
</span><span class="n">json</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">jsonlite</span><span class="o">:::</span><span class="n">collapse</span><span class="p">(</span><span class="n">rows</span><span class="p">,</span><span class="w"> </span><span class="n">indent</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NA</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>Things get a bit more complicated for higher dimensional arrays, especially with <code class="language-plaintext highlighter-rouge">toJSON(x, pretty = TRUE)</code>, but this illustrates the core issue.</p>
<p>You might be thinking: can we avoid <code class="language-plaintext highlighter-rouge">apply</code> altogether? Yes! For the important case of 2-dimensional arrays, jsonlite has a complete C implementation, which makes <code class="language-plaintext highlighter-rouge">toJSON</code> on matrices extra fast. For higher dimensional arrays it currently still uses the solution above, which performs quite well. We might be able to further optimize this case by porting it to C as well, but working with high dimensional arrays in C makes my head hurt.</p>
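<p>The convert-once-then-reshape idea can be mimicked in plain R. The sketch below is not jsonlite’s code: it uses <code class="language-plaintext highlighter-rouge">as.character()</code> in place of jsonlite’s internal number formatting and, as a simplifying assumption, only handles integer matrices without missing values. It shows that one vectorized conversion followed by cheap string pasting gives the same row-major nesting as <code class="language-plaintext highlighter-rouge">toJSON</code>.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(jsonlite)

# Sketch: convert all elements to text once, restore the dimensions,
# then paste each row into a JSON array (integer matrices without NA only)
matrix_to_json <- function(x) {
  stopifnot(is.matrix(x), is.integer(x), !anyNA(x))
  txt <- as.character(c(x))   # one vectorized conversion for all elements
  dim(txt) <- dim(x)          # put the strings back into matrix shape
  rows <- apply(txt, 1, function(r) paste0("[", paste(r, collapse = ","), "]"))
  paste0("[", paste(rows, collapse = ","), "]")
}

x <- matrix(1:6, nrow = 3)
matrix_to_json(x)
# "[[1,4],[2,5],[3,6]]"

# jsonlite produces the same row-major nesting
as.character(toJSON(x))
</code></pre></div></div>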
Stemming and Spell Checking in R · 2016-03-21T00:00:00+00:00 · https://www.opencpu.org/posts/hunspell-1-2
<a href="https://www.opencpu.org/posts/hunspell-1-2"><img alt="opencpu logo" src="https://www.opencpu.org/images/ijsco.jpg"></a>
<p>Last week we <a href="https://www.opencpu.org/posts/hunspell-release/">introduced</a> the new hunspell R package. This week a new version was released which adds support for additional languages and text analysis features.</p>
<h3 id="additional-languages">Additional languages</h3>
<p>By default, hunspell uses the US English dictionary <code class="language-plaintext highlighter-rouge">en_US</code>, but the new version allows for checking and analyzing text in other languages as well. The <code class="language-plaintext highlighter-rouge">?hunspell</code> help page has detailed instructions on how to install additional dictionaries.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span><span class="w"> </span><span class="n">library</span><span class="p">(</span><span class="n">hunspell</span><span class="p">)</span><span class="w">
</span><span class="o">></span><span class="w"> </span><span class="n">hunspell_info</span><span class="p">(</span><span class="s2">"ru_RU"</span><span class="p">)</span><span class="w">
</span><span class="o">$</span><span class="n">dict</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="s2">"/Users/jeroen/workspace/hunspell/tests/testdict/ru_RU.dic"</span><span class="w">
</span><span class="o">$</span><span class="n">encoding</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="s2">"UTF-8"</span><span class="w">
</span><span class="o">$</span><span class="n">wordchars</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="kc">NA</span><span class="w">
</span></code></pre></div></div>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span><span class="w"> </span><span class="n">hunspell</span><span class="p">(</span><span class="s2">"чёртова карова"</span><span class="p">,</span><span class="w"> </span><span class="n">dict</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"ru_RU"</span><span class="p">)[[</span><span class="m">1</span><span class="p">]]</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="s2">"карова"</span><span class="w">
</span></code></pre></div></div>
<p>It turned out this feature was much more difficult to implement than I expected. Much of the Hunspell library dates from before UTF-8 became popular, and therefore many dictionaries use local 8-bit character encodings such as <code class="language-plaintext highlighter-rouge">ISO-8859-1</code> for English and <code class="language-plaintext highlighter-rouge">KOI8-R</code> for Russian. To spell check in these languages, the character encoding of the document text has to match that of the dictionary. However, R only supports <code class="language-plaintext highlighter-rouge">latin1</code> and <code class="language-plaintext highlighter-rouge">UTF-8</code> strings, so we need to convert strings in C with <code class="language-plaintext highlighter-rouge">iconv</code>, which opens up a new can of worms. Anyway, it should all work now.</p>
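<p>The same kind of conversion is available from R through the base <code class="language-plaintext highlighter-rouge">iconv</code> function. The sketch below is only an R-level illustration of the idea (the package itself does its conversion in C): it re-encodes a Russian word from UTF-8 to the single-byte <code class="language-plaintext highlighter-rouge">KOI8-R</code> encoding and back. It assumes the platform’s iconv supports <code class="language-plaintext highlighter-rouge">KOI8-R</code>, which is common but not guaranteed.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># The word "чёрт" written with Unicode escapes, so it is stored as UTF-8
x <- "\u0447\u0451\u0440\u0442"
Encoding(x)

# Re-encode to KOI8-R and inspect the raw bytes
koi <- iconv(x, from = "UTF-8", to = "KOI8-R", toRaw = TRUE)[[1]]
length(koi)                 # KOI8-R uses one byte per character
nchar(x, type = "bytes")    # UTF-8 uses two bytes for each of these characters

# Convert the bytes back to a UTF-8 string
back <- iconv(list(koi), from = "KOI8-R", to = "UTF-8")
identical(back, x)
</code></pre></div></div>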
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr"><a href="https://twitter.com/opencpu">@opencpu</a> hunspell_stem could be very useful in interpretation issues of e.g. <a href="https://twitter.com/hashtag/wordclouds?src=hash">#wordclouds</a>.</p>— Jelle Geertsma (@rdatasculptor) <a href="https://twitter.com/rdatasculptor/status/709320443778506752">March 14, 2016</a></blockquote>
<h3 id="text-analysis-and-wordclouds">Text analysis and wordclouds</h3>
<p>In last week’s <a href="https://www.opencpu.org/posts/hunspell-release/">post</a> we showed how to parse and spell check a LaTeX file:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Check an entire latex document</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">hunspell</span><span class="p">)</span><span class="w">
</span><span class="n">setwd</span><span class="p">(</span><span class="n">tempdir</span><span class="p">())</span><span class="w">
</span><span class="n">download.file</span><span class="p">(</span><span class="s2">"http://arxiv.org/e-print/1406.4806v1"</span><span class="p">,</span><span class="w"> </span><span class="s2">"1406.4806v1.tar.gz"</span><span class="p">,</span><span class="w"> </span><span class="n">mode</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"wb"</span><span class="p">)</span><span class="w">
</span><span class="n">untar</span><span class="p">(</span><span class="s2">"1406.4806v1.tar.gz"</span><span class="p">)</span><span class="w">
</span><span class="n">text</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">readLines</span><span class="p">(</span><span class="s2">"content.tex"</span><span class="p">,</span><span class="w"> </span><span class="n">warn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="n">bad_words</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">hunspell</span><span class="p">(</span><span class="n">text</span><span class="p">,</span><span class="w"> </span><span class="n">format</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"latex"</span><span class="p">)</span><span class="w">
</span><span class="n">sort</span><span class="p">(</span><span class="n">unique</span><span class="p">(</span><span class="n">unlist</span><span class="p">(</span><span class="n">bad_words</span><span class="p">)))</span><span class="w">
</span></code></pre></div></div>
<p>The new version also exposes the parser directly, so you can easily extract words and derive the stems to summarize some text, for example to display in a wordcloud.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Summarize text by stems (e.g. for wordcloud)</span><span class="w">
</span><span class="n">allwords</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">hunspell_parse</span><span class="p">(</span><span class="n">text</span><span class="p">,</span><span class="w"> </span><span class="n">format</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"latex"</span><span class="p">)</span><span class="w">
</span><span class="n">stems</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">unlist</span><span class="p">(</span><span class="n">hunspell_stem</span><span class="p">(</span><span class="n">unlist</span><span class="p">(</span><span class="n">allwords</span><span class="p">)))</span><span class="w">
</span><span class="n">words</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">head</span><span class="p">(</span><span class="n">sort</span><span class="p">(</span><span class="n">table</span><span class="p">(</span><span class="n">stems</span><span class="p">),</span><span class="w"> </span><span class="n">decreasing</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">),</span><span class="w"> </span><span class="m">200</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
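<p>The same pipeline can be tried on a short inline text, without downloading anything. The two sentences below are made up for illustration, and the optional plotting step assumes the separate wordcloud package, which is not part of hunspell.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(hunspell)

# Two short made-up sentences
text <- c("The cats were chasing the mice.", "A cat chases a mouse.")

# Extract words, reduce each word to its stem(s), and count the stems
allwords <- unlist(hunspell_parse(text))
stems <- unlist(hunspell_stem(allwords))
freq <- sort(table(tolower(stems)), decreasing = TRUE)
freq

# Optional: draw a wordcloud when running interactively
if (interactive()) {
  if (requireNamespace("wordcloud", quietly = TRUE)) {
    wordcloud::wordcloud(names(freq), as.numeric(freq), min.freq = 1)
  }
}
</code></pre></div></div>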
Hunspell: Spell Checker and Text Parser for R · 2016-03-14T00:00:00+00:00 · https://www.opencpu.org/posts/hunspell-release
<a href="https://www.opencpu.org/posts/hunspell-release"><img alt="opencpu logo" src="https://www.opencpu.org/images/spelling.png"></a>
<p>Hunspell is the spell checker library used in LibreOffice, OpenOffice, Mozilla Firefox, Google Chrome, Mac OS X, InDesign, and a few more. Base R has some spell checking functionality via the <code class="language-plaintext highlighter-rouge">aspell</code> function, which wraps the aspell or hunspell command line program on supported systems. The new hunspell <a href="https://cran.r-project.org/web/packages/hunspell">R package</a>, on the other hand, links directly to the hunspell C++ library and works on all platforms without installing additional dependencies.</p>
<h3 id="basic-tools">Basic tools</h3>
<p>The <code class="language-plaintext highlighter-rouge">hunspell_check</code> function takes a vector of words and checks each individual word for correctness.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">hunspell</span><span class="p">)</span><span class="w">
</span><span class="n">words</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"beer"</span><span class="p">,</span><span class="w"> </span><span class="s2">"wiskey"</span><span class="p">,</span><span class="w"> </span><span class="s2">"wine"</span><span class="p">)</span><span class="w">
</span><span class="n">hunspell_check</span><span class="p">(</span><span class="n">words</span><span class="p">)</span><span class="w">
</span><span class="c1">## [1] TRUE FALSE TRUE</span><span class="w">
</span></code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">hunspell</code> function takes a character vector with text (in plain, latex or man format) and returns a list with incorrect words for each line.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">bad_words</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">hunspell</span><span class="p">(</span><span class="s2">"spell checkers are not neccessairy for langauge ninja's"</span><span class="p">)</span><span class="w">
</span><span class="n">print</span><span class="p">(</span><span class="n">bad_words</span><span class="p">)</span><span class="w">
</span><span class="c1">## [[1]]</span><span class="w">
</span><span class="c1">## [1] "neccessairy" "langauge" "ninja's"</span><span class="w">
</span></code></pre></div></div>
<p>Finally, <code class="language-plaintext highlighter-rouge">hunspell_suggest</code> is used to suggest correct alternatives for each (incorrect) input word.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">hunspell_suggest</span><span class="p">(</span><span class="n">bad_words</span><span class="p">[[</span><span class="m">1</span><span class="p">]])</span><span class="w">
</span><span class="c1">## [[1]]</span><span class="w">
</span><span class="c1">## [1] "necessary" "necessarily" "necessaries" "recessionary" "accessory" "incarcerate" </span><span class="w">
</span><span class="c1">##</span><span class="w">
</span><span class="c1">## [[2]]</span><span class="w">
</span><span class="c1">## [1] "language" "Langeland" "Lagrange" "Lange" "gaugeable" "linkage" "Langland" </span><span class="w">
</span><span class="c1">##</span><span class="w">
</span><span class="c1">## [[3]]</span><span class="w">
</span><span class="c1">## [1] "ninjas" "Janina's" "Nina's" "ninja" "Janine's" "meninx" "nark's"</span><span class="w">
</span></code></pre></div></div>
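<p>These functions combine naturally into a simple correction step. The sketch below replaces each misspelled word with the first suggestion from <code class="language-plaintext highlighter-rouge">hunspell_suggest</code>; taking the first suggestion is a naive heuristic chosen for illustration, not a feature of the package, and it may pick the wrong word.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(hunspell)

words <- c("beer", "wiskey", "wine")
ok <- hunspell_check(words)

# Suggestions for the misspelled words only
suggestions <- hunspell_suggest(words[!ok])

# Naive heuristic: keep the first suggestion, or NA if there is none
first <- vapply(suggestions,
                function(s) if (length(s) > 0) s[1] else NA_character_,
                character(1))

corrected <- words
corrected[!ok] <- first
corrected
</code></pre></div></div>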
<h3 id="parsing-text">Parsing text</h3>
<p>The first challenge in spell checking is extracting individual words from formatted text. The <code class="language-plaintext highlighter-rouge">hunspell</code> function supports three parsers via the <code class="language-plaintext highlighter-rouge">format</code> parameter: plain text, LaTeX and man. For example, to check the <a href="http://arxiv.org/abs/1406.4806">OpenCPU paper</a> for spelling errors, we use the LaTeX source code:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">download.file</span><span class="p">(</span><span class="s2">"http://arxiv.org/e-print/1406.4806v1"</span><span class="p">,</span><span class="w"> </span><span class="s2">"1406.4806v1.tar.gz"</span><span class="p">,</span><span class="w"> </span><span class="n">mode</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"wb"</span><span class="p">)</span><span class="w">
</span><span class="n">untar</span><span class="p">(</span><span class="s2">"1406.4806v1.tar.gz"</span><span class="p">)</span><span class="w">
</span><span class="n">text</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">readLines</span><span class="p">(</span><span class="s2">"content.tex"</span><span class="p">,</span><span class="w"> </span><span class="n">warn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="n">words</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">hunspell</span><span class="p">(</span><span class="n">text</span><span class="p">,</span><span class="w"> </span><span class="n">format</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"latex"</span><span class="p">)</span><span class="w">
</span><span class="n">sort</span><span class="p">(</span><span class="n">unique</span><span class="p">(</span><span class="n">unlist</span><span class="p">(</span><span class="n">words</span><span class="p">)))</span><span class="w">
</span></code></pre></div></div>
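<p>Downloading the paper requires network access. As a quick self-contained check of the <code class="language-plaintext highlighter-rouge">latex</code> parser, here is a minimal sketch on a made-up two-line snippet (the snippet and its deliberate typo are illustrative assumptions, not taken from the paper):</p>

```r
library(hunspell)

# A short, made-up LaTeX fragment with one deliberate typo ("sentance")
tex <- c("\\section{Introduction}",
         "This sentance uses \\textbf{bold} and \\emph{italic} markup.")

# With format = "latex", LaTeX commands such as \section and \textbf are
# skipped, so only the running text is spell-checked
bad <- hunspell(tex, format = "latex")

# hunspell() returns a list with one character vector of misspelled words
# per input line
print(bad)
```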
<p>Base R also has a few filters to extract words from R, Sweave or Rd code: see <code class="language-plaintext highlighter-rouge">RdTextFilter</code> and <code class="language-plaintext highlighter-rouge">SweaveTeXFilter</code> in the tools package. For example, to check your R package manual for typos (assuming the working directory is the package source directory):</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span><span class="p">(</span><span class="n">file</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">list.files</span><span class="p">(</span><span class="s2">"man"</span><span class="p">,</span><span class="w"> </span><span class="n">full.names</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)){</span><span class="w">
</span><span class="n">cat</span><span class="p">(</span><span class="s2">"\nFile"</span><span class="p">,</span><span class="w"> </span><span class="n">file</span><span class="p">,</span><span class="w"> </span><span class="s2">":\n "</span><span class="p">)</span><span class="w">
</span><span class="n">txt</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">tools</span><span class="o">::</span><span class="n">RdTextFilter</span><span class="p">(</span><span class="n">file</span><span class="p">,</span><span class="w"> </span><span class="n">keepSpacing</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="n">cat</span><span class="p">(</span><span class="n">sQuote</span><span class="p">(</span><span class="n">sort</span><span class="p">(</span><span class="n">unique</span><span class="p">(</span><span class="n">unlist</span><span class="p">(</span><span class="n">hunspell</span><span class="p">(</span><span class="n">txt</span><span class="p">))))),</span><span class="w"> </span><span class="n">sep</span><span class="w"> </span><span class="o">=</span><span class="s2">", "</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
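<p>The loop above needs an existing package source tree. A minimal sketch of the same pipeline on a single throw-away Rd file (the file contents and the typo "functon" are made up for illustration):</p>

```r
library(hunspell)

# Write a tiny, made-up Rd file to a temporary location
rd <- tempfile(fileext = ".Rd")
writeLines(c(
  "\\name{demo}",
  "\\alias{demo}",
  "\\title{Demo function}",
  "\\description{This functon is only a demo.}"
), rd)

# Blank out the Rd markup so that mostly prose remains
txt <- tools::RdTextFilter(rd, keepSpacing = FALSE)

# Collect the unique misspelled words across all lines
typos <- sort(unique(unlist(hunspell(txt))))
print(typos)
```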
<h3 id="morphological-analysis">Morphological analysis</h3>
<p>A cool feature in hunspell is the morphological analysis. The <code class="language-plaintext highlighter-rouge">hunspell_analyze</code> function will show you how a word breaks down into a valid stem plus affix. Hunspell uses a special dictionary format to determine if a stem+affix combination is valid in a given language.</p>
<p>For example, suppose we take a few variations of the word <em>love</em>. To get the possible stem and affix combinations for each word:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">hunspell_analyze</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"love"</span><span class="p">,</span><span class="w"> </span><span class="s2">"loving"</span><span class="p">,</span><span class="w"> </span><span class="s2">"lovingly"</span><span class="p">,</span><span class="w"> </span><span class="s2">"loved"</span><span class="p">,</span><span class="w"> </span><span class="s2">"lover"</span><span class="p">,</span><span class="w"> </span><span class="s2">"lovely"</span><span class="p">,</span><span class="w"> </span><span class="s2">"love"</span><span class="p">))</span><span class="w">
</span><span class="c1">## [1] " st:love"</span><span class="w">
</span><span class="c1">## [1] " st:loving" " st:love fl:G"</span><span class="w">
</span><span class="c1">## [1] " st:lovingly"</span><span class="w">
</span><span class="c1">## [1] " st:loved" " st:love fl:D"</span><span class="w">
</span><span class="c1">## [1] " st:lover" " st:love fl:R"</span><span class="w">
</span><span class="c1">## [1] " st:lovely" " st:love fl:Y"</span><span class="w">
</span><span class="c1">## [1] " st:love"</span><span class="w">
</span></code></pre></div></div>
<p>Alternatively, <code class="language-plaintext highlighter-rouge">hunspell_stem</code> returns only the candidate stems, without the affix flags. Not sure how you would use this, but it’s certainly cool.</p>
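<p>One possible use, sketched under an assumption of our own: collapse word variants to a common root before counting word frequencies. Picking the shortest candidate stem is an illustrative heuristic, not something hunspell itself does.</p>

```r
library(hunspell)

words <- c("love", "loving", "loved", "lover", "lovely")

# hunspell_stem() returns a list with one character vector of candidate
# stems per input word
stems <- hunspell_stem(words)

# Illustrative heuristic (an assumption): keep the shortest candidate stem,
# or NA when hunspell finds no stem at all
root <- vapply(stems, function(s) {
  if (length(s) == 0) NA_character_ else s[which.min(nchar(s))]
}, character(1))

# Count the variants per root
table(root)
```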
<h3 id="thanks">Thanks!</h3>
<p>Thanks to Daniel Falbel for <a href="https://discuss.ropensci.org/t/r-interface-with-hunspell/327">suggesting</a> this package on the rOpenSci forums!</p>
OpenCPU Server Release 1.5.4 2016-02-05T00:00:00+00:00 https://www.opencpu.org/posts/opencpu-1-5-4
<a href="https://www.opencpu.org/posts/opencpu-1-5-4"><img alt="opencpu logo" src="https://www.opencpu.org/images/stockplot.png"></a>
<p>Version 1.5.4 of the OpenCPU server has been released to <a href="https://launchpad.net/~opencpu/+archive/ubuntu/opencpu-1.5">Launchpad</a> (Ubuntu) and <a href="http://software.opensuse.org/download.html?project=home:jeroenooms:opencpu-1.5&package=opencpu">OBS</a> (Fedora). This update does not introduce any changes to the OpenCPU API itself; it improves the deb/rpm installation packages and upgrades the bundled OpenCPU system R <a href="https://github.com/jeroenooms/opencpu-server/tree/v1.5/opencpu-lib">package library</a>.</p>
<h3 id="installing-and-updating">Installing and Updating</h3>
<p>Existing Ubuntu and Fedora servers that are already running the 1.5 branch will pick up this release the next time they upgrade their system packages (<code class="language-plaintext highlighter-rouge">apt-get update</code> followed by <code class="language-plaintext highlighter-rouge">apt-get upgrade</code>, or <code class="language-plaintext highlighter-rouge">yum update</code>). Alternatively, to install OpenCPU server on a fresh Ubuntu 14.04 machine:</p>
<figure class="highlight"><pre><code class="language-sh" data-lang="sh"><span class="nb">sudo </span>add-apt-repository <span class="nt">-y</span> ppa:opencpu/opencpu-1.5
<span class="nb">sudo </span>apt-get update
<span class="nb">sudo </span>apt-get <span class="nb">install</span> <span class="nt">-y</span> opencpu</code></pre></figure>
<p>Or to install it on Fedora 22 or 23 from <a href="http://software.opensuse.org/download.html?project=home:jeroenooms:opencpu-1.5&package=opencpu">OBS</a>:</p>
<figure class="highlight"><pre><code class="language-sh" data-lang="sh"><span class="nb">cd</span> /etc/yum.repos.d/
<span class="nb">sudo </span>wget http://download.opensuse.org/repositories/home:jeroenooms:opencpu-1.5/Fedora_23/home:jeroenooms:opencpu-1.5.repo
<span class="nb">sudo </span>yum <span class="nb">install </span>opencpu</code></pre></figure>
<p>To install OpenCPU server on other distributions, simply follow the instructions to build the <a href="https://github.com/jeroenooms/opencpu-server/tree/master/debian#readme">deb</a> (Debian/Ubuntu) or <a href="https://github.com/jeroenooms/opencpu-server/blob/master/rpm/buildscript.sh">rpm</a> (Fedora/CentOS/RHEL) packages from source, which is straightforward.</p>
<h3 id="the-opencpu-package-library">The OpenCPU Package Library</h3>
<p>Because OpenCPU is implemented completely in R, the server stack ships with a private library of R packages needed by the system in <code class="language-plaintext highlighter-rouge">/usr/lib/opencpu/library</code>. The isolated library allows you to freely install, upgrade or uninstall your own R packages on your server without accidentally breaking the OpenCPU server. This is critical to guarantee that the system is stable at all times and unaffected by whatever crazy things are happening in the rest of the R installation.</p>
<p>However, a side effect of this design is that for these system packages, the user might see a different package version when calling R via the OpenCPU API than when running R from the terminal on the same server. This is unfortunate because OpenCPU is meant to provide a transparent HTTP API to the system’s R installation. One solution would be to add the OpenCPU library to your <code class="language-plaintext highlighter-rouge">.libPaths()</code>, but this is unnecessarily annoying and complicated.</p>
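<p>For reference, a minimal sketch of how one might spot such a version mismatch and apply the manual <code class="language-plaintext highlighter-rouge">.libPaths()</code> workaround in a terminal session. The helper names <code class="language-plaintext highlighter-rouge">version_in</code> and <code class="language-plaintext highlighter-rouge">terminal_version</code> are hypothetical, and <code class="language-plaintext highlighter-rouge">jsonlite</code> is only an example package name.</p>

```r
# Location of the OpenCPU system library, as described above
ocpu_lib <- "/usr/lib/opencpu/library"

# Hypothetical helper: version of `pkg` installed in library `lib`,
# or NA if the package is not there
version_in <- function(pkg, lib) {
  if (!dir.exists(file.path(lib, pkg))) return(NA_character_)
  as.character(packageVersion(pkg, lib.loc = lib))
}

# Hypothetical helper: version of the copy a terminal session would load
terminal_version <- function(pkg) {
  path <- find.package(pkg, quiet = TRUE)
  if (length(path) == 0) NA_character_ else version_in(pkg, dirname(path[1]))
}

# Compare both copies of an example package
c(opencpu = version_in("jsonlite", ocpu_lib),
  terminal = terminal_version("jsonlite"))

# The manual workaround: put the OpenCPU library first on the search path
# for this session, only if it exists on this machine
if (dir.exists(ocpu_lib)) .libPaths(c(ocpu_lib, .libPaths()))
```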
<p>To make this easier, the OpenCPU rpm/deb packages now automatically create symlinks to the OpenCPU system library in the global R package library. This way the OpenCPU system library is still safely isolated, but its packages are also visible when running R in the terminal, so they don’t need to be installed a second time. Hopefully this makes managing packages on your OpenCPU server a little easier.</p>
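<p>The symlink idea can be sketched in a few lines of R. This is an illustration only, not the actual deb/rpm install script: the helper <code class="language-plaintext highlighter-rouge">link_packages</code> is hypothetical, skipping packages that already exist in the target library is our own assumption, and the example uses throw-away directories instead of the real system libraries.</p>

```r
# Hypothetical helper: link every package directory in `from` that does not
# yet exist in `to`, and return the names of the packages that were linked
link_packages <- function(from, to) {
  pkgs <- list.dirs(from, full.names = FALSE, recursive = FALSE)
  todo <- pkgs[!file.exists(file.path(to, pkgs))]
  ok <- file.symlink(file.path(from, todo), file.path(to, todo))
  invisible(todo[ok])
}

# Throw-away stand-ins for the OpenCPU library and the global R library
from <- tempfile("opencpu-lib")
to   <- tempfile("site-lib")
dir.create(file.path(from, "pkgA"), recursive = TRUE)
dir.create(file.path(from, "pkgB"), recursive = TRUE)
dir.create(file.path(to, "pkgB"), recursive = TRUE)   # already installed

linked <- link_packages(from, to)
print(linked)
```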
Commonmark: Super Fast Markdown Rendering in R 2016-02-03T00:00:00+00:00 https://www.opencpu.org/posts/commonmark-fast
<a href="https://www.opencpu.org/posts/commonmark-fast"><img alt="opencpu logo" src="https://www.opencpu.org/images/warpeace.jpg"></a>
<p>A few months ago I first announced the commonmark R package. Since then there have been a few more releases… time for an update!</p>
<h3 id="what-is-commonmark">What is CommonMark?</h3>
<p>Markdown is used in many places these days; however, the original <a href="https://daringfireball.net/projects/markdown/syntax">spec</a> leaves some ambiguity, which makes it difficult to optimize and leads to inconsistencies between implementations.
CommonMark is an initiative led by John MacFarlane at UC Berkeley (also the author of pandoc) to standardize the markdown syntax.
Besides a <a href="http://spec.commonmark.org">specification</a>, the CommonMark team provides reference implementations for C (<a href="https://github.com/jgm/cmark">cmark</a>) and JavaScript (<a href="https://github.com/jgm/commonmark.js">commonmark.js</a>).</p>
<p>The <a href="https://cran.r-project.org/web/packages/commonmark/index.html">commonmark R package</a> wraps cmark, which converts markdown text into various formats, including html, latex and groff man. This makes commonmark well suited for, e.g., writing manual pages, which are often stored in exactly these formats. In addition, the package exposes the markdown parse tree in xml format to support customized output handling.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Load library</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">commonmark</span><span class="p">)</span><span class="w">
</span><span class="c1"># Render some markdown</span><span class="w">
</span><span class="n">md</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">readLines</span><span class="p">(</span><span class="n">curl</span><span class="o">::</span><span class="n">curl</span><span class="p">(</span><span class="s2">"https://raw.githubusercontent.com/yihui/knitr/master/NEWS.md"</span><span class="p">))</span><span class="w">
</span><span class="n">html</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">markdown_html</span><span class="p">(</span><span class="n">md</span><span class="p">)</span><span class="w">
</span><span class="n">man</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">markdown_man</span><span class="p">(</span><span class="n">md</span><span class="p">)</span><span class="w">
</span><span class="n">tex</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">markdown_latex</span><span class="p">(</span><span class="n">md</span><span class="p">)</span><span class="w">
</span><span class="c1"># Syntax tree</span><span class="w">
</span><span class="n">xml</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">markdown_xml</span><span class="p">(</span><span class="n">md</span><span class="p">)</span><span class="w">
</span><span class="c1"># Back to (standardized) markdown</span><span class="w">
</span><span class="n">cm</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">markdown_commonmark</span><span class="p">(</span><span class="n">md</span><span class="p">)</span></code></pre></figure>
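<p>As an example of customized output handling, a minimal sketch that pulls all link targets out of the xml parse tree. It assumes the xml2 package is installed and uses a small made-up markdown string instead of the knitr NEWS file.</p>

```r
library(commonmark)
library(xml2)

md <- "See [OpenCPU](https://www.opencpu.org) and [CommonMark](https://commonmark.org)."

# markdown_xml() returns the parse tree as an XML string
doc <- read_xml(markdown_xml(md))

# Drop the CommonMark default namespace so plain XPath names like //link work
xml_ns_strip(doc)

# Each link node stores its target in the "destination" attribute
links <- xml_attr(xml_find_all(doc, "//link"), "destination")
print(links)
```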
<p>Currently, commonmark only specifies the original markdown elements: italic, bold, headings, links, images, quotes, paragraphs, lists, horizontal rule, and code blocks. Extensions from pandoc that were introduced later on such as tables are not supported.</p>
<h3 id="commonmark-is-fast">CommonMark is fast</h3>
<p>The cmark library is written in elegant C code and highly optimized. It <a href="https://github.com/jgm/cmark#readme">renders</a> a Markdown version of <em>War and Peace</em> in the blink of an eye (127 milliseconds on a ten year old laptop, vs. 100-400 milliseconds for an eye blink). A simple benchmark in R confirms that our example above is converted to any of the formats in only a few milliseconds.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">microbenchmark</span><span class="p">)</span><span class="w">
</span><span class="n">microbenchmark</span><span class="p">(</span><span class="w">
</span><span class="n">markdown_html</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">markdown_html</span><span class="p">(</span><span class="n">md</span><span class="p">),</span><span class="w">
</span><span class="n">markdown_man</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">markdown_man</span><span class="p">(</span><span class="n">md</span><span class="p">),</span><span class="w">
</span><span class="n">markdown_latex</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">markdown_latex</span><span class="p">(</span><span class="n">md</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1"># Unit: milliseconds</span><span class="w">
</span><span class="c1"># expr min lq mean median uq max neval</span><span class="w">
</span><span class="c1"># markdown_html 3.228492 3.243339 3.318437 3.263184 3.359420 3.902745 100</span><span class="w">
</span><span class="c1"># markdown_man 5.768978 5.803062 5.885971 5.862607 5.942159 6.177985 100</span><span class="w">
</span><span class="c1"># markdown_latex 5.906757 5.946995 6.049409 6.001677 6.107563 7.619014 100</span></code></pre></figure>
<p>The main benefit, besides Tolstoy saving some time on typesetting, is that cmark allows for shipping documents such as help pages in native markdown format and rendering them on-the-fly in html/latex/man without noticeable performance overhead. This is very nice for editing and maintaining any sort of portable, dynamic documentation.</p>
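<p>A minimal sketch of that idea: keep a single markdown source and pick the output format at render time. The helper <code class="language-plaintext highlighter-rouge">render_help</code> and the help text are hypothetical; the three rendering functions are the ones used above.</p>

```r
library(commonmark)

# Hypothetical helper: render one markdown source into the requested format
render_help <- function(md, format = c("html", "latex", "man")) {
  format <- match.arg(format)
  switch(format,
         html  = markdown_html(md),
         latex = markdown_latex(md),
         man   = markdown_man(md))
}

help_md <- "Computes the **mean** of `x`, ignoring missing values."
cat(render_help(help_md, "html"))
cat(render_help(help_md, "latex"))
cat(render_help(help_md, "man"))
```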
<h3 id="markdown-in-r-documentation">Markdown in R documentation</h3>
<p>Several people have independently had the idea to add support for markdown to R documentation, which would be super awesome. Gábor has started a package called <a href="https://github.com/gaborcsardi/maxygen">maxygen</a> which might <a href="https://github.com/klutometis/roxygen/pull/431">get merged</a> into roxygen2 at some point. This allows for inserting emphasis, boldface, code blocks, lists, links, and images in your roxygen fields using simple markdown notation rather than the ugly Rd format.</p>
<p>There has also been some <a href="https://stat.ethz.ch/pipermail/r-devel/2015-May/071219.html">discussion</a> on the r-devel mailing list about extending support for markdown in R and CRAN, but that mostly seems to concern NEWS and README files.</p>
New in V8: Calling R, from JavaScript, from R, from Javascript... 2016-02-02T00:00:00+00:00 https://www.opencpu.org/posts/v8-release-0-10
<a href="https://www.opencpu.org/posts/v8-release-0-10"><img alt="opencpu logo" src="https://www.opencpu.org/images/v8engine.jpg"></a>
<p>The V8 package provides an R interface to Google’s open source JavaScript engine. The package is completely self-contained and requires no runtime dependencies, making it very easy to execute JavaScript code from R. A handful of CRAN packages use V8 to provide R bindings to useful JavaScript libraries. Have a look at the <a href="https://cran.r-project.org/web/packages/V8/vignettes/v8_intro.html">V8 vignette</a> to get started.</p>
<h2 id="callback-to-r">Callback To R</h2>
<p>New in version 0.10 is the ability to call back to R from within JavaScript using the <code class="language-plaintext highlighter-rouge">console.r</code> API. This is most easily demonstrated via V8’s interactive JavaScript console:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">ctx</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">V8</span><span class="o">::</span><span class="n">v8</span><span class="p">()</span><span class="w">
</span><span class="n">ctx</span><span class="o">$</span><span class="n">console</span><span class="p">()</span></code></pre></figure>
<p>From JavaScript we can read/write R objects via <code class="language-plaintext highlighter-rouge">console.r.get</code> and <code class="language-plaintext highlighter-rouge">console.r.assign</code>, analogous to <code class="language-plaintext highlighter-rouge">get</code> and <code class="language-plaintext highlighter-rouge">assign</code> in R. The final argument is an optional list with arguments passed to <code class="language-plaintext highlighter-rouge">toJSON</code> or <code class="language-plaintext highlighter-rouge">fromJSON</code> which are used behind the scenes to convert objects between R and JavaScript.</p>
<figure class="highlight"><pre><code class="language-js" data-lang="js"><span class="c1">// read the iris object into JS</span>
<span class="kd">var</span> <span class="nx">iris</span> <span class="o">=</span> <span class="nx">console</span><span class="p">.</span><span class="nx">r</span><span class="p">.</span><span class="kd">get</span><span class="p">(</span><span class="dl">"</span><span class="s2">iris</span><span class="dl">"</span><span class="p">)</span>
<span class="kd">var</span> <span class="nx">iris_col</span> <span class="o">=</span> <span class="nx">console</span><span class="p">.</span><span class="nx">r</span><span class="p">.</span><span class="kd">get</span><span class="p">(</span><span class="dl">"</span><span class="s2">iris</span><span class="dl">"</span><span class="p">,</span> <span class="p">{</span><span class="na">dataframe</span> <span class="p">:</span> <span class="dl">"</span><span class="s2">col</span><span class="dl">"</span><span class="p">})</span>
<span class="c1">//write an object back to the R session</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">r</span><span class="p">.</span><span class="nx">assign</span><span class="p">(</span><span class="dl">"</span><span class="s2">iris2</span><span class="dl">"</span><span class="p">,</span> <span class="nx">iris</span><span class="p">)</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">r</span><span class="p">.</span><span class="nx">assign</span><span class="p">(</span><span class="dl">"</span><span class="s2">iris3</span><span class="dl">"</span><span class="p">,</span> <span class="nx">iris</span><span class="p">,</span> <span class="p">{</span><span class="na">simplifyVector</span> <span class="p">:</span> <span class="kc">false</span><span class="p">})</span></code></pre></figure>
<p>Use <code class="language-plaintext highlighter-rouge">console.r.call</code> to call R functions. The first argument should be a string which evaluates to a function. The second argument contains a list of arguments passed to the function, similar to <code class="language-plaintext highlighter-rouge">do.call</code> in R. Both named (object) and unnamed (array) lists are supported. The return value is converted to JSON and passed back to JavaScript.</p>
<figure class="highlight"><pre><code class="language-js" data-lang="js"><span class="c1">//calls rnorm(n=2, mean=10, sd=5)</span>
<span class="kd">var</span> <span class="nx">out</span> <span class="o">=</span> <span class="nx">console</span><span class="p">.</span><span class="nx">r</span><span class="p">.</span><span class="nx">call</span><span class="p">(</span><span class="dl">'</span><span class="s1">rnorm</span><span class="dl">'</span><span class="p">,</span> <span class="p">{</span><span class="na">n</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span><span class="na">mean</span><span class="p">:</span><span class="mi">10</span><span class="p">,</span> <span class="na">sd</span><span class="p">:</span><span class="mi">5</span><span class="p">})</span>
<span class="kd">var</span> <span class="nx">out</span> <span class="o">=</span> <span class="nx">console</span><span class="p">.</span><span class="nx">r</span><span class="p">.</span><span class="nx">call</span><span class="p">(</span><span class="dl">'</span><span class="s1">rnorm</span><span class="dl">'</span><span class="p">,</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">5</span><span class="p">])</span>
<span class="c1">//anonymous function</span>
<span class="kd">var</span> <span class="nx">out</span> <span class="o">=</span> <span class="nx">console</span><span class="p">.</span><span class="nx">r</span><span class="p">.</span><span class="nx">call</span><span class="p">(</span><span class="dl">'</span><span class="s1">function(x){x^2}</span><span class="dl">'</span><span class="p">,</span> <span class="p">{</span><span class="na">x</span><span class="p">:</span><span class="mi">12</span><span class="p">})</span></code></pre></figure>
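<p>The interactive console is convenient for exploring, but the same callback can also be driven from an R script. A minimal sketch, assuming (as in the interactive console) that <code class="language-plaintext highlighter-rouge">console.r</code> is available in any context created by <code class="language-plaintext highlighter-rouge">v8()</code>:</p>

```r
library(V8)
ctx <- v8()

# Call an anonymous R function from JavaScript and store the result
# in a JavaScript variable
ctx$eval("var sq = console.r.call('function(x){x^2}', {x: 12})")

# Read the JavaScript variable back into R
ctx$get("sq")
```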
<p>There is also a <code class="language-plaintext highlighter-rouge">console.r.eval</code> function, which evaluates raw R code. It takes only a single argument (the string to evaluate) and does not return anything. Output is printed to the console.</p>
<figure class="highlight"><pre><code class="language-js" data-lang="js"><span class="nx">console</span><span class="p">.</span><span class="nx">r</span><span class="p">.</span><span class="nb">eval</span><span class="p">(</span><span class="dl">'</span><span class="s1">sessionInfo()</span><span class="dl">'</span><span class="p">)</span></code></pre></figure>
<p>Besides automatically converting objects, V8 also propagates exceptions between R, C++ and JavaScript up and down the stack. Hence you can catch R errors as JavaScript exceptions when calling an R function from JavaScript or vice versa. If nothing gets caught, exceptions bubble all the way up as R errors in your top-level R session.</p>
<figure class="highlight"><pre><code class="language-js" data-lang="js"><span class="c1">//raise an error in R</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">r</span><span class="p">.</span><span class="nx">call</span><span class="p">(</span><span class="dl">'</span><span class="s1">stop("ouch!")</span><span class="dl">'</span><span class="p">)</span>
<span class="c1">//catch error from JavaScript</span>
<span class="k">try</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">r</span><span class="p">.</span><span class="nx">call</span><span class="p">(</span><span class="dl">'</span><span class="s1">stop("ouch!")</span><span class="dl">'</span><span class="p">)</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">e</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">"</span><span class="s2">Uhoh R had an error: </span><span class="dl">"</span> <span class="o">+</span> <span class="nx">e</span><span class="p">)</span>
<span class="p">}</span></code></pre></figure>
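<p>From the R side, an error that is not caught in JavaScript should surface as an ordinary R error. A minimal sketch, under the same assumption that <code class="language-plaintext highlighter-rouge">console.r</code> is available in a context created by <code class="language-plaintext highlighter-rouge">v8()</code>:</p>

```r
library(V8)
ctx <- v8()

# The R error raised inside console.r.call is not caught in JavaScript,
# so it bubbles up to R, where tryCatch() handles it
caught <- tryCatch({
  ctx$eval("console.r.call('stop(\"ouch!\")')")
  FALSE
}, error = function(e) {
  message("Caught in R: ", conditionMessage(e))
  TRUE
})
caught
```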
<p>Thanks to Barret Schloerke for <a href="https://github.com/jeroenooms/V8/issues/20">suggesting</a> this feature and Dirk for pointing me in the right direction on how to call R functions from Rcpp (which is <a href="https://github.com/jeroenooms/V8/blob/v0.10/src/V8.cpp#L75-L84">surprisingly easy</a>).</p>
Using webp in R: A New Format for Lossless and Lossy Image Compression 2016-01-25T00:00:00+00:00 https://www.opencpu.org/posts/webp-release
<a href="https://www.opencpu.org/posts/webp-release"><img alt="opencpu logo" src="https://www.opencpu.org/images/pancake.png"></a>
<p>A while ago I blogged about <a href="../brotli-benchmarks">brotli</a>, a new general-purpose compression algorithm promoted by Google as an alternative to gzip. The same company also happens to be working on a new format for images called <a href="https://developers.google.com/speed/webp">webp</a>, which is actually a derivative of the VP8 video format. Google claims webp provides superior compression for both lossless (png) and lossy (jpeg) bitmaps, and even though the format is currently only supported in Google Chrome, it does seem promising.</p>
<p>The <a href="https://cran.rstudio.com/web/packages/webp/">webp</a> R package reads and writes webp images as bitmap arrays, so we can convert between webp and other bitmap formats. For example, let’s take this photo of a delicious and nutritious <a href="https://www.instagram.com/feelgoodbyfood/">feelgoodbyfood</a> spelt-pancake with coconut sprinkles and homemade espresso (see <a href="https://www.feelgoodbyfood.nl/7x-winters-ontbijt">here</a> for 7 other healthy winter breakfasts!)</p>
<p><img src="../../images/pancake.jpg" class="img-responsive" /></p>
<p>We read the jpeg file into a bitmap and then write it to webp:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">webp</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">jpeg</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">curl</span><span class="p">)</span><span class="w">
</span><span class="n">curl_download</span><span class="p">(</span><span class="s2">"https://www.opencpu.org/images/pancake.jpg"</span><span class="p">,</span><span class="w"> </span><span class="s2">"pancake.jpg"</span><span class="p">)</span><span class="w">
</span><span class="n">bitmap</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">readJPEG</span><span class="p">(</span><span class="s2">"pancake.jpg"</span><span class="p">)</span><span class="w">
</span><span class="n">write_webp</span><span class="p">(</span><span class="n">bitmap</span><span class="p">,</span><span class="w"> </span><span class="s2">"pancake.webp"</span><span class="p">)</span><span class="w">
</span><span class="c1"># Only works in Google Chrome</span><span class="w">
</span><span class="n">browseURL</span><span class="p">(</span><span class="s2">"pancake.webp"</span><span class="p">)</span></code></pre></figure>
<p>Of course it works the other way around as well. To read the webp image back into a bitmap and write it to png:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">png</span><span class="p">)</span><span class="w">
</span><span class="n">bitmap2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">read_webp</span><span class="p">(</span><span class="s2">"pancake.webp"</span><span class="p">)</span><span class="w">
</span><span class="n">writePNG</span><span class="p">(</span><span class="n">bitmap2</span><span class="p">,</span><span class="w"> </span><span class="s2">"pancake.png"</span><span class="p">)</span><span class="w">
</span><span class="n">browseURL</span><span class="p">(</span><span class="s2">"pancake.png"</span><span class="p">)</span></code></pre></figure>
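<p>The round trip can also be tried without downloading anything. A minimal sketch that builds a made-up gradient bitmap in the same height x width x channel layout that <code class="language-plaintext highlighter-rouge">readJPEG</code> returns, writes it to webp, and reads it back:</p>

```r
library(webp)

# A made-up 60 x 80 RGB gradient with values between 0 and 1,
# laid out as height x width x channels
h <- 60; w <- 80
img <- array(0, dim = c(h, w, 3))
img[, , 1] <- matrix(seq(0, 1, length.out = w), h, w, byrow = TRUE)
img[, , 2] <- matrix(seq(0, 1, length.out = h), h, w)
img[, , 3] <- 0.5

# Write to a temporary webp file and read it back into a bitmap
f <- tempfile(fileext = ".webp")
write_webp(img, f)
back <- read_webp(f)
dim(back)
```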
<h2 id="rendering-graphics-to-webp">Rendering graphics to webp</h2>
<p>The best way to write plots in webp format is using an svg device and then render to bitmap with the <a href="../svg-release">rsvg package</a>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># create an svg image</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">svglite</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">
</span><span class="n">svglite</span><span class="p">(</span><span class="s2">"plot.svg"</span><span class="p">,</span><span class="w"> </span><span class="n">width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="n">height</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">7</span><span class="p">)</span><span class="w">
</span><span class="n">qplot</span><span class="p">(</span><span class="n">mpg</span><span class="p">,</span><span class="w"> </span><span class="n">wt</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">colour</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">cyl</span><span class="p">))</span><span class="w">
</span><span class="n">dev.off</span><span class="p">()</span><span class="w">
</span><span class="c1"># render it into a high definition bitmap image</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">rsvg</span><span class="p">)</span><span class="w">
</span><span class="n">rsvg_webp</span><span class="p">(</span><span class="s2">"plot.svg"</span><span class="p">,</span><span class="w"> </span><span class="s2">"plot.webp"</span><span class="p">,</span><span class="w"> </span><span class="n">width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1920</span><span class="p">)</span><span class="w">
</span><span class="n">browseURL</span><span class="p">(</span><span class="s2">"plot.webp"</span><span class="p">)</span></code></pre></figure>
<p>The <code class="language-plaintext highlighter-rouge">write_webp</code> function has a <code class="language-plaintext highlighter-rouge">quality</code> parameter (integer between 1 and 100) which can be used to tune the quality-size trade-off for lossy compression. A <code class="language-plaintext highlighter-rouge">quality=100</code> equals lossless compression; the default <code class="language-plaintext highlighter-rouge">quality=80</code> provides considerable size reduction with negligible loss of quality.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">rsvg</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">webp</span><span class="p">)</span><span class="w">
</span><span class="n">tiger</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rsvg</span><span class="p">(</span><span class="s2">"http://dev.w3.org/SVG/tools/svgweb/samples/svg-files/tiger.svg"</span><span class="p">,</span><span class="w"> </span><span class="n">height</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">720</span><span class="p">)</span><span class="w">
</span><span class="n">write_webp</span><span class="p">(</span><span class="n">tiger</span><span class="p">,</span><span class="w"> </span><span class="s2">"tiger100.webp"</span><span class="p">,</span><span class="w"> </span><span class="n">quality</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">)</span><span class="w">
</span><span class="n">write_webp</span><span class="p">(</span><span class="n">tiger</span><span class="p">,</span><span class="w"> </span><span class="s2">"tiger80.webp"</span><span class="p">,</span><span class="w"> </span><span class="n">quality</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">80</span><span class="p">)</span><span class="w">
</span><span class="n">write_webp</span><span class="p">(</span><span class="n">tiger</span><span class="p">,</span><span class="w"> </span><span class="s2">"tiger50.webp"</span><span class="p">,</span><span class="w"> </span><span class="n">quality</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">50</span><span class="p">)</span></code></pre></figure>
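<p>To see the quality-size trade-off for yourself, you can compare the sizes of the files written above. The helper below is a hypothetical base-R sketch (the name <code class="language-plaintext highlighter-rouge">compare_sizes</code> is not part of the webp package). It only reads file sizes, so it works for any set of output files:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical helper: report each file's size in kB and relative to the largest file
compare_sizes <- function(files) {
  bytes <- file.info(files)$size
  data.frame(
    file = basename(files),
    kb = round(bytes / 1024, 1),
    relative = round(bytes / max(bytes), 3)
  )
}
# usage, assuming the three tiger files from the example above exist:
# compare_sizes(c("tiger100.webp", "tiger80.webp", "tiger50.webp"))</code></pre></figure>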
<p>Unfortunately webp will probably not become mainstream until all browsers support it. But performance seems pretty good, so perhaps it could be useful for compressing large images in scientific applications.</p>
The 'rsvg' Package: High Quality Image Rendering in R2016-01-25T00:00:00+00:00https://www.opencpu.org/posts/svg-release
<a href="https://www.opencpu.org/posts/svg-release"><img alt="opencpu logo" src="https://www.opencpu.org/images/tiger.png"></a>
<p>The new <a href="https://cran.r-project.org/web/packages/rsvg/index.html">rsvg</a> package renders (vector-based) SVG images into high-quality bitmap arrays. The resulting image is a three-dimensional array of height × width × 4 (RGBA) values, which can be written to png, jpeg or webp format:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># create an svg image</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">svglite</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">
</span><span class="n">svglite</span><span class="p">(</span><span class="s2">"plot.svg"</span><span class="p">,</span><span class="w"> </span><span class="n">width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="n">height</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">7</span><span class="p">)</span><span class="w">
</span><span class="n">qplot</span><span class="p">(</span><span class="n">mpg</span><span class="p">,</span><span class="w"> </span><span class="n">wt</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">colour</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">cyl</span><span class="p">))</span><span class="w">
</span><span class="n">dev.off</span><span class="p">()</span><span class="w">
</span><span class="c1"># render it into a bitmap array</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">rsvg</span><span class="p">)</span><span class="w">
</span><span class="n">bitmap</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rsvg</span><span class="p">(</span><span class="s2">"plot.svg"</span><span class="p">)</span><span class="w">
</span><span class="nf">dim</span><span class="p">(</span><span class="n">bitmap</span><span class="p">)</span><span class="w">
</span><span class="c1">## [1] 504 720 4</span><span class="w">
</span><span class="c1"># write to different formats</span><span class="w">
</span><span class="n">png</span><span class="o">::</span><span class="n">writePNG</span><span class="p">(</span><span class="n">bitmap</span><span class="p">,</span><span class="w"> </span><span class="s2">"bitmap.png"</span><span class="p">)</span><span class="w">
</span><span class="n">jpeg</span><span class="o">::</span><span class="n">writeJPEG</span><span class="p">(</span><span class="n">bitmap</span><span class="p">,</span><span class="w"> </span><span class="s2">"bitmap.jpg"</span><span class="p">,</span><span class="w"> </span><span class="n">quality</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="n">webp</span><span class="o">::</span><span class="n">write_webp</span><span class="p">(</span><span class="n">bitmap</span><span class="p">,</span><span class="w"> </span><span class="s2">"bitmap.webp"</span><span class="p">,</span><span class="w"> </span><span class="n">quality</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">)</span></code></pre></figure>
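<p>JPEG has no alpha channel, so transparent regions of the RGBA bitmap need some background color. The following is a minimal base-R sketch (the helper name <code class="language-plaintext highlighter-rouge">flatten_alpha</code> is hypothetical) that composites the array onto a solid background, assuming the four channels hold straight (non-premultiplied) values between 0 and 1:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical helper: composite an RGBA array (height x width x 4) onto a
# solid background and return an RGB array (height x width x 3).
# Assumes straight (non-premultiplied) alpha with values in [0, 1].
flatten_alpha <- function(bitmap, background = c(1, 1, 1)) {
  stopifnot(length(dim(bitmap)) == 3, dim(bitmap)[3] == 4)
  alpha <- bitmap[, , 4]
  rgb <- bitmap[, , 1:3, drop = FALSE]
  for (i in 1:3) {
    # "over" compositing: foreground weighted by alpha, background by 1 - alpha
    rgb[, , i] <- rgb[, , i] * alpha + background[i] * (1 - alpha)
  }
  rgb
}
# usage: jpeg::writeJPEG(flatten_alpha(bitmap), "bitmap.jpg", quality = 1)</code></pre></figure>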
<p>The advantage of storing your plots in svg format is that they can later be rendered at any resolution and in any format <strong><em>without loss of quality</em></strong>! Each rendering function takes a <code class="language-plaintext highlighter-rouge">width</code> and <code class="language-plaintext highlighter-rouge">height</code> parameter. When neither width nor height is set, the bitmap resolution matches that of the input svg. When only one of width or height is specified, the image is scaled proportionally. When both width and height are specified, the image is stretched to the requested size. For example, suppose we need to render the plot in ultra HD so that it is crisp as toast when printed on a conference poster:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># render it into a bitmap array</span><span class="w">
</span><span class="n">bitmap</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rsvg</span><span class="p">(</span><span class="s2">"plot.svg"</span><span class="p">,</span><span class="w"> </span><span class="n">width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3840</span><span class="p">)</span><span class="w">
</span><span class="n">png</span><span class="o">::</span><span class="n">writePNG</span><span class="p">(</span><span class="n">bitmap</span><span class="p">,</span><span class="w"> </span><span class="s2">"plot_4k.png"</span><span class="p">,</span><span class="w"> </span><span class="n">dpi</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">144</span><span class="p">)</span><span class="w">
</span><span class="n">browseURL</span><span class="p">(</span><span class="s2">"plot_4k.png"</span><span class="p">)</span></code></pre></figure>
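<p>The sizing rules described above can be written down as a small base-R helper. This is a hypothetical sketch of the arithmetic only (<code class="language-plaintext highlighter-rouge">scaled_dims</code> is not an rsvg function, and rsvg's exact rounding may differ). For the 720 by 504 pixel plot from the first example, a width of 3840 pixels gives a proportional height of 504 × 3840 / 720 = 2688 pixels:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical helper mirroring the width/height rules described above
scaled_dims <- function(svg_width, svg_height, width = NULL, height = NULL) {
  if (is.null(width)) {
    # neither given: keep the native size of the svg
    if (is.null(height)) return(c(width = svg_width, height = svg_height))
    # only height given: scale width proportionally
    width <- round(svg_width * height / svg_height)
  } else if (is.null(height)) {
    # only width given: scale height proportionally
    height <- round(svg_height * width / svg_width)
  }
  # if both were given, the image is stretched to exactly this size
  c(width = width, height = height)
}
scaled_dims(720, 504, width = 3840)  # width 3840, height 2688</code></pre></figure>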
<p>Rather than handling the bitmap array in R yourself, you can also have rsvg render the image directly to various output formats, which is slightly faster.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># render straight to output format</span><span class="w">
</span><span class="n">rsvg_pdf</span><span class="p">(</span><span class="s2">"plot.svg"</span><span class="p">,</span><span class="w"> </span><span class="s2">"out.pdf"</span><span class="p">)</span><span class="w">
</span><span class="n">rsvg_ps</span><span class="p">(</span><span class="s2">"plot.svg"</span><span class="p">,</span><span class="w"> </span><span class="s2">"out.ps"</span><span class="p">)</span><span class="w">
</span><span class="n">rsvg_svg</span><span class="p">(</span><span class="s2">"plot.svg"</span><span class="p">,</span><span class="w"> </span><span class="s2">"out.svg"</span><span class="p">)</span><span class="w">
</span><span class="n">rsvg_png</span><span class="p">(</span><span class="s2">"plot.svg"</span><span class="p">,</span><span class="w"> </span><span class="s2">"out.png"</span><span class="p">)</span><span class="w">
</span><span class="n">rsvg_webp</span><span class="p">(</span><span class="s2">"plot.svg"</span><span class="p">,</span><span class="w"> </span><span class="s2">"out.webp"</span><span class="p">)</span></code></pre></figure>
<p>An added bonus is that librsvg not only does a really good job at rendering, it is also very fast: fast enough to render the svg <a href="http://dev.w3.org/SVG/tools/svgweb/samples/svg-files/tiger.svg">tiger</a> on the fly at 10 to 20 fps!</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">download.file</span><span class="p">(</span><span class="s2">"http://dev.w3.org/SVG/tools/svgweb/samples/svg-files/tiger.svg"</span><span class="p">,</span><span class="w"> </span><span class="s2">"tiger.svg"</span><span class="p">)</span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">bin</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rsvg_raw</span><span class="p">(</span><span class="s2">"tiger.svg"</span><span class="p">))</span><span class="w">
</span><span class="c1"># user system elapsed</span><span class="w">
</span><span class="c1"># 0.048 0.003 0.057</span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">rsvg_webp</span><span class="p">(</span><span class="s2">"tiger.svg"</span><span class="p">,</span><span class="w"> </span><span class="s2">"tiger.webp"</span><span class="p">))</span><span class="w">
</span><span class="c1"># user system elapsed</span><span class="w">
</span><span class="c1"># 0.097 0.006 0.115</span></code></pre></figure>
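<p>For reference, the 0.057 seconds measured above works out to about 1 / 0.057 ≈ 17.5 frames per second. A single <code class="language-plaintext highlighter-rouge">system.time</code> call is a noisy measurement, though; a steadier estimate repeats the rendering and averages. The helper below is a hypothetical base-R sketch, and the commented usage assumes rsvg is loaded and <code class="language-plaintext highlighter-rouge">tiger.svg</code> was downloaded as above:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical helper: call render() n times and return the average frames per second
frames_per_second <- function(render, n = 20) {
  elapsed <- system.time(for (i in seq_len(n)) render())[["elapsed"]]
  n / elapsed
}
# usage (assumes the rsvg package and tiger.svg from above):
# frames_per_second(function() rsvg_raw("tiger.svg"))</code></pre></figure>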
<p>Note that <code class="language-plaintext highlighter-rouge">webp</code> is a new high-quality image format from Google, which I will talk about in <a href="../webp-release">another post</a>.</p>
Compression Benchmarks: brotli, gzip, xz, bz22015-11-27T00:00:00+00:00https://www.opencpu.org/posts/brotli-benchmarks
<a href="https://www.opencpu.org/posts/brotli-benchmarks"><img alt="opencpu logo" src="https://www.opencpu.org/images/brotli1.png"></a>
<p>Brotli is a new compression algorithm optimized for the web, in particular for small text documents. Brotli decompression is at least as fast as gzip decompression, while the compression ratio is significantly better. The price we pay is that compression is much slower than with gzip. Brotli is therefore most effective for serving static content such as fonts and html pages.</p>
<p>The <a href="https://cran.r-project.org/web/packages/brotli/index.html">brotli</a> package is now on CRAN and supports both compression and decompression of the brotli format. Let’s benchmark the available compression formats in R using some example text data from the <a href="https://raw.githubusercontent.com/wch/r-source/trunk/COPYING">COPYING</a> file.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">brotli</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">
</span><span class="c1"># Example data</span><span class="w">
</span><span class="n">myfile</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">file.path</span><span class="p">(</span><span class="n">R.home</span><span class="p">(),</span><span class="w"> </span><span class="s2">"COPYING"</span><span class="p">)</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">readBin</span><span class="p">(</span><span class="n">myfile</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">(),</span><span class="w"> </span><span class="n">file.info</span><span class="p">(</span><span class="n">myfile</span><span class="p">)</span><span class="o">$</span><span class="n">size</span><span class="p">)</span><span class="w">
</span><span class="c1"># The usual suspects</span><span class="w">
</span><span class="n">y1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">memCompress</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="s2">"gzip"</span><span class="p">)</span><span class="w">
</span><span class="n">y2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">memCompress</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="s2">"bzip2"</span><span class="p">)</span><span class="w">
</span><span class="n">y3</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">memCompress</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="s2">"xz"</span><span class="p">)</span><span class="w">
</span><span class="n">y4</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">brotli_compress</span><span class="p">(</span><span class="n">x</span><span class="p">)</span></code></pre></figure>
<p>Confirm that all algorithms are indeed lossless:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">stopifnot</span><span class="p">(</span><span class="n">identical</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">memDecompress</span><span class="p">(</span><span class="n">y1</span><span class="p">,</span><span class="w"> </span><span class="s2">"gzip"</span><span class="p">)))</span><span class="w">
</span><span class="n">stopifnot</span><span class="p">(</span><span class="n">identical</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">memDecompress</span><span class="p">(</span><span class="n">y2</span><span class="p">,</span><span class="w"> </span><span class="s2">"bzip2"</span><span class="p">)))</span><span class="w">
</span><span class="n">stopifnot</span><span class="p">(</span><span class="n">identical</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">memDecompress</span><span class="p">(</span><span class="n">y3</span><span class="p">,</span><span class="w"> </span><span class="s2">"xz"</span><span class="p">)))</span><span class="w">
</span><span class="n">stopifnot</span><span class="p">(</span><span class="n">identical</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">brotli_decompress</span><span class="p">(</span><span class="n">y4</span><span class="p">)))</span></code></pre></figure>
<h2 id="compression-ratio">Compression ratio</h2>
<p>Comparing compression ratios, we can see that Brotli significantly outperforms the competition for this example.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Combine data</span><span class="w">
</span><span class="n">alldata</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="w"> </span><span class="p">(</span><span class="w">
</span><span class="n">algo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"gzip"</span><span class="p">,</span><span class="w"> </span><span class="s2">"bzip2"</span><span class="p">,</span><span class="w"> </span><span class="s2">"xz (lzma2)"</span><span class="p">,</span><span class="w"> </span><span class="s2">"brotli"</span><span class="p">),</span><span class="w">
</span><span class="n">ratio</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="nf">length</span><span class="p">(</span><span class="n">y1</span><span class="p">),</span><span class="w"> </span><span class="nf">length</span><span class="p">(</span><span class="n">y2</span><span class="p">),</span><span class="w"> </span><span class="nf">length</span><span class="p">(</span><span class="n">y3</span><span class="p">),</span><span class="w"> </span><span class="nf">length</span><span class="p">(</span><span class="n">y4</span><span class="p">))</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="nf">length</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">alldata</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">algo</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">algo</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ratio</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_bar</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">,</span><span class="w"> </span><span class="n">stat</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"identity"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">xlab</span><span class="p">(</span><span class="s2">""</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">ylab</span><span class="p">(</span><span class="s2">"Compressed ratio (less is better)"</span><span class="p">)</span></code></pre></figure>
<p><img src="../../images/brotli1.png" alt="brotli compression ratio" /></p>
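<p>The round-trip check and the ratio computation above can also be folded into a single loop over the compression methods that base R supports. This sketch leaves out brotli so that it runs without extra packages, and the helper name <code class="language-plaintext highlighter-rouge">compare_methods</code> is hypothetical:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical helper: compress x with each base R method, verify that
# decompression restores the input, and return compressed size / original size
compare_methods <- function(x, methods = c("gzip", "bzip2", "xz")) {
  ratio <- vapply(methods, function(m) {
    y <- memCompress(x, m)
    stopifnot(identical(memDecompress(y, m), x))  # lossless round trip
    length(y) / length(x)
  }, numeric(1))
  data.frame(algo = methods, ratio = unname(ratio))
}
# usage with the COPYING data read above:
# compare_methods(x)</code></pre></figure>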
<h2 id="decompression-speed">Decompression speed</h2>
<p>Perhaps the most important performance dimension for internet formats is decompression speed. Clients should be able to decompress quickly, even with limited resources, such as in browsers and on mobile devices.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">microbenchmark</span><span class="p">)</span><span class="w">
</span><span class="n">bm</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">microbenchmark</span><span class="p">(</span><span class="w">
</span><span class="n">memDecompress</span><span class="p">(</span><span class="n">y1</span><span class="p">,</span><span class="w"> </span><span class="s2">"gzip"</span><span class="p">),</span><span class="w">
</span><span class="n">memDecompress</span><span class="p">(</span><span class="n">y2</span><span class="p">,</span><span class="w"> </span><span class="s2">"bzip2"</span><span class="p">),</span><span class="w">
</span><span class="n">memDecompress</span><span class="p">(</span><span class="n">y3</span><span class="p">,</span><span class="w"> </span><span class="s2">"xz"</span><span class="p">),</span><span class="w">
</span><span class="n">brotli_decompress</span><span class="p">(</span><span class="n">y4</span><span class="p">),</span><span class="w">
</span><span class="n">times</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1000</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">alldata</span><span class="o">$</span><span class="n">decompression</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">summary</span><span class="p">(</span><span class="n">bm</span><span class="p">)</span><span class="o">$</span><span class="n">median</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">alldata</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">algo</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">algo</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">decompression</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_bar</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">,</span><span class="w"> </span><span class="n">stat</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"identity"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">xlab</span><span class="p">(</span><span class="s2">""</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">ylab</span><span class="p">(</span><span class="s2">"Decompression time (less is better)"</span><span class="p">)</span></code></pre></figure>
<p><img src="../../images/brotli2.png" alt="brotli decompression speed" /></p>
<p>We see that brotli is very similar to gzip in decompression speed. We also see why bzip2 and xz have never replaced gzip as the standard compression method on the internet, even though they have better compression ratios: they are several times slower to decompress.</p>
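<p>To put a number on "several times slower", the median times can be expressed relative to gzip. A minimal sketch, assuming the <code class="language-plaintext highlighter-rouge">alldata</code> data frame built above (the helper name <code class="language-plaintext highlighter-rouge">relative_to</code> is hypothetical):</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical helper: divide each value by the value of the baseline algorithm
relative_to <- function(values, labels, baseline = "gzip") {
  stopifnot(length(values) == length(labels), sum(labels == baseline) == 1)
  setNames(values / values[labels == baseline], labels)
}
# usage: relative_to(alldata$decompression, alldata$algo)</code></pre></figure>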
<h2 id="compression-speed">Compression speed</h2>
<p>So far Brotli has shown the best compression ratio, with decompression performance comparable to gzip. But there is no such thing as a free pastry in Switzerland. Here is the caveat: compressing data with brotli is complex and slow:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">microbenchmark</span><span class="p">)</span><span class="w">
</span><span class="n">bm</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">microbenchmark</span><span class="p">(</span><span class="w">
</span><span class="n">memCompress</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="s2">"gzip"</span><span class="p">),</span><span class="w">
</span><span class="n">memCompress</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="s2">"bzip2"</span><span class="p">),</span><span class="w">
</span><span class="n">memCompress</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="s2">"xz"</span><span class="p">),</span><span class="w">
</span><span class="n">brotli_compress</span><span class="p">(</span><span class="n">x</span><span class="p">),</span><span class="w">
</span><span class="n">times</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">alldata</span><span class="o">$</span><span class="n">compression</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">summary</span><span class="p">(</span><span class="n">bm</span><span class="p">)</span><span class="o">$</span><span class="n">median</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">alldata</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">algo</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">algo</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">compression</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_bar</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">,</span><span class="w"> </span><span class="n">stat</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"identity"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">xlab</span><span class="p">(</span><span class="s2">""</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">ylab</span><span class="p">(</span><span class="s2">"Compression time (less is better)"</span><span class="p">)</span></code></pre></figure>
<p><img src="../../images/brotli3.png" alt="brotli compression speed" /></p>
<p>Hence we can conclude that Brotli is mostly nice for clients, with decompression performance comparable to gzip while significantly improving the compression ratio. These are powerful properties for serving static content such as fonts and html pages.</p>
<p>However, compression, at least in the current implementation, is considerably slower than with gzip, which makes Brotli unsuitable for on-the-fly compression in http servers or other data streams.</p>
Sodium: A Modern and Easy-to-Use Crypto Library2015-10-19T00:00:00+00:00https://www.opencpu.org/posts/sodium-0-2
<a href="https://www.opencpu.org/posts/sodium-0-2"><img alt="opencpu logo" src="https://www.opencpu.org/images/securitycat.jpg"></a>
<p>This week a new package called <a href="https://cran.r-project.org/web/packages/sodium/index.html">sodium</a> was released on CRAN. This package implements bindings to <a href="https://github.com/jedisct1/libsodium#readme">libsodium</a>: a modern, easy-to-use software library for encryption, decryption, signatures, password hashing and more.</p>
<p>Libsodium is actually a portable fork of Daniel Bernstein’s famous <a href="http://nacl.cr.yp.to/">NaCl</a> crypto library, which provides the core operations needed to build higher-level cryptographic tools. It is not intended for implementing standardized protocols such as TLS, SSH or GPG; you still need something like OpenSSL for that. Sodium only supports a limited set of state-of-the-art elliptic curve methods, resulting in a simple but very powerful toolkit for building secure applications.</p>
<h3 id="getting-started-with-sodium">Getting started with Sodium</h3>
<p>The package includes two nice vignettes to get you started:</p>
<ul>
<li><a href="https://cran.r-project.org/web/packages/sodium/vignettes/intro.html">Introduction to Sodium for R</a>: basic hands-on introduction to the sodium R package. Gives an overview of the available encryption methods and examples of how to use them</li>
<li><a href="https://cran.r-project.org/web/packages/sodium/vignettes/crypto101.html">How does cryptography work</a>: a conceptual intro on cryptographic methods with examples from Sodium</li>
</ul>
<p>If you have always wanted to understand how encryption works without getting a degree in computer science, check out the latter. The basic techniques are easy to understand because cryptographers have done a great job of abstracting the mathematical details into simple hash functions and Diffie-Hellman functions.</p>
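<p>To make the Diffie-Hellman idea concrete, here is a toy sketch in base R that uses modular arithmetic with deliberately tiny, insecure numbers (p = 23, g = 5). It only illustrates the concept: sodium itself uses elliptic curve methods, and the function and variable names below are hypothetical, not part of the sodium API.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Toy Diffie-Hellman key exchange with tiny numbers. NOT secure; illustration only.
# Modular exponentiation by repeated squaring (small integers only)
mod_pow <- function(base, exponent, modulus) {
  result <- 1
  base <- base %% modulus
  while (exponent > 0) {
    if (exponent %% 2 == 1) result <- (result * base) %% modulus
    base <- (base * base) %% modulus
    exponent <- exponent %/% 2
  }
  result
}
p <- 23; g <- 5        # public parameters, known to everyone
alice_secret <- 6      # private keys, never shared
bob_secret <- 15
alice_public <- mod_pow(g, alice_secret, p)  # sent to Bob
bob_public <- mod_pow(g, bob_secret, p)      # sent to Alice
# each side combines its own secret with the other side's public value
shared_alice <- mod_pow(bob_public, alice_secret, p)
shared_bob <- mod_pow(alice_public, bob_secret, p)
stopifnot(shared_alice == shared_bob)  # both arrive at the same shared secret</code></pre></figure>
<p>An eavesdropper sees only <code class="language-plaintext highlighter-rouge">p</code>, <code class="language-plaintext highlighter-rouge">g</code> and the two public values; with realistic key sizes, recovering the shared secret from those is computationally infeasible.</p>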
<h3 id="installing-sodium">Installing Sodium</h3>
<p>On Windows or OS X simply install the binary packages from CRAN:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">install.packages</span><span class="p">(</span><span class="s2">"sodium"</span><span class="p">)</span></code></pre></figure>
<p>On Linux you need the libsodium library and headers, packaged as <code class="language-plaintext highlighter-rouge">libsodium-dev</code> on Debian/Ubuntu and <code class="language-plaintext highlighter-rouge">libsodium-devel</code> on Fedora/EPEL. Because this library is relatively young, it is only available for recent versions of these distributions. For Ubuntu 12.04 and 14.04 there are backports available from <a href="https://launchpad.net/~chris-lea/+archive/ubuntu/libsodium">Launchpad</a>:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">sudo add-apt-repository ppa:chris-lea/libsodium
sudo apt-get update
sudo apt-get install libsodium-dev</code></pre></figure>
<p>On CentOS/RHEL you need to <a href="https://fedoraproject.org/wiki/EPEL/FAQ#How_can_I_install_the_packages_from_the_EPEL_software_repository.3F">activate EPEL</a> before installing <code class="language-plaintext highlighter-rouge">libsodium-devel</code>.</p>
Curl 0.9.2: tweaks and proxies for windows2015-08-10T00:00:00+00:00https://www.opencpu.org/posts/curl-release-0-9-2
<a href="https://www.opencpu.org/posts/curl-release-0-9-2"><img alt="opencpu logo" src="https://www.opencpu.org/images/curllogo.jpg"></a>
<p>Version 0.9.2 of <a href="https://cran.r-project.org/package=curl">curl</a> has been released to CRAN. The curl package implements a modern and flexible web client for R and is the foundation for the popular <a href="https://cran.r-project.org/package=httr">httr</a> package. This update mostly includes tweaks for Windows.</p>
<h3 id="faster-downloading">Faster downloading</h3>
<p>Alex Deng from Microsoft diagnosed a problem with <code class="language-plaintext highlighter-rouge">curl_fetch_memory</code> (which is used by httr) being slower than expected on Windows. After some testing it turned out that the implementation of <code class="language-plaintext highlighter-rouge">realloc</code> (used to grow the buffer that holds downloaded data) is <a href="https://blog.kowalczyk.info/article/2be/realloc-on-Windows-vs-Linux.html">poorly optimized</a> on Windows. It basically copies the entire memory block every time the size is increased, which results in a lot of copying for large downloads.</p>
<p>The new release includes a <a href="https://github.com/cran/curl/blob/0.9.2/src/utils.c#L108">tweak</a> to increase the buffer size exponentially, which solves the problem. This fix is wrapped in an <code class="language-plaintext highlighter-rouge">#ifdef _WIN32</code> because usually the operating system does a better job at optimizing memory allocation than the programmer. But Windows needs a little help sometimes.</p>
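<p>The effect of exponential growth can be illustrated with a small simulation in base R. This is a sketch, not the code from the curl package: it assumes the worst case described above, where every reallocation copies all data received so far, and it assumes doubling as the growth factor. The helper name <code class="language-plaintext highlighter-rouge">bytes_copied</code> is hypothetical.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical simulation: total bytes copied while receiving `total` bytes in
# chunks of `chunk` bytes, if every reallocation copies all data held so far.
bytes_copied <- function(total, chunk, strategy = c("exact", "double")) {
  strategy <- match.arg(strategy)
  used <- 0      # bytes of downloaded data currently held
  capacity <- 0  # bytes currently allocated
  copied <- 0    # bytes moved by reallocations
  while (total > used) {
    needed <- min(used + chunk, total)
    if (needed > capacity) {
      copied <- copied + used  # worst case: the whole block is copied
      capacity <- if (strategy == "exact") needed else max(needed, 2 * capacity)
    }
    used <- needed
  }
  copied
}
bytes_copied(1e4, 1, "exact")   # 49995000: grows quadratically with the number of chunks
bytes_copied(1e4, 1, "double")  # 16383: stays below twice the final size</code></pre></figure>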
<h3 id="updated-libcurl">Updated libcurl</h3>
<p>This release uses the latest build of <a href="https://github.com/rwinlib/libcurl">libcurl</a> and its dependencies from the <a href="https://github.com/rwinlib">rwinlib</a> repository. These include:</p>
<ul>
<li>libcurl 7.43.0</li>
<li>openssl 1.0.2d</li>
<li>libssh2 1.6.0</li>
<li>libiconv 1.14-5</li>
<li>libidn 1.31-1</li>
</ul>
<p>The libcurl <a href="http://curl.haxx.se/changes.html">changelog</a> lists the new features and bug fixes from this release.</p>
<h3 id="working-with-proxies">Working with proxies</h3>
<p>The new version includes two functions specifically for Windows to look up system proxy settings. These can be used to configure curl to use the same proxy server, which is required to connect to the internet on some networks.</p>
<p>The <code class="language-plaintext highlighter-rouge">ie_proxy_info</code> function looks up your current proxy settings as configured in Internet Explorer. In the case of a dynamic proxy, the <code class="language-plaintext highlighter-rouge">ie_get_proxy_for_url</code> function shows whether a proxy should be used to connect to a particular URL, and if so, which one. If you have an “automatic configuration script” this involves downloading and executing a <a href="https://en.wikipedia.org/wiki/Proxy_auto-config">PAC file</a>.</p>
<p>You <em>should</em> be able to use the address returned by <code class="language-plaintext highlighter-rouge">ie_get_proxy_for_url</code> as the <a href="http://curl.haxx.se/libcurl/c/CURLOPT_PROXY.html"><code class="language-plaintext highlighter-rouge">proxy</code> option</a> in the curl handle to automatically use the correct proxy server for a given URL. However, I do not have access to a network with a proxy server, so I cannot actually test this feature. If you are on such a network, please help test it.</p>
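<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Windows only: requires the curl package</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">curl</span><span class="p">)</span><span class="w">
</span><span class="c1"># Look up the proxy for this URL and use it in a new handle</span><span class="w">
</span><span class="n">curl_proxy</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">url</span><span class="p">,</span><span class="w"> </span><span class="n">verbose</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">){</span><span class="w">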
</span><span class="n">proxy</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ie_get_proxy_for_url</span><span class="p">(</span><span class="n">url</span><span class="p">)</span><span class="w">
</span><span class="n">h</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">new_handle</span><span class="p">(</span><span class="n">verbose</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">verbose</span><span class="p">,</span><span class="w"> </span><span class="n">proxy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">proxy</span><span class="p">)</span><span class="w">
</span><span class="n">curl</span><span class="p">(</span><span class="n">url</span><span class="p">,</span><span class="w"> </span><span class="n">handle</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">h</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">con</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">curl_proxy</span><span class="p">(</span><span class="s2">"https://httpbin.org/get"</span><span class="p">)</span><span class="w">
</span><span class="n">readLines</span><span class="p">(</span><span class="n">con</span><span class="p">)</span></code></pre></figure>
<p>I also created a <a href="https://gist.github.com/jeroenooms/1250e73f93acfffb0e9a">gist</a> with some more details to test this feature. If it doesn’t work immediately, try fiddling around with some of the other libcurl <a href="http://curl.haxx.se/libcurl/c/curl_easy_setopt.html">proxy options</a> and let me know what works!</p>
Mongolite 0.5: authentication and iterators 2015-07-29T00:00:00+00:00 https://www.opencpu.org/posts/mongolite-release-0-5
<a href="https://www.opencpu.org/posts/mongolite-release-0-5"><img alt="opencpu logo" src="https://www.opencpu.org/images/mongo.png"></a>
<p>A new version of the <a href="http://cran.r-project.org/web/packages/mongolite/index.html">mongolite</a> package has appeared on CRAN. Mongolite builds on <a href="http://cran.r-project.org/web/packages/jsonlite/index.html">jsonlite</a> to provide a simple, high-performance MongoDB client for R, which makes storing small or large data in a database as easy as converting it to/from JSON. Have a look at the <a href="http://cran.r-project.org/web/packages/mongolite/vignettes/intro.html">vignette</a> or <a href="http://bit.ly/mongo-slides">useR2015 slides</a> to get started with inserting, json queries, aggregation and map-reduce.</p>
<h3 id="authentication-and-mongolabs">Authentication and mongolabs</h3>
<p>This release fixes an issue with the authentication mechanism that was reported by Dean Attali. The new version should properly authenticate to secured mongodb servers.</p>
<p>Try running the code below to grab some flights data from my mongolabs server:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># load the package</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">mongolite</span><span class="p">)</span><span class="w">
</span><span class="n">stopifnot</span><span class="p">(</span><span class="n">packageVersion</span><span class="p">(</span><span class="s2">"mongolite"</span><span class="p">)</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="s2">"0.5"</span><span class="p">)</span><span class="w">
</span><span class="c1"># Connect to the 'flights' dataset</span><span class="w">
</span><span class="n">flights</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">mongo</span><span class="p">(</span><span class="s2">"flights"</span><span class="p">,</span><span class="w"> </span><span class="n">url</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"mongodb://readonly:test@ds043942.mongolab.com:43942/jeroen_test"</span><span class="p">)</span><span class="w">
</span><span class="c1"># Count data for query</span><span class="w">
</span><span class="n">flights</span><span class="o">$</span><span class="n">count</span><span class="p">(</span><span class="s1">'{"day":1,"month":1}'</span><span class="p">)</span><span class="w">
</span><span class="c1"># Get data for query</span><span class="w">
</span><span class="n">jan1_flights</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">flights</span><span class="o">$</span><span class="n">find</span><span class="p">(</span><span class="s1">'{"day":1,"month":1}'</span><span class="p">)</span></code></pre></figure>
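<p>The <code class="language-plaintext highlighter-rouge">url</code> passed to <code class="language-plaintext highlighter-rouge">mongo()</code> above is a standard MongoDB connection string of the form <code class="language-plaintext highlighter-rouge">mongodb://user:password@host:port/database</code>, with the credentials of the read-only account embedded in it. When you assemble such a string yourself, reserved characters in the user name or password (for example <code class="language-plaintext highlighter-rouge">@</code>, <code class="language-plaintext highlighter-rouge">:</code> or <code class="language-plaintext highlighter-rouge">/</code>) should be percent-encoded, or they will be read as part of the URL structure. A minimal sketch in base R; <code class="language-plaintext highlighter-rouge">mongo_url</code> is a hypothetical helper, not part of mongolite, and <code class="language-plaintext highlighter-rouge">someuser</code>, <code class="language-plaintext highlighter-rouge">localhost</code> and <code class="language-plaintext highlighter-rouge">mydb</code> are placeholders.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical helper, not part of mongolite: build a connection string of the
# form mongodb://user:password@host:port/database. The credentials are
# percent-encoded so that characters such as '@', ':' or '/' in a password
# are not mistaken for URL separators.
mongo_url <- function(user, password, host, port, db) {
  enc <- function(x) URLencode(x, reserved = TRUE)
  sprintf("mongodb://%s:%s@%s:%d/%s",
          enc(user), enc(password), host, as.integer(port), db)
}

# Reproduces the read-only URL used above
mongo_url("readonly", "test", "ds043942.mongolab.com", 43942, "jeroen_test")

# A password with reserved characters gets escaped
mongo_url("someuser", "p@ss:w/rd", "localhost", 27017, "mydb")
# "mongodb://someuser:p%40ss%3Aw%2Frd@localhost:27017/mydb"</code></pre></figure>
<p>The result can then be passed as the <code class="language-plaintext highlighter-rouge">url</code> argument of <code class="language-plaintext highlighter-rouge">mongo()</code>.</p>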
<p>While debugging this, I found that <a href="https://mongolab.com/">mongolab</a> is actually very cool. You can sign up for your own free (up to 500MB) mongodb server and easily create data collections with one or more read-only and/or read-write user accounts. This provides a pretty neat way to publish some data (read-only) or sync and collaborate with colleagues (read-write).</p>
<h3 id="iterators">Iterators</h3>
<p>Another feature request from some early adopters was to add support for iterators. Usually you want to use the <code class="language-plaintext highlighter-rouge">mongo$find()</code> method, which automatically converts data from a query into a data frame. However, sometimes you need finer control over the individual documents.</p>
<p>The new version adds a <code class="language-plaintext highlighter-rouge">mongo$iterate()</code> method to manually iterate over the individual records from a query without any automatic simplification. Using the same example query as above:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Connect to the 'flights' dataset</span><span class="w">
</span><span class="n">flights</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">mongo</span><span class="p">(</span><span class="s2">"flights"</span><span class="p">,</span><span class="w"> </span><span class="n">url</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"mongodb://readonly:test@ds043942.mongolab.com:43942/jeroen_test"</span><span class="p">)</span><span class="w">
</span><span class="c1"># Create iterator</span><span class="w">
</span><span class="n">iter</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">flights</span><span class="o">$</span><span class="n">iterate</span><span class="p">(</span><span class="s1">'{"day":1,"month":1}'</span><span class="p">)</span><span class="w">
</span><span class="c1"># Iterate over individual records</span><span class="w">
</span><span class="k">while</span><span class="p">(</span><span class="o">!</span><span class="nf">is.null</span><span class="p">(</span><span class="n">doc</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">iter</span><span class="o">$</span><span class="n">one</span><span class="p">())){</span><span class="w">
</span><span class="c1"># do something with the row here</span><span class="w">
</span><span class="n">print</span><span class="p">(</span><span class="n">doc</span><span class="p">)</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
<p>Currently the iterator has 3 methods: <code class="language-plaintext highlighter-rouge">one()</code>, <code class="language-plaintext highlighter-rouge">batch(n = 1000)</code> and <code class="language-plaintext highlighter-rouge">page(n = 1000)</code>. The <code class="language-plaintext highlighter-rouge">iter$one</code> method pops one document from the iterator (it would be called <code class="language-plaintext highlighter-rouge">iter$next()</code> if that were not a reserved keyword in R). Both <code class="language-plaintext highlighter-rouge">iter$batch(n)</code> and <code class="language-plaintext highlighter-rouge">iter$page(n)</code> pop n documents at once. The difference is that <code class="language-plaintext highlighter-rouge">iter$batch</code> returns a list of at most n documents, whereas <code class="language-plaintext highlighter-rouge">iter$page</code> returns a data frame with at most n rows.</p>
<p>Once the iterator is exhausted, its methods will only return <code class="language-plaintext highlighter-rouge">NULL</code>.</p>
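<p>To make the semantics of <code class="language-plaintext highlighter-rouge">one()</code>, <code class="language-plaintext highlighter-rouge">batch(n)</code> and <code class="language-plaintext highlighter-rouge">page(n)</code> concrete without a live server, here is a toy, in-memory iterator that follows the behaviour described above. It is a hypothetical stand-in written in base R, not mongolite's implementation; <code class="language-plaintext highlighter-rouge">toy_iterator</code> is an illustrative name and the documents are plain named lists.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical in-memory stand-in for a mongolite iterator (not the real one)
toy_iterator <- function(docs) {
  pos <- 0
  # Pop up to n documents; NULL once everything has been consumed
  take <- function(n) {
    if (pos >= length(docs)) return(NULL)
    idx <- seq.int(pos + 1, min(pos + n, length(docs)))
    pos <<- max(idx)
    docs[idx]
  }
  list(
    one   = function() { x <- take(1); if (is.null(x)) NULL else x[[1]] },
    batch = function(n = 1000) take(n),                 # list of at most n docs
    page  = function(n = 1000) {                        # data frame, at most n rows
      x <- take(n)
      if (is.null(x)) NULL else do.call(rbind, lapply(x, as.data.frame))
    }
  )
}

docs <- lapply(1:5, function(i) list(id = i, month = 1L, day = 1L))
iter <- toy_iterator(docs)
iter$one()$id          # 1
length(iter$batch(2))  # 2
nrow(iter$page(10))    # 2: only two documents were left
is.null(iter$one())    # TRUE: the iterator is exhausted</code></pre></figure>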
OpenCPU release 1.5 2015-07-05T00:00:00+00:00 https://www.opencpu.org/posts/opencpu-1-5
<a href="https://www.opencpu.org/posts/opencpu-1-5"><img alt="opencpu logo" src="https://www.opencpu.org/images/stockplot.png"></a>
<p>Following a few weeks of testing, OpenCPU 1.5 has been released. OpenCPU is a production-ready framework for embedded statistical computing with R. The system provides a neat <a href="https://www.opencpu.org/api.html">API</a> for remotely calling R functions over HTTP via e.g. JSON or <a href="https://gist.github.com/jeroenooms/1984c784a6eff71f508f">Protocol Buffers</a>. The OpenCPU server implementation is very stable and has been thoroughly tested. It runs on all major Linux distributions and plays nicely with the RStudio server IDE (<a href="https://youtu.be/kAfVWxiZ-Cc?t=847">demo</a>).</p>
<p>Like shiny, OpenCPU has a single-user/development edition that runs within the interactive R session, and a multi-user (cloud) server for deployments on Linux. Unlike shiny, however, the cloud server comes at no extra cost. On the contrary: you are encouraged to take advantage of the cloud server, which is much faster and includes cool features like user libraries, concurrent sessions, continuous integration, customizable security policies, etc.</p>
<h3 id="new-in-opencpu-15">New in OpenCPU 1.5</h3>
<p>The OpenCPU API itself has not changed from the 1.4 branch, but the entire underlying stack has been upgraded, hence the version bump. The server now builds on:</p>
<ul>
<li>R 3.2.1</li>
<li>stringi 0.5-5</li>
<li>jsonlite 0.9.16</li>
<li>devtools 1.8.0</li>
<li>RStudio 0.99 (optional)</li>
</ul>
<p>Navigate to <a href="https://cloud.opencpu.org/ocpu/info"><code class="language-plaintext highlighter-rouge">/ocpu/info</code></a> on your OpenCPU server to inspect the exact versions of all packages used by the system.</p>
<p>In addition to an upgraded package library, this version includes many small tweaks for the deb/rpm installation packages and docker files. Red Hat-based distributions like Fedora and CentOS are now automatically configured with the required SELinux policies.</p>
<h3 id="installation-and-upgrading">Installation and upgrading</h3>
<p>The <a href="https://www.opencpu.org/download.html">download</a> page has instructions for installing the opencpu server on various distributions, either from source or using precompiled binaries. To upgrade an existing installation of opencpu on Ubuntu, simply run:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">sudo </span>add-apt-repository ppa:opencpu/opencpu-1.5
<span class="nb">sudo </span>apt-get update
<span class="nb">sudo </span>apt-get dist-upgrade</code></pre></figure>
<p>Note that this will also upgrade the version of R to 3.2.1 (if you have not already done so) which might require that you reinstall some of your R packages.</p>
<h3 id="getting-started">Getting started</h3>
<p>For those completely new to OpenCPU there are several resources to get started. The <a href="https://youtu.be/kAfVWxiZ-Cc">presentation</a> from last year’s useR conference gives a broad overview of the system including some basic demos. The <a href="https://www.opencpu.org/apps.html">example apps</a> and <a href="http://jsfiddle.net/user/opencpu/fiddles/">jsfiddle scripts</a> show how to use the <a href="https://www.opencpu.org/jslib.html">opencpu.js</a> JavaScript client. The <a href="http://opencpu.github.io/server-manual/opencpu-server.pdf">server manual</a> contains documentation on configuring your opencpu cloud server (although installation should work out of the box).</p>
<p>Finally this <a href="http://arxiv.org/abs/1406.4806">paper</a> from my thesis describes more generally the challenges of embedded scientific computing, and the benefits (both technical and human) of decoupling your statistical computing from your front-end or application layer.</p>
<h3 id="the-public-demo-server">The public demo server</h3>
<p>To deploy your OpenCPU apps on the public server, simply push your R package to Github and configure the <a href="https://www.opencpu.org/api.html#api-ci">webhook</a> in your repository. Whenever you push an update to Github the package will be reinstalled on the server and can directly be used remotely by anyone on the internet. You can either use the full url or the <code class="language-plaintext highlighter-rouge">ocpu.io</code> shorthand url:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">https://cloud.opencpu.org/ocpu/github/{username}/{package}/</code></li>
<li><code class="language-plaintext highlighter-rouge">https://{username}.ocpu.io/{package}/</code></li>
</ul>
<p>These urls are fully equivalent. Simply replace <code class="language-plaintext highlighter-rouge">{username}</code> with your github username, and <code class="language-plaintext highlighter-rouge">{package}</code> with your package name. Note that the package name must be identical to the github repository name (as is usually the case).</p>
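<p>Because the two URL forms only differ in where the user name and package name go, they are easy to generate. A minimal sketch; <code class="language-plaintext highlighter-rouge">ocpu_urls</code> is a hypothetical helper, not part of any OpenCPU package, and the optional <code class="language-plaintext highlighter-rouge">R/{function}</code> suffix assumes the usual layout of the OpenCPU HTTP API, where the R objects of a package (including its functions) live under <code class="language-plaintext highlighter-rouge">R/</code> and functions are called with an HTTP POST. The names <code class="language-plaintext highlighter-rouge">someuser</code>, <code class="language-plaintext highlighter-rouge">mypackage</code> and <code class="language-plaintext highlighter-rouge">myfun</code> are placeholders.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical helper: the two equivalent base URLs of a package deployed
# from GitHub, optionally pointing at one of its R functions.
ocpu_urls <- function(username, package, fun = NULL) {
  urls <- c(
    full  = sprintf("https://cloud.opencpu.org/ocpu/github/%s/%s/", username, package),
    short = sprintf("https://%s.ocpu.io/%s/", username, package)
  )
  # Assumption: functions live under R/{function} (POST to call them).
  # Assigning via urls[] keeps the names, which paste0() would drop.
  if (!is.null(fun)) urls[] <- paste0(urls, "R/", fun)
  urls
}

ocpu_urls("someuser", "mypackage")
ocpu_urls("someuser", "mypackage", "myfun")</code></pre></figure>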
<h3 id="on-writing-packages">On writing packages</h3>
<p>One prerequisite for using OpenCPU is knowing how to create an R package. There is no way around this; packages are the natural container format for shipping and deploying code/data/manuals in R, and the OpenCPU API assumes this format. Luckily, writing R packages is super easy these days and can be done in less than <a href="https://youtu.be/kAfVWxiZ-Cc?t=847">10 seconds</a> using, for example, RStudio.</p>
<p>The good thing is that once you have passed this little hurdle, the full power and flexibility of R and its packaging system become available to your applications and APIs. Hadley’s latest <a href="http://r-pkgs.had.co.nz/">book</a> on writing R packages gives a nice overview of the R packaging system, and the OpenCPU API provides an easy HTTP interface to all of these features.</p>
Secure password hashing in R with bcrypt 2015-06-19T00:00:00+00:00 https://www.opencpu.org/posts/bcrypt-release
<a href="https://www.opencpu.org/posts/bcrypt-release"><img alt="opencpu logo" src="https://www.opencpu.org/images/openbsd.gif"></a>
<p>The new package <a href="http://cran.r-project.org/web/packages/bcrypt/">bcrypt</a> provides an R interface to the OpenBSD ‘blowfish’ password hashing algorithm described in <a href="http://www.openbsd.org/papers/bcrypt-paper.pdf"><em>A Future-Adaptable Password Scheme</em></a> by <a href="http://research.google.com/pubs/author1.html">Niels Provos</a>. The implementation is derived from the <a href="https://pypi.python.org/pypi/py-bcrypt/">py-bcrypt</a> module for Python which is a wrapper for the OpenBSD implementation.</p>
<p>Bcrypt is used for secure password hashing. The main difference from regular digest algorithms such as md5 / sha256 is that the bcrypt algorithm is specifically designed to be cpu intensive in order to protect against brute force attacks. This means that hashing with bcrypt is terribly slow, which is a feature. The complexity of the algorithm is configurable via the <code class="language-plaintext highlighter-rouge">log_rounds</code> parameter.</p>
<p>The API from the R package is exactly the same as <a href="http://www.mindrot.org/projects/py-bcrypt/">the one from Python</a>: the <code class="language-plaintext highlighter-rouge">hashpw</code> function calculates a hash from a password using a random salt. Validating the hash is done by rehashing the password using the hash as a salt.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Secret message as a string</span><span class="w">
</span><span class="n">passwd</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"supersecret"</span><span class="w">
</span><span class="c1"># Create the hash</span><span class="w">
</span><span class="n">hash</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">hashpw</span><span class="p">(</span><span class="n">passwd</span><span class="p">)</span><span class="w">
</span><span class="n">hash</span><span class="w">
</span><span class="c1">## [1] "$2a$12$1G8N3Xnp11oHt0RJf7SCMeWib7DpEOgpE5lXwjE2BATHJqFFxci6u"</span><span class="w">
</span><span class="c1"># To validate the hash</span><span class="w">
</span><span class="n">identical</span><span class="p">(</span><span class="n">hash</span><span class="p">,</span><span class="w"> </span><span class="n">hashpw</span><span class="p">(</span><span class="n">passwd</span><span class="p">,</span><span class="w"> </span><span class="n">hash</span><span class="p">))</span><span class="w">
</span><span class="c1">## TRUE</span><span class="w">
</span><span class="c1"># Wrapper that does the same</span><span class="w">
</span><span class="n">checkpw</span><span class="p">(</span><span class="n">passwd</span><span class="p">,</span><span class="w"> </span><span class="n">hash</span><span class="p">)</span><span class="w">
</span><span class="c1">## TRUE</span></code></pre></figure>
<p>The <code class="language-plaintext highlighter-rouge">gensalt</code> function generates a salt for use with <code class="language-plaintext highlighter-rouge">hashpw</code> and specifies the complexity of the algorithm via the <code class="language-plaintext highlighter-rouge">log_rounds</code> parameter. The first few characters in the salt string hold the bcrypt version and value for log_rounds. The remainder stores 16 bytes of base64 encoded randomness for seeding the hashing algorithm.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Use varying complexity:</span><span class="w">
</span><span class="n">hash11</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">hashpw</span><span class="p">(</span><span class="n">passwd</span><span class="p">,</span><span class="w"> </span><span class="n">gensalt</span><span class="p">(</span><span class="m">11</span><span class="p">))</span><span class="w">
</span><span class="n">hash12</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">hashpw</span><span class="p">(</span><span class="n">passwd</span><span class="p">,</span><span class="w"> </span><span class="n">gensalt</span><span class="p">(</span><span class="m">12</span><span class="p">))</span><span class="w">
</span><span class="n">hash13</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">hashpw</span><span class="p">(</span><span class="n">passwd</span><span class="p">,</span><span class="w"> </span><span class="n">gensalt</span><span class="p">(</span><span class="m">13</span><span class="p">))</span><span class="w">
</span><span class="c1"># Takes longer to verify (or crack)</span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">checkpw</span><span class="p">(</span><span class="n">passwd</span><span class="p">,</span><span class="w"> </span><span class="n">hash11</span><span class="p">))</span><span class="w">
</span><span class="c1">## user system elapsed </span><span class="w">
</span><span class="c1">## 0.155 0.000 0.156 </span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">checkpw</span><span class="p">(</span><span class="n">passwd</span><span class="p">,</span><span class="w"> </span><span class="n">hash12</span><span class="p">))</span><span class="w">
</span><span class="c1">## user system elapsed </span><span class="w">
</span><span class="c1">## 0.312 0.000 0.312 </span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">checkpw</span><span class="p">(</span><span class="n">passwd</span><span class="p">,</span><span class="w"> </span><span class="n">hash13</span><span class="p">))</span><span class="w">
</span><span class="c1">## user system elapsed </span><span class="w">
</span><span class="c1">## 0.640 0.002 0.642</span></code></pre></figure>
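<p>The salt layout described above can be read straight from the hash printed earlier. The sketch below is a hypothetical base R helper (<code class="language-plaintext highlighter-rouge">parse_bcrypt</code> is not part of the bcrypt package) that splits a hash string into its version, <code class="language-plaintext highlighter-rouge">log_rounds</code>, salt and checksum. It assumes the standard <code class="language-plaintext highlighter-rouge">$2a$</code> layout, in which the 22-character salt is followed by a 31-character checksum. Since bcrypt performs 2^log_rounds rounds of key expansion, each extra round doubles the cost, which is consistent with the timings above.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical helper: split a bcrypt hash into its fields.
# Assumed layout: $<version>$<log_rounds>$<22-char salt><31-char checksum>
parse_bcrypt <- function(hash) {
  parts <- strsplit(hash, "$", fixed = TRUE)[[1]]  # "", version, rounds, rest
  rest <- parts[4]
  list(
    version    = parts[2],
    log_rounds = as.integer(parts[3]),
    salt       = substr(rest, 1, 22),              # 16 random bytes, base64
    checksum   = substr(rest, 23, nchar(rest))
  )
}

h <- parse_bcrypt("$2a$12$1G8N3Xnp11oHt0RJf7SCMeWib7DpEOgpE5lXwjE2BATHJqFFxci6u")
h$version     # "2a"
h$log_rounds  # 12
h$salt        # "1G8N3Xnp11oHt0RJf7SCMe"

# Relative cost of log_rounds = 13 versus 11
2^(13 - 11)   # 4, close to the ratio of the hash13 and hash11 timings above</code></pre></figure>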
HTTPS for CRAN: how and why 2015-06-14T00:00:00+00:00 https://www.opencpu.org/posts/cran-https
<p><strong>Correction (June 18):</strong> <em>An earlier version of this post stated that currently no CRAN mirrors support https. Martin has pointed out that this is incorrect. As of writing, 7 of the official CRAN mirrors already have full https support.</em></p>
<a href="https://www.opencpu.org/posts/cran-https"><img alt="opencpu logo" src="https://www.opencpu.org/images/securitycat.jpg"></a>
<p>R gained some basic support for https in version 3.2.0 (see <a href="http://cran.r-project.org/doc/manuals/r-release/NEWS.html">NEWS</a>) via the <code class="language-plaintext highlighter-rouge">method = "libcurl"</code> argument in base functions <code class="language-plaintext highlighter-rouge">download.file</code> and <code class="language-plaintext highlighter-rouge">url</code>. The global option <code class="language-plaintext highlighter-rouge">download.file.method</code> is used to make this the default.</p>
<p>Unfortunately the implementation has a few limitations: there is no way to set request options (authentication, proxy, headers, TLS options, etc) and the functions do not expose an http status code or response headers. Because they also do not raise an error when the request fails with an http error (as do the other download methods), this leaves you to guess if the retrieved content is what you were expecting or an error page.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Raises an error</span><span class="w">
</span><span class="n">download.file</span><span class="p">(</span><span class="s2">"http://httpbin.org/status/418"</span><span class="p">,</span><span class="w"> </span><span class="n">tempfile</span><span class="p">(),</span><span class="w"> </span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"internal"</span><span class="p">)</span><span class="w">
</span><span class="c1"># Does not raise an error</span><span class="w">
</span><span class="n">download.file</span><span class="p">(</span><span class="s2">"http://httpbin.org/status/418"</span><span class="p">,</span><span class="w"> </span><span class="n">tempfile</span><span class="p">(),</span><span class="w"> </span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"libcurl"</span><span class="p">)</span><span class="w">
</span><span class="c1"># What it should do</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">curl</span><span class="p">)</span><span class="w">
</span><span class="n">curl_download</span><span class="p">(</span><span class="s2">"http://httpbin.org/status/418"</span><span class="p">,</span><span class="w"> </span><span class="n">tempfile</span><span class="p">())</span></code></pre></figure>
<p>Still, it is good enough for downloading static files from public servers, which is all we need for now.</p>
<h3 id="cran-and-libcurl">CRAN and libcurl</h3>
<p>Because <code class="language-plaintext highlighter-rouge">install.packages</code> and friends wrap around <code class="language-plaintext highlighter-rouge">download.file</code>, we can use this new feature to download R packages from CRAN via https. <del>None of the currently available CRAN servers seems to support https, so</del> I created a demo server at <a href="https://cran.opencpu.org">https://cran.opencpu.org</a>. This is not a real mirror; it is just an https proxy to the <a href="http://cran.us.r-project.org/">US mirror</a>. See below for a list of other CRAN servers that support https.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Install a package over https</span><span class="w">
</span><span class="n">install.packages</span><span class="p">(</span><span class="s2">"ggplot2"</span><span class="p">,</span><span class="w"> </span><span class="n">repos</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"https://cran.opencpu.org"</span><span class="p">,</span><span class="w"> </span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"libcurl"</span><span class="p">)</span></code></pre></figure>
<p>Use a script like this to opt-in globally on machines where libcurl is available:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Enable CRAN https everywhere</span><span class="w">
</span><span class="k">if</span><span class="p">(</span><span class="n">capabilities</span><span class="p">(</span><span class="s2">"libcurl"</span><span class="p">)){</span><span class="w">
</span><span class="n">options</span><span class="p">(</span><span class="n">repos</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"https://cran.opencpu.org"</span><span class="p">,</span><span class="w"> </span><span class="n">download.file.method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"libcurl"</span><span class="p">)</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
<p>Hopefully the admins in Vienna will at some point enable https for the main <a href="https://cran.r-project.org/">cran server</a> in the same way they have done for <a href="https://r-forge.r-project.org/">r-forge</a> (which is literally the neighboring ip address).</p>
<h3 id="update-cran-servers-with-https">Update: CRAN servers with https</h3>
<p>As Martin has pointed out in his comment, some CRAN mirrors already support https without advertising it. Below is a script that tests each available server from the mirror list for https:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Script to list CRAN servers with https</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">curl</span><span class="p">)</span><span class="w">
</span><span class="n">h</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">new_handle</span><span class="p">(</span><span class="n">timeout_ms</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">30000</span><span class="p">,</span><span class="w"> </span><span class="n">connecttimeout_ms</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5000</span><span class="p">)</span><span class="w">
</span><span class="n">mirrors</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">read.csv</span><span class="p">(</span><span class="n">curl</span><span class="p">(</span><span class="s2">"https://svn.r-project.org/R/trunk/doc/CRAN_mirrors.csv"</span><span class="p">))</span><span class="w">
</span><span class="n">mirrors</span><span class="o">$</span><span class="n">SSL</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">vapply</span><span class="p">(</span><span class="n">mirrors</span><span class="o">$</span><span class="n">URL</span><span class="p">,</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">url</span><span class="p">){</span><span class="w">
</span><span class="n">https_url</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">paste0</span><span class="p">(</span><span class="n">sub</span><span class="p">(</span><span class="s2">"^http://"</span><span class="p">,</span><span class="w"> </span><span class="s2">"https://"</span><span class="p">,</span><span class="w"> </span><span class="n">url</span><span class="p">),</span><span class="w"> </span><span class="s2">"src/contrib/PACKAGES"</span><span class="p">)</span><span class="w">
</span><span class="n">cat</span><span class="p">(</span><span class="s2">"Trying"</span><span class="p">,</span><span class="w"> </span><span class="n">https_url</span><span class="p">,</span><span class="w"> </span><span class="s2">"\n"</span><span class="p">)</span><span class="w">
</span><span class="n">identical</span><span class="p">(</span><span class="m">200L</span><span class="p">,</span><span class="w"> </span><span class="n">try</span><span class="p">(</span><span class="n">curl_fetch_memory</span><span class="p">(</span><span class="n">https_url</span><span class="p">,</span><span class="w"> </span><span class="n">handle</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">h</span><span class="p">)</span><span class="o">$</span><span class="n">status</span><span class="p">))</span><span class="w">
</span><span class="p">},</span><span class="w"> </span><span class="n">logical</span><span class="p">(</span><span class="m">1</span><span class="p">))</span><span class="w">
</span><span class="n">subset</span><span class="p">(</span><span class="n">mirrors</span><span class="p">,</span><span class="w"> </span><span class="n">SSL</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">select</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"Name"</span><span class="p">,</span><span class="s2">"URL"</span><span class="p">))</span></code></pre></figure>
<p>It turns out that there are currently 7 servers that have properly set up https:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   Name               URL
22 China (Beijing 4)  https://mirrors.tuna.tsinghua.edu.cn/CRAN/
23 China (Hefei)      https://mirrors.ustc.edu.cn/CRAN/
26 Colombia (Cali)    https://www.icesi.edu.co/CRAN/
74 Switzerland        https://stat.ethz.ch/CRAN/
79 UK (Bristol)       https://www.stats.bris.ac.uk/R/
89 USA (KS)           https://rweb.quant.ku.edu/cran/
99 USA (TN)           https://mirrors.nics.utk.edu/cran/
</code></pre></div></div>
<p>Hopefully more will follow soon.</p>
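<p>To actually use one of these mirrors, point the <code class="language-plaintext highlighter-rouge">repos</code> option at its https URL. The snippet below is a minimal sketch: the vector simply copies the list above, and picking the Swiss mirror is an arbitrary choice for illustration.</p>

```r
# https-capable CRAN mirrors, copied from the table above
https_mirrors <- c(
  "China (Beijing 4)" = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/",
  "China (Hefei)"     = "https://mirrors.ustc.edu.cn/CRAN/",
  "Colombia (Cali)"   = "https://www.icesi.edu.co/CRAN/",
  "Switzerland"       = "https://stat.ethz.ch/CRAN/",
  "UK (Bristol)"      = "https://www.stats.bris.ac.uk/R/",
  "USA (KS)"          = "https://rweb.quant.ku.edu/cran/",
  "USA (TN)"          = "https://mirrors.nics.utk.edu/cran/"
)

# Sanity check: every entry really uses the https scheme
stopifnot(all(startsWith(https_mirrors, "https://")))

# Make install.packages() and friends download from the chosen mirror
options(repos = c(CRAN = https_mirrors[["Switzerland"]]))
getOption("repos")
```

<p>Putting the same <code class="language-plaintext highlighter-rouge">options()</code> call in your <code class="language-plaintext highlighter-rouge">~/.Rprofile</code> makes the setting persist across sessions.</p>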
<h3 id="why-cran-and-https">Why CRAN and https?</h3>
<p>Using https can stop some, but not all, man-in-the-middle (MITM) attacks. Encrypting the connection with the CRAN server prevents intermediate parties such as your ISP, (anti)virus software, or any other user on your network from snooping on or tampering with the connection. When it comes to CRAN, security is probably more of a concern than privacy, especially on public networks in airports, coffee shops or on campuses. It is easy for hackers or viruses to hijack Wi-Fi connections and inject malicious code or executables into unencrypted traffic. Using https guarantees that at least the connection between you and your CRAN mirror is secure.</p>
<p>Of course this does not fully guarantee the integrity of your download. You are basically putting your faith in the hands of your CRAN mirror (or, more specifically, the owner of its domain). If the mirror server gets hacked, or somebody manages to tamper with the mirroring process itself (which is done using rsync without any encryption), packages can still get infected.</p>
<p>Linux distributions solve this problem by making package authors sign the checksum of the package with a private key. Before installation, this signature is automatically verified against the author’s public key, which confirms the integrity of the download regardless of how the package was obtained. Simon Urbanek has implemented some of this for R in <a href="https://github.com/s-u/PKI">PKI</a>, but unfortunately it was never adopted by CRAN. But at least with https we can now install R packages from within a coffee shop somewhat safely, which solves the most urgent problem.</p>
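<p>The verification step boils down to recomputing a checksum locally and comparing it with a trusted value. Below is a minimal base R sketch of only that comparison, assuming the expected checksum reached you over a trusted channel. <code class="language-plaintext highlighter-rouge">verify_checksum</code> is a hypothetical helper, and MD5 is used only because <code class="language-plaintext highlighter-rouge">tools::md5sum</code> ships with R; a real signing scheme would use a stronger hash and verify a public-key signature over it, which is not shown here.</p>

```r
# Hypothetical helper: TRUE if the file at `path` matches the expected MD5.
# MD5 is too weak for real security use; it only keeps this sketch base-R.
verify_checksum <- function(path, expected_md5) {
  actual <- unname(tools::md5sum(path))
  identical(tolower(actual), tolower(expected_md5))
}

# Usage with a local stand-in for a downloaded package tarball
pkg <- tempfile(fileext = ".tar.gz")
writeBin(charToRaw("package contents"), pkg)
published <- unname(tools::md5sum(pkg))  # pretend this came from the author

verify_checksum(pkg, published)           # TRUE: file is unchanged
writeBin(charToRaw("tampered contents"), pkg)
verify_checksum(pkg, published)           # FALSE: file was modified
```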
The curl package: a modern R interface to libcurl | 2015-06-09T00:00:00+00:00 | https://www.opencpu.org/posts/curl-release-0-8
<p><strong>TL;DR:</strong> <em>Check out the <a href="http://cran.r-project.org/web/packages/curl/vignettes/intro.html">vignette</a> or the <a href="https://github.com/hadley/httr#installation">development version</a> of httr.</em></p>
<a href="https://www.opencpu.org/posts/curl-release-0-8"><img alt="opencpu logo" src="https://www.opencpu.org/images/curllogo.jpg"></a>
<p>The package I have put the most time and effort into this year is <a href="http://cran.r-project.org/web/packages/curl/vignettes/intro.html">curl</a>. Last week version 0.8 was published on CRAN, which fixes the last outstanding <a href="https://github.com/jeroenooms/curl/commit/80e0f72d248a1a812af2fe0f5adec772c9e18c0a">bug</a> for Solaris. The package is pretty much done at this point: stable, well tested, and does everything it needs to; nothing more, nothing less.</p>
<p>From the description:</p>
<blockquote>
<p>The curl() and curl_download() functions provide highly configurable drop-in replacements for base url() and download.file() with better performance, support for encryption (https://, ftps://), ‘gzip’ compression, authentication, and other ‘libcurl’ goodies. The core of the package implements a framework for performing fully customized requests where data can be processed either in memory, on disk, or streaming via the callback or connection interfaces.</p>
</blockquote>
<p>The initial <a href="https://www.opencpu.org/posts/curl-release-0-2/">motivation</a> for the package was to implement a <a href="http://stackoverflow.com/questions/30445875/what-exactly-is-a-connection-in-r/30446224#30446224">connection interface</a> with SSL (https) support, something R has always lacked (see also <a href="https://www.opencpu.org/posts/jsonlite-streaming/">json streaming</a>). Since then, however, the package has matured into a full-featured HTTP client. By now it has become exactly what I promised it would not be: a complete replacement of RCurl.</p>
<h3 id="what-about-rcurl">What about RCurl?</h3>
<p>Good question. The <a href="http://www.omegahat.org/RCurl/">RCurl</a> package by all-star R-core member Duncan Temple Lang is one of the most widely used R packages. The first CRAN release was about 11 years ago, and it has since been the standard networking client for R. The <a href="http://www.omegahat.org/RCurl/RCurlJSS.pdf">paper</a> shows that Duncan was (as with most of his work) ahead of his time, describing tools and technology that are now part of the standard data-science workflow.</p>
<p>The RCurl package was also the basis of Hadley’s popular <a href="https://github.com/hadley/httr">httr</a> package, and heavy use through httr started to reveal some shortcomings in RCurl, including memory leaks, build problems, performance regressions and <a href="http://recology.info/2014/12/multi-handle/">mysterious errors</a>. Now a bug or two we can fix, but from the RCurl <a href="https://github.com/omegahat/RCurl/blob/master/src/curl.c">source</a> code it becomes obvious that a lot has changed over the past 10 years. Both R and libcurl have matured a lot, and the internet has largely converged to (REST-style) HTTP and SSL, with other protocols slowly being phased out. Also, Duncan is a busy guy and seems to have largely moved on to other projects. And so we are going to need a rewrite from scratch…</p>
<p>The curl package is inspired by the good parts of RCurl but with an implementation that takes advantage of modern features in R such as the connection interface and external pointers with proper finalizers. This allows for a much simpler interface to libcurl that has better performance, supports streaming, and handles that automatically clean up after themselves. Moreover curl is deliberately very minimal and only contains the essential foundations for interacting with libcurl. High-level logic and utilities can be provided by other packages that build on curl, such as httr. The result is a small, clean and powerful package that takes 2 seconds to compile and will hopefully prove to be reliable and low maintenance.</p>
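<p>To illustrate what “handles that automatically clean up after themselves” means, here is a rough base R analogue. curl implements this in C with external pointers; the sketch below substitutes an environment with a finalizer registered via <code class="language-plaintext highlighter-rouge">reg.finalizer()</code>. The names <code class="language-plaintext highlighter-rouge">new_handle_sketch</code>, <code class="language-plaintext highlighter-rouge">finalize_handle</code> and <code class="language-plaintext highlighter-rouge">cleanup_log</code> are hypothetical and not part of curl.</p>

```r
# Hypothetical bookkeeping so we can observe the cleanup happening
cleanup_log <- new.env()
cleanup_log$closed <- 0L

# Stand-in for releasing a C-level resource such as a libcurl handle
finalize_handle <- function(handle) {
  cleanup_log$closed <- cleanup_log$closed + 1L
}

new_handle_sketch <- function(url) {
  handle <- new.env()
  handle$url <- url
  # Run finalize_handle() when the handle is garbage collected or R exits
  reg.finalizer(handle, finalize_handle, onexit = TRUE)
  handle
}

h <- new_handle_sketch("https://cran.r-project.org/")
rm(h)            # drop the only reference; no explicit close() call
invisible(gc())  # a full collection runs the pending finalizer
cleanup_log$closed
```

<p>The user never calls a close function: once the last reference to the handle is gone, the garbage collector triggers the cleanup.</p>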
<h3 id="getting-started-with-curl-and-httr">Getting started with curl and httr</h3>
<p>The best introduction to the curl package is the <a href="http://cran.r-project.org/web/packages/curl/vignettes/intro.html">vignette</a> which has some nice examples to get you started. Moreover the <a href="http://github.com/hadley/httr">development version of httr</a> has already been migrated from RCurl to curl. To install using devtools:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span><span class="w">
</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"hadley/httr"</span><span class="p">)</span></code></pre></figure>
<p>Note that devtools itself depends on httr, so you might need to restart R after updating httr. If you see an <code class="language-plaintext highlighter-rouge">ERROR: loading failed</code> message (especially on Windows), just restart R and try again.</p>
New package commonmark: yet another markdown parser? | 2015-06-03T00:00:00+00:00 | https://www.opencpu.org/posts/commonmark-release-0-4
<a href="https://www.opencpu.org/posts/commonmark-release-0-4"><img alt="opencpu logo" src="https://www.opencpu.org/images/markdown-everywhere.jpg"></a>
<p>Last week the <a href="http://cran.r-project.org/web/packages/commonmark/index.html">commonmark</a> package was released on CRAN. The package implements some very thin R bindings to the <code class="language-plaintext highlighter-rouge">cmark</code> library by John MacFarlane (the author of pandoc). From the cmark <a href="https://github.com/jgm/cmark#readme">readme</a>:</p>
<blockquote>
<p>cmark is the C reference implementation of CommonMark, a rationalized version of Markdown syntax with a spec. It provides a shared library (libcmark) with functions for parsing CommonMark documents to an abstract syntax tree (AST), manipulating the AST, and rendering the document to HTML, groff man, CommonMark, or an XML representation of the AST.</p>
</blockquote>
<p>Each of the R wrapping functions parses markdown and renders it to one of the output formats:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">md</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"
## Test
My list:
- foo
- bar"</span></code></pre></figure>
<p>The <code class="language-plaintext highlighter-rouge">markdown_html</code> function converts markdown to HTML:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">commonmark</span><span class="p">)</span><span class="w">
</span><span class="n">cat</span><span class="p">(</span><span class="n">markdown_html</span><span class="p">(</span><span class="n">md</span><span class="p">))</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-html" data-lang="html"><span class="nt"><h2></span>Test<span class="nt"></h2></span>
<span class="nt"><p></span>My list:<span class="nt"></p></span>
<span class="nt"><ul></span>
<span class="nt"><li></span>foo<span class="nt"></li></span>
<span class="nt"><li></span>bar<span class="nt"></li></span>
<span class="nt"></ul></span></code></pre></figure>
<p>Obviously the dynamic content rendered from markdown is not a full HTML document in itself. To create a full HTML page you would insert one or more of these snippets in an HTML template with static header and footer content and possibly some css/js to make the page more exciting.</p>
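<p>A minimal sketch of that templating step in base R follows. The template, header and footer text, and the <code class="language-plaintext highlighter-rouge">wrap_in_template</code> helper are all hypothetical; the fragment is the <code class="language-plaintext highlighter-rouge">markdown_html</code> output shown above, pasted in as a string so the example runs without the commonmark package.</p>

```r
# Hypothetical static page template; %s slots take the title and the body.
page_template <- paste(
  "<!DOCTYPE html>",
  "<html>",
  "<head><meta charset=\"utf-8\"><title>%s</title></head>",
  "<body>",
  "<header>Static header</header>",
  "%s",
  "<footer>Static footer</footer>",
  "</body>",
  "</html>",
  sep = "\n"
)

# Insert a rendered fragment into the template.
# Note: a real implementation should HTML-escape `title`.
wrap_in_template <- function(fragment, title) {
  sprintf(page_template, title, fragment)
}

# Output of markdown_html(md) from above, copied as a literal string
fragment <- "<h2>Test</h2>\n<p>My list:</p>\n<ul>\n<li>foo</li>\n<li>bar</li>\n</ul>"
page <- wrap_in_template(fragment, title = "Test")
cat(page)
```

<p>With commonmark loaded, the same helper can take the rendered markdown directly, e.g. <code class="language-plaintext highlighter-rouge">wrap_in_template(markdown_html(md), "Test")</code>.</p>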
<p>The <code class="language-plaintext highlighter-rouge">markdown_xml</code> function gives the parse tree in xml format:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">cat</span><span class="p">(</span><span class="n">markdown_xml</span><span class="p">(</span><span class="n">md</span><span class="p">))</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-xml" data-lang="xml"><span class="cp"><?xml version="1.0" encoding="UTF-8"?></span>
<span class="cp"><!DOCTYPE CommonMark SYSTEM "CommonMark.dtd"></span>
<span class="nt"><document></span>
<span class="nt"><header</span> <span class="na">level=</span><span class="s">"2"</span><span class="nt">></span>
<span class="nt"><text></span>Test<span class="nt"></text></span>
<span class="nt"></header></span>
<span class="nt"><paragraph></span>
<span class="nt"><text></span>My list:<span class="nt"></text></span>
<span class="nt"></paragraph></span>
<span class="nt"><list</span> <span class="na">type=</span><span class="s">"bullet"</span> <span class="na">tight=</span><span class="s">"true"</span><span class="nt">></span>
<span class="nt"><item></span>
<span class="nt"><paragraph></span>
<span class="nt"><text></span>foo<span class="nt"></text></span>
<span class="nt"></paragraph></span>
<span class="nt"></item></span>
<span class="nt"><item></span>
<span class="nt"><paragraph></span>
<span class="nt"><text></span>bar<span class="nt"></text></span>
<span class="nt"></paragraph></span>
<span class="nt"></item></span>
<span class="nt"></list></span>
<span class="nt"></document></span></code></pre></figure>
<p>Most of the value in commonmark is probably in the latter. There already exist a few nice markdown converters for R, including the popular <a href="http://rmarkdown.rstudio.com/">rmarkdown</a> package, which uses pandoc to convert markdown to several presentation formats.</p>
<p>The formal commonmark spec makes markdown suitable for stricter documentation purposes, where we might currently be inclined to use json or xml. For example, we could use it to parse <code class="language-plaintext highlighter-rouge">NEWS.md</code> files from R packages in a way that allows for archiving and indexing individual news items, without ambiguity over indentation rules and such.</p>
Getting started with MongoDB in R | 2015-05-15T00:00:00+00:00 | https://www.opencpu.org/posts/mongolite-release-0-3
<a href="https://www.opencpu.org/posts/mongolite-release-0-3"><img alt="opencpu logo" src="https://www.opencpu.org/images/nosql.jpg"></a>
<p>The first stable version of the new <a href="http://cran.r-project.org/web/packages/mongolite/index.html">mongolite</a> package has appeared on CRAN. Mongolite builds on <a href="http://cran.r-project.org/web/packages/jsonlite/index.html">jsonlite</a> to provide a simple, high-performance MongoDB client for R, which makes storing and accessing small or large data as easy as converting it to/from JSON. The <a href="http://cran.r-project.org/web/packages/mongolite/vignettes/intro.html">package vignette</a> has some examples to get you started with inserting, json queries, aggregation and map-reduce. MongoDB itself is open source and installation is easy (e.g. <code class="language-plaintext highlighter-rouge">brew install mongodb</code>).</p>
<p>If you use, or think you might want to use, MongoDB with R, please <a href="https://github.com/jeroenooms/mongolite/issues/5">get in touch</a>. I am interested to hear about your problems and use cases, so that I can make this package fit everyone’s needs. I will also be <a href="https://www.opencpu.org/posts/jsonlite-and-mongolite/">presenting</a> this and related work at UseR 2015 and the annual French R Meeting.</p>
Upcoming talks about jsonlite and mongolite | 2015-05-01T00:00:00+00:00 | https://www.opencpu.org/posts/jsonlite-and-mongolite
<a href="https://www.opencpu.org/posts/jsonlite-and-mongolite"><img alt="opencpu logo" src="https://www.opencpu.org/images/useR-large.png"></a>
<p>This summer I will be giving an invited talk at the annual <a href="http://r2015-grenoble.sciencesconf.org/resource/page/id/1">French R Meeting</a> in Grenoble as well as a shorter talk at <a href="http://user2015.math.aau.dk/">UseR 2015</a> in Aalborg. The presentations will feature some recent R packages in the json/web space (<a href="http://cran.r-project.org/web/packages/curl/index.html">curl</a>, <a href="http://cran.r-project.org/web/packages/jsonlite/index.html">jsonlite</a>, <a href="http://cran.r-project.org/web/packages/mongolite/index.html">mongolite</a>, <a href="http://cran.r-project.org/web/packages/V8/index.html">V8</a>), and show how these tools can be combined for building interoperable data pipelines with R.</p>
<p>Below is the official abstract.</p>
<h2 id="abstract-jsonlite-and-mongolite">Abstract: jsonlite and mongolite</h2>
<blockquote>
<p>The jsonlite package provides a powerful JSON parser and generator that has become one of the standard methods for getting data in and out of R. We discuss some recent additions to the package, in particular support for streaming (large) data over http(s) connections. We then introduce the new mongolite package: a high-performance MongoDB client based on jsonlite. MongoDB (from “humongous”) is a popular open-source document database for storing and manipulating very big JSON structures. It includes a JSON query language and an embedded V8 engine for in-database aggregation and map-reduce. We show how mongolite makes inserting and retrieving R data to/from a database as easy as converting it to/from JSON, without the bureaucracy that comes with traditional databases. Users that are already familiar with the JSON format might find MongoDB a great companion to the R language and will enjoy the benefits of using a single format for both serialization and persistency of data.</p>
</blockquote>
JSON serialization now even faster and prettier | 2015-04-13T00:00:00+00:00 | https://www.opencpu.org/posts/jsonlite-release-0-9-16
<a href="https://www.opencpu.org/posts/jsonlite-release-0-9-16"><img alt="opencpu logo" src="https://www.opencpu.org/images/mariokart.jpg"></a>
<p>The <a href="http://cran.rstudio.org/web/packages/jsonlite/index.html">jsonlite</a> package implements a robust, high performance JSON parser and generator for R, optimized for statistical data and the web. This week version 0.9.16 appeared on CRAN which has a new prettifying system, improved performance and some additional tweaks for the new mongolite package.</p>
<h2 id="improved-performance">Improved Performance</h2>
<p>Everyone’s favorite feature of jsonlite: performance. We found a way to significantly speed up <code class="language-plaintext highlighter-rouge">toJSON</code> for data frames in the cases of <code class="language-plaintext highlighter-rouge">dataframe="rows"</code> (the default) or <code class="language-plaintext highlighter-rouge">dataframe="values"</code>. On my MacBook I now get these results:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">jsonlite</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="p">(</span><span class="n">diamonds</span><span class="p">,</span><span class="w"> </span><span class="n">package</span><span class="o">=</span><span class="s2">"ggplot2"</span><span class="p">)</span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">toJSON</span><span class="p">(</span><span class="n">diamonds</span><span class="p">,</span><span class="w"> </span><span class="n">dataframe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"rows"</span><span class="p">))</span><span class="w">
</span><span class="c1"># user system elapsed</span><span class="w">
</span><span class="c1"># 0.133 0.003 0.136</span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">toJSON</span><span class="p">(</span><span class="n">diamonds</span><span class="p">,</span><span class="w"> </span><span class="n">dataframe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"columns"</span><span class="p">))</span><span class="w">
</span><span class="c1"># user system elapsed</span><span class="w">
</span><span class="c1"># 0.070 0.003 0.072</span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">toJSON</span><span class="p">(</span><span class="n">diamonds</span><span class="p">,</span><span class="w"> </span><span class="n">dataframe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"values"</span><span class="p">))</span><span class="w">
</span><span class="c1"># user system elapsed</span><span class="w">
</span><span class="c1"># 0.094 0.005 0.099</span></code></pre></figure>
<p>A somewhat larger dataset:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">data</span><span class="p">(</span><span class="n">flights</span><span class="p">,</span><span class="w"> </span><span class="n">package</span><span class="o">=</span><span class="s2">"nycflights13"</span><span class="p">)</span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">toJSON</span><span class="p">(</span><span class="n">flights</span><span class="p">,</span><span class="w"> </span><span class="n">dataframe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"rows"</span><span class="p">))</span><span class="w">
</span><span class="c1"># user system elapsed</span><span class="w">
</span><span class="c1"># 1.506 0.072 1.578</span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">toJSON</span><span class="p">(</span><span class="n">flights</span><span class="p">,</span><span class="w"> </span><span class="n">dataframe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"columns"</span><span class="p">))</span><span class="w">
</span><span class="c1"># user system elapsed</span><span class="w">
</span><span class="c1"># 0.585 0.024 0.608</span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">toJSON</span><span class="p">(</span><span class="n">flights</span><span class="p">,</span><span class="w"> </span><span class="n">dataframe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"values"</span><span class="p">))</span><span class="w">
</span><span class="c1"># user system elapsed</span><span class="w">
</span><span class="c1"># 0.873 0.039 0.912</span></code></pre></figure>
<p>That is pretty darn fast for a text-based serialization format. By comparison, we easily beat <code class="language-plaintext highlighter-rouge">write.csv</code>, which actually has a much simpler output format:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">system.time</span><span class="p">(</span><span class="n">write.csv</span><span class="p">(</span><span class="n">diamonds</span><span class="p">,</span><span class="w"> </span><span class="n">file</span><span class="o">=</span><span class="s2">"/dev/null"</span><span class="p">))</span><span class="w">
</span><span class="c1"># user system elapsed</span><span class="w">
</span><span class="c1"># 0.361 0.003 0.364</span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">write.csv</span><span class="p">(</span><span class="n">flights</span><span class="p">,</span><span class="w"> </span><span class="n">file</span><span class="o">=</span><span class="s2">"/dev/null"</span><span class="p">))</span><span class="w">
</span><span class="c1"># user system elapsed</span><span class="w">
</span><span class="c1"># 3.284 0.033 3.318</span></code></pre></figure>
<h2 id="pretty-even-prettier">Pretty even prettier</h2>
<p>Yihui has pushed for a new prettifying system that inserts indentation directly in the R code, rather than having yajl prettify the entire JSON blob at the end. As a result we can use different indentation rules for different R classes. See the <a href="https://github.com/jeroenooms/jsonlite/pull/85">PR</a> for details. The main difference is that atomic vectors are now prettified without line breaks:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">foo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="n">bar</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">head</span><span class="p">(</span><span class="n">cars</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">))</span><span class="w">
</span><span class="n">toJSON</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">pretty</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="c1"># {</span><span class="w">
</span><span class="c1">#   "foo": [1, 2, 3],</span><span class="w">
</span><span class="c1">#   "bar": [</span><span class="w">
</span><span class="c1">#     {</span><span class="w">
</span><span class="c1">#       "speed": 4,</span><span class="w">
</span><span class="c1">#       "dist": 2</span><span class="w">
</span><span class="c1">#     },</span><span class="w">
</span><span class="c1">#     {</span><span class="w">
</span><span class="c1">#       "speed": 4,</span><span class="w">
</span><span class="c1">#       "dist": 10</span><span class="w">
</span><span class="c1">#     }</span><span class="w">
</span><span class="c1">#   ]</span><span class="w">
</span><span class="c1"># }</span><span class="w">
</span><span class="n">toJSON</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">pretty</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">dataframe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"columns"</span><span class="p">)</span><span class="w">
</span><span class="c1"># {</span><span class="w">
</span><span class="c1">#   "foo": [1, 2, 3],</span><span class="w">
</span><span class="c1">#   "bar": {</span><span class="w">
</span><span class="c1">#     "speed": [4, 4],</span><span class="w">
</span><span class="c1">#     "dist": [2, 10]</span><span class="w">
</span><span class="c1">#   }</span><span class="w">
</span><span class="c1"># }</span></code></pre></figure>
<p>This can be helpful for manually inspecting or debugging your JSON data. The <code class="language-plaintext highlighter-rouge">prettify</code> function still uses yajl, so if you prefer the old style, where every element of an atomic vector goes on its own line, simply use <code class="language-plaintext highlighter-rouge">prettify(toJSON(x))</code>.</p>
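<p>The idea of choosing the indentation per R class can be sketched in a few lines of base R. The toy serializer below is hypothetical and not how jsonlite is implemented: it only understands atomic vectors (printed inline) and lists (broken across lines, as objects when named), and it does not escape strings.</p>

```r
# Toy pretty-printer with per-class indentation rules (hypothetical helper).
# Atomic vectors stay on one line; lists and data frames get one element per line.
to_json_pretty <- function(x, indent = 0L, step = 2L) {
  pad  <- strrep(" ", indent)
  pad2 <- strrep(" ", indent + step)
  if (is.atomic(x)) {
    # No string escaping: illustration only
    vals <- if (is.character(x)) paste0('"', x, '"') else as.character(x)
    return(paste0("[", paste(vals, collapse = ", "), "]"))
  }
  # Serialize children one level deeper
  parts <- vapply(x, to_json_pretty, character(1), indent = indent + step, step = step)
  if (!is.null(names(x))) {
    parts <- paste0('"', names(x), '": ', parts)
    opener <- "{"; closer <- "}"
  } else {
    opener <- "["; closer <- "]"
  }
  paste0(opener, "\n", paste0(pad2, parts, collapse = ",\n"), "\n", pad, closer)
}

# A data frame is a named list of columns, so this mimics dataframe = "columns"
cat(to_json_pretty(list(foo = 1:3, bar = head(cars, 2))))
```

<p>For this input it prints the same layout as the <code class="language-plaintext highlighter-rouge">dataframe = "columns"</code> output above; the point is that the indentation decision is made while walking the R object, so each class can get its own rule.</p>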
<h2 id="new-mongolite-package">New mongolite package</h2>
<p>There were some additional internal enhancements to support the new <a href="http://cran.r-project.org/web/packages/mongolite">mongolite</a> package, which will be announced later this month. This package will extend the concepts and power of jsonlite to in-database JSON documents. Have a look at the <a href="https://github.com/jeroenooms/mongolite#readme">git</a> repository for a sneak preview.</p>
Improved memory usage and RJSONIO compatibility in jsonlite 0.9.15 | 2015-03-31T00:00:00+00:00 | https://www.opencpu.org/posts/jsonlite-release-0-9-15
<a href="https://www.opencpu.org/posts/jsonlite-release-0-9-15"><img alt="opencpu logo" src="https://www.opencpu.org/images/mariokart.jpg"></a>
<p>The <a href="http://cran.rstudio.org/web/packages/jsonlite/index.html">jsonlite</a> package implements a robust, high performance JSON parser and generator for R, optimized for statistical data and the web. Last week version 0.9.15 appeared on CRAN which improves memory usage and compatibility with other packages.</p>
<h2 id="migrating-to-jsonlite">Migrating to jsonlite</h2>
<p>The upcoming release of <a href="https://github.com/rstudio/shiny">shiny</a> will switch from RJSONIO to jsonlite. To make the transition painless for shiny users, Winston Chang has added some compatibility options to jsonlite that mimic the (legacy) behavior of RJSONIO. The following wrapper results in the same output as <code class="language-plaintext highlighter-rouge">RJSONIO::toJSON</code> for the majority of cases. Hopefully this will make it easier for other package authors to make the transition to jsonlite as well.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># RJSONIO compatibility wrapper</span><span class="w">
</span><span class="n">toJSON_legacy</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">...</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">jsonlite</span><span class="o">::</span><span class="n">toJSON</span><span class="p">(</span><span class="n">I</span><span class="p">(</span><span class="n">x</span><span class="p">),</span><span class="w"> </span><span class="n">dataframe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"columns"</span><span class="p">,</span><span class="w"> </span><span class="n">null</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"null"</span><span class="p">,</span><span class="w"> </span><span class="n">na</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"null"</span><span class="p">,</span><span class="w">
</span><span class="n">auto_unbox</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">use_signif</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">force</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w">
</span><span class="n">rownames</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="n">keep_vec_names</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">...</span><span class="p">)</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
<p>However, be aware that the RJSONIO defaults can sometimes result in unexpected behavior and odd edge cases (which is why jsonlite was created in the first place). Therefore it is still recommended to switch to the jsonlite defaults when possible (see the jsonlite <a href="http://arxiv.org/abs/1403.2805">paper</a> for a discussion of the mapping). One exception is perhaps the <code class="language-plaintext highlighter-rouge">auto_unbox</code> argument, which many people seem to prefer to set to <code class="language-plaintext highlighter-rouge">TRUE</code> when encoding relatively simple static data structures.</p>
<h2 id="memory-usage">Memory usage</h2>
<p>The new version should use less memory when parsing JSON, especially from a file or URL. This is mostly due to a new push-parser implementation that can incrementally parse JSON in little pieces, which eliminates the overhead of copying gigantic JSON strings. In addition, jsonlite now uses the new <a href="http://cran.r-project.org/web/packages/curl/index.html">curl</a> package for retrieving data via a connection interface.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">mydata1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">jsonlite</span><span class="o">::</span><span class="n">fromJSON</span><span class="p">(</span><span class="s2">"https://jeroenooms.github.io/data/dmd.json"</span><span class="p">)</span></code></pre></figure>
<p>The call above results in the same output as the call below, but it should consume less memory, especially for very large JSON files.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">httr</span><span class="p">)</span><span class="w">
</span><span class="n">req</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">GET</span><span class="p">(</span><span class="s2">"https://jeroenooms.github.io/data/dmd.json"</span><span class="p">)</span><span class="w">
</span><span class="n">mydata2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">jsonlite</span><span class="o">::</span><span class="n">fromJSON</span><span class="p">(</span><span class="n">content</span><span class="p">(</span><span class="n">req</span><span class="p">,</span><span class="w"> </span><span class="s2">"text"</span><span class="p">))</span></code></pre></figure>
<p>None of this changes the API; these changes are all internal.</p>
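<p>The same <code class="language-plaintext highlighter-rouge">fromJSON</code> call also accepts a path to a local file, so there is no need to read the file into a string first. A minimal sketch (the data frame and the temporary file are made up for illustration) that writes JSON to disk and parses it straight from the path:</p>

```r
library(jsonlite)

# Made-up example data
df <- data.frame(id = 1:3, fruit = c("apple", "banana", "cherry"),
                 stringsAsFactors = FALSE)

# Write the data frame as a JSON array of records to a temporary file
path <- tempfile(fileext = ".json")
writeLines(toJSON(df), path)

# fromJSON() reads directly from the file path; no readLines() + paste() needed
df2 <- fromJSON(path)
str(df2)
```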
OpenCPU server update for R 3.1.32015-03-12T00:00:00+00:00https://www.opencpu.org/posts/opencpu-r-3-1-3
<a href="https://www.opencpu.org/posts/opencpu-r-3-1-3"><img alt="opencpu logo" src="https://www.opencpu.org/images/struisvogel.jpg"></a>
<p>Following the release of R 3.1.3, I have pushed a new build of the OpenCPU server to <a href="https://launchpad.net/~opencpu/+archive/ubuntu/opencpu-1.4">launchpad</a>, <a href="https://registry.hub.docker.com/u/opencpu/base/">dockerhub</a> and <a href="http://software.opensuse.org/download.html?project=home%3Ajeroenooms%3Aopencpu-1.4&package=opencpu">OBS</a>. This update has no changes in OpenCPU itself, but includes updated versions of R, RStudio and R packages used by OpenCPU.</p>
<p>To upgrade your OpenCPU server:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt-get update
<span class="nb">sudo </span>apt-get dist-upgrade
</code></pre></div></div>
<p>If you are running OpenCPU in production and you do not want to receive automatic updates, make sure to remove or comment-out the opencpu repository in <code class="language-plaintext highlighter-rouge">/etc/apt/sources.list.d/opencpu-opencpu-1_4-trusty.list</code> on your server. The <a href="https://launchpad.net/~opencpu/+archive/ubuntu/opencpu-1.4/+packages">opencpu-1.4</a> repo now contains:</p>
<ul>
<li>OpenCPU 1.4.6</li>
<li>R 3.1.3</li>
<li>RStudio Server 0.98.1103</li>
<li>Rcpp 0.11.5</li>
</ul>
<p>To see the versions of the other R packages included with the cloud server, have a look at the <a href="https://github.com/jeroenooms/opencpu-server/tree/v1.4.6/opencpu-lib">opencpu-lib</a> directory on GitHub or navigate to <a href="http://cloud.opencpu.org/ocpu/info"><code class="language-plaintext highlighter-rouge">/ocpu/info</code></a> on your OpenCPU server.</p>
Compiling CoffeeScript in R with the js package2015-02-27T00:00:00+00:00https://www.opencpu.org/posts/js-release-0-2
<a href="https://www.opencpu.org/posts/js-release-0-2"><img alt="opencpu logo" src="https://www.opencpu.org/images/coffeescript.jpg"></a>
<p>A new release of the <a href="http://cran.r-project.org/web/packages/js/">js</a> package has made its way to CRAN. This version adds support for compiling CoffeeScript. Along with the uglify and jshint tools already included, the package now provides a very complete suite for compiling, validating, reformatting, optimizing and analyzing JavaScript code in R.</p>
<h2 id="coffee-script">CoffeeScript</h2>
<p>According to its website, <a href="http://coffeescript.org/">CoffeeScript</a> is a little language that compiles into JavaScript. It is an attempt to expose the good parts of JavaScript in a simple way. The <code class="language-plaintext highlighter-rouge">coffee_compile</code> function binds to the CoffeeScript compiler. A hello world example from the package <a href="http://cran.r-project.org/web/packages/js/vignettes/intro.html">vignette</a>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Hello world</span><span class="w">
</span><span class="n">cat</span><span class="p">(</span><span class="n">coffee_compile</span><span class="p">(</span><span class="s2">"square = (x) -> x * x"</span><span class="p">))</span></code></pre></figure>
<p>This outputs the following JavaScript code:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(function() {
var square;
square = function(x) {
return x * x;
};
}).call(this);
</code></pre></div></div>
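<p>Or, to compile without the top-level function wrapper (closure), set <code class="language-plaintext highlighter-rouge">bare = TRUE</code>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Compile without the wrapper function</span><span class="w">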
</span><span class="n">cat</span><span class="p">(</span><span class="n">coffee_compile</span><span class="p">(</span><span class="s2">"square = (x) -> x * x"</span><span class="p">,</span><span class="w"> </span><span class="n">bare</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">))</span></code></pre></figure>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>var square;
square = function(x) {
return x * x;
};
</code></pre></div></div>
<p>The package <a href="http://cran.r-project.org/web/packages/js/vignettes/intro.html">vignette</a> includes some more examples.</p>
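<p>Because <code class="language-plaintext highlighter-rouge">coffee_compile</code> returns ordinary JavaScript text, its output can be fed straight into the other tools in the package. A short sketch that compiles the hello world example, checks that the result is a valid script, and then compresses it with uglify (the exact minified output depends on the bundled uglify-js version, so it is not shown):</p>

```r
library(js)

# Compile CoffeeScript to plain JavaScript (bare = TRUE drops the wrapper function)
jscode <- coffee_compile("square = (x) -> x * x", bare = TRUE)

# The compiled code is a regular script, so the validator should accept it
js_validate_script(jscode)

# ...and it can be minified like any other JavaScript
cat(uglify_optimize(jscode))
```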
<h2 id="why-coffee-script">Why coffee script?</h2>
<p>Coffee script is <strong>not</strong> some sort of widget factory or other <em>“use JavaScript without learning JavaScript”</em> tool kit. From the <a href="http://coffeescript.org">website</a>:</p>
<blockquote>
<p>The golden rule of CoffeeScript is: “It’s just JavaScript”. The code compiles one-to-one into the equivalent JS, and there is no interpretation at runtime. You can use any existing JavaScript library seamlessly from CoffeeScript (and vice-versa). The compiled output is readable and pretty-printed, will work in every JavaScript runtime, and tends to run as fast or faster than the equivalent handwritten JavaScript.</p>
</blockquote>
<p>CoffeeScript is popular among web developers for writing JavaScript applications using a syntax that is more readable and less error prone, but without being constrained by some sort of framework. CoffeeScript is often used in conjunction with an HTML templating engine such as jade (see <a href="https://www.opencpu.org/posts/jade-release-0-1/">rjade</a>) and a CSS pre-processor such as <a href="http://lesscss.org/">Less</a>, <a href="http://sass-lang.com/">SASS</a> or <a href="https://learnboost.github.io/stylus/">Stylus</a>.</p>
<p>Together, these tools are helpful in organizing and maintaining non-trivial web applications. Given the recent mass adoption of HTML/JavaScript based widgets and visualizations in the R community, they can be a valuable addition to the R developer tool kit as well.</p>
RMySQL version 0.10.2: Full SSL Support2015-02-26T00:00:00+00:00https://www.opencpu.org/posts/rmysql-release-0-10-2
<a href="https://www.opencpu.org/posts/rmysql-release-0-10-2"><img alt="opencpu logo" src="https://www.opencpu.org/images/mysql.jpg"></a>
<p>RMySQL version 0.10.2 has appeared on CRAN. This is a maintenance release to streamline the build process on various platforms. Most importantly, the Windows/OSX binary packages from CRAN are now built with full SSL support. On Linux, the configure script has been updated a bit to automatically find the mysql client library.</p>
<p>A big thanks to epoch.com for <a href="http://blog.rstudio.org/2015/02/11/epoch-rmysql/">sponsoring</a> the development of this important package.</p>
<h2 id="how-to-install-rmysql">How to install RMySQL</h2>
<p>RMySQL is a very <a href="http://cran.r-project.org/src/contrib/Archive/RMySQL/">old</a> package, and as a result there is a lot of outdated and incorrect information on the interwebs. Back in the day (up to version 0.9.3) you had to manually install MySQL on your machine to make the package work. But since the 0.10 series, released earlier this year, the package is entirely self-contained. The recommended way to install RMySQL on Windows and OSX is simply:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">install.packages</span><span class="p">(</span><span class="s2">"RMySQL"</span><span class="p">)</span></code></pre></figure>
<p>On Linux the package still links against the system libmysqlclient. On most deb systems (Debian/Ubuntu) you need to install either <code class="language-plaintext highlighter-rouge">libmysqlclient-dev</code> or <code class="language-plaintext highlighter-rouge">libmariadbclient-dev</code>, and on rpm systems such as Fedora/CentOS/RHEL you need <code class="language-plaintext highlighter-rouge">mariadb-devel</code>. It should also work with lesser-known variants of MySQL such as <a href="https://github.com/rstats-db/RMySQL/issues/38">Percona</a>, but these don’t get much test coverage.</p>
<h2 id="using-ssl-with-mysql">Using SSL with MySQL</h2>
<p>MySQL is not always used with SSL because often the client and server run on the same machine, or within a private network. Moreover, encryption introduces some performance overhead, which slows down your database connection a bit. But if you are connecting to a MySQL server over the internet, then enabling SSL is probably a good idea if you don’t want everyone to see your data.</p>
<p>Most MySQL servers have been built with SSL support. To configure RMySQL to connect to a server over SSL, you need to set the certificate paths in your <code class="language-plaintext highlighter-rouge">~/.my.cnf</code> file:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[client]
ssl-ca=c:/ssl_certs/ca-cert.pem
ssl-cert=c:/ssl_certs/client-cert.pem
ssl-key=c:/ssl_certs/client-key.pem
</code></pre></div></div>
<p>I’m not using this myself but <a href="https://github.com/rstats-db/RMySQL/issues/33">others are</a>, so I’m taking their word that this works. If you’re experiencing any problems, open an issue on GitHub.</p>
<h2 id="future-development">Future Development</h2>
<p>This is likely the final release of the 0.10 series. We (well mostly Hadley) are working on a full rewrite of the package based on Rcpp. The <a href="https://github.com/rstats-db/RMySQL#readme">readme</a> on Github contains instructions on how to install the latest version from source (it is really easy, even on Windows).</p>
<p>Past experiences have shown that problems in this package are often specific to the operating system and version of MySQL. Therefore we really appreciate feedback and testing of the new version. If you use RMySQL, please check out the development version at some point so that we can make sure everything works as expected when it gets released. Report bugs or suggestions on the <a href="https://github.com/rstats-db/RMySQL/issues">GitHub page</a>; please include your OS and RMySQL version.</p>
Jade: a clean, whitespace-sensitive template language for writing HTML2015-02-20T00:00:00+00:00https://www.opencpu.org/posts/jade-release-0-1
<a href="https://www.opencpu.org/posts/jade-release-0-1"><img alt="opencpu logo" src="https://www.opencpu.org/images/jade.png"></a>
<p>Jade is a high-performance template engine heavily influenced by Haml. It is designed for writing HTML pages using a concise, modern syntax without the verbosity of old-fashioned XML-like tags that we all want to forget about. The new <a href="http://cran.r-project.org/web/packages/rjade/">rjade</a> package implements convenient bindings from R to this popular JavaScript library.</p>
<h2 id="an-example-template">An example template</h2>
<p>Below is an example of a Jade template, taken from the <a href="http://jade-lang.com/">jade homepage</a>. Notice that the notation of tags, classes and ids closely resembles CSS selectors. The template also includes a variable called <code class="language-plaintext highlighter-rouge">youAreUsingJade</code>, which we can use to control the rendered output.</p>
<figure class="highlight"><pre><code class="language-html" data-lang="html">doctype html
html(lang="en")
head
title= pageTitle
script(type='text/javascript').
if (foo) {
bar(1 + 5)
}
body
h1 Jade - node template engine
#container.col
if youAreUsingJade
p You are amazing
else
p Get on it!
p.
Jade is a terse and simple
templating language with a
strong focus on performance
and powerful features.</code></pre></figure>
<p>Converting a template to HTML text involves two steps. The first step compiles the template with some formatting options into a closure. The binding for this is implemented in <code class="language-plaintext highlighter-rouge">jade_compile</code>.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Compile a Jade template in R</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">rjade</span><span class="p">)</span><span class="w">
</span><span class="n">text</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">readLines</span><span class="p">(</span><span class="n">system.file</span><span class="p">(</span><span class="s2">"examples/test.jade"</span><span class="p">,</span><span class="w"> </span><span class="n">package</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"rjade"</span><span class="p">))</span><span class="w">
</span><span class="n">tpl</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">jade_compile</span><span class="p">(</span><span class="n">text</span><span class="p">,</span><span class="w"> </span><span class="n">pretty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span></code></pre></figure>
<p>The second step calls the closure, optionally with some local variables, to render the output as HTML.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Render the template</span><span class="w">
</span><span class="n">tpl</span><span class="p">()</span></code></pre></figure>
<p>The output looks like this:</p>
<figure class="highlight"><pre><code class="language-html" data-lang="html"><span class="cp"><!DOCTYPE html></span>
<span class="nt"><html</span> <span class="na">lang=</span><span class="s">"en"</span><span class="nt">></span>
<span class="nt"><head></span>
<span class="nt"><title></title></span>
<span class="nt"><script </span><span class="na">type=</span><span class="s">"text/javascript"</span><span class="nt">></span>
<span class="k">if</span> <span class="p">(</span><span class="nx">foo</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">bar</span><span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="mi">5</span><span class="p">)</span>
<span class="p">}</span>
<span class="nt"></script></span>
<span class="nt"></head></span>
<span class="nt"><body></span>
<span class="nt"><h1></span>Jade - node template engine<span class="nt"></h1></span>
<span class="nt"><div</span> <span class="na">id=</span><span class="s">"container"</span> <span class="na">class=</span><span class="s">"col"</span><span class="nt">></span>
<span class="nt"><p></span>Get on it!<span class="nt"></p></span>
<span class="nt"><p></span>
Jade is a terse and simple
templating language with a
strong focus on performance
and powerful features.
<span class="nt"></p></span>
<span class="nt"></div></span>
<span class="nt"></body></span>
<span class="nt"></html></span></code></pre></figure>
<p>Note how the HTML output changes when setting local variables:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">tpl</span><span class="p">(</span><span class="n">youAreUsingJade</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-html" data-lang="html"><span class="cp"><!DOCTYPE html></span>
<span class="nt"><html</span> <span class="na">lang=</span><span class="s">"en"</span><span class="nt">></span>
<span class="nt"><head></span>
<span class="nt"><title></title></span>
<span class="nt"><script </span><span class="na">type=</span><span class="s">"text/javascript"</span><span class="nt">></span>
<span class="k">if</span> <span class="p">(</span><span class="nx">foo</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">bar</span><span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="mi">5</span><span class="p">)</span>
<span class="p">}</span>
<span class="nt"></script></span>
<span class="nt"></head></span>
<span class="nt"><body></span>
<span class="nt"><h1></span>Jade - node template engine<span class="nt"></h1></span>
<span class="nt"><div</span> <span class="na">id=</span><span class="s">"container"</span> <span class="na">class=</span><span class="s">"col"</span><span class="nt">></span>
<span class="nt"><p></span>You are amazing<span class="nt"></p></span>
<span class="nt"><p></span>
Jade is a terse and simple
templating language with a
strong focus on performance
and powerful features.
<span class="nt"></p></span>
<span class="nt"></div></span>
<span class="nt"></body></span>
<span class="nt"></html></span></code></pre></figure>
<p>That’s it. Head over to the <a href="http://jade-lang.com/">jade website</a> to learn about the full power of this amazing templating language.</p>
Introducing js: tools for working with JavaScript in R2015-02-17T00:00:00+00:00https://www.opencpu.org/posts/js-release-0-1
<a href="https://www.opencpu.org/posts/js-release-0-1"><img alt="opencpu logo" src="https://www.opencpu.org/images/jshint.png"></a>
<p>A new package has appeared on CRAN called <a href="http://cran.r-project.org/web/packages/js/">js</a>. This package implements bindings to several popular JavaScript libraries for validating, reformatting, optimizing and analyzing JavaScript code. It builds on the <a href="http://cran.r-project.org/web/packages/V8/vignettes/v8_intro.html">V8</a> engine, the fully standalone JavaScript engine in R.</p>
<h2 id="syntax-validation">Syntax Validation</h2>
<p>Several R packages allow the user to supply JavaScript code to be used as callback function or configuration object within a visualization or web application. By validating in R that the JavaScript code is syntactically correct and of the right type before actually inserting it in the HTML, we can avoid many annoying bugs.</p>
<p>The <code class="language-plaintext highlighter-rouge">js_typeof</code> function simply calls the <code class="language-plaintext highlighter-rouge">typeof</code> operator on the given code. If the code is syntactically invalid, a SyntaxError will be raised.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">callback</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s1">'function(x, y){
var z = x*y ;
return z;
}'</span><span class="w">
</span><span class="n">js_typeof</span><span class="p">(</span><span class="n">callback</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1] "function"</span></code></pre></figure>
<p>Same for objects:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">conf</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s1">'{
foo : function (){},
bar : 123
}'</span><span class="w">
</span><span class="n">js_typeof</span><span class="p">(</span><span class="n">conf</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1] "object"</span></code></pre></figure>
<p>Catch JavaScript typos:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">js_typeof</span><span class="p">(</span><span class="s1">'function(x,y){return x + y}}'</span><span class="p">)</span><span class="w">
</span><span class="c1"># Error in eval(expr, envir, enclos): SyntaxError: Unexpected token }</span></code></pre></figure>
<h2 id="script-validation">Script Validation</h2>
<p>A JavaScript program typically consists of a script containing a collection of JavaScript statements. The <code class="language-plaintext highlighter-rouge">js_validate_script</code> function can be used to validate an entire script.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">jscode</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">readLines</span><span class="p">(</span><span class="n">system.file</span><span class="p">(</span><span class="s2">"js/uglify.min.js"</span><span class="p">,</span><span class="w"> </span><span class="n">package</span><span class="o">=</span><span class="s2">"js"</span><span class="p">),</span><span class="w"> </span><span class="n">warn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="n">js_validate_script</span><span class="p">(</span><span class="n">jscode</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1] TRUE</span></code></pre></figure>
<p>Note that JavaScript does not allow an anonymous function declaration as a top-level statement in the global scope:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">js_validate_script</span><span class="p">(</span><span class="s1">'function(x, y){return x + y}'</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1] FALSE</span></code></pre></figure>
<p>To validate individual functions or objects, use the <code class="language-plaintext highlighter-rouge">js_typeof</code> function.</p>
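<p>For the use case described at the start of this post (checking a user-supplied callback before inserting it into HTML), the two outcomes of <code class="language-plaintext highlighter-rouge">js_typeof</code> (a type or an error) can be folded into a single yes/no check. The helper <code class="language-plaintext highlighter-rouge">is_js_function</code> below is hypothetical and not part of the js package; it is only a sketch of how such a check might look:</p>

```r
library(js)

# Hypothetical helper (not part of js): TRUE only if `code` is syntactically
# valid JavaScript that evaluates to a function
is_js_function <- function(code) {
  # js_typeof() raises an R error on a SyntaxError; map that to NA
  type <- tryCatch(js_typeof(code), error = function(e) NA_character_)
  identical(type, "function")
}

is_js_function("function(x, y){ return x + y }")   # valid function
is_js_function("function(x, y){ return x + y }}")  # syntax error
is_js_function("{ bar : 123 }")                    # an object, not a function
```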
<h2 id="uglify-reformatting-and-optimization">Uglify: reformatting and optimization</h2>
<p>One of the most popular and powerful libraries for working with JavaScript code is <a href="https://www.npmjs.com/package/uglify-js">uglify-js</a>. This library provides an extensive toolkit for manipulating the syntax tree of a piece of JavaScript code.</p>
<p>The <code class="language-plaintext highlighter-rouge">uglify_reformat</code> function parses a string with code and then feeds it to the <a href="http://lisperator.net/uglifyjs/codegen">uglify code generator</a>, which converts it back into JavaScript text with custom formatting options such as fixing whitespace, semicolons, etc.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">code</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"function test(x, y){ x = x || 1; y = y || 1; return x*y;}"</span><span class="w">
</span><span class="n">cat</span><span class="p">(</span><span class="n">uglify_reformat</span><span class="p">(</span><span class="n">code</span><span class="p">,</span><span class="w"> </span><span class="n">beautify</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">indent_level</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">))</span><span class="w">
</span><span class="c1"># function test(x, y) {</span><span class="w">
</span><span class="c1"># x = x || 1;</span><span class="w">
</span><span class="c1"># y = y || 1;</span><span class="w">
</span><span class="c1"># return x * y;</span><span class="w">
</span><span class="c1"># }</span></code></pre></figure>
<p>The more impressive part of uglify-js is the <a href="http://lisperator.net/uglifyjs/compress">compressor</a> which refactors the entire syntax tree, effectively rewriting your code into a more compact but equivalent program. The <code class="language-plaintext highlighter-rouge">uglify_optimize</code> function in R is a simple wrapper which parses code and then feeds it to the compressor.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">cat</span><span class="p">(</span><span class="n">code</span><span class="p">)</span><span class="w">
</span><span class="c1"># function test(x, y){ x = x || 1; y = y || 1; return x*y;}</span><span class="w">
</span><span class="n">cat</span><span class="p">(</span><span class="n">uglify_optimize</span><span class="p">(</span><span class="n">code</span><span class="p">))</span><span class="w">
</span><span class="c1"># function test(x,y){return x=x||1,y=y||1,x*y}</span></code></pre></figure>
<p>You can pass <a href="http://lisperator.net/uglifyjs/compress">compressor options</a> to <code class="language-plaintext highlighter-rouge">uglify_optimize</code> to control the various uglify optimization techniques.</p>
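<p>A sketch of passing a compressor option, under the assumption that named arguments to <code class="language-plaintext highlighter-rouge">uglify_optimize</code> are forwarded to the uglify-js compressor. Here <code class="language-plaintext highlighter-rouge">sequences = FALSE</code> is meant to stop the compressor from joining statements with the comma operator, as it did in the example above. The exact output depends on the bundled uglify-js version, so it is not shown; the script validator is used as a sanity check instead:</p>

```r
library(js)

code <- "function test(x, y){ x = x || 1; y = y || 1; return x*y;}"

# Assumption: named arguments are passed on as uglify-js compressor options
out <- uglify_optimize(code, sequences = FALSE)
cat(out)

# Whatever the options, the optimized code should still be a valid script
js_validate_script(out)
```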
<h2 id="jshint-code-analysis">JSHint: code analysis</h2>
<p>JSHint will automatically detect errors and potential problems in JavaScript code. The <code class="language-plaintext highlighter-rouge">jshint</code> function in R will return a data frame where each row is a problem detected by the library (type, line and reason of error):</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">code</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"var foo = 123"</span><span class="w">
</span><span class="n">jshint</span><span class="p">(</span><span class="n">code</span><span class="p">)</span><span class="w">
</span><span class="c1">#</span><span class="w">
</span><span class="c1"># id raw code evidence line character scope reason</span><span class="w">
</span><span class="c1"># 1 (error) Missing semicolon. W033 var foo = 123 1 14 (main) Missing semicolon.</span></code></pre></figure>
<p>JSHint has many <a href="http://jshint.com/docs/options/">configuration options</a> to control which types of code problems it will report on.</p>
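<p>As a sketch, assuming that named arguments to <code class="language-plaintext highlighter-rouge">jshint</code> are passed on as JSHint options: the relaxing option <code class="language-plaintext highlighter-rouge">asi</code> tells JSHint to tolerate missing semicolons, so the warning from the example above should disappear. <code class="language-plaintext highlighter-rouge">NROW</code> is used because the exact return value for "no problems" (empty data frame or empty list) is not documented here:</p>

```r
library(js)

code <- "var foo = 123"

# Default settings report the missing semicolon (W033)
NROW(jshint(code))

# Assumption: named arguments become JSHint options; asi = TRUE allows
# automatic semicolon insertion, so nothing should be reported
res <- jshint(code, asi = TRUE)
NROW(res)
```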
Minimist: an example of writing native JavaScript bindings in R2015-02-16T00:00:00+00:00https://www.opencpu.org/posts/minimist-release-0-1
<a href="https://www.opencpu.org/posts/minimist-release-0-1"><img alt="opencpu logo" src="https://www.opencpu.org/images/substack.jpg"></a>
<p>A new package has appeared on CRAN called <a href="http://cran.r-project.org/web/packages/minimist/">minimist</a>, which implements an interface to the popular <a href="https://www.npmjs.com/package/minimist">JavaScript library</a>. This package has only one function, used for argument parsing. For example in RGui on OSX, the output of <code class="language-plaintext highlighter-rouge">commandArgs()</code> looks like this:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="o">></span><span class="w"> </span><span class="n">commandArgs</span><span class="p">()</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="s2">"R"</span><span class="w"> </span><span class="s2">"--no-save"</span><span class="w"> </span><span class="s2">"--no-restore-data"</span><span class="w"> </span><span class="s2">"--gui=aqua"</span><span class="w"> </span></code></pre></figure>
<p>Minimist turns that into this:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="o">></span><span class="w"> </span><span class="n">library</span><span class="p">(</span><span class="n">minimist</span><span class="p">)</span><span class="w">
</span><span class="o">></span><span class="w"> </span><span class="n">minimist</span><span class="p">(</span><span class="n">commandArgs</span><span class="p">())</span><span class="w">
</span><span class="o">$</span><span class="n">`_`</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="s2">"R"</span><span class="w">
</span><span class="o">$</span><span class="n">save</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="kc">FALSE</span><span class="w">
</span><span class="o">$</span><span class="n">`restore-data`</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="kc">FALSE</span><span class="w">
</span><span class="o">$</span><span class="n">gui</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="s2">"aqua"</span></code></pre></figure>
<p>Note how it interprets the <code class="language-plaintext highlighter-rouge">--no-</code> prefix as <code class="language-plaintext highlighter-rouge">FALSE</code> and <code class="language-plaintext highlighter-rouge">--foo=bar</code> as a key-value pair. It has a few more of these rules, following the usual scripting argument syntax conventions. Cool, but not exactly groundbreaking; there are already half a dozen packages on CRAN for parsing arguments (although this one is particularly nice :P).</p>
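<p>To spell out the rules just described, here is a simplified base-R sketch of three of them. The function <code class="language-plaintext highlighter-rouge">parse_args_sketch</code> is hypothetical and is <strong>not</strong> how the minimist package works (the package calls the real minimist.js through V8, and handles many more cases); it only reproduces the output shown above for this input:</p>

```r
# Hypothetical base-R sketch of three minimist rules; NOT the package's
# implementation, which calls the real minimist.js through V8
parse_args_sketch <- function(args) {
  out <- list(`_` = character(0))
  for (arg in args) {
    if (startsWith(arg, "--no-")) {
      # "--no-foo" becomes foo = FALSE
      out[[substring(arg, 6)]] <- FALSE
    } else if (startsWith(arg, "--") && grepl("=", arg, fixed = TRUE)) {
      # "--foo=bar" becomes foo = "bar" (split at the first "=")
      kv <- substring(arg, 3)
      pos <- regexpr("=", kv, fixed = TRUE)
      out[[substr(kv, 1, pos - 1)]] <- substring(kv, pos + 1)
    } else if (startsWith(arg, "--")) {
      # a bare "--foo" becomes foo = TRUE
      out[[substring(arg, 3)]] <- TRUE
    } else {
      # anything else is collected as a positional argument in `_`
      out[["_"]] <- c(out[["_"]], arg)
    }
  }
  out
}

str(parse_args_sketch(c("R", "--no-save", "--no-restore-data", "--gui=aqua")))
```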
<h2 id="writing-javascript-bindings-using-v8">Writing JavaScript bindings using V8</h2>
<p>The main purpose of this new package is to exemplify how to write a package with bindings to a JavaScript library using V8. If you take a look at the <a href="https://github.com/cran/minimist">package source</a>, you might be surprised how small it is. The package consists of:</p>
<ul>
<li>A copy of the <a href="https://www.npmjs.com/package/minimist">minimist.js</a> library in the package <a href="https://github.com/cran/minimist/tree/master/inst/js"><code class="language-plaintext highlighter-rouge">inst</code></a> dir</li>
<li>Two <a href="https://github.com/cran/minimist/blob/0.1/R/onLoad.R">lines of standard code</a> to initiate the V8 engine and read minimist when loading the R package</li>
<li>A one-line <a href="https://github.com/cran/minimist/blob/0.1/R/minimist.R">wrapper function</a> to call the JavaScript function from R</li>
</ul>
<p>That’s it. To install this package from source <strong>no compiler is required</strong>. It will build out of the box, even on machines without Rtools or Xcode. Moreover, there are <strong>no external dependencies</strong> as is the case for e.g. Java code, where we need to install a JVM. Everything is self contained within R and V8. It’s fast too:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="o">></span><span class="w"> </span><span class="n">system.time</span><span class="p">(</span><span class="n">minimist</span><span class="p">(</span><span class="n">commandArgs</span><span class="p">()))</span><span class="w">
</span><span class="n">user</span><span class="w"> </span><span class="n">system</span><span class="w"> </span><span class="n">elapsed</span><span class="w">
</span><span class="m">0.001</span><span class="w"> </span><span class="m">0.000</span><span class="w"> </span><span class="m">0.001</span></code></pre></figure>
<p>I’m working on several other packages implementing bindings to cool JavaScript libraries (see also <a href="https://www.opencpu.org/posts/v8-release-0-5/">yesterday’s post</a>). If you have suggestions for other JavaScript libraries that might be useful in R, <a href="http://twitter.com/home?status=%23rstats%20%40opencpu%20">get in touch</a>.</p>
V8 version 0.5: typed arrays and sql.js2015-02-15T00:00:00+00:00https://www.opencpu.org/posts/v8-release-0-5
<a href="https://www.opencpu.org/posts/v8-release-0-5"><img alt="opencpu logo" src="https://www.opencpu.org/images/v8engine.jpg"></a>
<p>Earlier this month, V8 version 0.5 appeared on CRAN. This version adds support for typed arrays as specified in ECMAScript 6, in order to support high-performance computing and libraries compiled with Emscripten. A big thanks goes to Kenton Russell (<a href="https://github.com/timelyportfolio">@timelyportfolio</a>) for suggesting these features.</p>
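<p>A quick way to see typed arrays in action is to allocate one inside a context, reduce it in JavaScript and copy the scalar results back to R. This sketch assumes a recent V8 release (constructor <code class="language-plaintext highlighter-rouge">v8()</code>); the variable names are arbitrary.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(V8)
ct <- v8()

# Allocate a typed array in JavaScript and sum it with a plain loop
invisible(ct$eval("
  var buf = new Float64Array([1.5, 2.5, 3.5]);
  var n = buf.length;
  var total = 0;
  for (var i = 0; i < buf.length; i++) total += buf[i];
"))

# Copy the scalar results back into R (values are passed as JSON)
ct$get("n")
# [1] 3
ct$get("total")
# [1] 7.5</code></pre></figure>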
<h1 id="example-sqljs">Example: sql.js</h1>
<p>These new features increase the number of JavaScript libraries that will run out of the box on V8. For example, <a href="https://github.com/kripken/sql.js/">sql.js</a> is a port of SQLite to JavaScript, created by compiling the SQLite C code with Emscripten:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Load V8</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">V8</span><span class="p">)</span><span class="w">
</span><span class="n">stopifnot</span><span class="p">(</span><span class="n">packageVersion</span><span class="p">(</span><span class="s2">"V8"</span><span class="p">)</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="s2">"0.5"</span><span class="p">)</span><span class="w">
</span><span class="c1"># Create JavaScript context and load sql.js</span><span class="w">
</span><span class="n">ct</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">new_context</span><span class="p">()</span><span class="w">
</span><span class="n">ct</span><span class="o">$</span><span class="n">source</span><span class="p">(</span><span class="s2">"https://raw.githubusercontent.com/kripken/sql.js/master/js/sql.js"</span><span class="p">)</span><span class="w">
</span><span class="c1"># Evaluate JavaScript code</span><span class="w">
</span><span class="n">ct</span><span class="o">$</span><span class="n">eval</span><span class="p">(</span><span class="s1">'
var db = new SQL.Database()
db.run("CREATE TABLE hello (person char, age int);")
db.run("INSERT INTO hello VALUES (\'jerry\', 34);")
db.run("INSERT INTO hello VALUES (\'mary\', 27);")
db.run("INSERT INTO hello VALUES (\'joe\', 65);")
db.run("INSERT INTO hello VALUES (\'anna\', 18);")
// query:
var out = []
var stmt = db.prepare("SELECT * FROM hello WHERE age < 40");
while (stmt.step()) out.push(stmt.getAsObject());
stmt.free();
'</span><span class="p">)</span><span class="w">
</span><span class="c1"># Copy the object from JavaScript to R</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ct</span><span class="o">$</span><span class="n">get</span><span class="p">(</span><span class="s2">"out"</span><span class="p">)</span><span class="w">
</span><span class="n">print</span><span class="p">(</span><span class="n">data</span><span class="p">)</span></code></pre></figure>
<h1 id="more-v8-fun">More V8 fun</h1>
<p>Several other examples are available as gists, for example <a href="https://gist.github.com/jeroenooms/7e56e2649389f53ed0ee">cheerio</a> (HTML parsing), <a href="https://gist.github.com/timelyportfolio/9b4fc699bb6d67b7f418">turf.js</a> (GeoJSON), <a href="https://gist.github.com/jeroenooms/d0d03c7e58443f5a4438">viz.js</a> and <a href="https://gist.github.com/jeroenooms/c09fdb0465f7e9382163">KaTeX</a>. I am working on several packages that implement actual bindings to JavaScript libraries using V8. The first ones have just landed on CRAN: <a href="http://cran.r-project.org/web/packages/minimist/">minimist</a> and <a href="http://cran.r-project.org/web/packages/js/">js</a>.</p>
<p>To learn more, have a look at the vignettes:</p>
<ul>
<li><a href="http://cran.r-project.org/web/packages/V8/vignettes/v8_intro.html">Introduction to V8 for R</a></li>
<li><a href="http://cran.r-project.org/web/packages/V8/vignettes/npm.html">Using NPM packages in V8</a></li>
</ul>
<p>Questions, suggestions? Find me on <a href="http://twitter.com/home?status=%23rstats%20%40opencpu%20">twitter</a> or <a href="https://github.com/jeroenooms/">github</a>.</p>
V8 version 0.4: console.log and exception handling2015-01-13T00:00:00+00:00https://www.opencpu.org/posts/v8-release-0-4
<a href="https://www.opencpu.org/posts/v8-release-0-4"><img alt="opencpu logo" src="https://www.opencpu.org/images/v8engine.jpg"></a>
<p>V8 version 0.4 has appeared on CRAN. This version introduces several new console functions (<code class="language-plaintext highlighter-rouge">console.log</code>, <code class="language-plaintext highlighter-rouge">console.warn</code>, <code class="language-plaintext highlighter-rouge">console.error</code>) and two vignettes:</p>
<ul>
<li><a href="http://cran.r-project.org/web/packages/V8/vignettes/v8_intro.html">Introduction to V8 for R</a></li>
<li><a href="http://cran.r-project.org/web/packages/V8/vignettes/npm.html">Using NPM packages in V8</a></li>
</ul>
<p>I will talk more about using NPM in another blog post this week.</p>
<h2 id="javascript-exceptions">JavaScript Exceptions</h2>
<p>Starting with V8 version 0.4, each context has a <code class="language-plaintext highlighter-rouge">console</code> object in the global namespace:</p>
<figure class="highlight"><pre><code class="language-js" data-lang="js"><span class="nb">Object</span><span class="p">.</span><span class="nx">keys</span><span class="p">(</span><span class="nx">console</span><span class="p">)</span>
<span class="nx">log</span><span class="p">,</span><span class="nx">warn</span><span class="p">,</span><span class="nx">error</span></code></pre></figure>
<p>The <code class="language-plaintext highlighter-rouge">console.log</code>, <code class="language-plaintext highlighter-rouge">console.warn</code> and <code class="language-plaintext highlighter-rouge">console.error</code> functions can be used to generate stdout, warnings or errors in R from JavaScript. This allows for writing embedded JavaScript functions that propagate exceptions back to R, similar to what we would do for other foreign language interfaces such as C or C++:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">V8</span><span class="p">)</span><span class="w">
</span><span class="n">ct</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">new_context</span><span class="p">()</span><span class="w">
</span><span class="n">ct</span><span class="o">$</span><span class="n">eval</span><span class="p">(</span><span class="s1">'console.log("Bla bla")'</span><span class="p">)</span><span class="w">
</span><span class="c1"># Bla bla</span><span class="w">
</span><span class="n">ct</span><span class="o">$</span><span class="n">eval</span><span class="p">(</span><span class="s1">'console.warn("Heads up!")'</span><span class="p">)</span><span class="w">
</span><span class="c1"># Warning: Heads up!</span><span class="w">
</span><span class="n">ct</span><span class="o">$</span><span class="n">eval</span><span class="p">(</span><span class="s1">'console.error("Oh noes!")'</span><span class="p">)</span><span class="w">
</span><span class="c1"># Error: Oh noes!</span></code></pre></figure>
<p>For example, you can use this to verify that external resources were loaded:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">ct</span><span class="o">$</span><span class="n">source</span><span class="p">(</span><span class="s2">"https://cdnjs.cloudflare.com/ajax/libs/crossfilter/1.3.11/crossfilter.min.js"</span><span class="p">)</span><span class="w">
</span><span class="n">ct</span><span class="o">$</span><span class="n">eval</span><span class="p">(</span><span class="s1">'var cf = (typeof crossfilter !== "undefined") ? crossfilter : console.error("failed to load crossfilter!")'</span><span class="p">)</span></code></pre></figure>
<p>In R, you can then use <code class="language-plaintext highlighter-rouge">tryCatch</code> or any other condition handler to catch exceptions raised this way in your JavaScript code.</p>
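<p>A minimal sketch of catching a JavaScript exception on the R side. It uses an explicit <code class="language-plaintext highlighter-rouge">throw</code>, which V8 reports to R as an ordinary error, and assumes a recent V8 release (constructor <code class="language-plaintext highlighter-rouge">v8()</code>). The helper name <code class="language-plaintext highlighter-rouge">safe_eval</code> is hypothetical, and since the exact wording of the error message may differ between versions, only the original text is relied upon.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(V8)
ct <- v8()

# Evaluate JavaScript, but return a message instead of aborting when it throws
safe_eval <- function(ctx, src) {
  tryCatch(ctx$eval(src), error = function(e) {
    paste("JavaScript failed:", conditionMessage(e))
  })
}

safe_eval(ct, "1 + 1")
# [1] "2"
safe_eval(ct, 'throw new Error("Oh noes!")')</code></pre></figure>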
<h2 id="interactive-console">Interactive Console</h2>
<p>The interactive console has been enhanced a bit as well. It no longer prints redundant “undefined” returns:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">V8</span><span class="p">)</span><span class="w">
</span><span class="n">ct</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">new_context</span><span class="p">()</span><span class="w">
</span><span class="n">ct</span><span class="o">$</span><span class="n">console</span><span class="p">()</span><span class="w">
</span><span class="c1"># This is V8 version 3.14.5.10. Press ESC or CTRL+C to exit.</span></code></pre></figure>
<p>From here we can try our new functions:</p>
<figure class="highlight"><pre><code class="language-js" data-lang="js"><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">"</span><span class="s2">Bla bla</span><span class="dl">"</span><span class="p">)</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">warn</span><span class="p">(</span><span class="dl">"</span><span class="s2">Heads up!</span><span class="dl">"</span><span class="p">)</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="dl">"</span><span class="s2">Oh noes!</span><span class="dl">"</span><span class="p">)</span></code></pre></figure>
<h2 id="bindings-to-javascript-libraries">Bindings to JavaScript Libraries</h2>
<p>V8 provides a JavaScript call interface, data interchange, exception handling and an interactive debugging console. This is everything we need to embed JavaScript code and libraries in R.</p>
<p>If you are curious how this would work, I have started working on a <a href="https://github.com/jeroenooms/js">new R package</a> implementing bindings to some of the very best libraries available for working with JavaScript and HTML. I hope this package will make its way to CRAN soon, but until then it is available from GitHub:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span><span class="w">
</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"jeroenooms/js"</span><span class="p">)</span></code></pre></figure>
<p>A silly example illustrating <a href="https://www.npmjs.com/package/jshint">jshint</a>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">js</span><span class="p">)</span><span class="w">
</span><span class="n">code</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"var foo = 123\nvar bar = 456\nfoo + bar"</span><span class="w">
</span><span class="n">cat</span><span class="p">(</span><span class="n">code</span><span class="p">)</span><span class="w">
</span><span class="c1"># var foo = 123</span><span class="w">
</span><span class="c1"># var bar = 456</span><span class="w">
</span><span class="c1"># foo + bar</span><span class="w">
</span><span class="n">jshint</span><span class="p">(</span><span class="n">code</span><span class="p">)[</span><span class="nf">c</span><span class="p">(</span><span class="s2">"line"</span><span class="p">,</span><span class="w"> </span><span class="s2">"reason"</span><span class="p">)]</span><span class="w">
</span><span class="c1"># line reason</span><span class="w">
</span><span class="c1"># 1 Missing semicolon.</span><span class="w">
</span><span class="c1"># 2 Missing semicolon.</span><span class="w">
</span><span class="c1"># 3 Expected an assignment or function call and instead saw an expression.</span><span class="w">
</span><span class="c1"># 3 Missing semicolon.</span></code></pre></figure>
<p>Or the brilliant <a href="https://www.npmjs.com/package/uglify-js">uglify-js</a>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">uglify_reformat</span><span class="p">(</span><span class="n">code</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1] "var foo=123;var bar=456;foo+bar;"</span><span class="w">
</span><span class="n">uglify_optimize</span><span class="p">(</span><span class="n">code</span><span class="p">)</span><span class="w">
</span><span class="c1"># Warning: Dropping side-effect-free statement [null:3,0]</span><span class="w">
</span><span class="c1"># [1] "var foo=123,bar=456;"</span></code></pre></figure>
curl 0.4 bugfix release2015-01-11T00:00:00+00:00https://www.opencpu.org/posts/curl-release-0-4
<a href="https://www.opencpu.org/posts/curl-release-0-4"><img alt="opencpu logo" src="https://www.opencpu.org/images/curllogo.jpg"></a>
<p>This week curl version 0.4 appeared on CRAN. This release fixes a memory <a href="https://github.com/jeroenooms/curl/commit/2d07e3fb1aec17fb8d64a5802277acf3d684fcd1">bug</a> that was introduced in the previous version, and which could under some circumstances crash your R session. The new version is well tested and super stable. If you are using this package, updating is highly recommended.</p>
<h2 id="what-is-curl-again">What is curl again?</h2>
<p>From the manual:</p>
<blockquote>
<p>The curl() function provides a drop-in replacement for base url() with better performance and support for http 2.0, ssl (https://, ftps://), gzip, deflate and other libcurl goodies. This interface is implemented using the RConnection API in order to support incremental processing of both binary and text streams.</p>
</blockquote>
<p>Some examples from the help page illustrating https, gzip, redirects and other stuff that base url doesn’t do well:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">curl</span><span class="p">)</span><span class="w">
</span><span class="c1"># Read from a connection</span><span class="w">
</span><span class="n">con</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">curl</span><span class="p">(</span><span class="s2">"https://httpbin.org/get"</span><span class="p">)</span><span class="w">
</span><span class="n">readLines</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w">
</span><span class="c1"># HTTP error</span><span class="w">
</span><span class="n">curl</span><span class="p">(</span><span class="s2">"https://httpbin.org/status/418"</span><span class="p">,</span><span class="w"> </span><span class="s2">"r"</span><span class="p">)</span><span class="w">
</span><span class="c1"># Follow redirects</span><span class="w">
</span><span class="n">readLines</span><span class="p">(</span><span class="n">curl</span><span class="p">(</span><span class="s2">"https://httpbin.org/redirect/3"</span><span class="p">))</span><span class="w">
</span><span class="c1"># Error after redirect</span><span class="w">
</span><span class="n">curl</span><span class="p">(</span><span class="s2">"https://httpbin.org/redirect-to?url=http://httpbin.org/status/418"</span><span class="p">,</span><span class="w"> </span><span class="s2">"r"</span><span class="p">)</span><span class="w">
</span><span class="c1"># Auto decompress Accept-Encoding: gzip / deflate (rfc2616 #14.3)</span><span class="w">
</span><span class="n">readLines</span><span class="p">(</span><span class="n">curl</span><span class="p">(</span><span class="s2">"http://httpbin.org/gzip"</span><span class="p">))</span><span class="w">
</span><span class="n">readLines</span><span class="p">(</span><span class="n">curl</span><span class="p">(</span><span class="s2">"http://httpbin.org/deflate"</span><span class="p">))</span></code></pre></figure>
<h2 id="streaming">Streaming</h2>
<p>The advantage of curl over RCurl and httr is that its connection interface allows for streaming. For example, you can use <code class="language-plaintext highlighter-rouge">readLines</code> to download and process data line by line:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">con</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">curl</span><span class="p">(</span><span class="s2">"http://jeroenooms.github.io/data/diamonds.json"</span><span class="p">,</span><span class="w"> </span><span class="n">open</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"r"</span><span class="p">)</span><span class="w">
</span><span class="n">readLines</span><span class="p">(</span><span class="n">con</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">)</span><span class="w">
</span><span class="n">readLines</span><span class="p">(</span><span class="n">con</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">)</span><span class="w">
</span><span class="n">readLines</span><span class="p">(</span><span class="n">con</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">)</span><span class="w">
</span><span class="n">close</span><span class="p">(</span><span class="n">con</span><span class="p">)</span></code></pre></figure>
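<p>The same <code class="language-plaintext highlighter-rouge">readLines(con, n)</code> calls can be put in a loop to process an arbitrarily long stream in fixed-size chunks. In this sketch a <code class="language-plaintext highlighter-rouge">textConnection</code> stands in for <code class="language-plaintext highlighter-rouge">curl(url, open = "r")</code> so that it runs offline; the helper name <code class="language-plaintext highlighter-rouge">count_lines</code> and the chunk size are arbitrary.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Read a connection in chunks of n lines until it is exhausted
count_lines <- function(con, n = 2) {
  total <- 0
  repeat {
    chunk <- readLines(con, n = n)
    if (length(chunk) == 0) break  # end of stream
    total <- total + length(chunk)
  }
  total
}

# Stand-in for curl("http://...", open = "r"); any open connection behaves the same way
con <- textConnection(c("line 1", "line 2", "line 3", "line 4", "line 5"))
count_lines(con)
# [1] 5
close(con)</code></pre></figure>
<p>Only one chunk is held in memory at a time, which is what makes this pattern attractive for large downloads.</p>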
<p>We can combine this with <code class="language-plaintext highlighter-rouge">stream_in</code> from jsonlite to stream-parse sizable datasets:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">jsonlite</span><span class="p">)</span><span class="w">
</span><span class="n">con</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gzcon</span><span class="p">(</span><span class="n">curl</span><span class="p">(</span><span class="s2">"https://jeroenooms.github.io/data/nycflights13.json.gz"</span><span class="p">))</span><span class="w">
</span><span class="n">nycflights</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">stream_in</span><span class="p">(</span><span class="n">con</span><span class="p">)</span></code></pre></figure>
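<p><code class="language-plaintext highlighter-rouge">stream_in</code> also accepts a <code class="language-plaintext highlighter-rouge">handler</code> function, which is called on each page of parsed records instead of accumulating them all. The sketch below runs offline: a <code class="language-plaintext highlighter-rouge">textConnection</code> with three made-up NDJSON records stands in for the gzipped download, and the field names are hypothetical.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(jsonlite)

# Three toy NDJSON records (one JSON object per line)
records <- c('{"carrier":"UA","dep_delay":2}',
             '{"carrier":"AA","dep_delay":-4}',
             '{"carrier":"UA","dep_delay":11}')

# Parse everything into a single data frame
con <- textConnection(records)
flights <- stream_in(con, verbose = FALSE)
close(con)
nrow(flights)
# [1] 3

# Or aggregate page by page, without keeping all rows in memory
total_delay <- 0
con <- textConnection(records)
stream_in(con, pagesize = 2, verbose = FALSE,
          handler = function(df) total_delay <<- total_delay + sum(df$dep_delay))
close(con)
total_delay
# [1] 9</code></pre></figure>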
New in openssl 0.3: hash functions2015-01-10T00:00:00+00:00https://www.opencpu.org/posts/openssl-release-0-3
<a href="https://www.opencpu.org/posts/openssl-release-0-3"><img alt="opencpu logo" src="https://www.opencpu.org/images/cat1.jpg"></a>
<p>This week version 0.3 of the <a href="http://cran.r-project.org/web/packages/openssl/index.html">openssl</a> package appeared on CRAN. New in this release are bindings to the cryptographic hashing functions in OpenSSL. Not exactly groundbreaking (hashing functions have long been available from digest), but nice to have anyway. An overview from the new <a href="http://cran.r-project.org/web/packages/openssl/vignettes/crypto_hashing.html">vignette</a>:</p>
<h2 id="hashing-functions">Hashing functions</h2>
<p>The functions <code class="language-plaintext highlighter-rouge">sha1</code>, <code class="language-plaintext highlighter-rouge">sha256</code>, <code class="language-plaintext highlighter-rouge">sha512</code>, <code class="language-plaintext highlighter-rouge">md4</code>, <code class="language-plaintext highlighter-rouge">md5</code> and <code class="language-plaintext highlighter-rouge">ripemd160</code> bind to the respective <a href="https://www.openssl.org/docs/apps/dgst.html">digest functions</a> in OpenSSL’s libcrypto. Both binary and string inputs are supported and the output type will match the input type.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">openssl</span><span class="p">)</span><span class="w">
</span><span class="n">md5</span><span class="p">(</span><span class="s2">"foo"</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1] "acbd18db4cc2f85cedef654fccc4a4d8"</span><span class="w">
</span><span class="n">md5</span><span class="p">(</span><span class="n">charToRaw</span><span class="p">(</span><span class="s2">"foo"</span><span class="p">))</span><span class="w">
</span><span class="c1"># [1] ac bd 18 db 4c c2 f8 5c ed ef 65 4f cc c4 a4 d8</span></code></pre></figure>
<p>Functions are fully vectorized for the case of character vectors: a vector with n strings will return n hashes.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Vectorized for strings</span><span class="w">
</span><span class="n">md5</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"foo"</span><span class="p">,</span><span class="w"> </span><span class="s2">"bar"</span><span class="p">,</span><span class="w"> </span><span class="s2">"baz"</span><span class="p">))</span><span class="w">
</span><span class="c1"># [1] "acbd18db4cc2f85cedef654fccc4a4d8" "37b51d194a7513e45b56f6524f2d51f2"</span><span class="w">
</span><span class="c1"># [3] "73feffa4b7f6bb68e44cf984c85f6e88"</span></code></pre></figure>
<p>Besides character and raw vectors, we can pass a connection object (e.g. a file, socket or URL). In this case the function will stream-hash the binary contents of the connection.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Stream-hash a file</span><span class="w">
</span><span class="n">myfile</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">system.file</span><span class="p">(</span><span class="s2">"CITATION"</span><span class="p">)</span><span class="w">
</span><span class="n">md5</span><span class="p">(</span><span class="n">file</span><span class="p">(</span><span class="n">myfile</span><span class="p">))</span><span class="w">
</span><span class="c1"># Hashing....</span><span class="w">
</span><span class="c1"># [1] e4 4f 1b 99 e3 2f 27 e0 a7 e6 a0 0a 36 07 0e 1b</span></code></pre></figure>
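<p>To convince yourself that stream-hashing gives the same digest as hashing the bytes in memory, the sketch below writes a small temporary file with known contents and compares both routes. The helper <code class="language-plaintext highlighter-rouge">to_hex</code> is hypothetical; the expected digest is the MD5 of <code class="language-plaintext highlighter-rouge">"foo"</code> shown earlier.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(openssl)

# Write a small file with known contents: the three bytes "foo", no newline
path <- tempfile()
writeBin(charToRaw("foo"), path)

# Stream-hash through a connection, then hash the same bytes held in memory
h_stream <- md5(file(path))
h_memory <- md5(readBin(path, what = "raw", n = file.size(path)))

# Both are raw digests; render them as hex strings to compare
to_hex <- function(h) paste(as.raw(h), collapse = "")
to_hex(h_stream)
# [1] "acbd18db4cc2f85cedef654fccc4a4d8"
identical(to_hex(h_stream), to_hex(h_memory))
# [1] TRUE</code></pre></figure>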
<p>Same for URLs. The hash of the <a href="http://cran.us.r-project.org/bin/windows/base/old/3.1.1/R-3.1.1-win.exe"><code class="language-plaintext highlighter-rouge">R-3.1.1-win.exe</code></a> below should match the one in <a href="http://cran.us.r-project.org/bin/windows/base/old/3.1.1/md5sum.txt"><code class="language-plaintext highlighter-rouge">md5sum.txt</code></a>.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Stream-hash from a network connection</span><span class="w">
</span><span class="n">md5</span><span class="p">(</span><span class="n">url</span><span class="p">(</span><span class="s2">"http://cran.us.r-project.org/bin/windows/base/old/3.1.1/R-3.1.1-win.exe"</span><span class="p">))</span><span class="w">
</span><span class="c1"># Hashing................................................................................................................</span><span class="w">
</span><span class="c1"># [1] 0b 48 29 e8 92 10 eb 6d 13 71 24 8c d0 97 d1 fc</span></code></pre></figure>
<h2 id="compare-to-digest">Compare to digest</h2>
<p>Similar functionality is also available in the <a href="http://cran.r-project.org/web/packages/digest/index.html">digest</a> package, but with a slightly different interface:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Compare to digest</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">digest</span><span class="p">)</span><span class="w">
</span><span class="n">digest</span><span class="p">(</span><span class="s2">"foo"</span><span class="p">,</span><span class="w"> </span><span class="s2">"md5"</span><span class="p">,</span><span class="w"> </span><span class="n">serialize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1] "acbd18db4cc2f85cedef654fccc4a4d8"</span><span class="w">
</span><span class="c1"># Other way around</span><span class="w">
</span><span class="n">digest</span><span class="p">(</span><span class="n">cars</span><span class="p">,</span><span class="w"> </span><span class="n">skip</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1] "81919836edd7b5a422700ac32bbccd7d"</span><span class="w">
</span><span class="n">md5</span><span class="p">(</span><span class="n">serialize</span><span class="p">(</span><span class="n">cars</span><span class="p">,</span><span class="w"> </span><span class="kc">NULL</span><span class="p">))</span><span class="w">
</span><span class="c1"># [1] 81 91 98 36 ed d7 b5 a4 22 70 0a c3 2b bc cd 7d</span></code></pre></figure>
OpenCPU release 1.4.6: gzip and systemd2014-12-30T00:00:00+00:00https://www.opencpu.org/posts/opencpu-release-1-4-6
<a href="https://www.opencpu.org/posts/opencpu-release-1-4-6"><img alt="opencpu logo" src="https://www.opencpu.org/images/systemd.jpg"></a>
<p>OpenCPU server version 1.4.6 has been released to <a href="https://launchpad.net/~opencpu/+archive/ubuntu/opencpu-1.4">launchpad</a>, <a href="https://build.opensuse.org/package/show/home:jeroenooms:opencpu-1.4/opencpu">OBS</a>, and <a href="https://registry.hub.docker.com/repos/opencpu/">dockerhub</a> (more about docker in a future blog post). I also updated the instructions to <a href="https://www.opencpu.org/download.html">install</a> the server or build from source for <a href="https://github.com/jeroenooms/opencpu-server/tree/master/rpm#readme">rpm</a> or <a href="https://github.com/jeroenooms/opencpu-server/tree/master/debian#readme">deb</a>. If you have a running deployment, you should be able to upgrade with <code class="language-plaintext highlighter-rouge">apt-get upgrade</code> or <code class="language-plaintext highlighter-rouge">yum update</code> respectively.</p>
<h2 id="compression">Compression</h2>
<p>This release enables gzip compression in the default apache2 configuration for ocpu, which was suggested by several smart users. As was explained in an earlier <a href="https://www.opencpu.org/posts/curl-release-0-2/">post</a> about the curl package:</p>
<blockquote>
<p>Support for compression can make a huge difference when streaming large data. Text based formats such as json are popular because they are human readable, but the main downside of plain-text is inefficiency for storing numbers. However when gzipped, json payloads are often <a href="https://news.ycombinator.com/item?id=2571729">comparable to binary formats</a>, giving you the best of both worlds.</p>
</blockquote>
<p>The nice thing about http is that compression is handled entirely on the level of the protocol so it works for all content types and you don’t have to do anything to take advantage of it. Client and server will automatically negotiate a method of compression that they both support via the <code class="language-plaintext highlighter-rouge">Accept-Encoding</code> header.</p>
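<p>To get a feel for how well JSON compresses, base R’s <code class="language-plaintext highlighter-rouge">memCompress</code> can serve as a rough proxy. Note that <code class="language-plaintext highlighter-rouge">type = "gzip"</code> in <code class="language-plaintext highlighter-rouge">memCompress</code> produces a zlib-wrapped deflate stream rather than the exact bytes Apache would send, so treat the sizes as indicative only; the <code class="language-plaintext highlighter-rouge">mtcars</code> dataset is an arbitrary choice.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(jsonlite)

# Serialize a data frame to JSON text, as the server would before sending it
json <- as.character(toJSON(mtcars))
plain <- charToRaw(json)

# Deflate the bytes with zlib (what "gzip" means for memCompress)
packed <- memCompress(plain, type = "gzip")

# Compare payload sizes in bytes
c(plain = length(plain), compressed = length(packed))

# The round trip is lossless
identical(memDecompress(packed, type = "gzip", asChar = TRUE), json)</code></pre></figure>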
<p>Try playing around with the ocpu <a href="http://cloud.opencpu.org/ocpu/test/">test page</a> by looking at the <code class="language-plaintext highlighter-rouge">Content-Encoding</code> response header, or just use curl with the <code class="language-plaintext highlighter-rouge">--compressed</code> flag (use <code class="language-plaintext highlighter-rouge">-v</code> to see headers):</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl https://demo.ocpu.io/MASS/data/Boston/json <span class="nt">-v</span> <span class="o">></span> /dev/null
curl https://demo.ocpu.io/MASS/data/Boston/json <span class="nt">--compressed</span> <span class="nt">-v</span> <span class="o">></span> /dev/null
</code></pre></div></div>
<p>As usual, I also updated the library of R packages included with the server, including the latest <a href="https://www.opencpu.org/posts/jsonlite-release-0-9-14/">jsonlite 0.9.14</a> which allows for controlling prettify indentation:</p>
<ul>
<li><a href="http://demo.ocpu.io/MASS/data/cats/json"><code class="language-plaintext highlighter-rouge">http://demo.ocpu.io/MASS/data/cats/json</code></a></li>
<li><a href="http://demo.ocpu.io/MASS/data/cats/json?pretty=2"><code class="language-plaintext highlighter-rouge">http://demo.ocpu.io/MASS/data/cats/json?pretty=2</code></a></li>
<li><a href="http://demo.ocpu.io/MASS/data/cats/json?pretty=false"><code class="language-plaintext highlighter-rouge">http://demo.ocpu.io/MASS/data/cats/json?pretty=false</code></a></li>
</ul>
<h2 id="support-for-systemd-and-docker">Support for systemd and docker</h2>
<p>Apart from enabling compression and updating the R package library, this release has some internal changes to support systemd on Debian 8 (Jessie), on which the r-base docker images are based.</p>
<p>The introduction of systemd has been quite <a href="http://linux.slashdot.org/story/14/11/19/043259/debian-votes-against-mandating-non-systemd-compatibility">controversial</a> in the Debian community, to say the least, which is perhaps why things are not working as smoothly yet as in Fedora. My current init scripts definitely did not work out of the box with systemd (as advertised) and getting them fixed was quite painful.</p>
<p>However, I did figure everything out eventually, and learned a lot about systemd while debugging it. I can see it being a very powerful system, definitely a big improvement over the old-style init scripts. The way services are specified has a lot in common with how docker does it, which I’m sure is not a coincidence. I look forward to taking full advantage of it once it has landed in all major distributions.</p>
<p>I really hope the Debian folks will resolve their differences sooner rather than later though, because the current state of Jessie is not very good. Even popular packages such as nginx are currently broken due to the chaos and uncertainty surrounding the transition to systemd, which is not helping anyone. On the other hand, I do admire the Debian tradition of transparent and democratic decision making (even when messy), which is something the R community sometimes seems to be missing…</p>
Interactive JavaScript in R with V8: a crossfilter example2014-12-24T00:00:00+00:00https://www.opencpu.org/posts/v8-release-0-3
<a href="https://www.opencpu.org/posts/v8-release-0-3"><img alt="opencpu logo" src="https://www.opencpu.org/images/v8big.gif"></a>
<p>In last week’s <a href="https://www.opencpu.org/posts/v8-release-0-2/">blog post</a> introducing the new V8 package, I showed how you can use <code class="language-plaintext highlighter-rouge">context$eval</code> and <code class="language-plaintext highlighter-rouge">context$source</code> to execute JavaScript commands and scripts in R.</p>
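<p>For reference, a minimal round trip through a context looks roughly like this: <code class="language-plaintext highlighter-rouge">assign</code> copies an R data frame into JavaScript as an array of row objects, <code class="language-plaintext highlighter-rouge">eval</code> runs JavaScript against it, and <code class="language-plaintext highlighter-rouge">get</code> copies a result back. This sketch assumes a recent V8 release (constructor <code class="language-plaintext highlighter-rouge">v8()</code>); using <code class="language-plaintext highlighter-rouge">mtcars</code> and the variable names are arbitrary choices.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(V8)
ct <- v8()

# Copy an R data frame into the context; it arrives as an array of row objects
ct$assign("mt", mtcars[, c("mpg", "cyl")])

# Run JavaScript against it: keep only the 8-cylinder rows and count them
invisible(ct$eval("
  var eight = mt.filter(function(row) { return row.cyl === 8; });
  var n_eight = eight.length;
"))

# Copy the result back to R
ct$get("n_eight")
# [1] 14</code></pre></figure>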
<p>It turns out that typing <code class="language-plaintext highlighter-rouge">context$eval()</code> for each JavaScript command gets annoying very quickly, so the new V8 version 0.3 adds an interactive console feature that works very similar to the one in chrome developer tools or Firebug. Playing in the interactive console is a nice way to debug a session, or just to learn JavaScript.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Load stuff</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">V8</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="p">(</span><span class="n">diamonds</span><span class="p">,</span><span class="w"> </span><span class="n">package</span><span class="o">=</span><span class="s2">"ggplot2"</span><span class="p">)</span><span class="w">
</span><span class="c1"># Create JavaScript session</span><span class="w">
</span><span class="n">ct</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">new_context</span><span class="p">()</span><span class="w">
</span><span class="n">ct</span><span class="o">$</span><span class="n">assign</span><span class="p">(</span><span class="s2">"diamonds"</span><span class="p">,</span><span class="w"> </span><span class="n">diamonds</span><span class="p">)</span><span class="w">
</span><span class="c1"># Load CrossFilter JavaScript library</span><span class="w">
</span><span class="n">ct</span><span class="o">$</span><span class="n">source</span><span class="p">(</span><span class="s2">"http://cdnjs.cloudflare.com/ajax/libs/crossfilter/1.3.11/crossfilter.min.js"</span><span class="p">)</span></code></pre></figure>
<p>The code above loads the <code class="language-plaintext highlighter-rouge">diamonds</code> dataset from the ggplot2 package and assigns it to a new JavaScript context. We also load the <a href="http://square.github.io/crossfilter/">crossfilter</a> JavaScript library. We can now use the <code class="language-plaintext highlighter-rouge">console</code> method to enter an interactive JavaScript console for this session:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">ct</span><span class="o">$</span><span class="n">console</span><span class="p">()</span><span class="w">
</span><span class="c1"># This is V8 version 3.14.5.10. Press ESC or CTRL+C to exit.</span><span class="w">
</span><span class="c1"># ~</span></code></pre></figure>
<p>The <code class="language-plaintext highlighter-rouge">~</code> prompt indicates that we are now in V8 and can start typing JavaScript. For example, to find the 10 diamonds with the highest depth in the price range between 2000 and 3000:</p>
<figure class="highlight"><pre><code class="language-js" data-lang="js"><span class="c1">// now we are in JavaScript :)</span>
<span class="kd">var</span> <span class="nx">cf</span> <span class="o">=</span> <span class="nx">crossfilter</span><span class="p">(</span><span class="nx">diamonds</span><span class="p">)</span>
<span class="kd">var</span> <span class="nx">price</span> <span class="o">=</span> <span class="nx">cf</span><span class="p">.</span><span class="nx">dimension</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">x</span><span class="p">){</span><span class="k">return</span> <span class="nx">x</span><span class="p">.</span><span class="nx">price</span><span class="p">})</span>
<span class="kd">var</span> <span class="nx">depth</span> <span class="o">=</span> <span class="nx">cf</span><span class="p">.</span><span class="nx">dimension</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">x</span><span class="p">){</span><span class="k">return</span> <span class="nx">x</span><span class="p">.</span><span class="nx">depth</span><span class="p">})</span>
<span class="nx">price</span><span class="p">.</span><span class="nx">filter</span><span class="p">([</span><span class="mi">2000</span><span class="p">,</span> <span class="mi">3000</span><span class="p">])</span>
<span class="nx">output</span> <span class="o">=</span> <span class="nx">depth</span><span class="p">.</span><span class="nx">top</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span></code></pre></figure>
<p>You’ll notice that crossfilter is pretty fast! To inspect the data in JavaScript, we can convert it to JSON:</p>
<figure class="highlight"><pre><code class="language-js" data-lang="js"><span class="nx">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">(</span><span class="nx">output</span><span class="p">)</span></code></pre></figure>
<p>But it might be easier to read the data in R. Exit the prompt by pressing ESC, which gives you back R’s default <code class="language-plaintext highlighter-rouge">></code> prompt. From there we can retrieve the output object using <code class="language-plaintext highlighter-rouge">ct$get</code>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Pressing ESC</span><span class="w">
</span><span class="c1"># Exiting V8 console.</span><span class="w">
</span><span class="n">output</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ct</span><span class="o">$</span><span class="n">get</span><span class="p">(</span><span class="s2">"output"</span><span class="p">)</span><span class="w">
</span><span class="n">print</span><span class="p">(</span><span class="n">output</span><span class="p">)</span></code></pre></figure>
<p>All of this works seamlessly in most editors too. For example, if you load this <a href="https://gist.github.com/jeroenooms/9e4dc12a70b7e880fbed">script</a> in RStudio, you can execute it by selecting the code and pressing the Run button in the script editor, and it does exactly what you would expect!</p>
<p>However, the console is of course mostly meant for debugging and interactive use. If you plan to share your R script, the most elegant way to include JavaScript code is to put it in a separate file <code class="language-plaintext highlighter-rouge">myscript.js</code> and load it from R using <code class="language-plaintext highlighter-rouge">ct$source("myscript.js")</code>.</p>
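<p>A minimal sketch of that workflow follows. It assumes a current release of the V8 package, where a context is created with <code class="language-plaintext highlighter-rouge">v8()</code> rather than the <code class="language-plaintext highlighter-rouge">new_context()</code> used in this post. The JavaScript function <code class="language-plaintext highlighter-rouge">topByDepth</code> is hypothetical, and the file is written to a temporary location only to keep the example self-contained.</p>

```r
library(V8)

# Write a small JavaScript file; in practice this would be your own myscript.js
js_file <- tempfile(fileext = ".js")
writeLines(c(
  "// Hypothetical helper: return the n records with the highest depth",
  "function topByDepth(rows, n) {",
  "  return rows.slice().sort(function(a, b) { return b.depth - a.depth; }).slice(0, n);",
  "}"
), js_file)

ct <- v8()
ct$source(js_file)

# The data frame arrives in JavaScript as an array of row objects,
# and the returned array comes back to R as a data frame
df <- data.frame(id = 1:5, depth = c(61.5, 59.8, 63.3, 56.9, 62.4))
top2 <- ct$call("topByDepth", df, 2)
print(top2)
```

<p>Keeping the JavaScript in its own file also means it can be edited and checked with ordinary JavaScript tooling.</p>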
Introducing V8: An Embedded JavaScript Engine for R2014-12-17T00:00:00+00:00https://www.opencpu.org/posts/v8-release-0-2
<a href="https://www.opencpu.org/posts/v8-release-0-2"><img alt="opencpu logo" src="https://www.opencpu.org/images/v8small.jpg"></a>
<p>JavaScript is a fantastic language for building applications. It runs on browsers, <a href="http://nodejs.org/">servers</a> and <a href="http://docs.mongodb.org/manual/core/server-side-javascript/">databases</a>, making it possible to design an entire web stack in a single language.</p>
<p>The OpenCPU <a href="https://www.opencpu.org/jslib.html">JavaScript client</a> already allows for calling R functions from JavaScript (see <a href="http://jsfiddle.net/user/opencpu/fiddles/">jsfiddles</a> and <a href="https://www.opencpu.org/apps.html">apps</a>). With the new V8 package we can now do the reverse as well: run JavaScript inside R!</p>
<h2 id="the-v8-engine">The V8 Engine</h2>
<p>V8 is Google’s open source, high performance JavaScript engine. It is written in C++ and implements ECMAScript as specified in ECMA-262, 5th edition. The <a href="http://cran.r-project.org/web/packages/V8/index.html">V8 R package</a> builds on this C++ library to provide a completely standalone JavaScript engine within R:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">V8</span><span class="p">)</span><span class="w">
</span><span class="c1"># Create a new context</span><span class="w">
</span><span class="n">ct</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">new_context</span><span class="p">()</span><span class="w">
</span><span class="c1"># Evaluate some code</span><span class="w">
</span><span class="n">ct</span><span class="o">$</span><span class="n">eval</span><span class="p">(</span><span class="s2">"foo=123"</span><span class="p">)</span><span class="w">
</span><span class="n">ct</span><span class="o">$</span><span class="n">eval</span><span class="p">(</span><span class="s2">"bar=456"</span><span class="p">)</span><span class="w">
</span><span class="n">ct</span><span class="o">$</span><span class="n">eval</span><span class="p">(</span><span class="s2">"foo+bar"</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1] "579"</span></code></pre></figure>
<p>Note, however, that V8 by itself is just the bare JavaScript engine. Currently, there is no DOM, no network or disk IO, not even an event loop. That is fine, because we already have all of those in R. In this sense V8 resembles other foreign language interfaces such as Rcpp or rJava, but for JavaScript.</p>
<p>A major advantage over the other foreign language interfaces is that V8 requires no compilers, external executables or other run-time dependencies to execute JavaScript. The entire engine is contained within a 6MB R package (2MB when zipped) and works on all major platforms.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">ct</span><span class="o">$</span><span class="n">eval</span><span class="p">(</span><span class="s2">"JSON.stringify({x:Math.random()})"</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1] "{\"x\":0.08649904327467084}"</span><span class="w">
</span><span class="n">ct</span><span class="o">$</span><span class="n">eval</span><span class="p">(</span><span class="s2">"(function(x){return x+1;})(123)"</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1] "124"</span></code></pre></figure>
<p>Sounds promising? There is more!</p>
<h2 id="v8--jsonlite--awesome">V8 + jsonlite = awesome</h2>
<p>The native data structure in JavaScript is basically JSON, hence we can use <a href="http://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html">jsonlite</a> for seamless data interchange between V8 and R:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">ct</span><span class="o">$</span><span class="n">assign</span><span class="p">(</span><span class="s2">"mydata"</span><span class="p">,</span><span class="w"> </span><span class="n">mtcars</span><span class="p">)</span><span class="w">
</span><span class="n">out</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ct</span><span class="o">$</span><span class="n">get</span><span class="p">(</span><span class="s2">"mydata"</span><span class="p">)</span><span class="w">
</span><span class="n">all.equal</span><span class="p">(</span><span class="n">out</span><span class="p">,</span><span class="w"> </span><span class="n">mtcars</span><span class="p">)</span><span class="w">
</span><span class="c1"># TRUE</span></code></pre></figure>
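<p>To see what the data looks like on the JavaScript side, we can call jsonlite directly. By default <code class="language-plaintext highlighter-rouge">toJSON</code> turns a data frame into an array with one object per row, which is the shape most JavaScript libraries expect. A small sketch using the first two rows of <code class="language-plaintext highlighter-rouge">mtcars</code>:</p>

```r
library(jsonlite)

# One JSON object per row; the (non-default) row names go into a "_row" field
json <- toJSON(head(mtcars, 2))
print(json)

# fromJSON() simplifies the array of records back into a data frame
back <- fromJSON(json)
str(back)
```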
<p>Because jsonlite stores data in its natural structure, we can plug it straight into existing JavaScript libraries:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Use a JavaScript library</span><span class="w">
</span><span class="n">ct</span><span class="o">$</span><span class="n">source</span><span class="p">(</span><span class="s2">"http://underscorejs.org/underscore-min.js"</span><span class="p">)</span><span class="w">
</span><span class="n">ct</span><span class="o">$</span><span class="nf">call</span><span class="p">(</span><span class="s2">"_.filter"</span><span class="p">,</span><span class="w"> </span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">I</span><span class="p">(</span><span class="s2">"function(x){return x.mpg < 15}"</span><span class="p">))</span><span class="w">
</span><span class="c1"># mpg cyl disp hp drat wt qsec vs am gear carb</span><span class="w">
</span><span class="c1"># Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4</span><span class="w">
</span><span class="c1"># Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4</span><span class="w">
</span><span class="c1"># Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4</span><span class="w">
</span><span class="c1"># Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4</span><span class="w">
</span><span class="c1"># Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4</span></code></pre></figure>
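<p>For a simple filter like this one, an external library is not strictly required, because ECMAScript 5 arrays have a built-in <code class="language-plaintext highlighter-rouge">filter</code> method. A sketch assuming a current V8 release (context created with <code class="language-plaintext highlighter-rouge">v8()</code>), which keeps the result in a JavaScript variable and fetches it with <code class="language-plaintext highlighter-rouge">ct$get</code>:</p>

```r
library(V8)

ct <- v8()
ct$assign("cars", mtcars)

# Array.prototype.filter is part of ECMAScript 5, so no extra library is needed
ct$eval("var heavy = cars.filter(function(x) { return x.mpg < 15; });")

# Read the JavaScript variable back into R as a data frame
heavy <- ct$get("heavy")
print(heavy)
```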
<h2 id="javascript-libraries">JavaScript Libraries</h2>
<p>JavaScript libraries specifically written for the browser (such as jQuery or D3), or libraries for Node that depend on disk/network functionality, might not work in plain V8, but many of them actually do.</p>
<p>For example, <a href="http://square.github.io/crossfilter/">crossfilter</a> is a high performance data filtering library that I have used for creating D3 <a href="http://jeroenooms.github.io/dashboard/snack/">data dashboards</a> in the browser, but crossfilter itself is just vanilla JavaScript:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">ct</span><span class="o">$</span><span class="n">source</span><span class="p">(</span><span class="s2">"http://cdnjs.cloudflare.com/ajax/libs/crossfilter/1.3.11/crossfilter.min.js"</span><span class="p">)</span></code></pre></figure>
<p>I’ll continue here in the next blog post later this week. Have a look at the (very short) <a href="http://cran.r-project.org/web/packages/V8/V8.pdf">package manual</a> in the meantime.</p>
New features in jsonlite 0.9.142014-12-05T00:00:00+00:00https://www.opencpu.org/posts/jsonlite-release-0-9-14
<a href="https://www.opencpu.org/posts/jsonlite-release-0-9-14"><img alt="opencpu logo" src="https://www.opencpu.org/images/mariokart.jpg"></a>
<p>The <a href="http://cran.rstudio.org/web/packages/jsonlite/index.html">jsonlite</a> package implements a robust, high performance JSON parser and generator for R, optimized for statistical data and the web. This week, version 0.9.14 appeared on CRAN, adding some handy new features.</p>
<h2 id="significant-digits">Significant Digits</h2>
<p>By default, the <code class="language-plaintext highlighter-rouge">digits</code> argument in <code class="language-plaintext highlighter-rouge">toJSON</code> specifies the number of decimal digits to print:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">toJSON</span><span class="p">(</span><span class="nb">pi</span><span class="p">,</span><span class="w"> </span><span class="n">digits</span><span class="o">=</span><span class="m">3</span><span class="p">)</span><span class="w">
</span><span class="c1"># [3.142]</span></code></pre></figure>
<p>A feature requested by Winston Chang was control over the precision of number formatting. You can now specify the number of significant digits, analogous to the <code class="language-plaintext highlighter-rouge">signif</code> function in base R. Either set <code class="language-plaintext highlighter-rouge">use_signif = TRUE</code> or specify the <code class="language-plaintext highlighter-rouge">digits</code> argument using <code class="language-plaintext highlighter-rouge">I()</code>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">toJSON</span><span class="p">(</span><span class="nb">pi</span><span class="p">,</span><span class="w"> </span><span class="n">digits</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="n">use_signif</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="c1"># [3.14]</span><span class="w">
</span><span class="n">toJSON</span><span class="p">(</span><span class="nb">pi</span><span class="p">,</span><span class="w"> </span><span class="n">digits</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">I</span><span class="p">(</span><span class="m">3</span><span class="p">))</span><span class="w">
</span><span class="c1"># [3.14]</span></code></pre></figure>
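<p>The difference between the two modes is easier to see on numbers of different magnitudes. A short sketch; the vector <code class="language-plaintext highlighter-rouge">x</code> is arbitrary example data, and base R’s <code class="language-plaintext highlighter-rouge">signif</code> is included for comparison:</p>

```r
library(jsonlite)

x <- c(pi, 1234.5678, 0.00123456)

# digits counts decimal places
toJSON(x, digits = 3)

# digits counts significant digits; both calls request the same behaviour
toJSON(x, digits = I(3))
toJSON(x, digits = 3, use_signif = TRUE)

# Base R rounding to 3 significant digits, for comparison
signif(x, 3)
```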
<h2 id="prettify-indent">Prettify Indent</h2>
<p>A feature requested by Yihui Xie was control over the number of spaces used to indent prettified JSON. The default is still 4 spaces:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">toJSON</span><span class="p">(</span><span class="nb">pi</span><span class="p">,</span><span class="w"> </span><span class="n">pretty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="c1"># [</span><span class="w">
</span><span class="c1">#     3.1416</span><span class="w">
</span><span class="c1"># ]</span></code></pre></figure>
<p>The number of indent spaces can be changed by setting the <code class="language-plaintext highlighter-rouge">pretty</code> argument to an integer. For example to indent by only 2 spaces:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">toJSON</span><span class="p">(</span><span class="nb">pi</span><span class="p">,</span><span class="w"> </span><span class="n">pretty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="c1"># [</span><span class="w">
</span><span class="c1">#   3.1416</span><span class="w">
</span><span class="c1"># ]</span></code></pre></figure>
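<p>The same indentation control applies to JSON text that already exists: jsonlite’s <code class="language-plaintext highlighter-rouge">prettify</code> function accepts an <code class="language-plaintext highlighter-rouge">indent</code> argument, and <code class="language-plaintext highlighter-rouge">minify</code> strips the whitespace again. A sketch with a made-up JSON string:</p>

```r
library(jsonlite)

txt <- '{"name":"pi","value":[3.1416]}'

# Re-indent an existing JSON string with 2 spaces instead of the default 4
pretty2 <- prettify(txt, indent = 2)
cat(pretty2)

# minify() removes the added whitespace again
compact <- minify(pretty2)
print(compact)
```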
<h2 id="support-for-64bit-integers-in-tojson">Support for 64bit integers in toJSON</h2>
<p>Another new feature is support for 64-bit integers from the <code class="language-plaintext highlighter-rouge">bit64</code> package. R has no native 64-bit integer type, and doubles have limited precision:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">2</span><span class="o">^</span><span class="m">60</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="m">3</span><span class="w">
</span><span class="n">toJSON</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1.15292150460685e+18,1.15292150460685e+18,1.15292150460685e+18]</span></code></pre></figure>
<p>But when the number is stored as a 64-bit integer, jsonlite prints the full integer in the JSON output:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">bit64</span><span class="p">)</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">as.integer64</span><span class="p">(</span><span class="m">2</span><span class="p">)</span><span class="o">^</span><span class="m">60</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="m">3</span><span class="w">
</span><span class="n">toJSON</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1152921504606846977,1152921504606846978,1152921504606846979]</span></code></pre></figure>
<p>Currently this is only supported in <code class="language-plaintext highlighter-rouge">toJSON</code>. The parser in <code class="language-plaintext highlighter-rouge">fromJSON</code> still uses doubles for very large integers.</p>
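<p>The underlying issue is the precision of doubles: they carry a 53-bit significand, so close to 2<sup>60</sup> neighbouring representable values are 2<sup>8</sup> = 256 apart. A sketch of what that means when parsing, assuming <code class="language-plaintext highlighter-rouge">fromJSON</code> behaves as described above:</p>

```r
library(jsonlite)

# Near 2^60 the gap between adjacent doubles is 2^(60 - 52) = 256,
# so adding 1 has no effect
identical(2^60 + 1, 2^60)

# A large JSON integer is therefore rounded to the nearest double when parsed
y <- fromJSON("[1152921504606846977]")
is.double(y)
y == 2^60
```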
New package: curl. High performance http(s) streaming in R2014-11-22T00:00:00+00:00https://www.opencpu.org/posts/curl-release-0-2
<a href="https://www.opencpu.org/posts/curl-release-0-2"><img alt="opencpu logo" src="https://www.opencpu.org/images/boat.jpg"></a>
<p>A while ago I blogged about <a href="https://www.opencpu.org/posts/jsonlite-streaming">new streaming features</a> in jsonlite:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">jsonlite</span><span class="p">)</span><span class="w">
</span><span class="n">diamonds2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">stream_in</span><span class="p">(</span><span class="n">url</span><span class="p">(</span><span class="s2">"http://jeroenooms.github.io/data/diamonds.json"</span><span class="p">))</span></code></pre></figure>
<p>In the same blog post I also mentioned that R currently does not support https connections. The <code class="language-plaintext highlighter-rouge">RCurl</code> package does support https, but does not provide a connection interface. This bothered me, so I decided to write one. The result is the new <a href="http://cran.r-project.org/package=curl">curl</a> package.</p>
<h2 id="encryption-compression-and-more">Encryption, compression and more</h2>
<p>From the package description:</p>
<blockquote>
<p>The curl() function provides a drop-in replacement for base url() with better performance and support for http 2.0, ssl (https, ftps), gzip, deflate and other libcurl goodies. This interface is implemented using the RConnection API in order to support incremental processing of both binary and text streams.</p>
</blockquote>
<p>What this means is that <code class="language-plaintext highlighter-rouge">curl()</code> should be able to do anything that <code class="language-plaintext highlighter-rouge">url()</code> does, but better. The same example as above, but now with https:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">curl</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">jsonlite</span><span class="p">)</span><span class="w">
</span><span class="n">diamonds2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">stream_in</span><span class="p">(</span><span class="n">curl</span><span class="p">(</span><span class="s2">"https://jeroenooms.github.io/data/diamonds.json"</span><span class="p">))</span></code></pre></figure>
<p>That was easy. Switching to curl has other benefits as well. For example, it negotiates compression through the <code class="language-plaintext highlighter-rouge">Accept-Encoding</code> header and automatically decompresses gzipped or deflated responses:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">readLines</span><span class="p">(</span><span class="n">curl</span><span class="p">(</span><span class="s2">"http://httpbin.org/gzip"</span><span class="p">))</span><span class="w">
</span><span class="n">readLines</span><span class="p">(</span><span class="n">curl</span><span class="p">(</span><span class="s2">"http://httpbin.org/deflate"</span><span class="p">))</span></code></pre></figure>
<p>Support for compression can make a huge difference when streaming large data. Text-based formats such as JSON are popular because they are human-readable, but the main downside of plain text is that it is inefficient for storing numbers. However, when gzipped, JSON payloads are often <a href="https://news.ycombinator.com/item?id=2571729">comparable to binary formats</a>, giving you the best of both worlds.</p>
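<p>One way to check this on your own data is to compare the sizes of plain JSON, compressed JSON and a naive binary encoding. A sketch using the built-in <code class="language-plaintext highlighter-rouge">quakes</code> dataset; the binary encoding is simply the values written as 8-byte doubles, and <code class="language-plaintext highlighter-rouge">memCompress(type = "gzip")</code> uses zlib/deflate. The actual ratios depend heavily on the data:</p>

```r
library(jsonlite)

# 'quakes' ships with base R: 1000 rows of numeric data
json_raw <- charToRaw(as.character(toJSON(quakes)))

# The same values as raw 8-byte doubles (a simple binary encoding)
bin_raw <- writeBin(as.vector(as.matrix(quakes)), raw())

# Compress the JSON text in memory
json_gz <- memCompress(json_raw, type = "gzip")

sizes <- c(json = length(json_raw), binary = length(bin_raw), json_gzip = length(json_gz))
print(sizes)
```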
<h2 id="performance">Performance</h2>
<p>One thing that did surprise me a bit is the difference in performance. In particular, <code class="language-plaintext highlighter-rouge">readLines</code> on url connections seems to be inefficient in base R.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">con2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">curl</span><span class="p">(</span><span class="s2">"http://jeroenooms.github.io/data/diamonds.json"</span><span class="p">)</span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">readLines</span><span class="p">(</span><span class="n">con2</span><span class="p">))</span><span class="w">
</span><span class="c1"># user system elapsed</span><span class="w">
</span><span class="c1"># 0.238 0.096 0.334</span><span class="w">
</span><span class="n">con1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">url</span><span class="p">(</span><span class="s2">"http://jeroenooms.github.io/data/diamonds.json"</span><span class="p">)</span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">readLines</span><span class="p">(</span><span class="n">con1</span><span class="p">))</span><span class="w">
</span><span class="c1"># user system elapsed</span><span class="w">
</span><span class="c1"># 0.236 0.113 3.858</span></code></pre></figure>
<p>I’m not quite sure why this is. Maybe the base R version does some additional character recoding that I am not aware of, although I have not observed such behavior. Also, measuring performance is tricky in this case, because it depends on connection bandwidth, caching settings, etc.</p>
OpenCPU release 1.4.5: configurable webhooks2014-11-10T00:00:00+00:00https://www.opencpu.org/posts/opencpu-release-1-4-5
<a href="https://www.opencpu.org/posts/opencpu-release-1-4-5"><img alt="opencpu logo" src="https://www.opencpu.org/images/struisvogel.jpg"></a>
<p>OpenCPU 1.4.5 is a patch release that improves performance by taking advantage of the latest versions of jsonlite, devtools, knitr, openssl, etc. Also new in this release is the option to pass build parameters when deploying on ocpu.io (or your own OpenCPU server) using the GitHub webhook.</p>
<p>As usual, server binaries for Ubuntu, Fedora and openSUSE are available from <a href="https://www.opencpu.org/download.html">Launchpad</a> and <a href="http://software.opensuse.org/download.html?project=home:jeroenooms:opencpu-1.4&package=opencpu">Build Service</a>. There should not be any breaking changes, but it is worth double checking that all is OK the next time you run <code class="language-plaintext highlighter-rouge">apt-get upgrade</code> on your server. If you are in production and do <em>not</em> want to upgrade, make sure to comment out the <code class="language-plaintext highlighter-rouge">opencpu-1.4</code> PPA in the conf files under <code class="language-plaintext highlighter-rouge">/etc/apt/sources.list.d/</code>.</p>
<p>The opencpu-1.4 repository now ships with:</p>
<ul>
<li>OpenCPU 1.4.5</li>
<li>R 3.1.2</li>
<li>Rcpp 0.11.3</li>
<li>RApache 1.2.5</li>
<li>RStudio-Server 0.98.1087</li>
</ul>
<p>For Debian/CentOS users, instructions to build opencpu-server packages from source are on github: <a href="https://github.com/jeroenooms/opencpu-server/tree/master/rpm#readme">rpm</a> and <a href="https://github.com/jeroenooms/opencpu-server/tree/master/debian#readme">deb</a>.</p>
<h2 id="configurable-webhooks">Configurable webhooks</h2>
<p>Any R package on GitHub can automatically be deployed to <code class="language-plaintext highlighter-rouge">https://yourname.ocpu.io/yourpkg</code> by setting the <a href="https://www.opencpu.org/api.html#api-ci">ocpu webhook</a> in your GitHub repository. It takes about 15 seconds to set up, and is a great way to continuously publish and test the code, data, documentation and vignettes of your package. You will also get notified by email if your package fails to build. If you are not using ocpu.io yet, now would be a good time to add the webhook :-)</p>
<p>New in this release is that HTTP parameters added to the webhook URL are passed on to <a href="http://demo.ocpu.io/devtools/man/install_github/text">install_github</a>. For example, to build the vignettes of your package, use the webhook:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://cloud.opencpu.org/ocpu/webhook?build_vignettes=true
</code></pre></div></div>
<p>Or, if your package lives in a subdirectory of the repository:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://cloud.opencpu.org/ocpu/webhook?build_vignettes=true&subdir=pkgdir
</code></pre></div></div>
<p>In addition to parameters for <code class="language-plaintext highlighter-rouge">install_github</code>, there is currently one extra parameter <code class="language-plaintext highlighter-rouge">sendmail</code> (true/false) which specifies if the server sends an email with the build status.</p>
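<p>Because the parameters form an ordinary query string, a webhook URL can also be assembled programmatically. The helper <code class="language-plaintext highlighter-rouge">webhook_url</code> below is hypothetical (it is not part of OpenCPU); only the base URL and parameter names are taken from the examples above:</p>

```r
# Hypothetical helper: append install_github() parameters (and the
# OpenCPU-specific 'sendmail' flag) as a query string to the webhook URL
webhook_url <- function(..., base = "https://cloud.opencpu.org/ocpu/webhook") {
  params <- list(...)
  if (length(params) == 0) return(base)
  # Logicals become lower-case true/false, as in the examples above;
  # everything else is URL-encoded
  values <- vapply(params, function(v) {
    if (is.logical(v)) tolower(as.character(v)) else URLencode(as.character(v), reserved = TRUE)
  }, character(1))
  paste0(base, "?", paste(names(params), values, sep = "=", collapse = "&"))
}

webhook_url(build_vignettes = TRUE, subdir = "pkgdir")
```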
High performance JSON streaming in R: Part 12014-11-06T00:00:00+00:00https://www.opencpu.org/posts/jsonlite-streaming
<a href="https://www.opencpu.org/posts/jsonlite-streaming"><img alt="opencpu logo" src="https://www.opencpu.org/images/mariokart.jpg"></a>
<p>The jsonlite <a href="http://demo.ocpu.io/jsonlite/man/stream_in/html">stream_in</a> and <a href="http://demo.ocpu.io/jsonlite/man/stream_in/html">stream_out</a> functions implement line-by-line processing of JSON data over a connection, such as a socket, url, file or pipe. This makes it possible to construct a data processing pipeline that can handle large (or unlimited) amounts of data with limited memory. This post walks through some examples from the <a href="http://demo.ocpu.io/jsonlite/man/stream_in/html">help pages</a>.</p>
<h2 id="the-json-streaming-format">The json streaming format</h2>
<p>Because parsing huge JSON strings is difficult and inefficient, JSON streaming is done using lines of minified JSON records. This is pretty standard: JSON databases such as <a href="http://docs.mongodb.org/manual/reference/program/mongoexport/#cmdoption--query">MongoDB</a> use the same format to import/export large datasets. Note that this means that the total stream combined is not valid JSON itself; only the individual lines are.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">jsonlite</span><span class="p">)</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">iris</span><span class="p">[</span><span class="m">1</span><span class="o">:</span><span class="m">3</span><span class="p">,]</span><span class="w">
</span><span class="n">stream_out</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">con</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stdout</span><span class="p">())</span><span class="w">
</span><span class="c1"># {"Sepal.Length":5.1,"Sepal.Width":3.5,"Petal.Length":1.4,"Petal.Width":0.2,"Species":"setosa"}</span><span class="w">
</span><span class="c1"># {"Sepal.Length":4.9,"Sepal.Width":3,"Petal.Length":1.4,"Petal.Width":0.2,"Species":"setosa"}</span><span class="w">
</span><span class="c1"># {"Sepal.Length":4.7,"Sepal.Width":3.2,"Petal.Length":1.3,"Petal.Width":0.2,"Species":"setosa"}</span></code></pre></figure>
<p>Also note that because line-breaks are used as separators, prettified JSON is not permitted: the JSON lines must be minified. In this respect, the format is a bit different from fromJSON and toJSON where all lines are part of a single JSON structure with optional line breaks.</p>
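<p>Both properties are easy to check with jsonlite’s <code class="language-plaintext highlighter-rouge">validate</code> function. A sketch that writes a few rows to a temporary file (used only to keep the example self-contained) and validates the lines one at a time and as a whole:</p>

```r
library(jsonlite)

tmp <- tempfile(fileext = ".json")
# stream_out() opens and closes the unopened file connection itself
stream_out(iris[1:3, ], file(tmp), verbose = FALSE)

lines <- readLines(tmp)
length(lines)  # one line per row

# Each line on its own is a valid JSON document...
valid_each <- vapply(lines, function(l) isTRUE(validate(l)), logical(1), USE.NAMES = FALSE)
print(valid_each)

# ...but the stream as a whole is not
valid_all <- isTRUE(validate(paste(lines, collapse = "\n")))
print(valid_all)
```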
<h2 id="streaming-tofrom-a-file">Streaming to/from a file</h2>
<p>The <code class="language-plaintext highlighter-rouge">nycflights13</code> package contains a dataset with about 5 million values. To stream this to a file:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">nycflights13</span><span class="p">)</span><span class="w">
</span><span class="n">stream_out</span><span class="p">(</span><span class="n">flights</span><span class="p">,</span><span class="w"> </span><span class="n">con</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">file</span><span class="p">(</span><span class="s2">"~/flights.json"</span><span class="p">))</span></code></pre></figure>
<p>Running this code will open the file connection, write json to the connection in batches of 500 rows, and afterwards close the connection. Status messages will be printed to the console while writing output. The entire process should take a few seconds and generate a json file of about 7MB.</p>
<p>We use the same file to illustrate how to stream the json back into R. The following code stream-parses the json in batches of 500 lines. Afterwards we verify that the result is identical to the original data:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">flights2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">stream_in</span><span class="p">(</span><span class="n">file</span><span class="p">(</span><span class="s2">"~/flights.json"</span><span class="p">))</span><span class="w">
</span><span class="n">all.equal</span><span class="p">(</span><span class="n">flights2</span><span class="p">,</span><span class="w"> </span><span class="n">as.data.frame</span><span class="p">(</span><span class="n">flights</span><span class="p">))</span><span class="w">
</span><span class="c1"># [1] TRUE</span></code></pre></figure>
<p>Because the data is read in small batches, this requires much less memory than parsing a huge json blob all at once. The <code class="language-plaintext highlighter-rouge">pagesize</code> argument in <code class="language-plaintext highlighter-rouge">stream_in</code> and <code class="language-plaintext highlighter-rouge">stream_out</code> specifies the number of rows that are read or written per iteration.</p>
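<p>As a rough sketch of the <code class="language-plaintext highlighter-rouge">pagesize</code> argument, the round trip below writes and reads a small, made-up data frame of 25 rows in pages of 10, so each call processes three batches (10 + 10 + 5 rows). The values are chosen to have few decimals so that the default rounding in <code class="language-plaintext highlighter-rouge">toJSON</code> does not affect them:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(jsonlite)

# Illustrative data: 25 rows with integer, double and character columns
df <- data.frame(id = 1:25, value = (1:25) / 4, label = letters[1:25],
                 stringsAsFactors = FALSE)
tmp <- tempfile(fileext = ".json")

# Write in pages of 10 rows; one JSON record per line
stream_out(df, file(tmp), pagesize = 10, verbose = FALSE)
length(readLines(tmp))
# [1] 25

# Read back, again 10 lines per iteration
df2 <- stream_in(file(tmp), pagesize = 10, verbose = FALSE)
nrow(df2)
# [1] 25
identical(df2$label, df$label)
# [1] TRUE</code></pre></figure>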
<h2 id="streaming-from-a-url">Streaming from a URL</h2>
<p>We can use the standard <code class="language-plaintext highlighter-rouge">url</code> function in R to stream from an HTTP connection.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">diamonds2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">stream_in</span><span class="p">(</span><span class="n">url</span><span class="p">(</span><span class="s2">"http://jeroenooms.github.io/data/diamonds.json"</span><span class="p">))</span></code></pre></figure>
<p>If the data source is gzipped, simply wrap the connection in <code class="language-plaintext highlighter-rouge">gzcon</code>.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">flights3</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">stream_in</span><span class="p">(</span><span class="n">gzcon</span><span class="p">(</span><span class="n">url</span><span class="p">(</span><span class="s2">"http://jeroenooms.github.io/data/nycflights13.json.gz"</span><span class="p">)))</span><span class="w">
</span><span class="n">all.equal</span><span class="p">(</span><span class="n">flights3</span><span class="p">,</span><span class="w"> </span><span class="n">as.data.frame</span><span class="p">(</span><span class="n">flights</span><span class="p">))</span></code></pre></figure>
<p>Because the <code class="language-plaintext highlighter-rouge">url</code> function in R currently does not support HTTPS, we use a <code class="language-plaintext highlighter-rouge">curl</code> pipe to stream over HTTPS:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">flights4</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">stream_in</span><span class="p">(</span><span class="n">gzcon</span><span class="p">(</span><span class="n">pipe</span><span class="p">(</span><span class="s2">"curl https://jeroenooms.github.io/data/nycflights13.json.gz"</span><span class="p">)))</span><span class="w">
</span><span class="n">all.equal</span><span class="p">(</span><span class="n">flights4</span><span class="p">,</span><span class="w"> </span><span class="n">as.data.frame</span><span class="p">(</span><span class="n">flights</span><span class="p">))</span></code></pre></figure>
<p>For this to work, the <code class="language-plaintext highlighter-rouge">curl</code> executable needs to be installed and available in the search path, which requires cygwin on Windows. Unfortunately the RCurl package does not seem to support binary streaming at this point.</p>
<h2 id="next-up">Next up</h2>
<p>These examples illustrate basic line-by-line json streaming of data frames from/to a connection, which allows for importing/exporting large json datasets.</p>
<p>In the next blog post we will take the step to full JSON IO streaming by defining a custom <code class="language-plaintext highlighter-rouge">handler</code> function. This makes it possible to construct a json data processing pipeline in R that can handle an infinite data stream. Impatient readers can have a look at the examples in the <a href="http://demo.ocpu.io/jsonlite/man/stream_in/html">stream_in</a> help page.</p>
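<p>As a preview, here is a rough sketch of a custom <code class="language-plaintext highlighter-rouge">handler</code>. When one is supplied, <code class="language-plaintext highlighter-rouge">stream_in</code> calls it on each page (a data frame) instead of collecting all pages into one result, so only running totals need to stay in memory. The running-mean computation and the variable names are purely illustrative:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(jsonlite)

tmp <- tempfile(fileext = ".json")
stream_out(mtcars, file(tmp), pagesize = 8, verbose = FALSE)

# Running totals that persist across handler calls
total_hp <- 0
n_rows <- 0

stream_in(file(tmp), handler = function(page) {
  # 'page' is a data frame holding at most 8 records
  total_hp <<- total_hp + sum(page$hp)
  n_rows <<- n_rows + nrow(page)
}, pagesize = 8, verbose = FALSE)

# Same value as mean(mtcars$hp), computed without holding all rows at once
total_hp / n_rows</code></pre></figure>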
Parsing multipart/form-data with webutils2014-11-01T00:00:00+00:00https://www.opencpu.org/posts/webutils-release-0-2
<a href="https://www.opencpu.org/posts/webutils-release-0-2"><img alt="opencpu logo" src="https://www.opencpu.org/images/rabbit1.jpg"></a>
<p>As part of a larger effort to clean up and rewrite the opencpu package, some of the more general utilities will be moved into a new, separate package called <a href="http://cran.r-project.org/web/packages/webutils/">webutils</a>. The first release of webutils is now on CRAN.</p>
<p>The package contains a simple http request body parser that supports <code class="language-plaintext highlighter-rouge">application/x-www-form-urlencoded</code>, <code class="language-plaintext highlighter-rouge">multipart/form-data</code>, and <code class="language-plaintext highlighter-rouge">application/json</code>. The multipart parser is written in pure R but surprisingly fast. Furthermore, two demo functions are included that illustrate how to host and parse simple HTML forms (with file uploads) using either rhttpd or httpuv.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">webutils</span><span class="p">)</span><span class="w">
</span><span class="n">demo_rhttpd</span><span class="p">()</span><span class="w">
</span><span class="n">demo_httpuv</span><span class="p">()</span></code></pre></figure>
<p>Nothing groundbreaking in a time of interactive graphics and restful data science as a service, but sometimes all you need is a simple form. I had a hard time finding a decent multipart parser for R, and this one does the job quite nicely.</p>
jsonlite 0.9.13: high performance number formatting2014-10-25T00:00:00+00:00https://www.opencpu.org/posts/jsonlite-release-0-9-13
<a href="https://www.opencpu.org/posts/jsonlite-release-0-9-13"><img alt="opencpu logo" src="https://www.opencpu.org/images/mariokart.jpg"></a>
<p>The <a href="http://cran.rstudio.org/web/packages/jsonlite/index.html">jsonlite</a> package implements a robust, high performance JSON parser and generator for R, optimized for statistical data and the web. This week version 0.9.13 appeared on CRAN which is the third release in a relatively short period focusing on performance optimization.</p>
<h2 id="fast-number-formatting">Fast number formatting</h2>
<p>Versions 0.9.11 and 0.9.12 had already introduced major speedups by porting <a href="https://www.opencpu.org/posts/jsonlite-release-0-9-11/">critical bottlenecks to C code</a> and switching to a <a href="https://www.opencpu.org/posts/jsonlite-release-0-9-12/">better JSON parser</a>. The current release focuses on number formatting and incorporates C code from <a href="https://code.google.com/p/stringencoders/"><code class="language-plaintext highlighter-rouge">modp_numtoa</code></a>, which is several times faster than <code class="language-plaintext highlighter-rouge">as.character</code>, <code class="language-plaintext highlighter-rouge">formatC</code> or <code class="language-plaintext highlighter-rouge">sprintf</code> for converting doubles and integers to strings (your mileage may vary depending on platform and precision).</p>
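<p>A rough way to check this on your own machine is to time the base R conversions against <code class="language-plaintext highlighter-rouge">toJSON</code> on the same vector. This is only a sketch: the vector size is arbitrary, and the comparison is not exactly like-for-like, because <code class="language-plaintext highlighter-rouge">as.character</code> keeps up to 15 significant digits while <code class="language-plaintext highlighter-rouge">toJSON</code> rounds to 4 decimal digits here:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(jsonlite)

set.seed(1)
x <- rnorm(1e6)

t_char    <- system.time(s1 <- as.character(x))
t_sprintf <- system.time(s2 <- sprintf("%.4f", x))
t_formatC <- system.time(s3 <- formatC(x, digits = 4, format = "f"))
t_json    <- system.time(js <- toJSON(x, digits = 4))

# Elapsed seconds per method (results vary by platform)
rbind(as.character = t_char, sprintf = t_sprintf,
      formatC = t_formatC, toJSON = t_json)[, "elapsed"]</code></pre></figure>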
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">
</span><span class="n">nrow</span><span class="p">(</span><span class="n">diamonds</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1] 53940</span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">jsonlite</span><span class="o">::</span><span class="n">toJSON</span><span class="p">(</span><span class="n">diamonds</span><span class="p">,</span><span class="w"> </span><span class="n">dataframe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"row"</span><span class="p">))</span><span class="w">
</span><span class="c1"># user system elapsed</span><span class="w">
</span><span class="c1"># 0.319 0.007 0.325</span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">jsonlite</span><span class="o">::</span><span class="n">toJSON</span><span class="p">(</span><span class="n">diamonds</span><span class="p">,</span><span class="w"> </span><span class="n">dataframe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"col"</span><span class="p">))</span><span class="w">
</span><span class="c1"># user system elapsed</span><span class="w">
</span><span class="c1"># 0.073 0.002 0.075</span></code></pre></figure>
<p>Using the same benchmark from <a href="http://pages.opencpu.org/posts/jsonlite-release-0-9-12/">previous posts</a>, time to convert the <code class="language-plaintext highlighter-rouge">diamonds</code> data to row-based json has gone down from 0.619s to 0.325s on my machine (about 2x speedup from jsonlite 0.9.12), and converting to column-based json has gone down from 0.330s to 0.075s (about 4x speedup).</p>
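<p>For reference, the two layouts being timed look like this on a tiny, made-up data frame. Row-based JSON is an array of records, and column-based JSON is an object of arrays:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(jsonlite)

df <- data.frame(x = 1:2, y = c("a", "b"), stringsAsFactors = FALSE)

toJSON(df, dataframe = "rows")
# [{"x":1,"y":"a"},{"x":2,"y":"b"}]

toJSON(df, dataframe = "columns")
# {"x":[1,2],"y":["a","b"]}</code></pre></figure>
<p>The column layout repeats each field name only once, which is one reason it is cheaper to generate.</p>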
<h2 id="comparing-to-other-json-packages">Comparing to other JSON packages</h2>
<p>When comparing JSON packages, note that the comparison is never entirely fair, because different packages use different settings and defaults for missing values, number of digits, etc. Both <code class="language-plaintext highlighter-rouge">rjson</code> and <code class="language-plaintext highlighter-rouge">RJSONIO</code> only support the column-based format for encoding data frames. Using their default settings:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">system.time</span><span class="p">(</span><span class="n">rjson</span><span class="o">::</span><span class="n">toJSON</span><span class="p">(</span><span class="n">diamonds</span><span class="p">))</span><span class="w">
</span><span class="c1"># user system elapsed</span><span class="w">
</span><span class="c1"># 0.279 0.004 0.281</span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">RJSONIO</span><span class="o">::</span><span class="n">toJSON</span><span class="p">(</span><span class="n">diamonds</span><span class="p">))</span><span class="w">
</span><span class="c1"># user system elapsed</span><span class="w">
</span><span class="c1"># 0.918 0.027 0.944</span></code></pre></figure>
<p>For this particular dataset, jsonlite is about 3.5x faster than <code class="language-plaintext highlighter-rouge">rjson</code> and about 12x faster than <code class="language-plaintext highlighter-rouge">RJSONIO</code> (on my machine) to generate column-based JSON. These differences are relatively large because 7 out of the 10 columns in the <code class="language-plaintext highlighter-rouge">diamonds</code> dataset are numeric.</p>
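<p>The column count quoted above can be checked directly. <code class="language-plaintext highlighter-rouge">is.numeric</code> is <code class="language-plaintext highlighter-rouge">TRUE</code> for the double and integer columns and <code class="language-plaintext highlighter-rouge">FALSE</code> for the three ordered factors:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">data(diamonds, package = "ggplot2")

num_cols <- vapply(diamonds, is.numeric, logical(1))
sum(num_cols)
# [1] 7
names(diamonds)[!num_cols]
# [1] "cut"     "color"   "clarity"</code></pre></figure>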
Generating secure random numbers with openssl2014-10-24T00:00:00+00:00https://www.opencpu.org/posts/openssl-release-01
<a href="https://www.opencpu.org/posts/openssl-release-01"><img alt="opencpu logo" src="https://www.opencpu.org/images/securitycat.jpg"></a>
<p>I started working on a new R package with bindings for OpenSSL. The initial release is now available <a href="http://cran.r-project.org/web/packages/openssl">from CRAN</a>. To install the package on Linux you need <code class="language-plaintext highlighter-rouge">libssl-dev</code> (Debian/Ubuntu) or <code class="language-plaintext highlighter-rouge">openssl-devel</code> (Fedora, RHEL, CentOS). For Mac and Windows, precompiled binaries are available from CRAN as usual. The Mac version is compiled against the version of OpenSSL that is included with OSX. See the <a href="https://github.com/jeroenooms/openssl/blob/master/src/Makevars">comments</a> in Makevars if you want to compile against a more recent version of OpenSSL.</p>
<h2 id="secure-random-numbers">Secure random numbers</h2>
<p>The initial release of openssl implements bindings to the OpenSSL random number generator, which will be used to generate session keys in the upcoming version of the OpenCPU system. This feature was requested by Ruben Arslan who noted that the default RNG in R is not suitable for this because it is predictable and lack of entropy can lead to collisions. I’m not a crypto expert but it seems like everyone uses OpenSSL for secure RNG, hence this new package. For implementation details, see the respective <a href="https://www.openssl.org/docs/crypto/RAND_bytes.html">OpenSSL documentation</a> pages.</p>
<p>The <code class="language-plaintext highlighter-rouge">rand_bytes</code> and <code class="language-plaintext highlighter-rouge">rand_pseudo_bytes</code> functions return a raw vector with random bytes:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">openssl</span><span class="p">)</span><span class="w">
</span><span class="n">rand_bytes</span><span class="p">(</span><span class="m">10</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1] 3b a7 0f 85 e7 c6 cd 15 cb 5f</span></code></pre></figure>
<p>To convert them to integers (0-255) simply use <code class="language-plaintext highlighter-rouge">as.numeric</code>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="o">></span><span class="w"> </span><span class="nf">as.numeric</span><span class="p">(</span><span class="n">rand_bytes</span><span class="p">(</span><span class="m">10</span><span class="p">))</span><span class="w">
</span><span class="c1"># [1] 15 149 231 77 18 29 219 191 165 112</span></code></pre></figure>
<p>Or convert bits to booleans:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="o">></span><span class="w"> </span><span class="n">rnd</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rand_bytes</span><span class="p">(</span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="o">></span><span class="w"> </span><span class="nf">as.logical</span><span class="p">(</span><span class="n">rawToBits</span><span class="p">(</span><span class="n">rnd</span><span class="p">))</span><span class="w">
</span><span class="c1"># [1] FALSE FALSE TRUE FALSE FALSE TRUE TRUE TRUE</span></code></pre></figure>
<h2 id="probability-distributions">Probability distributions</h2>
<p>Mapping random bytes to a continuous distribution requires a bit of math. For example, to combine four 8-bit bytes into a single double with 32 bits of randomness from the standard uniform distribution:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">rand_unif</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">n</span><span class="p">){</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">matrix</span><span class="p">(</span><span class="nf">as.numeric</span><span class="p">(</span><span class="n">openssl</span><span class="o">::</span><span class="n">rand_bytes</span><span class="p">(</span><span class="n">n</span><span class="o">*</span><span class="m">4</span><span class="p">)),</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4</span><span class="p">)</span><span class="w">
</span><span class="nf">as.numeric</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="m">256</span><span class="o">^-</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">4</span><span class="p">))</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">rand_unif</span><span class="p">(</span><span class="m">5</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1] 0.8094907 0.8180394 0.0743821 0.6031131 0.8488938</span></code></pre></figure>
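<p>From U(0,1) we can then map to draws from a probability distribution using its inverse CDF (the quantile function, such as <code class="language-plaintext highlighter-rouge">qnorm</code> for the normal distribution):</p>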
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">rand_norm</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">...</span><span class="p">){</span><span class="w">
</span><span class="n">qnorm</span><span class="p">(</span><span class="n">rand_unif</span><span class="p">(</span><span class="n">n</span><span class="p">),</span><span class="w"> </span><span class="n">...</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">rand_norm</span><span class="p">(</span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="n">mean</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">,</span><span class="w"> </span><span class="n">sd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">15</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1] 101.86120 123.84420 70.15235 81.50505 86.46514</span></code></pre></figure>
<p>However, note that the native R random number generators are much faster and have better numerical properties. Also, the OpenSSL RNG is not intended for generating the large sequences of random numbers often used in statistics. It is mainly useful in situations where it is critical to create a little bit of secure randomness that cannot be manipulated. Typical applications include encryption keys, drinking games, or raffle drawings at your local R user group.</p>
<h2 id="more-fun-stuff">More fun stuff</h2>
<p>OpenSSL has a lot of other useful functionality which we could add to the R package in future versions. In particular, public key methods to sign and verify packages are something that R and CRAN could really benefit from. Simon Urbanek is working on something similar in the <a href="https://github.com/s-u/PKI">PKI</a> package, which also builds on OpenSSL.</p>
<p>If you would like to see other OpenSSL functionality in the R package, feel free to send a pull request with bindings on <a href="https://github.com/jeroenooms/openssl">github</a>. It would be great to get people involved who have a better understanding of cryptographic methods.</p>
jsonlite 0.9.12: now even lighter and faster2014-09-29T00:00:00+00:00https://www.opencpu.org/posts/jsonlite-release-0-9-12
<a href="https://www.opencpu.org/posts/jsonlite-release-0-9-12"><img alt="opencpu logo" src="https://www.opencpu.org/images/mariokart.jpg"></a>
<p>The <a href="http://cran.rstudio.org/web/packages/jsonlite/index.html">jsonlite</a> package implements a robust, high performance JSON parser and generator for R, optimized for statistical data and the web. This week version 0.9.12 appeared on CRAN which includes a completely rewritten json parser and more optimized C code for json generation. The new parser is based on <a href="http://lloyd.github.io/yajl/">yajl</a> which is smaller and faster than libjson, and much easier to compile.</p>
<h3 id="error-handling">Error handling</h3>
<p>My favorite feature of yajl is that it gives helpful error messages when parsing invalid JSON, for example:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">fromJSON</span><span class="p">(</span><span class="s1">'[1,2,falsse,4]'</span><span class="p">)</span><span class="w">
</span><span class="c1"># Error in parseJSON(txt) : lexical error: invalid string in json text.</span><span class="w">
</span><span class="c1"># [1,2,falsse,4]</span><span class="w">
</span><span class="c1"># (right here) ------^</span><span class="w">
</span><span class="n">fromJSON</span><span class="p">(</span><span class="s1">'["foo", "bla\nbla"]'</span><span class="p">)</span><span class="w">
</span><span class="c1"># Error in parseJSON(txt) : lexical error: invalid character inside string.</span><span class="w">
</span><span class="c1"># ["foo", "bla bla"]</span><span class="w">
</span><span class="c1"># (right here) ------^</span><span class="w">
</span><span class="n">fromJSON</span><span class="p">(</span><span class="s1">'[1,2,3,4] {}'</span><span class="p">)</span><span class="w">
</span><span class="c1"># Error in parseJSON(txt) : parse error: trailing garbage</span><span class="w">
</span><span class="c1"># [1,2,3,4] {}</span><span class="w">
</span><span class="c1"># (right here) ------^</span></code></pre></figure>
<p>This makes debugging much easier, especially when dealing with fast-changing dynamic data from the web.</p>
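<p>Because the parser raises an ordinary R error, the message can also be captured programmatically, for example to log invalid payloads instead of stopping. A small sketch using base R's <code class="language-plaintext highlighter-rouge">tryCatch</code>; the helper name <code class="language-plaintext highlighter-rouge">parse_or_report</code> is hypothetical:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(jsonlite)

# Hypothetical helper: return parsed data, or the parser's error message
parse_or_report <- function(txt) {
  tryCatch(fromJSON(txt), error = function(e) {
    paste("invalid JSON:", conditionMessage(e))
  })
}

parse_or_report('[1,2,3,4]')
# [1] 1 2 3 4
cat(parse_or_report('[1,2,falsse,4]'))</code></pre></figure>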
<h3 id="unicode-parsing">Unicode parsing</h3>
<p>The yajl parser always correctly converts escaped unicode sequences into UTF-8 characters:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">fromJSON</span><span class="p">(</span><span class="s1">'["\\u5bff\u53f8","Z\\u00fcrich"]'</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1] "寿司" "Zürich"</span></code></pre></figure>
<p>Escaped unicode was already supported in the previous version of jsonlite; however, it was expensive and not enabled by default. With yajl we get this for free :-)</p>
<h3 id="integer-parsing">Integer parsing</h3>
<p>Another cool feature is that yajl parses numbers into integers when possible:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="nf">class</span><span class="p">(</span><span class="n">fromJSON</span><span class="p">(</span><span class="s1">'[13,14,15]'</span><span class="p">))</span><span class="w">
</span><span class="c1"># [1] "integer"</span></code></pre></figure>
<h3 id="performance">Performance</h3>
<p>Performance of both parsing and generating JSON has again improved tremendously in this version. Some benchmarks:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">jsonlite</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">microbenchmark</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="p">(</span><span class="n">diamonds</span><span class="p">,</span><span class="w"> </span><span class="n">package</span><span class="o">=</span><span class="s2">"ggplot2"</span><span class="p">)</span><span class="w">
</span><span class="n">json_rows</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">toJSON</span><span class="p">(</span><span class="n">diamonds</span><span class="p">)</span><span class="w">
</span><span class="n">json_columns</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">toJSON</span><span class="p">(</span><span class="n">diamonds</span><span class="p">,</span><span class="w"> </span><span class="n">dataframe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"columns"</span><span class="p">)</span><span class="w">
</span><span class="n">microbenchmark</span><span class="p">(</span><span class="w">
</span><span class="n">toJSON</span><span class="p">(</span><span class="n">diamonds</span><span class="p">),</span><span class="w">
</span><span class="n">toJSON</span><span class="p">(</span><span class="n">diamonds</span><span class="p">,</span><span class="w"> </span><span class="n">dataframe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"columns"</span><span class="p">),</span><span class="w">
</span><span class="n">fromJSON</span><span class="p">(</span><span class="n">json_rows</span><span class="p">),</span><span class="w">
</span><span class="n">fromJSON</span><span class="p">(</span><span class="n">json_columns</span><span class="p">),</span><span class="w">
</span><span class="n">times</span><span class="o">=</span><span class="m">10</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1"># Unit: milliseconds</span><span class="w">
</span><span class="c1"># expr min lq median uq max neval</span><span class="w">
</span><span class="c1"># toJSON(diamonds) 587.6984 591.3231 619.1590 630.3588 661.5118 10</span><span class="w">
</span><span class="c1"># toJSON(diamonds, dataframe = "columns") 317.6793 325.3809 330.6444 339.9898 343.7466 10</span><span class="w">
</span><span class="c1"># fromJSON(json_rows) 890.9832 899.3334 939.3230 979.6338 1059.9770 10</span><span class="w">
</span><span class="c1"># fromJSON(json_columns) 188.5764 201.8463 238.1272 279.7607 293.1195 10</span></code></pre></figure>
<p>If we compare this to the <a href="https://www.opencpu.org/posts/jsonlite-release-0-9-11/">previous blog post</a>, we can see that generating row-based JSON from data frames (the default) is approximately 2x faster than in the previous version. Parsing row-based json is about 2.5x faster, and parsing column-based json is almost 5x faster!</p>
<h3 id="streaming-json">Streaming JSON</h3>
<p>Version 0.9.12 introduces some cool streaming functionality. This is a topic in itself and I will blog about it later in the week. Until then, have a look at the examples in the <a href="http://demo.ocpu.io/jsonlite/man/stream_in/html"><code class="language-plaintext highlighter-rouge">stream_in</code></a> and <a href="http://demo.ocpu.io/jsonlite/man/stream_in/html"><code class="language-plaintext highlighter-rouge">stream_out</code></a> manual pages.</p>
New jsonlite gets a major speed boost!2014-09-06T00:00:00+00:00https://www.opencpu.org/posts/jsonlite-release-0-9-11
<a href="https://www.opencpu.org/posts/jsonlite-release-0-9-11"><img alt="opencpu logo" src="https://www.opencpu.org/images/mariokart.jpg"></a>
<p>The <a href="http://cran.r-project.org/web/packages/jsonlite/">jsonlite</a> package is a JSON parser/generator optimized for the web. It implements a bidirectional mapping between JSON data and the most important R data types, which allows for converting objects to JSON and back without manual data restructuring. This is ideal for interacting with web APIs, or to build pipelines where data seamlessly flow in and out of R through JSON. The <a href="http://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html">quickstart vignette</a> gives a brief introduction, or just try:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">fromJSON</span><span class="p">(</span><span class="n">toJSON</span><span class="p">(</span><span class="n">mtcars</span><span class="p">))</span></code></pre></figure>
<p>Or use some data from the web:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Latest commits in r-base</span><span class="w">
</span><span class="n">r_source</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fromJSON</span><span class="p">(</span><span class="s2">"https://api.github.com/repos/wch/r-source/commits"</span><span class="p">)</span><span class="w">
</span><span class="c1"># Pretty print:</span><span class="w">
</span><span class="n">committer</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">format</span><span class="p">(</span><span class="n">r_source</span><span class="o">$</span><span class="n">commit</span><span class="o">$</span><span class="n">author</span><span class="o">$</span><span class="n">name</span><span class="p">)</span><span class="w">
</span><span class="n">date</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">as.Date</span><span class="p">(</span><span class="n">r_source</span><span class="o">$</span><span class="n">commit</span><span class="o">$</span><span class="n">committer</span><span class="o">$</span><span class="n">date</span><span class="p">)</span><span class="w">
</span><span class="n">message</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">sub</span><span class="p">(</span><span class="s2">"\n\n.*"</span><span class="p">,</span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="n">r_source</span><span class="o">$</span><span class="n">commit</span><span class="o">$</span><span class="n">message</span><span class="p">)</span><span class="w">
</span><span class="n">paste</span><span class="p">(</span><span class="n">date</span><span class="p">,</span><span class="w"> </span><span class="n">committer</span><span class="p">,</span><span class="w"> </span><span class="n">message</span><span class="p">)</span></code></pre></figure>
<h2 id="new-in-0911-performance">New in 0.9.11: performance!</h2>
<p>Version 0.9.11 has a few minor bugfixes, but most of the work of this release has gone into improving performance. The implementation of <code class="language-plaintext highlighter-rouge">toJSON</code> has been optimized in many ways, and with a little <a href="http://stackoverflow.com/questions/25609174/fast-escaping-deparsing-of-character-vectors-in-r">help</a> from Winston Chang, the most CPU-intensive bottleneck has been ported to C code. The result is quite impressive: compared with the previous release, encoding data frames to row-based JSON is about 3x faster, and encoding data frames to column-based JSON is nearly 10x faster.</p>
<p>The <a href="https://demo.ocpu.io/ggplot2/data/diamonds">diamonds</a> dataset from the ggplot2 package has about 0.5 million values, which makes a nice benchmark. On my MacBook it takes jsonlite on average 1.18s to encode it to row-based JSON, and 0.34s for column-based JSON:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">jsonlite</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">microbenchmark</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="p">(</span><span class="s2">"diamonds"</span><span class="p">,</span><span class="w"> </span><span class="n">package</span><span class="o">=</span><span class="s2">"ggplot2"</span><span class="p">)</span><span class="w">
</span><span class="n">microbenchmark</span><span class="p">(</span><span class="n">json_rows</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">toJSON</span><span class="p">(</span><span class="n">diamonds</span><span class="p">),</span><span class="w"> </span><span class="n">times</span><span class="o">=</span><span class="m">10</span><span class="p">)</span><span class="w">
</span><span class="c1"># Unit: seconds</span><span class="w">
</span><span class="c1"># expr min lq median uq max neval</span><span class="w">
</span><span class="c1"># toJSON(diamonds) 1.12773 1.140724 1.175872 1.180354 1.21786 10</span><span class="w">
</span><span class="n">microbenchmark</span><span class="p">(</span><span class="n">json_columns</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">toJSON</span><span class="p">(</span><span class="n">diamonds</span><span class="p">,</span><span class="w"> </span><span class="n">dataframe</span><span class="o">=</span><span class="s2">"col"</span><span class="p">),</span><span class="w"> </span><span class="n">times</span><span class="o">=</span><span class="m">10</span><span class="p">)</span><span class="w">
</span><span class="c1"># Unit: milliseconds</span><span class="w">
</span><span class="c1"># expr min lq median uq # max neval</span><span class="w">
</span><span class="c1"># toJSON(diamonds, dataframe = "col") 333.9494 334.799 338.0843 340.0929 350.3026 10</span></code></pre></figure>
<h2 id="parsing-and-simplification-performance">Parsing and simplification performance</h2>
<p>The performance of <code class="language-plaintext highlighter-rouge">fromJSON</code> has been improved as well. The parser itself, a high-performance C++ library borrowed from RJSONIO, has not changed. However, the simplification code that reduces deeply nested lists into nice vectors and data frames has been tweaked in many places and is on average 3 to 5 times faster than before (depending on what the JSON data look like). For the diamonds example, the row-based data is parsed in about 2.32 s and the column-based data in 1.25 s.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">microbenchmark</span><span class="p">(</span><span class="n">fromJSON</span><span class="p">(</span><span class="n">json_rows</span><span class="p">),</span><span class="w"> </span><span class="n">times</span><span class="o">=</span><span class="m">10</span><span class="p">)</span><span class="w">
</span><span class="c1"># Unit: seconds</span><span class="w">
</span><span class="c1"># expr min lq median uq max neval</span><span class="w">
</span><span class="c1"># fromJSON(json_rows) 2.178211 2.278337 2.319519 2.376085 2.423627 10</span><span class="w">
</span><span class="n">microbenchmark</span><span class="p">(</span><span class="n">fromJSON</span><span class="p">(</span><span class="n">json_columns</span><span class="p">),</span><span class="w"> </span><span class="n">times</span><span class="o">=</span><span class="m">10</span><span class="p">)</span><span class="w">
</span><span class="c1"># Unit: seconds</span><span class="w">
</span><span class="c1"># expr min lq median uq max neval</span><span class="w">
</span><span class="c1"># fromJSON(json_columns) 1.17289 1.252284 1.253999 1.265763 1.306357 10</span></code></pre></figure>
<p>For comparison, we can also disable simplification, in which case parsing these data takes 0.70 and 0.39 seconds respectively. However, without simplification we end up with a big nested list of lists, which is often not very useful.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">microbenchmark</span><span class="p">(</span><span class="n">fromJSON</span><span class="p">(</span><span class="n">json_rows</span><span class="p">,</span><span class="w"> </span><span class="n">simplifyVector</span><span class="o">=</span><span class="nb">F</span><span class="p">),</span><span class="w"> </span><span class="n">times</span><span class="o">=</span><span class="m">10</span><span class="p">)</span><span class="w">
</span><span class="c1"># Unit: milliseconds</span><span class="w">
</span><span class="c1"># expr min lq median uq max neval</span><span class="w">
</span><span class="c1"># fromJSON(json_rows, simplifyVector = F) 635.5767 648.4693 704.6996 720.0335 727.8869 10</span><span class="w">
</span><span class="n">microbenchmark</span><span class="p">(</span><span class="n">fromJSON</span><span class="p">(</span><span class="n">json_columns</span><span class="p">,</span><span class="w"> </span><span class="n">simplifyVector</span><span class="o">=</span><span class="nb">F</span><span class="p">),</span><span class="w"> </span><span class="n">times</span><span class="o">=</span><span class="m">10</span><span class="p">)</span><span class="w">
</span><span class="c1"># Unit: milliseconds</span><span class="w">
</span><span class="c1"># expr min lq median uq max neval</span><span class="w">
</span><span class="c1"># fromJSON(json_columns, simplifyVector = F) 385.3224 388.4772 395.1916 409.3432 463.9695 10</span></code></pre></figure>
New in OpenCPU 1.4.4: session namespaces2014-08-25T00:00:00+00:00https://www.opencpu.org/posts/opencpu-release-1-4-4
<a href="https://www.opencpu.org/posts/opencpu-release-1-4-4"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>The OpenCPU system exposes an <a href="https://www.opencpu.org/api.html">HTTP API</a> for embedded scientific computing with R. This provides reliable and scalable foundations for integrating R based analysis and visualization modules into pipelines, web applications or big data infrastructures.</p>
<p>This week version 1.4.4 was released on <a href="https://launchpad.net/~opencpu/+archive/ubuntu/opencpu-1.4">Launchpad</a> (Ubuntu), <a href="http://software.opensuse.org/download.html?project=home%3Ajeroenooms%3Aopencpu-1.4&package=opencpu">OBS</a> (Fedora, SUSE) and <a href="http://cran.r-project.org/web/packages/opencpu/">CRAN</a>.</p>
<h2 id="new-session-namespaces">New: session namespaces</h2>
<p>A new feature in this version is support for session namespaces. Clients can now refer to objects within a temporary session using <code class="language-plaintext highlighter-rouge">sessionid::name</code>. This makes it easier to reuse objects that were created by a script. For example, let’s execute the <a href="https://cloud.opencpu.org/ocpu/library/MASS/scripts/ch01.R">ch01.R</a> script, which is included with the <a href="https://cloud.opencpu.org/ocpu/library/MASS/scripts">MASS</a> package:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>>> curl https://cloud.opencpu.org/ocpu/library/MASS/scripts/ch01.R -X POST
/ocpu/tmp/x05af9fe89a/R/dd
/ocpu/tmp/x05af9fe89a/R/m
/ocpu/tmp/x05af9fe89a/R/std.dev
/ocpu/tmp/x05af9fe89a/R/t.stat
/ocpu/tmp/x05af9fe89a/R/t.test.p
/ocpu/tmp/x05af9fe89a/R/v
/ocpu/tmp/x05af9fe89a/R/z
/ocpu/tmp/x05af9fe89a/stdout
/ocpu/tmp/x05af9fe89a/source
/ocpu/tmp/x05af9fe89a/console
/ocpu/tmp/x05af9fe89a/info
/ocpu/tmp/x05af9fe89a/files/ch01.pdf
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">x05af9fe89a</code> is the temporary session ID, which will be different for every execution. From the output we can see that this script stored 7 objects in the session namespace. To retrieve the <code class="language-plaintext highlighter-rouge">z</code> object in <code class="language-plaintext highlighter-rouge">json</code> format, use:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://cloud.opencpu.org/ocpu/tmp/x05af9fe89a/R/z/json?pretty=FALSE
</code></pre></div></div>
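<p>Because every path in the response starts with <code class="language-plaintext highlighter-rouge">/ocpu/tmp/&lt;session-id&gt;/</code>, a client can recover the session ID and the names of the stored objects with standard text tools. Below is a minimal POSIX shell sketch. It assumes the one-path-per-line response format shown above, and the helper names <code class="language-plaintext highlighter-rouge">session_id</code> and <code class="language-plaintext highlighter-rouge">session_objects</code> are hypothetical.</p>

```shell
#!/bin/sh
# Minimal sketch: parse the list of paths returned by an OpenCPU POST.
# Assumes one path per line, each of the form /ocpu/tmp/<session-id>/...
# session_id and session_objects are hypothetical helper names.

# Print the session ID: the 4th "/"-separated field of the first path
# (field 1 is empty because the path starts with "/").
session_id() {
  head -n 1 | cut -d/ -f4
}

# Print the names of the R objects stored in the session, one per line.
session_objects() {
  grep '/R/' | sed 's|.*/R/||'
}

# Sample response, copied from the ch01.R example above.
response='/ocpu/tmp/x05af9fe89a/R/dd
/ocpu/tmp/x05af9fe89a/R/m
/ocpu/tmp/x05af9fe89a/R/std.dev
/ocpu/tmp/x05af9fe89a/R/t.stat
/ocpu/tmp/x05af9fe89a/R/t.test.p
/ocpu/tmp/x05af9fe89a/R/v
/ocpu/tmp/x05af9fe89a/R/z
/ocpu/tmp/x05af9fe89a/stdout
/ocpu/tmp/x05af9fe89a/source
/ocpu/tmp/x05af9fe89a/console
/ocpu/tmp/x05af9fe89a/info
/ocpu/tmp/x05af9fe89a/files/ch01.pdf'

printf '%s\n' "$response" | session_id       # prints x05af9fe89a
printf '%s\n' "$response" | session_objects  # prints dd m std.dev t.stat t.test.p v z, one per line
```

<p>In practice the response would be piped straight from <code class="language-plaintext highlighter-rouge">curl</code>, for example <code class="language-plaintext highlighter-rouge">curl -s -X POST https://cloud.opencpu.org/ocpu/library/MASS/scripts/ch01.R | session_id</code>.</p>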
<p>But what if we want to reuse the object <code class="language-plaintext highlighter-rouge">z</code> in a subsequent function call? We can now do this using the session namespace. For example, to calculate <code class="language-plaintext highlighter-rouge">stats::sd(x = z)</code>, we refer to <code class="language-plaintext highlighter-rouge">x05af9fe89a::z</code> as shown below:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl https://cloud.opencpu.org/ocpu/library/stats/R/sd/json -d x=x05af9fe89a::z
[
1.9368
]
</code></pre></div></div>
<p>This way, we can chain script executions and function calls by passing output objects as arguments to subsequent requests.</p>
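<p>The two requests above can be wrapped in small shell functions. This is a sketch and not part of OpenCPU itself. It assumes that <code class="language-plaintext highlighter-rouge">curl</code> is installed, that the server at <code class="language-plaintext highlighter-rouge">https://cloud.opencpu.org</code> is reachable, and that the first line of a POST response is a path of the form <code class="language-plaintext highlighter-rouge">/ocpu/tmp/&lt;session-id&gt;/...</code>. The function names <code class="language-plaintext highlighter-rouge">ocpu_session</code> and <code class="language-plaintext highlighter-rouge">ocpu_call_with</code> are hypothetical.</p>

```shell
#!/bin/sh
# Sketch: chain an OpenCPU script run and a function call through a
# session namespace. ocpu_session and ocpu_call_with are hypothetical
# helpers; OCPU points at the public server used in the examples above.
OCPU="https://cloud.opencpu.org/ocpu"

# POST to an OpenCPU endpoint and print the temporary session ID,
# taken from the 4th "/"-separated field of the first returned path.
ocpu_session() {
  curl -s -X POST "$1" | head -n 1 | cut -d/ -f4
}

# Call an R function with JSON output, passing argument x as
# <session>::<object>. Arguments: function path, session ID, object name.
ocpu_call_with() {
  curl -s "$OCPU/$1/json" -d "x=$2::$3"
}
```

<p>For the example above, <code class="language-plaintext highlighter-rouge">session=$(ocpu_session "$OCPU/library/MASS/scripts/ch01.R")</code> followed by <code class="language-plaintext highlighter-rouge">ocpu_call_with library/stats/R/sd "$session" z</code> would issue the same two requests as the manual <code class="language-plaintext highlighter-rouge">curl</code> commands, without copying the session ID by hand.</p>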
<h2 id="function-calls">Function calls</h2>
<p>For remote function calls, you can still use the session ID alone to refer to the return value of the function call. For example, to calculate <code class="language-plaintext highlighter-rouge">stats::rnorm(n = 5)</code> we do:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>>> curl https://cloud.opencpu.org/ocpu/library/stats/R/rnorm -d n=5
/ocpu/tmp/x009f9e7630/R/.val
/ocpu/tmp/x009f9e7630/stdout
/ocpu/tmp/x009f9e7630/source
/ocpu/tmp/x009f9e7630/console
/ocpu/tmp/x009f9e7630/info
</code></pre></div></div>
<p>To calculate the standard deviation of our newly created object, the client can either use <code class="language-plaintext highlighter-rouge">x009f9e7630::.val</code> or simply <code class="language-plaintext highlighter-rouge">x009f9e7630</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl https://cloud.opencpu.org/ocpu/library/stats/R/sd -d x=x009f9e7630
curl https://cloud.opencpu.org/ocpu/library/stats/R/sd -d x=x009f9e7630::.val
</code></pre></div></div>
<p>The above two requests are equivalent.</p>
CRAN release jsonlite 0.9.10 (RC)2014-08-20T00:00:00+00:00https://www.opencpu.org/posts/jsonlite-release-0-9-10
<a href="https://www.opencpu.org/posts/jsonlite-release-0-9-10"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>The <a href="http://cran.r-project.org/web/packages/jsonlite/">jsonlite</a> package is a JSON parser/generator optimized for the web. It implements a bidirectional mapping between JSON data and the most important R data types. This is very powerful for interacting with web APIs, or to build pipelines where data seamlessly flows in and out of R through JSON without any manual serializing, parsing or data munging.</p>
<p>The jsonlite package is one of the pillars of the <a href="https://www.opencpu.org/">OpenCPU</a> system, which provides an interoperable API to interact with R over HTTP+JSON. However, since its release, jsonlite has been adopted by many other projects as well, mostly to grab JSON data from REST APIs in R.</p>
<h2 id="new-in-this-version">New in this version</h2>
<p>Version 0.9.10 includes two new vignettes to get you up and running with JSON and R in a few minutes.</p>
<ul>
<li><a href="http://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html">Getting started: Parsing JSON with jsonlite</a></li>
<li><a href="http://cran.r-project.org/web/packages/jsonlite/vignettes/json-apis.html">Fetching JSON data from REST APIs</a></li>
</ul>
<p>These vignettes show how to get started analyzing data from Twitter, NY Times, Github, NYC CitiBike, ProPublica, Sunlight Foundation and much more, with 2 or 3 lines of R code.</p>
<p>There are also a few <a href="http://cran.r-project.org/web/packages/jsonlite/NEWS">other improvements</a>, most notably support for parsing escaped JSON Unicode sequences, which can be important if you work with text in a non-Latin alphabet.</p>
<h2 id="release-candidate">Release candidate</h2>
<p>This is the 10th CRAN version of jsonlite, and we are getting very close to a 1.0 release. By now the package does what it should do, has been tested by many users and all outstanding issues have been addressed. The mapping between JSON data and R classes is described in detail in the <a href="http://arxiv.org/abs/1403.2805">jsonlite paper</a>, and unit tests are available to validate that implementations behave as prescribed for all data and edge cases. Once the version bumps to 1.0, we plan to switch gears and start focussing more on optimizing performance.</p>
Running OpenCPU server on Fedora and Enterprise Linux2014-08-15T00:00:00+00:00https://www.opencpu.org/posts/opencpu-fedora-centos
<a href="https://www.opencpu.org/posts/opencpu-fedora-centos"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>Starting with version 1.4.4, the OpenCPU cloud server can run on Red Hat distributions, i.e. Fedora and Enterprise Linux (CentOS/RHEL). This post explains how to install and use OpenCPU on these systems. But before continuing, I should emphasize that the preferred distribution for running OpenCPU servers is still Ubuntu, which has better support for R than any other server OS. If you would like to run OpenCPU (or other R based software) on a server, you can save yourself lots of time and headaches down the road by choosing your OS wisely. But if you like Red Hat, know what you are doing and want to try OpenCPU, this post is for you.</p>
<h2 id="opencpu-rpm-packages">OpenCPU rpm packages</h2>
<p>A spec file and instructions to build the opencpu-server rpm package from source are available from the <a href="https://github.com/jeroenooms/opencpu-server/tree/master/rpm#readme">rpm readme</a> in the Github repository. The build process is very easy and I verified that it works out of the box on Fedora 19, 20 and CentOS 6. For recent versions of Fedora, prebuilt binaries are available from the openSUSE Build Service (OBS), so all you need to do is <a href="https://github.com/jeroenooms/opencpu-server/tree/master/rpm#readme">add the repository</a> and run:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">yum <span class="nb">install </span>opencpu-server</code></pre></figure>
<p>If you find any issues with the rpm packages please report them on the <a href="https://github.com/jeroenooms/opencpu/issues">issues page</a>.</p>
<h2 id="opencpu-and-selinux">OpenCPU and SELinux</h2>
<p>In general, the <code class="language-plaintext highlighter-rouge">opencpu-server</code> rpm package is very similar to the deb one, and most information in the <a href="http://opencpu.github.io/server-manual/opencpu-server.pdf">server manual</a> applies to Fedora/EL the same way as it does to Ubuntu. However one aspect is completely different: security.</p>
<p>Because OpenCPU has no notion of users or privileges, the server relies on Mandatory Access Control (MAC) style security. On Debian and Ubuntu, MAC is available through AppArmor, and the opencpu-server package includes customisable AppArmor profiles defining policies designed specifically for R and OpenCPU (see also <a href="http://www.jstatsoft.org/v55/i07/">RAppArmor</a>). Red Hat distributions, on the other hand, use SELinux and do not support AppArmor. The SELinux system is more complex and requires a lot of manual effort from the system administrator to configure and maintain security policies on the server (a popular introduction is <a href="http://www.redhat.com/resourcelibrary/videos/selinux-for-mere-mortals">SELinux for Mere Mortals</a>). This may be very powerful if you’re a bank or government agency with a team of dedicated security experts, but otherwise it can be pretty painful.</p>
<p>Because the OpenCPU server builds on rApache (mod_R), it runs by default in the SELinux <code class="language-plaintext highlighter-rouge">httpd_modules_t</code> context. This standard SELinux policy is designed for Apache modules, and prevents most types of malicious use that you would expect from a web service. Running OpenCPU in this context is fine for internal use, but it is not recommended to expose your Fedora/EL OpenCPU server to the web without further fine-tuning SELinux for your application. Furthermore, if you experience unexpected permission denied errors, you probably need to enable some of the <code class="language-plaintext highlighter-rouge">httpd_</code> SELinux “booleans”. A boolean in SELinux is the term for a global flag that enables or disables a particular privilege within a particular context. The <a href="http://linux.die.net/man/8/httpd_selinux">httpd_selinux man page</a> lists some important booleans for httpd that you might want to turn on or off.</p>
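<p>As a starting point, the current state of the httpd booleans can be inspected with <code class="language-plaintext highlighter-rouge">getsebool</code> and changed with <code class="language-plaintext highlighter-rouge">setsebool</code>. The sketch below lists the <code class="language-plaintext highlighter-rouge">httpd_</code> booleans that are currently off; which ones, if any, to enable depends on your application. As an illustrative assumption, <code class="language-plaintext highlighter-rouge">httpd_can_network_connect</code> is the boolean that allows httpd scripts and modules to open outbound TCP connections, which R code running inside OpenCPU may need (for example to download data). Run it as root on a system with SELinux enabled; the function name <code class="language-plaintext highlighter-rouge">httpd_booleans_off</code> is hypothetical.</p>

```shell
#!/bin/sh
# Sketch: inspect SELinux booleans for httpd (run as root).
# httpd_booleans_off is a hypothetical helper name.

# List the httpd_* booleans that are currently off. getsebool -a prints
# one line per boolean, of the form "httpd_can_network_connect --> off".
httpd_booleans_off() {
  getsebool -a | awk '$1 ~ /^httpd_/ && $3 == "off" { print $1 }'
}

# To enable a boolean persistently (-P survives reboots), e.g.:
#   setsebool -P httpd_can_network_connect on
```

<p>Enabling booleans one at a time, and only when a concrete permission denied error points to them, keeps the policy as close to the restrictive default as possible.</p>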
<p>Some more information is available in the earlier mentioned <a href="https://github.com/jeroenooms/opencpu-server/tree/master/rpm#readme">rpm readme</a>, which I will be updating regularly.</p>
<h2 id="about-r-in-centosrhel">About R in CentOS/RHEL</h2>
<p>The above should get you started on Fedora, but on Enterprise Linux there is another catch. <strong>Officially, Enterprise Linux does not support R!</strong> The standard repositories for CentOS and RHEL do not include the <code class="language-plaintext highlighter-rouge">R-core</code> and <code class="language-plaintext highlighter-rouge">R-devel</code> packages that are available in Fedora. The workaround recommended by, for example, <a href="http://cran.r-project.org/bin/linux/redhat/README">CRAN</a> and <a href="http://www.rstudio.com/products/rstudio/download-server/#tab1ff10494">RStudio</a> is to add the EPEL (Extra Packages for Enterprise Linux) repository, which includes ports of many Fedora packages, including <code class="language-plaintext highlighter-rouge">R-core</code> and <code class="language-plaintext highlighter-rouge">R-devel</code>.</p>
<p>However, it is important to realize that packages in EPEL are not frozen: they include whatever is latest on the most recent version of Fedora. This means that each time a new version of Fedora gets released (every 6 months), the latest development versions of all EPEL packages get pushed to your server the next time you run <code class="language-plaintext highlighter-rouge">yum update</code>. This is usually precisely what you want to avoid on a server. I stress this because I learned it the hard way, when <code class="language-plaintext highlighter-rouge">yum</code> accidentally upgraded R from 2.15 to 3.0, breaking every installed package, when all I wanted was security updates.</p>
<p>None of this is a problem on distributions which have native support for R, such as Ubuntu, Debian or Fedora. But if you do decide to use CentOS/RHEL for R based services/applications, make sure you either disable EPEL after installing R, or be very careful with <code class="language-plaintext highlighter-rouge">yum update</code> on long running servers.</p>
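<p>One way to follow this advice with standard <code class="language-plaintext highlighter-rouge">yum</code> options is to enable EPEL only for the command that installs R, and to exclude it explicitly from routine updates. This sketch assumes the EPEL repository is already configured under the repository id <code class="language-plaintext highlighter-rouge">epel</code> and that the commands run as root; the function names <code class="language-plaintext highlighter-rouge">install_r_from_epel</code> and <code class="language-plaintext highlighter-rouge">update_without_epel</code> are hypothetical.</p>

```shell
#!/bin/sh
# Sketch for CentOS/RHEL (run as root). Assumes EPEL is configured under
# the repository id "epel"; both function names are hypothetical.

# Install R from EPEL for this one command only.
install_r_from_epel() {
  yum --enablerepo=epel -y install R-core R-devel
}

# Apply updates from the other enabled repositories only, so that a
# newer R build in EPEL is not pulled in as a side effect.
update_without_epel() {
  yum --disablerepo=epel -y update
}
```

<p>For <code class="language-plaintext highlighter-rouge">--enablerepo</code> to be the only way EPEL gets used, the repository also needs to be disabled by default, for example by setting <code class="language-plaintext highlighter-rouge">enabled=0</code> in the <code class="language-plaintext highlighter-rouge">[epel]</code> section of its repo file (typically <code class="language-plaintext highlighter-rouge">/etc/yum.repos.d/epel.repo</code>).</p>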
Combining pages of JSON data with jsonlite and plyr2014-07-25T00:00:00+00:00https://www.opencpu.org/posts/paging-with-jsonlite
<a href="https://www.opencpu.org/posts/paging-with-jsonlite"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>The <a href="http://cran.r-project.org/web/packages/jsonlite/index.html">jsonlite</a> package is a <code class="language-plaintext highlighter-rouge">JSON</code> parser/generator for R which is optimized for pipelines and web APIs. It is used by the OpenCPU system and many other packages to get data in and out of R using the <code class="language-plaintext highlighter-rouge">JSON</code> format.</p>
<h2 id="a-bidirectional-mapping">A bidirectional mapping</h2>
<p>One of the main strengths of <code class="language-plaintext highlighter-rouge">jsonlite</code> is that it implements a bidirectional <a href="http://arxiv.org/abs/1403.2805">mapping</a> between data frames and <code class="language-plaintext highlighter-rouge">JSON</code>. Thereby it can convert nested collections of <code class="language-plaintext highlighter-rouge">JSON</code> records, as they often appear on the web, immediately into the appropriate R structures, without complicated manual data munging by the user. For example, if a journalist wants to grab some data from ProPublica, she can simply use something like:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">jsonlite</span><span class="p">)</span><span class="w">
</span><span class="n">mydata</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fromJSON</span><span class="p">(</span><span class="s2">"http://projects.propublica.org/forensics/geos.json"</span><span class="p">)</span><span class="w">
</span><span class="n">View</span><span class="p">(</span><span class="n">mydata</span><span class="o">$</span><span class="n">geo</span><span class="p">)</span></code></pre></figure>
<p>Here, the <code class="language-plaintext highlighter-rouge">mydata$geo</code> object is a data frame which can be used directly for modeling or visualization, without the need for advanced data manipulation skills.</p>
<h2 id="paging-with-jsonlite-and-plyr">Paging with jsonlite and plyr</h2>
<p>A question that comes up frequently is how to combine pages of data. Most web APIs limit the amount of data that can be retrieved per request. If the client needs more data than fits in a single request, it needs to break the retrieval down into multiple requests that each retrieve a fragment (page) of data, not unlike pages in a book. In practice this is often implemented using a <code class="language-plaintext highlighter-rouge">page</code> parameter in the API. Below is an example from the <a href="http://projects.propublica.org/nonprofits/api">ProPublica Nonprofit Explorer API</a> where we retrieve the first 3 pages of tax-exempt organizations in the USA, ordered by revenue:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">baseurl</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"http://projects.propublica.org/nonprofits/api/v1/search.json?order=revenue&sort_order=desc"</span><span class="w">
</span><span class="n">mydata0</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fromJSON</span><span class="p">(</span><span class="n">paste0</span><span class="p">(</span><span class="n">baseurl</span><span class="p">,</span><span class="w"> </span><span class="s2">"&page=0"</span><span class="p">))</span><span class="w">
</span><span class="n">mydata1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fromJSON</span><span class="p">(</span><span class="n">paste0</span><span class="p">(</span><span class="n">baseurl</span><span class="p">,</span><span class="w"> </span><span class="s2">"&page=1"</span><span class="p">))</span><span class="w">
</span><span class="n">mydata2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fromJSON</span><span class="p">(</span><span class="n">paste0</span><span class="p">(</span><span class="n">baseurl</span><span class="p">,</span><span class="w"> </span><span class="s2">"&page=2"</span><span class="p">))</span><span class="w">
</span><span class="c1">#The actual data is in the filings element</span><span class="w">
</span><span class="n">print</span><span class="p">(</span><span class="n">mydata0</span><span class="o">$</span><span class="n">filings</span><span class="p">)</span><span class="w">
</span><span class="n">print</span><span class="p">(</span><span class="n">mydata0</span><span class="o">$</span><span class="n">filings</span><span class="o">$</span><span class="n">organization</span><span class="p">)</span></code></pre></figure>
<p>To analyze or visualize these data, we need to combine the pages into a single dataset. This is best done using <code class="language-plaintext highlighter-rouge">rbind.fill</code> from the <code class="language-plaintext highlighter-rouge">plyr</code> package. However, because <code class="language-plaintext highlighter-rouge">rbind.fill</code> does not support nested data frames, we need to flatten the <code class="language-plaintext highlighter-rouge">JSON</code> data by passing the <code class="language-plaintext highlighter-rouge">flatten = TRUE</code> argument to <code class="language-plaintext highlighter-rouge">fromJSON</code>.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#Note flatten=TRUE requires jsonlite => 0.9.9</span><span class="w">
</span><span class="n">baseurl</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"http://projects.propublica.org/nonprofits/api/v1/search.json?order=revenue&sort_order=desc"</span><span class="w">
</span><span class="n">mydata0</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fromJSON</span><span class="p">(</span><span class="n">paste0</span><span class="p">(</span><span class="n">baseurl</span><span class="p">,</span><span class="w"> </span><span class="s2">"&page=0"</span><span class="p">),</span><span class="w"> </span><span class="n">flatten</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="n">mydata1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fromJSON</span><span class="p">(</span><span class="n">paste0</span><span class="p">(</span><span class="n">baseurl</span><span class="p">,</span><span class="w"> </span><span class="s2">"&page=1"</span><span class="p">),</span><span class="w"> </span><span class="n">flatten</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="n">mydata2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fromJSON</span><span class="p">(</span><span class="n">paste0</span><span class="p">(</span><span class="n">baseurl</span><span class="p">,</span><span class="w"> </span><span class="s2">"&page=2"</span><span class="p">),</span><span class="w"> </span><span class="n">flatten</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="c1">#Combine data pages</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">plyr</span><span class="p">)</span><span class="w">
</span><span class="n">filings</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rbind.fill</span><span class="p">(</span><span class="n">mydata0</span><span class="o">$</span><span class="n">filings</span><span class="p">,</span><span class="w"> </span><span class="n">mydata1</span><span class="o">$</span><span class="n">filings</span><span class="p">,</span><span class="w"> </span><span class="n">mydata2</span><span class="o">$</span><span class="n">filings</span><span class="p">)</span><span class="w">
</span><span class="c1">#Check output</span><span class="w">
</span><span class="n">colnames</span><span class="p">(</span><span class="n">filings</span><span class="p">)</span><span class="w">
</span><span class="n">nrow</span><span class="p">(</span><span class="n">filings</span><span class="p">)</span></code></pre></figure>
<h2 id="automatically-combining-many-pages">Automatically combining many pages</h2>
<p>We can write a simple loop that automatically downloads and combines many pages. For example, to retrieve the first 21 pages (pages 0 to 20) of non-profits from the example above:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#requires jsonlite >= 0.9.9</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">jsonlite</span><span class="p">)</span><span class="w">
</span><span class="c1">#store all pages in a list first</span><span class="w">
</span><span class="n">baseurl</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"http://projects.propublica.org/nonprofits/api/v1/search.json?order=revenue&sort_order=desc"</span><span class="w">
</span><span class="n">pages</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">list</span><span class="p">()</span><span class="w">
</span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">0</span><span class="o">:</span><span class="m">20</span><span class="p">){</span><span class="w">
</span><span class="n">mydata</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fromJSON</span><span class="p">(</span><span class="n">paste0</span><span class="p">(</span><span class="n">baseurl</span><span class="p">,</span><span class="w"> </span><span class="s2">"&page="</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">),</span><span class="w"> </span><span class="n">flatten</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="n">message</span><span class="p">(</span><span class="s2">"Retrieving page "</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">)</span><span class="w">
</span><span class="n">pages</span><span class="p">[[</span><span class="n">i</span><span class="m">+1</span><span class="p">]]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">mydata</span><span class="o">$</span><span class="n">filings</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="c1">#combine all into one </span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">plyr</span><span class="p">)</span><span class="w">
</span><span class="n">filings</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rbind.fill</span><span class="p">(</span><span class="n">pages</span><span class="p">)</span><span class="w">
</span><span class="c1">#check output</span><span class="w">
</span><span class="n">nrow</span><span class="p">(</span><span class="n">filings</span><span class="p">)</span><span class="w">
</span><span class="n">colnames</span><span class="p">(</span><span class="n">filings</span><span class="p">)</span></code></pre></figure>
<p>From here, our journalist can go straight to analyzing the data without any further tedious, complicated and time consuming data manipulation.</p>
Recording of OpenCPU talk at #useR20142014-07-09T00:00:00+00:00https://www.opencpu.org/posts/user2014-recording
<a href="https://www.opencpu.org/posts/user2014-recording"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>A <a href="https://www.youtube.com/watch?v=kAfVWxiZ-Cc">recording</a> of the useR! 2014 prentation about OpenCPU is now available on Youtube. This talk gives a brief (20 minute) motivation and introduction to some of the high level concepts of the OpenCPU system. The video contains mostly screen recording, mixed with some AV footage provided by <a href="https://twitter.com/timothy_phan">Timothy Phan</a> (thanks!).</p>
<div class="videoWrapper">
<iframe src="//www.youtube.com/embed/kAfVWxiZ-Cc" frameborder="0" allowfullscreen=""></iframe>
</div>
The future of R on the web at #user20142014-06-27T00:00:00+00:00https://www.opencpu.org/posts/opencpu-at-user2014
<a href="https://www.opencpu.org/posts/opencpu-at-user2014"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>The schedule and abstracts for <a href="http://user2014.stat.ucla.edu/">useR! 2014</a> have been posted on the conference website. Session 2 (Tuesday 1pm) of the Kaleidoscope track will feature a fantastic set of talks about R and the web, including <a href="https://github.com/att/rcloud"><code class="language-plaintext highlighter-rouge">RCloud</code></a> (Gordon Woodhull, AT&T), <a href="https://www.opencpu.org"><code class="language-plaintext highlighter-rouge">OpenCPU</code></a> (Jeroen Ooms, UCLA), <a href="http://shiny.rstudio.com/"><code class="language-plaintext highlighter-rouge">Shiny</code></a> (Joe Cheng, RStudio) and <a href="http://ropensci.org/"><code class="language-plaintext highlighter-rouge">rOpenSci</code></a> (Karthik Ram, UC Berkeley).</p>
<p>The presentation about <code class="language-plaintext highlighter-rouge">OpenCPU</code> will be a high level introduction and go over some of the concepts from the recent <a href="http://arxiv.org/abs/1406.4806">whitepaper</a>. The <a href="http://user2014.stat.ucla.edu/abstracts/talks/209_Ooms.pdf">abstract</a> and <a href="http://jeroenooms.github.io/opencpu-slides/">slides</a> are available from the website. Update: a recording of the presentation is available below.</p>
<div class="videoWrapper">
<iframe src="//www.youtube.com/embed/kAfVWxiZ-Cc" frameborder="0" allowfullscreen=""></iframe>
</div>
Deploying a scoring engine for predictive analytics with OpenCPU2014-06-23T00:00:00+00:00https://www.opencpu.org/posts/scoring-engine
<a href="https://www.opencpu.org/posts/scoring-engine"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p><strong>TLDR/abstract:</strong> See the <a href="https://demo.ocpu.io/tvscore/www/">tvscore demo app</a> or <a href="http://jsfiddle.net/opencpu/WVWCR/">this jsfiddle</a> for all of this in action.</p>
<p>This post explains how to use the OpenCPU system to set up a scoring engine for calculating real-time predictions. In our example we use the <a href="http://stat.ethz.ch/R-manual/R-patched/library/mgcv/html/predict.gam.html">predict.gam</a> function from the <code class="language-plaintext highlighter-rouge">mgcv</code> package to make predictions based on a generalized additive model (GAM). The entire process consists of four steps:</p>
<ol>
<li>Create a model</li>
<li>Create an R package containing the model and a scoring function</li>
<li>Install the package on your OpenCPU server</li>
<li>Remotely call the scoring function through the OpenCPU API</li>
</ol>
<p>Let’s get started!</p>
<h2 id="step-1-creating-a-model">Step 1: creating a model</h2>
<p>For this example, we use data from the <a href="http://www3.norc.org/GSS+Website/">General Social Survey</a>, which is a very rich dataset on demographic characteristics and attitudes of United States residents. To load the data in R:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#Data info: http://www3.norc.org/GSS+Website/Download/SPSS+Format/</span><span class="w">
</span><span class="n">download.file</span><span class="p">(</span><span class="s2">"http://publicdata.norc.org/GSS/DOCUMENTS/OTHR/2012_spss.zip"</span><span class="p">,</span><span class="w"> </span><span class="n">destfile</span><span class="o">=</span><span class="s2">"2012_spss.zip"</span><span class="p">)</span><span class="w">
</span><span class="n">unzip</span><span class="p">(</span><span class="s2">"2012_spss.zip"</span><span class="p">)</span><span class="w">
</span><span class="n">GSS</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">foreign</span><span class="o">::</span><span class="n">read.spss</span><span class="p">(</span><span class="s2">"GSS2012.sav"</span><span class="p">,</span><span class="w"> </span><span class="n">to.data.frame</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span></code></pre></figure>
<p>The GSS data has 1974 rows and 816 variables. To keep our example simple, we create a model with only 2 predictor variables. The code below fits a GAM that predicts the average number of hours per day that a person watches TV, based on their age and marital status. In these data <code class="language-plaintext highlighter-rouge">tvhours</code> and <code class="language-plaintext highlighter-rouge">age</code> are numeric variables, whereas <code class="language-plaintext highlighter-rouge">marital</code> is a categorical (factor) variable with levels <code class="language-plaintext highlighter-rouge">MARRIED</code>, <code class="language-plaintext highlighter-rouge">SEPARATED</code>, <code class="language-plaintext highlighter-rouge">DIVORCED</code>, <code class="language-plaintext highlighter-rouge">WIDOWED</code> and <code class="language-plaintext highlighter-rouge">NEVER MARRIED</code>.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#Variable info: http://www3.norc.org/GSS+Website/Browse+GSS+Variables/Mnemonic+Index/</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">mgcv</span><span class="p">)</span><span class="w">
</span><span class="n">mydata</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">na.omit</span><span class="p">(</span><span class="n">GSS</span><span class="p">[</span><span class="nf">c</span><span class="p">(</span><span class="s2">"age"</span><span class="p">,</span><span class="w"> </span><span class="s2">"tvhours"</span><span class="p">,</span><span class="w"> </span><span class="s2">"marital"</span><span class="p">)])</span><span class="w">
</span><span class="n">tv_model</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gam</span><span class="p">(</span><span class="n">tvhours</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">age</span><span class="p">,</span><span class="w"> </span><span class="n">by</span><span class="o">=</span><span class="n">marital</span><span class="p">),</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mydata</span><span class="p">)</span></code></pre></figure>
<p>The <code class="language-plaintext highlighter-rouge">predict</code> function is used to score data against the model. We test it with a few hand-picked cases:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">newdata</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="w">
</span><span class="n">age</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">24</span><span class="p">,</span><span class="w"> </span><span class="m">54</span><span class="p">,</span><span class="w"> </span><span class="m">32</span><span class="p">,</span><span class="w"> </span><span class="m">75</span><span class="p">),</span><span class="w">
</span><span class="n">marital</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"MARRIED"</span><span class="p">,</span><span class="w"> </span><span class="s2">"DIVORCED"</span><span class="p">,</span><span class="w"> </span><span class="s2">"WIDOWED"</span><span class="p">,</span><span class="w"> </span><span class="s2">"NEVER MARRIED"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">predict</span><span class="p">(</span><span class="n">tv_model</span><span class="p">,</span><span class="w"> </span><span class="n">newdata</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">newdata</span><span class="p">)</span><span class="w">
</span><span class="m">1</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="m">4</span><span class="w">
</span><span class="m">3.022650</span><span class="w"> </span><span class="m">3.693640</span><span class="w"> </span><span class="m">1.556342</span><span class="w"> </span><span class="m">3.665077</span><span class="w"> </span></code></pre></figure>
<p>All seems good; this completes step 1. But to get a sense of what our example model actually looks like before we start scoring, here is a simple visualization:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">
</span><span class="n">qplot</span><span class="p">(</span><span class="n">age</span><span class="p">,</span><span class="w"> </span><span class="n">predict</span><span class="p">(</span><span class="n">tv_model</span><span class="p">),</span><span class="w"> </span><span class="n">color</span><span class="o">=</span><span class="n">marital</span><span class="p">,</span><span class="w"> </span><span class="n">geom</span><span class="o">=</span><span class="s2">"line"</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="o">=</span><span class="n">mydata</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">ggtitle</span><span class="p">(</span><span class="s2">"gam(tvhours ~ s(age, by=marital))"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">ylab</span><span class="p">(</span><span class="s2">"Average hours of TV per day"</span><span class="p">)</span></code></pre></figure>
<p><img src="https://raw.githubusercontent.com/opencpu/tvscore/master/inst/tv/viz.png" class="img-responsive" /></p>
<p>It seems that married people watch less TV; who would have thought :-) In a real study we should probably tune the smoothing a bit and add parenting as a predictor (it is also in the data), but for simplicity we’ll stick with this model for now.</p>
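<p>To experiment with the model-and-predict steps without downloading the GSS file, the sketch below fits the same kind of model to simulated data. The data frame <code>sim</code>, its values and the relationship between the columns are invented for illustration; only the column names mirror the post. One deliberate difference: as we understand mgcv's documentation (<code>?gam.models</code>), smooths with a factor <code>by</code> variable are subject to centering constraints, so the factor usually also needs to be included as a main effect. The sketch therefore uses <code>marital + s(age, by = marital)</code>.</p>

```r
# Sketch with simulated (made-up) data standing in for the GSS extract.
# Column names mirror the post; the values are random, not GSS data.
library(mgcv)

set.seed(2014)
n <- 500
marital_levels <- c("MARRIED", "SEPARATED", "DIVORCED", "WIDOWED", "NEVER MARRIED")
sim <- data.frame(
  age = runif(n, min = 18, max = 89),
  marital = factor(sample(marital_levels, n, replace = TRUE), levels = marital_levels)
)
# Invented relationship: TV time rises with age and is lower for married people
sim$tvhours <- 2 + 0.02 * sim$age - 0.5 * (sim$marital == "MARRIED") + rnorm(n)

# Factor 'by' smooths are centered, so marital is also added as a main effect
sim_model <- gam(tvhours ~ marital + s(age, by = marital), data = sim)

# Score four cases, using the same factor levels as the training data
newdata <- data.frame(
  age = c(24, 54, 32, 75),
  marital = factor(c("MARRIED", "DIVORCED", "WIDOWED", "NEVER MARRIED"),
                   levels = marital_levels)
)
pred <- predict(sim_model, newdata = newdata)
print(round(pred, 2))
```

<p>Replacing <code>sim</code> with the <code>mydata</code> extract from above fits the model to the real survey data; dropping the <code>marital +</code> term gives the exact formula used in the post.</p>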
<h2 id="step-2-creating-a-package">Step 2: creating a package</h2>
<p>In order to score cases via the OpenCPU API, we need to turn the model into an R package. Making R packages is very easy these days, especially when using RStudio. Our package needs to contain at least two things: the <code class="language-plaintext highlighter-rouge">tv_model</code> object that we created above, and a wrapper function that calls out to <code class="language-plaintext highlighter-rouge">predict(tv_model, ...)</code>. You can make the wrapper as simple or sophisticated as you like, based on the type of input and output data that you want to send/receive from your scoring engine.</p>
<p>The <a href="https://github.com/opencpu/tvscore"><code class="language-plaintext highlighter-rouge">tvscore</code></a> package that is available from the <a href="https://github.com/opencpu">opencpu GitHub organization</a> is an example of such a package. The important thing to note is that the <a href="https://github.com/opencpu/tvscore/tree/master/data"><code class="language-plaintext highlighter-rouge">tv_model</code></a> object is included in the <code class="language-plaintext highlighter-rouge">data</code> directory of the package. Saving objects to a file is done using the <code class="language-plaintext highlighter-rouge">save</code> function in R:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#Store the model as a data object</span><span class="w">
</span><span class="n">save</span><span class="p">(</span><span class="n">tv_model</span><span class="p">,</span><span class="w"> </span><span class="n">file</span><span class="o">=</span><span class="s2">"data/tv_model.rda"</span><span class="p">)</span></code></pre></figure>
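<p>Before copying the file into the package, it can be worth checking that the saved <code>.rda</code> file restores a working model, since <code>data()</code> reads <code>.rda</code> files with the same <code>load</code> mechanism. The sketch below assumes nothing from the tvscore package: a small <code>lm()</code> on R's built-in <code>cars</code> data stands in for <code>tv_model</code>, and the file goes to a temporary directory instead of <code>data/</code>.</p>

```r
# A small lm() on the built-in 'cars' data stands in for the real tv_model
tv_model <- lm(dist ~ speed, data = cars)

rda_file <- file.path(tempdir(), "tv_model.rda")
save(tv_model, file = rda_file)

# Load into a fresh environment so nothing in the workspace is overwritten;
# load() returns the names of the objects it restored
env <- new.env()
loaded_names <- load(rda_file, envir = env)
print(loaded_names)

# The restored model should give the same predictions as the original
restored <- get("tv_model", envir = env)
new_speeds <- data.frame(speed = c(10, 20))
print(all.equal(predict(tv_model, new_speeds), predict(restored, new_speeds)))
```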
<p>To load the model with the package, we can either set <code class="language-plaintext highlighter-rouge">LazyData: true</code> in the package <a href="https://github.com/opencpu/tvscore/blob/master/DESCRIPTION">DESCRIPTION</a>, or manually load it using the <code class="language-plaintext highlighter-rouge">data()</code> function in R. For details on including data in R packages, see <a href="http://cran.r-project.org/doc/manuals/R-exts.html#Data-in-packages">section 1.1.6 of <em>Writing R Extensions</em></a>.</p>
<p>Finally, the package contains a scoring function called <a href="https://github.com/opencpu/tvscore/blob/master/R/tv.R"><code class="language-plaintext highlighter-rouge">tv</code></a>, which calls out to <code class="language-plaintext highlighter-rouge">predict.gam</code>. The scoring function is what clients will call remotely through the OpenCPU API. It accepts either a data frame or the path to a CSV file as input:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">tv</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">input</span><span class="p">){</span><span class="w">
</span><span class="c1">#input can either be csv file or data </span><span class="w">
</span><span class="n">newdata</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="nf">is.character</span><span class="p">(</span><span class="n">input</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">file.exists</span><span class="p">(</span><span class="n">input</span><span class="p">)){</span><span class="w">
</span><span class="n">read.csv</span><span class="p">(</span><span class="n">input</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">as.data.frame</span><span class="p">(</span><span class="n">input</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">stopifnot</span><span class="p">(</span><span class="s2">"age"</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="nf">names</span><span class="p">(</span><span class="n">newdata</span><span class="p">))</span><span class="w">
</span><span class="n">stopifnot</span><span class="p">(</span><span class="s2">"marital"</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="nf">names</span><span class="p">(</span><span class="n">newdata</span><span class="p">))</span><span class="w">
</span><span class="n">newdata</span><span class="o">$</span><span class="n">age</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">as.numeric</span><span class="p">(</span><span class="n">newdata</span><span class="o">$</span><span class="n">age</span><span class="p">)</span><span class="w">
</span><span class="c1">#tv_model is included with the package</span><span class="w">
</span><span class="n">newdata</span><span class="o">$</span><span class="n">tv</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">predict.gam</span><span class="p">(</span><span class="n">tv_model</span><span class="p">,</span><span class="w"> </span><span class="n">newdata</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">newdata</span><span class="p">)</span><span class="w">
</span><span class="nf">return</span><span class="p">(</span><span class="n">newdata</span><span class="p">)</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
<p>Note how the function does a bit of input validation by checking that the <code class="language-plaintext highlighter-rouge">age</code> and <code class="language-plaintext highlighter-rouge">marital</code> columns are present. As usual, the <a href="https://github.com/opencpu/tvscore/blob/master/R/tv.R"><code class="language-plaintext highlighter-rouge">tv</code></a> function is saved in the <a href="https://github.com/opencpu/tvscore/blob/master/R"><code class="language-plaintext highlighter-rouge">R</code></a> directory of the <a href="https://github.com/opencpu/tvscore">source package</a>. Install the package locally to verify that it works as expected in a clean R session. To install our example package from GitHub, restart R and do:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#install the tv score package</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span><span class="w">
</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"opencpu/tvscore"</span><span class="p">)</span></code></pre></figure>
<p>First we test the <code class="language-plaintext highlighter-rouge">tv</code> function with data frame input:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#test scoring with data frame input</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">tvscore</span><span class="p">)</span><span class="w">
</span><span class="n">newdata</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="w">
</span><span class="n">age</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">24</span><span class="p">,</span><span class="w"> </span><span class="m">54</span><span class="p">,</span><span class="w"> </span><span class="m">32</span><span class="p">,</span><span class="w"> </span><span class="m">75</span><span class="p">),</span><span class="w">
</span><span class="n">marital</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"MARRIED"</span><span class="p">,</span><span class="w"> </span><span class="s2">"DIVORCED"</span><span class="p">,</span><span class="w"> </span><span class="s2">"WIDOWED"</span><span class="p">,</span><span class="w"> </span><span class="s2">"NEVER MARRIED"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">tv</span><span class="p">(</span><span class="n">input</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">newdata</span><span class="p">)</span></code></pre></figure>
<p>And then we test if it works for CSV data:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#test scoring with CSV file input</span><span class="w">
</span><span class="n">setwd</span><span class="p">(</span><span class="n">tempdir</span><span class="p">())</span><span class="w">
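</span><span class="n">write.csv</span><span class="p">(</span><span class="n">newdata</span><span class="p">,</span><span class="w"> </span><span class="s2">"testdata.csv"</span><span class="p">,</span><span class="w"> </span><span class="n">row.names</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">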
</span><span class="n">library</span><span class="p">(</span><span class="n">tvscore</span><span class="p">)</span><span class="w">
</span><span class="n">tv</span><span class="p">(</span><span class="n">input</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"testdata.csv"</span><span class="p">)</span></code></pre></figure>
<p>If all of this works as expected, the package is ready to be deployed on your OpenCPU server!</p>
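<p>The dual-input pattern used by <code>tv</code> (data frame or CSV path, then column checks, then <code>predict</code>) can also be exercised without installing tvscore. The sketch below is a hypothetical generalization: the name <code>score_input</code>, its <code>model</code> and <code>required</code> arguments and the stand-in <code>lm()</code> model on the built-in <code>cars</code> data are all illustrative assumptions, not part of the tvscore package.</p>

```r
# Hypothetical generic version of the pattern used by tv(): 'input' is either
# a data frame (or list) or the path to a CSV file; the model is passed in
# explicitly instead of being bundled with a package.
score_input <- function(input, model, required = c("speed")) {
  newdata <- if (is.character(input) && length(input) == 1 && file.exists(input)) {
    read.csv(input)
  } else {
    as.data.frame(input)
  }
  # Fail early with a clear message when a required column is missing
  missing_cols <- setdiff(required, names(newdata))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  newdata$score <- predict(model, newdata = newdata)
  newdata
}

# Stand-in model on R's built-in 'cars' data
model <- lm(dist ~ speed, data = cars)

# 1. Data frame input
from_df <- score_input(data.frame(speed = c(10, 20)), model)

# 2. CSV file input (row.names = FALSE avoids an extra index column)
csv_path <- file.path(tempdir(), "speeds.csv")
write.csv(data.frame(speed = c(10, 20)), csv_path, row.names = FALSE)
from_csv <- score_input(csv_path, model)

print(from_df)
print(all.equal(from_df$score, from_csv$score))
```

<p>Both calls should yield identical scores; a data frame without a <code>speed</code> column stops with an error instead of reaching <code>predict</code>.</p>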
<h2 id="step-3-install-the-package-on-the-server">Step 3: Install the package on the server</h2>
<p>To deploy your scoring engine, simply install the package on your OpenCPU server. If you are running the OpenCPU cloud server, make sure to install your package as root. For example if you built the package into a <code class="language-plaintext highlighter-rouge">tar.gz</code> archive:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">sudo</span> <span class="nt">-i</span>
R CMD INSTALL tvscore_0.1.tar.gz</code></pre></figure>
<p>To install our example package straight from R, either on an OpenCPU cloud server or OpenCPU single-user server:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#install the tv score package</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span><span class="w">
</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"opencpu/tvscore"</span><span class="p">)</span></code></pre></figure>
<p>If you are running the cloud server, you are done with this step. If you are running the single-user server, start OpenCPU using:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">opencpu</span><span class="p">)</span><span class="w">
</span><span class="n">opencpu</span><span class="o">$</span><span class="n">browse</span><span class="p">()</span></code></pre></figure>
<p>To verify that the installation succeeded, open your browser and navigate to the <a href="https://cloud.opencpu.org/ocpu/library/tvscore/"><code>/ocpu/library/tvscore</code></a> path on the OpenCPU server. Also have a look at <a href="https://cloud.opencpu.org/ocpu/library/tvscore/R/tv"><code>/ocpu/library/tvscore/R/tv</code></a> and <a href="https://cloud.opencpu.org/ocpu/library/tvscore/man/tv"><code>/ocpu/library/tvscore/man/tv</code></a>.</p>
<h2 id="step-4-scoring-through-the-api">Step 4: Scoring through the API</h2>
<p>Once the package is installed on the server, we can remotely call the <code class="language-plaintext highlighter-rouge">tv</code> function via the OpenCPU API. In the examples below we use the public demo server: <code>https://cloud.opencpu.org/</code>. For example, to call the <code class="language-plaintext highlighter-rouge">tv</code> function with <code class="language-plaintext highlighter-rouge">curl</code> using basic <a href="https://www.opencpu.org/api.html#api-json">JSON RPC</a>:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">curl https://cloud.opencpu.org/ocpu/library/tvscore/R/tv/json <span class="se">\</span>
<span class="nt">-H</span> <span class="s2">"Content-Type: application/json"</span> <span class="se">\</span>
<span class="nt">-d</span> <span class="s1">'{"input" : [ {"age":26, "marital" : "MARRIED"}, {"age":41, "marital":"DIVORCED"}, {"age":53, "marital":"NEVER MARRIED"} ]}'</span></code></pre></figure>
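<p>The <code>"input"</code> value in the request body above is a JSON array of records. As a minimal sketch of how such a body maps onto R objects, assuming jsonlite's default simplification (and that the jsonlite package is installed), the array becomes a data frame with one row per record, which is the shape <code>tv</code> expects.</p>

```r
# Requires the jsonlite package; uses fromJSON() with its default settings
library(jsonlite)

body <- '{"input" : [ {"age":26, "marital" : "MARRIED"},
                      {"age":41, "marital":"DIVORCED"},
                      {"age":53, "marital":"NEVER MARRIED"} ]}'

# The top-level object becomes a named list; the array of records under
# "input" is simplified to a data frame with columns age and marital
args <- fromJSON(body)
str(args$input)
```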
<p>Note how the OpenCPU server automatically converts input and output data from/to JSON using <a href="http://arxiv.org/pdf/1403.2805v1.pdf"><code class="language-plaintext highlighter-rouge">jsonlite</code></a>. See the <a href="https://www.opencpu.org/api.html#api-json">API docs</a> for more details on this process. Alternatively, we can batch score by uploading a CSV file as a multipart form field (<a href="https://cloud.opencpu.org/ocpu/library/tvscore/tv/testdata.csv">example data</a>):</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">curl https://cloud.opencpu.org/ocpu/library/tvscore/R/tv <span class="nt">-F</span> <span class="s2">"input=@testdata.csv"</span></code></pre></figure>
<p>The response to a successful HTTP POST request contains the location of the output data in the <code class="language-plaintext highlighter-rouge">Location</code> header. For example, if the call returned an HTTP 201 with <code class="language-plaintext highlighter-rouge">Location</code> header <code class="language-plaintext highlighter-rouge">/ocpu/tmp/x036bf30e82</code>, the client can retrieve the output data in various formats using a subsequent HTTP GET request:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">curl https://cloud.opencpu.org/ocpu/tmp/x036bf30e82/R/.val/csv
curl https://cloud.opencpu.org/ocpu/tmp/x036bf30e82/R/.val/json
curl https://cloud.opencpu.org/ocpu/tmp/x036bf30e82/R/.val/tab</code></pre></figure>
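<p>A client only needs the <code>Location</code> header and the desired format to build these GET URLs. The helper below is a hypothetical sketch (the name <code>output_urls</code> and its defaults are ours, not part of OpenCPU); it accepts either a relative path, as in the example above, or an absolute URL, and strips any trailing slash before appending <code>/R/.val/&lt;format&gt;</code>.</p>

```r
# Hypothetical helper: build GET URLs for a session's return value from the
# Location header of a POST response (relative path or absolute URL).
output_urls <- function(location, server = "https://cloud.opencpu.org",
                        formats = c("csv", "json", "tab")) {
  base <- if (grepl("^https?://", location)) location else paste0(server, location)
  base <- sub("/+$", "", base)  # drop trailing slash(es) before appending
  paste0(base, "/R/.val/", formats)
}

print(output_urls("/ocpu/tmp/x036bf30e82"))
print(output_urls("https://cloud.opencpu.org/ocpu/tmp/x036bf30e82/", formats = "json"))
```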
<p>This completes our scoring engine. Using these steps, clients from any language can remotely score cases by calling the <code class="language-plaintext highlighter-rouge">tv</code> function using standard <code class="language-plaintext highlighter-rouge">HTTP</code> and <code class="language-plaintext highlighter-rouge">JSON</code> libraries.</p>
<h2 id="extra-credit-performance-optimization">Extra credit: performance optimization</h2>
<p>When using a scoring engine based on OpenCPU in production, it is worthwhile configuring your server to optimize performance. In particular, we can add our package to the <code class="language-plaintext highlighter-rouge">preload</code> field in the <code class="language-plaintext highlighter-rouge">/etc/opencpu/server.conf</code> file on the OpenCPU cloud server. This will automatically load (but not attach) the package when the OpenCPU server starts, which eliminates package loading time from the individual scoring requests. In our example this is important because <code class="language-plaintext highlighter-rouge">tvscore</code> depends on the <code class="language-plaintext highlighter-rouge">mgcv</code> package, which takes about 2 seconds to load.</p>
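<p>To get a rough idea of how much loading time <code>preload</code> would move out of the request path on your own hardware, one can time a cold namespace load of <code>mgcv</code> against a repeat call. This is only a local approximation, assuming a fresh R session: <code>requireNamespace()</code> loads without attaching, which resembles what the post describes for <code>preload</code>, but it is not how the server itself is instrumented, and the numbers depend entirely on the machine.</p>

```r
# Rough local estimate of the per-request cost that 'preload' avoids.
# requireNamespace() loads mgcv without attaching it. If mgcv is already
# loaded in this session, both timings will be close to zero.
cold <- system.time(loaded <- requireNamespace("mgcv", quietly = TRUE))
warm <- system.time(requireNamespace("mgcv", quietly = TRUE))
print(c(cold = cold[["elapsed"]], warm = warm[["elapsed"]]))
```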
<p>Note that R does <em>not</em> load LazyData objects when the package loads. Hence, <code class="language-plaintext highlighter-rouge">preload</code> in combination with lazy loading of data might not have the desired effect. When using <code class="language-plaintext highlighter-rouge">preload</code>, make sure to design your package such that all data gets loaded when the package loads <a href="https://github.com/opencpu/tvscore/blob/master/R/onLoad.R">(example)</a>.</p>
<p>Finally in production you might want to tweak the <code class="language-plaintext highlighter-rouge">timelimit.post</code> (timeout), <code class="language-plaintext highlighter-rouge">rlimit.as</code> (mem limit), <code class="language-plaintext highlighter-rouge">rlimit.fsize</code> (disk limit) and <code class="language-plaintext highlighter-rouge">rlimit.nproc</code> (parallel process limit) options in <code class="language-plaintext highlighter-rouge">/etc/opencpu/server.conf</code> to fit your needs. Also see the <a href="https://opencpu.github.io/server-manual/opencpu-server.pdf">server manual</a> on this topic.</p>
<h2 id="bonus-creating-an-opencpu-app">Bonus: creating an OpenCPU app</h2>
<p>By including web pages in the <a href="https://github.com/opencpu/tvscore/tree/master/inst/www"><code class="language-plaintext highlighter-rouge">/inst/www/</code></a> directory of the source package, we can turn our scoring engine into a standalone web application. The <a href="https://github.com/opencpu/tvscore"><code class="language-plaintext highlighter-rouge">tvscore</code></a> example package contains a simple web interface that makes use of the <a href="https://www.opencpu.org/jslib.html">opencpu.js</a> JavaScript client to interact with R via OpenCPU in the browser. Navigate to <a href="https://cloud.opencpu.org/ocpu/library/tvscore/www">/ocpu/library/tvscore/www/</a> on the public demo server to see it in action!</p>
<p>To install and run the same app in your local R session, use:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#Install the app</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span><span class="w">
</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"opencpu/tvscore"</span><span class="p">)</span><span class="w">
</span><span class="c1">#Load the app</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">opencpu</span><span class="p">)</span><span class="w">
</span><span class="n">opencpu</span><span class="o">$</span><span class="n">browse</span><span class="p">(</span><span class="s2">"/library/tvscore/www"</span><span class="p">)</span></code></pre></figure>
<p>We can also call the OpenCPU server from an external website using cross-domain Ajax requests (CORS). See <a href="http://jsfiddle.net/opencpu/WVWCR/">this jsfiddle</a> for a simple example that calls the public server using the <code class="language-plaintext highlighter-rouge">ocpu.rpc</code> function from <code class="language-plaintext highlighter-rouge">opencpu.js</code>.</p>
OpenCPU whitepaper published on arXiv2014-06-20T00:00:00+00:00https://www.opencpu.org/posts/opencpu-article-arxiv
<a href="https://www.opencpu.org/posts/opencpu-article-arxiv"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>This week a new paper appeared on <a href="http://arxiv.org/a/ooms_j_1">arXiv</a> titled: <a href="http://arxiv.org/abs/1406.4806"><em>The OpenCPU System: Towards a Universal Interface for Scientific Computing through Separation of Concerns</em></a>. It is based on a chapter of my thesis and provides a conceptual introduction to embedded scientific computing and the OpenCPU system.</p>
<p>The article deliberately does not describe any software specifics. Instead, it takes a high-level view and discusses domain logic of scientific computing, the benefits of using a standardized application protocol to interface statistical methods, and the importance of clearly separating statistical computing from application and implementation logic. The R software and OpenCPU API are used to illustrate the advocated approach. However, it is emphasized that the API is designed to describe general logic of data analysis rather than that of a particular language, and the system should generalize quite naturally to other computational back-ends, such as Julia, Python or Matlab.</p>
<p>This paper is an accumulation of many experiences with building statistical web applications in academic and industry organizations over the past years. I hope it will be a good read for anyone who wishes to build stacks, applications, and pipelines with integrated analysis/visualization components, with or without OpenCPU.</p>
<p>Go and grab the (open access) <a href="http://arxiv.org/pdf/1406.4806v1.pdf">pdf</a> from arXiv!</p>
OpenCPU Gem for Ruby2014-05-22T00:00:00+00:00https://www.opencpu.org/posts/opencpu-ruby-gem
<a href="https://www.opencpu.org/posts/opencpu-ruby-gem"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>The guys from <a href="http://roqua.nl/">roqua.nl</a> are working on an <a href="https://github.com/roqua/opencpu/">OpenCPU wrapper gem</a>. This simple API client provides a pretty nice basis for building R web applications with Ruby. A minimal example from the README:</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span class="n">client</span><span class="p">.</span><span class="nf">execute</span> <span class="ss">:digest</span><span class="p">,</span> <span class="ss">:hmac</span><span class="p">,</span> <span class="p">{</span> <span class="ss">key: </span><span class="s1">'foo'</span><span class="p">,</span> <span class="ss">object: </span><span class="s1">'bar'</span><span class="p">,</span> <span class="ss">algo: </span><span class="s1">'md5'</span> <span class="p">}</span>
<span class="c1"># => ['0c7a250281315ab863549f66cd8a3a53']</span></code></pre></figure>
<p>This performs a JSON RPC request that is equivalent to the following R call:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">digest</span><span class="o">::</span><span class="n">hmac</span><span class="p">(</span><span class="n">key</span><span class="o">=</span><span class="s2">"foo"</span><span class="p">,</span><span class="w"> </span><span class="n">object</span><span class="o">=</span><span class="s2">"bar"</span><span class="p">,</span><span class="w"> </span><span class="n">algo</span><span class="o">=</span><span class="s2">"md5"</span><span class="p">)</span></code></pre></figure>
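<p>For comparison, here is a rough R sketch of what the HTTP side of such a call can look like. It assumes the public server at <code>https://cloud.opencpu.org/ocpu</code> and the <code>/library/{pkg}/R/{function}/json</code> endpoint pattern used elsewhere in these posts. The helper <code>ocpu_rpc_request</code> is hypothetical and is not how the gem itself is implemented.</p>

```r
library(jsonlite)

# Hypothetical helper: build the endpoint URL and JSON body for calling
# pkg::fun remotely. The default server URL is an assumption.
ocpu_rpc_request <- function(pkg, fun, args,
                             server = "https://cloud.opencpu.org/ocpu") {
  list(
    url  = sprintf("%s/library/%s/R/%s/json", server, pkg, fun),
    # auto_unbox = TRUE writes length-1 vectors as JSON scalars, not arrays
    body = as.character(toJSON(args, auto_unbox = TRUE))
  )
}

req <- ocpu_rpc_request("digest", "hmac",
                        list(key = "foo", object = "bar", algo = "md5"))
cat(req$url, "\n")
cat(req$body, "\n")

# Sending it (not run here) could look like this with the httr package:
# httr::POST(req$url, body = req$body, httr::content_type_json())
```

<p>The body is plain JSON, so it would be posted with a <code>Content-Type: application/json</code> header.</p>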
<p>They are accepting <a href="https://github.com/roqua/opencpu/#contributing">pull requests</a>!</p>
OpenCPU release 1.3 and 1.42014-04-20T00:00:00+00:00https://www.opencpu.org/posts/opencpu-release-14
<a href="https://www.opencpu.org/posts/opencpu-release-14"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>After a few months of testing we present OpenCPU versions 1.3 and 1.4. These releases do not introduce any major changes in the <a href="../../api.html">OpenCPU HTTP API</a> but focus entirely on performance, reliability and security to support long running servers. The only minor API change is the <a href="../getting-ready-for-opencpu130/">switch to absolute URLs</a> in the <code>Location</code> header. Upgrading from OpenCPU 1.2 should be painless and is recommended.</p>
<p>These and future releases of the OpenCPU cloud server will target <code>Ubuntu 14.04</code> in order to take advantage of recent features in <code>R</code>, <code>Apache2</code>, <code>AppArmor</code> and <code>nginx</code>. Because this is a Long Term Support (LTS) Ubuntu release it includes 5 years of updates. Hence your OpenCPU server can run safely until April 2019 (or until you decide to upgrade).</p>
<h2 id="version-13-versus-14">Version 1.3 versus 1.4</h2>
<p>OpenCPU versions 1.3 and 1.4 build on exactly the same version of the HTTP API and server code. The only difference is the version of R that is used in the cloud server. OpenCPU version 1.3 uses <code>R 3.0.2</code> included with Ubuntu, whereas OpenCPU version 1.4 uses the current version: <code>R 3.1.0</code>.</p>
<p>If you have no preference, OpenCPU 1.4 is recommended because many of the packages on <code>CRAN</code> require the <i>current</i> version of <code>R</code> and will therefore only work with OpenCPU 1.4.</p>
<h2 id="how-to-upgrade">How to upgrade</h2>
<p>Because of some internal cleanup and refactoring of configuration files, it is highly recommended to install the new version of OpenCPU on a clean, fresh Ubuntu 14.04 server. Usually installing a new Ubuntu server is safer and quicker than upgrading an old server anyway. See the <a href="https://opencpu.github.io/server-manual/opencpu-server.pdf">Server Manual</a> for standard instructions on a clean installation.</p>
<p>However if for whatever reason you need to upgrade a previous installation, the safest way is to uninstall previous versions before installing the new one. This ensures that no old files keep lingering around.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># remove old versions</span>
<span class="nb">sudo </span>apt-get purge opencpu-<span class="k">*</span>
<span class="nb">sudo </span>apt-get autoremove <span class="nt">--purge</span>
<span class="c"># upgrade Ubuntu to 14.04 (if not done so yet)</span>
<span class="nb">sudo </span><span class="k">do</span><span class="nt">-release-upgrade</span>
<span class="c"># install new version on Ubuntu 14.04</span>
<span class="nb">sudo </span>add-apt-repository opencpu/opencpu-1.4
<span class="nb">sudo </span>apt-get update
<span class="nb">sudo </span>apt-get <span class="nb">install </span>opencpu</code></pre></figure>
<h2 id="opencpu-and-rstudio">OpenCPU and RStudio</h2>
<p>Using OpenCPU together with RStudio is now even easier! The <code>opencpu-1.3</code> and <code>opencpu-1.4</code> repositories include a copy of RStudio Server that you can install with a single line:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># install rstudio</span>
<span class="nb">sudo </span>apt-get <span class="nb">install </span>rstudio-server</code></pre></figure>
<p>Both Apache and nginx are preconfigured to proxy the <code>/rstudio/</code> path to RStudio Server. Hence, after installing both opencpu and rstudio-server, they can be accessed directly through:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://your.server.com/ocpu/
https://your.server.com/rstudio/
</code></pre></div></div>
<p>Appendix B of the <a href="https://opencpu.github.io/server-manual/opencpu-server.pdf">OpenCPU Server Manual</a> has some more details.</p>
<h2 id="questions">Questions</h2>
<p>If you have any problems, questions, feedback or suggestions, feel free to send an email to the <a href="../../help.html">mailing list</a> or open an issue on GitHub. As is the case for many open source projects, good software comes with terrible documentation. But if anything is not working or unclear please do let me know; it is probably something small.</p>
Getting ready for OpenCPU 1.32014-03-17T00:00:00+00:00https://www.opencpu.org/posts/getting-ready-for-opencpu130
<a href="https://www.opencpu.org/posts/getting-ready-for-opencpu130"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>The OpenCPU <a href="../../demo.html">public demo server</a> and <a href="https://demo.ocpu.io">ocpu.io</a> have been upgraded to an early version of the upcoming OpenCPU 1.3 release. This release is scheduled for April 17 along with Ubuntu 14.04 (Trusty). By deploying it on the public demo server we get some testing before the actual release. Please report any problems.</p>
<h2 id="new-in-opencpu-13">New in OpenCPU 1.3</h2>
<p>The improvements in this release are mostly internal. However there will be one subtle change: starting version 1.3, all <a href="../../api.html">HTTP API</a> responses with status code <code>201</code>, <code>301</code> or <code>302</code> will use an <b>absolute url</b> in the <code>Location</code> response header. For example, the response headers of a request could contain:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>...
Date: Mon, 17 Mar 2014 06:59:26 GMT
Location: http://cloud.opencpu.org/ocpu/tmp/x0e28afb7/
Content-Length: 44
...
</code></pre></div></div>
<p>Whereas in previous versions, the same response would have looked like:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>...
Date: Mon, 17 Mar 2014 06:59:26 GMT
Location: /ocpu/tmp/x0e28afb7/
Content-Length: 44
...
</code></pre></div></div>
<p>However, to scale up to distributed environments, where resources can be hosted on various servers, we need to start using absolute URLs.</p>
<h2 id="how-to-update-my-clientapp">How to update my client/app?</h2>
<p>Most HTTP clients natively understand both absolute and relative urls, so you probably won’t notice the difference. For example the <a href="../../jslib.html">opencpu.js</a> client library requires no changes or updates. However for the few of you that implemented a custom OpenCPU client, you might want to double check that your code understands both absolute and relative urls in the <code>Location</code> header, to make sure your application will be compatible with future versions of OpenCPU.</p>
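<p>A custom client can cover both cases with a few lines of code. The sketch below is a hypothetical base-R helper (<code>resolve_location</code> is not part of opencpu.js or any OpenCPU package). It returns absolute URLs unchanged and prefixes root-relative paths such as <code>/ocpu/tmp/x0e28afb7/</code> with the scheme and host of the original request. Other relative forms are rejected rather than guessed.</p>

```r
# Hypothetical helper (not part of any OpenCPU library): resolve a Location
# header against the URL of the request that produced it.
resolve_location <- function(request_url, location) {
  scheme_re <- "^[A-Za-z][A-Za-z0-9+.-]*://"
  # Absolute URL, as sent by OpenCPU 1.3 and later: use it unchanged
  if (grepl(scheme_re, location)) {
    return(location)
  }
  # Root-relative path, as sent by earlier versions: prepend scheme and host
  if (startsWith(location, "/")) {
    origin <- sub("^([A-Za-z][A-Za-z0-9+.-]*://[^/]+).*$", "\\1", request_url)
    return(paste0(origin, location))
  }
  stop("unsupported relative Location header: ", location)
}

req <- "http://cloud.opencpu.org/ocpu/library/stats/R/rnorm"
resolve_location(req, "/ocpu/tmp/x0e28afb7/")
resolve_location(req, "http://cloud.opencpu.org/ocpu/tmp/x0e28afb7/")
```

<p>Both calls return the same absolute URL, so the rest of the client does not need to care which server version it is talking to.</p>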
OpenCPU 1.2.3 release2014-03-12T00:00:00+00:00https://www.opencpu.org/posts/opencpu-version-123
<a href="https://www.opencpu.org/posts/opencpu-version-123"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>A new version of <a href="http://www.opencpu.org/">OpenCPU</a> was released to <a href="http://cran.r-project.org/web/packages/opencpu">CRAN</a> and <a href="https://launchpad.net/~opencpu/+archive/opencpu-1.2">Launchpad</a>. Besides some minor bugfixes, the single-user server has better support for configuration. By default, the single-user server will now load configuration from the following file:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">path.expand</span><span class="p">(</span><span class="s2">"~/.opencpu.conf"</span><span class="p">)</span></code></pre></figure>
<p>If this file does not exist, the default configuration is used.</p>
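<p>This is a simple fallback. Below is a rough sketch of similar logic in R, not the code used inside the opencpu package. It assumes the file contains a JSON object. The helper name and the <code>timelimit.get</code> default are illustrative placeholders. Whether the real server merges the file with its defaults or replaces them is also an assumption; this sketch merges.</p>

```r
library(jsonlite)

# Sketch only: read ~/.opencpu.conf when it exists, otherwise fall back to
# built-in defaults. Assumes the file holds a JSON object; the helper name
# and the timelimit.get default are illustrative placeholders.
load_user_config <- function(path = "~/.opencpu.conf",
                             defaults = list(timelimit.get = 60)) {
  path <- path.expand(path)
  if (!file.exists(path)) {
    return(defaults)
  }
  user <- fromJSON(path)
  # Keys present in the file override the defaults; other defaults are kept
  utils::modifyList(defaults, user)
}
```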
<h2 id="future-plans">Future plans</h2>
<p>This is likely the final release in the 1.2 series. Future versions of OpenCPU will be targeting <code>R 3.1</code> and <code>Ubuntu 14.04</code> (both to be released in April), and the version number will be bumped to emphasize this.</p>
<p>No changes in the API are scheduled. Future work will focus on improving performance, documentation and client libraries.</p>
Release of jsonlite 0.9.42014-03-02T00:00:00+00:00https://www.opencpu.org/posts/jsonlite-version-094
<a href="https://www.opencpu.org/posts/jsonlite-version-094"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>A new version of the <a href="http://cran.r-project.org/web/packages/jsonlite">jsonlite package</a> was released to CRAN. In addition to adding small new features, this release cleans up code and documentation. Some annoying compiler warnings inherited from <code>RJSONIO</code> are fixed and the <a href="http://cran.r-project.org/web/packages/jsonlite/jsonlite.pdf">reference manual</a> is a bit more concise. Also some new examples of public JSON APIs were added to the <a href="http://cran.r-project.org/web/packages/jsonlite/vignettes/json-mapping.pdf">package vignette</a>. These are great to see the power of <code>jsonlite</code> in action when working with real world JSON structures.</p>
<h2 id="what-is-jsonlite-again">What is jsonlite again?</h2>
<p>The <code>jsonlite</code> package is a fork of <code>RJSONIO</code>. It builds on the same libjson c++ parser (although a more recent version), but implements a different system for converting between R objects and JSON structures. The most powerful feature is the option to automatically convert tabular JSON structures into R data frames and vice versa. Tabular structures are very common in <code>JSON</code> data, but usually difficult to read and manipulate. By automatically turning these into data frames <code>jsonlite</code> can save you many hours and bugs in getting your <code>JSON</code> data in and out of R. This <a href="../jsonlite-a-smarter-json-encoder/">blog post</a> has some nice examples with data from the Github API.</p>
<h2 id="new-in-this-release">New in this release</h2>
<p>Two new functions were introduced in this release. The <code>minify</code> function is the opposite of <code>prettify</code>, and reduces the size of a <code>JSON</code> blob by removing all redundant whitespace.</p>
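<p>A quick way to see the two functions side by side. The input string is made up for illustration:</p>

```r
library(jsonlite)

txt <- '{ "foo" : [ 1, 2, 3 ], "bar" : "baz" }'
compact  <- minify(txt)    # removes all redundant whitespace
readable <- prettify(txt)  # adds indentation and line breaks instead
cat(compact, "\n")
cat(readable, "\n")
```

<p>Because only whitespace differs, minifying the prettified version gives back the compact one.</p>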
<p>The new <code>unbox</code> function was requested by several users. It can be used to force atomic vectors of length 1 to be encoded as a <code>JSON</code> <b>scalar</b> rather than an <b>array</b>. To understand why this should not be default behavior, see the <a href="http://cran.r-project.org/web/packages/jsonlite/vignettes/json-mapping.pdf">vignette</a> or this <a href="https://github.com/jeroenooms/jsonlite/issues/6">github issue</a>. However, it can be useful to do this for individual object elements:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="o">></span><span class="w"> </span><span class="n">cat</span><span class="p">(</span><span class="n">toJSON</span><span class="p">(</span><span class="nf">list</span><span class="p">(</span><span class="n">foo</span><span class="o">=</span><span class="m">123</span><span class="p">)))</span><span class="w">
</span><span class="p">{</span><span class="w"> </span><span class="s2">"foo"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="m">123</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="o">></span><span class="w"> </span><span class="n">cat</span><span class="p">(</span><span class="n">toJSON</span><span class="p">(</span><span class="nf">list</span><span class="p">(</span><span class="n">foo</span><span class="o">=</span><span class="n">unbox</span><span class="p">(</span><span class="m">123</span><span class="p">))))</span><span class="w">
</span><span class="p">{</span><span class="w"> </span><span class="s2">"foo"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="m">123</span><span class="w"> </span><span class="p">}</span></code></pre></figure>
<p>In the context of a script or function, the <code>unbox</code> function should only be used for elements that are always exactly length 1, otherwise <code>unbox</code> will throw an error. This is to protect you from writing code that generates inconsistent <code>JSON</code> i.e. an array one time and a scalar another time.</p>
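<p>The inconsistency that <code>unbox</code> guards against is easy to reproduce with the <code>auto_unbox</code> option of <code>toJSON</code> in current jsonlite releases. That option may postdate version 0.9.4. It unboxes <i>every</i> length-1 vector, so the JSON type of a field depends on the data. The values below are arbitrary:</p>

```r
library(jsonlite)

# With auto_unbox = TRUE, the JSON type of "x" depends on its length
one <- toJSON(list(x = 5), auto_unbox = TRUE)        # {"x":5}     (scalar)
two <- toJSON(list(x = c(5, 6)), auto_unbox = TRUE)  # {"x":[5,6]} (array)

# unbox() instead refuses input that is not exactly length 1
res <- tryCatch(unbox(c(5, 6)), error = function(e) e)
print(conditionMessage(res))
```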
<p>The same <code>unbox</code> function can be used for data frames with exactly 1 row:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="o">></span><span class="w"> </span><span class="n">mycar</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cars</span><span class="p">[</span><span class="m">23</span><span class="p">,]</span><span class="w">
</span><span class="o">></span><span class="w"> </span><span class="n">cat</span><span class="p">(</span><span class="n">toJSON</span><span class="p">(</span><span class="n">mycar</span><span class="p">))</span><span class="w">
</span><span class="p">[</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="s2">"speed"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="m">14</span><span class="p">,</span><span class="w"> </span><span class="s2">"dist"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="m">80</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">]</span><span class="w">
</span><span class="o">></span><span class="w"> </span><span class="n">cat</span><span class="p">(</span><span class="n">toJSON</span><span class="p">(</span><span class="n">unbox</span><span class="p">(</span><span class="n">mycar</span><span class="p">)))</span><span class="w">
</span><span class="p">{</span><span class="w"> </span><span class="s2">"speed"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="m">14</span><span class="p">,</span><span class="w"> </span><span class="s2">"dist"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="m">80</span><span class="w"> </span><span class="p">}</span></code></pre></figure>
<p>But again, this should be used sparingly and with care. When in doubt, always stick with the default <code>toJSON</code> encodings.</p>
Publishing dynamic data on ocpu.io2014-02-16T00:00:00+00:00https://www.opencpu.org/posts/publishing-data-with-opencpu
<a href="https://www.opencpu.org/posts/publishing-data-with-opencpu"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>Suppose you would like to publish some data, for example to accompany a journal article. One way would be to put a <code>CSV</code> file on your website, and share the URL with your colleagues. However, CSV has many limitations: it only works for tabular structures, has limited type safety (pretty much everything gets coerced into strings) and leads to loss of numeric precision.</p>
<p>There are many alternative data interchange formats, each with their own benefits and limitations. For example <a href="http://cran.r-project.org/web/packages/jsonlite/vignettes/json-mapping.pdf">JSON</a> is widely supported and can be parsed in almost any language, however it can be verbose and slow. A binary format such as <a href="http://arxiv.org/abs/1401.7372">Protocol Buffers</a> is more efficient, but many users might not know how to parse it. You could even use <code>save</code> or <code>saveRDS</code> in R to share the native R structures, however this limits your audience to R users.</p>
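<p>The type-safety point is easy to check in base R. This small illustration uses made-up values. It writes the same data frame to CSV and to RDS and reads both back. The <code>Date</code> column comes back from CSV as plain character strings, while the RDS copy is identical to the original.</p>

```r
# Round-trip a small data frame through CSV and through RDS (base R only)
df <- data.frame(
  when  = as.Date(c("2014-02-16", "2014-02-17")),
  value = c(1.5, 2.25)
)

csv_file <- tempfile(fileext = ".csv")
write.csv(df, csv_file, row.names = FALSE)
from_csv <- read.csv(csv_file)

rds_file <- tempfile(fileext = ".rds")
saveRDS(df, rds_file)
from_rds <- readRDS(rds_file)

sapply(from_csv, class)   # "when" is no longer a Date
identical(from_rds, df)   # RDS keeps the exact R object
```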
<h2 id="retrieving-dynamic-data">Retrieving dynamic data</h2>
<p>What we really need is a method to publish the data itself rather than some representation of the data in a particular format. With OpenCPU you can publish R <em>objects</em> (including datasets) in a way that lets the clients select the format and formatting options for retrieving the dataset. This is implemented using native R functionality to include arbitrary data/objects in packages, and standard R functions for exporting these data. For example, the CRAN package <code>MASS</code> includes a dataset called <code>bacteria</code>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">MASS</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="p">(</span><span class="n">bacteria</span><span class="p">)</span><span class="w">
</span><span class="n">print</span><span class="p">(</span><span class="n">bacteria</span><span class="p">)</span></code></pre></figure>
<p>Via OpenCPU, the dataset can be downloaded by anyone, using one of many formats:</p>
<table class="table table-hover table-bordered">
<thead>
<tr>
<th>Format</th>
<th>Export Function</th>
<th>URL (short)</th>
</tr>
</thead>
<tbody>
<tr>
<td>text</td>
<td><code>print</code></td>
<td><a href="https://cran.ocpu.io/MASS/data/bacteria/print"><code>cran.ocpu.io/MASS/data/bacteria/print</code></a></td>
</tr>
<tr>
<td>CSV</td>
<td><code>write.csv</code></td>
<td><a href="https://cran.ocpu.io/MASS/data/bacteria/csv"><code>cran.ocpu.io/MASS/data/bacteria/csv</code></a></td>
</tr>
<tr>
<td>TSV</td>
<td><code>write.table</code></td>
<td><a href="https://cran.ocpu.io/MASS/data/bacteria/tab"><code>cran.ocpu.io/MASS/data/bacteria/tab</code></a></td>
</tr>
<tr>
<td>JSON</td>
<td><code>jsonlite::toJSON</code></td>
<td><a href="https://cran.ocpu.io/MASS/data/bacteria/json"><code>cran.ocpu.io/MASS/data/bacteria/json</code></a></td>
</tr>
<tr>
<td>Protocol Buffers</td>
<td><code>RProtoBuf::serialize_pb</code></td>
<td><a href="https://cran.ocpu.io/MASS/data/bacteria/pb"><code>cran.ocpu.io/MASS/data/bacteria/pb</code></a></td>
</tr>
<tr>
<td>RData</td>
<td><code>save</code></td>
<td><a href="https://cran.ocpu.io/MASS/data/bacteria/rda"><code>cran.ocpu.io/MASS/data/bacteria/rda</code></a></td>
</tr>
<tr>
<td>RDS</td>
<td><code>saveRDS</code></td>
<td><a href="https://cran.ocpu.io/MASS/data/bacteria/rds"><code>cran.ocpu.io/MASS/data/bacteria/rds</code></a></td>
</tr>
<tr>
<td>ascii R</td>
<td><code>dput</code></td>
<td><a href="https://cran.ocpu.io/MASS/data/bacteria/ascii"><code>cran.ocpu.io/MASS/data/bacteria/ascii</code></a></td>
</tr>
</tbody>
</table>
<p>The client can also control formatting options by passing HTTP parameters. These parameters map directly to function arguments for the respective export function in the table above. A few examples:</p>
<table class="table table-hover table-bordered">
<thead>
<tr>
<th>Equivalent R Call</th>
<th>URL on Public OpenCPU Server</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>write.csv(bacteria, row.names=TRUE)</code></td>
<td><a href="https://cran.ocpu.io/MASS/data/bacteria/csv?row.names=true"><code>cran.ocpu.io/MASS/data/bacteria/csv?row.names=true</code></a></td>
</tr>
<tr>
<td><code>jsonlite::toJSON(Boston, digits=4)</code></td>
<td><a href="https://cran.ocpu.io/MASS/data/Boston/json?digits=4"><code>cran.ocpu.io/MASS/data/Boston/json?digits=4</code></a></td>
</tr>
<tr>
<td><code>jsonlite::toJSON(Boston, dataframe="columns")</code></td>
<td><a href="https://cran.ocpu.io/MASS/data/Boston/json?dataframe=columns&digits=4"><code>cran.ocpu.io/MASS/data/Boston/json?dataframe=columns</code></a></td>
</tr>
<tr>
<td><code>jsonlite::toJSON(Boston, pretty=FALSE)</code></td>
<td><a href="https://cran.ocpu.io/MASS/data/Boston/json?pretty=false"><code>cran.ocpu.io/MASS/data/Boston/json?pretty=false</code></a></td>
</tr>
</tbody>
</table>
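<p>The idea behind the two tables above can be sketched in a few lines of R. Pick an export function by format name, then pass the client-supplied options to it as arguments. This is an illustration under simple assumptions, not OpenCPU's actual implementation. The names <code>exporters</code>, <code>parse_param</code> and <code>export_object</code> are hypothetical, and the parameter parsing (<code>"true"</code>/<code>"false"</code> to logicals, numeric strings to numbers) is a simplification.</p>

```r
library(jsonlite)

# Illustrative only: map a format name to an export function that returns
# the result as a single string. Not OpenCPU's actual implementation.
exporters <- list(
  print = function(x, ...) paste(capture.output(print(x, ...)), collapse = "\n"),
  csv   = function(x, ...) paste(capture.output(write.csv(x, ...)), collapse = "\n"),
  json  = function(x, ...) as.character(toJSON(x, ...)),
  ascii = function(x, ...) paste(capture.output(dput(x, ...)), collapse = "\n")
)

# Hypothetical helper: turn query-string values such as "true" or "4"
# into R values before handing them to the export function.
parse_param <- function(value) {
  if (value %in% c("true", "false")) return(value == "true")
  num <- suppressWarnings(as.numeric(value))
  if (!is.na(num)) return(num)
  value
}

export_object <- function(x, format, params = list()) {
  fun <- exporters[[format]]
  if (is.null(fun)) stop("unsupported format: ", format)
  # Each HTTP parameter becomes a named argument of the export function
  do.call(fun, c(list(x), lapply(params, parse_param)))
}

df <- data.frame(speed = c(4, 7), dist = c(2.5678, 10))
# Roughly what .../json?dataframe=columns&digits=2 asks for
cat(export_object(df, "json", list(dataframe = "columns", digits = "2")), "\n")
cat(export_object(df, "csv", list(row.names = "false")), "\n")
```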
<h2 id="creating-a-data-package">Creating a data package</h2>
<p>To start publishing your own dynamic data you need to put your data objects in an R package following the standard guidelines as documented in <a href="http://cran.r-project.org/doc/manuals/R-exts.html#Data-in-packages">section 1.1.6</a> of <i>Writing R Extensions</i>. This might sound cumbersome, but once you get the hang of it, it only takes a few seconds. You’ll realize that packages are actually a beautiful, standardized and well-tested container format for R objects and much more. Have a look at the data folder in the <a href="https://github.com/opencpu/appdemo">opencpu/appdemo</a> package for some examples.</p>
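<p>For a data-only package the layout is small. The sketch below creates one by hand in a temporary folder using base R only. <code>mypackage</code>, <code>myobject</code> and the DESCRIPTION values are placeholders. In practice <code>utils::package.skeleton</code> or devtools can generate this for you.</p>

```r
# Minimal sketch: lay out a data-only package by hand in a temporary folder.
# "mypackage", "myobject" and all DESCRIPTION values are placeholders.
pkg_dir <- file.path(tempdir(), "mypackage")
dir.create(file.path(pkg_dir, "data"), recursive = TRUE, showWarnings = FALSE)

writeLines(c(
  "Package: mypackage",
  "Type: Package",
  "Title: Example Data for Publishing with OpenCPU",
  "Version: 0.1.0",
  "Author: Your Name",
  "Maintainer: Your Name <you@example.com>",
  "Description: Ships a single example dataset.",
  "License: GPL-2",
  "LazyData: true"
), file.path(pkg_dir, "DESCRIPTION"))

# A data-only package exports nothing, so the NAMESPACE can stay empty
writeLines("# data-only package", file.path(pkg_dir, "NAMESPACE"))

# Each object in data/ is stored under its own name in an .rda file
myobject <- data.frame(id = 1:3, value = c(0.1, 0.2, 0.3))
save(myobject, file = file.path(pkg_dir, "data", "myobject.rda"))

# Then install it, e.g.:
# install.packages(pkg_dir, repos = NULL, type = "source")
```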
<p>After creating and installing your package on your local R, test it using the OpenCPU single user server:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">opencpu</span><span class="p">)</span><span class="w">
</span><span class="n">opencpu</span><span class="o">$</span><span class="n">browse</span><span class="p">(</span><span class="s2">"/library/mypackage/data"</span><span class="p">)</span><span class="w">
</span><span class="n">opencpu</span><span class="o">$</span><span class="n">browse</span><span class="p">(</span><span class="s2">"/library/mypackage/data/myobject"</span><span class="p">)</span></code></pre></figure>
<h2 id="publishing-dynamic-data-on-ocpuio">Publishing dynamic data on ocpu.io</h2>
<p>To make your data available through the public OpenCPU server and <code>ocpu.io</code>, all you need to do is put your package up on Github. OpenCPU requires the name of the Github repository to match the name of the R package it contains. Use devtools to test if your package is working:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span><span class="w">
</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"username/pkgname"</span><span class="p">)</span></code></pre></figure>
<p>If this succeeds you’re good to go. Navigate to <code>username.ocpu.io/pkgname/data</code> where username is your Github login. By default the OpenCPU public server updates packages installed from Github every 24 hours. However, the <a href="../../api.html#api-ci">Github webhook</a> can be used to update the package immediately every time a commit is pushed to Github.</p>
<h2 id="publishing-dynamic-data-on-your-own-server">Publishing dynamic data on your own server</h2>
<p>OpenCPU does not lock you into some commercial hosting service. Your data is stored on Github in a standard format under your control. The <code>ocpu.io</code> public server is there for your convenience. You can also <a href="../../download.html">install your own OpenCPU cloud server</a> to publish data at e.g. <code>http://opencpu.yourserver.com/ocpu/library/pkgname/data/myobject</code>. No need to put anything on Github, just install the package in R on the server.</p>
Share and access R code, data, apps on ocpu.io2014-02-12T00:00:00+00:00https://www.opencpu.org/posts/publishing-apps-on-ocpuio
<a href="https://www.opencpu.org/posts/publishing-apps-on-ocpuio"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p><code>ocpu.io</code> is a new domain for publishing code, data and apps based on the OpenCPU system. Any R package on Github is directly available via <code>yourname.ocpu.io</code>. Thereby the package can be used remotely via the <a href="../../api.html">OpenCPU API</a> to access data, perform remote function calls, reproduce results, publish webapps, and much more. The OpenCPU <a href="../../demo.html">public server page</a> explains how requests to <code>ocpu.io</code> map to the existing public demo server.</p>
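<p>The public server page is the authority on this mapping. As a rough sketch, assume <code>user.ocpu.io/pkg/...</code> corresponds to <code>/ocpu/github/user/pkg/...</code> on the public server, and <code>cran.ocpu.io</code> and <code>bioc.ocpu.io</code> correspond to <code>/ocpu/cran/</code> and <code>/ocpu/bioc/</code>. Under those assumptions the rewrite could look like this. The helper name and the server URL are also assumptions.</p>

```r
# Hypothetical helper: translate an ocpu.io address into a path on the public
# server. The mapping rules below are assumptions; check the public server
# page for the actual rules.
ocpu_io_to_server <- function(user, path = "",
                              server = "https://cloud.opencpu.org/ocpu") {
  # cran.ocpu.io and bioc.ocpu.io map to their own library trees,
  # any other subdomain is treated as a Github user name
  prefix <- if (user %in% c("cran", "bioc")) user else paste0("github/", user)
  paste0(server, "/", prefix, "/", sub("^/", "", path))
}

ocpu_io_to_server("hadley", "ggplot2/data/msleep/csv")
ocpu_io_to_server("cran", "/MASS/data/")
```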
<h2 id="examples">Examples</h2>
<table class="table table-hover table-bordered">
<thead>
<tr>
<th>Action</th>
<th>URL (short)</th>
</tr>
</thead>
<tbody>
<tr>
<td>List packages on CRAN</td>
<td><a href="https://cran.ocpu.io"><code>cran.ocpu.io</code></a></td>
</tr>
<tr>
<td>List packages on BioConductor</td>
<td><a href="https://bioc.ocpu.io"><code>bioc.ocpu.io</code></a></td>
</tr>
<tr>
<td>Github repositories from: <a href="http://github.com/hadley">Hadley</a></td>
<td><a href="https://hadley.ocpu.io"><code>hadley.ocpu.io</code></a></td>
</tr>
<tr><th colspan="3" class="text-center">Package Info</th></tr>
<tr>
<td>MASS from CRAN</td>
<td><a href="https://cran.ocpu.io/MASS/"><code>cran.ocpu.io/MASS/</code></a></td>
</tr>
<tr>
<td>plyr from CRAN</td>
<td><a href="https://cran.ocpu.io/plyr/"><code>cran.ocpu.io/plyr/</code></a></td>
</tr>
<tr>
<td>plyr from Github</td>
<td><a href="https://hadley.ocpu.io/plyr/"><code>hadley.ocpu.io/plyr/</code></a></td>
</tr>
<tr><th colspan="3" class="text-center">Package Contents</th></tr>
<tr>
<td>MASS datasets</td>
<td><a href="https://cran.ocpu.io/MASS/data/"><code>cran.ocpu.io/MASS/data/</code></a></td>
</tr>
<tr>
<td>plyr datasets</td>
<td><a href="https://hadley.ocpu.io/plyr/data/"><code>hadley.ocpu.io/plyr/data/</code></a></td>
</tr>
<tr>
<td>plyr R objects</td>
<td><a href="https://hadley.ocpu.io/plyr/R/"><code>hadley.ocpu.io/plyr/R/</code></a></td>
</tr>
<tr>
<td>plyr help pages</td>
<td><a href="https://hadley.ocpu.io/plyr/man/"><code>hadley.ocpu.io/plyr/man/</code></a></td>
</tr>
<tr>
<td>plyr files</td>
<td><a href="https://hadley.ocpu.io/plyr/DESCRIPTION"><code>hadley.ocpu.io/plyr/DESCRIPTION</code></a></td>
</tr>
<tr><th colspan="3" class="text-center">Datasets</th></tr>
<tr>
<td>mammals sleep data (print)</td>
<td><a href="https://hadley.ocpu.io/ggplot2/data/msleep/print"><code>hadley.ocpu.io/ggplot2/data/msleep/print</code></a></td>
</tr>
<tr>
<td>mammals sleep data (csv)</td>
<td><a href="https://hadley.ocpu.io/ggplot2/data/msleep/csv"><code>hadley.ocpu.io/ggplot2/data/msleep/csv</code></a></td>
</tr>
<tr>
<td>mammals sleep data (json)</td>
<td><a href="https://hadley.ocpu.io/ggplot2/data/msleep/json?digits=4"><code>hadley.ocpu.io/ggplot2/data/msleep/json</code></a></td>
</tr>
<tr>
<td>mammals sleep data (json columns)</td>
<td><a href="https://hadley.ocpu.io/ggplot2/data/msleep/json?dataframe=column&digits=4"><code>hadley.ocpu.io/ggplot2/data/msleep/json?dataframe=column</code></a></td>
</tr>
<tr><th colspan="3" class="text-center">Manual pages</th></tr>
<tr>
<td>msleep help (text) </td>
<td><a href="https://hadley.ocpu.io/ggplot2/man/msleep/text"><code>hadley.ocpu.io/ggplot2/man/msleep/text</code></a></td>
</tr>
<tr>
<td>msleep help (html) </td>
<td><a href="https://hadley.ocpu.io/ggplot2/man/msleep/html"><code>hadley.ocpu.io/ggplot2/man/msleep/html</code></a></td>
</tr>
<tr>
<td>msleep help (pdf) </td>
<td><a href="https://hadley.ocpu.io/ggplot2/man/msleep/pdf"><code>hadley.ocpu.io/ggplot2/man/msleep/pdf</code></a></td>
</tr>
<tr><th colspan="3" class="text-center">Example Apps</th></tr>
<tr>
<td>appdemo <a href="http://github.com/opencpu/appdemo">(src)</a></td>
<td><a href="https://opencpu.ocpu.io/appdemo/www"><code>opencpu.ocpu.io/appdemo/www</code></a></td>
</tr>
<tr>
<td>stocks <a href="http://github.com/opencpu/stocks">(src)</a></td>
<td><a href="https://opencpu.ocpu.io/stocks/www"><code>opencpu.ocpu.io/stocks/www</code></a></td>
</tr>
<tr>
<td>nabel <a href="http://github.com/opencpu/nabel">(src)</a></td>
<td><a href="https://opencpu.ocpu.io/nabel/www"><code>opencpu.ocpu.io/nabel/www</code></a></td>
</tr>
<tr>
<td>markdownapp <a href="http://github.com/opencpu/markdownapp">(src)</a></td>
<td><a href="https://opencpu.ocpu.io/markdownapp/www"><code>opencpu.ocpu.io/markdownapp/www</code></a></td>
</tr>
<tr>
<td>mapapp <a href="http://github.com/opencpu/mapapp">(src)</a></td>
<td><a href="https://opencpu.ocpu.io/mapapp/www"><code>opencpu.ocpu.io/mapapp/www</code></a></td>
</tr>
</tbody>
</table>
<h2 id="how-to-use">How to use</h2>
<p>To start publishing on <code>ocpu.io</code> you need to put your R functions, datasets, scripts, sweave/knitr documents into an R package and put it up on Github. This is not too difficult; there are many guides on how to do this. OpenCPU requires the name of the Github repository to match the name of the R package it contains. Use devtools to test if your package is working:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span><span class="w">
</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"username/pkgname"</span><span class="p">)</span></code></pre></figure>
<p>If this succeeds you’re good to go. Navigate to <code>username.ocpu.io/pkgname</code> where username is your Github login. The <a href="../../api.html">API docs</a> and <a href="../../jslib.html">JavaScript docs</a> explain how to read objects, files and datasets, RPC functions and develop apps.</p>
<p>By default the OpenCPU public server updates packages installed from Github every 24 hours. However, the <a href="../../api.html#api-ci">Github webhook</a> can be used to update the package immediately every time a commit is pushed to Github.</p>
OpenCPU 1.2: Flexible and reliable R function RPC over HTTPS + JSON2013-12-19T00:00:00+00:00https://www.opencpu.org/posts/opencpu-release-1.2
<a href="https://www.opencpu.org/posts/opencpu-release-1.2"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>Earlier this week, OpenCPU 1.2 was released. This release uses the new <a href="../jsonlite-a-smarter-json-encoder/">jsonlite</a> package for JSON conversion, which puts in place the final fundamental piece of the OpenCPU framework. This post describes what has changed, why this is important, and how to upgrade.</p>
<p>From here, no major changes in the OpenCPU API are planned for quite a while, so that we can shift focus towards optimizing performance, implementing client-libraries and developing applications.</p>
<h2 id="https-json-and-opencpu">HTTPS, JSON and OpenCPU</h2>
<p>Let’s first explain why this piece is important. The OpenCPU API defines a mapping between HTTP requests and R function calls. This is easy for simple input and output, such as numbers or vectors:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">curl https://cloud.opencpu.org/ocpu/library/stats/R/rnorm/json <span class="nt">-d</span> <span class="s1">'n=10&mean=5'</span></code></pre></figure>
<p>But what if the R function has a return value or arguments which require more advanced objects, such as a matrix or data frame? This is where <code>jsonlite</code> comes in. The <a href="http://cran.r-project.org/web/packages/jsonlite/vignettes/json-mapping.pdf">jsonlite vignette</a> defines <i><b>a practical and consistent mapping between JSON data and R Objects</b></i>. This allows OpenCPU to automatically convert incoming JSON arguments into R objects using <code>jsonlite::fromJSON</code>, and convert output values back into JSON using <code>jsonlite::toJSON</code>. Thereby the cycle is complete, and we can call advanced R functions over http(s)+json without requiring clients to have any understanding of R.</p>
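<p>To get a feel for this mapping, the snippet below runs both directions locally with jsonlite. It only illustrates the idea; it is not the code OpenCPU runs internally, and the example values are made up.</p>

```r
library(jsonlite)

# Incoming JSON arguments are converted to R objects with fromJSON ...
args <- fromJSON('{"x": [[1, 2], [3, 4]], "labels": ["a", "b"]}')
str(args)  # x becomes a 2 x 2 matrix, labels a character vector

# ... and an R return value is converted back to JSON with toJSON
df  <- data.frame(name = c("a", "b"), value = c(1.5, 2.5))
out <- toJSON(df)
cat(out, "\n")  # one JSON object per data frame row

# Parsing that JSON again yields an equivalent data frame
all.equal(fromJSON(out), df)
```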
<h2 id="an-example-melting-data-frames">An example: melting data frames</h2>
<p>Examples with curl get a bit verbose with a large payload, but to get an idea, let’s melt some data using the <code>melt</code> function in the <code>reshape2</code> package. This function has an argument <tt>data</tt> (data frame) and an argument <tt>id</tt> (character vector). It returns another data frame. In this example, we pass it the first three rows of the <code>airquality</code> dataset, very similar to the example in the <a href="https://cloud.opencpu.org/ocpu/library/reshape2/man/melt.data.frame/text">melt manual page</a>. The API docs explain that the JSON objects can either be posted as HTTP parameters in a standard HTTP POST format (i.e. multipart or x-www-form-urlencoded):</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">curl https://cloud.opencpu.org/ocpu/library/reshape2/R/melt/json <span class="se">\</span>
<span class="nt">-d</span> <span class="s1">'data=[{"Ozone":41, "Solar.R":190, "Wind":7.4, "Temp":67, "Month":5, "Day":1},
{"Ozone":36, "Solar.R":118, "Wind":8, "Temp": 72, "Month":5, "Day":2},
{"Ozone":12, "Solar.R":149, "Wind":12.6, "Temp": 74, "Month":5, "Day":3}]&id=["Month", "Day"]'</span></code></pre></figure>
<p>Alternatively, we can do pure JSON RPC by setting the <code>Content-Type: application/json</code> header:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">curl https://cloud.opencpu.org/ocpu/library/reshape2/R/melt/json <span class="se">\</span>
<span class="nt">-H</span> <span class="s1">'Content-Type: application/json'</span> <span class="se">\</span>
<span class="nt">-d</span> <span class="s1">'{
"data": [
{"Ozone":41, "Solar.R":190, "Wind":7.4, "Temp":67, "Month":5, "Day":1},
{"Ozone":36, "Solar.R":118, "Wind":8, "Temp": 72, "Month":5, "Day":2},
{"Ozone":12, "Solar.R":149, "Wind":12.6, "Temp": 74, "Month":5, "Day":3}
],
"id" :["Month", "Day"]
}'</span></code></pre></figure>
<p>Note that on Windows, the <code>curl</code> examples might need to be modified to properly escape the quotes in the Windows terminal. This is just a limitation of the Windows command line; it won’t be a problem for actual clients (e.g. a browser). If you don’t like curl, the same request can be performed using the <a href="https://cloud.opencpu.org/ocpu/test">ocpu test page</a>.</p>
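<p>Typing the payload by hand is where most of the quoting trouble comes from. A client written in R could instead generate the JSON body with <code>jsonlite</code>. A minimal sketch; keeping the default <code>auto_unbox = FALSE</code> ensures <code>id</code> is always sent as a JSON array, even if it held a single column name:</p>

```r
library(jsonlite)

# Build the request body for reshape2::melt from ordinary R objects
payload <- list(
  data = airquality[1:3, ],  # encoded as an array of row objects
  id = c("Month", "Day")     # encoded as a JSON array of strings
)
body <- toJSON(payload, pretty = TRUE)
cat(body)
```

<p>The resulting string can be used as the body of the <code>application/json</code> request shown above.</p>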
<p>The above RPC request is equivalent to the R code below. You can use this code as a template to see how your R functions would behave when called remotely over OpenCPU.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Load required packages</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">jsonlite</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">reshape2</span><span class="p">)</span><span class="w">
</span><span class="c1"># Input arguments in JSON format</span><span class="w">
</span><span class="n">input</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s1">'{
"data": [
{"Ozone":41, "Solar.R":190, "Wind":7.4, "Temp":67, "Month":5, "Day":1},
{"Ozone":36, "Solar.R":118, "Wind":8, "Temp": 72, "Month":5, "Day":2},
{"Ozone":12, "Solar.R":149, "Wind":12.6, "Temp": 74, "Month":5, "Day":3}
],
"id" :["Month", "Day"]
}'</span><span class="w">
</span><span class="c1"># The actual function call</span><span class="w">
</span><span class="n">args</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fromJSON</span><span class="p">(</span><span class="n">input</span><span class="p">)</span><span class="w">
</span><span class="n">result</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">do.call</span><span class="p">(</span><span class="n">reshape2</span><span class="o">::</span><span class="n">melt</span><span class="p">,</span><span class="w"> </span><span class="n">args</span><span class="p">)</span><span class="w">
</span><span class="c1"># This is what you get back from OpenCPU</span><span class="w">
</span><span class="n">output</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">toJSON</span><span class="p">(</span><span class="n">result</span><span class="p">,</span><span class="w"> </span><span class="n">pretty</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="n">cat</span><span class="p">(</span><span class="n">output</span><span class="p">)</span></code></pre></figure>
<h2 id="upgrading-to-opencpu-12">Upgrading to OpenCPU 1.2</h2>
<p>It is recommended to update your servers and applications to version 1.2 sooner rather than later. The 1.0 branch will keep working, but it won’t get any new fixes or updates. We plan to stay on the 1.2 branch for quite a while.</p>
<p>The introduction of <code>jsonlite</code> does not affect the HTTP API itself, but existing applications that rely heavily on JSON to get data in and out of R might need some modification. For this reason we decided to bump the version to the <tt>1.2</tt> series. If you have existing OpenCPU clients/applications that use JSON, have a look at the <a href="../jsonlite-a-smarter-json-encoder/">post about jsonlite</a> to get a better understanding of how JSON data map to R objects and vice versa. Installing or upgrading the OpenCPU single-user development server is business as usual:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">update.packages</span><span class="p">(</span><span class="n">ask</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="n">install.packages</span><span class="p">(</span><span class="s2">"opencpu"</span><span class="p">)</span></code></pre></figure>
<p>Servers running the OpenCPU 1.0 cloud server will not automatically receive the update to 1.2, to prevent existing applications from breaking. In order to update a previous installation of the OpenCPU cloud server, you need to add the new repository first:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">sudo </span>add-apt-repository ppa:opencpu/opencpu-1.2
<span class="nb">sudo </span>apt-get update
<span class="nb">sudo </span>apt-get upgrade</code></pre></figure>
<p>To see if the update was successful, navigate to <a href="https://cloud.opencpu.org/ocpu/library/opencpu/">/ocpu/library/opencpu</a> on your server to check the currently installed version of the opencpu package.</p>
New package: jsonlite. A smart(er) JSON encoder/decoder.2013-12-06T00:00:00+00:00https://www.opencpu.org/posts/jsonlite-a-smarter-json-encoder
<a href="https://www.opencpu.org/posts/jsonlite-a-smarter-json-encoder"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>This week we released a new package on CRAN: <a href="http://cran.r-project.org/web/packages/jsonlite/index.html">jsonlite</a>. This package is a fork of <code class="language-plaintext highlighter-rouge">RJSONIO</code> by Duncan Temple Lang and builds on the same parser, but uses a different mapping between R objects and JSON data. The <a href="http://cran.r-project.org/web/packages/jsonlite/vignettes/json-mapping.pdf">package vignette</a> goes into great detail and has many examples of how JSON data are converted to R objects and vice versa. To try it:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#install</span><span class="w">
</span><span class="n">install.packages</span><span class="p">(</span><span class="s2">"jsonlite"</span><span class="p">,</span><span class="w"> </span><span class="n">repos</span><span class="o">=</span><span class="s2">"http://cran.r-project.org"</span><span class="p">)</span><span class="w">
</span><span class="c1">#load</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">jsonlite</span><span class="p">)</span><span class="w">
</span><span class="c1">#convert object to json</span><span class="w">
</span><span class="n">myjson</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">toJSON</span><span class="p">(</span><span class="n">iris</span><span class="p">,</span><span class="w"> </span><span class="n">pretty</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="n">cat</span><span class="p">(</span><span class="n">myjson</span><span class="p">)</span><span class="w">
</span><span class="c1">#convert json back to object</span><span class="w">
</span><span class="n">iris2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fromJSON</span><span class="p">(</span><span class="n">myjson</span><span class="p">)</span><span class="w">
</span><span class="n">print</span><span class="p">(</span><span class="n">iris2</span><span class="p">)</span></code></pre></figure>
<h2 id="so-whats-new">So what’s new?</h2>
<p>The <code class="language-plaintext highlighter-rouge">jsonlite</code> package implements functions <code class="language-plaintext highlighter-rouge">toJSON</code> and <code class="language-plaintext highlighter-rouge">fromJSON</code> similar to those in packages as <code class="language-plaintext highlighter-rouge">RJSONIO</code> and <code class="language-plaintext highlighter-rouge">rjson</code>, but options and output are quite different. The primary goal in the design of <code class="language-plaintext highlighter-rouge">jsonlite</code> is to recognize and comply with conventional ways of encoding data in JSON (outside the R community), in particular (relational) datasets. This increases interoperability when dealing with external data from within R, or when reading/writing R objects from an external client (e.g. through <a href="http://opencpu.org">OpenCPU</a>). For example, consider structures as returned by the Github API:</p>
<ul>
<li>Simple dataset: <a href="https://api.github.com/users/hadley/orgs" target="_blank">https://api.github.com/users/hadley/orgs</a></li>
<li>Nested dataset: <a href="https://api.github.com/users/hadley/repos" target="_blank">https://api.github.com/users/hadley/repos</a></li>
</ul>
<p>These JSON structures obviously represent data tables, or in R terminology: data frames. The first dataset is a single table; the second dataset has a relational structure with two tables: the <code class="language-plaintext highlighter-rouge">owner</code> property in the main table was generated from a foreign key that points to a record in a second table (owners). However, in their JSON representation these tables are structured <strong>by row</strong>, whereas R likes data frames <strong>by column</strong>. This is one example where <code class="language-plaintext highlighter-rouge">jsonlite</code> goes beyond other packages, and actually returns a data frame:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">jsonlite</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">httr</span><span class="p">)</span><span class="w">
</span><span class="c1">#get data</span><span class="w">
</span><span class="n">data1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fromJSON</span><span class="p">(</span><span class="s2">"https://api.github.com/users/hadley/orgs"</span><span class="p">)</span><span class="w">
</span><span class="c1">#it's a data frame</span><span class="w">
</span><span class="nf">names</span><span class="p">(</span><span class="n">data1</span><span class="p">)</span><span class="w">
</span><span class="n">data1</span><span class="o">$</span><span class="n">login</span></code></pre></figure>
<p>The second example is a bit more complicated because of the relational structure. <code class="language-plaintext highlighter-rouge">jsonlite</code> tries to stay as close as possible to the original structure by returning a nested data frame:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">data2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fromJSON</span><span class="p">(</span><span class="s2">"https://api.github.com/users/hadley/repos"</span><span class="p">)</span><span class="w">
</span><span class="c1">#it's a data frame...</span><span class="w">
</span><span class="nf">names</span><span class="p">(</span><span class="n">data2</span><span class="p">)</span><span class="w">
</span><span class="n">data2</span><span class="o">$</span><span class="n">name</span><span class="w">
</span><span class="c1">#...with has a nested data frame</span><span class="w">
</span><span class="nf">names</span><span class="p">(</span><span class="n">data2</span><span class="o">$</span><span class="n">owner</span><span class="p">)</span><span class="w">
</span><span class="n">data2</span><span class="o">$</span><span class="n">owner</span><span class="o">$</span><span class="n">login</span><span class="w">
</span><span class="c1">#these are equivalent :)</span><span class="w">
</span><span class="n">data2</span><span class="p">[</span><span class="m">1</span><span class="p">,]</span><span class="o">$</span><span class="n">owner</span><span class="o">$</span><span class="n">login</span><span class="w">
</span><span class="n">data2</span><span class="p">[</span><span class="m">1</span><span class="p">,</span><span class="s2">"owner"</span><span class="p">]</span><span class="o">$</span><span class="n">login</span><span class="w">
</span><span class="n">data2</span><span class="o">$</span><span class="n">owner</span><span class="p">[</span><span class="m">1</span><span class="p">,</span><span class="s2">"login"</span><span class="p">]</span><span class="w">
</span><span class="n">data2</span><span class="o">$</span><span class="n">owner</span><span class="p">[</span><span class="m">1</span><span class="p">,]</span><span class="o">$</span><span class="n">login</span></code></pre></figure>
<p>The <a href="http://cran.r-project.org/web/packages/jsonlite/vignettes/json-mapping.pdf">package vignette</a> gives many more examples of how various structures map to R objects.</p>
<h2 id="on-correctness-and-performance">On correctness and performance</h2>
<p>The initial emphasis in jsonlite has been on correctness: rather than rushing towards performance, we want to explicitly specify the intended behavior, covering all important structures. The complexity of this problem is easily underestimated, which can result in unexpected behavior, ambiguous edge cases and differences between implementations. A set of conventions for a consistent and practical mapping is proposed in the <a href="http://cran.r-project.org/web/packages/jsonlite/vignettes/json-mapping.pdf">package vignette</a>. If you are using JSON with R, feel free to join the discussion.</p>
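<p>One small example of such an edge case: R has no scalar type, so it is ambiguous whether a length-one vector should become a JSON array or a plain value. A sketch of how current jsonlite releases handle it; by default every vector becomes an array, and the <code>auto_unbox</code> argument of <code>toJSON</code> changes that for length-one vectors:</p>

```r
library(jsonlite)

x <- 42

# Default: every vector is encoded as an array, even of length one
boxed <- toJSON(x)
cat(boxed)    # [42]

# auto_unbox = TRUE: length-one vectors become plain JSON values
unboxed <- toJSON(x, auto_unbox = TRUE)
cat(unboxed)  # 42

# Both decode to the same R value, so the distinction is lost on the way back
identical(fromJSON(boxed), fromJSON(unboxed))
```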
<blockquote>
<p>Premature optimization is the root of all evil.</p>
<small>Donald Knuth</small>
</blockquote>
<p>We hope that a clear specification will make it much easier to optimize performance or write alternate implementations. The <a href="http://cran.r-project.org/web/packages/jsonlite/vignettes/json-mapping.pdf">package vignette</a> and package unit tests are intended to take away ambiguity about what exactly <code class="language-plaintext highlighter-rouge">toJSON</code> and <code class="language-plaintext highlighter-rouge">fromJSON</code> are supposed to do. From here we can start optimizing the R code, porting pieces to C++, or perhaps even writing an entirely new implementation, without breaking software that depends on it.</p>
<p>If you would like to contribute to <code class="language-plaintext highlighter-rouge">jsonlite</code>, you can <a href="https://github.com/jeroenooms/jsonlite/">submit patches or pull requests</a> on github, as long as they don’t alter the behavior of the functions. At a minimum, they should pass the package unit tests… or you should modify the unit tests that are overly strict :-)</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">testthat</span><span class="p">)</span><span class="w">
</span><span class="n">test_package</span><span class="p">(</span><span class="s2">"jsonlite"</span><span class="p">)</span></code></pre></figure>
Continuous Integration with OpenCPU2013-11-27T00:00:00+00:00https://www.opencpu.org/posts/continuous-integration-of-R-packages
<a href="https://www.opencpu.org/posts/continuous-integration-of-R-packages"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>Starting with version 1.0.7, the OpenCPU cloud server adds support for continuous integration (CI). This means that GitHub repositories can be configured to automatically install your package on an OpenCPU server every time a commit is pushed. To take advantage of this feature, it is required that:</p>
<ol>
<li>Your R source package is hosted on GitHub.</li>
<li>The name of the GitHub repository is identical to the name of the R package.</li>
<li>Your GitHub user account has a public email address.</li>
</ol>
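<p>The second requirement is easy to break when a repository is renamed. A quick local check could compare the <code>Package</code> field of the <code>DESCRIPTION</code> file with the repository name. The helper <code>matches_repo_name</code> below is hypothetical and not part of OpenCPU; it only uses base R:</p>

```r
# Hypothetical helper: does the package name match the GitHub repository name?
#   pkg_dir:   path to the package source (the folder containing DESCRIPTION)
#   repo_name: name of the GitHub repository (last part of its URL)
matches_repo_name <- function(pkg_dir, repo_name) {
  desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"), fields = "Package")
  identical(unname(desc[1, "Package"]), repo_name)
}

# Example with a throwaway DESCRIPTION file
pkg_dir <- file.path(tempdir(), "mypkg")
dir.create(pkg_dir, showWarnings = FALSE)
writeLines(c("Package: mypkg", "Version: 0.1"), file.path(pkg_dir, "DESCRIPTION"))
matches_repo_name(pkg_dir, "mypkg")
```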
<p>To set up CI, add the following URL as a ‘WebHook’ in your GitHub repository:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://cloud.opencpu.org/ocpu/webhook
</code></pre></div></div>
<p>Make sure to select payload version <code>application/vnd.github.v3+form</code>. To trigger a build, push a commit to the master branch. The build will show up under <i>Recent Deliveries</i> on your GitHub project page, and you should receive an email reporting whether the installation was successful. If it was, the package will be directly available for remote use through the OpenCPU API.</p>
<p><img class="img-thumbnail img-responsive" src="../../images/githook.png" alt="git hook screenshot" /></p>
<h2 id="but-why">But why?</h2>
<p>Continuous Integration in OpenCPU addresses several issues at once:</p>
<ul>
<li>If you introduced a bug and your package fails to install, you get notified by email immediately.</li>
<li>Deploy packages/apps on OpenCPU public cloud servers without having to wait until the server synchronizes.</li>
<li>You can use CI without relying on a 3rd party service; installing your own OpenCPU server is easy.</li>
</ul>
<p>Every active R package maintainer could benefit from some sort of CI environment, with or without OpenCPU. Earlier this year, Yihui had a cool <a href="http://yihui.name/en/2013/04/travis-ci-for-r/">blog post</a> about <a href="https://travis-ci.org/">Travis CI</a> (also see <a href="https://github.com/craigcitro/r-travis">r-travis</a>). Simon Urbanek’s rforge.net is another service that provides some auto-building functionality. One way or another, it’s important to frequently check that all your packages still build, pass unit tests, haven’t introduced conflicts, etc. That way you catch problems immediately, while the changes are still fresh in your memory.</p>
<p>Moreover, unexpected changes in R or dependencies are often beyond your control, but can cause your package to work one day and break the next. The article on <a href="http://arxiv.org/abs/1303.2140">Possible Directions for Improving Dependency Versioning in R</a> (<i>The R Journal</i> <a href="http://journal.r-project.org/archive/2013-1">Vol. 5/1</a>, June 2013) explained that CRAN requires all “current” packages to be compatible, which assumes that all package authors are constantly on the lookout for changes in dependencies and reverse dependencies, forever. This system is unsustainable and will eventually have to be revised, but continuous integration can at least help detect problems as soon as possible.</p>
<h2 id="final-notes">Final notes</h2>
<p>Some final notes/disclaimers: this feature is currently being tested; please let me know if something is not working. To set up your own OpenCPU CI server, you need to configure an SMTP server, which is not yet documented in the <a href="https://opencpu.github.io/server-manual/opencpu-server.pdf">PDF manual</a>. Also note that currently only the default (master) branch will be deployed; pushes to other branches are ignored. Finally, some packages might not build on the public demo server because of missing system dependencies. If your package needs any particular libraries, send me an email (or set up your own cloud server :-)</p>
The RAppArmor Package: Enforcing Security Policies in R Using Dynamic Sandboxing on Linux2013-11-14T00:00:00+00:00https://www.opencpu.org/posts/rapparmor-jss-publication
<a href="https://www.opencpu.org/posts/rapparmor-jss-publication"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>An article called <strong>The RAppArmor Package: Enforcing Security Policies in R Using Dynamic Sandboxing on Linux</strong> has appeared in the latest volume of the <i>Journal of Statistical Software</i>: <a href="http://www.jstatsoft.org/v55/i07">http://www.jstatsoft.org/v55/i07</a>. The RAppArmor package is one of the foundations of the OpenCPU framework. It protects against malicious use and excessive use of hardware resources when executing arbitrary R code. From the abstract:</p>
<p><blockquote><small><em>The increasing availability of cloud computing and scientific super computers brings great potential for making R accessible through public or shared resources. This allows us to efficiently run code requiring lots of cycles and memory, or embed R functionality into, e.g., systems and web services. However some important security concerns need to be addressed before this can be put in production. The prime use case in the design of R has always been a single statistician running R on the local machine through the interactive console. Therefore the execution environment of R is entirely unrestricted, which could result in malicious behavior or excessive use of hardware resources in a shared environment. Properly securing an R process turns out to be a complex problem. We describe various approaches and illustrate potential issues using some of our personal experiences in hosting public web services. Finally we introduce the RAppArmor package: a Linux based reference implementation for dynamic sandboxing in R on the level of the operating system.</em></small></blockquote></p>
<p>Code, documentation, examples and videos are available from GitHub: <a href="https://github.com/jeroenooms/RAppArmor#readme">https://github.com/jeroenooms/RAppArmor</a>. Below is a quick preview of what the package does. The <code>eval.secure</code> function evaluates an expression in a sandboxed process. This way it is possible to set limits on hardware resources such as memory allocation, CPU usage, etc:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">RAppArmor</span><span class="p">)</span><span class="w">
</span><span class="c1">#sandboxed evaluation: setting 500MB memory limit</span><span class="w">
</span><span class="n">A</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">eval.secure</span><span class="p">(</span><span class="n">rnorm</span><span class="p">(</span><span class="m">1e7</span><span class="p">),</span><span class="w"> </span><span class="n">RLIMIT_AS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">512</span><span class="o">*</span><span class="m">1024</span><span class="o">*</span><span class="m">1024</span><span class="p">);</span><span class="w">
</span><span class="nf">length</span><span class="p">(</span><span class="n">A</span><span class="p">)</span><span class="w">
</span><span class="o">></span><span class="w"> </span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="m">10000000</span><span class="w">
</span><span class="n">B</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">eval.secure</span><span class="p">(</span><span class="n">rnorm</span><span class="p">(</span><span class="m">1e8</span><span class="p">),</span><span class="w"> </span><span class="n">RLIMIT_AS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">512</span><span class="o">*</span><span class="m">1024</span><span class="o">*</span><span class="m">1024</span><span class="p">);</span><span class="w">
</span><span class="o">></span><span class="w"> </span><span class="n">Error</span><span class="o">:</span><span class="w"> </span><span class="n">cannot</span><span class="w"> </span><span class="n">allocate</span><span class="w"> </span><span class="n">vector</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="m">762.9</span><span class="w"> </span><span class="n">Mb</span></code></pre></figure>
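<p>RAppArmor can also set hard time limits to kill jobs that do not return in time. These time limits work even while compiled code is running, unlike e.g. R's built-in <code>setTimeLimit</code>, which is only checked at points where R checks for user interrupts and therefore won't stop the long-running <code>svd</code> call in the example below:</p>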
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">cputest</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">A</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">matrix</span><span class="p">(</span><span class="n">rnorm</span><span class="p">(</span><span class="m">1e7</span><span class="p">),</span><span class="w"> </span><span class="m">1e3</span><span class="p">)</span><span class="w">
</span><span class="n">B</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">svd</span><span class="p">(</span><span class="n">A</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">eval.secure</span><span class="p">(</span><span class="n">cputest</span><span class="p">(),</span><span class="w"> </span><span class="n">timeout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">))</span><span class="w">
</span><span class="o">></span><span class="w"> </span><span class="n">Error</span><span class="o">:</span><span class="w"> </span><span class="n">R</span><span class="w"> </span><span class="n">call</span><span class="w"> </span><span class="n">did</span><span class="w"> </span><span class="n">not</span><span class="w"> </span><span class="n">return</span><span class="w"> </span><span class="n">within</span><span class="w"> </span><span class="m">5</span><span class="w"> </span><span class="n">seconds.</span><span class="w"> </span><span class="n">Terminating</span><span class="w"> </span><span class="n">process.</span><span class="w">
</span><span class="o">></span><span class="w"> </span><span class="n">Timing</span><span class="w"> </span><span class="n">stopped</span><span class="w"> </span><span class="n">at</span><span class="o">:</span><span class="w"> </span><span class="m">0.003</span><span class="w"> </span><span class="m">0.006</span><span class="w"> </span><span class="m">5.008</span></code></pre></figure>
<p>But the most important feature is the ability to enforce Mandatory Access Control policies by applying an AppArmor profile. In this profile you can specify exactly which files and resources on the system a process is allowed to access. For example, the <a href="https://github.com/jeroenooms/RAppArmor/blob/master/inst/profiles/debian/rapparmor.d/r-user">r-user</a> profile used below does <strong>not</strong> have permission to list the contents of the root of the system:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="o">></span><span class="w"> </span><span class="n">list.files</span><span class="p">(</span><span class="s2">"/"</span><span class="p">)</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="s2">"bin"</span><span class="w"> </span><span class="s2">"boot"</span><span class="w"> </span><span class="s2">"cdrom"</span><span class="w"> </span><span class="s2">"dev"</span><span class="w">
</span><span class="p">[</span><span class="m">5</span><span class="p">]</span><span class="w"> </span><span class="s2">"etc"</span><span class="w"> </span><span class="s2">"home"</span><span class="w"> </span><span class="s2">"initrd.img"</span><span class="w"> </span><span class="s2">"initrd.img.old"</span><span class="w">
</span><span class="p">[</span><span class="m">9</span><span class="p">]</span><span class="w"> </span><span class="s2">"lib"</span><span class="w"> </span><span class="s2">"lib64"</span><span class="w"> </span><span class="s2">"lost+found"</span><span class="w"> </span><span class="s2">"media"</span><span class="w">
</span><span class="p">[</span><span class="m">13</span><span class="p">]</span><span class="w"> </span><span class="s2">"mnt"</span><span class="w"> </span><span class="s2">"opt"</span><span class="w"> </span><span class="s2">"proc"</span><span class="w"> </span><span class="s2">"root"</span><span class="w">
</span><span class="p">[</span><span class="m">17</span><span class="p">]</span><span class="w"> </span><span class="s2">"run"</span><span class="w"> </span><span class="s2">"sbin"</span><span class="w"> </span><span class="s2">"srv"</span><span class="w"> </span><span class="s2">"sys"</span><span class="w">
</span><span class="p">[</span><span class="m">21</span><span class="p">]</span><span class="w"> </span><span class="s2">"tmp"</span><span class="w"> </span><span class="s2">"usr"</span><span class="w"> </span><span class="s2">"var"</span><span class="w"> </span><span class="s2">"vmlinuz"</span><span class="w">
</span><span class="p">[</span><span class="m">25</span><span class="p">]</span><span class="w"> </span><span class="s2">"vmlinuz.old"</span><span class="w">
</span><span class="o">></span><span class="w"> </span><span class="n">eval.secure</span><span class="p">(</span><span class="n">list.files</span><span class="p">(</span><span class="s2">"/"</span><span class="p">),</span><span class="w"> </span><span class="n">profile</span><span class="o">=</span><span class="s2">"r-user"</span><span class="p">)</span><span class="w">
</span><span class="n">character</span><span class="p">(</span><span class="m">0</span><span class="p">)</span></code></pre></figure>
<p>This and much more is described in detail in the Journal of Statistical Software: <a href="http://www.jstatsoft.org/v55/i07">http://www.jstatsoft.org/v55/i07</a>.
</p>
OpenCPU Release 1.0.42013-10-17T00:00:00+00:00https://www.opencpu.org/posts/opencpu-release.1.0.4
<a href="https://www.opencpu.org/posts/opencpu-release.1.0.4"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>OpenCPU version 1.0.4 was released to CRAN and Launchpad this week. This release brings some bug fixes and improvements and no breaking changes, so you can safely upgrade your 1.0.x installations. Upgrade an existing OpenCPU cloud server using:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">sudo </span>apt-get update
<span class="nb">sudo </span>apt-get upgrade </code></pre></figure>
<p>Or to install the latest version of the OpenCPU local single-user server in R:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">update.packages</span><span class="p">(</span><span class="n">ask</span><span class="o">=</span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="n">install.packages</span><span class="p">(</span><span class="s2">"opencpu"</span><span class="p">,</span><span class="w"> </span><span class="n">repos</span><span class="o">=</span><span class="s2">"http://cran.r-project.org"</span><span class="p">)</span></code></pre></figure>
<h2 id="new-in-this-release">New in this release</h2>
<p>One improvement in this release is the capturing of output from the package installation process. This is surprisingly difficult in R, but thanks to some helpful <a href="http://r.789695.n4.nabble.com/Capture-output-of-install-packages-pipe-system2-td4676754.html">tips</a> on r-devel, we found a way to implement it. This makes it much easier to diagnose the problem if a certain package fails to install on OpenCPU.</p>
<p>For example: as described in the API manual <a href="https://cloud.opencpu.org/api.html#api-libraries">section on libraries</a>, the <a href="https://cloud.opencpu.org/ocpu/cran/" target="blank"><code>/ocpu/cran/</code></a>, <a href="https://cloud.opencpu.org/ocpu/bioc/" target="blank"><code>/ocpu/bioc/</code></a> and <a href="https://cloud.opencpu.org/ocpu/github/hadley/" target="blank"><code>/ocpu/github/</code></a> APIs represent <strong>remote libraries</strong>: when a client calls a package in any of these libraries for the first time, the OpenCPU server will attempt to install the current version of the corresponding package on the fly (if not already available), before processing the request. In a <a href="https://cloud.opencpu.org/posts/remotely-use-r-packages-on-github/">previous post</a> we described how this allows anyone on the internet to use your R package without even installing R.</p>
<p>However, sometimes the installation of a package fails, for example because of a missing dependency or version conflict. To make it easier to diagnose the problem, the OpenCPU server now returns the output from the package installation process for failed installations. For example, here are two packages that fail to install, and now we know why :-)</p>
<ul>
<li><a href="https://cloud.opencpu.org/ocpu/github/hadley/dplyr/" target="_blank">/ocpu/github/hadley/dplyr/</a></li>
<li><a href="https://cloud.opencpu.org/ocpu/cran/rgl/" target="_blank">/ocpu/cran/rgl/</a></li>
</ul>
<p>Loading these pages can take a couple of seconds because we have to wait for the installation process to complete. However, once a package has been installed successfully, it is kept for 24 hours, so that subsequent requests (from any user) can use it instantly.</p>
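<p>The caching behaviour described above can be sketched as a small time-to-live cache. This is a hypothetical JavaScript illustration, not OpenCPU's implementation (which is written in R): <code>createInstallCache</code> and the fake installer are made up, and the clock is passed in so the 24-hour expiry can be exercised. Because a failed install throws before anything is stored, only successful installations are cached.</p>

```javascript
// Hypothetical sketch: install a package on first use and reuse it for 24 hours.
// installFn is a stand-in for the slow installation step; now() is injectable.
const DAY_MS = 24 * 60 * 60 * 1000;

function createInstallCache(installFn, now = () => Date.now()) {
  const cache = new Map(); // package name -> { installedAt, result }

  return function getPackage(name) {
    const entry = cache.get(name);
    if (entry !== undefined && now() - entry.installedAt < DAY_MS) {
      return entry.result; // still fresh: skip the installation
    }
    // First request or expired entry. If installFn throws, nothing is cached.
    const result = installFn(name);
    cache.set(name, { installedAt: now(), result });
    return result;
  };
}

// Usage with a fake installer and a fake clock.
let clock = 0;
let installs = 0;
const getPackage = createInstallCache((name) => {
  installs += 1;
  return name + "@installed";
}, () => clock);

getPackage("dpu.mobility"); // installs (installs === 1)
getPackage("dpu.mobility"); // served from the cache
clock += DAY_MS;            // 24 hours later
getPackage("dpu.mobility"); // entry expired: installs again (installs === 2)
```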
<h2 id="about-local-and-remote-libraries">About Local and Remote libraries</h2>
<p>Note that the above applies only to the <strong>remote libraries</strong> mentioned. Packages in any of the <strong>local libraries</strong>, such as <a href="https://cloud.opencpu.org/ocpu/library/" target="blank"><code>/ocpu/library/</code></a>, are already installed on the server. When running your own OpenCPU server, it is preferable to install your package on the server in the usual way and call it via the local library API. The remote libraries are mostly intended to let anyone share and use arbitrary packages on public OpenCPU servers.</p>
Remotely use R packages on Github through OpenCPU2013-10-01T00:00:00+00:00https://www.opencpu.org/posts/remotely-use-r-packages-on-github
<a href="https://www.opencpu.org/posts/remotely-use-r-packages-on-github"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>Any R package on Github can be used remotely on OpenCPU through the <a href="https://cloud.opencpu.org/api.html#api-libraries"><code class="language-plaintext highlighter-rouge">/ocpu/github/</code></a> API. Users on the internet can browse code, objects and help pages, or call functions in the package, without having to learn R or install it on their local machine. This makes your method, algorithm, plot or DPU more accessible outside the R community.</p>
<p>For example, <a href="https://cloud.opencpu.org/posts/implementing-data-processing-units-with-opencpu/">last time</a> we discussed how OpenMHealth uses the <a href="https://github.com/openmhealth/dpu.mobility/blob/master/R/geodistance.R">geodistance</a> function to calculate the total distance along a set of lon/lat coordinates using the <a href="http://en.wikipedia.org/wiki/Haversine_formula">Haversine</a> formula. The <code class="language-plaintext highlighter-rouge">geodistance</code> function is included in the <a href="https://github.com/openmhealth/dpu.mobility">dpu.mobility</a> R package, which is available in the <code class="language-plaintext highlighter-rouge">openmhealth</code> Github account. Because the <code class="language-plaintext highlighter-rouge">dpu.mobility</code> package is on Github, all functionality in the package can be accessed directly through the OpenCPU cloud server.
Try opening some of the URLs below in your browser (play around with the URL to get a sense of the API). The package help pages are available under <code class="language-plaintext highlighter-rouge">/man/</code> (in several formats):</p>
<ul>
<li><a href="https://cloud.opencpu.org/ocpu/github/openmhealth/dpu.mobility/man/">/ocpu/github/openmhealth/dpu.mobility/man/</a></li>
<li><a href="https://cloud.opencpu.org/ocpu/github/openmhealth/dpu.mobility/man/geodistance/text">/ocpu/github/openmhealth/dpu.mobility/man/geodistance/text</a></li>
<li><a href="https://cloud.opencpu.org/ocpu/github/openmhealth/dpu.mobility/man/geodistance/html">/ocpu/github/openmhealth/dpu.mobility/man/geodistance/html</a></li>
<li><a href="https://cloud.opencpu.org/ocpu/github/openmhealth/dpu.mobility/man/geodistance/pdf">/ocpu/github/openmhealth/dpu.mobility/man/geodistance/pdf</a></li>
</ul>
<p>The R functions and objects in the package are available under <code class="language-plaintext highlighter-rouge">/R/</code>:</p>
<ul>
<li><a href="https://cloud.opencpu.org/ocpu/github/openmhealth/dpu.mobility/R">/ocpu/github/openmhealth/dpu.mobility/R</a></li>
<li><a href="https://cloud.opencpu.org/ocpu/github/openmhealth/dpu.mobility/R/geodistance">/ocpu/github/openmhealth/dpu.mobility/R/geodistance</a></li>
</ul>
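<p>The URL layout for a package in the <code>/ocpu/github/</code> library follows a regular pattern, which the small sketch below assembles from its parts. The helper names are hypothetical, and the layout is taken from the example URLs above rather than from a complete API reference.</p>

```javascript
// Hypothetical helpers that rebuild the example URLs above from their parts.
const SERVER = "https://cloud.opencpu.org";

// Root of a package in the Github remote library: /ocpu/github/USER/PACKAGE
function githubPackageUrl(user, pkg) {
  return `${SERVER}/ocpu/github/${user}/${pkg}`;
}

// Help page for a topic in a given format, e.g. "text", "html" or "pdf".
function helpUrl(user, pkg, topic, format) {
  return `${githubPackageUrl(user, pkg)}/man/${topic}/${format}`;
}

// An R function or object in the package.
function objectUrl(user, pkg, name) {
  return `${githubPackageUrl(user, pkg)}/R/${name}`;
}

console.log(helpUrl("openmhealth", "dpu.mobility", "geodistance", "html"));
console.log(objectUrl("openmhealth", "dpu.mobility", "geodistance"));
```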
<p>Any R function in the package can be called remotely using <code class="language-plaintext highlighter-rouge">HTTP POST</code>. For example, to calculate the distance from LA to NY and back with curl:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c">#POST function call as url-encoded</span>
curl https://cloud.opencpu.org/ocpu/github/openmhealth/dpu.mobility/R/geodistance/json <span class="nt">-d</span> <span class="se">\</span>
<span class="s1">'long=[-74.0064,-118.2430,-74.0064]&lat=[40.7142,34.0522,40.7142]'</span>
<span class="c">#POST equivalent call using json</span>
curl https://cloud.opencpu.org/ocpu/github/openmhealth/dpu.mobility/R/geodistance/json <span class="se">\</span>
<span class="nt">-H</span> <span class="s2">"Content-Type: application/json"</span> <span class="nt">-d</span> <span class="s1">'{"long":[-74.0064,-118.2430,-74.0064],"lat":[40.7142,34.0522,40.7142]}'</span> </code></pre></figure>
<p>We use <code class="language-plaintext highlighter-rouge">curl</code> for illustration in this example, but any browser or web client could do the same thing, allowing anyone to embed your algorithms or plots in systems and applications.</p>
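<p>The two curl calls above send the same arguments in two encodings. The sketch below builds both request bodies in JavaScript: in the url-encoded form each argument value is itself a JSON array, while in the JSON form the whole argument list is a single JSON object. The helper names are hypothetical; actually sending a request (for example with <code>fetch</code>) requires network access and is only shown in a comment.</p>

```javascript
// Arguments from the curl examples above.
const args = {
  long: [-74.0064, -118.243, -74.0064],
  lat: [40.7142, 34.0522, 40.7142],
};

// application/x-www-form-urlencoded: one name=value pair per argument,
// where each value is written as JSON.
function urlEncodedBody(obj) {
  const params = new URLSearchParams();
  for (const [name, value] of Object.entries(obj)) {
    params.append(name, JSON.stringify(value));
  }
  return params.toString();
}

// application/json: the whole argument list as one JSON object.
function jsonBody(obj) {
  return JSON.stringify(obj);
}

const endpoint =
  "https://cloud.opencpu.org/ocpu/github/openmhealth/dpu.mobility/R/geodistance/json";
console.log("POST", endpoint);
console.log(urlEncodedBody(args));
console.log(jsonBody(args));

// Sending either body needs network access, for example:
// fetch(endpoint, { method: "POST", headers: { "Content-Type": "application/json" }, body: jsonBody(args) })
```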
<h1 id="try-it-yourself">Try it yourself!</h1>
<p>For an R package to be used remotely on OpenCPU, it must be installable with <code class="language-plaintext highlighter-rouge">install_github</code> and the R package name must be identical to the repository name. That is, if this works on your local machine:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span><span class="w">
</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"hadley/plyr"</span><span class="p">)</span></code></pre></figure>
<p>Then the package will be available remotely through:</p>
<ul>
<li><a href="https://cloud.opencpu.org/ocpu/github/hadley/plyr/">/ocpu/github/hadley/plyr/</a></li>
</ul>
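<p>The naming requirement can be checked mechanically by comparing the <code>Package:</code> field of the package's DESCRIPTION file with the repository name. A minimal sketch, with hypothetical helper names and a made-up DESCRIPTION; the parser only handles single-line fields, which is enough for <code>Package:</code>.</p>

```javascript
// Extract the value of the "Package:" field from DESCRIPTION text (or null).
function packageName(descriptionText) {
  const match = descriptionText.match(/^Package:\s*(\S+)\s*$/m);
  return match ? match[1] : null;
}

// The package can be served from /ocpu/github/USER/REPO only if the names agree.
function usableOnOpenCPU(repoName, descriptionText) {
  return packageName(descriptionText) === repoName;
}

// A made-up DESCRIPTION file for a hypothetical package.
const description = [
  "Package: mypkg",
  "Title: Example Package",
  "Version: 0.1.0",
].join("\n");

console.log(usableOnOpenCPU("mypkg", description));   // true
console.log(usableOnOpenCPU("mypkg-r", description)); // false
```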
<p>Try to see if you can access your own packages! Some of the usual suspects:</p>
<ul>
<li><a href="https://cloud.opencpu.org/ocpu/github/yihui/knitr/">/ocpu/github/yihui/knitr/</a></li>
<li><a href="https://cloud.opencpu.org/ocpu/github/hadley/plyr/">/ocpu/github/hadley/plyr/</a></li>
<li><a href="https://cloud.opencpu.org/ocpu/github/rstudio/markdown/">/ocpu/github/rstudio/markdown/</a></li>
<li><a href="https://cloud.opencpu.org/ocpu/github/ropensci/rplos/">/ocpu/github/ropensci/rplos/</a></li>
</ul>
<p>Use HTTP POST to call a function in any of these packages straight from Github:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c">#from ?llply</span>
curl https://cloud.opencpu.org/ocpu/github/hadley/plyr/R/llply/json <span class="nt">-d</span> <span class="s1">'.data=baseball&.fun=summary'</span>
<span class="c">#simple plot</span>
curl https://cloud.opencpu.org/ocpu/github/hadley/ggplot2/R/qplot <span class="nt">-d</span> <span class="s1">'x=[1,2,3,4,5]&y=[2,3,2,4,2]'</span></code></pre></figure>
<h1 id="publishing-opencpu-apps">Publishing OpenCPU apps</h1>
<p>An OpenCPU app is an R package which includes some web page(s) that call the R functions in the package using the OpenCPU API. Some public example apps are published on the <a href="https://github.com/opencpu">OpenCPU Github Repo</a>, but you can just as easily develop and publish apps by putting them in your own Github repository. For example, Scott Chamberlain has an (old) version of the <code class="language-plaintext highlighter-rouge">gitstats</code> app in his personal Github repository at <a href="https://github.com/SChamberlain/gitstats"><code class="language-plaintext highlighter-rouge">github.com/SChamberlain/gitstats</code></a>. We can access this version of the app directly on the OpenCPU cloud server using the corresponding URL: <a href="https://cloud.opencpu.org/ocpu/github/SChamberlain/gitstats/www/"><code class="language-plaintext highlighter-rouge">/ocpu/github/SChamberlain/gitstats/www/</code></a>.</p>
<h1 id="final-note">Final note</h1>
<p>One final note: in the current implementation of OpenCPU, packages from Github are installed no more than once every 24 hours, so your most recent Github commits might not show up immediately. The recommended workflow is to use the OpenCPU local <a href="https://cloud.opencpu.org/download.html">single user server</a> to develop your package/app. Once it works locally, push your package to Github to make it available on the OpenCPU cloud server.</p>
Calling R functions through AJAX using opencpu.js2013-09-21T00:00:00+00:00https://www.opencpu.org/posts/getting-started-with-opencpu.js
<a href="https://www.opencpu.org/posts/getting-started-with-opencpu.js"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>The <a href="https://cloud.opencpu.org/jslib.html">opencpu.js</a> library builds on jQuery to call R functions through AJAX, straight from the browser. This makes it easy to embed R based computation or graphics in <a href="https://cloud.opencpu.org/apps.html">apps</a>. Moreover, asynchronous requests (which are native in Javascript) make parallelization a natural part of the application. This post introduces some of the basic features of the library.</p>
<h2 id="getting-started-with-opencpujs">Getting started with opencpu.js</h2>
<p>The <a href="https://github.com/jeroenooms/opencpu.js#readme">readme page</a> for opencpu.js has some brief documentation, but perhaps the easiest way to get started with opencpu.js is by example. The <a href="https://cloud.opencpu.org/apps.html">opencpu apps</a> page lists a couple of example apps that you can play around with. The source code for each app is available from the <a href="https://github.com/opencpu">opencpu github organization</a>, and each app is based on opencpu.js. The <a href="https://cloud.opencpu.org/ocpu/library/appdemo/www/">appdemo</a> app contains some pages with minimal examples illustrating the basic <code class="language-plaintext highlighter-rouge">opencpu.js</code> functionality. Like all OpenCPU apps, it can either be used on the public cloud server or installed for local use:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#install the appdemo app</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span><span class="w">
</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"opencpu/appdemo"</span><span class="p">)</span><span class="w">
</span><span class="c1">#load the app</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">opencpu</span><span class="p">)</span><span class="w">
</span><span class="n">opencpu</span><span class="o">$</span><span class="n">browse</span><span class="p">(</span><span class="s2">"/library/appdemo/www"</span><span class="p">)</span></code></pre></figure>
<h2 id="hello-world-calling-a-function">Hello World: calling a function</h2>
<p>The <a href="https://cloud.opencpu.org/ocpu/library/appdemo/www/hello.html">hello.html</a> page demonstrates how to call an R function that is included with the R package containing the app. In this example we call the R function named <a href="https://cloud.opencpu.org/ocpu/library/appdemo/R/hello">hello</a>. Navigate to the <a href="https://cloud.opencpu.org/ocpu/library/appdemo/www/hello.html">hello.html</a> page in your favorite browser and look at the html source code to see what is going on. The magic happens in these lines of javascript:</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="c1">//read the value for 'myname'</span>
<span class="kd">var</span> <span class="nx">myname</span> <span class="o">=</span> <span class="nx">$</span><span class="p">(</span><span class="dl">"</span><span class="s2">#namefield</span><span class="dl">"</span><span class="p">).</span><span class="nx">val</span><span class="p">();</span>
<span class="c1">//perform the request</span>
<span class="kd">var</span> <span class="nx">req</span> <span class="o">=</span> <span class="nx">ocpu</span><span class="p">.</span><span class="nx">rpc</span><span class="p">(</span><span class="dl">"</span><span class="s2">hello</span><span class="dl">"</span><span class="p">,</span> <span class="p">{</span>
<span class="na">myname</span> <span class="p">:</span> <span class="nx">myname</span>
<span class="p">},</span> <span class="kd">function</span><span class="p">(</span><span class="nx">output</span><span class="p">){</span>
<span class="nx">$</span><span class="p">(</span><span class="dl">"</span><span class="s2">#output</span><span class="dl">"</span><span class="p">).</span><span class="nx">text</span><span class="p">(</span><span class="nx">output</span><span class="p">.</span><span class="nx">message</span><span class="p">);</span>
<span class="p">});</span></code></pre></figure>
<p>The first line is basic jQuery syntax and reads the value of the page element with id <code class="language-plaintext highlighter-rouge">namefield</code> in the HTML. In the next line we use <code class="language-plaintext highlighter-rouge">ocpu.rpc</code> to call the R function <a href="https://cloud.opencpu.org/ocpu/library/appdemo/R/hello">hello</a> (included in the app package) and pass the value to the <code class="language-plaintext highlighter-rouge">myname</code> argument of the R function. The final argument is the callback handler: a function that (asynchronously) processes the output once the request has returned from the server. In this case our callback handler writes the <code class="language-plaintext highlighter-rouge">output$message</code> value returned by our R function to the HTML element with id <code class="language-plaintext highlighter-rouge">output</code>.</p>
<p>The above is all that is needed to call R from JavaScript in the browser. The remaining lines of this example handle errors and re-enable the submit button:</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="c1">//if R returns an error, alert the error message</span>
<span class="nx">req</span><span class="p">.</span><span class="nx">fail</span><span class="p">(</span><span class="kd">function</span><span class="p">(){</span>
<span class="nx">alert</span><span class="p">(</span><span class="dl">"</span><span class="s2">Server error: </span><span class="dl">"</span> <span class="o">+</span> <span class="nx">req</span><span class="p">.</span><span class="nx">responseText</span><span class="p">);</span>
<span class="p">});</span>
<span class="c1">//after request complete, re-enable the button </span>
<span class="nx">req</span><span class="p">.</span><span class="nx">always</span><span class="p">(</span><span class="kd">function</span><span class="p">(){</span>
<span class="nx">$</span><span class="p">(</span><span class="dl">"</span><span class="s2">#submitbutton</span><span class="dl">"</span><span class="p">).</span><span class="nx">removeAttr</span><span class="p">(</span><span class="dl">"</span><span class="s2">disabled</span><span class="dl">"</span><span class="p">)</span>
<span class="p">});</span></code></pre></figure>
<p>Web developers will immediately recognize this pattern: all functions in the opencpu.js library wrap around the jQuery <code class="language-plaintext highlighter-rouge">$.ajax</code> method and return the <code class="language-plaintext highlighter-rouge">jqXHR</code> object. This gives you (the programmer) full control over the request through all methods and properties of <a href="http://api.jquery.com/jQuery.ajax/">jQuery.ajax</a>. For example, you can register additional handlers to deal with errors or to run extra code after the request has completed (in this example, re-enabling a button).</p>
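<p>For readers who prefer to see the underlying HTTP traffic, the sketch below approximates the hello example with the standard <code>fetch</code> API and native Promises instead of jQuery: <code>.then</code> plays the role of the success callback, <code>.catch</code> of <code>req.fail</code> and <code>.finally</code> of <code>req.always</code>. The assumption that an rpc-style call is a JSON POST to the function's <code>/json</code> endpoint is ours, not taken from the opencpu.js source; <code>callRpc</code>, <code>fakeFetch</code> and the <code>button</code> object are hypothetical stand-ins so the example runs without a server.</p>

```javascript
// Hypothetical helper (not part of opencpu.js): roughly the HTTP request an
// rpc-style call makes. fetchImpl is injectable so this runs without a server.
async function callRpc(baseUrl, fun, args, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(`${baseUrl}/R/${fun}/json`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(args),
  });
  if (!res.ok) {
    // Like req.responseText in the jQuery version: the server's error text.
    throw new Error("Server error: " + (await res.text()));
  }
  return res.json();
}

// A fake server that mimics a hello() function; its reply format is made up.
async function fakeFetch(url, init) {
  const { myname } = JSON.parse(init.body);
  return {
    ok: url.endsWith("/R/hello/json"),
    json: async () => ({ message: `hello ${myname}!` }),
    text: async () => `could not find function at ${url}`,
  };
}

// Stand-in for the submit button in the page.
const button = { disabled: true };

callRpc("https://cloud.opencpu.org/ocpu/library/appdemo", "hello", { myname: "world" }, fakeFetch)
  .then((output) => console.log(output.message)) // like the success callback
  .catch((err) => console.log(err.message))       // like req.fail(...)
  .finally(() => {
    button.disabled = false;                       // like req.always(...)
  });
```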
<h2 id="making-a-plot">Making a plot</h2>
<p>The opencpu.js library also makes it easy to embed your R plots in a website. The <a href="https://cloud.opencpu.org/ocpu/library/appdemo/www/plot.html">plot.html</a> page illustrates this with a very simple example. Again, look at the source of the HTML page:</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="c1">//create the plot area on the plotdiv element</span>
<span class="kd">var</span> <span class="nx">req</span> <span class="o">=</span> <span class="nx">$</span><span class="p">(</span><span class="dl">"</span><span class="s2">#plotdiv</span><span class="dl">"</span><span class="p">).</span><span class="nx">rplot</span><span class="p">(</span><span class="dl">"</span><span class="s2">randomplot</span><span class="dl">"</span><span class="p">,</span> <span class="p">{</span>
<span class="na">n</span> <span class="p">:</span> <span class="nx">nfield</span><span class="p">,</span>
<span class="na">dist</span> <span class="p">:</span> <span class="nx">distfield</span>
<span class="p">})</span></code></pre></figure>
<p>The syntax is slightly different from the function call above: the plotting widget is implemented as a jQuery plugin and is therefore called on a DOM element, usually an empty <code class="language-plaintext highlighter-rouge">&lt;div&gt;</code>. In this case we call the R function <a href="https://cloud.opencpu.org/ocpu/library/appdemo/R/randomplot">randomplot</a> (included with the appdemo package) and pass the arguments <code class="language-plaintext highlighter-rouge">n</code> and <code class="language-plaintext highlighter-rouge">dist</code>. Once the request completes, a PNG image of the plot is displayed in <code class="language-plaintext highlighter-rouge">#plotdiv</code>, along with links to PDF and SVG versions.</p>
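<p>Behind the widget, the plot lives in the session created by the function call. Assuming the session location looks like <code>/ocpu/tmp/KEY/</code> (the format suggested by <code>session.getLoc()</code> in the later examples) and that plots are served under <code>graphics/N/FORMAT</code> within it, links to the PNG, PDF and SVG versions could be built as in this sketch. The session key is made up and <code>plotUrls</code> is a hypothetical helper.</p>

```javascript
// Hypothetical helper: links to one plot of a session in several formats.
// Assumes plots are served at SESSION/graphics/N/FORMAT.
function plotUrls(sessionLoc, plotNumber = 1) {
  const base = sessionLoc.endsWith("/") ? sessionLoc : sessionLoc + "/";
  const urls = {};
  for (const format of ["png", "pdf", "svg"]) {
    urls[format] = `${base}graphics/${plotNumber}/${format}`;
  }
  return urls;
}

// The session key below is made up.
console.log(plotUrls("/ocpu/tmp/x0123456789/"));
```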
<p>Real-world examples of apps using <code class="language-plaintext highlighter-rouge">$.rplot</code> are <a href="https://cloud.opencpu.org/ocpu/library/nabel/www/">nabel</a>, <a href="https://cloud.opencpu.org/ocpu/library/gitstats/www/">gitstats</a> and <a href="https://cloud.opencpu.org/ocpu/library/stocks/www/">stocks</a>.</p>
<h2 id="uploading-a-file">Uploading a File</h2>
<p>In many statistical applications the user needs to provide some data, often in the form of a file. When using opencpu.js, calling an R function with a file works exactly the same as calling it with any other value. Look at the source code for <a href="https://cloud.opencpu.org/ocpu/library/appdemo/www/upload.html">upload.html</a> to see this in action.</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="c1">//arguments</span>
<span class="kd">var</span> <span class="nx">myheader</span> <span class="o">=</span> <span class="nx">$</span><span class="p">(</span><span class="dl">"</span><span class="s2">#header</span><span class="dl">"</span><span class="p">).</span><span class="nx">val</span><span class="p">()</span> <span class="o">==</span> <span class="dl">"</span><span class="s2">true</span><span class="dl">"</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">myfile</span> <span class="o">=</span> <span class="nx">$</span><span class="p">(</span><span class="dl">"</span><span class="s2">#csvfile</span><span class="dl">"</span><span class="p">)[</span><span class="mi">0</span><span class="p">].</span><span class="nx">files</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="c1">//perform the request</span>
<span class="kd">var</span> <span class="nx">req</span> <span class="o">=</span> <span class="nx">ocpu</span><span class="p">.</span><span class="nx">rpc</span><span class="p">(</span><span class="dl">"</span><span class="s2">readcsvnew</span><span class="dl">"</span><span class="p">,</span> <span class="p">{</span>
<span class="na">file</span> <span class="p">:</span> <span class="nx">myfile</span><span class="p">,</span>
<span class="na">header</span> <span class="p">:</span> <span class="nx">myheader</span>
<span class="p">},</span> <span class="kd">function</span><span class="p">(</span><span class="nx">session</span><span class="p">){</span>
<span class="nx">alert</span><span class="p">(</span><span class="dl">"</span><span class="s2">success:</span><span class="se">\n</span><span class="dl">"</span> <span class="o">+</span> <span class="nx">location</span><span class="p">.</span><span class="nx">protocol</span> <span class="o">+</span> <span class="dl">"</span><span class="s2">//</span><span class="dl">"</span> <span class="o">+</span> <span class="nx">location</span><span class="p">.</span><span class="nx">host</span> <span class="o">+</span> <span class="nx">session</span><span class="p">.</span><span class="nx">getLoc</span><span class="p">())</span>
<span class="p">});</span></code></pre></figure>
<p>For any <code class="language-plaintext highlighter-rouge">&lt;input type="file"&gt;</code> HTML element, we can pass the selected file to an R function using <code class="language-plaintext highlighter-rouge">$("#id")[0].files[0]</code> (note that this requires HTML5 support). OpenCPU then copies this file to the working directory of the R process and uses the filename as the parameter value. The next section shows how we would actually use this object.</p>
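<p>In the browser the file travels in a <code>multipart/form-data</code> request. The sketch below builds such a body with the standard <code>FormData</code> and <code>Blob</code> APIs (available in browsers and in Node.js 18 and later). The CSV content and file name are made up, and encoding the non-file argument as the JSON literal <code>true</code> is an assumption about how such values are sent; the commented-out <code>fetch</code> call targets <code>utils::read.csv</code>, whose first argument would receive the uploaded file's name.</p>

```javascript
// Requires FormData and Blob (browsers, or Node.js 18+).
// Made-up CSV content standing in for the file the user picked.
const csv = "x,y\n1,2\n3,4\n";
const file = new Blob([csv], { type: "text/csv" });

const form = new FormData();
form.append("file", file, "mydata.csv"); // the uploaded file and its name
form.append("header", "true");          // assumed: non-file values sent as JSON literals

// Sending it needs a server; fetch sets the multipart boundary itself:
// fetch("https://cloud.opencpu.org/ocpu/library/utils/R/read.csv", { method: "POST", body: form });
console.log(form.get("file").name, form.get("header"));
```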
<h2 id="simulating-state-by-chaining-function-calls">Simulating state by chaining function calls</h2>
<p>So far, each example made a single R function call and displayed either its output or a plot on the page. In practice, however, your application might involve several steps: the user uploads some data, specifies variables, fits a model on the data, etc.</p>
<p>The OpenCPU API is stateless. Clients do not have a private R process and each call to the server is independent of the previous one. Instead, you introduce state by chaining function calls: the OpenCPU server stores the return object from a function call, and you can pass a reference to such an object as an argument to subsequent function calls. This might sound cumbersome at first, but it results in well-organized, scalable applications and makes asynchronous parallel requests a native feature of your application.</p>
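<p>To make the idea concrete, here is a toy, in-memory model of this design in plain JavaScript. It is not how the OpenCPU server is implemented: <code>store</code>, <code>call</code> and the example functions are hypothetical. Each call stores its result under a new key and returns the key, arguments that are keys get replaced by the stored objects, and independent calls on the same stored object can be issued together with <code>Promise.all</code>.</p>

```javascript
// Toy, in-memory model of chaining (not the real OpenCPU server).
// Every call stores its return value under a fresh key and returns that key;
// argument values that are known keys are swapped for the stored objects.
const store = new Map();
let counter = 0;

// Hypothetical "R functions" available on the toy server.
const functions = {
  readdata: ({ values }) => values,
  mean: ({ x }) => x.reduce((a, b) => a + b, 0) / x.length,
  max: ({ x }) => Math.max(...x),
};

async function call(fun, args) {
  const resolved = {};
  for (const [name, value] of Object.entries(args)) {
    resolved[name] = store.has(value) ? store.get(value) : value;
  }
  counter += 1;
  const key = "x" + counter;
  store.set(key, functions[fun](resolved));
  return key; // plays the role of a session reference
}

async function main() {
  const data = await call("readdata", { values: [2, 4, 9] });
  // Two independent calls reuse the stored data; they are issued together,
  // so a real server could process them in parallel.
  const [meanKey, maxKey] = await Promise.all([
    call("mean", { x: data }),
    call("max", { x: data }),
  ]);
  return [store.get(meanKey), store.get(maxKey)];
}

main().then(([m, mx]) => console.log(m, mx)); // 5 9
```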
<p>A simple example of this concept, which builds on the previous example, is illustrated in <a href="https://cloud.opencpu.org/ocpu/library/appdemo/www/chain.html">chain.html</a>. Because this example is a bit larger, the JavaScript code was placed in a separate file called <a href="https://cloud.opencpu.org/ocpu/library/appdemo/www/chain.js">chain.js</a>. The example starts with:</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="c1">//perform the request</span>
<span class="kd">var</span> <span class="nx">req</span> <span class="o">=</span> <span class="nx">ocpu</span><span class="p">.</span><span class="nx">call</span><span class="p">(</span><span class="dl">"</span><span class="s2">readcsvnew</span><span class="dl">"</span><span class="p">,</span> <span class="p">{</span>
<span class="na">file</span> <span class="p">:</span> <span class="nx">file</span><span class="p">,</span>
<span class="na">header</span> <span class="p">:</span> <span class="nx">header</span>
<span class="p">},</span> <span class="kd">function</span><span class="p">(</span><span class="nx">session</span><span class="p">){</span>
<span class="c1">//on success call printsummary()</span>
<span class="nx">printsummary</span><span class="p">(</span><span class="nx">session</span><span class="p">);</span>
<span class="p">});</span></code></pre></figure>
<p>This looks very similar to before: <code class="language-plaintext highlighter-rouge">ocpu.call</code> is used to call the R function <a href="https://cloud.opencpu.org/ocpu/library/appdemo/R/readcsvnew">readcsvnew</a>. However, this time the callback function calls another function by passing on the reference to the object returned by <code class="language-plaintext highlighter-rouge">readcsvnew</code> (which we called <code class="language-plaintext highlighter-rouge">session</code> in this example). The <code class="language-plaintext highlighter-rouge">printsummary</code> JavaScript function then uses this object for the argument <code class="language-plaintext highlighter-rouge">mydata</code> when calling the R function <a href="https://cloud.opencpu.org/ocpu/library/appdemo/R/printsummary">printsummary</a>:</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="kd">function</span> <span class="nx">printsummary</span><span class="p">(</span><span class="nx">mydata</span><span class="p">){</span>
<span class="c1">//perform the request</span>
<span class="kd">var</span> <span class="nx">req</span> <span class="o">=</span> <span class="nx">ocpu</span><span class="p">.</span><span class="nx">call</span><span class="p">(</span><span class="dl">"</span><span class="s2">printsummary</span><span class="dl">"</span><span class="p">,</span> <span class="p">{</span>
<span class="na">mydata</span> <span class="p">:</span> <span class="nx">mydata</span>
<span class="p">},</span> <span class="kd">function</span><span class="p">(</span><span class="nx">session</span><span class="p">){</span>
<span class="kd">var</span> <span class="nx">url</span> <span class="o">=</span> <span class="nx">session</span><span class="p">.</span><span class="nx">getLoc</span><span class="p">()</span> <span class="o">+</span> <span class="dl">"</span><span class="s2">console/text</span><span class="dl">"</span><span class="p">;</span>
<span class="nx">downloadfile</span><span class="p">(</span><span class="nx">url</span><span class="p">);</span>
<span class="p">}).</span><span class="nx">fail</span><span class="p">(</span><span class="kd">function</span><span class="p">(){</span>
<span class="nx">alert</span><span class="p">(</span><span class="dl">"</span><span class="s2">Server error: </span><span class="dl">"</span> <span class="o">+</span> <span class="nx">req</span><span class="p">.</span><span class="nx">responseText</span><span class="p">);</span>
<span class="p">});</span>
<span class="p">}</span></code></pre></figure>
<p>This illustrates the concept of function chaining: we can keep calling new functions and pass the output of previous calls as arguments. To see a real-world example of this, try the <a href="https://cloud.opencpu.org/ocpu/library/mapapp/www/">mapapp</a> OpenCPU app.</p>
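<p>At the HTTP level, a stored object can be referenced by its session key. The sketch below assumes that the key is the last path segment of the session location (here a made-up <code>/ocpu/tmp/x0123456789/</code>) and that the server accepts such a key as a url-encoded argument value; <code>sessionKey</code> and <code>chainedBody</code> are hypothetical helpers.</p>

```javascript
// Hypothetical helpers. Assumption: the session key is the last path segment
// of the session location, e.g. "/ocpu/tmp/x0123456789/" (a made-up key).
function sessionKey(sessionLoc) {
  const parts = sessionLoc.split("/").filter((part) => part !== "");
  return parts[parts.length - 1];
}

// Url-encoded body that passes the stored object as the value of argName.
function chainedBody(argName, sessionLoc) {
  const params = new URLSearchParams();
  params.append(argName, sessionKey(sessionLoc));
  return params.toString();
}

// POST this body to, e.g., /ocpu/library/appdemo/R/printsummary
console.log(chainedBody("mydata", "/ocpu/tmp/x0123456789/")); // mydata=x0123456789
```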
Implementing a DPU with OpenCPU2013-09-11T00:00:00+00:00https://www.opencpu.org/posts/implementing-data-processing-units-with-opencpu
<a href="https://www.opencpu.org/posts/implementing-data-processing-units-with-opencpu"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>One of the prime use cases in the design of OpenCPU has been the “Data Processing Unit”, for short: DPU.
A DPU is a modular, stateless data I/O unit which is called remotely by other software.
In the <a href="http://openmhealth.org/developers/key-architectual-abstractions/">OpenMHealth architecture</a>
a DPU must use JSON for data input and output, and is called over HTTPS. Below are two simple examples.</p>
<h2 id="basic-example">Basic example</h2>
<p>Suppose your software needs to calculate a correlation between two vectors. In R we would use the <code class="language-plaintext highlighter-rouge">cor</code> function from the <code class="language-plaintext highlighter-rouge">stats</code>
package to do this:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="o">></span><span class="w"> </span><span class="n">cor</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">3</span><span class="p">,</span><span class="m">4</span><span class="p">,</span><span class="m">5</span><span class="p">),</span><span class="w"> </span><span class="n">y</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">3</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">5</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">2</span><span class="p">));</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="m">-0.1042572</span></code></pre></figure>
<p>Using OpenCPU we can perform the same function call remotely just as easily:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">curl https://cloud.opencpu.org/ocpu/library/stats/R/cor/json <span class="nt">-d</span> <span class="s1">'x=[1,2,3,4,5]&y=[3,1,5,2,2]'</span>
<span class="o">[</span>
<span class="nt">-0</span>.10426
<span class="o">]</span></code></pre></figure>
<p>We can go full JSON by setting the request <code class="language-plaintext highlighter-rouge">Content-Type</code> to <code class="language-plaintext highlighter-rouge">application/json</code>. This is an equivalent request and yields the same output.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">curl https://cloud.opencpu.org/ocpu/library/stats/R/cor/json <span class="nt">-H</span> <span class="s2">"Content-Type: application/json"</span> <span class="nt">-d</span> <span class="s1">'{"x":[1,2,3,4,5],"y":[3,1,5,2,2]}'</span></code></pre></figure>
<p>Note that <code class="language-plaintext highlighter-rouge">curl</code> is used here for illustration only; your actual application could use whatever HTTP client library is available for the programming language at hand.</p>
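<p>As a quick check of the number above, Pearson's correlation can be computed directly. This local JavaScript sketch illustrates the formula rather than R's implementation, and should agree with the value returned by the remote <code>cor()</code> call to the printed precision.</p>

```javascript
// Pearson correlation: covariance divided by the product of standard deviations.
function pearson(x, y) {
  const n = x.length;
  const mx = x.reduce((a, b) => a + b, 0) / n;
  const my = y.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

console.log(pearson([1, 2, 3, 4, 5], [3, 1, 5, 2, 2]).toFixed(7)); // -0.1042572
```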
<h2 id="another-example">Another example</h2>
<p>One real application for OpenMHealth required calculating the total distance along a set of longitude-latitude coordinates recorded by a mobile device.
Wikipedia tells us that the great-circle distance between two points on a sphere can be calculated from their longitudes and latitudes using the <a href="http://en.wikipedia.org/wiki/Haversine_formula">Haversine formula</a>.
The geosphere package has a function <code class="language-plaintext highlighter-rouge">distHaversine</code> <a href="https://cloud.opencpu.org/ocpu/library/geosphere/man/distHaversine/text">(help)</a>, <a href="https://cloud.opencpu.org/ocpu/library/geosphere/R/distHaversine/print">(source)</a> which implements this equation.</p>
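<p>For reference, the Haversine formula gives the distance d between two points with latitudes φ1, φ2 and longitudes λ1, λ2 (in radians) on a sphere of radius r:</p>
<figure class="highlight"><pre><code class="language-latex" data-lang="latex">a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
  + \cos\varphi_1 \, \cos\varphi_2 \, \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right)

d = 2r \arcsin\left(\sqrt{a}\right)</code></pre></figure>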
<p>So we created the <a href="https://github.com/openmhealth/dpu.mobility/">dpu.mobility</a> package with a function <code class="language-plaintext highlighter-rouge">geodistance</code> <a href="https://cloud.opencpu.org/ocpu/library/dpu.mobility/man/geodistance/text">(help)</a>, <a href="https://cloud.opencpu.org/ocpu/library/dpu.mobility/R/geodistance/print">(source)</a> that iterates
over the sequence of locations to calculate the total distance traveled between consecutive points. We also added an option to smooth away outliers (caused by noisy GPS signals).
Now to calculate the distance from NYC to LA and back:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">curl https://cloud.opencpu.org/ocpu/library/dpu.mobility/R/geodistance/json <span class="nt">-H</span> <span class="s2">"Content-Type: application/json"</span> <span class="nt">-d</span> <span class="s1">'{"long":[-74.0064,-118.2430,-74.0064],"lat":[40.7142,34.0522,40.7142]}'</span></code></pre></figure>
<p>Or in miles:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">curl https://cloud.opencpu.org/ocpu/library/dpu.mobility/R/geodistance/json <span class="nt">-H</span> <span class="s2">"Content-Type: application/json"</span> <span class="nt">-d</span> <span class="s1">'{"long":[-74.0064,-118.2430,-74.0064],"lat":[40.7142,34.0522,40.7142],"unit":"miles"}'</span></code></pre></figure>
<h2 id="when-to-use-a-dpu">When to use a DPU</h2>
<p>So how is this useful? Suppose you are building a system or application and would like to embed some statistical functionality.
One solution is to implement the required statistical methods yourself in the language at hand.
However, for complex methods this is time-consuming, and your code might not be as reliable as what is available in R.
Another solution is to call R directly from the application language, using a bridge such as RInside, JRI or rpy2.
This might work, but managing R sessions, error handling, security, data I/O, etc. can be painful. And if the R session crashes, so does your application.
Furthermore, this means that each installation of the application must have a local copy of R and all required packages installed, which quickly becomes a maintenance nightmare.</p>
<p>If what you are doing fits the DPU paradigm, this can make for a more elegant design.
Most programming languages these days know their way around HTTP(S) and JSON.
Implement your statistical methods simply as an R function and have OpenCPU deal with management of sessions, security, JSON, etc.
A single OpenCPU cloud server serves all your application instances/users, which is cheaper and easier to maintain.
The cloud server might considerably improve performance by caching requests. And if your application becomes popular and you need to scale up to serve many simultaneous requests per second, you just put an HTTP load balancer in front of multiple back-end servers. No need to change any code :-)</p>
Knitr/Markdown OpenCPU App 2013-08-30T00:00:00+00:00 https://www.opencpu.org/posts/knitr-markdown-opencpu-app
<a href="https://www.opencpu.org/posts/knitr-markdown-opencpu-app"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>A new little OpenCPU app lets you use knitr and markdown right in the browser.
It has a fancy pants code editor that automatically updates the output after 3 seconds of inactivity.
It uses the <a href="http://ace.c9.io/">Ace</a> web editor with <a href="https://github.com/ajaxorg/ace-builds/blob/master/src/mode-r.js"><code>mode-r.js</code></a> (thanks to RStudio for making the latter available).</p>
<p>Like all OpenCPU apps, the source package lives in the <a href="https://github.com/opencpu">opencpu organization</a> on GitHub.
You can try it out on the <a href="https://cloud.opencpu.org/apps.html">public cloud server</a>, or run it locally:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#install the package</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span><span class="w">
</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"markdownapp"</span><span class="p">,</span><span class="w"> </span><span class="s2">"opencpu"</span><span class="p">)</span><span class="w">
</span><span class="c1">#open it in opencpu</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">opencpu</span><span class="p">)</span><span class="w">
</span><span class="n">opencpu</span><span class="o">$</span><span class="n">browse</span><span class="p">(</span><span class="s2">"/library/markdownapp/www"</span><span class="p">)</span></code></pre></figure>
<p>The app uses the knitr R package and some standard JavaScript libraries.
What remains is a <a href="https://github.com/opencpu/markdownapp/blob/master/inst/www/index.html">few lines of JavaScript</a>
to call OpenCPU when the editor is inactive. The entire app was created in about an hour. Feel free to fork and modify :-)</p>
OpenCPU 1.0 release! 2013-08-27T00:00:00+00:00 https://www.opencpu.org/posts/opencpu-release-1.0
<a href="https://www.opencpu.org/posts/opencpu-release-1.0"><img alt="opencpu logo" src="https://www.opencpu.org/images/mediumlogo.jpg"></a>
<p>After more than 3 years of development, we are releasing the first official version of the OpenCPU system.
Based on feedback and experiences from the beta series, OpenCPU version 1.0 has been rewritten entirely from scratch.
The result is a simple and flexible API that is easier to understand yet more powerful than before.</p>
<p>With the new release also comes a new website and blog, in which we will post tutorials and examples over the coming weeks and months.
This first post features some highlights to get your feet wet.</p>
<h2 id="the-package-api">The package API</h2>
<p>Try opening these URLs in your browser to explore objects and manuals (help pages) from a package:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://cloud.opencpu.org/ocpu/library/
https://cloud.opencpu.org/ocpu/library/ggplot2/
https://cloud.opencpu.org/ocpu/library/ggplot2/R/
https://cloud.opencpu.org/ocpu/library/ggplot2/R/diamonds
https://cloud.opencpu.org/ocpu/library/ggplot2/R/mpg/json
https://cloud.opencpu.org/ocpu/library/ggplot2/R/mpg/csv
https://cloud.opencpu.org/ocpu/library/ggplot2/R/mpg/rda
https://cloud.opencpu.org/ocpu/library/ggplot2/R/qplot
https://cloud.opencpu.org/ocpu/library/ggplot2/man/
https://cloud.opencpu.org/ocpu/library/ggplot2/man/qplot/text
https://cloud.opencpu.org/ocpu/library/ggplot2/man/qplot/html
https://cloud.opencpu.org/ocpu/library/ggplot2/man/qplot/pdf
</code></pre></div></div>
<p>Or access the static files in a package:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://cloud.opencpu.org/ocpu/library/MASS/DESCRIPTION
https://cloud.opencpu.org/ocpu/library/MASS/NEWS
https://cloud.opencpu.org/ocpu/library/MASS/scripts/
https://cloud.opencpu.org/ocpu/library/MASS/scripts/ch01.R
</code></pre></div></div>
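<p>Because these URLs follow a regular pattern (the package, then <code class="language-plaintext highlighter-rouge">R</code>, <code class="language-plaintext highlighter-rouge">man</code> or a file path, then an optional output format), they are easy to assemble programmatically. A small sketch in base R; <code class="language-plaintext highlighter-rouge">ocpu_url</code> is a hypothetical helper, and the base URL is an assumption you would point at your own server:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical helper: join path segments onto an OpenCPU package URL.
# The default base_url is the public cloud server used in the examples above.
ocpu_url = function(pkg, ..., base_url = "https://cloud.opencpu.org/ocpu") {
  paste(c(base_url, "library", pkg, ...), collapse = "/")
}

ocpu_url("ggplot2", "R", "mpg", "csv")       # a data set, as CSV
ocpu_url("ggplot2", "man", "qplot", "text")  # a help page, as plain text
ocpu_url("MASS", "scripts", "ch01.R")        # a static file

# Reading the CSV representation straight into R (needs network access, not run here):
# mpg = read.csv(ocpu_url("ggplot2", "R", "mpg", "csv"))</code></pre></figure>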
<h2 id="external-repositories">External Repositories</h2>
<p>The <code class="language-plaintext highlighter-rouge">/ocpu/library/</code> API interfaces to packages which are installed in the global library on the server.
Want to try another package? With a little extra patience, you can open any package straight from cran or github:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://cloud.opencpu.org/ocpu/cran/JJcorr/
https://cloud.opencpu.org/ocpu/github/hadley/plyr/
</code></pre></div></div>
<p>Of course, this only works if the package installs successfully. When a package from an external repository
is accessed for the first time, the request might take quite a while because the package is installed on the fly. But once
it is working, you can use it just like packages installed on the server. For example, this URL opens the gitstats app straight from GitHub:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://cloud.opencpu.org/ocpu/github/jeroenooms/gitstats/www/
</code></pre></div></div>
<p>This way you can share your own packages and apps without hosting a personal OpenCPU cloud server.</p>
<h2 id="running-a-function--script">Running a function / script</h2>
<p>The core feature of OpenCPU is the ability to call functions and run scripts (including Sweave/knitr scripts).
To get started, you can use the <a href="https://cloud.opencpu.org/ocpu/test/">testing page</a> to poke around in the API.
Alternatively use <code class="language-plaintext highlighter-rouge">curl</code> to call OpenCPU from your command line:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c">#run a script</span>
curl <span class="nt">-X</span> POST https://cloud.opencpu.org/ocpu/library/MASS/scripts/ch01.R
<span class="c">#call a function</span>
curl https://cloud.opencpu.org/ocpu/library/stats/R/rnorm <span class="nt">-d</span> <span class="s1">'n=10&mean=5'</span></code></pre></figure>
<p>A successful POST returns an HTTP 201 (Created) response whose <code class="language-plaintext highlighter-rouge">Location</code> header indicates where to retrieve the results of the execution (objects, graphics, files, stdout, etc.).</p>
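<p>The same two-step flow (POST to create a session, then GET a result from it) can be sketched in R. This assumes the <code class="language-plaintext highlighter-rouge">curl</code> R package, that the <code class="language-plaintext highlighter-rouge">Location</code> header holds an absolute URL, and that the function's return value lives under <code class="language-plaintext highlighter-rouge">R/.val</code> in the session. <code class="language-plaintext highlighter-rouge">session_value_url</code> and <code class="language-plaintext highlighter-rouge">ocpu_post_then_get</code> are hypothetical helpers.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical helper: given the session URL from the Location header of a
# 201 response, build the URL of the function's return value as JSON.
session_value_url = function(location) {
  paste0(sub("/$", "", location), "/R/.val/json")
}

# Hypothetical helper: POST form fields to a function, require HTTP 201,
# then GET the return value. Needs the 'curl' package and network access.
ocpu_post_then_get = function(url, ...) {
  h = curl::new_handle()
  curl::handle_setform(h, ...)  # fields are sent as multipart/form-data
  res = curl::curl_fetch_memory(url, handle = h)
  if (res$status_code != 201) stop("expected HTTP 201, got ", res$status_code)
  location = curl::parse_headers_list(res$headers)[["location"]]
  rawToChar(curl::curl_fetch_memory(session_value_url(location))$content)
}

# Usage (not run here), mirroring the rnorm example above:
# ocpu_post_then_get("https://cloud.opencpu.org/ocpu/library/stats/R/rnorm", n = "10", mean = "5")</code></pre></figure>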
<h2 id="opencpu-apps">OpenCPU Apps</h2>
<p>One of the major improvements in OpenCPU 1.0 is better support for apps.
An OpenCPU app is an R package which includes some web page(s) that call the R functions in the package using the OpenCPU API.
This makes for a convenient way to develop, package and ship standalone R web applications.
Have a look at the <a href="https://cloud.opencpu.org/apps.html">example apps</a>.</p>
<h2 id="the-single-user-server">The single-user server</h2>
<p>OpenCPU 1.0 is available both as a cloud server and as a single-user server. The latter runs inside an interactive
R session and is used to run and develop local apps.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">install.packages</span><span class="p">(</span><span class="s2">"opencpu"</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">opencpu</span><span class="p">)</span></code></pre></figure>
<p>After installing OpenCPU, we install apps just like we would install a package:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span><span class="w">
</span><span class="c1">#gitstats app</span><span class="w">
</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"gitstats"</span><span class="p">,</span><span class="w"> </span><span class="s2">"opencpu"</span><span class="p">)</span><span class="w">
</span><span class="n">opencpu</span><span class="o">$</span><span class="n">browse</span><span class="p">(</span><span class="s2">"/library/gitstats/www"</span><span class="p">)</span><span class="w">
</span><span class="c1">#stocks app</span><span class="w">
</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"stocks"</span><span class="p">,</span><span class="w"> </span><span class="s2">"opencpu"</span><span class="p">)</span><span class="w">
</span><span class="n">opencpu</span><span class="o">$</span><span class="n">browse</span><span class="p">(</span><span class="s2">"/library/stocks/www"</span><span class="p">)</span><span class="w">
</span><span class="c1">#nabel app</span><span class="w">
</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"nabel"</span><span class="p">,</span><span class="w"> </span><span class="s2">"opencpu"</span><span class="p">)</span><span class="w">
</span><span class="n">opencpu</span><span class="o">$</span><span class="n">browse</span><span class="p">(</span><span class="s2">"/library/nabel/www"</span><span class="p">)</span></code></pre></figure>