Why Use Docker with R? A DevOps Perspective
October 16, 2017
There have been several blog posts going around about why one would use Docker with R. In this post I’ll try to add a DevOps point of view and explain how containerizing R is used in the context of the OpenCPU system for building and deploying R servers.
Has anyone in the #rstats world written really well about the *why* of their use of Docker, as opposed to the the *how*?
Jenny Bryan (@JennyBryan), September 29, 2017
1: Easy Development
The flagship of the OpenCPU system is the OpenCPU server: a mature and powerful Linux stack for embedding R in systems and applications. Because OpenCPU is completely open source, we can build it and ship it on DockerHub. A ready-to-go Linux server with both OpenCPU and RStudio can be started with the following command (use port 8004 or 80):
docker run -t -p 8004:8004 opencpu/rstudio
Now simply open http://localhost:8004/ocpu/ and http://localhost:8004/rstudio/ in your browser! Log in to RStudio as user opencpu to build or install apps. See the readme for more info.
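The container needs a few seconds to boot, so a script that talks to it right after docker run may want to wait until the server answers. The Python sketch below polls the /ocpu/ URL from the paragraph above until it returns HTTP 200 or a timeout expires. The helper name wait_for_ocpu and its defaults are hypothetical illustrations, not part of OpenCPU; the opener parameter is injectable only so the logic can be tried without a running container.

```python
import time
import urllib.request


def wait_for_ocpu(base_url="http://localhost:8004", timeout=60.0, interval=1.0,
                  opener=urllib.request.urlopen):
    """Poll <base_url>/ocpu/ until it answers with HTTP 200.

    Hypothetical helper: returns True once the server responds, or False if
    `timeout` seconds pass first.
    """
    url = base_url.rstrip("/") + "/ocpu/"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused/reset while the container is still starting.
            # urllib.error.URLError and HTTPError are subclasses of OSError.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, wait_for_ocpu() right after docker run returns True once http://localhost:8004/ocpu/ is reachable.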
Docker makes it easy to get started with OpenCPU. The container gives you the full flexibility of a Linux box, without the need to install anything on your system. You can install packages or apps via RStudio Server, or use docker exec to drop into a root shell on the running server:
# Look up the container ID
docker ps
# Drop into a shell (replace eec1cdae3228 with the ID shown by docker ps)
docker exec -i -t eec1cdae3228 /bin/bash
From the shell you can install additional software on the server, customize the apache2 httpd config (auth, proxies, etc.), tweak R options, optimize performance by preloading data or packages, and more.
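Copying the container ID from docker ps by hand works interactively; for scripted changes, the lookup and a one-off command can be automated. This Python sketch assumes the container was started from the opencpu/rstudio image shown earlier. It relies on the real docker ps options --filter ancestor=IMAGE and --format "{{.ID}}", while the helper names find_container and exec_in_container are hypothetical.

```python
import subprocess


def find_container(image, run=subprocess.run):
    """Return the ID of the first running container started from `image`, or None."""
    result = run(
        ["docker", "ps", "--filter", f"ancestor={image}", "--format", "{{.ID}}"],
        capture_output=True, text=True, check=True,
    )
    ids = result.stdout.split()
    return ids[0] if ids else None


def exec_in_container(container_id, command, run=subprocess.run):
    """Run `command` (a list of args) inside the container and return its stdout.

    Unlike the interactive `docker exec -i -t ... /bin/bash`, no TTY is
    requested, so this also works from scripts and CI jobs.
    """
    result = run(
        ["docker", "exec", container_id, *command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For instance, exec_in_container(find_container("opencpu/rstudio"), ["R", "-e", "install.packages('ggplot2', repos = 'https://cloud.r-project.org')"]) would install a CRAN package into the server's R library, assuming the container's default user may write to it.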
2: Shipping and Deployment via DockerHub
The most powerful use of Docker is shipping and deploying applications via DockerHub. To create a fully standalone application container, simply start from a standard opencpu image and add your app.
For the purpose of this blog post I have wrapped up some of the example apps as Docker containers by adding a very simple Dockerfile to each repository. For example, the nabel app has a Dockerfile that contains the following:
FROM opencpu/base
RUN R -e 'devtools::install_github("rwebapps/nabel")'
It takes the standard opencpu/base image and then installs the nabel app from its GitHub repository. The result is a completely isolated, standalone application, which anyone can start using, e.g.:
docker run -d -p 8004:8004 rwebapps/nabel
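The -d flag runs the container in the background (detached mode), and -p 8004:8004 publishes the server on port 8004 of your machine. Now open the app via http://localhost:8004/ocpu/library/nabel. Obviously you can tweak the Dockerfile to install whatever extra software or settings you need for your application.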
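Besides the browser UI, functions in installed packages can also be called over OpenCPU's HTTP API: a POST to /ocpu/library/{package}/R/{function} calls the function, and appending /json returns its value as JSON. The Python sketch below builds such a request with the arguments sent as a JSON object. It uses stats::rnorm as a stand-in because the nabel app's own functions are not described here; the helper names ocpu_function_url and ocpu_call are hypothetical.

```python
import json
import urllib.request


def ocpu_function_url(base_url, package, function):
    """URL that calls `package::function` and returns its value as JSON."""
    return f"{base_url.rstrip('/')}/ocpu/library/{package}/R/{function}/json"


def ocpu_call(base_url, package, function, args, opener=urllib.request.urlopen):
    """POST `args` as a JSON object of named arguments and decode the JSON reply."""
    request = urllib.request.Request(
        ocpu_function_url(base_url, package, function),
        data=json.dumps(args).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the nabel container running, ocpu_call("http://localhost:8004", "stats", "rnorm", {"n": 3}) should return a list of three random numbers.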
Containerized deployment shows the true power of Docker: it allows for shipping fully self-contained applications that work out of the box, without installing any software or relying on paid hosting services. If you do prefer professional hosting, there are many companies that will gladly host Docker applications for you on scalable infrastructure.
3: Cross Platform Building
There is a third way Docker is used for OpenCPU. At each release we build opencpu-server installation packages for half a dozen operating systems, which get published on https://archive.opencpu.org.
This process has been fully automated using DockerHub. The following images automatically build the entire stack from source:
DockerHub automatically rebuilds these images when a new release is published on GitHub. All that is left to do is run a script which pulls down the images and copies the opencpu-server binaries to the archive server.
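A script of that kind could look roughly like the Python sketch below: pull a builder image, create a stopped container from it, copy the build output out with docker cp, and remove the container again. This is an illustration under assumptions, not the actual OpenCPU release script: the image name, the in-image path and the helper name extract_binaries are hypothetical, and the upload to the archive server is left out.

```python
import subprocess


def extract_binaries(image, src_path, dest_dir, run=subprocess.run):
    """Copy `src_path` out of `image` into `dest_dir` without running the image.

    `docker create` only instantiates a stopped container, which is enough
    for `docker cp`. It assumes the image defines a default CMD (most base
    images do). The container is removed even if the copy fails.
    """
    run(["docker", "pull", image], check=True)
    created = run(["docker", "create", image],
                  capture_output=True, text=True, check=True)
    container_id = created.stdout.strip()
    try:
        run(["docker", "cp", f"{container_id}:{src_path}", dest_dir], check=True)
    finally:
        run(["docker", "rm", container_id], check=True)
    return container_id
```

Calling it once per builder image, e.g. extract_binaries("example/opencpu-build", "/build/output", "dist") with hypothetical names, gathers the packages locally; pushing them to the archive (for instance with rsync) would be a separate step.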