
Setting up RStudio Server quickly on Amazon EC2

I have recently been working on projects using Amazon EC2 (Elastic Compute Cloud) and RStudio Server. I thought I would share some of my working notes.

Amazon EC2 supplies near instant access to on-demand disposable computing in a variety of sizes (billed in hours). RStudio Server supplies an interactive user interface to your remote R environment that is nearly indistinguishable from a local RStudio console. The idea is: for a few dollars you can work interactively on R tasks requiring hundreds of GB of memory and tens of CPUs and GPUs.

If you are already an Amazon EC2 user with some Unix experience it is very easy to quickly stand up a powerful R environment, which is what I will demonstrate in this note.

To follow these notes you must already have an Amazon EC2 account, some experience using the AWS (Amazon Web Services) console, and some experience managing ssh key pairs for use with EC2. Start by noting the path where you have stored the private half of your ssh key pair (your key should always be stored somewhere safe, such as a private and encrypted disk volume; if you don’t have a key pair you will be prompted to create one during machine launch).
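One practical detail about key storage: OpenSSH refuses to use a private key file that other users can read. The following is a minimal sketch of a check that tightens the permissions first. The function name `secure_key` and the example path are hypothetical and are not part of the script used in this note.

```shell
# Hypothetical helper (not part of the original script): make a private key
# file readable only by its owner, as ssh expects.
secure_key() {
  local pempath="$1"
  if [ ! -f "$pempath" ]; then
    echo "key file not found: $pempath" >&2
    return 1
  fi
  chmod 400 "$pempath"
}

# Example use with a placeholder path (replace with your own key path):
# secure_key "$HOME/.ssh/my-ec2-key.pem"
```

Running this once before the first connection avoids the "unprotected private key file" refusal from ssh.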

You can set up an RStudio Server instance as follows.

Choose Ubuntu Server as your AMI (Amazon Machine Image) type. This step chooses your operating system. The script we will use was developed for apt package management, so we suggest using the Ubuntu operating system.


Then choose your hardware or machine type. I will show a t2.micro instance (1 virtual CPU, 1 GB memory), but there are a lot of bigger machine types available (including up to 96 vCPUs, hundreds of GB of memory, and GPU compute instances specialized for deep learning tasks).


Now copy the IPv4 DNS name Amazon assigns to your instance (as shown below).


In our case we have:

  • Path to key: /Volumes/Private/Accounts/wvdbkp.pem.txt
  • IPv4 DNS name:

Now download and run the following bash script on your local (or client) machine (this is assuming you have a bash shell and Unix components, which are available on OSX, Linux, BSD, and even Windows; we have only tested this from an OSX client).

In our case we run the script in a bash shell with the arguments as follows:

   bash confEc2RServer.bash 

You would run the script with your own ssh key path and your server IPv4 DNS name.

The script combines some ideas from Deep Learning with R, François Chollet with J. J. Allaire, Manning 2018 and Jeremy Howard’s Practical Deep Learning For Coders with our own experiences working with EC2. The script will produce output for about 3 minutes. When it stops producing output (but is still running), it has switched from installation to running an ssh tunnel that directs web requests targeting your local machine to appear as local requests on the remote server.
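The tunnel described above is ssh local port forwarding. As a rough sketch of the idea (not the script's exact command), the helper below only prints such a command so it can be inspected before use. It assumes RStudio Server's default port 8787 and the Ubuntu AMI's default login user `ubuntu`; the function name `tunnel_cmd` and the example host name are hypothetical.

```shell
# Sketch only: print (do not run) an ssh local port forwarding command.
# Assumptions: port 8787 (RStudio Server default) and login user "ubuntu"
# (Ubuntu AMI default). The function name and argument order are hypothetical.
tunnel_cmd() {
  local pempath="$1"
  local ec2target="$2"
  local port="${3:-8787}"
  # -N: run no remote command; -L: forward a local port to the remote side.
  printf 'ssh -i "%s" -N -L %s:localhost:%s ubuntu@%s\n' \
    "$pempath" "$port" "$port" "$ec2target"
}

# Placeholder values for illustration:
tunnel_cmd "$HOME/.ssh/my-ec2-key.pem" "ec2-host.example.com"
```

In the `-L` argument, `localhost` is resolved on the server, which is why the forwarded requests look local to the remote machine.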

At this point you direct your web browser to and log in with the user name “ruser” and password “ruser”.

You should now see an RStudio Server console and be ready to work.


The instance type we have configured includes a local PostgreSQL database (user name “ruser” and password “ruser”). Both the web server and the database default to only accepting connections that are considered “local” by the remote server. We are able to access the RStudio Server console through our ssh tunnel, and the database is only available to processes local to the server. If you end the script you are running on your client machine, you close the ssh tunnel (and lose access to the remote server). We haven’t configured any GPU features such as CUDA, TensorFlow, or Keras (however, that is just a matter of picking a deep learning AMI and adding a couple of steps, which are available from the appendix of Deep Learning with R).

And that is it. Don’t forget to copy results off the server when you are done and to dispose of the server by moving its instance state to “Terminate” in the AWS console (this frees the virtual machine, and usually destroys all storage associated with the machine; this is critical to do so that you don’t experience continuing fees for a machine you are done with).
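If you prefer the command line to the console, termination can also be done with the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is installed and configured on your client machine; the function name `terminate_instance`, the `DRY_RUN` variable, and the example instance id are hypothetical. By default it only prints the command, since termination cannot be undone.

```shell
# Sketch only: terminate an EC2 instance by id using the AWS CLI.
# DRY_RUN (hypothetical variable) defaults to 1, which prints the command
# instead of running it.
terminate_instance() {
  local instance_id="$1"
  case "$instance_id" in
    i-*) ;;  # looks like an instance id
    *) echo "not an instance id: $instance_id" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "aws ec2 terminate-instances --instance-ids $instance_id"
  else
    aws ec2 terminate-instances --instance-ids "$instance_id"
  fi
}

# Placeholder instance id for illustration:
terminate_instance "i-0123456789abcdef0"
```

Setting `DRY_RUN=0` runs the real command; copy your results off the server before doing so.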




Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

6 replies

  1. After logging into RStudio the first steps I usually take are: fix some RStudio preferences (such as not saving/loading workspaces and no in-line Markdown results), update the packages, install some packages, and confirm I can connect to the local PostgreSQL database:

    update.packages(ask = FALSE, checkBuilt = TRUE)
    install.packages(c("DBI", "RPostgres"))
    db <- DBI::dbConnect(RPostgres::Postgres(),
                         host = 'localhost',
                         port = 5432,
                         user = 'ruser',
                         password = 'ruser')
    # <PqConnection> ruser@localhost:5432

    1. That is a neat resource, thanks.

      I took a quick look at the bootstrap script and it looks really neat (I downloaded it by changing the S3 path s3://aws-bigdata-blog/artifacts/aws-blog-emr-rstudio-sparklyr/ to an https path: ).

      It also referenced some interesting projects: and .

  2. A note. If you move your server state to “STOP” instead of “TERMINATE” it can be restarted in the state it was left in (typically with a different public IP address and public DNS name unless you have attached an Elastic IP, so re-check the console). You will still accrue some charges for storage. When restarting such a machine you don’t re-run the whole script, just the final portion:

    ssh -i "${pempath}" \
      -o UserKnownHostsFile=/dev/null \
      -o StrictHostKeyChecking=no -N -L 8787: \

    Where ${pempath} is replaced with the path to your SSH key, and ${ec2target} is replaced with the server’s DNS name.

  3. Note one should always be worried about security.

    However, the weak passwords used here should not be a problem, as the default configuration of Ubuntu ssh requires publickey authentication (it does not accept passwords from remote connections), PostgreSQL defaults to only accepting local connections, and we have re-configured RStudio Server to only accept local connections (obviously other services can be issues). The caching actions of ssh-agent will obscure this on your machine, but other machines attempting to connect with the “ruser” password will see something like the following:

    $ ssh
    Permission denied (publickey).
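
The publickey-only claim can also be checked on the server itself. As a minimal sketch, `sshd -T` prints the effective sshd configuration, and the helper below reads that output on standard input and succeeds only if password logins are disabled. The function name `password_auth_disabled` is hypothetical.

```shell
# Hypothetical helper: read sshd's effective configuration (as printed by
# `sudo sshd -T`) on stdin; succeed only if password logins are disabled.
password_auth_disabled() {
  grep -qiE '^passwordauthentication[[:space:]]+no[[:space:]]*$'
}

# Example use on the server (requires sudo):
# sudo sshd -T | password_auth_disabled && echo "password logins disabled"
```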