quickstart

Author

Dhrumil Mehta

Python + R

A workflow for using Python and R in the same quarto notebook!

GitHub Repo: https://github.com/dmil/quarto-quickstart

Prefer to work in Jupyter notebooks? Check out this alternate setup that allows you to use Python and R together in a Jupyter notebook.

Installation

When you run this notebook, it should ask to install the reticulate package. Once that is installed correctly, you should be able to run both R and Python cells.

⚠️ Caveat: to make this work, you may need to re-install Python.

This setup only works if Python was installed with the --enable-shared option. I had to first uninstall python and then re-install it with the --enable-shared option. Since I use pyenv, I installed it as follows:

env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 3.11.2

Setup

Note: The following cell creates a python virtualenv, installs some packages (like pandas and scikit-learn).

https://rstudio.github.io/reticulate/articles/versions.html

library(reticulate)

venv_path <- paste0(getwd(), '/.venv')

# create virtualenv and install python libraries
python_libraries <- c("pandas", "scikit-learn")
virtualenv_create(venv_path, packages=python_libraries)

# use virtualenv
use_virtualenv(venv_path, required=TRUE)

Running Code

When you click the Render button a document will be generated that includes both content and the output of embedded code.

You can embed R code like this:

# Imports in R
require('tidyverse')

And python code like this:

# Imports in Python
import pandas as pd
import numpy as np

Sharing variables between Python and R

The following code defines a dataframe df in Python with random data points.

# create a dataframe of random numbers with two columns, A and B
df = pd.DataFrame(
    np.random.randint(0,100,size=(100, 2)), columns=list('AB'))

# display first 5 rows
df.head(5)
    A   B
0  55  88
1  42   0
2  81  86
3   9  26
4  99  12

The following is an R cell. The df variable defined above is available as py$df. Python variables should be accessible in R prefixed with py$ .

title <- "Random points plotted"

ggplot(py$df) +
    aes(x=A, y=B) +
    geom_point() +
    theme_minimal() +
    ggtitle(title)

R variables should be accessible in Python prefixed with r.

r.title
'Random points plotted'

See the reticulate cheat sheet for more details on how to use Python and R together.

Why I like this setup

  • It lets me do some things in Python (hitting APIs, scraping, etc)

  • It lets do other things in R (statistical calculations, visualization, etc)

  • I have found it useful to learn R (or Python) while on deadline. When I was learning R it allowed me to switch back to Python if I was stuck on a step and the deadline was approaching, and then switch back to R to finish up.

  • I have also found it helpful for teaching R to students who are already familiar with Python. I’m sure it can also be used well for the reverse.

  • If your code starts to get complicated, you can move Python functions into .py files in the repo and then call those functions from the notebook. You can also define functions in R and save them into a .R file which you can call from this notebook. Storing code away in functions in separate files let me keep the notebooks very clean and readable and proved to be a good way to annotate my process for the quantitative edit.