Handling secrets and pinning files to Posit Connect

Keep your secrets secret
code
data
security
Author

Lisa

Published

October 1, 2024

Imagine a world where you can sleep peacefully at night knowing that your secrets (that you need for connecting to data sources) are being handled responsibly and will never leak.

Built in with your version control

Github has secret scanning for enterprise accounts or public repositories: https://docs.github.com/en/code-security/secret-scanning/introduction/about-secret-scanning

Environment variables

When working with pulling data from secure databases or other sources a developer might find themselves in a situation of needing to provide very sensitive information, such as a password or a token, in order to access the data that is needed or to successfully deploy a project. Handling those secrets in way that doesn’t expose them in the code directly is critical and where using environmental variable’s for securing sensitive variables is strongly recommended.

Additionally there may be parameters that are often needed that can be accessed as a variable more easily rather than having to type in every time.

For both of these cases knowing how environment variables can be leveraged can be very rewarding and it is surprising how little effort it can to take to set up.

Python: os package

import os

# Setting a new environment variable
os.environ["API_KEY"] = "YOUR_API_KEY"

# Retrieving the environment variable
var = os.environ["variable_name"]

R: usethis package

usethis has a function for creating and editing the .Renviron file:

library(usethis)

# Edit the global .Renviron file
usethis::edit_r_environ()

# Edit the project specific .Renviron file
usethis::edit_r_environ(scope = "project")

Add the variables to that file in the format variable_name = “variable_value” and save it. Restart the session so the new environment variables will be loaded with ctrl shift f10 or through the RStudio IDE with session -> restart R.

# Saved variables can be accessed with:
variable_name <- Sys.getenv("variable_name")

While it’s recommended to add env vars to the environ file and not use in your code (otherwise it defeats the point), you can set an env var on the fly with:

Sys.setenv("name" = "value")

R: Working with the .Renviron file

When R starts it loads a bunch of variables, settings, and configs for the user. This is really powerful and some of the magic for how it can work so apparently seamlessly.

However for power users we can leverage these behind the scenes config files so that we can include such things as variables in our project without including it in our code. The .Renviron file is the one most commonly interacted with for adding sensitive variables to a project in order to protect them from being exposed in the code.

With increased use of these behind the scenes config files and the growing direction of arranging code into projects there was the development of giving, on startup, having multiple options for each config file that can be loaded depending on what the user specifies. Broadly speaking there are two levels of config files: - User - Project

On startup, since R is trying to make things as seamless as possible for the user, it will use some logic to figure out which config to use. It will assume that if a project level config exists it should load that one (and not any others). If that project level config doesn’t exist, then it will default to the user level config. For more details on the different config files and the nuances see Managing R with .Rprofile, .Renviron, Rprofile.site, Renviron.site, rsession.conf, and repos.conf.

Just to re-iterate the key takeaway: When in doubt note that the project level file is given preference over user level config files. Only if the project level config file doesn’t exist will the user level file be sourced/pulled in.

There is a really excellent overview of R’s startup process here but in short it can be thought of this way:

Example with a user level .Renviron config file

usethis has a function for creating and editing the .Renviron file

library(usethis)
usethis::edit_r_environ()

Add the variables to that file in the format variable_name = "variable_value" and save it. Restart the session so the new environment variables will be loaded with ctrl shift f10 or through the RStudio IDE

Saved variables can be accessed with:

variable_name <- Sys.getenv("variable_name")

When working in a more complex environment structure where separate project, site, and user environments are being support this support article has useful information with a deeper dive into R’s startup here.

Example with a project level .Renviron config file

Storing secrets securely can be done using the edit_r_environ function from the usethis package. For more overview see this overview.

Example:

library(usethis)
usethis::edit_r_environ(scope = "project")

Accessing those stored parameters later can be done using Sys.getenv("DB_NAME").

Be sure to add the project level .Renviron file to your .gitignore so you aren’t exposing secrets when code is being saved to your git repository. Similarly this can be done with the edit_git_ignore(scope = c("user", "project")) function. For more best practices see securing credentials.

After updating these files the project should be closed and re-opened for any additions to be pulled in. One way to do this is through session -> restart R (ctrl-shift-f10).

Gitignore

While typically explicitly listing the file name is the desired addition, wildcards can be added to exclude a type of file. For example: *.html.

Example excluding a number of pieces that would be undesirable to check into version control:

.gitignore
# History files
.Rhistory
.Rapp.history

# Session Data files
.RData

# Example code in package build process
*-Ex.R

# Output files from R CMD build
/*.tar.gz

# Output files from R CMD check
/*.Rcheck/

# RStudio files
.Rproj.user/

# produced vignettes
vignettes/*.html
vignettes/*.pdf

# OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3
.httr-oauth

# knitr and R markdown default cache directories
/*_cache/
/cache/

# Temporary files created by R markdown
*.utf8.md
*.knit.md

# Shiny token, see https://shiny.rstudio.com/articles/shinyapps.html
rsconnect/

# Deployment details from rsconnect-python
rsconnect-python/

# Temporary files
.DS_Store
__pycache__
.ipynb_checkpoints

rmarkdown-notebook/flights.csv

.venv
venv
.env
.Rprofile

/.luarc.json

# OS and python artifacts
.DS_Store
__pycache__/
*.py[cod]

# docs artifacts
/docs/_site

# Package Manager license
/ppm.lic

Example with project level github secrets for environment variables

Another approach, particularly useful when automating testing and deployments using github actions, is to include the environment variables as secrets. Once this has been added through the git UI for the project they can then be referenced in the relevant deployment .yaml file with something like CONNECT_ENV_SET_ZD_USER: ${{ secrets.ZD_USER }}. In the R scripts they will be referenced as usual with something like Sys.getenv("DB_NAME").

References for adding environment variables through the Connect UI

Starting with version 1.6, RStudio Connect allows Environment Variables. The variables are encrypted on-disk, and in-memory.

This can be done at the project level with securing deployment through the Connect UI.

Handling secret files

Options:

  • base64 encode the json string, set an environment variable, read the env var, and base64 decode it - If it fits in an env var, you could always cache it from the env var to your temp dir right before you use
  • Save it somewhere on Connect, let developers know the path
  • Use a supervisor script to save into the sandbox environment
  • Possibly save it as a pin?

Useful linux commands and generating a dummy key file

# cd ~/.ssh
openssl genrsa -out test_key 4096
openssl rsa -in test_key -pubout -out test_key.pub
openssl pkcs8 -topk8 -inform pem -in test_key -outform PEM -v2 aes-256-cbc -out test_key.p8
  • The second command generate a RDS private key (unencrypted). https://www.openssl.org/docs/man1.1.1/man1/openssl-genrsa.html
  • The third command generate a public key for the private key above.
  • The fourth command generate an encrypted private key using a cipher aes-256-cbc. Please refer the following URL for more about AES 256 CBC.https://datatracker.ietf.org/doc/html/rfc3602
# Set permissions 
chmod -R 400 ~/.ssh/mykey.pem 

# Add key to ssh-agent 
ssh-agent -s
eval `ssh-agent -s`
ssh-add ~/.ssh/mykey.pem

If you already have a ssh key and want to get the key.pub public key info:

ssh-keygen -y -f ~/.ssh/mykey.pem
ssh-keygen -y -f ~/.ssh/mykey.pem > key.pub

Pin the key to Connect

Setup

Env variables that need to be set are:

CONNECT_SERVER=<server, eg https://colorado.posit.co>
CONNECT_API_KEY=<API key from Connect server>

Code for pinning a file

library(dplyr)
library(tidyverse)
library(stringr)
library(readr)
library(pins)
library(rsconnect)
library(usethis)

#Check our environment variables
# usethis::edit_r_environ()
#Needs: 
# CONNECT_SERVER=<server>
# CONNECT_API_KEY=<API key from Connect server>

board <- board_connect(auth = "envvar")

board %>% pin_upload(paths="test_key", name="test_key")
board %>% pin_upload(paths="test_key.p8", name="test_key-p8")
board %>% pin_upload(paths="test_key.pub", name="test_key-pub")

cache_path=board %>% pin_download(name="test_key")
print(cache_path)

Python

Python pins package has parity: https://rstudio.github.io/pins-python/reference/pin_upload.html#pins.boards.BaseBoard.pin_upload

Encode key file as base64 json string

librar(readr)
library(openssl)

# Locally, encode the key and save as environment variable
key = openssl::base64_encode(readr::read_file(file="test_key.pub"))

# Save as env var
Sys.setenv("TEST_SSH_KEY" = "123")
test_key_set <- Sys.getenv("TEST_SSH_KEY")

# Create a tempfile in the sandboxed current location
cached_key <- tempfile()

# Cache the SSH file from the environment variable
readr::write_file(openssl::base64_decode(Sys.getenv("TEST_SSH_KEY")), file = cached_key)

it may be useful to nest in an if statement so that this only happens on Connect, but on Workbench it will still use the managed credentials:

# Use SSH key if on Connect, otherwise use managed credentials
if (Sys.getenv("RSTUDIO_PRODUCT") == "CONNECT"){
}