Imagine a world where you can sleep peacefully at night knowing that your secrets (that you need for connecting to data sources) are being handled responsibly and will never leak.
Built in with your version control
Github has secret scanning for enterprise accounts or public repositories: https://docs.github.com/en/code-security/secret-scanning/introduction/about-secret-scanning
Environment variables
When working with pulling data from secure databases or other sources a developer might find themselves in a situation of needing to provide very sensitive information, such as a password or a token, in order to access the data that is needed or to successfully deploy a project. Handling those secrets in way that doesn’t expose them in the code directly is critical and where using environmental variable’s for securing sensitive variables is strongly recommended.
Additionally there may be parameters that are often needed that can be accessed as a variable more easily rather than having to type in every time.
For both of these cases knowing how environment variables can be leveraged can be very rewarding and it is surprising how little effort it can to take to set up.
Python: os
package
import os
# Setting a new environment variable
"API_KEY"] = "YOUR_API_KEY"
os.environ[
# Retrieving the environment variable
= os.environ["variable_name"] var
R: usethis
package
usethis
has a function for creating and editing the .Renviron file:
library(usethis)
# Edit the global .Renviron file
::edit_r_environ()
usethis
# Edit the project specific .Renviron file
::edit_r_environ(scope = "project") usethis
Add the variables to that file in the format variable_name = “variable_value” and save it. Restart the session so the new environment variables will be loaded with ctrl shift f10 or through the RStudio IDE with session -> restart R.
# Saved variables can be accessed with:
<- Sys.getenv("variable_name") variable_name
While it’s recommended to add env vars to the environ file and not use in your code (otherwise it defeats the point), you can set an env var on the fly with:
Sys.setenv("name" = "value")
R: Working with the .Renviron file
When R starts it loads a bunch of variables, settings, and configs for the user. This is really powerful and some of the magic for how it can work so apparently seamlessly.
However for power users we can leverage these behind the scenes config files so that we can include such things as variables in our project without including it in our code. The .Renviron file is the one most commonly interacted with for adding sensitive variables to a project in order to protect them from being exposed in the code.
With increased use of these behind the scenes config files and the growing direction of arranging code into projects there was the development of giving, on startup, having multiple options for each config file that can be loaded depending on what the user specifies. Broadly speaking there are two levels of config files: - User - Project
On startup, since R is trying to make things as seamless as possible for the user, it will use some logic to figure out which config to use. It will assume that if a project level config exists it should load that one (and not any others). If that project level config doesn’t exist, then it will default to the user level config. For more details on the different config files and the nuances see Managing R with .Rprofile, .Renviron, Rprofile.site, Renviron.site, rsession.conf, and repos.conf.
Just to re-iterate the key takeaway: When in doubt note that the project level file is given preference over user level config files. Only if the project level config file doesn’t exist will the user level file be sourced/pulled in.
There is a really excellent overview of R’s startup process here but in short it can be thought of this way:
Example with a user level .Renviron config file
usethis has a function for creating and editing the .Renviron file
library(usethis)
usethis::edit_r_environ()
Add the variables to that file in the format variable_name = "variable_value"
and save it. Restart the session so the new environment variables will be loaded with ctrl shift f10 or through the RStudio IDE
Saved variables can be accessed with:
variable_name <- Sys.getenv("variable_name")
When working in a more complex environment structure where separate project, site, and user environments are being support this support article has useful information with a deeper dive into R’s startup here.
Example with a project level .Renviron config file
Storing secrets securely can be done using the edit_r_environ function from the usethis package. For more overview see this overview.
Example:
library(usethis)
usethis::edit_r_environ(scope = "project")
Accessing those stored parameters later can be done using Sys.getenv("DB_NAME")
.
Be sure to add the project level .Renviron file to your .gitignore so you aren’t exposing secrets when code is being saved to your git repository. Similarly this can be done with the edit_git_ignore(scope = c("user", "project"))
function. For more best practices see securing credentials.
After updating these files the project should be closed and re-opened for any additions to be pulled in. One way to do this is through session -> restart R (ctrl-shift-f10).
Gitignore
While typically explicitly listing the file name is the desired addition, wildcards can be added to exclude a type of file. For example: *.html
.
Example excluding a number of pieces that would be undesirable to check into version control:
.gitignore
# History files
.Rhistory
.Rapp.history
# Session Data files
.RData
# Example code in package build process
*-Ex.R
# Output files from R CMD build
/*.tar.gz
# Output files from R CMD check
/*.Rcheck/
# RStudio files
.Rproj.user/
# produced vignettes
vignettes/*.html
vignettes/*.pdf
# OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3
.httr-oauth
# knitr and R markdown default cache directories
/*_cache/
/cache/
# Temporary files created by R markdown
*.utf8.md
*.knit.md
# Shiny token, see https://shiny.rstudio.com/articles/shinyapps.html
rsconnect/
# Deployment details from rsconnect-python
rsconnect-python/
# Temporary files
.DS_Store
__pycache__
.ipynb_checkpoints
rmarkdown-notebook/flights.csv
.venv
venv
.env
.Rprofile
/.luarc.json
# OS and python artifacts
.DS_Store
__pycache__/
*.py[cod]
# docs artifacts
/docs/_site
# Package Manager license
/ppm.lic
Example with project level github secrets for environment variables
Another approach, particularly useful when automating testing and deployments using github actions, is to include the environment variables as secrets. Once this has been added through the git UI for the project they can then be referenced in the relevant deployment .yaml file with something like CONNECT_ENV_SET_ZD_USER: ${{ secrets.ZD_USER }}
. In the R scripts they will be referenced as usual with something like Sys.getenv("DB_NAME")
.
References for adding environment variables through the Connect UI
Starting with version 1.6, RStudio Connect allows Environment Variables. The variables are encrypted on-disk, and in-memory.
This can be done at the project level with securing deployment through the Connect UI.
Handling secret files
Options:
- base64 encode the json string, set an environment variable, read the env var, and base64 decode it - If it fits in an env var, you could always cache it from the env var to your temp dir right before you use
- Save it somewhere on Connect, let developers know the path
- Use a supervisor script to save into the sandbox environment
- Possibly save it as a pin?
Useful linux commands and generating a dummy key file
# cd ~/.ssh
openssl genrsa -out test_key 4096
openssl rsa -in test_key -pubout -out test_key.pub
openssl pkcs8 -topk8 -inform pem -in test_key -outform PEM -v2 aes-256-cbc -out test_key.p8
- The second command generate a RDS private key (unencrypted). https://www.openssl.org/docs/man1.1.1/man1/openssl-genrsa.html
- The third command generate a public key for the private key above.
- The fourth command generate an encrypted private key using a cipher aes-256-cbc. Please refer the following URL for more about AES 256 CBC.https://datatracker.ietf.org/doc/html/rfc3602
# Set permissions
chmod -R 400 ~/.ssh/mykey.pem
# Add key to ssh-agent
ssh-agent -s
eval `ssh-agent -s`
ssh-add ~/.ssh/mykey.pem
If you already have a ssh key and want to get the key.pub public key info:
ssh-keygen -y -f ~/.ssh/mykey.pem
ssh-keygen -y -f ~/.ssh/mykey.pem > key.pub
Pin the key to Connect
Setup
Env variables that need to be set are:
CONNECT_SERVER=<server, eg https://colorado.posit.co>
CONNECT_API_KEY=<API key from Connect server>
Code for pinning a file
library(dplyr)
library(tidyverse)
library(stringr)
library(readr)
library(pins)
library(rsconnect)
library(usethis)
#Check our environment variables
# usethis::edit_r_environ()
#Needs:
# CONNECT_SERVER=<server>
# CONNECT_API_KEY=<API key from Connect server>
<- board_connect(auth = "envvar")
board
%>% pin_upload(paths="test_key", name="test_key")
board %>% pin_upload(paths="test_key.p8", name="test_key-p8")
board %>% pin_upload(paths="test_key.pub", name="test_key-pub")
board
=board %>% pin_download(name="test_key")
cache_pathprint(cache_path)
Python
Python pins package has parity: https://rstudio.github.io/pins-python/reference/pin_upload.html#pins.boards.BaseBoard.pin_upload
Encode key file as base64 json string
librar(readr)
library(openssl)
# Locally, encode the key and save as environment variable
= openssl::base64_encode(readr::read_file(file="test_key.pub"))
key
# Save as env var
Sys.setenv("TEST_SSH_KEY" = "123")
<- Sys.getenv("TEST_SSH_KEY")
test_key_set
# Create a tempfile in the sandboxed current location
<- tempfile()
cached_key
# Cache the SSH file from the environment variable
::write_file(openssl::base64_decode(Sys.getenv("TEST_SSH_KEY")), file = cached_key) readr
it may be useful to nest in an if statement so that this only happens on Connect, but on Workbench it will still use the managed credentials:
# Use SSH key if on Connect, otherwise use managed credentials
if (Sys.getenv("RSTUDIO_PRODUCT") == "CONNECT"){
}