Overview
Setting the repository for R and Python package installations is a critical part of having a secure system. Malicious packages are a significant threat, and domain squatting attacks are a relatively new vector for getting them onto a system.
Having a controlled repository with vulnerable package blocking is critical, but it is only useful if users actually install from it. When using something like Package Manager, you'll want users to install packages from there rather than from the broader internet.
At a glance
R repository
The best pattern is to configure the repository across R sessions using R config options, not RStudio configs. Either create a shared site library (leveraging Rprofile.site and Renviron.site, for example) or use renv with a shared renv package cache (maintaining reproducibility through the renv.lock file).
The Renviron.site would override the Rprofile.site setting and therefore may be more robust.
Option 1: Renviron.site
Referencing: https://github.com/sol-eng/singularity-rstudio/blob/main/data/workbench/scripts/run.R
Create a Renviron.site file and define:
/opt/R/Some-R-Version/lib/R/etc/Renviron.site
RENV_PATHS_PREFIX_AUTO=TRUE
RENV_PATHS_CACHE=/scratch/renv
R_LIBS_SITE=${R_LIBS_SITE-'/usr/local/lib/R/site-library:/usr/local/lib/R/library:/usr/lib64/R/library:/usr/share/R/library'}
Option 2: Rprofile.site
Set the repository and .libPaths() in Rprofile.site:
/opt/R/Some-R-Version/lib/R/etc/Rprofile.site
options(repos = c(CRAN = "https://packagemanager.posit.co/all/latest"))
if (interactive()) {
  options(width = 120)
}
# On Windows, have renv download with curl
if (Sys.info()[["sysname"]] == "Windows") {
  Sys.setenv(RENV_DOWNLOAD_METHOD = "curl")
}
# Use a local user library when the expected folder exists on C:/
if ("folder" %in% tolower(list.files("C:/"))) {
  if (!"Rlib" %in% list.files("C:/username/")) {
    print("Creating Rlib folder")
    dir.create("C:/username/Rlib", mode = "0777", recursive = TRUE)
  }
  cat("\033[0;32;1mSetting local user lib\033[0m\n")
  .libPaths(c("C:/username/Rlib", .libPaths()))
} else {
  cat("\033[0;33;1mYou should consider getting the access right so we can put your local R-lib there, instead of OneDrive.\033[0m\n")
}
Test
Test this by running and checking the outputs of:
.libPaths()
options()$repos
Python repository
Add the following lines to /etc/pip.conf:
/etc/pip.conf
[global]
timeout = 60
index-url = https://pkg.demo.posit.team/blocked-python/latest/simple
trusted-host = pkg.demo.posit.team
Longer explanation and considerations
Blocking access to CRAN and/or PyPI
While not needed in all cases, an admin may want to block direct access to CRAN or PyPI in order to force all package installs to go through the secured repository. This typically only makes sense at the corporate level.
Firewall rules, DNS filtering, and/or web proxy servers can be set to restrict which sites are accessible by users within the network and from specific devices. Configs can also be installed on users' machines to point them at a specific repository URL, for example on Windows by using group policies to push environment settings to members of the AD domain.
Regardless of the method chosen, having a clear definition of acceptable use that emphasizes why these steps were taken can go a long way toward getting users on board.
R Repository
Startup behavior of R when loading package environment details
R Startup behavior (funny): https://rstats.wtf/r-startup.html
Credit: This section was largely taken from an internal Posit resource of unknown authorship
Startup behavior from bash differs from startup behavior inside the editor.
See the official R startup documentation (?Startup) for details.
R from the command line/bash will take the user’s environment. It will not read any additional bash files during start-up (which the RStudio products will do). It will still read in the R startup files (Renviron/Rprofile).
Before the R session loads from the bash shell, the commands in the following file are read and executed, if it exists: /etc/profile
Next, commands are executed from the first of the following files that exists and is readable (only one of these files is read and executed):
~/.bash_profile
~/.bash_login
~/.profile
R then always loads the following (in order):
R_HOME/etc/Renviron.site (set for all users)
.Renviron - user-specific, typically in the user’s home directory, but can be elsewhere (for instance, in a Project folder)
R_HOME/etc/Rprofile.site (set for all users)
.Rprofile - user-specific, typically in the user’s home directory, but can be elsewhere (for instance, in a Project folder)
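The load order of the R startup files can be sketched in Python. This is an illustrative model only, not how R implements it. The function name r_startup_files and its directory arguments are hypothetical. It assumes that, for each user-level file, R reads only the first match, checking the current (project) directory before the home directory, and it ignores overrides such as the R_ENVIRON_USER and R_PROFILE_USER environment variables.

```python
import os
import tempfile


def r_startup_files(r_home, project_dir, home_dir):
    """Return the startup files that exist, in the order R would read them.

    Simplified model: the site file first, then the first user file found
    (project directory before home directory), for Renviron then Rprofile.
    """
    found = []
    for site_name, user_name in (("Renviron.site", ".Renviron"),
                                 ("Rprofile.site", ".Rprofile")):
        site_file = os.path.join(r_home, "etc", site_name)
        if os.path.isfile(site_file):
            found.append(site_file)
        # Only one user-level file is read: the first one that exists.
        for directory in (project_dir, home_dir):
            user_file = os.path.join(directory, user_name)
            if os.path.isfile(user_file):
                found.append(user_file)
                break
    return found


# Demonstration in a throwaway directory tree (all paths are made up).
with tempfile.TemporaryDirectory() as root:
    r_home = os.path.join(root, "R")
    project = os.path.join(root, "project")
    home = os.path.join(root, "home")
    for directory in (os.path.join(r_home, "etc"), project, home):
        os.makedirs(directory)
    for path in (os.path.join(r_home, "etc", "Renviron.site"),
                 os.path.join(r_home, "etc", "Rprofile.site"),
                 os.path.join(home, ".Renviron"),
                 os.path.join(project, ".Rprofile"),
                 os.path.join(home, ".Rprofile")):
        open(path, "w").close()
    for path in r_startup_files(r_home, project, home):
        print(os.path.relpath(path, root))
```

In this model a project-level .Rprofile shadows the one in the home directory, which is why a project can end up with a different repository setting than the user's default.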
Beyond this, what gets put in the environment depends on the product.
RStudio Server / Workbench: before any of this executes, RStudio Server will first include the contents of /etc/rstudio/rsession-profile if it exists. It will also add anything set in rsession-ld-library-path in rserver.conf to the LD_LIBRARY_PATH environment variable.
Shiny Server / Connect: you can affect the environment variables for a specific application using program supervisors
This page in the documentation is now the authoritative source for startup behavior: https://docs.posit.co/ide/user/ide/guide/environments/r/managing-r.html
R sessions across all Workbench IDEs
Relying on repos.conf for R repository configuration is, IMHO, a clear anti-pattern that customers with large setups in particular should avoid. repos.conf is a relic from a time when there was only RSP (RStudio Server Pro).
Configure the repository settings in R directly, rather than through the RStudio settings.
Resources:
A great resource for setting this up simply is: https://docs.posit.co/ide/user/ide/guide/environments/r/managing-r.html
For a more complex example that could be used in slurm environments refer to: https://github.com/sol-eng/singularity-rstudio/blob/main/data/workbench/scripts/run.R
The Renviron.site would override the Rprofile.site setting and therefore may be more robust.
Renviron.site
Create a Renviron.site file and define:
/opt/R/Some-R-Version/lib/R/etc/Renviron.site
Key1=value1
RENV_PATHS_PREFIX_AUTO=TRUE
RENV_PATHS_CACHE=/scratch/renv
R_LIBS_SITE=${R_LIBS_SITE-'/usr/local/lib/R/site-library:/usr/local/lib/R/library:/usr/lib64/R/library:/usr/share/R/library'}
And then Sys.getenv("Key1") will return "value1" in a user's R session.
This can be set at the user or system level. Users have the choice between user or project level (project taking precedence). The usethis package includes a helper function for editing .Renviron files from an R session with usethis::edit_r_environ(). For a system level install, it is placed per R version, for example at /opt/R/4.1.1/lib/R/etc/Renviron.site.
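The R_LIBS_SITE line in the Renviron.site example uses R's ${NAME-default} form: the existing value of NAME is kept if it is set, otherwise the default is used. The following Python sketch mimics that expansion to show the effect. It is a simplified model, not R's parser: the helper expand_defaults is hypothetical, it strips quotes around the default, it does not handle nested expressions, and the shortened path list is for illustration only.

```python
import re

# Matches ${NAME-default} and ${NAME:-default}.
_DEFAULT_EXPR = re.compile(r"\$\{(\w+)(:?-)([^}]*)\}")


def expand_defaults(value, env):
    """Expand ${NAME-default} expressions in a Renviron-style value.

    Simplified sketch: quotes around the default are stripped and
    nested expressions are not supported.
    """
    def replace(match):
        name, operator, default = match.groups()
        default = default.strip("'\"")
        current = env.get(name)
        if operator == ":-":
            # ${NAME:-default}: fall back when NAME is unset or empty.
            return current if current else default
        # ${NAME-default}: fall back only when NAME is unset.
        return default if current is None else current

    return _DEFAULT_EXPR.sub(replace, value)


# Hypothetical, shortened version of the R_LIBS_SITE value.
line = "${R_LIBS_SITE-'/usr/local/lib/R/site-library:/usr/lib64/R/library'}"
print(expand_defaults(line, {}))
print(expand_defaults(line, {"R_LIBS_SITE": "/opt/site-library"}))
```

The practical consequence is that a value already present in the environment, for example one set by a job scheduler or a wrapper script, wins over the default written in Renviron.site.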
Rprofile.site
Set the repository in Rprofile.site:
/opt/R/Some-R-Version/lib/R/etc/Rprofile.site
options(repos = c(CRAN = "https://packagemanager.posit.co/all/latest"))
if (interactive()) {
  options(width = 120)
}
Again, this file can be set at the user or system level. At the user level, the easiest way to edit your .Rprofile file is to use the usethis::edit_r_profile() function from within an R session. You can specify whether you want to edit the user or project level .Rprofile. For a system level install, it is placed per R version, for example at /opt/R/4.2.0/lib/R/etc/Rprofile.site.
Workbench and RStudio sessions only
The oft recommended path is to use the repos.conf or rsession.conf file to configure the repository URL.
This might look like:
/etc/rstudio/rsession.conf
r-cran-repos=http://cran.at.r-project.org/
Or:
/etc/rstudio/repos.conf
RSPM=https://packagemanager.posit.co/cran/__linux__/jammy/latest
CRAN=https://packagemanager.posit.co/cran/__linux__/jammy/latest
Australia=https://cran.ms.unimelb.edu.au/
Austria=https://lib.ugent.be/CRAN/
And adding to rsession.conf:
/etc/rstudio/rsession.conf
# Use this to change the location / name of the repos.conf file
r-cran-repos-file=/etc/rstudio/repos.conf
Reference: https://docs.posit.co/ide/server-pro/rstudio_pro_sessions/package_installation.html
LD_LIBRARY_PATH
Reference: https://rstudioide.zendesk.com/agent/tickets/107856
There are a few different places the LD_LIBRARY_PATH can be modified within Workbench settings, only some of which will work for packages like rJava. For instance, including the ldpaths script in /etc/rstudio/r-versions can ensure the correct library is set on R session startup. The final step in this support article shows a method of setting this up which should work even on non-containerized Workbench sessions (the mkdir command can be excluded, since that directory should already exist on a server-installed version of Workbench):
We need to force the installed R version to use its own ldpaths startup script when it starts inside the container.
RUN mkdir -p /etc/rstudio && printf "Path: /opt/R/${R_VERSION}\nScript: /opt/R/${R_VERSION}/lib/R/etc/ldpaths" > /etc/rstudio/r-versions
These steps are good to follow: https://solutions.posit.co/envs-pkgs/using-rjava/index.html#additional-steps-for-workbench
The additional steps that need to be followed on Workbench are:
/etc/rstudio/r-versions
Path: /opt/R/4.2.0
Script: /opt/R/4.2.0/lib/R/etc/ldpaths
Troubleshooting
To determine the environment details it can be useful to run Sys.getenv() from inside and outside RStudio, to see if the user’s bash files are setting environment variables inappropriately for the system.
If the issue is occurring within RStudio, it can be helpful to capture the output of system("ldd /usr/lib/rstudio-server/bin/rsession") from inside and outside RStudio to see which R libraries are being loaded.
Test the repository details from a user session with:
.libPaths()
options()$repos
Incorrect permissions on the various configs can cause soft failures. A working permission for the config files is chmod 644. For the rstudio directory, consider chmod 0755 /etc/rstudio, or chmod o+x /etc/rstudio to add only the x bit for rstudio-server without opening up the other permissions, if the directory should stay restricted (this may result in odd behavior).
Check permissions with: ls -la /etc/rstudio
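The same check can be scripted. The Python sketch below reports the permission bits discussed above for any path. The function name permission_report is hypothetical, and the demonstration runs against a temporary file standing in for a config file rather than the real /etc/rstudio.

```python
import os
import stat
import tempfile


def permission_report(path):
    """Summarize the permission bits that matter for shared config files."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return {
        "octal": format(mode, "03o"),
        "other_read": bool(mode & stat.S_IROTH),
        "other_execute": bool(mode & stat.S_IXOTH),
    }


# Demonstration on a temporary file standing in for a config file.
with tempfile.NamedTemporaryFile() as handle:
    os.chmod(handle.name, 0o644)
    print(permission_report(handle.name))
```

On a server, calling permission_report("/etc/rstudio") would show whether the directory has the execute bit for others that chmod o+x adds.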
Python repository
All projects system wide for the current user
Set your user-specific pip configuration file to use the desired repository address:
pip config set global.index-url https://pkg.demo.posit.team/blocked-python/latest/simple
pip config set global.trusted-host pkg.demo.posit.team
Specific project for the current user
To associate a Package Manager repository with a specific Python project, configure a repository index URL to be used with a requirements.txt file for the project or virtual environment. Copy and paste the following lines to the top of the requirements.txt file in your project:
--index-url https://pkg.demo.posit.team/blocked-python/latest/simple
--trusted-host pkg.demo.posit.team
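To confirm that a project's requirements.txt actually carries these options, a small check can read them back. The Python sketch below is a minimal example, not pip's own parser: the function name index_options is hypothetical, only the two long-form flags shown above are recognized, and the pandas entry is a made-up requirement.

```python
def index_options(requirements_text):
    """Collect --index-url and --trusted-host values from requirements text."""
    options = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.strip()
        for flag in ("--index-url", "--trusted-host"):
            if line.startswith(flag):
                # Accept both "--flag value" and "--flag=value".
                options[flag] = line[len(flag):].lstrip(" =").strip()
    return options


# Example file contents; "pandas" is a placeholder requirement.
requirements = """\
--index-url https://pkg.demo.posit.team/blocked-python/latest/simple
--trusted-host pkg.demo.posit.team
pandas
"""
print(index_options(requirements))
```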
Globally for all users, on Workbench, and/or on Connect
Add the following lines to /etc/pip.conf:
/etc/pip.conf
[global]
timeout = 60
index-url = https://pkg.demo.posit.team/blocked-python/latest/simple
trusted-host = pkg.demo.posit.team
Pip limitations
The current stated config order of priority is SITE > USER > GLOBAL. For example, global values would be overwritten by user values. It will look for files in these locations:
$ pip config list -v
For variant 'global', will try loading '/etc/xdg/pip/pip.conf'
For variant 'global', will try loading '/etc/pip.conf'
For variant 'user', will try loading '~/.pip/pip.conf'
For variant 'user', will try loading '~/.config/pip/pip.conf'
For variant 'site', will try loading '[venv]/pip.conf'
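The effect of this priority order can be modeled with a short Python sketch. It only illustrates the stated precedence for config files and is not pip's implementation. The function name merge_pip_configs and the example user config are hypothetical, and environment variables and command-line options, which pip also honors, are left out.

```python
import configparser


def merge_pip_configs(*config_texts):
    """Merge pip.conf-style texts; later texts override earlier ones."""
    merged = {}
    for text in config_texts:
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(text)
        for section in parser.sections():
            merged.setdefault(section, {}).update(parser[section])
    return merged


# Global config as in /etc/pip.conf above.
global_conf = """
[global]
timeout = 60
index-url = https://pkg.demo.posit.team/blocked-python/latest/simple
"""
# Hypothetical user-level config that points back at the public index.
user_conf = """
[global]
index-url = https://pypi.org/simple
"""
site_conf = ""  # no virtual environment config in this example

# Pass the levels from lowest to highest priority: global, user, site.
effective = merge_pip_configs(global_conf, user_conf, site_conf)
print(effective["global"]["index-url"])
print(effective["global"]["timeout"])
```

Here the user-level index-url replaces the global one while the global timeout survives. This is why /etc/pip.conf alone does not force installs through the secured repository and is usually paired with network-level blocking.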
Note that pip does not currently support a preference order among repositories.
Packages are expected to be unique up to name and version, so two wheels with the same package name and version are treated as indistinguishable by pip. This is a deliberate feature of the package metadata, and not likely to change.
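One consequence is worth spelling out with a small Python sketch. It is a simplified model, assuming candidates from every configured index are pooled and the highest version is preferred. The function pick_candidate and the index labels are hypothetical, and real version comparison follows richer rules than the dotted-integer comparison used here.

```python
def pick_candidate(candidates):
    """Pick the candidate with the highest version, ignoring its index.

    candidates: list of (index_label, version_string) tuples, where the
    version is dotted integers such as "1.2.0" (a simplification).
    """
    def version_key(candidate):
        return tuple(int(part) for part in candidate[1].split("."))

    return max(candidates, key=version_key)


# The same package name offered by two indexes (labels are made up).
offers = [("internal", "1.2.0"), ("public", "9.9.9")]
print(pick_candidate(offers))
print(pick_candidate(list(reversed(offers))))
```

Listing the internal index first does not help in this model: the higher version on the public index is selected either way, which is the scenario that blocking direct access to public repositories is meant to prevent.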
References: