Improving app performance with profvis

Posit Solutions Engineering (Lisa Anders)

Posit, PBC

Why you should be load testing

  • You want to know what to prioritize to improve your application
  • Often what’s holding your application back isn’t intuitive

Illustration from Hadley Wickham’s talk “The Joy of Functional Programming (for Data Science)” by Allison Horst

Profiling

Profile an app to understand where it spends the bulk of its time. The result is often surprising and may point to a specific function or command as the issue, rather than to a need to overhaul the app or change how it runs on the server.

profvis - Profvis is a tool for helping you to understand how R spends its time.

library(profvis)

# general code example
profvis({
  data(diamonds, package = "ggplot2")
  
  plot(price ~ carat, data = diamonds)
  m <- lm(price ~ carat, data = diamonds)
  abline(m, col = "red")
})

# shiny app example
profvis({ shiny::runApp() })
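As a rough illustration of the kind of surprise profiling tends to surface, the sketch below profiles two versions of the same computation. The function names slow_running_total() and fast_running_total() are hypothetical and not part of profvis, and the profvis() call is skipped when the package is not installed or the session is not interactive.

```r
# Hypothetical example functions (not part of profvis).
# Slow version: grows the result vector inside a loop, copying it on every iteration.
slow_running_total <- function(x) {
  out <- c()
  for (i in seq_along(x)) {
    out <- c(out, sum(x[seq_len(i)]))
  }
  out
}

# Fast version: a single vectorised call.
fast_running_total <- function(x) cumsum(x)

x <- as.numeric(1:20000)

# Both versions should agree before their speed is compared.
stopifnot(isTRUE(all.equal(slow_running_total(x), fast_running_total(x))))

if (interactive() && requireNamespace("profvis", quietly = TRUE)) {
  # The flame graph should attribute most of the time to slow_running_total().
  print(profvis::profvis({
    slow_running_total(x)
    fast_running_total(x)
  }))
}
```

The same pattern applies to a Shiny app: the fix is often a single function like this one rather than a rewrite of the app.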

Profiling

For more information refer to the support article

Load testing

Combining load testing with profiling gives a very granular view of where performance issues occur. Often, lower-usage apps appear to perform well, only to struggle as more users access the content, because multiple users share the same R or Python process.

shinyloadtest - Load testing helps developers and administrators estimate how many users their application can support.

Load testing Overview

The steps:

  • Part 1: Record a typical user session for the app.
shinyloadtest::record_session('https://shinyapp.example.com/')
  • Part 2: Replay the session in parallel, simulating many simultaneous users accessing the app.
shinycannon recording.log https://shinyapp.example.com/ --workers 5 --loaded-duration-minutes 2 --output-dir run1
  • Part 3: Analyze the results of the load test and determine if the app performed well enough.
df <- shinyloadtest::load_runs("run1")
shinyloadtest::shinyloadtest_report(df, "run1.html")

Let’s look, in more detail, at running this from Workbench for apps deployed to Connect.

Part 1: User Recording

The Connect API key is stored as the R environment variable CONNECT_API_KEY. It can be added or edited using the usethis package:

library(usethis)
usethis::edit_r_environ()
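Before recording, it can help to confirm that the key is actually visible to the R session. The sketch below is one way to do that; get_connect_api_key() is a hypothetical helper, not a function from usethis or shinyloadtest.

```r
# Hypothetical helper: read the key and fail early with a clear message if it is missing.
get_connect_api_key <- function(var = "CONNECT_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set; add it to .Renviron and restart R.",
         call. = FALSE)
  }
  key
}

# Report whether the key is available without printing the key itself.
key_is_set <- nzchar(Sys.getenv("CONNECT_API_KEY"))
message("CONNECT_API_KEY set: ", key_is_set)
```

Note that changes made to .Renviron only take effect after R is restarted.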

Create “Recording” of a typical user’s interaction

library(shinyloadtest)

shinyloadtest::record_session(
  target_app_url='https://colorado.posit.co/rsc/content/bec1d4bc-2ab7-4ba3-9bd6-b9e336bf3ff9/', 
  connect_api_key=Sys.getenv("CONNECT_API_KEY"))

Use the URL for the app from the “open solo” mode on Connect.

Alternatively, we can programmatically create the recording using shinytest2.

Part 2: Load testing, install Shinycannon

Set the Connect API key as an environment variable in your terminal, replacing the placeholder with your own key. Use export on macOS or Linux and set on Windows.

export SHINYCANNON_CONNECT_API_KEY=<add your key here>

Verify that it was set (use %SHINYCANNON_CONNECT_API_KEY% on Windows, $SHINYCANNON_CONNECT_API_KEY on macOS or Linux).

echo $SHINYCANNON_CONNECT_API_KEY

Installing shinycannon is optional on Linux: the jar file can be called directly, which is useful in organizations where system installation is restricted for security reasons.

Test that shinycannon works by calling the help documentation with:

cd test
java -jar shinycannon-1.1.3-dd43f6b.jar -h

Part 2: Load testing, continued

We will run the load test once for each number of simultaneous users we want to simulate, saving each run's results to a different folder:

java -jar shinycannon-1.1.3-dd43f6b.jar recording.log https://colorado.posit.co/rsc/content/d2c40c48-ae0b-48d8-888a-e8626322565d/ --workers 1 --loaded-duration-minutes 2 --output-dir run1 --overwrite-output

java -jar shinycannon-1.1.3-dd43f6b.jar recording.log https://colorado.posit.co/rsc/content/d2c40c48-ae0b-48d8-888a-e8626322565d/ --workers 5 --loaded-duration-minutes 2 --output-dir run2 --overwrite-output

Alternatively we could pass in the command from R using the system() command, for example:

# Make the key available to child processes started from this R session.
# A separate system("export ...") call would not persist, because each
# system() call runs in its own shell.
Sys.setenv(SHINYCANNON_CONNECT_API_KEY = Sys.getenv("CONNECT_API_KEY"))

target_url <- "https://colorado.posit.co/rsc/content/d2c40c48-ae0b-48d8-888a-e8626322565d/"
workers <- 1
dir <- "run1"
system(
  sprintf(
    "java -jar shinycannon-1.1.3-dd43f6b.jar recording.log %s --workers %s --loaded-duration-minutes 2 --output-dir %s --overwrite-output",
    target_url, workers, dir
  )
)
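To repeat the run for several worker counts without copying the command, the call can be wrapped in a small loop. This is a minimal sketch: shinycannon_args() is a hypothetical helper, the flags are the ones used in the commands above, and java is only invoked when the jar file and recording.log exist in the working directory.

```r
# Hypothetical helper: build the argument vector for one shinycannon run.
shinycannon_args <- function(target_url, workers, output_dir,
                             jar = "shinycannon-1.1.3-dd43f6b.jar",
                             recording = "recording.log",
                             minutes = 2) {
  c("-jar", jar, recording, target_url,
    "--workers", as.character(workers),
    "--loaded-duration-minutes", as.character(minutes),
    "--output-dir", output_dir,
    "--overwrite-output")
}

target_url <- "https://colorado.posit.co/rsc/content/d2c40c48-ae0b-48d8-888a-e8626322565d/"
worker_counts <- c(1, 5)

for (i in seq_along(worker_counts)) {
  # run1 holds the first worker count, run2 the second, and so on.
  args <- shinycannon_args(target_url, worker_counts[i], paste0("run", i))
  if (file.exists("shinycannon-1.1.3-dd43f6b.jar") && file.exists("recording.log")) {
    system2("java", args)
  } else {
    # Dry run: show the command that would be executed.
    message("Would run: java ", paste(args, collapse = " "))
  }
}
```

Using system2() with an argument vector avoids building one long command string by hand.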

Part 3: Analyze the results

Reference the documentation to understand the different charts: https://rstudio.github.io/shinyloadtest/articles/analyzing-load-test-logs.html?q=output#report-output

library(shinyloadtest)
library(dplyr)

df <- load_runs(
  `1 user` = "run1",
  `5 users` = "run2"
)

shinyloadtest_report(df, "report.html")
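Alongside the HTML report, a quick numeric comparison between runs can help decide whether the app performed well enough. The sketch below assumes the data frame returned by load_runs() has run, time (event duration in seconds) and maintenance columns; summarise_event_time() is a hypothetical helper, and the toy data frame is made up purely to show the shape of the input.

```r
# Hypothetical helper: median and 95th percentile event duration per run,
# using only events recorded while all simulated users were active.
summarise_event_time <- function(df) {
  steady <- df[df$maintenance, , drop = FALSE]
  med <- tapply(steady$time, steady$run, median)
  p95 <- tapply(steady$time, steady$run, quantile, probs = 0.95, names = FALSE)
  data.frame(
    run = names(med),
    median_s = as.numeric(med),
    p95_s = as.numeric(p95),
    row.names = NULL
  )
}

# Made-up stand-in for the output of load_runs(); not real measurements.
toy <- data.frame(
  run = rep(c("1 user", "5 users"), each = 4),
  time = c(0.2, 0.3, 0.4, 9.0, 1.0, 1.2, 1.4, 9.0),
  maintenance = rep(c(TRUE, TRUE, TRUE, FALSE), times = 2)
)

print(summarise_event_time(toy))
```

With real data, the call would be summarise_event_time(df) on the object created above.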

Notes:

  • RMarkdown and various dependencies will need to be installed.
  • After running this, depending on your organization’s security policies, it may help to open the report as a “preview” rather than in a web browser (the error message will be something like “CORS restricted”).
  • It may also be helpful to set self_contained = TRUE or self_contained = FALSE in shinyloadtest_report(), depending on any error messages encountered.

Load testing output

For more information refer to shinyloadtest

Load testing output: Impact on session duration

For more information refer to shinyloadtest

What about Python?

Where to go from here?

Optional/Backup

Data best practices

Apply data best practices and see if that improves performance:

  • Pull data on a schedule
  • Reduce the data being loaded
  • Select a faster data storage system, for example by pinning arrow files
  • Use caching
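The caching idea in the list above can be sketched in base R as follows. make_cached() and slow_loader() are hypothetical names used only for illustration; in a Shiny app, shiny::bindCache() plays a similar role for reactives and outputs.

```r
# Hypothetical helper: wrap a slow loader so repeated calls with the same key
# reuse the first result instead of reloading the data.
make_cached <- function(loader) {
  cache <- new.env(parent = emptyenv())
  function(key) {
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, loader(key), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Stand-in for an expensive query or file read; counts how often it really runs.
calls <- 0
slow_loader <- function(region) {
  calls <<- calls + 1
  Sys.sleep(0.2)
  paste("data for", region)
}

load_region <- make_cached(slow_loader)
load_region("west")
load_region("west")  # served from the cache, slow_loader() is not called again
message("slow_loader ran ", calls, " time(s)")
```

The trade-off is staleness: cached results need a refresh strategy if the underlying data changes.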

Async

As a last resort we can consider async.

In general, async is only useful when specific steps take a long time to run, because moving those steps off the main process frees it up to serve other users. It is usually saved for last because it is usually the most challenging option to implement.

When using async, it is particularly important to encourage developers to include additional debug messages, for example with log4r in R. These messages allow developers to trace errors back to the session and connection.
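A minimal sketch of session-tagged logging is shown below. session_log_line() and write_info() are hypothetical helpers, and the token value is a placeholder; in a Shiny server function the identifier would typically come from session$token. log4r is used when it is installed, with message() as a fallback.

```r
# Hypothetical helper: prefix each message with the session identifier
# so log lines from concurrent sessions can be told apart.
session_log_line <- function(session_token, step, detail = "") {
  sprintf("[session=%s] %s %s", session_token, step, detail)
}

# Use log4r when it is installed; otherwise fall back to message().
write_info <- if (requireNamespace("log4r", quietly = TRUE)) {
  logger <- log4r::logger(threshold = "INFO", appenders = log4r::console_appender())
  function(line) log4r::info(logger, line)
} else {
  function(line) message(format(Sys.time()), " INFO ", line)
}

token <- "abc123"  # placeholder; use session$token inside a Shiny server function
write_info(session_log_line(token, "async task started", "step=load_data"))
write_info(session_log_line(token, "async task finished", "step=load_data"))
```

Logging both the start and the end of each long-running step makes it easier to see which session an error belongs to.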

How to know when it’s a real server issue

Ask the questions:

  • Are all applications impacted?
  • Has performance gotten worse over time?

Use the tools: