Dynamic Matrices in GitHub Actions: Streamlining Our Jest Runs
We had a problem: our test runs in CI were too slow. This was in our primary frontend application at Salesloft, a codebase nearing one million lines of code. Granted, considering how large the codebase is, the test runs weren’t too bad. In the worst case they took about 10 minutes of wall clock time[1], but we knew there was room for improvement because our billable time was still quite high. What follows is our adventure in streamlining our Jest runs by making our GitHub Actions job matrix dynamic.
The following solution was implemented with Jest because it supports sharding and fully utilizes multi-core machines out-of-the-box. However, any test runner that does the same will benefit from this approach.
#Where we started
We use GitHub Actions for continuous integration at Salesloft. As I wrote a few weeks ago, we were already limiting CI test runs to packages that had changed in our monorepo. The next few sections build on code from that post, so I do recommend giving it a read-through before continuing.
Our Monorepo Structure
To recap from the previous blog post, our `pnpm-workspace.yaml` file looks something like this:
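As a minimal sketch of such a workspace file, assuming only the four directories named in this section (the exact glob patterns are assumptions, not copied from the real repo):

```yaml
# pnpm-workspace.yaml (illustrative; globs are assumed)
packages:
  - "apps/*"
  - "infrastructure/*"
  - "platform/packages/*"
  - "shared/*"
```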
Apps are organized as packages in the `apps/` directory. `infrastructure/` holds all the packages needed to run, build, test, lint, format, and ship the repo. The scripts we’ll be looking at below are in an infrastructure package. `platform/packages/` and `shared/` contain shared code teams can use when building out their apps. In typical monorepo fashion, each package manages its own dependencies, and many of these packages depend on others from within the monorepo.
Here is the snippet of the GitHub Actions workflow file that ran our Jest tests:
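A rough sketch of a job with that shape follows. The `tests` job name, the `.github/actions/setup` composite action, and the matrix keys come from this post; the runner label, the checkout options, and the `pnpm exec jest --shard=...` command are assumptions for illustration.

```yaml
jobs:
  tests:
    runs-on: ubuntu-latest        # assumed runner label
    strategy:
      matrix:
        ci_node_total: [16]
        ci_node_index: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0          # full history, needed for change detection
      - uses: ./.github/actions/setup
      # Assumed test command: each job runs one Jest shard out of the total.
      - run: pnpm exec jest --shard=${{ matrix.ci_node_index }}/${{ matrix.ci_node_total }}
```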
The important thing to note here is that we’re hard-coding a matrix for the CI nodes we want to run. Matrices in GitHub Actions allow you to run multiple versions of the same job with different arguments each time. In this case, the value for `ci_node_total` never changes. It’s always `16`. But because there are sixteen values for `ci_node_index`, GitHub will run this job sixteen times[2].
There are problems with this approach though:
- We’re having to check out the entire repo for each of these jobs. This takes a significant (relative) amount of time now that our repo has grown so much in the last few years. We need the full git history (as opposed to a shallow clone) because PNPM uses git under the hood to determine which packages have changed. This has gotten significantly better now that the checkout action supports partial cloning, but not having to do full checkouts for each Jest job would speed up the process.
- Our setup code – encapsulated in a composite action at `.github/actions/setup` in our repo – is not particularly fast. Depending on how many tests we’re running, the setup code can range from 5% to 50% of the overall runtime. It’s in these latter cases particularly where having fewer runners handling more tests would result in better efficiency.
- We were overprovisioning runners in most cases. We rarely run all tests in our repo. The majority of PRs run 5% to 25% of our tests. We don’t need sixteen runners in those cases.
#Making it dynamic
Instead of hard-coding our matrix, we want the number of jobs to be allocated dynamically for each CI run based on how many test files need to be processed. This means we need to split the logic for running our tests into two parts: an orchestrator and runners. The orchestrator is responsible for cloning the entire repository, determining which packages have changed, counting the resulting number of test files, and passing those values out to the runners. The runners are only concerned with running the tests handed to them.
Here’s what the GitHub workflow looks like for the orchestrator:
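A hedged sketch of what such an orchestrator job could look like. The job name `tests-orchestrator`, the `matrix` and `roots` outputs, and `determineTestMatrix.js` are taken from this post; the step id `determine`, the runner label, and the script path are placeholders.

```yaml
jobs:
  tests-orchestrator:
    runs-on: ubuntu-latest        # assumed runner label
    outputs:
      # Re-export step outputs so dependent jobs can read them.
      matrix: ${{ steps.determine.outputs.matrix }}
      roots: ${{ steps.determine.outputs.roots }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0          # full history for change detection
      - uses: ./.github/actions/setup
      - id: determine             # hypothetical step id
        run: node path/to/determineTestMatrix.js   # placeholder path
```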
The `outputs` section allows us to pass outputs from a specific step to other jobs that depend on `tests-orchestrator`. We will use this when we configure the runners. The rest of this job looks like our old `tests` job except for the last step. We run the `determineTestMatrix.js` file instead of running Jest directly. Let’s dig into the JavaScript file to see what it’s doing.
There are a few things going on here, so let’s break it down. The first thing we do (after imports and a few utility functions) is call `getRoots` to determine which packages have changed (again, see my previous post for more details). Next, we run Jest with the `--listTests` flag to list all the test files we need to execute. That Jest run uses `getRoots` internally, so the tests it returns will always be inside the directories listed in `roots`. Then, we calculate how many runners we need, dividing the number of tests by 200 and capping the result at 24 runners. The result of this math is `numberOfRunners`. We create an array from `1` to `numberOfRunners` for `matrix.ci_node_index` and set `matrix.ci_node_total` to an array with `numberOfRunners` as the sole value. If you squint, you’ll notice that the shape of `matrix` matches the shape of the hard-coded job matrix we used to have in our workflow! Lastly, we stringify the JSON values and pass them out of the step via `core.setOutput`[3]. This script’s job is done. Time for the runners to take over.
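The runner math described above can be sketched as a small pure function. This is an illustration under assumptions, not the actual script: the function and constant names are made up, rounding up and keeping at least one runner are my assumptions, and the real script would hand the resulting string to `core.setOutput` rather than print it.

```javascript
// Values taken from the prose: roughly 200 test files per runner, capped at 24.
const TESTS_PER_RUNNER = 200;
const MAX_RUNNERS = 24;

// Hypothetical helper: turn a test-file count into a GitHub Actions matrix.
function buildMatrix(testFileCount) {
  // Assumption: round up so a partial batch still gets a runner,
  // and never go below one runner.
  const needed = Math.max(1, Math.ceil(testFileCount / TESTS_PER_RUNNER));
  const numberOfRunners = Math.min(MAX_RUNNERS, needed);
  return {
    // 1-based indexes: [1, 2, ..., numberOfRunners]
    ci_node_index: Array.from({ length: numberOfRunners }, (_, i) => i + 1),
    // A single-element array, so it does not multiply the job count.
    ci_node_total: [numberOfRunners],
  };
}

// Job outputs are strings, so the matrix is stringified before being passed on.
console.log(JSON.stringify(buildMatrix(450)));
// prints {"ci_node_index":[1,2,3],"ci_node_total":[3]}
```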
Interestingly, the runners need very little customization compared to their prior setup. We’ll be running the exact same steps as we did with the hard-coded matrix. The primary difference this time is how we’re defining `matrix`:
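A sketch of a runner job along these lines. The job names, the `matrix` and `roots` outputs, and `JEST_ROOTS` come from this post; the runner label and the Jest command are assumptions.

```yaml
  tests:
    needs: [tests-orchestrator]
    runs-on: ubuntu-latest        # assumed runner label
    strategy:
      # The matrix is parsed from the orchestrator's JSON string output.
      matrix: ${{ fromJSON(needs.tests-orchestrator.outputs.matrix) }}
    env:
      JEST_ROOTS: ${{ needs.tests-orchestrator.outputs.roots }}
    steps:
      - uses: actions/checkout@v4   # default shallow checkout
      - uses: ./.github/actions/setup
      # Assumed test command: one Jest shard per matrix entry.
      - run: pnpm exec jest --shard=${{ matrix.ci_node_index }}/${{ matrix.ci_node_total }}
```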
Notice we add `needs: [tests-orchestrator]` at the top to make sure this runs after the orchestrator job. This also allows us to use outputs from that job using the `needs.tests-orchestrator.outputs` expression. The real magic though is in this line:
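Assuming the orchestrator output is named `matrix`, that line would read something like this, using the built-in `fromJSON` expression function:

```yaml
      matrix: ${{ fromJSON(needs.tests-orchestrator.outputs.matrix) }}
```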
This parses the `matrix` value we passed out of the orchestrator script and uses it to define the matrix for the workflow. So if our script had passed out `{ ci_node_index: [1, 2], ci_node_total: [2] }`, it would run two jobs. But our script could just as easily have passed `{ ci_node_index: [1, 2, 3, 4, 5], ci_node_total: [5] }`, which would run five jobs. The great thing is that it’s all dynamic! We run jobs based on the number of tests we need to execute. Most of our test runs now only run 2 to 4 runner jobs. This greatly reduced our billable time.
The runners don’t need to clone the entire repo anymore. Nor do they have to determine which packages have changed or which files to run. This reduces their wall clock time.
Now, I said we didn’t need to add any special scripts for the runner, but we did have to make one change. Since we already determined which roots we need to run against in the orchestrator, we can pass those values in and use them directly without having to go to PNPM to determine which packages changed. In our workflow file, we do that by passing in `JEST_ROOTS` via `env.JEST_ROOTS: ${{ needs.tests-orchestrator.outputs.roots }}`. We need to make a small update to `getRoots.js` to handle that:
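A minimal sketch of that kind of update, assuming `JEST_ROOTS` carries a JSON-stringified array of directories. The function signature and the `computeChangedRoots` fallback are hypothetical stand-ins for the PNPM-based detection from the earlier post.

```javascript
// Hypothetical sketch: prefer roots handed over by the orchestrator.
function getRoots(env = process.env, computeChangedRoots = () => []) {
  if (env.JEST_ROOTS) {
    // Assumption: the orchestrator passed a JSON array of directories.
    return JSON.parse(env.JEST_ROOTS);
  }
  // No precomputed roots (for example, a local run): detect changes ourselves.
  return computeChangedRoots();
}

// Runner job: roots arrive through the environment.
console.log(getRoots({ JEST_ROOTS: '["apps/example","shared/ui"]' }));
// Fallback path: the stand-in detector is used instead.
console.log(getRoots({}, () => ["apps/other"]));
```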
#Across the finish line
With that, we are now A) running tests only for packages that have changed and B) dynamically allocating runners based on how many test files are included in the current run. This has given us huge gains in our CI pipeline, bringing our billable time for each run down from multiple hours to 20 minutes on average. With the test runners being more focused, we also moved them to 4x runners (as opposed to the default 2x), which gave us a ~2.1x improvement in wall clock time. All in all, these changes have been an immense quality-of-life improvement as well as a business improvement.
#Footnotes
1. Throughout this article, I will use the phrases “wall clock time” or “run time” to refer to how many minutes a run would take if one were timing it with a stopwatch. “Billable time” or “total time” refers to the sum of the time all CI job runners ran for. For example, a job running across 15 runners may take 5 minutes according to wall clock time. But the total billable time would be `5 minutes * 15 runners`, or 75 minutes’ worth of billable time.
2. Technically, GitHub Actions will run a job for every combination of the values provided. We can multiply the number of values for each key to know how many jobs GitHub will run. For instance, a matrix with three values for one key and two values for another runs six jobs (`3 x 2`), whereas a matrix with three keys of three values each runs twenty-seven jobs (`3 x 3 x 3`).
3. GitHub Actions only allows passing strings back and forth between jobs. Thankfully, stringifying and parsing JSON is trivial, and GitHub Actions even provides a helper function to parse JSON directly in workflow files.