Malware In GitHub Repositories

It is unsurprising to find malware hosted on GitHub. GitHub, being a free website specifically geared towards hosting and deploying code for millions of people and organizations, which makes it an ideal location for malicious actors to hide their own code. Whether pulling from their own repositories or pulling from the handy collections of malware analysts, bad actors have a handy location for their malware to reside.

Posted in Behind the Code

July 15, 20215 min read

Overview

A recent investigation uncovered two previously unexpected locations where malware could be found:

The repository description
Easily parsed Markdown files

A crafty attacker can easily use these innocuous locations to successfully hide and deploy a payload from GitHub than using traditional file-based methods. As such, malware analysts and researchers need to be on the lookout for additional non-traditional retrieval methods from GitHub as well as any manipulation of the retrieved content.

Discovery Method

A recent search of GitHub was run to search for a simple code snippet:

Aside from a wide range of “expected” repositories relating to scanners (Fig 1), there were also a number of repositories that appeared to display PHP code in the repository description (Fig 2).

Fig 1

Fig 2

Investigation

A quick investigation found these repositories either didn’t have any files in them or only had “expected” repository files such as Markdown files, LICENSE, files, and / or Git files. So where was the PHP code?

In the case of the first repository, there were no clues other than the search results. So, we need to take a look at just what GitHub displays in those results. Aside from the repository name, we can get some information about the repository including the code language in use (if any), the license that applies to the repository code, and the last time the repository was updated. Additionally, if the repository owner has added a description to the repository settings, we get an excerpt from that.

So, what is going on with these repositories that have no code? In those cases, what we see in the search results is the repository description.

Now that we know where the code is, we must ask ourselves two things. First, “So what?” Secondly, “How does that get leveraged?” The answer to both questions is “easily.”

Deployment Method 1

The first, and more worrying, deployment method leverages the repository description data to make the malware available without the use of actual files. The description, and everything “about” the repository, is readily available through GitHub’s Repository API. A simple call to the API with the user/repo in the URI and we get a nice JSON blob back.

This, in turn, can be stripped of slashes and executed with a simple PHP snippet.

Unlike the typical GitHub retrieval process which involves the obvious retrieval of a file from GitHub’s “raw content” URL (e.g., https://raw.githubusercontent....;owner>/<repo>/master/<path>), this mechanism simply appears to be retrieving the description of the repository. While the example above makes it clear that we are retrieving code and executing it, a crafty attacker could easily disguise it, so the evaluation happens elsewhere or uses a different method.

The main danger here is that the retrieval and display of a “description” field is, to most people, completely innocuous. In particular, the retrieval and injection of JSON content is completely normal in many cases. No one would think twice about a piece of code that retrieves a JSON blob, grabs a description field, and displays it or, with appropriate verbiage, executes it in some way. The only indicator, to most people, in this case is the repository owner and repository name.

Deployment Method 2

The second deployment method leverages files, but not in the typical fashion.

By retrieving the raw Markdown file, the README.md in the repository I found, the attacker would simply be retrieving an innocuous README. However, as shown below, some simple string replacement will convert the Markdown content into executable PHP code.

Again, there is little about the retrieval of a README file or, more generally, a Markdown file that would raise suspicion. Markdown files are, after all, not scripts or executables and contain numerous formatting characters that would make execution impossible. A crafty attacker could very easily disguise this retrieval in a way that allays suspicion about the purpose of the retrieval. Again, the only indicators where would be the repository owner and repository name if the attacker kept the data in the readme or license file.

Conclusion

Malware analysts around the world know and understand that malicious actors leverage GitHub to deploy their tools. This is nothing revelatory. What is unusual in these cases, is the alternative deployment locations for these samples and the stealthy way in which they reside in GitHub.

The primary danger with them lays in the fact, for all intents and purposes, these repositories appear empty and in innocuous locations which are being contacted to retrieve the payloads. It would take very little effort for a bad actor to store JavaScript or another client-side scripting language within the same field and then leverage it as the payload for a drive-by attack. With millions of people using GitHub to deploy code it is imperative that those who use it understand the risks that go along with it. SiteLock knows how important it is to protect websites from potential threats like hidden malware, even though they may be coming from a trusted source. Contact us to learn about our website security products or to speak with one of our security professionals about our services today.