- Researchers at the Leiden Institute of Advanced Computer Science reported that GitHub repositories host thousands of fake PoCs.
- During the binary analysis, the team examined 6,160 executables. The results show that 2,164 malicious samples were found in 1,398 repositories.
- In one repository, the team found a script with a link to Pastebin. The linked content is saved as a VBScript and then run by the first exec command, and it contains the Houdini malware.
Researchers at the Leiden Institute of Advanced Computer Science have announced the results of their research. According to the report, thousands of GitHub repositories offer fake proof-of-concept (PoC) exploits, and some of them include malware. The researchers analyzed approximately 47,000 repositories between 2017 and 2021.
Spreading malware
The researchers started by observing indicators of malicious PoCs: which methods these PoCs use, and which well-known, easy-to-implement methods could be used to create a malicious PoC for a specific CVE exploit. The team then clustered the data by programming language and by CVE year, which is useful for similarity analysis and for identifying indicators of maliciousness and extracting them from PoCs and repositories. The team identified the following indicators of malicious PoCs:
- IP analysis: As PoCs are intended to be used by different people on different machines, in general, a PoC should not have any communications with a predetermined public IP address. This could be an indication of malicious behavior, e.g., to exfiltrate information from the machine executing the PoC to that server.
- Binary analysis: Some PoC repositories come with pre-built binaries to ease the process of exploiting a given security issue, so the team also extracted binaries from the repositories. In this work, the team focused on EXE files, which run on Windows systems, since most malware attacks are conducted against Windows users.
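The IP indicator above can be illustrated with a minimal Python sketch. It is not the researchers' tooling: the function names, the regular expression, and the idea of passing the blocklist in as a plain set are all assumptions made for illustration. The sketch pulls IPv4-looking strings out of PoC source text, keeps only globally routable addresses, and intersects them with a caller-supplied blocklist.

```python
import ipaddress
import re

# Loose IPv4 pattern (hypothetical, for illustration). It skips candidates that
# are part of a longer dotted number such as "1.2.3.4.5"; real validation is
# left to the ipaddress module below.
IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\d|\.\d)")


def extract_public_ips(source_text):
    """Return the set of globally routable IPv4 addresses found in source_text."""
    found = set()
    for candidate in IPV4_CANDIDATE.findall(source_text):
        try:
            address = ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            continue  # not a valid address, e.g. an octet above 255
        # Private, loopback, and reserved ranges are not "predetermined public" hosts.
        if address.is_global:
            found.add(str(address))
    return found


def flag_blocklisted(source_text, blocklist):
    """Return the public IPs in source_text that also appear in blocklist."""
    return extract_public_ips(source_text) & set(blocklist)
```

A caller would read each source file in a repository, pass its text to `flag_blocklisted` together with a set of addresses taken from whatever blocklist is available, and treat any non-empty result as a reason for manual review rather than as proof of malice.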
Of the 150,734 unique IPs extracted, 2,864 matched blocklist entries. Of these, 1,522 were detected as malicious on VirusTotal and 1,069 were present in the AbuseIPDB database. During the binary analysis, the team examined 6,160 executables. The results show that 2,164 malicious samples were found in 1,398 repositories. In total, 47,313 repositories were tested and 4,893 of them were deemed malicious.
The team also found various malicious proofs of concept made for CVEs. These PoCs serve different purposes: some contain malware, some are used to gather information about users of the PoC, and others are made simply to mock people and remind them that running proofs of concept without reading the code can be harmful.
- Malware: One interesting example was a repository presented as a PoC for CVE-2019-0708, the well-known BlueKeep vulnerability. The repository was created by a user named Elkhazrajy. The source code contains a base64-encoded line that is executed once decoded. The decoded content is another Python script with a link to Pastebin. The content behind that link is saved as a VBScript and then run by the first exec command, and it contains the Houdini malware.
- Exfiltration scripts: These scripts were generally made to gather information about the person running them, e.g., IP address, system information, user agent, IP-based geolocation, etc. One example was a malicious PoC made to exfiltrate a few basic details about the machine running it.
- Prank scripts: Fake but not malicious, these scripts are generally made by people who are aware of the issue and are trying to educate the rest of the community by sharing prank scripts that, once run, will either show a prank message or something else.
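The base64 trick described in the malware example can be checked for statically, without running the PoC. The following Python sketch is only an illustration under stated assumptions: the 40-character minimum length, the marker list, and the function names are hypothetical choices, not the heuristics the researchers used. It finds long base64-looking strings in source text, decodes them, and reports those whose decoded form contains markers such as `exec(` or a URL.

```python
import base64
import binascii
import re

# Runs of at least 40 base64 characters with optional padding.
# The threshold is an assumption chosen to skip short identifiers.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")

# Hypothetical marker list; a real detector would need a broader one.
SUSPICIOUS_MARKERS = ("exec(", "eval(", "http://", "https://", "powershell", "wscript")


def decode_base64_strings(source_text):
    """Decode every long base64-looking run in source_text that decodes cleanly."""
    decoded = []
    for run in BASE64_RUN.findall(source_text):
        try:
            raw = base64.b64decode(run, validate=True)
        except (binascii.Error, ValueError):
            continue  # wrong length or padding, so not a usable base64 payload
        decoded.append(raw.decode("utf-8", errors="replace"))
    return decoded


def suspicious_payloads(source_text):
    """Return (decoded_text, matched_markers) pairs for payloads worth reviewing."""
    hits = []
    for text in decode_base64_strings(source_text):
        lowered = text.lower()
        markers = [marker for marker in SUSPICIOUS_MARKERS if marker in lowered]
        if markers:
            hits.append((text, markers))
    return hits
```

Because the decoded text is only inspected and never executed, this kind of check is safe to run on untrusted repositories. It only peels off one layer of encoding, so a payload that is encoded several times, or obfuscated in another way, would pass unnoticed.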
The researchers said,
« We conduct a quantitative and qualitative investigation of CVE Proof of Concepts maliciousness on GitHub. In this research we proposed heuristics to detect malicious PoCs based on inclusion of malicious IP addresses, analysis of instructions obfuscated with hexadecimal and base64 encodings, and malicious binaries targeting Windows systems. Out of 47313 GitHub repositories with PoCs we detected 4893 malicious repositories (i.e., 10.3%). The next step after this research is to develop a more robust approach for detecting malicious instructions, e.g., based on code similarity features or dynamic analysis.
To the best of our knowledge, our work is the first that investigates, analyses and proposes a heuristic-based solution to detect and flag malicious PoCs of CVEs. Our approach is based on analysing source code for malicious calls to servers as well as extracting hexadecimal payloads and Base64 encoded scripts that contains malicious instructions, which could be exfiltrating information, downloading malicious files from the internet or containing a backdoor. However, this approach cannot detect every malicious PoC based on source code, since it is always possible to find more creative ways to obfuscate it. We have investigated code similarity as a feature to help identifying new malicious repositories. Our results show that indeed malicious repositories are on average more similar to each other than non-malicious one. This result is the first step to develop more robust detection techniques. »