Researchers re-discovered an unpatched 2007 Python tarfile module vulnerability that could affect 350,000+ open-source applications and projects. If exploited, it could allow attackers to control devices. Here’s what to know about this threat and how to address it to secure your supply chain.
It’s no secret that people love open-source software. In fact, the Linux Foundation reports that up to 98% of codebases have free and open source software (FOSS) in them. This is likely because open-source software is often viewed as a bit of a godsend; it’s free to use and is often thought to be more secure than proprietary software because so many more eyes are looking at it. However, the second half of that statement isn’t always the case, as proven by a newly re-discovered vulnerability that came to light this week.
New research from the cybersecurity company Trellix called out the Python tarfile vulnerability in a new report. Their Advanced Research Center team initially thought they’d discovered a new zero-day vulnerability but quickly realized that it was actually an older vulnerability (CVE-2007-4559) that’s gone unaddressed for the last 15 years. This vulnerability is thought to be present in more than 350,000 open-source projects and, potentially, countless other closed-source projects.
Let’s explore what this vulnerability is and how it can be exploited, why it poses a threat to your organization and customers, and what you can do to mitigate this issue and prevent it from affecting your supply chain.
Let’s Take a Look at the Python Tarfile Vulnerability CVE-2007-4559
The National Institute of Standards and Technology (NIST) describes CVE-2007-4559 as a type of “directory traversal vulnerability” that can be exploited through two specific functions (extract and extractall) in Python’s tape archive file (tarfile) module. The concern here is that these functions can let attackers execute code, read and modify sensitive files on the backend, and take over your device.
In case you’re not familiar with Python and we’ve lost you, let’s break all of this down a little more:
- Python is a programming language used by developers to create numerous frameworks and applications.
- A directory traversal vulnerability is a weakness that can be exploited to give an attacker access to files on your application server.
- Tar is a command that bundles multiple files and their metadata into a single archive file (called a tarfile). The resulting archive can optionally be compressed to save space.
- The tarfile module is the part of Python’s standard library that reads and writes tar archives. It lets code parse and change a file’s metadata (including its name) before the file gets added to a tar archive, and the extraction functions trust that metadata. Trellix researcher Kasimir Schulz (the one who wrote the aforementioned Trellix article) cautions that attackers can use the tarfile module to create serious exploits in just a handful of lines of code.
What Exploiting This Vulnerability Can Do
So, what exactly does this vulnerability really allow cybercriminals to do? According to the report, this vulnerability gives an attacker the unauthorized ability to read and overwrite arbitrary files remotely. These files include everything from general server files to those containing sensitive user data. Attackers do this by including a dot-dot (“..”) sequence with a separator (either “/” or “\”) in TAR archive filenames. (Don’t worry, we’ll explore all of this a little later, so stay tuned.)
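To make the dot-dot problem concrete, here is a minimal defensive sketch. It checks every member name in an archive before extraction and refuses to extract if any name would land outside the destination folder. The helper names (is_within_directory, find_unsafe_members, safe_extractall, build_demo_archive) are made up for this illustration; only the tarfile, os, and io calls come from Python’s standard library. It also only looks at member names, so treat it as a starting point rather than a complete fix.

```python
import io
import os
import tarfile
from typing import List


def is_within_directory(base_dir: str, target: str) -> bool:
    """Return True if target resolves to a location inside base_dir."""
    base = os.path.realpath(base_dir)
    resolved = os.path.realpath(target)
    return os.path.commonpath([base, resolved]) == base


def find_unsafe_members(tar: tarfile.TarFile, dest: str) -> List[str]:
    """List member names that would land outside dest if extracted.

    Assumption: only names are checked. Symlink and hard link members
    need extra handling that this sketch does not attempt.
    """
    return [
        member.name
        for member in tar.getmembers()
        if not is_within_directory(dest, os.path.join(dest, member.name))
    ]


def safe_extractall(tar: tarfile.TarFile, dest: str) -> None:
    """Extract only if every member stays inside dest; otherwise refuse."""
    unsafe = find_unsafe_members(tar, dest)
    if unsafe:
        raise ValueError(f"refusing to extract, unsafe member names: {unsafe}")
    tar.extractall(dest)


def build_demo_archive(names: List[str]) -> bytes:
    """Build a small in-memory tar archive (hypothetical demo data)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name in names:
            data = b"demo"
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


if __name__ == "__main__":
    payload = build_demo_archive(["docs/readme.txt", "../outside.txt"])
    with tarfile.open(fileobj=io.BytesIO(payload)) as archive:
        # The second name climbs out of the destination folder.
        print(find_unsafe_members(archive, "extracted"))
```

Newer Python releases (3.12 and later) also accept a filter argument on extractall, for example filter="data", which rejects many of these unsafe members on its own. Where that is available, it is likely the simpler option.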
In layman’s terms, all of this means that bad guys can add a small amount of code to retrieve arbitrary files from various levels of your filesystem directory. Depending on what attackers manage to get their hands on, you may find yourself facing a litany of issues, including:
- Data breaches,
- Non-compliance fines and penalties,
- Lost user trust, and
- Potential lawsuits.
Trellix published a video that shows an example of how easily an attacker can exploit this vulnerability to gain admin-level code execution capabilities in Spyder, which is a Python-based open source research environment.
The crew at Trellix also tested this vulnerability on a couple of other systems: Polemarch (an IT infrastructure management service) and Universal Radio Hacker (a wireless protocol analysis tool). You’ll find more info, including videos, relating to these platforms in the Trellix article.
A Quick Overview of How This Potential Exploit Works
Okay, now that we know what this Python traversal attack vulnerability is, let’s take a look at how a directory traversal attack works in general using a basic example.
Say you’re a small camera and photo processing business. On your website, you have image files and descriptions of your digital SLR camera products and accessories. These image files are accessed using HTML code similar to this:
<img src="/images/yourimagefilename.jpg">
What your server does is use the information to find the specified file (in this case, yourimagefilename.jpg) in order to display the image file you’re requesting. With me so far? Good. This is where things start getting dicey.
Since the files are stored at the location /products/cameras/accessories/images/, an attacker can work out your image file’s absolute file path (that directory followed by the file name).
Knowing this, an attacker can retrieve an arbitrary file from your file system using just this little bit of knowledge. They can even move one level up in your server’s filesystem by requesting a URL with the dot-dot and separator (“../” or “..\”) we mentioned a little earlier. The more uses of “../” the attacker includes in the file path, the higher up in your directory tree they can reach, and the more sensitive the files they’ll be able to retrieve. (For example, “../../../” moves them up three levels, which could be your filesystem’s root directory.)
Doing this, the bad guy can then try to retrieve different common operating system files that contain nuggets of valuable sensitive data, such as user profile information.
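Here’s a small sketch of the dot-dot arithmetic described above, using the article’s example image directory. The function names (resolve_request, is_allowed) and the requested file names are made up for illustration, and a real web server resolves paths in its own way, so read this as a model of the idea rather than how any particular server behaves.

```python
import posixpath

# Directory from the example above (no trailing slash for easier comparison).
IMAGE_DIR = "/products/cameras/accessories/images"


def resolve_request(image_dir: str, requested: str) -> str:
    """Join a requested name onto image_dir and collapse any '..' segments."""
    return posixpath.normpath(posixpath.join(image_dir, requested))


def is_allowed(image_dir: str, requested: str) -> bool:
    """Accept the request only if the resolved path stays under image_dir."""
    resolved = resolve_request(image_dir, requested)
    return resolved.startswith(image_dir + "/")


if __name__ == "__main__":
    # A normal request stays inside the image directory.
    print(resolve_request(IMAGE_DIR, "yourimagefilename.jpg"))
    # -> /products/cameras/accessories/images/yourimagefilename.jpg

    # Each "../" climbs one level; four of them reach the root here.
    print(resolve_request(IMAGE_DIR, "../../../../etc/passwd"))
    # -> /etc/passwd

    print(is_allowed(IMAGE_DIR, "yourimagefilename.jpg"))   # -> True
    print(is_allowed(IMAGE_DIR, "../../../../etc/passwd"))  # -> False
```

The check at the end is the usual defense: resolve the path first, then confirm it still sits under the directory you meant to serve.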
Why This Python Vulnerability Is an Issue: It’s Prevalent and Easy for Attackers to Exploit
When it was originally announced back in 2007, this vulnerability was classified as having a low security impact rating. But that was then; this is now. In their September 2022 report, Trellix researchers shared that “hundreds of thousands of repositories” are vulnerable to this security issue, making this a more serious issue today. Ideally, it’s a security issue that needs to be addressed quickly before attackers start putting it to use.
While Python isn’t the most commonly used programming language, it’s still popular and has been around for a while (since the early 90s). It also picked up a decent amount of traction in terms of usage over the past two years during the COVID-19 pandemic. It’s versatile, working on various operating systems and platforms.
The good news here is that Trellix’s report indicates that there aren’t any known instances of this vulnerability being exploited in the wild. However, it doesn’t mean that the same will always be true. After all, bad guys are always looking for new ways to attack and ways to recycle tried-and-true attack methods.
Most cybercriminals aren’t mad hackers who thrive off the challenge of trying to figure out ways to hack your network or applications. They’re usually opportunistic attackers who prefer to go for the low-hanging fruit (i.e., easy targets). It’s a heck of a lot easier and more cost effective for them to exploit known vulnerabilities. And many attackers do what they do because they want a quick-and-easy payday.
But doesn’t exploiting this type of vulnerability require a lot of technical know-how and skills? Eh, not really. Even an attacker with relatively rudimentary knowledge of cybersecurity can potentially exploit this vulnerability. That’s what makes it particularly concerning. Many companies integrate open-source code into their products. This means that this tarfile module-related vulnerability is thought to be highly widespread and puts supply chains globally at risk.
How to Identify the Tarfile Vulnerability Within Your Applications
As a way to help organizations and developers nip this security issue in the bud, Trellix researchers created Creosote, an open-source tool that searches any directory you specify for Python files. This works on Windows, macOS and Linux systems.
Once these files are found, it scans them for the vulnerability and sorts its findings into three primary categories, labeled from highest to lowest risk:
- Vuln — This indicates that the file requires analysis and you should proceed with caution.
- Probable Vuln — As the name implies, this means that there’s something about the file that indicates that there might be a vulnerability.
- Potential Vuln — This is the least worrisome category and just aims to ensure that nothing gets missed.
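If you’re curious what a scan like this looks like in principle, here’s a heavily simplified sketch. To be clear, this is not Creosote’s code and it doesn’t reproduce its three categories. It just walks a directory for Python files and flags every call to a method named extract or extractall so a human can review it. All function names here (find_extract_calls, scan_directory) are hypothetical.

```python
import ast
from pathlib import Path
from typing import Iterator, List, Tuple

# Method names worth a manual look. Assumption: this is a coarse filter,
# so it also flags unrelated objects (zipfile, for example) that happen
# to have methods with the same names.
RISKY_METHODS = {"extract", "extractall"}


def find_extract_calls(source: str) -> List[Tuple[int, str]]:
    """Return (line number, method name) for each flagged call in source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in RISKY_METHODS
        ):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)


def scan_directory(root: str) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line, method) for every flagged call under root."""
    for path in sorted(Path(root).rglob("*.py")):
        try:
            hits = find_extract_calls(path.read_text(encoding="utf-8"))
        except (SyntaxError, ValueError, OSError):
            continue  # skip files that cannot be read or parsed
        for lineno, method in hits:
            yield path, lineno, method


if __name__ == "__main__":
    for file_path, line, method in scan_directory("."):
        print(f"{file_path}:{line}: call to .{method}() needs review")
```

A hit from this sketch only means “look here.” Whether the call is actually exploitable depends on where the archive comes from and whether the member names are checked before extraction.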
To learn more about CVE-2007-4559 and the risks it poses to modern software supply chains, be sure to read Trellix’s full report.