Once again, the pattern of taking over a known npm package and modifying it with malicious intent has happened. In this case, it's with the event-stream module in the npm repository. In this broadcast I speak with Thomas Hunter, Software Developer at Intrinsic and author of "Compromised npm Package: event-stream", and Brian Fox, CTO of Sonatype and author of the Forbes article "Open Source Developers And Infrastructure Are The New Front Line Of Security?".
Mark Miller: Hi this is Mark Miller in New York City. I've got Brian Fox in the Boston area and Thomas Hunter in the San Francisco area. What we want to do is talk about the npm compromised package, the event-stream package, that was announced earlier in a couple articles. For background Brian does have an article on this topic in Forbes and Thomas has an article in the Intrinsic blog that talks about this. Good morning guys.
Thomas Hunter: Morning.
Brian Fox: Morning.
Mark Miller: Brian, I'll start with you. Give us a little background on yourself for people that don't know you.
Brian Fox: I'm a co-founder and CTO at Sonatype and a longtime Apache Maven committer and PMC member. At Sonatype, we've been in the business of providing a central repository for Maven for 15 years or something like that now. Supply chain attacks, those types of things on open source, are definitely something that we stay on top of and have been paying attention to for a long time.
Mark Miller: Thomas, a little bit about yourself.
Thomas Hunter: I'm a software engineer. As you mentioned, I work at Intrinsic. We build a product that protects Node applications. We also provide compatibility with various Node modules. So on a day-to-day basis I'm frequently poring through Node source code and looking at popular modules. Stuff like that.
Mark Miller: Thomas, your article I thought was impressive in the clarity and the simplicity it had. For those of you that haven't read it, it's called "Compromised npm Package: event-stream". Thomas, can you give us a little bit about the article?
Thomas Hunter: I can start off with a little bit of history, what exactly happened. Essentially the ownership of event-stream, which is a pretty popular Node module, about one and a half million downloads (a week), was transferred to an unlisted user essentially. That user created a new package which is now a sub-dependency of that package, and that new package snuck in some bad code. In essence we looked at the timeline and who may have been affected. The code was encrypted and minified so we tore it apart and tried to recreate what it was doing.
Mark Miller: Brian, when you saw this we started working on it this morning. It reminded us immediately about a lot of the things you've been talking about for the last six or eight months.
Brian Fox: Yes. Going back all the way to about July of last year, there's been a starting trend of people attacking the supply chain, primarily npm and Python. We've seen examples of it happen with Gentoo Linux and Docker Hub and in some cases Homebrew as well. So it's not just limited to those ecosystems. But we've seen an increase in these attacks trying to get things into the supply chain.
It's not unlike the alleged Supermicro attack that was all over the news from Bloomberg. It's a very similar concept: if you can attack the supply chain and get some malicious code into the repository, then get it used by millions of developers, it has a tremendous amplifying effect on your attack surface. So this event-stream is just another one of those attacks from my perspective.
I've been talking about this for about a year now because after a while I started to feel like it was Groundhog Day. "Didn't we just have this conversation not six weeks ago?" kind of thing. Unfortunately it continues to happen and not enough people are aware of the vector generally. So we've been trying to change that.
Mark Miller: Thomas, you're looking at a lot of the Node stuff like this. Are you seeing this more frequently? Was this just bigger than normal?
Thomas Hunter: We're definitely seeing this happen more frequently. This is probably the most interesting event recently. I also posted a previous article called The Dangers of Malicious Modules and it was something like this that happened a few times to varying degrees in the npm ecosystem.
Mark Miller: One of the things too Thomas, that's happening on the responses to your article, is people are talking about lines of responsibility. Can you touch on that? I mean who is ultimately responsible here?
Thomas Hunter: That is a great question. I don't think I have a good answer unfortunately. I think the current pattern with npm modules is that somebody builds a project, and it's usually related to something very interesting. I've got several modules myself. After a while you lose interest. And then it's sort of like, ultimately who takes ownership of that module? Honestly I think that's something that as a community we're still figuring out.
I believe what happened with event-stream is that npm itself ultimately took ownership of the package. But I think we need to come up with a better solution as a community for that.
Mark Miller: Brian you've been dealing with this for 11, 12 years now. Where is the line of responsibility?
Brian Fox: I think ultimate responsibility lies with the end user. I mean you're the ones who are probably getting paid for what you're doing, right? The publishers generally are not getting paid. You're getting paid to put these open source things into your application.
So probably you have some liability in your contracts but your application is different. Pretty much all these licenses disclaim warranty. You know, "you should be aware" kind of thing. Clearly, from a legal perspective, that is the only real answer. I think there's probability in a bunch of these things.
Just on Twitter not 20 minutes ago somebody was complaining about the extra steps that we require before somebody can publish things into central, saying it's too hard, "Why can't it be like npm and make it easy?" Well this is a side effect of that. Maybe not this example exactly but we've seen other ones where the credentials were compromised or certainly left-pad where people could just show up and remove stuff from the repo.
Those are things that were never allowed from the central repository going back longer than these new ecosystems have existed. There's some reasons why you see this happen in more ecosystems than others. We've had this effect of "we've made it easy to publish stuff", we've made it easy to consume stuff. The inevitable outcome of that is we've dumbed down the ecosystem in many cases where we've just made it super easy for anybody to come up and put something out there.
The perception generally of the software industry is that open source is maybe more like BSD, gray beards who are deep in security and really know what they're doing and the guys that created Apache. There's lots of good processes at these projects and forges and things like that, that can help prevent these things. It's not going to stop them completely but it helps mitigate these types of things.
But what's actually happening is you've got high school and college kids writing things on the weekend and publishing stuff, and they're not yet fully aware of the profound implications when they themselves get attacked. A lot of the compromises we saw earlier this year and last year appeared to have come from stolen credentials or just weak passwords.
That's when I really started talking about this. I wanted to get the message out so that consumers understand that the people producing this stuff are not the people writing open source from the late 90's and early 2000's. You need to be aware of that generally. And I wanted to try to speak to the people who are producing this stuff and say, "Guys, come on. You gotta treat your own security like millions of users are at risk." Because frankly they are.
If your laptop credentials get stolen, your key gets stolen, your password gets stolen, somebody's going to masquerade as you and inject that Bitcoin miner or any number of these things. Or a remote code exploit. Then your credibility is gone, because nobody can really ever prove if you did it on purpose or if it really was an accident.
So if you don't care about your end users, at least care about your reputation and take this stuff seriously. That's kind of the message that I've been trying to get out for about a year now.
Mark Miller: Thomas, one of these that Brian just mentioned implicitly was the social engineering that happened on this one. We don't know yet whether the person that took over this project was the one that actually had malevolent intent here. How does this get tracked down from this point?
Thomas Hunter: From what I've seen it does appear that the user that took over did have malevolent intent in this case. The actual module with the bad code was released by ... I forgot the user's name. It had "Glass" in it. But that was released by somebody who had apparently never released anything before. However, the new owner of the event-stream package went ahead and made that package a dependency. It doesn't really make sense to make a brand new package that nobody else is depending on a dependency of an important package. As far as I can tell, it seems pretty nefarious.
Brian Fox: Referring to one of the other examples that happened: I think this one seems pretty cut and dried. But this is a pattern we saw before, where somebody got the credentials of mail-parser and then added another dependency to it. So it was sort of a timeline. They created this new dependency that had a remote exploit in it. It was new so nobody used it. Then somehow they got the ability to publish a new version of mail-parser, which was popular, and added their thing as a dependency. getcookies is the malicious one. And then they added it to mail-parser and boom, they have a huge audience.
This pattern we saw here is one we've seen before. The difference is, in this case, they social engineered the takeover of the project itself. Which shows more of an advanced persistent threat type of thing versus somebody just exploiting something they found. Because apparently they gained trust by committing some legit fixes to the code to appear to be a real committer and then took it over. Clearly this was the game plan all along.
There was one back in May of this year. There was a Python one, SSH Decorator, that was stealing private keys. This one also had similar elements where it was clearly a persistent threat, because the thing they chose to basically hack into was a piece of code whose job was to deal with SSH keys. So if you had any kind of tripwire or something like that watching what was going on, it wouldn't trigger any red flags. This module was supposed to be manipulating your key. But what it was actually now doing was taking that key and sending it to some Eastern European IP address.
But in that instance, I think there was, at least the last stuff I saw, it was very dubious about who actually did it. The original author said, "It wasn't me." But there wasn't any obvious way to prove that it wasn't. That's what I was talking about, that reputation. So now at least in my mind, that whole thing is very suspect. I don't think that was exactly the case here. I think the story supports the narrative that we've been given but that's not always the case. When your credentials get attacked and somebody does something acting as you, when you don't have signatures and things like that, you also don't have the way to prove it.
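The pattern described in this exchange, where a release of a popular package quietly gains a brand new dependency, is something a consumer can check for mechanically. The following is a minimal sketch, assuming you have the `package.json` text of two releases on hand; the package names and versions in the example are hypothetical and are not taken from the incidents discussed here.

```python
import json


def added_dependencies(old_manifest: str, new_manifest: str) -> dict[str, str]:
    """Return dependencies that appear in the new package.json but not in the old one."""
    old_deps = json.loads(old_manifest).get("dependencies", {})
    new_deps = json.loads(new_manifest).get("dependencies", {})
    return {name: spec for name, spec in new_deps.items() if name not in old_deps}


# Hypothetical manifests for two consecutive releases of an illustrative package.
old = '{"name": "popular-lib", "version": "1.0.0", "dependencies": {"through": "^2.3.8"}}'
new = (
    '{"name": "popular-lib", "version": "1.0.1", '
    '"dependencies": {"through": "^2.3.8", "brand-new-helper": "^0.1.0"}}'
)

print(added_dependencies(old, new))  # {'brand-new-helper': '^0.1.0'}
```

A newly added dependency is not malicious in itself. The point of a check like this is only to flag the change so that a person can look at who published the new package and how long it has existed.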
Thomas Hunter: In this particular case the supposedly malicious author, the GitHub repo was quite bare and didn't even have a profile photo, just three small repositories. I don't believe there would be a smearing of reputation at all.
Brian Fox: What I found interesting in some of the write-ups and comments was that whoever published that module also took the nefarious code out somewhat later. So it was only in there for a few days. It looked like they were doing that so that the latest version that was kicking around in there looked okay. But anybody that had downloaded and cached it already had their malicious payload. So again, it's just another one of these pieces of evidence that points to an APT type of threat. This isn't just somebody messing around.
Mark Miller: That also opens up the conversation on timeline Brian. This has been, from my understanding, it's been known for two months or it's been running for two months. What's the timeline that all this is happening?
Brian Fox: I think it was published a couple months ago. I think it's generally been known for about a week. It's just only started to become really more widely known. I'm not sure of the exact dates.
Mark Miller: Thomas, when we're talking about that, so it's been known, it's in the wild for the last week and people have known about it. What's taking so long for it to get the visibility that it's starting to get now?
Thomas Hunter: That is a great question. The original two months that it sat there, there just weren't enough eyes looking at it honestly. The thing is the code that was sitting on GitHub differed from the code that was in the package. If somebody actually downloaded the tarball, you would see that it contains an index.js file but also contains a minified .js file. And that minified file had some nasty code injected into it. But if you were actually to audit the code by looking at GitHub, you wouldn't have spotted it.
I don't know the situation that caused somebody to originally find the issue though. As far as the amount of time it took, I believe it was almost a week, maybe four or five days, from an issue being made until npm rectified the situation. That timeline I'm not sure about. There was a holiday, so I believe that probably made things more complicated.
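The gap Thomas describes, where the published tarball contains files that the GitHub repository does not, can be surfaced by comparing the two directly. Below is a rough sketch under a few assumptions: the tarball is a gzip-compressed npm package whose files sit under a `package/` prefix (the layout `npm pack` produces), the repository has already been checked out locally at the matching tag, and the function name `unexpected_files` is made up for this illustration.

```python
import hashlib
import tarfile
from pathlib import Path


def unexpected_files(tarball_path: str, repo_dir: str, prefix: str = "package/") -> list[str]:
    """List files in the tarball that are missing from, or differ from, the repo checkout."""
    repo = Path(repo_dir)
    suspicious = []
    with tarfile.open(tarball_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and other non-regular entries
            # Strip the leading "package/" so paths line up with the repository layout.
            rel = member.name[len(prefix):] if member.name.startswith(prefix) else member.name
            published = hashlib.sha256(tar.extractfile(member).read()).hexdigest()
            source = repo / rel
            if not source.is_file():
                suspicious.append(rel)  # published, but absent from the repository
            elif hashlib.sha256(source.read_bytes()).hexdigest() != published:
                suspicious.append(rel)  # present in both, but the contents differ
    return sorted(suspicious)
```

Calling `unexpected_files("pkg.tgz", "path/to/checkout")` returns the relative paths worth a closer look. Many legitimate packages publish build output, such as minified bundles, that is never committed to the repository, so a non-empty result is a prompt for review rather than proof of tampering.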
Mark Miller: Both of our companies are interested in actually surfacing vulnerabilities and letting people know about it. That's what we do, right? In something like this, and Brian you mentioned it so I'll go to you first, what is the end user's responsibility? We can't expect every developer to drill down into every dependency when they're using a module.
Brian Fox: Unfortunately within some of the ecosystems the modules are so much smaller which means you end up with that many more. It becomes impossible. I think what we're seeing in the industry is even just the simple best practices of being prepared for the inevitable don't happen.
The analogy I like to use is that we all as consumers expect that our manufacturers have a complete bill of materials, whether it's our car or the planes we fly in or the food that we eat, so that it is traceable when something happens. That's not if something happens; when something happens they can find out where it's affected and do a diligent type of recall.
Unfortunately most companies don't even have that level of diligence for the very same software running in those pieces of hardware that they take such care to keep track of the physical pieces. I'm talking about the financial markets, the air traffic control, the planes, all these kinds of things.
It starts with that, being able to have a complete bill of materials of what's going into your applications and understanding not just having a list somewhere or in a developer's head, but being able to quickly do a query and understand "have we used this version since it was published". Because in this case the nefarious one is now gone. So you need to be able to look back, not just on what you have now but what were you running in the last two months? Because you might have left behind a miner or something like that, is what we saw on some of these other cases.
Being able to quickly respond, understand, and triage if you're affected, where you're affected, and do something about it is unfortunately not the norm. People need to get to that point. It sort of begs another question. How do you prevent these things up front?
That's more of an ecosystem community kind of thing. I think some of the stuff that Thomas, you guys are doing that try to help you make the applications themselves more resilient. So it's definitely a defense in depth type of approach but frankly if you can't even tell me are we using this module, have we ever used this module, and in which apps, you've got no chance. And that's what it really comes down to.
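The kind of query Brian describes ("are we using this module, and in which apps") can be approximated against npm lockfiles. This is a small sketch, assuming the nested `dependencies` layout used by `package-lock.json` at lockfile version 1; the application name, the intermediate package, and the flagged version string are placeholders, so substitute whatever the relevant advisory actually lists.

```python
import json


def find_package(lock_text: str, name: str, bad_versions: set[str]) -> list[str]:
    """Return dependency paths in a package-lock.json where `name` resolves to a flagged version."""
    hits = []

    def walk(deps: dict, trail: list[str]) -> None:
        for dep_name, info in deps.items():
            path = trail + [dep_name]
            if dep_name == name and info.get("version") in bad_versions:
                hits.append(" > ".join(path) + "@" + info["version"])
            # Lockfile v1 nests non-hoisted transitive dependencies under "dependencies".
            walk(info.get("dependencies", {}), path)

    walk(json.loads(lock_text).get("dependencies", {}), [])
    return hits


# Hypothetical lockfile; "9.9.9" is a placeholder, not a real affected version.
lock = json.dumps({
    "name": "example-app",
    "lockfileVersion": 1,
    "dependencies": {
        "build-tool": {
            "version": "2.0.0",
            "dependencies": {"event-stream": {"version": "9.9.9"}},
        },
        "left-alone": {"version": "1.0.0"},
    },
})

print(find_package(lock, "event-stream", {"9.9.9"}))  # ['build-tool > event-stream@9.9.9']
```

To answer the historical half of the question ("have we ever used this module"), the same function could be run over the lockfile from each past release or deploy, for example by reading it out of version control at each release tag.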
Thomas Hunter: I think there's a few attempts to make the npm ecosystem more secure. There's all these tools that are always performing static analysis on the actual code but unfortunately that can't always catch these issues. A lot of times they're finding things that aren't necessarily issues as well. So it can get a little noisy.
Through the NSP project, which recently essentially merged into npm, Inc., they have a database of known bad packages that people can report. But unfortunately that approach is a bit more reactive. It's not until an issue is actually discovered and made public that you're able to get protected from that.
Of course, once your application is actually running in production, using an analysis package doesn't actively make you safe.
Mark Miller: At Intrinsic Thomas, what is the notification process? Let's say that you guys did find this. How are you notifying the community that you found this thing?
Thomas Hunter: We don't really crawl packages to look for vulnerabilities. We do crawl packages a lot to look for perhaps compatibility with our product or how can we create a policy around it. So that requires an intimate knowledge of the actual module. We do frequently find security vulnerabilities in these modules and then we will submit a PR to the maintainer to fix such an issue.
But really, since we run at runtime, the application will essentially generate an error when it attempts to, for example, contact a rogue IP, as this module was doing. We would have prevented that at runtime and then you'd receive a message.
Mark Miller: Brian, same thing to you, how is your team notifying not just clients but the public in general as to something that's been found?
Brian Fox: There's multiple different layers. We also run the OSS Index website, which has information integrated into a lot of tools. We have integrations with our Nexus repository, and we also have a free vulnerability scanner on the website to do an assessment. And then of course our enterprise products give companies a definitive bill of materials from the bottom up so they have that big list that I was talking about. Then they can marry that with the data that we provide to do automated control, so you can block developers from starting to use a known vulnerable thing or stop it from being released or a whole myriad of different possibilities.
We're generally about trying to provide those automated contextual controls so that we can take this information and make it actionable and relevant within a development environment. Our data research and that stuff that happens behind the scenes is what we do to drive all of that.
That's why we're sort of looking at this from two perspectives. We're looking at how we make central more resilient for the community in general. But then as it applies to all of the ecosystems, not just Java, we have to be aware of these things because that's what our customers are using.
Mark Miller: One of the interesting things for me on this one is the attack was very specific. In essence, for the overall ecosystem and community it has relatively low impact because of what it was trying to do. My understanding is that if there wasn't a miner within the system that it's attacking, nothing would happen. It's not like they did something else. But it does open up the possibility, Thomas, that something very nefarious could have been placed here and people wouldn't have known for a while.
Thomas Hunter: Absolutely.
Brian Fox: That's why I've been talking about it for so long. We haven't had the big one yet. Certainly if your SSH key got compromised you might have had the big one and we don't know about it. But in terms of these exploits, some of them were typosquats, some of them were inserting miners, so they were stealing CPU, and somebody's paying for that. Some of them were remote code exploits but not a lot of them.
The pattern is the thing that is really concerning to me. That new trend of people actively doing this and honing the craft: how do I add a new one to the belt? How do I take over an existing project without having to steal their credentials? Show up and commit a little bit and then ask to get privileges. That's also the definition of how you become a committer on any open source project. So these trends are the things that are scary. If you just look at it as them building their moves, at some point the big payload is gonna come.
That's why we're talking about it; to try and raise awareness so more people can be prepared for when that happens.
Mark Miller: The key to what you just said for me Brian, is that this is not just a developing pattern, it is a documented pattern. Now we're seeing this process over and over again.