Signs of Triviality

Opinions, mostly my own, on the importance of being and other things.
[homepage] [index] [@jschauma] [RSS]

Patching is hard. Knowing what to patch is harder still.

September 17th, 2017

Friends, SysAdmins, #infosec, lend me your ears
I come to bury Equifax, not praise them.
The incompetence that companies employ lives after them;
the laziness they oft interred with their bones;
So let it be with Equifax.

After Equifax was pwned, word got out that CVE-2017-5638, a Remote Code Execution vulnerability in Apache Struts, was exploited in the attack. This vulnerability had been published on March 7th, 2017; the compromise was first detected on July 29th, 2017, with earliest access as far back as May 13th, 2017. That is, the vulnerability had been exploited approximately 2 months after it was announced.

This two-month window quickly became a major point of criticism among the #infosec twitterati and in some of the press coverage. I frequently argue in favor of regular, automated software updates; I know from experience that unpatched software poses the biggest risk to just about any organization, and that attackers are far more likely to use known, published vulnerabilities than the ominous 0-day. But I also know that patching a large infrastructure is not trivial.

Over the last 17 years or so, I have helped design, develop, build, maintain, and secure large scale systems that you likely have used (albeit often indirectly). I have tried to solve the problem of applying software updates in a variety of ways, and I have come to the conclusion that it's not currently realistic to believe that you can keep your entire infrastructure and serving stack up to date at all times.

Today's software dependencies are so complex that your systems likely rely on components you've never heard of, on the order of thousands of distinct packages with dozens of different versions in use. (And this is not to mention that the services in question nowadays frequently run inside virtual machines on hardware you do not even own.) We're back to using static binaries and have in some ways lost visibility of what's running where.
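
To make that scale concrete, here's a small illustrative Python snippet (Python being just one example ecosystem) that enumerates whatever distributions happen to be installed in the current environment. On a real application host, once you count OS packages, vendored libraries, and container images, the number routinely runs into the hundreds or thousands:

```python
# Illustrative only: list the Python distributions installed in this
# environment. Most of these were pulled in as transitive dependencies
# that nobody picked deliberately.
from importlib.metadata import distributions

names = sorted({d.metadata["Name"] for d in distributions() if d.metadata["Name"]})
print(f"{len(names)} distinct packages installed")
for name in names[:5]:
    print(" ", name)
```

And that's just one language runtime on one host, before you ask which of those packages has a pending security fix.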

[Comic: How people imagine software vulnerability patching works, next to an abbreviated reality of how software vulnerability patching "works".]

The logistics of getting a software update out to all your systems are complicated; see e.g. this Twitter thread for a summary. (There is much work needed to make this process easier. Auto-updating containers and auto-rebooting services are just some of my crazy ideas.) But that's step two -- step one is to actually be aware that a software update is available and required.

"Easy peasy, just monitor all CVEs from NIST!" -- right? Well, having tried this approach before, let me tell you some of the ways that this doesn't work:

  • There are just too damn many.
    In the last 16 days, almost 600 new CVEs have been published; over 7000 in 2017 so far. (And we're ignoring the number of old CVEs that are updated with new findings and hence might need new attention.) And the rate at which they are discovered appears to be increasing. Which leads to...
  • Most of them don't apply to you.
    Hey, you'd think that'd be a good thing. But the problem is that this leads to way too many false positives if you're trying to look at them manually. Which you kind of have to, because...
  • Identifying which software a CVE applies to is difficult.
    The Common Platform Enumeration (CPE) used in the CVEs tries to identify software by vendor, name, and version. But comparing versions across package managers, public and private forks, enabled features, runtime configuration, and so on yields a many-to-many mapping with low certainty of applicability without input from the local subject matter expert. Which is tricky, because...
  • You often don't know who in your organization would know more about a random piece of software.
    The majority of the software you are using is pulled in as a dependency, possibly without anybody really knowing what it does or that you use it (remember left-pad?). Even for software your organization heavily relies upon, you may not have a team actually tracking updates; you often run with whatever version happened to be working when the product was initially released. But it's worse than that, because...
  • You don't even know all the places a given piece of software is used in.
    Remember ImageTragick? How long did it take you to find all the places where libmagic was involved? Where somebody had bundled a copy of the library at a different version than the one managed by the OS or your package manager? Where containers had been built containing a vulnerable version?
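
To illustrate the many-to-many matching problem, here is a deliberately naive Python sketch. The inventory entries and the substring-based matching are invented for illustration; a real tool would consume the NVD data feeds and cope with far messier naming than this:

```python
# Hypothetical sketch: match a CVE's CPE identifier against a local
# software inventory. Package names and versions below are made up.

def parse_cpe(cpe):
    # CPE 2.3 formatted string: cpe:2.3:part:vendor:product:version:...
    fields = cpe.split(":")
    return {"vendor": fields[3], "product": fields[4], "version": fields[5]}

def possibly_affected(cpe, inventory):
    """Return inventory entries that *might* match the CPE. Note the
    many-to-many problem: names and versions rarely line up exactly."""
    want = parse_cpe(cpe)
    hits = []
    for pkg, version in inventory.items():
        # Naive substring match on the product name; real matching must
        # handle renames, forks, and distro-specific version strings.
        if want["product"] in pkg or pkg in want["product"]:
            hits.append((pkg, version, want["version"]))
    return hits

inventory = {
    "struts2-core": "2.3.31",       # upstream name differs from the CPE product
    "libstruts1.2-java": "1.2.9",   # distro package of an entirely different branch
}

# CPE as used for CVE-2017-5638 (the Apache Struts RCE)
cpe = "cpe:2.3:a:apache:struts:2.3.31:*:*:*:*:*:*:*"
print(possibly_affected(cpe, inventory))
```

Both inventory entries match, but only one is plausibly affected: exactly the flood of false positives that forces a human into the loop.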

So, yes, you can slurp in all the data, but making it actionable requires a lot of work. Worse, this uses a blacklist approach to system security: "we know software X, Y, and Z is used here, so we will keep an eye open for vulnerabilities in those". (Not to mention that this only covers vulnerabilities once they actually are assigned a CVE; a lot of software providers or open source projects out there don't even (know to) request CVEs for security fixes.)

None of this is inherently unsolvable, but all of this is hard. You may argue that we should be in the business of solving hard problems like these -- and I would applaud you, for I have made that case myself. But that sort of work is like insurance: generally regarded as not particularly sexy, non-revenue generating (and frequently interrupting revenue streams), and only of interest after the shit has hit the fan.

Ensuring that even just those packages you (know you should) care about are updated across your entire fleet -- patching your systems and services -- is, as of today, still a very difficult and laborious task, one that may take several weeks even when prioritized correctly.
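
As a toy illustration of that fleet-wide check, here's a hedged Python sketch. The hostnames and inventory format are invented, and the naive tuple comparison deliberately ignores the epochs, suffixes, and distro revisions that real package-manager version schemes require:

```python
# Invented example: flag hosts still running a version below the known fix.
# Real version comparison must follow each package manager's own rules
# (epochs, ~rc suffixes, distro revisions), which this sketch does not.

def version_tuple(v):
    return tuple(int(x) for x in v.split("."))

def hosts_needing_patch(fleet, package, fixed_version):
    fixed = version_tuple(fixed_version)
    return sorted(
        host
        for host, packages in fleet.items()
        if package in packages and version_tuple(packages[package]) < fixed
    )

fleet = {
    "web01": {"struts": "2.3.31"},
    "web02": {"struts": "2.3.32"},   # already patched
    "batch07": {"struts": "2.3.5"},  # forgotten host on an old version
}

print(hosts_needing_patch(fleet, "struts", "2.3.32"))
# -> ['batch07', 'web01']
```

Even this toy assumes you *have* an accurate, queryable inventory of every host and every installed version -- which, as argued above, is precisely the hard part.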

One problem with prioritizing software updates is the perennial SysAdmin's Dilemma: if you're doing your job well, nobody notices. Patching systems and then not getting compromised earns no recognition; the business may only notice that patching happened at all if the update negatively impacted productivity, revenue, uptime, or what have you. Which is one of the reasons why hardly anybody applies all software updates all the time. Instead, we (try to) spot-patch when we're under high pressure or imminent (known) threat.

But spot-patching packages only fixes symptoms of a fragile security infrastructure. Band-aids don't fix bullet holes. The only alternative here is to design and build your systems from the ground up for resiliency, to allow for rapid, regular, automated, and unattended updates. Rest assured, though, that this requires a fundamental shift in how we approach software development and system- and service maintenance.

So no, I'm not excusing Equifax for their slow patching (doing so would be both irrelevant and wrong), but I suggest that the focus on the patch-window is a silly whack-a-mole distraction. Which is in Equifax's interest: Yes, they should have updated Struts. They should have known that they use Struts, and they should have tracked releases and security announcements, then taken action. But I do think there's a bit more to this:

I don't know the details of the full attack chain, but usually an RCE in the web framework is an entry point, not "mission accomplished". There usually is a need for lateral movement, for retaining and expanding access, for elevating privileges, and finally for exfiltrating the data without being detected. In other words, a number of other failures took place here that we are not talking about.

Given the quality of the breach response and whatever else came to light since then, I suspect that making public that the breach involved CVE-2017-5638 is a convenient way for Equifax to shift the blame: In the public mind, cyber is indistinguishable from magic, and so Equifax could not possibly have defended themselves against those using dark and evil spells. An "RCE in the Apache Struts Framework" sure sounds a lot more dangerous and difficult to defend against than "we used admin/admin to let third parties access our data".

