Collaborative Data Science – Inside the 2016 Verizon DBIR Vulnerability Section

Michael Roytman    May 1, 2016

The best part about working in a nascent, yet-unsolved-perhaps-never-to-be-solved industry is that the smartest minds are often struggling with the same problems, and are only a tweet or a phone call away if you need help. I’ve had help from fellow data scientists, NIST and MITRE folk, competitors, practitioners, professors and the like. While rock-star-syndromes are surely out there and aren’t very helpful, I want to point out that the industry is full of brilliant people without much ego, ready to help throughout their busy days.

Recently, Adrian Sanabria from 451 Research called me to question some of the assumptions I used to write the 2016 Verizon DBIR vulnerability section. We had a two-hour chat and some follow-up emails, and he led me to a new way of looking at this data. This kind of collaboration is absolutely indispensable when we defenders are working together against a sentient attacker. It’s also thrilling! Where else does one get to collaborate with competitors, researchers and clients alike as part of their day-to-day job?

Adrian’s big insight was that the top 10 list of vulnerabilities is confusing because its applicability to the enterprise is lost amid these arcane, working exploits, and the gap between people’s expectations and reality creates friction. We had an excellent offline discussion in which he dove deeply into the assumptions of my work and asked thoughtful, probing questions, and together we came up with a better metric for generating a top 10 vulnerabilities list.

To address these issues, I scaled the total successful exploitation count for every vulnerability in 2015 by the number of observed occurrences of that vulnerability in Kenna’s aggregate dataset. Sifting through 265 million vulnerabilities gives us a top 10 list perhaps more in line with what was expected – but equally unexpected! The takeaway here is that datasets like the one explored in the DBIR might be noisy and might contain false positives, but carefully applied to your enterprise, the additional context that successful exploitation data lends to vulnerability management is priceless. Generating signal is science. So here’s a new top 10 list (biased, because occurrence rates are measured by Kenna Security customers – a very convenient convenience sample).
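The scaling described above can be sketched in a few lines. This is a hypothetical illustration, not Kenna’s actual pipeline: all counts are made up, and the function name is mine. The idea is simply to divide raw successful-exploitation counts by how often each CVE is observed open across the asset population, so a vulnerability doesn’t top the list purely because it is everywhere.

```python
# Illustrative prevalence scaling: exploitation events per observed
# open instance of each CVE. All numbers below are synthetic.

exploitations = {          # successful exploitation events in 2015
    "CVE-2002-0013": 90_000,
    "CVE-2015-0204": 120_000,
    "CVE-2014-0160": 30_000,
}

occurrences = {            # open instances observed in the scan dataset
    "CVE-2002-0013": 45_000,
    "CVE-2015-0204": 80_000,
    "CVE-2014-0160": 60_000,
}

def prevalence_scaled_top(exploitations, occurrences, n=10):
    """Rank CVEs by exploitation events per observed occurrence."""
    scaled = {
        cve: exploitations[cve] / occurrences[cve]
        for cve in exploitations
        if occurrences.get(cve)          # skip CVEs never observed
    }
    return sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:n]

for cve, score in prevalence_scaled_top(exploitations, occurrences):
    print(f"{cve}: {score:.2f} exploitations per open instance")
```

Note how the ranking flips: the CVE with the most raw exploitations is not necessarily first once exposure is accounted for.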

Prevalence Scaled Top 10 Vulnerabilities


1. CVE-2002-0013 – Vulnerabilities in the SNMPv1 request handling of a large number of SNMP implementations allow remote attackers to cause a denial of service or gain privileges via (1) GetRequest, (2) GetNextRequest, and (3) SetRequest messages
2. CVE-2002-0012 – Vulnerabilities in a large number of SNMP implementations allow remote attackers to cause a denial of service or gain privileges via SNMPv1 trap handling
3. CVE-2015-0204 – The ssl3_get_key_exchange function in s3_clnt.c in OpenSSL before 0.9.8zd, 1.0.0 before 1.0.0p, and 1.0.1 before 1.0.1k allows remote SSL servers to conduct RSA-to-EXPORT_RSA downgrade attacks and facilitate brute-force decryption by offering a weak ephemeral RSA key in a noncompliant role, related to the “FREAK” issue. NOTE: the scope of this CVE is only client code based on OpenSSL, not EXPORT_RSA issues associated with servers or other TLS implementations.
4. CVE-2001-0540 – Memory leak in Terminal servers in Windows NT and Windows 2000 allows remote attackers to cause a denial of service (memory exhaustion) via a large number of malformed Remote Desktop Protocol (RDP) requests to port 3389.
5. CVE-2015-1637 – Schannel (aka Secure Channel) in Microsoft Windows Server 2003 SP2, Windows Vista SP2, Windows Server 2008 SP2 and R2 SP1, Windows 7 SP1, Windows 8, Windows 8.1, Windows Server 2012 Gold and R2, and Windows RT Gold and 8.1 does not properly restrict TLS state transitions, which makes it easier for remote attackers to conduct cipher-downgrade attacks to EXPORT_RSA ciphers via crafted TLS traffic, related to the “FREAK” issue, a different vulnerability than CVE-2015-0204 and CVE-2015-1067.
6. CVE-2012-0152 – The Remote Desktop Protocol (RDP) service in Microsoft Windows Server 2008 R2 and R2 SP1 and Windows 7 Gold and SP1 allows remote attackers to cause a denial of service (application hang) via a series of crafted packets, aka “Terminal Server Denial of Service Vulnerability.”
7. CVE-2001-0877 – Universal Plug and Play (UPnP) on Windows 98, 98SE, ME, and XP allows remote attackers to cause a denial of service via (1) a spoofed SSDP advertisement that causes the client to connect to a service on another machine that generates a large amount of traffic (e.g., chargen), or (2) via a spoofed SSDP announcement to broadcast or multicast addresses, which could cause all UPnP clients to send traffic to a single target system.
8. CVE-2001-0876 – Buffer overflow in Universal Plug and Play (UPnP) on Windows 98, 98SE, ME, and XP allows remote attackers to execute arbitrary code via a NOTIFY directive with a long Location URL.
9. CVE-2013-0229 – The ProcessSSDPRequest function in minissdp.c in the SSDP handler in MiniUPnP MiniUPnPd before 1.4 allows remote attackers to cause a denial of service (service crash) via a crafted request that triggers a buffer over-read.
10. CVE-2014-0160 – The (1) TLS and (2) DTLS implementations in OpenSSL 1.0.1 before 1.0.1g do not properly handle Heartbeat Extension packets, which allows remote attackers to obtain sensitive information from process memory via crafted packets that trigger a buffer over-read, as demonstrated by reading private keys, related to d1_both.c and t1_lib.c, aka the Heartbleed bug.

What we set out to build at Kenna Security is a new way of thinking about information security vulnerabilities – a framework for efficiently measuring remediation, a way to stay ahead of attackers by making sure that every action was a useful one.

At first it seemed like an impossible task, precisely because of the issues outlined in @attritionorg’s incredibly thorough and well-thought-out criticism of the Verizon DBIR’s vulnerability chapter. On the whole, Brian is correct: IDS alerts generate a ton of false positives, vulnerability scanners often don’t revisit signatures, and CVE is not a complete list of vulnerability definitions.

But those are just the trees, and we’ll get to them later. The forest is that this somewhat stark status quo is exactly the situation faced by thousands of enterprise practitioners, who are armed with nothing but their vulnerability scans and some logs, and must wade through millions of vulnerabilities and determine what must be done next. We seek to combine these datasets, in some inventive ways – not to say “this somewhat noisy data indicates you must fix an FTP vulnerability or perish”, but rather to use exploitation data to add context to vulnerability scans, and to confirm some assumptions about automated attacks.

But most importantly, we seek to give the reader of the Verizon DBIR cold hard data to take back to management and be able to say “Yes, I know it’s Patch Tuesday, but today I shall fix a CVE from 2006, because it’s likely actively exploited, and 273 of our servers are vulnerable to it.” That is the reason the Verizon DBIR is useful and esteemed. Insights generated from data, no matter how counter-intuitive, are the whole reason data science exists and can help security practitioners.

Dan Geer and Jay Jacobs’ article on how to properly describe convenience samples (make no mistake, this is one of them) borrows a few excellent ideas from medicine and lays out a few guidelines, so I will follow them here (better late than never):

The data used by the Verizon DBIR Vulnerability section is comprised of two datasets.

The first is a convenience sample that includes 2,442,792 assets (defined as: workstations, servers, databases, IPs, mobile devices, etc.) and 264,912,235 vulnerabilities associated with those assets. The vulnerabilities were generated by eight different scanners: Beyond Security, Tripwire, McAfee VM, Qualys, Tenable, BeyondTrust, Rapid7, and OpenVAS. This dataset is used to determine remediation rates and the normalized open rate of vulnerabilities.

The second is a convenience sample that includes 3,615,706,022 successful exploitation events, all of which took place in 2015 and come from partners such as AlienVault’s Open Threat Exchange.

Please note the methodology of data collection: a successful exploitation is defined as one successful technical exploitation of a vulnerability on one machine at a particular timestamp. The event is defined as: 1. an asset has a known CVE open; 2. an attack comes in that matches the signature for that CVE on that asset; and 3. one or more IOCs are detected/correlated post-attack. It is not necessarily a loss of data, or even root on a machine; it is just the successful use of whatever condition is outlined in a CVE. If any readers would like to see a sample of the dataset, feel free to reach out to me at my Kenna Security email. Below are descriptive statistics for every CVE in the original top 5. The data comes from internally instrumented dashboards which update live as new data rolls in (every hour).

  1. CVE-2015-1637
  2. CVE-2015-0204
  3. CVE-2003-0818
  4. CVE-2002-1054
  5. CVE-2002-0126 
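The three-part event definition above can be expressed as a simple predicate. This is a hypothetical sketch – the class names and fields are mine, not the actual correlation pipeline – but it captures the logic: a hit counts only when the CVE is open on that asset, the attack signature matches that CVE, and at least one IOC is correlated afterward.

```python
# Illustrative model of the successful-exploitation event definition:
# (1) asset has the CVE open, (2) attack matching that CVE's signature
# hits that asset, (3) one or more post-attack IOCs are correlated.
from dataclasses import dataclass, field

@dataclass
class Asset:
    asset_id: str
    open_cves: set = field(default_factory=set)

@dataclass
class Attack:
    asset_id: str
    cve: str           # CVE whose signature the attack matched
    timestamp: str

def is_successful_exploitation(asset: Asset, attack: Attack,
                               iocs_after_attack: list) -> bool:
    """Apply the three conditions that define one exploitation event."""
    return (
        attack.cve in asset.open_cves          # 1. CVE open on the asset
        and attack.asset_id == asset.asset_id  # 2. signature match there
        and len(iocs_after_attack) > 0         # 3. post-attack IOC found
    )
```

An attack against a machine that doesn’t have the CVE open, or one that produces no correlated IOC, is not counted – which is what keeps the 3.6-billion-event figure from being a raw alert count.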

Enterprises use vulnerability scanners to manage vulnerabilities, so the lowest-hanging fruit is to answer the question “Of these vulnerabilities, where should I focus my remediation efforts?” In fact, the entire vulnerabilities section of this year’s DBIR focuses on what can be done given the status quo of vulnerability management, not what should be done in a perfect world. The underlying distributions indicate that 1. attackers often target old vulnerabilities and 2. attackers automate their campaigns and spray across the internet. These implications can then be used to formulate a more effective strategy. Essentially: Patch Tuesday rolls along. Everyone scrambles to fix the new MS vulnerabilities. The data says “No. Be better.”

data data data everywhere

Data is your voice of reason.

Our point is not “If you don’t have BlackMoon FTP, you’re safe.” Our point is that fat-tailed statistical distributions necessitate a different approach to vulnerability management. It would be a shame if we lost the forest for the exploit signatures.
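What a fat tail means in practice can be shown with a toy calculation, using entirely synthetic counts (the real distribution lives in the exploitation dataset, not here): when exploitation is this concentrated, a handful of CVEs account for the overwhelming majority of events, so prioritizing by exploitation evidence beats prioritizing by age or release date alone.

```python
# Synthetic exploitation counts for ten hypothetical CVEs, ordered
# arbitrarily; the heavy skew is the point, not the specific values.
counts = [500_000, 120_000, 40_000, 9_000, 3_000, 1_200, 800, 500, 300, 200]

total = sum(counts)
top_two_share = sum(sorted(counts, reverse=True)[:2]) / total
print(f"Top 2 of {len(counts)} CVEs account for {top_two_share:.0%} of events")
```

Under these made-up numbers, two vulnerabilities drive over nine-tenths of all events – which is why “fix what is actually being exploited” outperforms “fix what shipped this month.”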
