SIRAcon Attendees, Start Your Engines

Michael Roytman    October 25, 2013

“Information is the oil of the 21st century, and analytics is the combustion engine.” –  Peter Sondergaard, SVP Gartner


This week I attended SIRAcon in Seattle, a conference hosted by the Society of Information Risk Analysts. I spoke about the methodology behind Risk I/O’s “fix what matters” approach to vulnerability management: how we use live vulnerability and real-time breach data to build the model, and why such a model outperforms existing CVSS-based risk rankings. A few persistent themes ran across the talks from the conference’s many excellent, well-qualified speakers. Combining and implementing these practices is not a simple matter, but organizations should take note, because this is how information security can evolve as an industry.

1. This is not our first rodeo.

Risks are everywhere, and other industries not so different from ours have caught on. Ally Miller’s morning keynote walked through the structured, quantified way in which fraud detection teams are built: real-time data collection feeding large, global models that guide decisions about fraud, with the ability to act on those decisions in real time. That requires clever interfacing with business processes and excellent infrastructure, but it has been done before, and it needs to be done for vulnerability management as well. Alex Hutton used Nate Silver’s latest book on Bayesian modeling to raise parallel questions about infosec. He drew analogies to seismology and counter-terrorism; the maturity of those fields, and their similarity to ours (large risks that are often hard to quantify or observe), make them worth exploring. Lastly, his talk raised a healthy discussion on the difference between forecasting and prediction. A prediction describes the expectation of a specific event (“it will rain tomorrow”), whereas a forecast is more general and describes the probability of a set of events over time (“there will be 2 inches of rain in December,” “there is a 20% chance of rain over the next day”). Much of the discussion focused on how management perceives the difference between the two. In seismology, we fail at prediction because the underlying mechanics are hidden from us, so we can only forecast. The same seems largely true of infosec.
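To make that distinction concrete, here is a toy sketch (with invented numbers, not anything presented at the conference) of how a prediction and a forecast might differ if you model attack arrivals as a simple Poisson process:

```python
# Toy sketch: a "prediction" commits to a specific outcome, while a "forecast"
# assigns probabilities to outcomes over a window. The rate here is invented.
from math import exp

daily_rate = 0.2  # hypothetical expected number of exploit attempts per day

def predict_attack_tomorrow(rate):
    """A prediction: a single yes/no call about a specific event."""
    return rate >= 0.5  # arbitrary decision threshold

def forecast_attack(rate, days):
    """A forecast: P(at least one attack within `days` days), assuming Poisson arrivals."""
    return 1 - exp(-rate * days)

print(predict_attack_tomorrow(daily_rate))        # False -- no attack "predicted"
print(round(forecast_attack(daily_rate, 1), 3))   # ~0.181 chance within a day
print(round(forecast_attack(daily_rate, 30), 3))  # ~0.998 chance within a month
```

The yes/no call is brittle, while the forecast degrades gracefully as the window and the uncertainty grow, which is exactly why it is the more honest product for fields where the mechanics are hidden.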

2. Good models need good data.

Adam Shostack from Microsoft gave a very convincing closing keynote on the value of data-driven security programs. Running experiments targeted at collecting data gives us scientific tools and takes the qualitative (read: fuzzy) decision-making out of risk management. The alternative is the status quo: reliance on policies, or measuring organizational performance against standards, which is tantamount to stagnation, and stagnation is not something anyone can accuse our adversaries of. He stated that although almost all organizations have been breached, it is incredibly difficult to develop models of breaches, largely because global breach datasets are hard to come by. Not so! We’re hard at work incorporating new sources of breach data into Risk I/O, though he’s certainly correct that this is a hard project for any single company to undertake. Adam concluded with a call for organizations to encourage better sharing of data (hear, hear), and this mirrored the sentiment of other talks (particularly Jeff Lowder’s discussion of why we need to collect data to establish base-rate probabilities) about the need for a centralized, CDC-like body for infosec data.
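As a rough sketch of what that kind of shared data could support (the counts, field names, and CVE IDs below are placeholders, and this is not the Risk I/O model), establishing a base rate can start as simple counting with a weak prior:

```python
# Hypothetical sketch: estimating a per-vulnerability base rate of exploitation
# from shared observation data. All values are invented for illustration.
observations = {
    "CVE-0000-0001": {"assets_exposed": 120000, "breaches_observed": 340},
    "CVE-0000-0002": {"assets_exposed": 5000,   "breaches_observed": 0},
}

def exploitation_base_rate(obs, prior_breaches=1, prior_assets=1000):
    """Smoothed estimate of P(breach | vuln present); the weak prior keeps
    zero-observation vulnerabilities from looking like exactly zero risk."""
    return (obs["breaches_observed"] + prior_breaches) / (obs["assets_exposed"] + prior_assets)

for cve, obs in observations.items():
    print(cve, round(exploitation_base_rate(obs), 5))
```

The hard part is not the arithmetic; it is getting enough organizations to contribute observations that the denominators mean something.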

So let’s get some data. We’re already off to a pretty good start.

6 thoughts on “SIRAcon Attendees, Start Your Engines”

  1. Adam Shostack

    Thanks for the kind words!

    I think the work you’re doing is fascinating, and raises important questions. It’s tremendously helpful to understand what definitions, methodologies and data sets feed into our understanding. Let me define breaches1 as “an IDS detects an attack against a vuln found by a scanner” and breaches2 as “a lawyer writes a letter to a regulator.” (Definition 1 is my takeaway from questions in your talk.)

    With that in mind, what’s the relationship between the 1.5 million breaches1 you see and the 1,500 breaches2 that datalossDB sees? What happens to the other 1.5 million? Presumably, some subset are false positives from the two tools. Some may cause the target to crash rather than be taken over. However, many are probably real unauthorized acquisitions of control flow on a system. Can we quantify those? What prevents the rest from becoming a “breach2”? Is it layered preventative controls? A failure of detective controls? A failure of attackers to ever do anything?

    Lastly, two nits: I lack data about how many organizations have been breached, so I argued by appeal to authorities. Second, I didn’t call for a CDC-like body. While I’m not opposed to one, I think there are a number of possible models, and my main criterion is how much data would be published to enable diverse analytic approaches.

    Btw, my slides are at http://sdrv.ms/H51F1T

  2. Michael Roytman (Post author)

    Thanks for writing, Adam!

    My apologies for pinning it to the CDC. To be honest, I’m less concerned with the specifics of what the CDC does or how it’s run (I’ve got very little clue) than with the takeaway: the need for a centralized data repository, which I think most in attendance, yourself included, agree on in theory if not on the execution of such a body.

    You’re dead-on about breaches2 vs. breaches1. In this case, breaches1 are potentially zero-impact breaches (because no data loss occurred, or because the breach was step one of a three-exploit chain in which the last two steps failed). Breaches1 are best for determining base rates of attack probabilities, not base rates of “events that cause damage,” which is likely the more accurate definition of a “breach.”

    Breaches2 are only those which have had impact (significant enough, in fact, to involve a lawyer!). Moreover, datalossDB includes other types of breaches (essentially, everything in the DBIR that isn’t “hacking”). Likely there is a small subset of breaches2, which is (proportionally) an even smaller subset of breaches1, that is the real meaty, impactful data we care about for hacking attacks, because it quantifies both impact and likelihood. Our dataset only deals with likelihood.
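    To put rough (and admittedly hand-wavy) numbers on the gap, treating the two counts as if they described the same population, which, per the above, they don’t quite:

    ```python
    # Illustrative arithmetic only, using the round numbers from this thread.
    breaches1 = 1500000  # IDS-detected attacks against scanner-found vulns (likelihood signal)
    breaches2 = 1500     # breaches with enough impact to reach datalossDB (impact signal)

    # If every breaches2 event were also a breaches1 event (it isn't, exactly),
    # this ratio would be a crude upper bound on P(reportable impact | observed attack).
    print(breaches2 / breaches1)  # 0.001, i.e. roughly 1 in 1,000
    ```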

  3. Ed Bellis

    @Adam and @Michael: in this case I would actually extend the breaches1 definition. Our data is based not just on an IDS alert matched against a vulnerability scan; it also includes “indicators of compromise,” as determined by the in-place SIEM, that suggest the exploit was successful.

    In this case I think it’s closer to a breach definition than an attack definition. Michael is correct in that we have no visibility into the actual impact of the breach.

  4. Adam Shostack

    Hi Ed,

    Can you say a bit more about what sorts of indicators the SIEM uses? (For example, are these high-S/N-ratio alerts within 15 minutes? Or “the SIEM said something about that target within a week”?)

  5. Ed Bellis

    Hey Adam, no problem. I think establishing definitions is important to frame the conversation. Unfortunately there isn’t just a small set of IOCs that I could throw out here as being used to determine a breach. I can say a number of these IOCs are open sourced and shared across the Open Threat Exchange using OpenIOC.

    Some examples can be found in the AlienVault Labs GitHub repo: https://github.com/AlienVault-Labs/AlienVaultLabs

    That’s far from an exhaustive list but at least would give you some ideas of the IOCs that are being matched to IDS alerts.
