There’s been a lot of talk about Big Data in the security space over the past couple of years, and it seems that almost every week a new Big Data offering appears, whether in discussion, in development, or in production. It’s no secret that here at Risk I/O, we’ve embraced the industry’s demands and are hard at work developing our precognition offerings, many of which have been dubbed “Big Data.” The label is accurate, since we aggregate and correlate data both inter- and intra-client.

However, in traditional Big Data problems, such as consumer shopping or health care, we would be mining huge data sources for patterns of behavior, or for commonalities between clients and attackers. In information security, we have huge amounts of SIEM data on activity around our assets, and solid data on which firms have which vulnerabilities. The problem is that we’re missing successful-exploit data and breach data, so simply hunting for patterns or running regressions isn’t all that productive, because what we’re after are the probabilities and locations of attacks. Given the limited exploit data out there, I’ve been looking for workarounds. Thankfully, there are some alternative ways of reducing risk.

I wanted to share a quick insight we’ve gathered in the process of ramping up new offerings. In particular, I’ve invested some time in analyzing game theory as a tool for infosec analysis. Game theory is an applied branch of mathematics developed as a means of computing optimal strategy sets for rational actors.

A common application is to analyze adversaries locked in a repeated game, with the outcome of each game shaping the constraints of the next. Repeated games assume that players take into account the impact of their current strategy on the future actions of other players; this is sometimes called their reputation. In web security, the analogue is clear-cut: hackers are looking to choose which assets to exploit, and each one of us wants to decide which assets to patch first, which vulnerabilities to deal with first, and so on. For a good literature review of the field, take a look at Game Theory Meets Network Security and Privacy. Good work is already being done in academia on the subject, and the results allow us to turn Big Data problems into tractable “medium data” problems rather quickly:
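To make the repeated-game framing concrete, here is a minimal Python sketch of an iterated attacker/defender game. The asset names, values, payoff rule, and both strategies are invented for illustration; they are not drawn from the paper or literature cited above.

```python
import random

# Hypothetical asset values (losses to the defender if compromised).
ASSET_VALUES = {"web_server": 10, "database": 50, "workstation": 5}

def play_round(defended_asset, attacked_asset):
    """One stage game: the defender loses the asset's value only if the
    attacker hits an undefended asset; otherwise the attack is repelled."""
    if attacked_asset == defended_asset:
        return 0                       # attack repelled, no loss
    return -ASSET_VALUES[attacked_asset]

def repeated_game(rounds, seed=42):
    """Play the stage game repeatedly and accumulate the defender's payoff."""
    rng = random.Random(seed)
    total_payoff = 0
    for _ in range(rounds):
        # Defender always covers the highest-value asset; the attacker
        # randomizes uniformly (a crude stand-in for a mixed strategy).
        defended = max(ASSET_VALUES, key=ASSET_VALUES.get)
        attacked = rng.choice(list(ASSET_VALUES))
        total_payoff += play_round(defended, attacked)
    return total_payoff
```

In a real repeated game the defender would also condition on the attacker’s history (reputation); this sketch fixes both strategies so the payoff structure stays visible.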

**Asset Network Topology Might Matter Less than You Think**

A recent paper by Sandia National Laboratories and Duke University uses a game theoretic model of arbitrary network topology, which varies in both the topology and the degree of uncertainty in links. It’s a great use of game theory to determine optimal strategies. The main result is that:

**IF**

(some assets are prioritized more highly than others, i.e., the utility from defending the network is non-homogeneous) && (the costs of defending or remediating an asset’s vulnerabilities are not too high)

**THEN**

network topology doesn’t affect the payoff to the defender.

Most real-world systems meet the first condition, and while “high costs” are a judgment call (there is some evidence in this blog post), remediation costs are minuscule in comparison to the costs of a data breach. The result is profound: it means that a map of network topology, and the “second order” spillover impacts from one asset being compromised, don’t need to be factored into a data model when choosing which assets to defend. All of a sudden, the data looks smaller. This does not imply that we should ignore the topology of networks, but it does mean that we need to think critically (test, test, test) about whether including all the data we have is the best approach.
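As a toy illustration of why this result shrinks the data, here is a hypothetical defender-payoff calculation in Python. The asset values, remediation costs, and topologies below are all made up; the point is only that when the two conditions hold, the optimal choice is computed from per-asset terms and never consults the adjacency map.

```python
# Hypothetical (value, remediation cost) pairs; the result's conditions hold:
# utilities are heterogeneous and costs are small relative to values.
ASSETS = {"crm": (40, 3), "mail": (15, 2), "wiki": (5, 1)}

def best_asset_to_defend(assets):
    """Payoff of defending an asset = value protected minus remediation cost.
    Note there is no topology term anywhere in this expression."""
    return max(assets, key=lambda name: assets[name][0] - assets[name][1])

# Two different hypothetical maps over the same assets; neither is consulted
# by the strategy above, so the answer is identical under both.
star_topology = {"crm": ["mail", "wiki"], "mail": ["crm"], "wiki": ["crm"]}
ring_topology = {"crm": ["mail"], "mail": ["wiki"], "wiki": ["crm"]}

print(best_asset_to_defend(ASSETS))  # prints "crm" under either topology
```

The design choice worth noticing is what the function’s signature excludes: the moment topology drops out of the payoff, the data you need per decision shrinks from a graph to a table.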

The next question we need to answer is – does this result make sense? After a good discussion with @sintixerr and @MrMeritology on Twitter (started by this post), I’m convinced that it does. Here’s **why** network topology likely doesn’t matter when defining the defender’s optimal strategy.

**Attack Paths – **In much of game theory, the best defense against an attacker, especially one about whom there is little information, is dynamic. The reason? Attackers randomize their attack patterns and paths. As the number of attack paths grows, the value of a particular asset itself matters more than its effect on others. Simply put, if there’s a way in, hackers will eventually get where they need to go.

**Attacker Types and Evolution – **The importance of assets changes (see @sintixerr’s post above) as hacker strategies, and the types of attackers, evolve. And since we can’t (yet) predict how this importance will change, we also can’t predict which links in a network will matter more than others. It’s important to note here that most enterprises are threatened by more than one type of attacker, so any risk assessment will carry conflicting estimates of risk. The aforementioned game theory paper underscores this point by showing invariance to attacker type.
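The attack-paths point can be sketched numerically. Assuming an attacker who randomizes uniformly over the available paths to an asset (a simplification; real attackers weight their choices), the defender’s chance of intercepting the attack along any guarded route decays as paths multiply:

```python
from fractions import Fraction

def intercept_probability(num_paths, guarded_paths=1):
    """If an attacker picks uniformly among num_paths routes and the
    defender can guard guarded_paths of them, chance of interception."""
    return Fraction(min(guarded_paths, num_paths), num_paths)

# Path-level defense decays toward zero as attack paths multiply, so the
# asset's own value comes to dominate the defense decision.
print([float(intercept_probability(k)) for k in (1, 2, 5, 10, 100)])
# prints [1.0, 0.5, 0.2, 0.1, 0.01]
```

Which is the arithmetic behind “hackers will eventually get where they need to go”: past a handful of paths, prioritizing by asset value beats prioritizing by route.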

There’s too much uncertainty about whether, by whom, where, and when one will be attacked. Credit to @MrMeritology for this Smithsonian article, which frames the problem at hand: “*A mystery cannot be answered; it can only be framed, by identifying the critical factors and applying some sense of how they have interacted in the past and might interact in the future.*” The takeaway here is that Big Data is not always smart data. Big Data will let us solve every jigsaw puzzle and an NxN Rubik’s Cube. Smart data will tell us which factors truly matter.

Before we launch into full-scale Hadoop implementations and start firing up R regressions on every variable we can get our hands on, it’s worth our time to take a step back and think about what’s available to us, what’s not, and what that means. My contention is that thinking about optimal strategies, which are robust to uncertainty, can alleviate the need to predict exploits – at least until the data gets big enough. More insights from the frontlines coming soon.
