For a field that loves statistics, computer security sure treats them casually. So, in digesting one of the many annual looks at computer security, I decided to play just as casually to make a point about their fallibility.
Verizon recently published its annual Data Breach Investigations Report (henceforth: DBIR), and one of the charts that really jumped out at me was the money shot: the hacking per capita. Like so many before it, such hacking reports often reflect more on the survey makers than on cybersecurity trends.
Obviously, the case is being made that China is a problem. I’m not arguing with that. In fact I think it probably is. But I don’t think that the computer security community is close to understanding what’s going on, or why, and simplistic statistics don’t serve us very well.
So, the Verizon statistic allegedly tells us something about overall hacking by volume (percentage of hacks) and type (target: espionage, financial, other) and encourages us to conclude that China is the problem. But China represents one-fifth of the world’s population (and more Internet users than the United States). Does it have one-fifth of the world’s hackers? Of course we don’t have any data that would help us conclude that hackers are evenly distributed across populations, but what if we did?
I didn’t have a credible way of estimating the population of Internet users by country, so I just normalized against global population: I took Verizon’s statistics on hacking as a percentage and divided each country’s share of hacks by its share of the world’s people.
Suddenly, the fact that China is responsible for 30 percent of all global hacking seems more reasonable, considering it has more Internet users than any other country in the world. I’ll note that Americans appear to be behaving much better than I’d expect, but perhaps that’s a result of bias in Verizon’s sampling methods.
We really need to fear the Romanians, Bulgarians and Armenians.
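For anyone who wants to play along, here is a minimal sketch of that back-of-the-envelope normalization. The function name is mine, and only the China figures (30 percent of hacks, roughly one-fifth of world population) come from the text above; the "small country" numbers are invented purely to show why tiny populations shoot to the top of a per-capita chart.

```python
def hacking_index(attack_share, population_share):
    """Ratio of a country's share of attributed hacks to its share of
    world population. A value of 1.0 means hacking exactly proportional
    to population; higher means more hacks than headcount would predict.
    """
    if population_share <= 0:
        raise ValueError("population_share must be positive")
    return attack_share / population_share


# Figures taken from the text: 30% of hacks, about one-fifth of the world.
china_index = hacking_index(0.30, 0.20)

# Hypothetical small country (made-up numbers, not from the DBIR):
# 1% of attributed hacks, 0.1% of world population.
small_country_index = hacking_index(0.01, 0.001)

print(f"China: {china_index:.1f}x its population share")
print(f"Hypothetical small country: {small_country_index:.1f}x")
```

Under these assumptions China comes out at about 1.5 times its population share, while the made-up small country comes out at about 10 times. A handful of attributed incidents is enough to make a small country look terrifying, which says more about the arithmetic than about the country.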
The point is that this is not the kind of data that public policy decisions should be based on. If we wanted to do this right, we’d need a good definition of what constituted an attack, and that’s a huge problem in and of itself: if my network is compromised with a Trojan, is that one unit of attack (object: my network) or ten units (object: my home servers, desktop, laptop, and the machine I play World of Warcraft on) or two units (object: my log server and my file server)?
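To make the counting problem concrete, here is a rough sketch in which the same incident is tallied under three definitions of "one attack." The inventory is hypothetical: the text only gives the totals (one network, ten machines, two servers of interest), so the five extra home servers and the `is_key_server` flag are my own filler to make the numbers add up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    network: str
    is_key_server: bool = False  # hypothetical flag: log or file server


def count_attack_units(hosts, definition):
    """Count 'attacks' in one incident under a chosen definition."""
    if definition == "network":
        return len({h.network for h in hosts})
    if definition == "host":
        return len(hosts)
    if definition == "key_server":
        return sum(1 for h in hosts if h.is_key_server)
    raise ValueError(f"unknown definition: {definition}")


# One Trojan, one home network, ten machines (inventory is illustrative).
compromised = (
    [Host("log server", "home", True), Host("file server", "home", True)]
    + [Host(f"home server {i}", "home") for i in range(1, 6)]
    + [Host("desktop", "home"), Host("laptop", "home"),
       Host("gaming machine", "home")]
)

counts = {d: count_attack_units(compromised, d)
          for d in ("network", "host", "key_server")}
print(counts)
```

The same incident counts as 1, 10, or 2 attacks depending on which definition the surveyor picked, and a report that doesn’t say which one it used can’t be compared with one that picked differently.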
Back when I was on the SANS Newsbites editorial board, every year when someone would publish the results of a survey (“hacking is on the rise!”) I’d point out the problem of self-selected sample bias in surveys. I’d suggest that the title should be something like: “9/10 of people who were bored enough to fill in a survey at a conference, and who claimed their job title was CSO, CTO, or CEO checked the box that says ‘hacking is on the rise.'”
I think it’s only fair to Verizon to point out that they’ve been appropriately cautious with the statistic. For example, one Verizon spokesperson told ZDNet that the high number of data breaches attributed to China should not be taken to mean it is the most active perpetrator of cyberespionage activities.
We need new metrics and cautious action to make the web as free and safe as possible.