Incidents as a measure of cybersecurity progress
Phil Venables published a helpful collection of ways that risk and cybersecurity leaders can share their successes, ideally on an ongoing basis. His working theory, which I believe is correct, is that we’re not great at this. And as a result, many of our peers only hear from us when things go sideways, which leads to a variety of problems.
His first suggestion is aptly focused on incidents:
The classic case is incidents. Your main moment in the sun might be in the middle of an incident. If successfully detected and recovered from then you will likely get some kudos. But, too many of these and leadership might conclude you’re not doing an effective job. However, you can provide a better perspective if you place these incidents in some context, such as the percentage of incidents experienced vs. incidents that were avoided because threats were thwarted or risks were mitigated. Of course, this can require some degree of subjectivity depending on how you measure it. You could use a regularly communicated set of messages such as: this month our controls stopped 100,000+ phishing emails, repelled 200,000+ perimeter intrusion attempts, deflected 150,000+ malware infection attempts, and so on vs. only reporting the incidents. In the event of a truly sophisticated intrusion or other circumstance then those incidents might be better seen as what they are, that is an unusual event, not an event that is thought to happen upon every intrusion attempt.
Understanding how to think and talk about incidents is critically important. Among other things, incidents are a foundational measure of control effectiveness, continuous improvement, and overall operational workload in security operations.
I’ve encountered this challenge a number of times over the years in organizations big and small. A simplified version of my approach to incidents:
- Define “incident”: It should be clear to everyone when an incident occurs. This matters so that you can respond, obviously, but also so that you know when to capture relevant data and information.
- Define incident severity levels: A typical model ranges from level 5 (least severe, think “a control prevented delivery of a phishing email”) to level 1 (most severe, think “we have to notify our customers and board of directors”).
- Have a simple, repeatable, measured incident management process: Determine where you’ll capture data and information related to incidents, and define your workflow from initial documentation through response and post-incident analysis.
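To make these three steps a little more concrete, here is a minimal Python sketch of an incident record. It assumes the five-level severity model described above (1 most severe, 5 least severe) and a linear workflow from documentation through response to post-incident analysis. The class names, fields, and stages are hypothetical, as is the rule that an incident can’t be closed until a root cause is recorded. Treat it as an illustration, not a prescribed schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical five-level scale: 1 is most severe, 5 is least severe."""
    SEV1 = 1  # e.g. customers and the board of directors must be notified
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4
    SEV5 = 5  # e.g. a control prevented delivery of a phishing email


class Stage(IntEnum):
    """Assumed workflow stages, in the order an incident moves through them."""
    DOCUMENTED = 1
    RESPONDING = 2
    POST_INCIDENT_ANALYSIS = 3
    CLOSED = 4


@dataclass
class Incident:
    """One incident record; every field name here is illustrative."""
    incident_id: str
    opened_at: datetime
    severity: Severity
    summary: str
    stage: Stage = Stage.DOCUMENTED
    root_causes: list[str] = field(default_factory=list)

    def advance(self) -> None:
        """Move to the next stage; refuse to close without a recorded root cause."""
        if self.stage is Stage.CLOSED:
            raise ValueError(f"{self.incident_id} is already closed")
        next_stage = Stage(self.stage + 1)
        if next_stage is Stage.CLOSED and not self.root_causes:
            raise ValueError(f"{self.incident_id}: record a root cause before closing")
        self.stage = next_stage


# Usage: a level-5 incident moving through the assumed workflow.
example = Incident("INC-001", datetime(2024, 1, 5), Severity.SEV5, "Phishing email blocked")
example.advance()  # DOCUMENTED -> RESPONDING
example.advance()  # RESPONDING -> POST_INCIDENT_ANALYSIS
example.root_causes.append("hypothetical root-cause label")
example.advance()  # POST_INCIDENT_ANALYSIS -> CLOSED
```

Making the root cause a precondition for closing is one way to ensure the post-incident data you’ll want for reporting actually gets captured.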
If you do these three things, you’re positioned to respond, measure, improve, and communicate incident-related information. But to gain (and ultimately share!) useful insights, you have to ask the right questions and look at the data in useful ways. A few aspects of incidents that I’ve found useful for gaining insights and reporting include:
- Overall number of incidents: This is a performance indicator, not a success measure. Think of it as the building block or denominator for most other incident-related reporting. That said, there’s plenty to be learned from the numbers alone. For example, it’s okay to have a lot of incidents, particularly if they’re lower severity and you’re learning from them and making improvements. Conversely, having very few incidents might be cause for concern, as it might be a sign that incidents aren’t being detected or properly documented.
- Incidents by root cause: One of the most useful data points to capture during post-incident analysis is the root cause. In general, repeat root causes aren’t ideal; they indicate that you should take some preventative action. You’ll also learn that getting to a single root cause isn’t always easy.
- Incidents by severity: If you have 100 incidents in January, and 25% are higher severity, it’s probably a positive sign if you still have 100 in February but only 10% are higher severity.
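Once incidents are captured consistently, these measures are mostly simple aggregations. The following is a minimal Python sketch, assuming each incident is reduced to a month, a severity level (1 most severe, 5 least severe), and a single root-cause label. That last part is a simplification, since a single root cause isn’t always available. The record shape, the function names, and the choice of levels 1 and 2 as “higher severity” are hypothetical. The usage lines rebuild the January/February example with illustrative data only.

```python
from collections import Counter
from typing import NamedTuple


class IncidentRecord(NamedTuple):
    month: str       # e.g. "2024-01" (hypothetical format)
    severity: int    # 1 = most severe ... 5 = least severe
    root_cause: str  # label assigned during post-incident analysis


HIGH_SEVERITY_MAX = 2  # assumption: levels 1 and 2 count as "higher severity"


def incidents_per_month(records: list[IncidentRecord]) -> Counter:
    """Overall count per month: the denominator for the other measures."""
    return Counter(r.month for r in records)


def high_severity_share(records: list[IncidentRecord]) -> dict[str, float]:
    """Fraction of each month's incidents at or above the higher-severity threshold."""
    totals = incidents_per_month(records)
    high = Counter(r.month for r in records if r.severity <= HIGH_SEVERITY_MAX)
    return {month: high[month] / total for month, total in sorted(totals.items())}


def repeat_root_causes(records: list[IncidentRecord], min_count: int = 2) -> list[tuple[str, int]]:
    """Root causes seen at least `min_count` times: candidates for preventative action."""
    counts = Counter(r.root_cause for r in records)
    return [(cause, n) for cause, n in counts.most_common() if n >= min_count]


# Illustrative data: 100 incidents each month, 25 higher severity in January, 10 in February.
records = (
    [IncidentRecord("2024-01", 2, "phishing")] * 25
    + [IncidentRecord("2024-01", 5, "phishing")] * 75
    + [IncidentRecord("2024-02", 1, "misconfiguration")] * 10
    + [IncidentRecord("2024-02", 4, "phishing")] * 90
)
print(incidents_per_month(records))  # Counter({'2024-01': 100, '2024-02': 100})
print(high_severity_share(records))  # {'2024-01': 0.25, '2024-02': 0.1}
print(repeat_root_causes(records))   # [('phishing', 190), ('misconfiguration', 10)]
```

Reporting the higher-severity share next to the raw monthly count keeps the overall number in its role as the denominator, so a flat count with a falling share reads as the improvement it is.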
There are many more (and far better) examples. But in general, insights gleaned from incidents, trends in particular, are one of the most useful means of assessing operational maturity and making meaningful improvements to any system. As a cybersecurity leader, you can’t get too good at understanding incidents and being comfortable talking about what they mean to your team and organization.