Modern security tools continue to improve in their ability to defend organizations’ networks and endpoints against cybercriminals. But the bad actors still occasionally find a way in.
Security teams must be able to stop threats and restore normal operations as quickly as possible. That’s why it’s essential that these teams not only have the right tools but also understand how to effectively respond to an incident. Resources like an incident response template can be customized to define a plan with roles and responsibilities, processes and an action item checklist.
But preparations can’t stop there. Teams must continuously train to adapt as threats rapidly evolve. Every security incident must be harnessed as an educational opportunity to help the organization better prepare for — or even prevent — future incidents.
The SANS Institute defines a framework with six steps to successful incident response (IR).
1. Preparation
2. Identification
3. Containment
4. Eradication
5. Recovery
6. Lessons learned
While these phases follow a logical flow, it’s possible that you’ll need to return to a previous phase in the process to repeat specific steps that were done incorrectly or incompletely the first time.
Yes, this slows down the IR. But it’s more important to complete each phase thoroughly than to try to save time expediting steps.
1: Preparation
Goal: Get your team ready to handle events efficiently and effectively.
Everybody with access to your systems needs to be prepared for an incident — not just the incident response team. Human error is to blame for most cybersecurity breaches. So the first and most important step in IR is to educate personnel about what to look for. Leveraging a templated incident response plan to establish roles and responsibilities for all participants — security leaders, operations managers, help desk teams, identity and access managers, as well as audit, compliance, communications, and executives — can ensure efficient coordination.
Attackers will continue to evolve their social engineering and spear phishing techniques to try to stay one step ahead of training and awareness campaigns. While almost everybody now knows to ignore a poorly written email that promises a reward in return for a small up-front payment, some targets will fall victim to an off-hours text message pretending to be their boss asking for help with a time-sensitive task. To account for these adaptations, your internal training must be updated regularly to reflect the latest trends and techniques.
Your incident responders — or security operations center (SOC), if you have one — will also require regular training, ideally based on simulations of actual incidents. An intensive tabletop exercise can raise adrenaline levels and give your team a sense of what it’s like to experience a real-world incident. You might find that some team members shine when the heat is on, while others require additional training and guidance.
Another part of your preparation is outlining a specific response strategy. The most common approach is to contain and eradicate the incident. The other option is to watch an incident in progress so you can assess the attacker’s behavior and identify their goals, assuming this does not cause irreparable harm.
Beyond training and strategy, technology plays a huge role in incident response. Logs are a critical component. Simply put, the more you log, the easier and more efficient it will be for the IR team to investigate an incident.
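Logs are easiest to investigate when every entry carries the same machine-readable fields. As a minimal sketch, assuming your own applications log through Python's standard `logging` module, the formatter below emits one JSON object per event. The field names `user`, `host`, and `source_ip` are illustrative assumptions, not a standard schema.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    # Hypothetical context fields; pass them with `extra={...}` when logging.
    CONTEXT_FIELDS = ("user", "host", "source_ip")

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        for key in self.CONTEXT_FIELDS:
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
auth_log = logging.getLogger("auth")
auth_log.addHandler(handler)
auth_log.warning("login failed", extra={"user": "alice", "source_ip": "198.51.100.7"})
```

Consistent fields like these let the IR team filter by user or address across many systems instead of parsing free-form text under time pressure.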
Also, using an endpoint detection and response (EDR) platform or extended detection and response (XDR) tool with centralized control will let you quickly take defensive actions like isolating machines, disconnecting them from the network, and executing counteracting commands at scale.
Other technology needed for IR includes a virtual environment where logs, files, and other data can be analyzed, along with ample storage to house this information. You don’t want to waste time during an incident setting up virtual machines and allocating storage space.
Finally, you’ll need a system for documenting your findings from an incident, whether that’s using spreadsheets or a dedicated IR documentation tool. Your documentation should cover the timeline of the incident, what systems and users were impacted, and what malicious files and indicators of compromise (IOCs) you discovered (both in the moment and retrospectively).
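If you have no dedicated tool yet, even a small data structure can keep those findings consistent. The sketch below is one possible shape for such a record. The class and field names are hypothetical, and the host name, IP address, and timestamps are made-up sample values.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TimelineEntry:
    timestamp: datetime
    description: str


@dataclass
class IncidentRecord:
    """Hypothetical container for the documentation of a single incident."""

    incident_id: str
    affected_systems: set[str] = field(default_factory=set)
    affected_users: set[str] = field(default_factory=set)
    malicious_files: set[str] = field(default_factory=set)
    iocs: set[str] = field(default_factory=set)
    timeline: list[TimelineEntry] = field(default_factory=list)

    def log_event(self, description: str, when: Optional[datetime] = None) -> None:
        """Add a timeline entry; defaults to the current UTC time."""
        when = when or datetime.now(timezone.utc)
        self.timeline.append(TimelineEntry(when, description))
        # Re-sort so findings added retrospectively land in the right place.
        self.timeline.sort(key=lambda entry: entry.timestamp)


record = IncidentRecord("IR-0001")
record.affected_systems.add("fileserver-01")
record.iocs.add("198.51.100.7")
record.log_event("Containment started", datetime(2024, 1, 2, 9, 30, tzinfo=timezone.utc))
# Discovered later, but it happened earlier, so it sorts first.
record.log_event("Suspicious login observed", datetime(2024, 1, 2, 8, 0, tzinfo=timezone.utc))
```

Sorting on insert means the timeline stays chronological whether an event is recorded in the moment or reconstructed afterward.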
2: Identification
Goal: Detect whether you have been breached and collect IOCs.
There are a few ways you can identify that an incident has occurred or is currently in progress.
- Internal detection: an incident can be discovered by your in-house monitoring team or by another member of your org (thanks to your security awareness efforts), via alerts from one or more of your security products, or during a proactive threat hunting exercise.
- External detection: a third-party consultant or managed service provider can detect incidents on your behalf, using security tools or threat hunting techniques. Or a business partner may see anomalous behavior that indicates a potential incident.
- Exfiltrated data disclosed: the worst-case scenario is to learn that an incident has occurred only after discovering that data has been exfiltrated from your environment and posted to internet or darknet sites. The implications are even worse if such data includes sensitive customer information and the news leaks to the press before you have time to prepare a coordinated public response.
No discussion about identification would be complete without bringing up alert fatigue.
If the detection settings for your security products are dialed too high, you will receive too many alerts about unimportant activities on your endpoints and network. That is a great way to overwhelm your team and can result in many ignored alerts.
The reverse scenario, where your settings are dialed too low, is equally problematic because you might miss critical events. A balanced security posture will provide just the right number of alerts so you can identify incidents worthy of further investigation without suffering alert fatigue. Your security vendors can help you find the right balance and, ideally, automatically filter alerts so your team can focus on what matters.
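The trade-off between too many and too few alerts can be illustrated with a simple triage filter. This is a sketch under stated assumptions, not how any particular security product works: the 1-to-5 severity scale, the `Alert` fields, and the sample alerts are all hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Alert(NamedTuple):
    host: str
    rule: str
    severity: int  # hypothetical scale: 1 (informational) to 5 (critical)


def triage(alerts: list[Alert], min_severity: int) -> list[tuple[Alert, int]]:
    """Drop alerts below min_severity and collapse exact duplicates.

    Returns (alert, occurrence_count) pairs, highest severity first.
    """
    counts = Counter(a for a in alerts if a.severity >= min_severity)
    return sorted(counts.items(), key=lambda item: (-item[0].severity, -item[1]))


alerts = [
    Alert("ws-12", "usb-inserted", 1),
    Alert("dc-01", "new-admin-account", 5),
    Alert("ws-07", "macro-spawned-shell", 4),
    Alert("ws-07", "macro-spawned-shell", 4),
]
for alert, count in triage(alerts, min_severity=3):
    print(alert.severity, alert.host, alert.rule, f"x{count}")
```

Raising `min_severity` shrinks the queue but risks hiding real events, while lowering it does the opposite, which is the balance described above.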
During the identification phase, you will document all indicators of compromise (IOCs) gathered from alerts, such as compromised hosts and users, malicious files and processes, new registry keys, and more.
Once you have documented all IOCs, you will move to the containment phase.
3: Containment
Goal: Minimize the damage.
Containment is as much a strategy as it is a distinct step in IR.
You will want to establish an approach fit for your specific organization, keeping both security and business implications in mind. Although isolating devices or disconnecting them from the network may prevent an attack from spreading across the organization, it could also result in significant financial damage or other business impact. These decisions should be made ahead of time and clearly articulated in your IR strategy.
Containment can be broken down into both short- and long-term steps, with unique implications for each.
- Short-term: This includes steps you might take in the moment, like shutting down systems, disconnecting devices from the network, and actively observing the threat actor’s activities. There are pros and cons to each of these steps.
- Long-term: The best-case scenario is to keep infected systems offline so you can safely move to the eradication phase. This isn’t always possible, however, so you may need to take measures like patching, changing passwords, killing specific services, and more.
During the containment phase you will want to prioritize your critical devices like domain controllers, file servers, and backup servers to ensure they haven’t been compromised.
Additional steps in this phase include documenting which assets and threats were contained during the incident, as well as grouping devices based on whether they were compromised or not. If you are unsure, assume the worst. Once all devices have been categorized and meet your definition of containment, this phase is over.
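The grouping rule, including "assume the worst," is simple enough to express in a few lines. In this sketch, the device names and the three-valued input (`True` for verified clean, `False` for known compromised, `None` for unknown) are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    COMPROMISED = "compromised"
    CLEAN = "clean"


def categorize(devices: dict[str, Optional[bool]]) -> dict[Status, list[str]]:
    """Group device names by containment status.

    Each value is True (verified clean), False (known compromised),
    or None (unknown). Unknown devices are treated as compromised.
    """
    groups: dict[Status, list[str]] = {Status.COMPROMISED: [], Status.CLEAN: []}
    for name, verified_clean in sorted(devices.items()):
        # Only an explicit True counts as clean; anything else assumes the worst.
        status = Status.CLEAN if verified_clean is True else Status.COMPROMISED
        groups[status].append(name)
    return groups


groups = categorize({"dc-01": True, "ws-07": False, "backup-01": None})
```

Here `backup-01` lands in the compromised group alongside `ws-07` because nobody has verified it yet.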
Bonus step: Investigation
Goal: Determine who, what, when, where, why, and how.
At this stage it is worth noting another important aspect of IR: investigation.
Investigation takes place throughout the IR process. While not a phase of its own, it should be kept in mind as each step is performed. Investigation aims to answer questions about which systems were accessed and the origins of a breach. When the incident has been contained, teams can facilitate thorough investigation by capturing as much relevant data as possible from sources like disk and memory images, and logs.
This flowchart visualizes the overall process:
You may be familiar with the term digital forensics and incident response (DFIR), but it’s worth noting that the goals of IR forensics differ from the goals of traditional forensics. In IR, the primary goal of forensics is to help progress from one phase to the next as efficiently as possible in order to resume normal business operations.
Digital forensics techniques are designed to extract as much useful information as possible from any evidence captured and turn it into useful intelligence that can help build a more complete picture of the incident, or even to aid in the prosecution of a bad actor.
Data points that add context to discovered artifacts might include how the attacker entered the network or moved around, which files were accessed or created, what processes were executed, and more. Of course, this can be a time-consuming process that might conflict with IR.
Notably, DFIR has evolved since the term was first coined. Organizations today have hundreds or thousands of machines, each of which has hundreds of gigabytes or even multiple terabytes of storage, so the traditional approach of capturing and analyzing full disk images from all compromised machines is no longer practical.
Current conditions require a more surgical approach, where specific information from each compromised machine is captured and analyzed.
4: Eradication
Goal: Make sure the threat is completely removed.
With the containment phase complete, you can move to eradication, which can be handled through disk cleaning, restoring from a clean backup, or full disk reimaging. Cleaning entails deleting malicious files and deleting or modifying registry keys. Reimaging means reinstalling the operating system.
Before taking any action, the IR team will want to refer to any organizational policies that, for example, call for specific machines to be reimaged in the event of a malware attack.
As with earlier steps, documentation plays a role in eradication. The IR team should carefully document the actions taken on each machine to ensure that nothing was missed. As an additional check you can perform active scans of your systems for any evidence of the threat after the eradication process is complete.
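One narrow form of that post-eradication check is to look for files whose hashes match the malicious files you documented earlier. The sketch below assumes you recorded SHA-256 hashes as IOCs; the function names are hypothetical. It only finds exact copies of known files, so it complements rather than replaces a scan by your security tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_known_bad(root: Path, bad_hashes: set[str]) -> list[Path]:
    """Return files under root whose SHA-256 appears in bad_hashes."""
    matches = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        try:
            if sha256_of(path) in bad_hashes:
                matches.append(path)
        except OSError:
            # Unreadable files are skipped here; a real scan should record them.
            continue
    return sorted(matches)
```

An empty result from `find_known_bad(Path("/some/restored/volume"), documented_hashes)` is supporting evidence only, since an attacker can trivially change a file's hash.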
5: Recovery
Goal: Get back to normal operations.
All your efforts have been leading here! The recovery phase is when you can resume business as usual. Determining when to restore operations is the key decision at this point. Ideally, this can happen without delay, but it may be necessary to wait for your organization’s off-hours or other quiet period.
Before restoring, perform one more check to verify that there aren’t any IOCs left on the restored systems. You will also need to determine whether the root cause still exists and implement the appropriate fixes.
Now that you have learned about this type of incident, you’ll be able to monitor for it in the future and establish protective controls.
6: Lessons learned
Goal: Document what happened and improve your capabilities.
Now that the incident is comfortably behind you, it’s time to reflect on each major IR step and answer key questions. There are plenty of questions to ask and aspects to review for each phase.
Probing these will help you step back and reconsider fundamental questions like: Do we have the right tools? Is our staff appropriately trained to respond to incidents?
Then the cycle returns to the preparation phase, where you can make necessary improvements like updating your incident response plan template, technology, and processes, and providing your people with better training.
4 pro tips to stay secure
Let’s conclude with four final suggestions to bear in mind:
1. The more you log, the easier investigation will be. Make sure you log as much as possible to save money and time.
2. Stay prepared by simulating attacks against your network. This will reveal how your SOC team analyzes alerts and how well its members communicate, which is critical during a real incident.
3. People are integral to your org’s security posture. Did you know 95% of cyber breaches are caused by human error? That’s why it’s important to perform periodic training for two groups: end users and your security team.
4. Consider having a specialized third-party IR team on call that can immediately step in to help with more difficult incidents that may be beyond your team’s ability to resolve. These teams, which may have resolved hundreds of incidents, will have the IR experience and tools necessary to hit the ground running and accelerate your IR.