Most people are familiar with phishing attacks, and many have encountered one. A great deal of research has gone into detecting and preventing phishing attacks that target a general audience. However, much less work has been done to prevent spearphishing attacks. Like a phishing attack, a spearphishing attack uses email to trick recipients into performing a dangerous action. Unlike phishing, however, a spearphishing attack is targeted at a single user rather than a general audience. The emails are specially crafted for the target, which makes them much more difficult to detect. Additionally, since spearphishing attacks are far less common than phishing attacks, there is little data from which to develop detection methods that can catch new spearphishing attacks. This blog post summarizes the work done to develop a detection technique for spearphishing attacks as published in a 2017 paper entitled “Detecting Credential Spearphishing Attacks in Enterprise Settings” [1].
The authors of this paper develop a spearphishing detection technique that successfully detects attacks in a real set of enterprise email data. Their method combines domain knowledge with data analysis. The paper won the 2017 Internet Defense Prize and stands out because spearphishing attacks are notoriously difficult to detect.
The main contributions of this paper are:
Spearphishing differs from other security attacks in that it doesn't require much technical sophistication. Attacks are hard to detect because the emails are handcrafted. This paper looks at credential spearphishing, a specific type of spearphishing attack in which the malicious email tries to persuade the recipient to click a link and then enter their credentials. Victims of this attack generally possess some sort of privileged access, so the authors focus on detecting attacks within an enterprise setting.
The first portion of the paper lays out a taxonomy of spearphishing attacks, which is vital to the success of the detection method. The authors break an attack down into two stages: the lure stage and the exploit stage.
The authors further split the lure stage into distinct impersonation models. Spearphishing usually involves posing as a trusted figure, and the impersonation models describe how the attacker goes about doing so. Their detection method targets three types of impersonation: name spoofing, a previously unseen attacker, and a lateral attacker.
The data used in the spearphishing detection mechanism was obtained from Lawrence Berkeley National Laboratory (LBNL). The dataset was completely anonymized, as all the values in the logs were hashed. This means, for example, that the authors could not see an original email address but only its hashed value. Since the same email address always has the same hash value, they are still able to determine whether that address has previously sent emails. They use three different types of data in their detection method: email logs, NIDS logs, and LDAP logs.
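To make the point about hashed values concrete, here is a minimal sketch of how a stable hash still allows counting how often a sender has appeared. The choice of SHA-256, the salt value, and the `anonymize` helper are my own assumptions for illustration; the summary above does not say which hashing scheme was actually used.

```python
import hashlib
from collections import Counter

def anonymize(value: str, salt: str = "example-salt") -> str:
    """Return a stable hex digest for a value (hypothetical scheme: salted SHA-256)."""
    return hashlib.sha256((salt + value.lower()).encode("utf-8")).hexdigest()

# Made-up addresses; only their hashes would be visible to an analyst.
raw_senders = ["alice@example.org", "bob@example.org", "alice@example.org"]
hashed_senders = [anonymize(s) for s in raw_senders]

# Equal inputs produce equal digests, so frequency counts survive anonymization.
sender_counts = Counter(hashed_senders)
```

Looking up `sender_counts[anonymize("alice@example.org")]` gives the number of emails from that sender without ever exposing the address itself.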
Based on this dataset and the attack taxonomy developed earlier, the authors design specific features that correspond to each type of spearphishing attack. Figure 2 shows an overview of these features.
A spearphishing attack consists of two stages: the lure stage and the exploit stage. The authors therefore combine features from both stages to determine whether a spearphishing attack has occurred. For the lure stage, they look specifically at sender reputation, and the relevant features vary with the impersonation model being used.
For a name spoofing attack, the attacker wants to pose as a real person using a fake email address. In this case, we can expect that the display name on the email is reputable, but the email address is not. In terms of features, the authors expect the name used in the attack to appear frequently as a sender in past emails, while the address, which is only posing as that sender, appears rarely or not at all.
For a previously unseen attacker, the attacker does not borrow the reputation of any known person. Both the name and the email address can therefore be expected to have appeared rarely, if at all, in past emails.
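The two sender reputation patterns above can be sketched with simple frequency counts. This is only an illustration of the intuition as summarized here, not the paper's exact feature definitions; the `Email` type, the function names, and the sample senders are all made up.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple

class Email(NamedTuple):
    name: str      # display name shown to the recipient
    address: str   # sender address (hashed in the real dataset)

def build_history(emails: Iterable[Email]) -> Tuple[Counter, Counter]:
    """Count how often each name and each address appeared in past emails."""
    emails = list(emails)
    name_counts = Counter(e.name for e in emails)
    address_counts = Counter(e.address for e in emails)
    return name_counts, address_counts

def sender_features(email: Email, name_counts: Counter,
                    address_counts: Counter) -> Dict[str, int]:
    """Return how often this email's name and address were seen before."""
    return {
        "name_seen": name_counts[email.name],          # 0 if never seen
        "address_seen": address_counts[email.address],  # 0 if never seen
    }

history = [Email("Alice Smith", "alice@lab.example")] * 20
name_counts, address_counts = build_history(history)

# Name spoofing pattern: familiar name, unfamiliar address.
spoof = sender_features(Email("Alice Smith", "alice.smith@mail.example"),
                        name_counts, address_counts)
# Previously unseen pattern: neither the name nor the address is familiar.
unseen = sender_features(Email("IT Helpdesk", "help@support.example"),
                         name_counts, address_counts)
```

Note that these counts are not compared against fixed thresholds in the approach described here; they are inputs to the ranking step discussed later.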
For a lateral attack, the attacker gains access to a legitimate user account and uses that account to send the spearphishing email. In this case, the authors look at the LDAP logs for login information about the account sending the email. They expect the login location to be suspicious because the account is being used without authorization. Therefore, they look at how many other employees have logged in from the same city and how often the sender has logged in from that city.
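As a rough sketch of the two login features described above, the following counts distinct other employees and the sender's own prior logins for a given city. The `Login` record and `login_features` function are hypothetical names, and the real LDAP logs would of course carry more fields than this.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple

class Login(NamedTuple):
    user: str   # account that logged in
    city: str   # city the login came from (assumed to be already resolved)

def login_features(history: Iterable[Login], user: str, city: str) -> Dict[str, int]:
    """Compute the two lateral-attacker features for a login by `user` from `city`."""
    users_by_city = defaultdict(set)
    logins_by_user_city = defaultdict(int)
    for login in history:
        users_by_city[login.city].add(login.user)
        logins_by_user_city[(login.user, login.city)] += 1
    # Distinct employees, other than the sender, who have logged in from this city.
    other_employees = len(users_by_city.get(city, set()) - {user})
    # How many times the sender has previously logged in from this city.
    sender_logins = logins_by_user_city.get((user, city), 0)
    return {
        "other_employees_in_city": other_employees,
        "sender_logins_from_city": sender_logins,
    }
```

A login from a city where neither the sender nor any colleague has appeared before yields zeros for both features, which is the pattern the authors treat as suspicious.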
Detecting the exploit stage requires looking at the domain reputation of the URL in the spearphishing email. The authors assume that the URL used in a spearphishing attack will be relatively uncommon. Therefore, they use the NIDS logs to compute two features: how often employees have visited the URL, and how much time passed between the first recorded visit to the URL and the arrival of the email containing it. A URL that has rarely been visited, or that was first seen only shortly before the email arrived, is more suspicious.
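The two domain reputation features can be sketched as follows. I am assuming here that visits are grouped by the URL's hostname, and that a never-visited domain gets zero for both features; the `Visit` record and function names are hypothetical and not taken from the paper.

```python
from datetime import datetime
from typing import Dict, Iterable, NamedTuple
from urllib.parse import urlsplit

class Visit(NamedTuple):
    url: str        # URL seen in the network logs
    when: datetime  # time of the visit

def domain_of(url: str) -> str:
    """Extract the lowercase hostname from a URL (empty string if absent)."""
    return (urlsplit(url).hostname or "").lower()

def domain_features(visits: Iterable[Visit], url: str,
                    email_arrival: datetime) -> Dict[str, int]:
    """Compute visit count and age of first visit for the domain of `url`."""
    host = domain_of(url)
    prior = [v.when for v in visits
             if domain_of(v.url) == host and v.when < email_arrival]
    # Assumption: a domain nobody has visited gets 0 days, the most suspicious value.
    days_since_first = (email_arrival - min(prior)).days if prior else 0
    return {
        "prior_visits": len(prior),
        "days_since_first_visit": days_since_first,
    }
```

Small values for both features indicate a domain that employees have little history with, which matches the authors' assumption about URLs used in spearphishing.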
Using the features described earlier, the authors develop a detection mechanism that allows them to detect spearphishing attacks in real time. The most important part of the mechanism is the algorithm used to score each event and compare features, called Directed Anomaly Scoring (DAS).
When an event arrives, DAS ranks the event by comparing how suspicious it is relative to previous events. The event is represented by its feature vector, which contains two domain reputation features and two sender reputation features. These features correspond to the lure and exploit stages described in the previous section. The event is scored separately for each of the three impersonation models, since each model looks at different features. The event receives an anomaly score based on how many other events it is more suspicious than. An event E counts as at least as suspicious as another event E’ only if every one of its features is at least as suspicious as the corresponding feature of E’. DAS works well because the attack model for spearphishing is so specific and because it is important to minimize the number of false positives. For example, even if the sender name for an event is extremely uncommon, the email will not be flagged if it contains a common URL. Figure 3 shows how DAS scores an event based on how suspicious it is compared to other events. As we can see, even though Event 1 has a “suspicious” score in one of its features, DAS only categorizes it as more suspicious than Event A because it is benign in the other feature.
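The scoring rule described above can be sketched in a few lines. This is my own minimal illustration of the idea, not the authors' implementation: it assumes every feature is oriented so that a smaller value is more suspicious, uses two made-up features per event instead of the full feature vector, and compares all pairs in quadratic time.

```python
from typing import List, Sequence

def at_least_as_suspicious(event: Sequence[float], other: Sequence[float]) -> bool:
    """True if `event` is at least as suspicious as `other` in every feature.

    Assumption: smaller feature values are more suspicious.
    """
    return all(a <= b for a, b in zip(event, other))

def das_scores(events: List[Sequence[float]]) -> List[int]:
    """Score each event by how many other events it is at least as suspicious as."""
    scores = []
    for i, event in enumerate(events):
        score = sum(
            1
            for j, other in enumerate(events)
            if j != i and at_least_as_suspicious(event, other)
        )
        scores.append(score)
    return scores

# Hypothetical events as (sender_seen_count, url_visit_count).
events = [(0, 0), (0, 50), (40, 60), (100, 100), (0, 500)]
scores = das_scores(events)  # [4, 3, 1, 0, 0]
```

The last event has a never-seen sender but a very commonly visited URL, so it does not dominate any other event and scores 0. That is the behavior described above: being anomalous in a single feature is not enough to rank highly.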
DAS is used as the scoring mechanism in the overall detection architecture presented in Figure 4. There are three parts to the detection process:
The authors evaluated how their detection method performed on the dataset obtained from LBNL. This dataset contained over 370 million emails. They evaluate their method based on performance (true positives) and time burden (false positives). Their goal is to detect all spearphishing attacks in the dataset while keeping the number of alerts per day small. They designate an alert budget of 10 alerts per day: 4 alerts for name spoofing attacks, 4 alerts for previously unseen attackers, and 2 alerts for lateral attackers (which they expect to occur less often). They identify 19 known spearphishing attacks within the dataset based on the incident database of the LBNL security team.
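One way to picture the alert budget is as a per-model cap on how many of the highest-scoring events reach an analyst each day. The sketch below assumes a simple daily batch with hypothetical model names and event identifiers; the paper's real-time pipeline is not reproduced here.

```python
import heapq
from typing import Dict, List, Tuple

# Daily budget per impersonation model, as described above (10 alerts total).
ALERT_BUDGET = {"name_spoofer": 4, "previously_unseen": 4, "lateral": 2}

def select_alerts(
    scored_events: Dict[str, List[Tuple[int, str]]],
    budget: Dict[str, int] = ALERT_BUDGET,
) -> Dict[str, List[Tuple[int, str]]]:
    """Keep the top-scoring (score, event_id) pairs per model, up to its budget."""
    alerts = {}
    for model, limit in budget.items():
        candidates = scored_events.get(model, [])
        # nlargest returns the highest-scoring pairs in descending score order.
        alerts[model] = heapq.nlargest(limit, candidates, key=lambda pair: pair[0])
    return alerts
```

Splitting the budget per model keeps one noisy model from crowding out alerts from the others, which fits the authors' choice to give the rarer lateral attacker case its own smaller allocation.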
With a median of 7 alerts per day, their detection system successfully identified 17 of the 19 attacks. This meets the goals originally set by the authors. Although the system missed 2 attacks, the method is a vast improvement for spearphishing detection. The authors compared their detection model to several other machine learning models commonly used for anomaly detection. They found that the other algorithms could find at most 4 of the 19 spearphishing attacks with a daily alert budget of 10 alerts. To detect 17 of the 19 attacks, the other models needed a much larger alert budget, ranging from 91 to 2,455 alerts per day. Maintaining a low alert budget is important because a security analyst must manually inspect each alert.
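To put those numbers side by side, here is a quick calculation of the detection rates and the budget ratios. It uses only the figures quoted above and reads the 91 to 2,455 figure as the range of budgets the other models required; the variable names are mine.

```python
total_attacks = 19
das_detected = 17
baseline_detected = 4          # best case for the other models at 10 alerts/day
das_budget = 10                # alerts per day
baseline_budget_low = 91       # alerts per day needed to reach 17/19
baseline_budget_high = 2455

das_rate = das_detected / total_attacks            # about 0.895
baseline_rate = baseline_detected / total_attacks  # about 0.211

# How many times more alerts an analyst would have to review per day.
ratio_low = baseline_budget_low / das_budget       # 9.1
ratio_high = baseline_budget_high / das_budget     # 245.5
```

In other words, matching the same detection rate with the other models would mean reviewing roughly 9 to 245 times as many alerts each day.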
I really enjoyed this paper because it combined machine learning and data analysis with security knowledge to develop a detection mechanism for an attack that was previously very difficult to intercept. The authors did a very good job of explaining their entire detection mechanism and how each part worked. Their idea was both simple and effective and could be applied easily in real-world settings. As a security professional who has worked in security operations centers, I appreciate that they took a low alert budget and real-world constraints into account. Spearphishing detection is so difficult because of the lack of data available. Because the features are so specific, it would be interesting to see how this approach applies to other enterprise settings. For example, I work for an organization that has employees telecommuting from all over the United States. I imagine it would be hard to establish features for a lateral attacker because login locations vary so greatly.
CS/ECE 5584: Network Security, Fall 2017, Ning Zhang