Detecting Credential Spearphishing in Enterprise Settings

A summary of the detection techniques presented in this paper

Posted by Kate Nguyen

Introduction

Most people are familiar with, or have even encountered, phishing attacks. A great deal of research has gone into detecting and preventing phishing attacks that target a general audience. However, much less work has been done to prevent spearphishing attacks. Like a phishing attack, a spearphishing attack uses email to trick recipients into performing a dangerous action. Unlike phishing attacks, however, spearphishing attacks are targeted at a specific user rather than a general audience. Emails sent in spearphishing attacks are specially crafted for the target, making them much more difficult to detect. Additionally, since spearphishing attacks are far less common than phishing attacks, it’s difficult to develop detection methods that can catch new spearphishing attacks. This blog post summarizes the work done to develop a detection technique for spearphishing attacks as published in a 2017 paper entitled “Detecting Credential Spearphishing Attacks in Enterprise Settings” [1].

The authors of this paper develop a spearphishing detection technique that successfully detects attacks in a real set of enterprise email data. Their method combines domain knowledge with data analysis. This paper won the 2017 Internet Defense Prize and stands out because spearphishing attacks are notoriously difficult to detect.

The main contributions of this paper are:

use of domain knowledge to derive features that target the various stages of spearphishing
design of DAS (Directed Anomaly Scoring), an anomaly detection technique that doesn’t require labeled training data and works well with an unbalanced set of benign vs. malicious data
demonstrating real-time detection of spearphishing attacks with an acceptable false positive rate

What is Spearphishing: Attack Taxonomy

Spearphishing: A social engineering attack in which the attacker sends a targeted, deceptive email to trick the recipient into performing some kind of dangerous activity

Spearphishing differs from other security attacks in that it doesn't require a lot of technical sophistication. Attacks are hard to detect because the emails are handcrafted. This paper looks at credential spearphishing, a specific type of spearphishing attack where the malicious email tries to trick the recipient into clicking a link and then entering their credentials. Victims of this attack generally possess some sort of privileged access, so the authors focus on detecting attacks within an enterprise setting.

Figure 1: An example of a spearphishing attack. Note that the email contains specific information (in this case, the email address) about the target [2]

The first portion of the paper goes into detail about the taxonomy of spearphishing attacks, which is vital to the success of the detection method. The authors break the attack down into two stages:

Lure: Convincing target to perform action, typically achieved by posing as trusted or authoritative figure
Exploit: Exploiting trust gained to induce target into performing a dangerous activity

The lure stage corresponds to the email itself, which convinces users to proceed to the exploit stage. The exploit stage involves the user clicking on the URL that leads them to a malicious website, which then tricks them into entering their credentials.

The authors further split the lure stage into three distinct impersonation models. Spearphishing usually involves posing as a trusted figure, and the impersonation models represent how the attacker goes about accomplishing this. Their detection method targets three types of impersonation:

Name Spoofer: The attacker poses as a real, trusted individual while using a fake/malicious email address
Previously Unseen: The attacker does not attempt to pose specifically as another person, but they may still be perceived as trustworthy
Lateral Attacker: The attacker sends the spearphishing email from a compromised user's email address

Security Model

The goals of this paper were to detect real credential spearphising attacks in an enterprise setting, without creating too many false alarms. They limit their detection to an alert budget of 10 alerts per day. This is because alerts need to be manually inspected by a security analyst, so it's important to minimize the number of alerts that the analyst has to go through while still being able to detect real attacks.

Data and Features

The data used in the spearphishing detection mechanism was obtained from Lawrence Berkeley National Laboratory (LBNL). The dataset was completely anonymized, as all the values in the logs were hashed. This means, for example, that the authors could not actually see an original email address but only its hashed value. Since the same email address will always have the same hash value, they are still able to determine whether that address has previously sent emails. They use three different types of data in their detection method:
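To see why hashing does not get in the way of counting, here is a minimal sketch. The paper summary above does not say which hash function or salting scheme was used, so `anonymize`, the salt, and the sample addresses are all assumptions made up for illustration.

```python
import hashlib
from collections import Counter


def anonymize(value: str, salt: str = "example-salt") -> str:
    """Return a hex digest standing in for a log field (hypothetical scheme)."""
    return hashlib.sha256((salt + value.lower()).encode("utf-8")).hexdigest()


# Hypothetical sender addresses, anonymized before any analysis happens.
raw_senders = ["alice@example.org", "bob@example.org", "alice@example.org"]
hashed_senders = [anonymize(s) for s in raw_senders]

# Equal addresses map to equal digests, so per-sender counts survive hashing
# even though the analyst never sees the original address.
sender_counts = Counter(hashed_senders)
```

The analyst can still ask "how often has this sender emailed us before?" by looking up the digest in `sender_counts`.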

SMTP Logs: Emails sent to and from the organization between 3/1/2013 and 1/14/2017
NIDS Logs: Intrusion detection logs that provided information on which URLs were visited, if those URLs had previously been visited and which emails the URL corresponded to
LDAP Logs: Information about employee logins, including when, where and how often a user logs in

Based on this dataset and the attack taxonomy developed earlier, the authors derive specific characteristics that correspond to each type of spearphishing attack. Figure 2 shows an overview of the characteristics.

Figure 2: Table showing the specific features and characteristics used to detect each type of spearphishing attack

A spearphishing attack comprises two stages: the lure stage and the exploit stage. Therefore, the authors combine features from both to determine if a spearphishing attack has occurred. For the lure stage, they look specifically at sender reputation. The sender reputation features vary based on which impersonation model is being used.

For a name spoofing attack, the attacker wants to pose as a real person using a fake email address. In this case, we can expect that the name on the email is reputable, but the email address is not. In terms of features, they expect that the name used in this attack sends emails quite frequently, but the address itself, which is only posing as the sender, will not appear frequently.
For a previously unseen attack, the attacker does not base their reputation on any known person. Therefore, it can be expected that the frequency of both the name and the email address is low.
For a lateral attack, the attacker gains access to a legitimate user account and uses that account to send their spearphishing email. In this case, the authors look at LDAP logs for login information of the account sending the email. They expect that the login location will be suspicious because of the unauthorized use. Therefore, they look at how many other employees have logged in from the same city and how often the sender logs in from that city.
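The first two sender reputation ideas can be sketched with simple running counts. This is only an illustration under my own assumptions: the `SenderHistory` class, the use of raw email counts, and the sample names and addresses are hypothetical, and the lateral attacker features from the LDAP logs are left out.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class SenderHistory:
    """Running counts of From names and addresses seen in past email (illustrative)."""
    name_counts: Counter = field(default_factory=Counter)
    address_counts: Counter = field(default_factory=Counter)

    def record(self, name: str, address: str) -> None:
        """Add one previously received email to the history."""
        self.name_counts[name] += 1
        self.address_counts[address] += 1

    def features(self, name: str, address: str) -> Tuple[int, int]:
        """Return (prior emails from this name, prior emails from this address)."""
        return self.name_counts[name], self.address_counts[address]


history = SenderHistory()
for _ in range(50):
    history.record("Alice Smith", "alice@example.org")

# Name spoofer: a familiar name paired with an address that has no history.
spoof = history.features("Alice Smith", "alice.smith@mail.example.net")
# Previously unseen: neither the name nor the address has any history.
unseen = history.features("IT Helpdesk", "support@helpdesk.example.net")
```

Low counts are the suspicious direction here, which is the convention the scoring step relies on.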

Detecting the exploit stage requires looking at the domain reputation of the URL in the spearphishing email. The authors assume that the URL used in a spearphishing attack will be relatively uncommon. Therefore, they use the NIDS logs to look at two things: how rarely the URL is visited by employees, and how much time passed between the first recorded visit to the URL and the arrival of the email containing it.
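A rough sketch of the two domain reputation features might look like the following. The function name `domain_features`, the dictionary-based visit log, and the choice to measure the gap in days are my assumptions for illustration, not details taken from the paper.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlparse


def domain_features(
    url: str,
    email_time: datetime,
    visit_log: Dict[str, List[datetime]],
) -> Tuple[int, Optional[float]]:
    """Return (prior visits to the URL's host, days since the first prior visit).

    The second value is None when the host was never visited before the email.
    """
    host = urlparse(url).hostname
    prior = [t for t in visit_log.get(host, []) if t < email_time]
    if not prior:
        return 0, None
    days_since_first = (email_time - min(prior)).total_seconds() / 86400
    return len(prior), days_since_first


# Hypothetical visit history reconstructed from network logs.
visit_log = {
    "wiki.example.org": [datetime(2017, 1, 1), datetime(2017, 1, 5), datetime(2017, 1, 9)],
}
arrival = datetime(2017, 1, 11)

common = domain_features("https://wiki.example.org/page", arrival, visit_log)
rare = domain_features("http://login.example.net/reset", arrival, visit_log)
```

A host with few prior visits and a short (or missing) history before the email arrives is the pattern the authors treat as suspicious.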

Detection Architecture

Using the features described earlier, the authors develop a detection mechanism that allows them to detect spearphishing attacks in real time. The most important part of the detection mechanism is the algorithm they use for scoring each event and comparing features, called Directed Anomaly Scoring (DAS).

Figure 3: Directed anomaly scoring example [1]

When an event arrives, DAS ranks the event by comparing how suspicious it is relative to previous events. The event is represented by its feature vector, which contains 2 domain reputation features and 2 sender reputation features. These features correspond to the lure and exploit stages as described in the previous section. The event is scored separately for each of the 3 impersonation models, since each model looks at different features. The event receives an anomaly score equal to the number of other events it is at least as suspicious as. An event E is at least as suspicious as another event E’ only if each of its features is at least as suspicious as the corresponding feature of E’. DAS works well because the attack model for spearphishing is so specific and it’s important to minimize the number of false positives. For example, even if the sender name for an event is extremely uncommon, the email will not be flagged if it contains a common URL. Figure 3 shows how DAS scores an event based on how suspicious it is compared to other events. As we can see, even though Event 1 has a “suspicious” score in one of its features, DAS only categorizes it as more suspicious than Event A because it is benign in the other feature.
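The scoring rule is easy to sketch in a few lines. In this version I assume every feature is oriented so that a smaller value is more suspicious (for example, fewer prior emails or fewer prior visits); the function names and the two-feature sample events are hypothetical, and the pairwise loop is the simplest possible way to compute the scores rather than an efficient one.

```python
from typing import List, Sequence


def at_least_as_suspicious(event: Sequence[float], other: Sequence[float]) -> bool:
    """True if every feature of `event` is at least as suspicious as in `other`.

    Assumes smaller feature values are more suspicious.
    """
    return all(a <= b for a, b in zip(event, other))


def das_scores(events: Sequence[Sequence[float]]) -> List[int]:
    """Score each event by how many other events it is at least as suspicious as."""
    scores = []
    for i, event in enumerate(events):
        score = sum(
            1
            for j, other in enumerate(events)
            if j != i and at_least_as_suspicious(event, other)
        )
        scores.append(score)
    return scores


# Each event is (sender reputation, domain reputation), both hypothetical counts.
events = [
    (0, 0),      # rare sender and rare domain
    (0, 500),    # rare sender, but a commonly visited domain
    (40, 2),     # fairly common sender, rare domain
    (300, 800),  # common sender and common domain
]
scores = das_scores(events)
```

The second event shows the point made above: an extremely rare sender alone does not earn a high score when the URL is common.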

DAS is used as the scoring mechanism in the overall detection architecture presented in Figure 4. There are three parts to the detection process:

Feature extraction: This is where the feature vectors are generated for each event. Each event has 3 feature vectors, one for each of the three impersonation models.
Nightly anomaly scoring: The purpose of this stage is to allow real-time comparison against the past month’s worth of data. This way, events are evaluated for how suspicious they are over a longer timeline. Each night, the detection system collects all the events from the past month and selects the most suspicious ones, 30 × the daily alert budget in total, to form a comparison set. The comparison set is then used by the detection system for real-time alert generation.
Realtime alert generation: When an email with a URL arrives, the detection system computes its feature vector and evaluates it against the comparison set. If the event is at least as suspicious as any of the events in the comparison set, an alert is generated.

Figure 4: Overview of the detection mechanism presented in the paper that generates real-time alerts [1]
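The nightly and real-time stages can be sketched together as follows. As before, I assume smaller feature values are more suspicious; `build_comparison_set`, `should_alert`, the `days` parameter, and the sample events are my own illustrative choices and not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Event = Tuple[float, ...]


def dominates(event: Sequence[float], other: Sequence[float]) -> bool:
    """True if `event` is at least as suspicious as `other` in every feature."""
    return all(a <= b for a, b in zip(event, other))


def build_comparison_set(
    month_events: Sequence[Event], daily_budget: int, days: int = 30
) -> List[Event]:
    """Nightly step: keep the days * daily_budget highest-scoring events."""

    def score(event: Event) -> int:
        # Counts the event itself too, which adds 1 to every score and
        # therefore leaves the ranking unchanged.
        return sum(dominates(event, other) for other in month_events)

    ranked = sorted(month_events, key=score, reverse=True)
    return ranked[: days * daily_budget]


def should_alert(new_event: Event, comparison_set: Sequence[Event]) -> bool:
    """Real-time step: alert if the new event matches or beats any stored event."""
    return any(dominates(new_event, stored) for stored in comparison_set)


# Hypothetical (sender reputation, domain reputation) pairs from the past month.
month_events = [(0, 0), (1, 3), (5, 1), (50, 60), (200, 900), (400, 400)]

# A tiny budget so the example fits on screen: 3 days * 1 alert per day.
comparison_set = build_comparison_set(month_events, daily_budget=1, days=3)

alert_rare_both = should_alert((1, 1), comparison_set)
alert_common_url = should_alert((0, 700), comparison_set)
```

Because the expensive ranking happens once per night, the check on each arriving email only needs to compare against the small comparison set.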

Evaluation

The authors evaluated how their detection method performed on the dataset obtained from LBNL. This dataset contained over 370 million emails. They evaluate their method based on performance (true positives) and time burden (false positives). Their goal is to detect all spearphishing attacks in the dataset while maintaining a small number of alerts per day. They designate an alert budget of 10 alerts per day: 4 alerts for name spoofing attacks, 4 alerts for previously unseen attacks and 2 alerts for lateral attackers (which they expect to occur less often). They identify 19 spearphishing attacks within the dataset based on the incident database of the LBNL security team.

With a median of 7 alerts per day, their detection system successfully identified 17/19 of the attacks. This meets the goals originally set by the authors. Although their system missed 2 attacks, their detection method is a vast improvement for spearphishing detection. They compared their detection model to several other machine learning models commonly used for anomaly detection. They found that other algorithms could find at most 4/19 of the spearphishing attacks with a daily alert budget of 10 alerts. To detect 17/19 attacks, the other models needed a much larger alert budget, ranging from 91 to 2,455 alerts per day. Maintaining a low alert budget is important because a security analyst has to manually inspect each alert.

Discussion and Thoughts

I really enjoyed this paper because it combined machine learning/data analysis with security knowledge to develop a detection mechanism for an attack that was previously very difficult to intercept. The authors did a very good job of explaining their entire detection mechanism and how each part worked. Their idea was both simple and effective and could be applied easily in real world settings. As a security professional who has worked in security operations centers before, I appreciate that they took into account a low alert budget and real world constraints. Spearphishing detection is so difficult because of the lack of data available. It would be interesting to see how this approach applies to other enterprise settings because of how specific the features are. For example, I work for an organization that has employees telecommuting from all over the United States. I imagine it would be hard to establish features for a lateral attacker because the login location varies so greatly.

Discussion Questions

What are other security problems that would be a good fit with DAS? What sort of characteristics of a problem indicates that it would be better solved with DAS than other machine learning algorithms commonly used in anomaly detection?
The authors make it seem really easy to pick which features to look at. How much trial and error should we expect if we are using DAS? How much domain knowledge is needed?
What are some evasion strategies? How could we prevent poisoning? How could DAS be made more robust?

Citations

[1] “Detecting Credential Spearphishing Attacks in Enterprise Settings”. G. Ho, A. Sharma, M. Javed, V. Paxson and D. Wagner. 2017 USENIX Security Symposium. https://www.usenix.org/system/files/conference/usenixsecurity17/sec17-ho.pdf
[2] “What Is “Spear Phishing”, and How Does It Take Down Big Corporations?”. How-To-Geek. https://www.howtogeek.com/142635/htg-explains-what-spear-phishing-attacks-are-and-why-theyre-taking-down-big-corporations/