A DDoS (distributed denial-of-service) attack is carried out by one or more compromised systems, controlled by an attacker, that flood a predetermined target with a series of malformed or malicious packets until the target’s allocated resources are overwhelmed. DDoS attacks fall into two categories: application-bug level attacks and infrastructural level attacks.
Example of an application-bug level attack: the ping-of-death attack. Attackers send a malformed ping (ICMP echo request) whose fragments reassemble into a packet larger than 65,535 bytes, the maximum IPv4 packet size. When vulnerable operating systems (mostly older, unpatched ones) try to handle such packets, they typically freeze, crash or reboot because of a buffer overflow.
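To make the size condition concrete, here is a minimal Python sketch that computes how large a fragmented IPv4 datagram would become after reassembly and flags it when it exceeds 65,535 bytes. It assumes fragments are given as hypothetical (offset, payload length) pairs and that every fragment carries a 20-byte header without options; it is an illustration, not a real packet parser.

```python
# Sketch: detect an IPv4 datagram whose fragments would reassemble past
# the 65,535-byte limit (the ping-of-death condition).
# Assumptions: each fragment is given as (offset_in_8_byte_units, payload_len),
# and every fragment carries a 20-byte IP header with no options.

MAX_IPV4_DATAGRAM = 65_535  # the 16-bit Total Length field caps a datagram here
IP_HEADER_LEN = 20          # assumed: minimal header, no IP options


def reassembled_length(fragments):
    """Return the length of the datagram the fragments would rebuild."""
    # The fragment offset field counts 8-byte blocks, so multiply by 8 for bytes.
    end_of_payload = max(offset * 8 + length for offset, length in fragments)
    return IP_HEADER_LEN + end_of_payload


def is_oversized(fragments):
    """True if reassembly would exceed the maximum IPv4 datagram size."""
    return reassembled_length(fragments) > MAX_IPV4_DATAGRAM


# A normal fragmented echo request: 1480 + 1480 + 40 bytes of payload.
normal = [(0, 1480), (185, 1480), (370, 40)]
# A ping-of-death style train: the last fragment starts at byte 65,520
# (offset 8190, close to the 13-bit maximum of 8191) and carries 100 more bytes.
attack = [(0, 1480), (8190, 100)]

print(reassembled_length(normal), is_oversized(normal))  # 3020 False
print(reassembled_length(attack), is_oversized(attack))  # 65640 True
```

The check only needs the last byte position, which is why a single fragment with a large offset is enough to trigger it.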
Example of an infrastructural level attack: the TCP SYN flood. Attackers send a large number of SYN packets but never answer the server’s SYN-ACK packets with the final ACK. As a result, the server cannot tell whether the handshakes will complete, keeps each half-open connection in memory, and retransmits the SYN-ACK several times (5 times by default on Linux) before giving up, which consumes a large amount of resources.
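The resource cost can be estimated with a little arithmetic. The sketch below assumes Linux-like behaviour (an initial retransmission timeout of 1 second that doubles after each retry) and uses Little’s law to estimate how many half-open entries an attacker keeps alive; the backlog size of 1024 and the attack rate of 100 SYNs per second are hypothetical.

```python
# Sketch: why unanswered SYNs are expensive. Assumes Linux-like behaviour:
# the SYN-ACK is first sent with a 1 s retransmission timeout that doubles
# after each retry, and the half-open entry is dropped after the last timeout.


def half_open_lifetime(retries=5, initial_rto=1.0):
    """Seconds a half-open connection waits before the server gives up."""
    # Timeouts are initial_rto * (1 + 2 + 4 + ... + 2**retries).
    return sum(initial_rto * 2**i for i in range(retries + 1))


def half_open_entries(syn_rate, lifetime):
    """Steady-state number of half-open entries (Little's law: L = rate * time)."""
    return syn_rate * lifetime


lifetime = half_open_lifetime()             # 63.0 seconds with the defaults
backlog = 1024                              # hypothetical SYN backlog size
entries = half_open_entries(100, lifetime)  # attacker sends 100 spoofed SYNs/s
print(lifetime, entries, entries > backlog) # 63.0 6300.0 True
```

Even a modest SYN rate keeps thousands of entries alive because each one lingers for about a minute, which is why the backlog fills up long before bandwidth runs out.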
Here’s a list of DDoS attacks, features and tools.
Typical DDoS detection techniques classify packet traffic as either legitimate or malicious, and can be broadly categorized into signature-based, anomaly-based and hybrid techniques. Signature-based detection techniques match traffic against a set of rules derived from domain knowledge. Anomaly-based detection techniques assume that most instances are normal and only a few are abnormal, and flag instances that deviate from the learned normal behaviour. Hybrid techniques combine the two. We’ll discuss anomaly-based detection methods in detail in the following paragraphs.
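A minimal Python sketch contrasting the two styles: the signature-based check applies a fixed hand-written rule, while the anomaly-based check learns a baseline of normal per-second packet counts and flags large deviations using a z-score. The rule, the threshold k = 3 and the sample counts are all hypothetical.

```python
# Sketch contrasting the two detection styles.
# The rule, the threshold k and the sample data are all hypothetical.
from statistics import mean, stdev


def signature_match(packet):
    """Signature-based: fire on a hand-written rule from domain knowledge.
    Hypothetical rule: an ICMP echo request (type 8) whose reassembled
    size exceeds 65,535 bytes."""
    return packet["proto"] == "icmp" and packet["type"] == 8 and packet["size"] > 65_535


def anomaly_scores(baseline, observed):
    """Anomaly-based: learn what 'normal' looks like, then score deviations.
    Returns z-scores of observed counts against the baseline mean and stdev."""
    mu, sigma = mean(baseline), stdev(baseline)
    return [(x - mu) / sigma for x in observed]


def flag_anomalies(baseline, observed, k=3.0):
    """Flag any observation more than k standard deviations above normal."""
    return [z > k for z in anomaly_scores(baseline, observed)]


baseline = [100, 110, 95, 105, 90, 100]  # packets/s during normal operation
observed = [102, 98, 5000]
print(flag_anomalies(baseline, observed))                              # [False, False, True]
print(signature_match({"proto": "icmp", "type": 8, "size": 65_640}))  # True
```

The signature check recognises only what its rule describes, whereas the anomaly check can flag a flood it has never seen, at the cost of needing a trustworthy baseline.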
The anomaly approach is typically carried out in two phases: a training phase and a detection phase. In the training phase, each instance of the input data is represented as a vector of features, so that raw traffic is transformed into structured data. Another important aspect of the input data is whether it is labelled, which determines the type of method that can be used: supervised, semi-supervised or unsupervised. Supervised methods are the most commonly used because they are the easiest to apply, and most ready-to-use methods, such as SVM, random forest and logistic regression, are supervised. However, supervised learning needs sufficient labelled data, which is often unavailable in practice. Semi-supervised learning can be used when labelled data is scarce: it trains the prediction model not only on the differences between positive and negative instances in the labelled data, but also on the similarities between the limited labelled data and the abundant unlabelled data. Finally, when no labelled data is available at all, unsupervised methods can separate instances based on the assumption that normal instances are far more frequent than anomalies in the data.
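The two phases can be sketched in plain Python with two of the method families above: a nearest-centroid classifier stands in for the supervised case, and a median-absolute-deviation outlier test stands in for the unsupervised case (it relies on normal instances being the majority). These are deliberately simple substitutes, not SVM or random forest; the two features (packets per second, fraction of SYNs left unanswered) and the data are hypothetical, and semi-supervised learning is omitted.

```python
# Sketch of the training and detection phases with two method families.
# Features and data are hypothetical; features are NOT scaled here, which a
# real pipeline would do before computing distances.
from math import dist
from statistics import median


# --- Supervised: nearest-centroid classifier (needs labelled data) ---
def train_centroids(X, y):
    """Training phase: average the feature vectors of each label."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: [sum(col) / len(col) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict(centroids, x):
    """Detection phase: assign the label of the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# --- Unsupervised: robust outlier test (no labels) ---
def mad_outliers(values, k=5.0):
    """Assume most values are normal: flag points far from the median,
    measured in units of the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1e-9  # avoid division by zero
    return [abs(v - med) / mad > k for v in values]


X = [[100, 0.05], [120, 0.04], [9000, 0.95], [8000, 0.90]]
y = ["normal", "normal", "attack", "attack"]
model = train_centroids(X, y)
print(predict(model, [110, 0.06]), predict(model, [7500, 0.85]))  # normal attack
print(mad_outliers([100, 120, 95, 110, 9000]))  # [False, False, False, False, True]
```

The median and MAD are used instead of the mean and standard deviation because a large attack burst would otherwise inflate the very statistics used to detect it.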
Take the KDD’99 dataset as an example. KDD’99 is the dataset released for the intrusion detection competition (KDD Cup 1999) held with the KDD conference. It contains roughly five million connection records, each labelled as either normal or attack, and attack records are further labelled with one of four categories: DoS, R2L, U2R and Probing. The features of each record can be divided into three types: basic features of individual TCP connections, content features within a connection suggested by domain knowledge, and traffic features computed using a two-second time window. With these features and labels, a classifier can be trained easily. However, I have several doubts about machine learning based detection methods, which are discussed in the last section.
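As an illustration, the sketch below splits one KDD’99-format record (41 comma-separated features followed by a label ending in a dot) into the three feature groups and maps the label to a category. The slice boundaries assume the standard ordering of the 41 features, the category table lists only a few attack names, and the helper `parse_record` and the sample record are illustrative.

```python
# Sketch: split one KDD'99-format record into feature groups and map its
# label to an attack category. Slice boundaries assume the standard ordering
# (9 basic, 13 content, 19 traffic features); the category table is partial.

BASIC = slice(0, 9)      # duration, protocol_type, service, flag, ...
CONTENT = slice(9, 22)   # hot, num_failed_logins, logged_in, ...
TRAFFIC = slice(22, 41)  # count, srv_count, ... (first nine use the 2 s window)

CATEGORY = {
    "normal": "normal",
    "neptune": "dos",   # SYN flood
    "pod": "dos",       # ping of death
    "smurf": "dos",
    "ipsweep": "probe",
    "portsweep": "probe",
    "guess_passwd": "r2l",
    "buffer_overflow": "u2r",
}


def parse_record(line):
    """Return (feature groups, attack category) for one comma-separated record."""
    fields = line.strip().split(",")
    features, label = fields[:41], fields[41].rstrip(".")  # labels end with '.'
    groups = {
        "basic": features[BASIC],
        "content": features[CONTENT],
        "traffic": features[TRAFFIC],
    }
    return groups, CATEGORY.get(label, "unknown")


record = ("0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,"
          "0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,0.00,"
          "0.00,0.00,0.00,0.00,normal.")
groups, category = parse_record(record)
print({name: len(vals) for name, vals in groups.items()}, category)
# {'basic': 9, 'content': 13, 'traffic': 19} normal
```

Note that the two attacks described at the start of this write-up appear in KDD’99 under the DoS category, as `neptune` (SYN flood) and `pod` (ping of death).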
In the paper ‘Distributed denial of service (DDoS) resilience in cloud: Review and conceptual cloud DDoS mitigation framework’, the authors group existing techniques into “ring” classes based on the algorithms used. The five classes are machine learning, statistical, data mining, classifier and artificial intelligence. In my view, however, this classification doesn’t make sense because the classes are not at the same level. For example, data mining can be seen as an application of machine learning, and classifiers are a subset of machine learning methods.
CS/ECE 5584: Network Security, Fall 2017, Ning Zhang