Distributed Early Worm Detection Based on Payload Histograms
Y. Waizumi, M. Tsuji, H. Tsunoda, N. Ansari, and Y. Nemoto. ICC 2007, 24-28 June 2007, pp. 1404-1408.
DOI: 10.1109/ICC.2007.236
Abstract
Epidemic worms have become a social problem owing to their potency in paralyzing the Internet, thus affecting our way of life. Recent research has pointed out that epidemic worms can rapidly propagate similar payloads. It has been shown that the similarity between such payloads can be evaluated using a 256-dimensional vector built from histograms of the appearance frequencies of the 256 character codes. This observation has also been confirmed by our earlier work. However, when this method is applied to flows from only one network, meaning a network managed by a single independent organization, it is prone to a high false-positive rate in cases such as normal emails sent through a mailing list. To overcome this problem, we propose a new scheme that checks for similarity between flows detected at several IDSs in a distributed environment. The proposed scheme is based on the observation that normal payloads propagating from different networks differ, whereas payloads generated by the same worm exhibit similarity even when they propagate through different networks. We have demonstrated the effectiveness of the proposed scheme through extensive experiments using real network traffic that contains worms.
Annotations
This paper proposes a distributed worm detection system intended to detect worms quickly and accurately. It evaluates the similarity between flow payloads using histograms of the appearance frequencies of the 256 character codes.
Prerequisites:
(1) Flow payloads generated by the same worm are similar, where a flow payload is the aggregated payload of all packets in one TCP connection.
(2) Normal flows with similar payloads are unlikely to originate from different networks at the same time, whereas similar payloads produced by the same kind of worm can be observed across different networks.
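Prerequisite (2) suggests a simple cross-network decision rule. The Python sketch below is not the authors' algorithm: it assumes each IDS reports suspicious flows as (network ID, histogram) pairs, uses Euclidean distance as the similarity measure, and uses a hypothetical threshold `SIMILARITY_THRESHOLD`. A pair of reports counts as a match only when the histograms are close and the reports come from different networks.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical distance threshold between normalized histograms;
# the value is an assumption, not taken from the paper.
SIMILARITY_THRESHOLD = 0.1

# (network ID, 256-dimensional histogram) as reported by one IDS (assumed format)
Report = Tuple[str, Sequence[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two histogram vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cross_network_matches(reports: List[Report],
                          threshold: float = SIMILARITY_THRESHOLD) -> List[Tuple[int, int]]:
    """Return index pairs of reports that come from different networks
    and whose histograms are closer than `threshold`.

    Similar payloads seen only within one network (e.g. a mailing list)
    never produce a match, following prerequisite (2).
    """
    matches = []
    for i in range(len(reports)):
        for j in range(i + 1, len(reports)):
            net_i, h_i = reports[i]
            net_j, h_j = reports[j]
            if net_i != net_j and euclidean(h_i, h_j) < threshold:
                matches.append((i, j))
    return matches
```

Because same-network pairs are skipped, a message delivered to many hosts inside one organization does not trigger a match by itself, which is the false-positive case the abstract describes.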
Evaluating Payload Similarity Based on the Similarity of Code Histograms: treat a flow payload as a sequence of 8-bit codes. The occurrence frequencies of the 256 possible codes are expressed as a vector h:
$\mathbf{h} = (h_0, h_1, h_2, \ldots, h_{255})$
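The vector h can be computed per flow roughly as follows. This is a minimal sketch that assumes the packet payloads of one TCP connection are already available in order (reassembly is not shown). Dividing the counts by the payload length, so that flows of different sizes are comparable, is an assumption of this sketch; these notes do not say whether the paper normalizes.

```python
from typing import Iterable, List


def flow_histogram(packet_payloads: Iterable[bytes]) -> List[float]:
    """Build the 256-dimensional vector h for one flow.

    The flow payload is the concatenation of all packet payloads in the
    TCP connection (prerequisite (1)). h[c] is the relative frequency of
    byte value c; normalizing by total length is an assumption.
    """
    payload = b"".join(packet_payloads)
    counts = [0] * 256
    for byte in payload:  # iterating over bytes yields ints in 0..255
        counts[byte] += 1
    total = len(payload)
    if total == 0:
        return [0.0] * 256
    return [c / total for c in counts]
```

For example, `flow_histogram([b"AAB", b"B"])` gives 0.5 at index 65 (`A`) and 0.5 at index 66 (`B`), with all other entries zero.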
Flows whose payload histograms deviate from the normal payload histogram are considered anomalous.
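The deviation test can be sketched as below. The baseline (here an average of histograms from known-benign traffic), the Euclidean metric, and the default threshold of 0.2 are all hypothetical choices for illustration; these notes do not specify how the paper builds the normal histogram or measures deviation.

```python
import math
from typing import List, Sequence


def mean_histogram(histograms: List[Sequence[float]]) -> List[float]:
    """Average several 256-dim histograms into a baseline (assumed method)."""
    n = len(histograms)
    return [sum(h[i] for h in histograms) / n for i in range(256)]


def is_anomalous(h: Sequence[float], normal_h: Sequence[float],
                 threshold: float = 0.2) -> bool:
    """Flag a flow whose histogram lies farther than `threshold`
    (hypothetical value) from the normal histogram."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(h, normal_h)))
    return dist > threshold
```

With a text-like baseline concentrated on a few printable codes, a flow whose bytes are spread uniformly over all 256 values lies far from the baseline and is flagged, while another text-like flow is not.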
- Figure: histograms of worm and normal flows.
- Figure: distances between h vectors for worm–worm, worm–normal, and normal–normal flow pairs.
I think we could use a similar idea to measure the similarity between repetitive substrings extracted from traffic captured at different taps.