Automated Attack Community Graph Construction

Hugo Gascon

Abstract

The goal of this project is to implement a Splunk application that can be deployed on a central server to automatically generate community attack graphs from a set of honeypot sources distributed across networks. An attack graph is a collection of scenarios showing how a malicious agent can compromise the integrity of a target system. When built from a wide range of sensors, it can provide a comprehensive view of attacker behavior at large scale.

Additional Information

The final goal of this project is to implement a Splunk application (http://www.splunk.com/) that can be deployed on a central server to generate community attack graphs from a set of honeypot sources distributed across networks. An attack graph is a collection of scenarios showing how a malicious agent can compromise the integrity of a target system. Because different types of honeypots provide precise information about attackers' actions against different services, attack graphs built from a wide range of sensors can provide a comprehensive view of attacker behavior at large scale. With a suitable centralized architecture, we can use graph theory techniques to generate attack graphs automatically. In the typical case, nodes represent network states and edges represent attack actions.
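
As a minimal sketch of that representation, the following Python snippet stores a toy attack graph as an adjacency mapping and enumerates every scenario leading from an initial state to a compromised one. The state names, the action labels, and the helper `attack_scenarios` are hypothetical and chosen only for illustration; they are not taken from real honeypot data.

```python
# Toy attack graph: nodes are network states, edges are attack actions.
# All state and action names below are made up for illustration.
ATTACK_GRAPH = {
    "external": [("ssh_bruteforce", "user_shell"), ("web_exploit", "www_shell")],
    "user_shell": [("local_privesc", "root")],
    "www_shell": [("local_privesc", "root")],
    "root": [],
}


def attack_scenarios(graph, start, target, path=None, seen=None):
    """Yield every cycle-free sequence of attack actions from start to target."""
    path = path or []
    seen = (seen or set()) | {start}
    if start == target:
        yield list(path)
        return
    for action, next_state in graph.get(start, []):
        if next_state not in seen:
            yield from attack_scenarios(
                graph, next_state, target, path + [action], seen
            )


scenarios = list(attack_scenarios(ATTACK_GRAPH, "external", "root"))
for steps in scenarios:
    print(" -> ".join(steps))
```

Each printed line is one scenario, that is, one ordered chain of attack actions that takes the system from the initial state to the compromised state.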

For a better understanding of the complete system behavior, let’s picture it as a process:

1. A lightweight version of Splunk is installed on every available honeypot source. In Splunk jargon, this is called a “forwarder”. Its only function is to monitor the specific log file or set of files created from attack traces and forward the data across the network to a central Splunk component called the “indexer”.

2. At the “indexer”, two tasks are carried out: parsing and indexing. Parsing, also called “event processing”, involves processing the data forwarded from the sensors. This task includes text-based file processing and heavy use of regular expressions. Because different honeypot sources produce different file formats, the “event processing” phase homogenizes all attack traces according to common parameters and relationships so that they can later be indexed correctly.

3. Once all information is indexed, the analytic part, implemented in Python as part of the Splunk application, runs the required searches and, relying on the NetworkX library and community finding algorithms, constructs an attack graph.

4. The attack graph and its representations according to various clustering strategies and source selections are displayed through the application GUI, providing the analyst with a tool to understand different attack scenarios, their risk and their probability, at large scale.
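
Steps 2 and 3 above can be sketched in plain Python. The two log formats, the field names, and the helpers `normalize`, `build_graph` and `communities` are assumptions made for illustration, not actual honeypot output or Splunk APIs. The graph is a simple adjacency mapping, and the "communities" are just connected components, a deliberately crude stand-in for the NetworkX community finding algorithms the application would rely on.

```python
import re
from collections import defaultdict

# Hypothetical line formats for two honeypot types (not real log formats).
PATTERNS = [
    re.compile(r"^(?P<ts>\S+) ssh login attempt from (?P<src>\S+) to (?P<dst>\S+)$"),
    re.compile(r"^\[(?P<ts>[^\]]+)\] src=(?P<src>\S+) dst=(?P<dst>\S+) proto=http$"),
]


def normalize(line):
    """Map a raw log line onto a common {ts, src, dst} event, or None."""
    for pattern in PATTERNS:
        match = pattern.match(line.strip())
        if match:
            return match.groupdict()
    return None


def build_graph(events):
    """Undirected adjacency mapping linking each attacker to its targets."""
    graph = defaultdict(set)
    for event in events:
        graph[event["src"]].add(event["dst"])
        graph[event["dst"]].add(event["src"])
    return graph


def communities(graph):
    """Connected components, used here as a crude stand-in for communities."""
    seen, groups = set(), []
    for node in sorted(graph):
        if node in seen:
            continue
        group, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current not in group:
                group.add(current)
                stack.extend(graph[current] - group)
        seen |= group
        groups.append(group)
    return groups


# Made-up sample traces using documentation-reserved IP addresses.
raw_lines = [
    "2013-05-01T10:00:00 ssh login attempt from 203.0.113.5 to 10.0.0.1",
    "[2013-05-01T10:02:00] src=203.0.113.5 dst=10.0.0.2 proto=http",
    "[2013-05-01T11:00:00] src=198.51.100.7 dst=10.0.0.3 proto=http",
    "unparseable noise",
]
events = [e for e in map(normalize, raw_lines) if e is not None]
groups = communities(build_graph(events))
print(groups)
```

In this sketch, lines that match no known format are dropped, and the two attackers end up in separate groups because they share no target. A real deployment would presumably perform the normalization inside Splunk's event processing and query the indexed events rather than reading raw lines.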

Code samples