
Hunting torrent machines with network analytics

Sep 12, 2017 12:11:37 PM / by Corey Thuen

Our previous post highlighted some new features in our relationship graphing capabilities by analyzing JSON text data from Reddit. Here we swing the pendulum and examine an area many IT departments struggle with: low-level binary network packet searching and analysis. In this case study, we work with a customer who is trying to identify bittorrent users on an active network using Gravwell's network analytics capabilities.

 

The problem

 

While doing some Open Source Intelligence (OSINT) work on P2P sharing networks, security personnel discovered corporate documents. They hypothesize that a machine on the network is accessing and leaking data in a way that violates company security policy. Gravwell has been ingesting network traffic (pcaps) for a day and is poised to provide a rapid answer to the question.

 

The Gravwell Solution

 

Encrypted bittorrent traffic makes it difficult to look for signatures in the protocol itself, so we turn to network flow analysis instead. Bittorrent downloads files in an atypical manner. Instead of connecting to a single server and downloading a file, bittorrent connects to a large number of peers and downloads small pieces from each peer. With this understanding, we can identify bittorrent transfers by their metacharacteristics: during the peer discovery phase, the client attempts to communicate with a large number of peers in a short period of time.

 

The customer settled on the following query to isolate and view the number of new connections coming out of the machines on their network, based on the network addresses and ports used.

 

image1.png

tag=pcap packet ipv4.DstIP !~ 10.10.10.0/24 ipv4.SrcIP tcp.DstPort | count by SrcIP,DstIP,DstPort | count by SrcIP,DstIP,DstPort | chart count by SrcIP limit 20

 

The search results make it easy to see the bittorrent traffic and identify the machine responsible.

 

Further insights can be gained by zooming in on the area of interest. This search can even single out the peer discovery and file transfer phases of the bittorrent protocol.

image2.png

 

This query solved the problem, but the customer went on to refine it for use on a dashboard display. Charting the statistical variance of these connections instead of the unique count further isolates systems that are rapidly creating new connections to new network addresses.

 

image3.png

tag=pcap packet ipv4.DstIP !~ 10.10.10.0/24 ipv4.SrcIP tcp.DstPort | count by SrcIP,DstIP,DstPort | variance count by SrcIP | chart variance by SrcIP limit 20

 

Query Breakdown

 

The query

 

Let’s break down the search query into parts and talk about why each part is used. If you are doing your own network analytics, you can issue the search one query module at a time and follow along with the results at each step.

 

Query Module

Purpose

tag=pcap

This customer puts all pcap traffic into a separate Well. This improves performance because searches that do not include the pcap tag do not read those records for analysis. Gravwell best practices use tags as the first “filter” against different data types.

packet ipv4.DstIP !~ 10.10.10.0/24 ipv4.SrcIP tcp.DstPort

This invokes the packet module, telling the Gravwell pipeline to do packet processing on these records. Remember, all Gravwell processing is done at search time, not at ingest time, so you don’t have to know beforehand what you will want to search for.


Packet pulls out the destination IP and applies a filter to specify that only records with a destination IP not within the customer's network should be included. The customer is looking for hosts communicating outside their subnet.


Then they direct the packet module to also extract the Source IP and Destination port as enumerated values and put them in the pipeline. This “trifecta” of network packet information is a great key for identifying unique connections.

count by SrcIP,DstIP,DstPort

The count module is used here to condense all packets based on the network trifecta of Source IP, Destination IP, and Destination Port. The pipeline results include the trifecta and the number of times packets from that connection show up.

count by SrcIP,DstIP,DstPort

The count module is invoked again to get a “unique” view for charting. The customer isn’t interested in the number of packets on a given trifecta; they are interested in how many new connections are occurring at a time.

chart count by SrcIP limit 20

Finally, the pipeline is hooked up to the chart renderer. For every Source IP address (a host on the customer's network), it graphs the number of unique “trifecta” network connections that exist.

 

The entire query together serves to highlight machines on the network that are making a large number of new outbound connections in a short period of time.
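The counting logic described in the breakdown can be mimicked outside Gravwell as a rough Python sketch. This is not how Gravwell implements its modules; it only mirrors the steps described above. The sample records, the function names, and the tuple layout are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from ipaddress import ip_address, ip_network

# Hypothetical packet records: (source IP, destination IP, destination port).
PACKETS = [
    ("10.10.10.5", "93.184.216.34", 443),
    ("10.10.10.5", "93.184.216.34", 443),
    ("10.10.10.5", "10.10.10.9", 445),      # internal destination, filtered out
    ("10.10.10.7", "198.51.100.1", 6881),
    ("10.10.10.7", "198.51.100.2", 6881),
    ("10.10.10.7", "198.51.100.3", 51413),
    ("10.10.10.7", "198.51.100.3", 51413),
]

LOCAL_NET = ip_network("10.10.10.0/24")


def unique_connections_per_source(packets, local_net=LOCAL_NET):
    """Return {source IP: number of distinct external (dst IP, dst port) pairs}."""
    # Stage 1: keep packets whose destination is outside the local subnet,
    # then count packets per (SrcIP, DstIP, DstPort) trifecta.
    packets_per_trifecta = Counter(
        (src, dst, port)
        for src, dst, port in packets
        if ip_address(dst) not in local_net
    )
    # Stage 2: each trifecta now counts once, so tally trifectas per source.
    connections = defaultdict(int)
    for src, _dst, _port in packets_per_trifecta:
        connections[src] += 1
    return dict(connections)


def top_sources(packets, limit=20):
    """Rank sources by unique outbound connections, busiest first."""
    counts = unique_connections_per_source(packets)
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:limit]
```

Calling `top_sources(PACKETS)` on the sample data ranks the host with three distinct external destinations ahead of the host with one, which is the shape of result the chart makes visible.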

 

Adding Statistical Variance

 

The customer then modified the query to use:

 

Query Module

Purpose

tag=pcap packet ipv4.DstIP !~ 10.10.10.0/24 ipv4.SrcIP tcp.DstPort | count by SrcIP,DstIP,DstPort

The first part of the query is the same. They want the number of network “trifectas”.

variance count by SrcIP

By invoking the “variance” module, they compute the statistical variance of the number of new network trifecta connections in a timespan. A large change in this count will impact the variance.

chart variance by SrcIP limit 20

Finally, the variance is charted based on the source address, as before.
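To see why variance separates bursty hosts from steady ones, here is a small Python sketch. It rests on one interpretation that the post does not spell out: that the timespan is split into fixed buckets and the variance is taken over each source's per-bucket connection counts. The bucket size, record layout, and names are all hypothetical, and Gravwell's variance module may window data differently.

```python
from collections import defaultdict
from statistics import pvariance


def connection_variance(records, bucket_seconds=60):
    """Return {source IP: variance of distinct-connection counts across time buckets}.

    records: iterable of (timestamp seconds, source IP, destination IP, destination port).
    """
    records = list(records)
    if not records:
        return {}
    # Collect the distinct (dst IP, dst port) pairs each source touched per bucket.
    seen = defaultdict(set)
    for ts, src, dst, port in records:
        seen[(src, int(ts // bucket_seconds))].add((dst, port))

    buckets = [bucket for _src, bucket in seen]
    first, last = min(buckets), max(buckets)
    sources = {src for src, _bucket in seen}

    result = {}
    for src in sources:
        # Quiet buckets count as zero so a sudden burst stands out.
        series = [len(seen.get((src, b), ())) for b in range(first, last + 1)]
        result[src] = pvariance(series)
    return result


# Hypothetical sample: one steady host and one host that bursts in the third minute.
SAMPLE = [
    (minute * 60 + i, "10.10.10.5", f"203.0.113.{i}", 443)
    for minute in range(3)
    for i in range(2)
] + [(120 + i, "10.10.10.7", f"198.51.100.{i}", 6881) for i in range(6)]

VARIANCES = connection_variance(SAMPLE)
```

In the sample, the steady host opens two connections every minute, so its variance is zero. The bursty host goes from no connections to six in one minute, so its variance is large even though both hosts made six connections in total.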

 

Final Thoughts

 

This use case is a great example of the binary data analytics power that Gravwell can bring to the table. Before running this query, the customer had no idea they would ever want to look for bittorrent machines, nor did Gravwell have to know what bittorrent was. No analysis was required at time of ingest, which is one of the reasons we made Gravwell. In this instance, using only packet metadata, the customer was able to resolve their problem in only 20 minutes.

 

What we might change

 

Torrent traffic stands out like a sore thumb on graphs of this kind. However, if you have a large number of users all actively browsing the web, even legitimate traffic could sometimes look like torrent traffic (e.g. a single website might load images from a large number of different servers). By adding a filter on the DstPort, you could remove this potential noise.

 

You could do this with the packet module or by using basic grep on the enumerated values from the query:

packet tcp.DstPort != 80 | packet tcp.DstPort != 443

Or

grep -v -e DstPort 443 | grep -v -e DstPort 80
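The effect of such a port filter can be illustrated with a short Python sketch. The traffic below is invented for the example, and the function and constant names are hypothetical. It only shows why dropping ports 80 and 443 removes web-browsing noise from the per-host connection counts.

```python
from collections import defaultdict

WEB_PORTS = frozenset({80, 443})


def distinct_destinations(packets, excluded_ports=frozenset()):
    """Return {source IP: count of distinct (dst IP, dst port) pairs}, skipping excluded ports."""
    seen = defaultdict(set)
    for src, dst, port in packets:
        if port not in excluded_ports:
            seen[src].add((dst, port))
    return {src: len(pairs) for src, pairs in seen.items()}


# Hypothetical traffic: one host loading an image-heavy page, one host contacting torrent peers.
BROWSER = [("10.10.10.5", f"203.0.113.{i}", 443) for i in range(8)]
TORRENT = [("10.10.10.7", f"198.51.100.{i}", 6881 + i) for i in range(8)]

unfiltered = distinct_destinations(BROWSER + TORRENT)
filtered = distinct_destinations(BROWSER + TORRENT, excluded_ports=WEB_PORTS)
```

Without the filter both hosts show eight distinct destinations; with it, only the torrent host remains. The trade-off is that any peer listening on port 80 or 443 is dropped from the count as well.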

 

Try Gravwell Yourself

 

If you are interested in seeing the power that Gravwell can provide for your hunt teams, contact us at info@gravwell.io for more information or apply for a trial license.


Topics: Network Analytics, Case study

Written by Corey Thuen

Co-Founder of Gravwell