Little brother, a script for analysing who accesses your website

Tags: programming, projects

Published on

The German Überwach project aims at logging accesses by certain governmental institutions, such as the German intelligence service. In an attempt to answer the question Quis custodiet ipsos custodes? with Me, of course, I decided to write a log file analysis tool: I wanted to find out whether any computers of interest access my web server. Starting with Wikipedia’s list of sensitive IP addresses, I quickly obtained a nice collection of candidates. The result is Little Brother, a small Python script for checking whether an Apache log file contains IP addresses from a predefined list of networks.

Usage

The data format is straightforward: Each non-empty line shall contain an IPv4 address or an IPv4 network specification and a description. These two fields shall be separated by at least one whitespace character. For example:

156.33.0.0/16       United States Senate
138.162.0.0/16      United States Department of the Navy and United States Marine Corps
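To make the format concrete, here is a minimal sketch of how such a file could be read with Python's standard `ipaddress` module. The function name `parse_networks` is made up for illustration and this is not necessarily how Little Brother does it internally.

```python
import ipaddress

def parse_networks(lines):
    """Parse lines of the form '<address or network> <description>'.

    Hypothetical helper for illustration; not taken from lb.py.
    """
    networks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # empty lines are skipped
        # Split on the first run of whitespace only, so that the
        # description may itself contain spaces.
        parts = line.split(None, 1)
        spec = parts[0]
        description = parts[1] if len(parts) > 1 else ""
        # A plain address such as 192.0.2.1 becomes a /32 network.
        network = ipaddress.IPv4Network(spec, strict=False)
        networks.append((network, description))
    return networks

example = """
156.33.0.0/16       United States Senate
138.162.0.0/16      United States Department of the Navy and United States Marine Corps
"""
networks = parse_networks(example.splitlines())
```

Each entry is a pair of an `IPv4Network` object and its description, which makes later membership tests a simple `address in network`.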

The script is able to scan an Apache log file, or any log file that starts with a valid IPv4 address. The following lines will be parsed correctly, for example:

192.0.2.42 - - [01/Jan/2015:04:04:04 +0200] "GET / HTTP/1.1" 200 3834347 "-" "Foo"
192.0.2.23 - - [01/Jan/2015:05:05:05 +0200] "GET / HTTP/1.1" 200 3834347 "-" "Bar"
192.0.2.5  Random information that is going to be ignored anyway
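The rule "the line starts with a valid IPv4 address" can be sketched as follows. The function name `leading_address` and the sample lines are assumptions for illustration; the actual script may extract the address differently.

```python
import ipaddress

def leading_address(line):
    """Return the IPv4 address at the start of a log line, or None.

    Hypothetical helper for illustration; not taken from lb.py.
    """
    fields = line.split(None, 1)
    if not fields:
        return None  # empty or whitespace-only line
    try:
        # Only the first whitespace-separated field matters;
        # everything after it is ignored.
        return ipaddress.IPv4Address(fields[0])
    except ipaddress.AddressValueError:
        return None  # first field is not a valid IPv4 address

# Made-up sample lines
apache_line = '192.0.2.42 - - [01/Jan/2015:04:04:04 +0200] "GET / HTTP/1.1" 200 3834347 "-" "Foo"'
other_line = "192.0.2.5  Random information"
bad_line = "not-an-address GET /"
```

Because only the first field is inspected, the rest of the line can be in any format, which is why logs other than Apache's work as well.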

A full analysis session works like this:

$ ./lb.py test.log IP_networks.txt
Counted 1 visits from 192.0.2.1 (TEST-NET-1)
Counted 2 visits from 198.51.100.2 (TEST-NET-2)
Counted 3 visits from 203.0.113.3 (TEST-NET-3)
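For the curious, the counting step behind such a session might look roughly like this. It is a sketch under my own assumptions (a linear scan over all networks, made-up log lines, a hypothetical `count_visits` function), not a copy of what lb.py does.

```python
import collections
import ipaddress

def count_visits(log_lines, networks):
    """Count log lines per address for addresses inside a listed network.

    `networks` is a list of (IPv4Network, description) pairs.
    Hypothetical helper for illustration; not taken from lb.py.
    """
    counts = collections.Counter()
    for line in log_lines:
        fields = line.split(None, 1)
        if not fields:
            continue
        try:
            address = ipaddress.IPv4Address(fields[0])
        except ipaddress.AddressValueError:
            continue  # line does not start with an IPv4 address
        # Linear scan; the first matching network wins.
        for network, description in networks:
            if address in network:
                counts[(address, description)] += 1
                break
    return counts

# Made-up data using the reserved documentation ranges
networks = [
    (ipaddress.IPv4Network("192.0.2.0/24"), "TEST-NET-1"),
    (ipaddress.IPv4Network("198.51.100.0/24"), "TEST-NET-2"),
]
log = [
    '192.0.2.1 - - [01/Jan/2015:04:04:04 +0200] "GET / HTTP/1.1" 200 1 "-" "Foo"',
    '198.51.100.2 - - [01/Jan/2015:05:05:05 +0200] "GET / HTTP/1.1" 200 1 "-" "Bar"',
    '198.51.100.2 - - [01/Jan/2015:06:06:06 +0200] "GET / HTTP/1.1" 200 1 "-" "Bar"',
    '10.0.0.1 some line from an address that is not on the list',
]
counts = count_visits(log, networks)
for (address, description), n in sorted(counts.items()):
    print(f"Counted {n} visits from {address} ({description})")
```

A linear scan is perfectly adequate for a list of a few hundred networks; a much larger list would call for something smarter, such as a prefix tree.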

Real example

As it turns out, there are indeed some interesting IP addresses in the server logs for my personal website. Here is an excerpt of some real data from the last month:

Counted 17 visits from 131.136.242.1 (Canadian Department of National Defence)
Counted 11 visits from 138.162.0.41 (United States Department of the Navy and United States Marine Corps)
Counted 13 visits from 216.81.81.84 (United States Department of Homeland Security)

Apparently, I have some sort of following in the military and the DHS. I feel strangely honoured and promise that I will remain as moto as possible.

Code

The code is released under the “MIT Licence”. You may download Little Brother from its git repository.

Seeing that the “USMC” is indeed visiting my website, I feel that there is only one way to end the post:

Oorah!