Visualizing a security disaster, part I

Tags: research, security, programming


The recent security blunder by Adobe (definitely an epic on the fail scale) should have reached everyone by now. In case the consequences are still not clear, I decided to visualize parts of the leaked database dump. More precisely, I wanted to get a sense of the password hints people used, to see whether any trends emerged. I started out by simply counting the frequencies of the words in the password hints: no stop word removal, just a simple frequency analysis. Here’s the (very simple) Python script:

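#!/usr/bin/env python3

import re
import collections

word_frequencies = collections.Counter()

# The dump may contain bytes that are not valid UTF-8; replace them instead of failing.
with open( "cred", encoding="utf-8", errors="replace" ) as f:
  for line in f:
    line   = line.strip()
    line   = line[:-3]   # drop the last three characters of each record

    # Fields are separated by the literal string "-|-"
    fields = re.split( r'-\|-', line )

    # The fifth field holds the password hint; skip records without one
    if len( fields ) >= 5 and fields[4]:
      for word in fields[4].split():
        word_frequencies[ word.lower() ] += 1

# Print words from most to least frequent, tab-separated
for word, count in word_frequencies.most_common():
  print( word, count, sep="\t" )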
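
To try the counting logic without touching the dump, here is a minimal sketch that wraps the same steps in a function and runs them on three made-up records. The function name `count_hint_words` and the sample lines are purely illustrative. The record layout (hint in the fifth field, three trailing characters to drop) is only what the script above assumes, not a verified description of the file.

```python
import collections
import re


def count_hint_words(lines):
    """Count lower-cased words in the fifth '-|-'-separated field of each line."""
    counts = collections.Counter()
    for line in lines:
        # Same preprocessing as the script: strip whitespace, drop the last 3 chars.
        line = line.strip()[:-3]
        fields = re.split(r'-\|-', line)
        # Skip records that are too short or have an empty hint.
        if len(fields) >= 5 and fields[4]:
            counts.update(word.lower() for word in fields[4].split())
    return counts


# Made-up records for illustration; none of this comes from the dump.
sample = [
    "1-|--|-alice@example.com-|-AAAA-|-my Dog|--\n",
    "2-|--|-bob@example.com-|-BBBB-|-dog name|--\n",
    "3-|--|-carol@example.com-|-CCCC-|-|--\n",  # empty hint, ignored
]

counts = count_hint_words(sample)
for word, count in counts.most_common():
    print(word, count, sep="\t")
```

Because words are lower-cased before counting, "Dog" and "dog" end up in the same bucket, which is what you want for a word cloud.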

Afterwards, I used Wordle to generate a word cloud of the password hints. This is the result (click the image for a larger version):

Not surprisingly, people seem to use names very often when generating passwords. This is why spouses are also mentioned very often, along with pets. Interestingly, dog was mentioned more often than cat. Some lone numbers refer either to a password scheme, i.e. combining passwords 1 and 2 to form a longer one, or to the actual password. I did not check this, for the same reason I am not releasing any data other than the word cloud. Those poor users already have enough problems as it is. They do not need one more idiot (yours truly) trying to guess stuff about them.

There are some appalling things, though: First, note how often the word usual appears. This is not a good idea, people! If you use the same “usual” password for multiple services, all it takes is one weak link and attackers will be able to compromise large parts of your digital life, and probably at least some parts of your real life as well. Furthermore, there are obviously still many people who believe that birthdays or social security numbers are good passwords. In short, they are not. In fact, professional attackers will check this low-hanging fruit first. Checking the names of a hubby, a cat, or a dog, or the date of a birthday, takes tremendously less time than checking all possible combinations in, say, a 52-character alphabet.
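
To put rough numbers on that difference, here is a back-of-the-envelope sketch. Every figure in it apart from the 52-character alphabet is an assumption picked for illustration: an 8-character password, a hypothetical list of 10,000 common names, and birthdays spread over 100 years.

```python
ALPHABET_SIZE = 52               # as in the text, e.g. upper- and lower-case letters
PASSWORD_LENGTH = 8              # assumed length, for illustration only
NAME_LIST_SIZE = 10_000          # hypothetical list of common names
BIRTHDAY_CANDIDATES = 366 * 100  # upper bound: every day of 100 years

# Number of strings of the assumed length over the full alphabet.
brute_force = ALPHABET_SIZE ** PASSWORD_LENGTH

# Number of guesses needed to try every name and every birthday once.
guessable = NAME_LIST_SIZE + BIRTHDAY_CANDIDATES

print("brute-force candidates:  ", brute_force)
print("name/birthday candidates:", guessable)
print("ratio: about %.1e" % (brute_force / guessable))
```

With these assumed numbers, the brute-force space is on the order of a billion times larger than the list of names and birthdays. Real attacks also try variations (capitalization, appended digits), so the guessable set grows somewhat, but it stays tiny by comparison.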

The security community thus obviously still has some work to do here. First, one should start by explaining to people how to choose secure passwords. Second, incompetence on this scale needs to be punished. To clarify: The fact that Adobe’s data got copied does not bother me as much as their stupidity in storing it! I again refer you to this very enlightening article about the errors in storing passwords (and password hints) the way Adobe did. This is inexcusable for a company this large.