Back to our regularly scheduled program about... spam!

Sunday, 2006-09-10; 05:27:00

Pretty graphs showing the surge in spam e-mail over the last year.

Enough about .mac comments. They're implemented, whoop-de-do, enough of that. (Well, not really, but I'm sure that packet sniffing stuff wasn't all that interesting.) Back to our regularly scheduled supernovae.

Near the end of June last year, I started keeping track of the number of spam e-mails that I was getting at each of my e-mail accounts. (This was for a review on a spam reporting product, and I wanted to see if it really cut down on spam or not.) I only managed to keep doing this manually for about a month and a half before I stopped.

The only reason I stopped was that I got lazy and didn't want to spend the two minutes entering the data into my Keynote graph. And then, of course, a week down the road that hurdle grew to ten minutes, and I was even less inclined to do it. But since shortly after that, I've deliberately not been emptying my spam e-mail folder. I knew that sometime later I'd like to see the trend of spam to my e-mail boxes, and I finally ended up writing a quick little program to get all the data from the past year into another Keynote graph.
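For the curious, the counting part of a program like that can be tiny. This is not the program I actually wrote, just a minimal Python sketch of the idea: take the Date header of every message in the spam folder and tally messages per calendar day. The function name spam_per_day and the sample headers are made up for illustration.

```python
from collections import Counter
from email.utils import parsedate_to_datetime


def spam_per_day(date_headers):
    """Count messages per calendar day from RFC 2822 Date header strings."""
    counts = Counter()
    for header in date_headers:
        try:
            # Use the date as written in the header's own time zone.
            day = parsedate_to_datetime(header).date()
        except (TypeError, ValueError):
            continue  # skip messages with a missing or malformed Date header
        counts[day] += 1
    return counts


# Hypothetical sample headers, not real data from my spam folder.
sample = [
    "Sat, 09 Sep 2006 23:10:00 -0700",
    "Sun, 10 Sep 2006 01:15:00 -0700",
    "Sun, 10 Sep 2006 05:27:00 -0700",
    "not a date at all",
]

for day, count in sorted(spam_per_day(sample).items()):
    print(day, count)
```

If the spam folder were exported to an mbox file (an assumption about how the mail is stored), the headers could come from Python's standard mailbox module, along the lines of `spam_per_day(msg["Date"] for msg in mailbox.mbox(path))`. The per-day counts can then be pasted into whatever graphing tool you like.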

So, without further ado, here are the pretty, pretty graphs. Clicky clicky for larger pictures:

Total Spam Per Day

There are some interesting things to note. First, note the big discrepancy between the amounts of spam that come to the two accounts: my .mac account gets significantly less spam. In the whole year, only last month did the level breach 20 spam e-mails per day. In contrast, my Stanford spam has regularly been above 45 spam e-mails per day, and recently breached the 80 e-mails per day barrier, also last month.

What's kind of weird is the fact that I should probably be getting more spam at my .mac account. I've had that e-mail address since January of 2000 when iTools came out, while I've had my Stanford address since September 2001. Furthermore, I don't give out my Stanford address nearly as much as I do my .mac address. Wherever I go on the net, when I register for forums or for websites or for promotions or whatever, it's almost always using my .mac address.

There are three potential reasons why my Stanford address gets more spam. First is that, until a while ago, my e-mail address was publicly available via the Stanford people database. I actually never realized that it was in there, until one of my friends suggested that universities often do that. I checked and, sure enough, my e-mail address was available. (Thanks to one simple, well-crafted Spotlight search and iChat logging, I can pinpoint to within 5 minutes when I made that realization: February 6, 2005, between 1:20 and 1:25 AM.)

That's actually before this graph even started, and as you can see, removing my e-mail address from the public database hasn't stopped the surge in spam. The second reason is that big universities like Stanford have a lot of students, and so spammers likely just step through potential e-mail addresses at the university, whether or not they're valid. Since my e-mail address username has only three letters, it would have a very high likelihood of falling under such a net. Damn, and I thought I was smart for snagging such a short e-mail address. :P
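To put a rough number on how small that net has to be, here's a back-of-the-envelope sketch in Python. It assumes usernames are made of lowercase letters a-z only, which is my simplification and not Stanford's actual naming rule; the function name username_space is made up for illustration.

```python
def username_space(length, alphabet_size=26):
    """Number of distinct usernames of exactly `length` characters."""
    return alphabet_size ** length


# Assumption: lowercase letters only (26 symbols).
three = username_space(3)
eight = username_space(8)

print(three)           # 17576
print(eight // three)  # how many times larger the 8-letter space is
```

With only 17,576 possible three-letter usernames, a spammer who simply tries them all is guaranteed to hit mine, while an eight-letter username hides in a space millions of times larger.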

The last reason, which is probably the most likely, is the fact that Apple probably completely screens out some spam e-mails. That is, some spam e-mails never get to me. Stanford uses a different system: they prefix the subject of potential spam e-mails with "[SPAM:#####]", where the number of hashes indicates the likelihood of it being spam. However, they don't prevent any e-mail from coming to you.
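That tagging scheme is easy to work with in a script. As a minimal sketch, assuming the tag always sits at the very start of the subject and looks exactly like the format quoted above, this Python snippet turns the number of hashes into a score. The function name spam_score and the sample subjects are made up for illustration.

```python
import re

# Assumed format: "[SPAM:" + one or more "#" + "]" at the start of the subject.
SPAM_TAG = re.compile(r"^\[SPAM:(#+)\]\s*")


def spam_score(subject):
    """Return the number of hashes in the spam tag, or 0 if there is no tag."""
    match = SPAM_TAG.match(subject)
    return len(match.group(1)) if match else 0


# Hypothetical subjects, not real messages.
for subject in ["[SPAM:#####] Cheap watches", "[SPAM:##] Hello", "Lunch on Friday?"]:
    print(spam_score(subject), subject)
```

A mail rule could then, say, file anything with a score of 3 or more straight into the spam folder and leave the low scorers in the inbox for a human look.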

Here are the graphs that I made last August. They're kind of interesting too.

Total Spam Per Day
Spam Separated by Account
5-day Running Average of Spam E-mails Per Day

You can see that Stanford spam still comprised the majority of my spam. The start of these graphs marks when I started using the spam reporting program (Sp@mX, for those who are curious -- I really hate that at symbol in the official name). The program doesn't attempt to screen out spam; it tries to actively prevent it. It analyzes the headers of the spam e-mails, and then sends e-mails to the abuse addresses of the appropriate servers so they can shut down the spamming account. In the one-and-a-half months that comprise the period of these graphs, it (surprisingly) seems to have been effective. The second graph breaks the numbers out by account again, which is especially important here: it measures the backlash effect from the spam reporting -- if the server happens to condone the spamming, then it's possible that you'll get more spam as a result of reporting it. In this period, it seems that this effect was nonexistent.

The last graph is really interesting because it smooths out the daily fluctuations, so you can really see the downward trend while using the program. It started to pick up again at the end for unknown reasons, and unfortunately the data doesn't extend far enough past that to show whether it was really a trend or just a longer monthly fluctuation.
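In case the smoothing isn't obvious, a 5-day running average just replaces each day's count with the mean of a 5-day window. Here's a minimal Python sketch that uses a trailing window (each point averages that day and the four before it). Whether the graph above used a trailing or a centered window is not something this sketch tries to reproduce, and the numbers are invented, not my actual spam counts.

```python
def running_average(values, window=5):
    """Trailing running average: one output per full window of inputs."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    total = 0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            total -= values[i - window]  # drop the day that left the window
        if i >= window - 1:
            averages.append(total / window)
    return averages


# Hypothetical daily spam counts.
daily = [10, 20, 30, 40, 50, 60]
print(running_average(daily))  # [30.0, 40.0]
```

Note that the output is shorter than the input by window - 1 points, since the first four days don't have a full window behind them.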

I'm going to start using this program again, seeing as it'd be nice to get my spam down to the levels in these last graphs.
