Monday, June 6, 2011

On a different type of security entirely

I haven't been paying much attention to the site recently (first I went for an abortive hike of the Pacific Crest Trail, then I traveled around with just an iPhone for a month, and I can tell you that trying to manage a database and write code on an iPhone is not fun). While I was "away" the site started getting spam: posts like
I'm happy very good site Free Forbidden Lolita Pics :[ Lolita The Little Girl 256 Nudist Picture Lolita Girls =)) Nude Lolita Model Index xkjxun Lolita Bbs 10 Yo 3160 Preteen Lolitas Non Nude 8PP Nude Lolitas Modeling Toplist mdqoe Www Majic Lolita Com :-))) 3d Lolita Incest Toons tmdh Fotos Lolitas Dildos Machine :-D Young Girls Art Lolitas xpjzsk Small Lolita Sex Pics vozylh Lolita Preteen Pedo Pics %[[ Little Lolitas Russian Naked bhq Nn Preteen Lolita Models 185847 Lolitas Models Sample Videos qqalx Baby Dorki Little Lolitas >:P Hard Lolit Sex Young >:OO Loli Preten Pussy Pics roe Best Lolita Free Pics %-(((
with a lot of links (which I deleted here of course) became quite common. And if you follow the site regularly, you probably didn't notice. Why? Well, spammers (bots) aren't all that smart (although they do seem to read Nabokov), so while they were able to get the posts to post (I have no anti-spam protection whatsoever), they weren't able to get them to populate many (really, hardly any) of the airport-specific pages. Why? Well, they didn't know that several of the fields needed rather specific information to show up in any of the queries: namely, they needed a valid three-letter IATA airport code.

Now, if I knew how to code, and wanted to exclude international airports, I could enter those codes (there are only 382 airports in the country with more than 10,000 passengers annually, or roughly 27 per day) and exclude all others. But I don't, especially traveling around with an iPhone and little else. Instead, the spam was filtered out on its own: using only the 26 letters A-Z, there are 17,576 possible three-letter codes, and using numbers too, there are 46,656. So assuming the codes generated were random, only one in every 46 to 122 would have been for a valid airport, and many of those would have been tiny airports which only have a few flights per day. In other words, the spammers weren't flooding the pages for ATL, ORD and SFO.
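For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes the bots picked three-character codes uniformly at random, and it treats the 382-airport figure as the full set of "valid" codes; the function name `one_in` is just made up for illustration.

```python
# Rough odds that a random three-character code lands on a real airport.
# Assumption: codes are drawn uniformly at random, and the 382 airports
# quoted in the post are the only "valid" targets.
VALID_AIRPORTS = 382

letters_only = 26 ** 3        # 17,576 codes using A-Z
letters_and_digits = 36 ** 3  # 46,656 codes using A-Z and 0-9


def one_in(total_codes, valid=VALID_AIRPORTS):
    """Return N such that roughly one in every N random codes is valid."""
    return total_codes / valid


print(f"letters only:       1 in {one_in(letters_only):.0f}")        # 1 in 46
print(f"letters and digits: 1 in {one_in(letters_and_digits):.0f}")  # 1 in 122
```

Either way, a bot guessing blindly would need dozens of posts to land even one on a real airport page, and most of those hits would be small airports.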

The all-statuses page did have a bunch of junk, but the spammers also didn't understand how to enter a current date, so many of the entries were dated January 1, 2010, meaning they wound up at the bottom of the page. So unless you were (like me) obsessively scanning through that page, you didn't notice it.

Of course, once I got around to it, logging in to the admin page for the database took longer than looking up the MySQL syntax to search for text within a field and delete the offending entries. (How did I do it? I searched for any "notes" containing the text "<a href", since there's no point in spamming without links, and exorcised them.) I'm planning to find some code to disallow any post with HTML tags embedded (or, at least, anything with a "<" in it), but for now this works well enough.
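Both ideas, the after-the-fact cleanup and the "reject anything with a `<`" check, can be sketched in a few lines of Python. This is only an illustration under loud assumptions: an in-memory SQLite database stands in for the real MySQL one, the table name `statuses` and the column `airport` are invented (only the "notes" field comes from the description above), the sample rows are made up, and `looks_like_spam` is a hypothetical helper, not code the site actually runs.

```python
import sqlite3


def looks_like_spam(notes):
    """Crude filter: flag any note containing '<', i.e. any HTML tag."""
    return "<" in notes


# In-memory SQLite stands in for the real MySQL database.
# Table "statuses" and column "airport" are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statuses (id INTEGER PRIMARY KEY, airport TEXT, notes TEXT)"
)
conn.executemany(
    "INSERT INTO statuses (airport, notes) VALUES (?, ?)",
    [
        ("SFO", "Security took ten minutes at 7 am"),  # made-up real entry
        ("QX7", 'very good site <a href="http://example.com">link</a>'),  # made-up spam
    ],
)

# Cleanup after the fact: delete every entry whose notes contain a link.
cursor = conn.execute(
    "DELETE FROM statuses WHERE notes LIKE ?", ("%<a href%",)
)
deleted = cursor.rowcount
remaining = [row[0] for row in conn.execute("SELECT notes FROM statuses")]

print(deleted, remaining)
```

The `LIKE '%<a href%'` pattern is the same idea as the search described above. The `looks_like_spam` check would go in front of the insert instead, so a post with a tag in it never reaches the database in the first place.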

And thanks to everyone who is submitting real entries. Keep 'em coming!