31 December, 2004

Summat amusing (White House Paranoia remix)

I was poking around on Bruce Schneier's website, which eventually led me (through a convoluted chain) to the White House website's robots.txt file. For the unlettered, robots.txt is a file served by most web sites that contains instructions telling automatic web-crawlers which files within the site they should and should not index. Spiders (web-crawlers) are generally used by search engines (e.g. Google) and web archives (e.g. archive.org's Wayback Machine) to trawl the internet and keep track of what's what. Spiders are of course free to disregard the instructions in robots.txt, but the instructions are often useful to both parties, and almost all spiders honor them as a courtesy to the site. A good number of sites also use robots.txt to keep files from being cached (e.g. by Google).
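For the curious, here's a rough sketch of how a well-behaved spider consults these rules, using Python's standard-library robots.txt parser. The rule shown mirrors the Barney example below; the URLs are just illustrative.

```python
# Sketch: how a polite crawler checks robots.txt before fetching a page.
# Uses Python's standard library; the rule and URLs are illustrative.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Normally you'd call rp.set_url(...) and rp.read() to fetch the live file;
# here we feed the rule directly.
rp.parse([
    "User-agent: *",
    "Disallow: /barney/iraq",
])

# A courteous spider would skip the disallowed path...
print(rp.can_fetch("*", "http://www.whitehouse.gov/barney/iraq"))  # False
# ...but is free to index the rest of Barney's corner of the web.
print(rp.can_fetch("*", "http://www.whitehouse.gov/barney/"))      # True
```

Note that `can_fetch` matches by prefix, which is exactly why blanket-disallowing every path ending in 'iraq' works as a caching deterrent.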

Case in point, the White House, which has apparently gone to the trouble of excluding every page on the site that might contain text transcripts or might mention Iraq by appending 'iraq' or 'text' to every damn page on their site, whether or not the indicated page actually exists. This precludes the slightest possibility of anyone caching anything that even mentions Iraq, to prevent embarrassing incidents of your words coming back to bite you on the ass*. This produces some pretty comical results, viz. Barney the Dog's site:
    Disallow: /barney/iraq
Et tu, Barney?

* Yes, Thomas Friedman gets brownie points for this awesome bitch-slap maneuver, but he's still a weenie.
