24 December, 2005

Form, not content

When I first started using e-mail in the early 90s, I knew people who had gotten Pretty Good Privacy encryption software, better known as PGP, and who insisted on sending their messages encrypted. But I quickly realized that the headers were unencrypted -- anyone who could tap a message could still see who was sending what to whom and with what frequency. It seemed to me that knowledge of networks was at least as important as knowledge of the specific words transmitted around those networks. If anything, I told my friends, PGP was a good way to attract attention; there's no better way to silence a room than whispering.

It turns out I was in smart if ignoble company. The New York Times reports Saturday that the National Security Agency spying that has gotten some of us atwitter this week was un-search-warrantable because it was not literally listening in on specific conversations so much as developing network maps.
Officials in the government and the telecommunications industry who have knowledge of parts of the program say the N.S.A. has sought to analyze communications patterns to glean clues from details like who is calling whom, how long a phone call lasts and what time of day it is made, and the origins and destinations of phone calls and e-mail messages. Calls to and from Afghanistan, for instance, are known to have been of particular interest to the N.S.A. since the Sept. 11 attacks, the officials said.

This so-called "pattern analysis" on calls within the United States would, in many circumstances, require a court warrant if the government wanted to trace who calls whom.
The NSA pulled this stunt by working with the telephone companies to get access to the main switches.
A former technology manager at a major telecommunications company said that since the Sept. 11 attacks, the leading companies in the industry have been storing information on calling patterns and giving it to the federal government to aid in tracking possible terrorists.

"All that data is mined with the cooperation of the government and shared with them, and since 9/11, there's been much more active involvement in that area," said the former manager, a telecommunications expert who did not want his name or that of his former company used because of concern about revealing trade secrets.

Such information often proves just as valuable to the government as eavesdropping on the calls themselves, the former manager said.

"If they get content, that's useful to them too, but the real plum is going to be the transaction data and the traffic analysis," he said. "Massive amounts of traffic analysis information - who is calling whom, who is in Osama Bin Laden's circle of family and friends - is used to identify lines of communication that are then given closer scrutiny."

POTS phone calls are not the only place where someone with access to the pipes can assemble a network map of human relationships. E-mail would be even easier. With access to the billing systems at the VoIP providers, someone could include those calls in a map. And on the Web, it is possible to map who goes to what websites, thanks especially to systems like the now-retired Carnivore and whatever secret system replaced it.

The result of all this data could be a gigantic Friendster-style map of who knows whom -- but far better than Friendster or MySpace or Tribe, in that it could recognize one-way relationships as well as quantify the frequency of contacts, the growth rate in contact frequency, and whether contacts seem anomalous (like frequent 3 a.m. phone calls within a time zone, for example).
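To make the idea concrete, here is a minimal sketch of such a map built from call records. The records, names, and the 1 a.m. to 5 a.m. window are all invented for illustration; nothing here reflects how any real system works.

```python
from collections import Counter

# Hypothetical call records: (caller, callee, hour_of_day in caller's local time).
calls = [
    ("alice", "bob", 14),
    ("alice", "bob", 15),
    ("alice", "bob", 3),
    ("bob", "alice", 16),
    ("bob", "carol", 10),
    ("carol", "alice", 3),
]

def build_contact_map(records):
    """Count directed contacts: result[(a, b)] is how often a called b."""
    edge_counts = Counter()
    for caller, callee, _hour in records:
        edge_counts[(caller, callee)] += 1
    return edge_counts

def one_way_edges(edge_counts):
    """Edges where a contacted b but b never contacted a."""
    return sorted(e for e in edge_counts if (e[1], e[0]) not in edge_counts)

def late_night_calls(records, start=1, end=5):
    """Calls placed in the half-open hour window [start, end)."""
    return [(a, b, h) for a, b, h in records if start <= h < end]

contact_map = build_contact_map(calls)
print(contact_map[("alice", "bob")])   # contact frequency for one direction
print(one_way_edges(contact_map))      # relationships that are not reciprocated
print(late_night_calls(calls))         # candidates for "anomalous" contacts
```

Growth rate in contact frequency would need timestamps rather than just hours, so it is left out of this sketch.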

Of course, it's hard enough to map out MySpace. Mapping all electronic contacts? Sounds like that could cause someone to be late for lunch. Is that where the tens of billions of classified dollars go in the intelligence budget? Perhaps my favorite social networking maven could find out from her new pals at the CIA.

Update: The California state constitution offers a right to privacy. I wonder if a resident of that state could sue the phone company for giving the NSA access to the switch logs.

Update II, the sequel: I'm surprised at how few analysts grasp the notion of social networking. AmericaBlog, which I usually find quite observant (even if I think the authors have a sweetly but dangerously benign image of the USA), is spreading the ridiculous notion that the NSA was taping every conversation in the country. And the usually incisive William Arkin figures the NSA was looking for suspicious patterns -- like if someone is suddenly making lots of phone calls to Pakistan. That's possible, but it seems like it would suffer from the classic problem of adding dots to be connected, rather than connecting those dots already known. I think it's more likely that the spy campaign was aimed at mapping social networks.


Back in the day when it first came out, lots of leftist friends of mine were appalled at how revealing Friendster was about activist networks and consequently refused to sign up for the service, convinced of its utility to the FBI. I don't doubt they're using that information; I'd almost be wroth if they weren't, it's so obvious. 

Posted by saurabh

Interesting note about the CA constitution.

Actually, Saurabh, I was hoping you could provide some insight into the tractability of the problem. My memory is very foggy, but aren't graphs of networks very difficult to mine and analyze efficiently, unscalably so? The travelling salesperson problem comes to mind...

If everyone's Friendster profile winds up as neglected and unkempt as mine, then the FBI is getting wacky information. Random testimonials from people I know almost nothing about mask deep relationships with people I talk to almost every day. The same with email patterns -- it seems like some people's practice of mass mailing and replying-to-all would throw a bit of noise into the data.

It seems less easy to hide the patterns in telephone calls, on the other hand.


Posted by Saheli

Actually, traveling salesman problems are NP-hard, which means no efficient algorithm is known for finding the exact optimum, but getting reasonably fast, "good" solutions via greedy algorithms is not that difficult. It won't be the BEST, and finding that is hard, but it will be good enough.
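As an illustration of that kind of greedy shortcut, here is a minimal sketch of the nearest-neighbor heuristic: always travel to the closest unvisited city. The city coordinates are made up, and this is just one common greedy approach, not necessarily the one meant above.

```python
import math

def nearest_neighbor_tour(points, start=0):
    """Greedy tour: from the current city, always go to the closest unvisited one."""
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        # Ties are broken by the lower index so the result is deterministic.
        nxt = min(unvisited, key=lambda i: (math.dist(last, points[i]), i))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

def tour_length(points, order):
    """Length of the closed tour that returns to the starting city."""
    n = len(order)
    return sum(math.dist(points[order[i]], points[order[(i + 1) % n]]) for i in range(n))

# Hypothetical city coordinates.
cities = [(0, 0), (0, 1), (2, 0), (2, 1), (1, 3)]
tour = nearest_neighbor_tour(cities)
print(tour, round(tour_length(cities, tour), 3))
```

The tour is found in time quadratic in the number of cities, with no guarantee that it is the shortest one.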

Some kind of graph alignment problems are really difficult, but the sort of stuff needed to do this kind of thing is relatively simple - neighborhoods, adjacency, etc. Actually my first paper did something more or less like this.

As with most data-mining or experimentation more generally, the really informative stuff will come from combining relatively independent measures. I.e. overlaying different sorts of contacts (e.g. phone contacts & Friendster contacts) should be extremely informative. But only for people like us... I really doubt al-Qaeda uses Friendster. 
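A minimal sketch of what overlaying could look like: count how many independent sources show each pair of people in contact, and keep the pairs seen in more than one. Both contact lists below are invented for illustration.

```python
from collections import Counter

def overlay(*sources):
    """Count in how many sources each unordered pair of people appears."""
    counts = Counter()
    for edges in sources:
        # frozenset ignores direction; the set comprehension counts a pair once per source.
        for pair in {frozenset(edge) for edge in edges}:
            counts[pair] += 1
    return counts

def corroborated(counts, minimum=2):
    """Pairs that show up in at least `minimum` sources."""
    return sorted(tuple(sorted(pair)) for pair, c in counts.items() if c >= minimum)

# Hypothetical data from two unrelated sources.
phone_contacts = [("ann", "bo"), ("bo", "cy"), ("cy", "dee")]
friendster_contacts = [("bo", "ann"), ("ann", "dee")]

counts = overlay(phone_contacts, friendster_contacts)
print(corroborated(counts))
```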

Posted by saurabh

sorry about my href's..

1. you can avoid traffic analysis by sending emails using mixmaster. see also invisiblog.

in principle, we should support these networks by using them even if we have nothing to hide - otherwise merely using these kinds of strong privacy would raise attention.

2. yeah, travelling salesman is overkill. in general, specifying exactly what they're looking for would yield an intractable problem, but there are lots of good heuristics, especially since the data is pretty noisy anyway. for example, here's a paper on link prediction
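One simple heuristic of this kind is common-neighbors scoring: two people who are not yet linked but share many contacts are guessed to be likely future contacts. The sketch below uses a made-up graph, and it is only an example of the general idea; whether the paper mentioned above uses this particular score is not stated here.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (a, b) contact pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def common_neighbor_scores(adj):
    """Score each unlinked pair by the number of contacts they share."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, nothing to predict
        shared = len(adj[u] & adj[v])
        if shared:
            scores[(u, v)] = shared
    return scores

# Hypothetical contact graph.
edges = [("ann", "bo"), ("ann", "cy"), ("bo", "dee"), ("cy", "dee"), ("dee", "eve")]
scores = common_neighbor_scores(build_adjacency(edges))
for pair, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
    print(pair, score)
```

This loops over all pairs, so it is quadratic in the number of people; a real system would only look at pairs within two hops of each other.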

Posted by aram

I don't think that the government needs to invest in mapping social networks or MySpace; I think private industry itself will create these mapping tools. Google's greatest asset is the data that it mines, and Google is hard at work extracting value from this data, in the form of pattern recognition and whatnot. As this technology becomes less expensive, the government could then use these tools.

Posted by echan
