Here's the deal - in spite of any anonymization, a session is still created between two points, with associated (even if encrypted) traffic. Access points such as libraries are "anonymized," but the sessions are still significant in terms of time of day, length of session, and type of content (such as whether it is encrypted or not) and, if plaintext, such things as keywords (even if attempts are made to make them look innocuous).
But the greater skein of connectivity shows that 98% are easily put into a framework of contacts, with context: news stories, emails, websites of various types, etc.
You don't need 100% coverage -- with partial capture you can fill in some of the blanks. With greater capture you can fill in more of the blanks. The more blanks you fill in, the easier it is to fill in the rest, and so on.
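To make the "fill in the blanks" point concrete, here is a toy sketch in Python. Everything in it is my own assumption for illustration: the contact graph, the endpoint names, and the rule that an endpoint counts as "filled in" once at least two of its contacts are already placed. It is not a description of any real system, only of the snowball effect.

```python
def fill_in_blanks(contacts, known, threshold=2):
    """Repeatedly mark an endpoint as placed once at least
    `threshold` of its contacts are already placed.

    contacts  -- dict mapping endpoint -> set of endpoints it talks to
    known     -- endpoints identified at the start (the partial capture)
    threshold -- hypothetical amount of context needed to place an endpoint
    """
    placed = set(known)
    changed = True
    while changed:
        changed = False
        for node, neighbours in contacts.items():
            if node in placed:
                continue
            if len(placed & neighbours) >= threshold:
                placed.add(node)
                changed = True
    return placed


# Hypothetical contact graph: six endpoints, nine observed sessions.
contacts = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "e"},
    "c": {"a", "b", "d", "e"},
    "d": {"a", "c", "f"},
    "e": {"b", "c", "f"},
    "f": {"d", "e"},
}

print(sorted(fill_in_blanks(contacts, {"a"})))       # one known endpoint
print(sorted(fill_in_blanks(contacts, {"a", "b"})))  # two known endpoints
```

In this made-up graph, starting from one known endpoint places nothing else, while starting from two ends up placing all six. Each newly placed endpoint supplies context for its neighbours, which is the cascade described above.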
To say NSA is overwhelmed is ... not true in the usual sense. Even Bamford says that, several generations ago, specific equipment could monitor a million or more simultaneous streams, and many such units could cover a major part of the country.
Only a fraction need be scanned, in any event. And with digital ASCII data it's orders of magnitude faster.
There are other issues that make it easier also, which I'll get into when I get back from some weekend chores ...