Hashes--- Music Industry Unveils Tracking Methods Wed Aug 27, 5:09 PM ET
By TED BRIDIS, AP Technology Writer
WASHINGTON - The recording industry provided its most detailed glimpse to date Wednesday into some of the detective-style techniques it has employed as part of its secretive campaign to cripple music piracy over the Internet.
The disclosures were included in court papers filed against a Brooklyn woman fighting efforts to identify her for allegedly sharing nearly 1,000 songs over the Internet. The recording industry disputed her defense that songs on her family's computer were from compact discs she had legally purchased.
Using a surprisingly astute technical procedure, the Recording Industry Association of America examined song files on the woman's computer and traced their digital fingerprints back to the former Napster file-sharing service, which shut down in 2001 after a court ruled it violated copyright laws.
The RIAA, the trade group for the largest record labels, said it also found other hidden evidence inside the woman's music files suggesting the songs were recorded by other people and distributed across the Internet.
Comparing the Brooklyn woman to a shoplifter, the RIAA told U.S. Magistrate John M. Facciola that she was "not an innocent or accidental infringer" and described her lawyer's claims otherwise as "shockingly misleading." The RIAA papers were filed in Washington overnight Tuesday and made available by the court Wednesday.
The woman's lawyer, Daniel N. Ballard of Sacramento, Calif., said the music industry's latest argument was "merely a smokescreen to divert attention" from the related issue of whether her Internet provider, Verizon Internet Services Inc., must turn over her identity under a copyright subpoena.
"You cannot bypass people's constitutional rights to privacy, due process and anonymous association to identify an alleged infringer," Ballard said.
Ballard has asked the court to delay any ruling for two weeks while he prepares detailed arguments, and he noted that his client — identified only as "nycfashiongirl" — has already removed the file-sharing software from her family's computer.
The RIAA accused "nycfashiongirl" of offering more than 900 songs by the Rolling Stones, U2, Michael Jackson and others for illegal download, along with 200 other computer files that included at least one full-length movie, "Pretty Woman."
The RIAA's latest court papers describe in unprecedented detail some sophisticated forensic techniques used by its investigators. These disclosures were even more detailed than answers the RIAA provided weeks ago at the request of Sen. Norm Coleman, R-Minn., who has promised hearings into the industry's use of copyright subpoenas to track downloaders.
For example, the industry disclosed its use of a library of digital fingerprints, called "hashes," that it said can uniquely identify MP3 music files that had been traded on the Napster service as far back as May 2000. Examining hashes is a technique commonly used by the FBI and other computer investigators in hacker cases.
By comparing the fingerprints of music files on a person's computer against its library, the RIAA believes it can determine in some cases whether someone recorded a song from a legally purchased CD or downloaded it from someone else over the Internet.
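The court papers as reported here do not describe the RIAA's actual tooling, so the following is only a minimal sketch of the general idea: hash each file and look the result up in a set of known fingerprints. The function names, the sample data, and the choice of MD5 are all assumptions made for illustration.

```python
import hashlib

def fingerprint(data):
    """Return the MD5 digest of a byte string as a hex string."""
    return hashlib.md5(data).hexdigest()

def find_known_files(files, known):
    """Return the names of files whose fingerprint appears in the known set."""
    return sorted(name for name, data in files.items()
                  if fingerprint(data) in known)

# Hypothetical library of fingerprints collected from files seen earlier.
library = {fingerprint(b"track previously traded online")}

# Hypothetical contents of a computer under examination.
on_disk = {
    "song_a.mp3": b"track previously traded online",
    "song_b.mp3": b"track ripped at home from a CD",
}

matches = find_known_files(on_disk, library)
print(matches)
```

A match only shows that a file is byte-for-byte identical to one already in the library; a file with no match is simply unknown to it.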
rest at story.news.yahoo.com
========================
So what are HASHES:
=========================
from KarenWare (long, good article at: karenware.com)
<snip> talks about a compression algorithm from RSAS founder-.....
Hashes and Fingerprints

So, where does the fabulous MD5 algorithm fit? Is it loss-less? Or is it lossy?
Actually, it's neither. Instead, it belongs to a group of compression algorithms known as "hashes." But they might as well be called "total loss" algorithms. That's because data, once compressed using one of these algorithms, cannot be recovered. That's right. These hashes have no corresponding decompression algorithm!
Yipes! Clearly, hashes aren't the best ways to compress your favorite picture or song. And they're a really poor choice when you need to save space storing your valuable programs, documents, or accounting data. So, what good are they? After all, if you wanted to lose data, you could just buy a shredder or eraser.
Before you decide that hashes are useless, let's take a closer look at how they work. Like all hashes, MD5 compresses files into a fixed size, regardless of the file's original size. You can think of hashes as a machine that converts a file of any size into a binary number of a fixed size.
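The fixed-size property is easy to see for yourself. Here is a small sketch using Python's standard hashlib module (the sample inputs are arbitrary and chosen only for illustration): an empty input, a single byte, and a million bytes all come out as 128 bits.

```python
import hashlib

# Three inputs of very different sizes.
inputs = [b"", b"a", b"x" * 1_000_000]

for data in inputs:
    digest = hashlib.md5(data).digest()
    print(len(data), "bytes in ->", len(digest) * 8, "bits out")

# Collect the digest lengths (in bytes) to confirm they are all the same.
sizes = {len(hashlib.md5(data).digest()) for data in inputs}
print(sizes)
```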
Now here's the magic. This number can be thought of as the file's "digital fingerprint," a number that uniquely identifies the original file, without containing any of the file's data!
Now, fingerprints are wonderful things. No two people have the same fingerprints. And fingerprint copies are small enough to be kept on file. These two facts allow forensic scientists to positively identify a person, or tell two people apart, just by viewing their fingerprints.
If files had fingerprints, that would come in handy too. No need to compare, bit-by-bit, two million-byte files to see if they contain the same information. Instead, just compare their much smaller fingerprints. If the fingerprints are identical, the files are too. If the fingerprints differ, by just one bit, the files must differ too.
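As a rough illustration of that comparison, the sketch below fingerprints a short message, an exact copy of it, and a copy with a single bit flipped. The message and variable names are just examples.

```python
import hashlib

original = b"The quick brown fox jumps over the lazy dog"
exact_copy = bytes(original)

# Flip the lowest bit of the first byte to make a copy that differs by one bit.
altered = bytes([original[0] ^ 0x01]) + original[1:]

fp_original = hashlib.md5(original).hexdigest()
fp_copy = hashlib.md5(exact_copy).hexdigest()
fp_altered = hashlib.md5(altered).hexdigest()

print(fp_original)
print(fp_altered)
print("identical copies match:", fp_original == fp_copy)
print("one-bit change detected:", fp_original != fp_altered)
```

Only the two short fingerprints are compared, never the data itself.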
Compare a file's fingerprint today, with one computed hours, days, or even years ago. If the fingerprints are the same, the file has not been changed. Want to know if a friend's copy of your file is intact? Send him your file's fingerprint. If it matches the fingerprint of his copy of the file, all's well. If the two fingerprints differ, something's gone awry.
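One way this check might look in practice is sketched below. The helper names (md5_of_file, is_unchanged) and the demo file are made up for this example; the file is read in chunks so that large files need not fit in memory.

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=65536):
    """Compute a file's MD5 fingerprint, reading it one chunk at a time."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def is_unchanged(path, recorded_fingerprint):
    """True if the file still has the fingerprint recorded earlier."""
    return md5_of_file(path) == recorded_fingerprint

# Demo with a throwaway file: record a fingerprint, then modify the file.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "report.txt")
    with open(path, "wb") as f:
        f.write(b"quarterly figures")
    recorded = md5_of_file(path)
    before = is_unchanged(path, recorded)

    with open(path, "ab") as f:
        f.write(b"!")
    after = is_unchanged(path, recorded)

print(before, after)
```

MD5 is used here because it is the algorithm under discussion. Researchers have since shown how to construct MD5 collisions deliberately, so where someone might tamper with a file on purpose, hashlib.sha256 can be swapped in with the same calls.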
There's no doubt about it. Good digital fingerprints are useful. But how good is the MD5 digital fingerprint? Is each file's MD5 hash value really unique? Or might two files have the same hash value, what computer scientists call a "collision?"
Collisions

No hash algorithm is collision-free. It's always possible, at least in theory, to find two different files with exactly the same digital fingerprint. But MD5 is what's known as a secure, or cryptographic, hash. In the words of the experts, it's "strongly collision-free." In words you and I might use, it's *very* unlikely two files will ever have the same MD5 hash value.
How unlikely? Let's perform an experiment. We'll have each of the world's 6 billion people sit at their computers, and begin creating disk files. Feel free to type anything you like into each file. Just be sure that each file is unique -- unlike any other file ever created.
To complete the experiment in a reasonable time, let's have each person make 1,000 different disk files every second. They'll need to compute each file's MD5 hash value too. There's no time for sleep. We'll work 24 hours each day, 365 days a year. To obtain accurate results, let's run the experiment for, say, 1 million years ...
[One million years later] All done? Whew! I'm tired! And I'll bet you're tired too. But it's been worth it. All our hard work has paid off, and 189,345,600,000,000,000,000,000,000 unique files have been born! And 189,345,600,000,000,000,000,000,000 MD5 hash values have been computed.
This looks like a very big number, doesn't it? And it is too, in some circles. But there are 340,282,366,920,938,463,463,374,607,431,768,211,456 possible MD5 hash values! That's because there are that many different 128-bit binary numbers.
So even after 1 million years of frantic, round-the-clock file making, less than 1 in 1,797,000,000,000 of the possible MD5 hash values has been used. Put another way, the number of MD5 hashes we've computed is less than 0.00000000006% of the total available.
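The arithmetic can be reproduced with exact integers. This sketch takes the experiment's stated inputs (6 billion people, 1,000 files per second each, 1 million years) and assumes a 365.25-day year. The variable names are illustrative only.

```python
from fractions import Fraction

people = 6_000_000_000
files_per_second = 1_000
seconds_per_year = 31_557_600      # assumes 365.25 days of 86,400 seconds
years = 1_000_000

files_made = people * files_per_second * seconds_per_year * years
possible_hashes = 2 ** 128         # number of distinct 128-bit values

# Share of the hash space touched if every file got a different hash.
share_used = Fraction(files_made, possible_hashes)
one_in = possible_hashes // files_made
percent_used = float(share_used) * 100

print(f"files made:      {files_made:,}")
print(f"possible hashes: {possible_hashes:,}")
print(f"roughly 1 in     {one_in:,}")
print(f"percent used:    {percent_used:.2e}%")
```

Python integers have arbitrary precision, so none of these values is rounded until the final conversion to a percentage.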
It's possible, in theory, that two of those hash values will be the same. But as you can see from the numbers, it's very unlikely that any one file's fingerprint will match another's. The odds are better that everyone on earth will be hit by lightning, on the same day -- the day you win the Irish Sweepstakes. It's not perfect. But in the real world, a file's MD5 hash makes an excellent digital fingerprint.
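Two different questions hide in this paragraph, and a rough estimate helps separate them. The sketch below assumes hash values behave like uniformly random 128-bit numbers, which is an idealization rather than something the article proves. It estimates (a) an upper bound on the chance that one chosen file shares its hash with any other file in the pile, and (b) the expected number of matching pairs anywhere in the pile, the so-called birthday effect. The function names and the "one trillion files" figure are hypothetical, chosen only to stand in for a real-world collection.

```python
from fractions import Fraction

POSSIBLE = 2 ** 128

def chance_one_file_matches(n_files):
    """Upper bound on the chance one chosen file matches any of the others."""
    return Fraction(n_files - 1, POSSIBLE)

def expected_matching_pairs(n_files):
    """Expected number of pairs with equal hashes among n_files random hashes."""
    pairs = n_files * (n_files - 1) // 2
    return Fraction(pairs, POSSIBLE)

experiment = 189_345_600 * 10**18   # the million-year pile, 365.25-day years
realistic = 10**12                  # hypothetical: one trillion real files

for label, n in [("experiment", experiment), ("realistic", realistic)]:
    print(label)
    print("  one file matches another:", f"{float(chance_one_file_matches(n)):.2e}")
    print("  expected matching pairs: ", f"{float(expected_matching_pairs(n)):.2e}")
```

Under these assumptions, any single file is very unlikely to find a twin even in the million-year pile, although a pile that large would be expected to contain some matching pairs somewhere. For a collection of realistic size, both figures are vanishingly small, which is what matters when a hash is used as a fingerprint for accidental changes.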
There's a lot more to say about MD5 and digital fingerprints. But unfortunately, that will have to wait until our next get-together. In the meantime, if you'd like to learn more about the MD5 algorithm, check out the Internet standards document RFC (Request For Comment) 1321. It's available online at:
ietf.org