Technology Stocks : Amazon.com, Inc. (AMZN)
To: Derrick P. who wrote (17631)  9/20/1998 3:26:00 PM
From: Glenn D. Rudolph
 
New Web technology promises end to mile-long searches

Reuters Story - September 20, 1998 15:01

By Duncan Martell

PALO ALTO, Calif., Sept 20 (Reuters) - When IBM scientist Prabhakar Raghavan was planning a vacation to Thailand about a year ago, he did what many of us in the Internet age now do. He turned to cyberspace.

The 37-year-old researcher punched in a search on the World Wide Web for suggestions of hot spots and must-see places and waited for the results.

But unlike everyone else who searches the Web using Internet directories and search engines such as Yahoo! Inc., Raghavan was not deluged with thousands of useless Web sites and pages on tourism in Thailand pitching mail-order brides.

"I was amazed by what I found instantly," Raghavan said. "I zoomed in all the great stuff right away; these university students had put together this wonderful information."

What Raghavan used in separating the fluff from the substantial was an algorithm, a special, little piece of software that lives on the powerful computer servers at International Business Machines Corp.'s Almaden Research Center nestled in the hills above San Jose, Calif.

The algorithm, developed in the past two years by Raghavan and his colleagues and dubbed "Clever" technology, helped spit out about 30 Web sites and pages rather than the hundreds he got doing it the "old-fashioned" way.

Typing in a search on Excite Inc.'s search engine with the words "Thailand and tourism," on the other hand, produced more than 325,000 matches. Score one for Clever.

With more than 270 million documents now on the Web and growing by a million pages a day, cataloging and organizing the electronic compendium of human thought, actions, intellect and all matters banal and serious is more than daunting.

It is virtually impossible.

"The problem is, you're drowning in pages," Jon Kleinberg, an associate professor of computer science at Cornell University in Ithaca, New York, said. "But with Clever, what you get is more like a mall shopping-map for the Web."

And consider this: because the Clever algorithm is a piece of software, conquering and making sense of the Web's scattered and prolific madness suddenly becomes more automatic.

Even in an age of increasing automation, sites like Yahoo!, the No. 1 Internet directory, still use humans to categorize the sites and pages submitted to them and to write one- or two-line summaries of what is on them. Other directories, such as Excite and Infoseek Corp., also do manual "taxonomies" of the Web in serving up a semblance of order to Web-surfers.

"In a few days, we could compile a list for each topic that Yahoo! lists," Raghavan said. "Clever just runs off and does its thing and brings back the best 10 or 20 or 50 results it finds."

Indeed, in the course of a single weekend, Raghavan and his fellow researchers produced a directory that had 600 topics, about a quarter the number that Yahoo! currently has.

Clever lets users automate the process of finding the best, most authoritative Web pages by analyzing both the actual links in Web pages and the words immediately surrounding each link.

Links, which appear as underlined or differently colored pictures and words on Web pages, are what take a surfer from one Web page or site to another, all over the globe.

"What it looks at is not only the links but at the content and what they say about each other," Raghavan said. "We use these to distill not the 100,000 documents that mention aerospace but the 10 best ones."
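The idea of weighing a link by the words around it can be sketched in a few lines of Python. This is only an illustration, not IBM's code: the function name `link_weight`, the five-word window, and the scoring rule (1 plus the number of nearby query terms) are all assumptions made for the example and do not come from the article.

```python
def link_weight(words, link_pos, query_terms, window=5):
    """Hypothetical scoring rule: 1.0 plus the number of query-term
    occurrences found within `window` words of the link position."""
    lo = max(0, link_pos - window)
    hi = min(len(words), link_pos + window + 1)
    terms = {t.lower() for t in query_terms}
    nearby = sum(
        1 for w in words[lo:hi]
        if w.lower().strip(".,;:!?\"'") in terms
    )
    return 1.0 + nearby


# A link whose surrounding text is about the topic gets a higher weight.
on_topic = "Our favorite fishing guide covers tackle and bait".split()
w_on_topic = link_weight(on_topic, 3, ["fishing"])

# Here "fishing" appears far from the link (index 10), outside the window.
off_topic = ("He was fishing for compliments again yesterday "
             "but see this page about cars").split()
w_off_topic = link_weight(off_topic, 10, ["fishing"])
```

Under these assumptions, a link sitting next to the word "fishing" counts for more than a link on a page that only mentions "fishing for compliments" several sentences away.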

To illustrate their point and test their work, the IBM researchers used the algorithm to check out fishing. Using traditional search engines, they came up with more than one million "hits" for fishing.

With Clever, the search returned about 30 pages about fishing in general, filtering out vast numbers of useless pages containing phrases such as "fishing for compliments."

Based on HITS technology, or Hypertext-Induced Topic Search, developed at Cornell by Kleinberg, Clever now has the ability to filter and search by both content and the links that appear on Web pages and sites.

Kleinberg continued his work, beginning to add the content-sensitive searching and filtering capabilities during a stint at the Almaden Research Center from the fall of 1996 to summer 1997.

What shows up at the end of a search is a list of about 15 "hubs" and 15 "authorities." The pages pointing to a site are the hubs, while the site being pointed to is the authority.

The more pages pointing to an authority, the more relevant it is to what you are looking for, Raghavan said.
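The hub-and-authority scoring described above can be sketched as a short iterative computation in the spirit of HITS. This is a minimal sketch, not the Clever implementation: the function names, the fixed iteration count, and the toy link graph (including its page names) are assumptions made for illustration. Each round, a page's authority score becomes the sum of the hub scores of the pages pointing to it, and a page's hub score becomes the sum of the authority scores of the pages it points to.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """links maps each page to the list of pages it links to.
    Returns (hub, authority) score dictionaries."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages linking to it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub: sum of authority scores of the pages it links to.
        new_hub = {
            p: sum(new_auth[d] for d in links.get(p, ()))
            for p in pages
        }
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Toy graph (hypothetical page names): three link lists and one stray page.
graph = {
    "list_a": ["guide", "temples"],
    "list_b": ["guide", "temples"],
    "list_c": ["guide"],
    "stray": ["stray_target"],
}
hub, auth = hits(graph)
top_authority = max(auth, key=auth.get)
```

In this toy graph the page with the most in-links from good hubs comes out as the top authority, and the lists that point to more authorities score higher as hubs than the list that points to only one.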

Even so, while the work of Raghavan and the other researchers has attracted the attention even of IBM Chairman Louis Gerstner, don't expect the world's largest computer maker to make billions from Clever or to set up its own Yahoo!-killer Web directory.

But Raghavan says Armonk, New York-based IBM will be unveiling a number of deals during the next several months licensing the technology to search engine companies and to large businesses that want to set up directories on their corporate intranets. Perhaps its most immediate and exciting use is to tame the woolly Internet, where finding what you need is more often than not an exercise in futility and frustration.

"I'm constantly amazed at the extremely arcane information on the Web," Raghavan said.
