SI - Site Forums : Silicon Investor - Welcome New SI Members!


To: SI Bob who wrote (17919) 5/28/2003 11:16:21 PM
From: Joe Lyddon
 
Bob, you could buy out Raging Bull & have the market cornered!! Talk about a real money maker! You could rob from one to pay the other and grab all you could from one to make it all worthwhile! (Kidding, but worth a thought.)

Have fun,
Joe



To: SI Bob who wrote (17919) 5/29/2003 7:32:16 AM
From: TFF
 
Whoa! An honest/complete answer. I like the new mgmt already.



To: SI Bob who wrote (17919) 5/29/2003 2:27:13 PM
From: Jon Tara
 
Bob, I hope SI or iHub don't search the database of posts directly. Your post implies this might be the case.

The only way to do this effectively is with a search engine that has a separate "inverted" database. ("Inverted" refers to the fact that the documents are turned "inside out", so that they are accessed by word, not by document name.) Search engines generally use a proprietary database format.
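To make the "inside out" idea concrete, here is a minimal sketch of an inverted index in Python. It is only an illustration, not how SI, iHub, or any particular search engine does it: the function names, the sample posts, and the in-memory dict are all assumptions, and a real engine would keep this structure on disk in its own format.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(posts):
    """Map each word to the set of post ids that contain it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in tokenize(text):
            index[word].add(post_id)
    return index

def search(index, query):
    """Return the sorted ids of posts containing every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)

# Hypothetical sample posts, keyed by a made-up post id.
posts = {
    1: "Searching old posts is slow",
    2: "A search engine indexes posts by word",
    3: "Raging Bull is another site",
}
index = build_inverted_index(posts)
```

A query never touches the post text itself. It looks up each word in the index and intersects the resulting sets of post ids, which is why the documents are said to be accessed by word rather than by name.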

Interfacing a search engine is generally not a very difficult task. You need to write some code that reads each document and feeds it to the search engine for indexing (since on a discussion site, documents aren't generally stored in simple flat files). You also need to code or customize the UI to present results.
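The glue code for the indexing side might look roughly like the sketch below. Everything in it is assumed for illustration: the `posts` table with `id` and `body` columns, the use of SQLite, and the `TinyIndexer` class, which merely stands in for whatever indexing API a real search engine exposes.

```python
import sqlite3
from collections import defaultdict

class TinyIndexer:
    """Hypothetical stand-in for a real search engine's indexing API."""
    def __init__(self):
        self.postings = defaultdict(set)

    def index_document(self, doc_id, text):
        # Record that each word of the text occurs in this document.
        for word in text.lower().split():
            self.postings[word].add(doc_id)

def feed_posts(conn, indexer):
    """Read every post from the database and hand it to the indexer."""
    count = 0
    for post_id, body in conn.execute("SELECT id, body FROM posts ORDER BY id"):
        indexer.index_document(post_id, body)
        count += 1
    return count

# Assumed schema: posts live in a database table, not in flat files.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [(1, "first post about search"), (2, "second post about indexing")],
)
indexer = TinyIndexer()
fed = feed_posts(conn, indexer)
```

With a real engine, only `index_document` would change. The loop that pulls posts out of the site's database stays the same.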

Using such a search engine, I see no reason to limit the universe of documents indexed, unless you have a disk space constraint. Search time grows much more slowly than linearly with the number of documents or words indexed.

The search database will generally take at least as much disk space as the documents themselves - in fact, perhaps more.

There are quite a number of commercial and open-source search engines to choose from.