SI - Site Forums : Silicon Investor - Welcome New SI Members!

To: SI Bob, who wrote (17919)
From: Jon Tara
Date: 5/29/2003 2:27:13 PM
 
Bob, I hope SI or iHub don't search the database of posts directly. Your post implies this might be the case.

The only way to do this effectively is with a search engine that keeps a separate "inverted" database. ("Inverted" refers to the fact that the documents are turned "inside out", so that they are accessed by word, not by document name.) Search engines generally use a proprietary database format.
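To make "inverted" concrete, here is a minimal sketch of the idea in Python. It is not how any particular search engine stores its data; the function names, the tokenizer, and the sample posts are all made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(posts):
    """Map each word to the set of post ids that contain it.

    `posts` is a hypothetical dict of {post_id: post_text}.
    """
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in tokenize(text):
            index[word].add(post_id)
    return index


def search(index, query):
    """Return the ids of posts that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Made-up sample data, keyed by an illustrative post id.
posts = {
    1: "Searching the database of posts is slow",
    2: "A search engine uses an inverted index",
    3: "Disk space is cheap",
}
index = build_inverted_index(posts)
matches = search(index, "search engine")
```

A lookup touches only the entries for the query words, never the posts themselves, which is why the search does not have to scan the whole post database.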

Interfacing a search engine is usually not a very difficult task. You need to write some code that reads each document and feeds it to the search engine for indexing, since on a discussion site documents aren't generally stored in simple flat files. You also need to code or customize the UI that presents the results.
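The "feeder" code might look roughly like the sketch below. Everything specific in it is an assumption: the `posts` table with `id` and `body` columns, the use of SQLite, and the `index_document` callback, which stands in for whatever indexing call a real search engine provides.

```python
import sqlite3


def feed_posts(conn, index_document, batch_size=1000):
    """Read every post from the database and hand it to the indexer.

    `index_document(post_id, body)` is a hypothetical callback standing in
    for the search engine's own "add this document" call.
    Returns the number of posts fed.
    """
    cur = conn.execute("SELECT id, body FROM posts ORDER BY id")
    count = 0
    while True:
        rows = cur.fetchmany(batch_size)  # read in batches to bound memory use
        if not rows:
            break
        for post_id, body in rows:
            index_document(post_id, body)
            count += 1
    return count


# Demonstration with an in-memory database and made-up posts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [(1, "first post"), (2, "second post")],
)

indexed = []  # stand-in for the search engine: just record what was fed
count = feed_posts(conn, lambda post_id, body: indexed.append((post_id, body)))
```

In practice the same loop would be run once over the whole archive, and then again for new posts as they arrive.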

With such a search engine, I see no reason to limit the universe of documents indexed, unless you have a disk space constraint. Search time scales much better than linearly with the number of documents or words indexed.

The search database will generally take at least as much disk space as the documents themselves - in fact, perhaps more.

There are quite a number of commercial and open-source search engines to choose from.