SI - Site Forums : Silicon Investor - Welcome New SI Members!

To: David Lawrence, who wrote (17976)
Date: 5/30/2003 1:11:23 PM
From: SI Bob
 
Yes, the search database is, in my experience, about the same size as the message table, and sometimes larger. SQL Server lets you exclude "noise" words like "the", "and", "or", "for", etc., which helps some. (I think later versions of Windoze all come with a default "noise.eng", because the indexing is really making use of a Windoze function, not a SQL Server one.)
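To make the noise-word idea concrete, here is a rough sketch in Python. It is not how SQL Server does it internally; the word list and the function name are made up for illustration, and the real list lives in the noise file.

```python
# Hypothetical noise-word list for illustration only; the real list is
# whatever the noise file on the server contains.
NOISE_WORDS = {"the", "and", "or", "for"}


def indexable_words(text):
    """Return the lowercased words of `text` that are not noise words."""
    return [word for word in text.lower().split() if word not in NOISE_WORDS]
```

Every word dropped here is a word that never takes up space in the index, which is where the size savings come from.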

SQL Server has four functions built in to make use of the full-text indexing: CONTAINS(), FREETEXT(), CONTAINSTABLE(), and FREETEXTTABLE(). CONTAINS() is the one I use.
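For anyone curious what a CONTAINS() search looks like from application code, here is a minimal sketch in Python. The table and column names (messages, msg_id, body) are assumptions, not SI's actual schema, and the helper is hypothetical. It only builds the search condition string; it strips double quotes from the terms rather than trying to escape them.

```python
# Hypothetical table and column names; the "?" is a parameter placeholder
# for the full-text search condition.
QUERY = "SELECT msg_id FROM messages WHERE CONTAINS(body, ?)"


def build_contains_condition(terms):
    """Join search terms into a CONTAINS() condition such as '"a" AND "b"'."""
    # Remove double quotes so a term cannot break out of its quoted phrase.
    cleaned = [term.replace('"', "").strip() for term in terms]
    quoted = ['"{}"'.format(term) for term in cleaned if term]
    return " AND ".join(quoted)
```

With a DB-API driver such as pyodbc, the result would be passed as the single parameter, along the lines of cursor.execute(QUERY, build_contains_condition(["noise", "full text"])). An empty condition should be rejected before it reaches the server.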

It has some very big downsides, though. For example, add a field to a 19-million-row table and populate it, and the equipment will be busy for days (literally) completely rebuilding the search index from scratch. And that's not the only thing that can trigger a complete rebuild. I've also noticed that it's not consistent about how it handles word delimiters other than spaces, such as punctuation. I've found instances where a message didn't show up in a search because of the character immediately following the word I was looking for.
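The delimiter problem is easier to see with a toy example. This sketch is not SQL Server's word breaker, just two hypothetical ways of splitting a message into words, to show how a trailing punctuation character can hide a word from a search.

```python
import re


def split_on_spaces(text):
    """Split only on whitespace, so punctuation stays glued to the word."""
    return text.lower().split()


def split_on_non_word(text):
    """Treat anything other than letters, digits, and apostrophes as a delimiter."""
    return re.findall(r"[a-z0-9']+", text.lower())


message = "Rebuild the index; then search."
found_with_spaces = "index" in split_on_spaces(message)      # "index;" is stored instead
found_with_non_word = "index" in split_on_non_word(message)  # "index" is stored cleanly
```

An indexer that behaves like the first function for some punctuation and like the second for others would produce exactly the kind of missed hits described above.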

But overall it's really not too bad. I'm just not sure how well it'll perform with 19 million messages. If it can't do it or do it well, I already have some approaches in mind to try before resorting to a home-grown method.