SI - Site Forums : Silicon Investor - Welcome New SI Members!


To: Green Receipt who wrote (18918)
From: David Lawrence
6/26/2003 10:04:38 PM
 
>>Yeah having something else 'indexing' them off line would save a lot of time for the 'end user' who in turn accesses the 'index' but i suspect if you wanted it even faster you could have multiple servers that distribute the request to perhaps (different partitions).

I made a similar suggestion a couple of weeks ago. The "older" parts of the message table are static, so those DBs can be compressed and optimized, since there is no chance of record insertions or data changes. They can also be partitioned into period-oriented subsets, which would be very beneficial if date ranges are allowed in the search criteria.
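To make the period-oriented idea concrete, here is a minimal Python sketch, not SI's actual schema or code. It assumes one partition per (year, month) and a toy word index; the class and method names (PartitionedIndex, add, search) are made up for illustration. The point is only that a date range lets the search skip every partition outside the range.

```python
from collections import defaultdict
from datetime import date


class PartitionedIndex:
    """Toy word index split into one partition per (year, month).

    Hypothetical illustration only; not how SI stores messages.
    """

    def __init__(self):
        # (year, month) -> word -> list of (message id, posting date)
        self.partitions = defaultdict(lambda: defaultdict(list))

    def add(self, msg_id, posted, text):
        part = self.partitions[(posted.year, posted.month)]
        for word in set(text.lower().split()):
            part[word].append((msg_id, posted))

    def search(self, word, start, end):
        """Return (matching ids, number of partitions actually scanned)."""
        lo, hi = (start.year, start.month), (end.year, end.month)
        hits, scanned = [], 0
        for key in sorted(self.partitions):
            if not lo <= key <= hi:
                continue  # partition lies outside the date range: skip it
            scanned += 1
            for msg_id, posted in self.partitions[key].get(word.lower(), []):
                # Boundary months can hold posts outside the exact range.
                if start <= posted <= end:
                    hits.append(msg_id)
        return hits, scanned


idx = PartitionedIndex()
idx.add(1, date(1999, 3, 5), "INTC earnings beat")
idx.add(2, date(2001, 7, 9), "thoughts on INTC")
idx.add(3, date(2003, 6, 26), "INTC and search indexing")
hits, scanned = idx.search("intc", date(2001, 1, 1), date(2001, 12, 31))
```

With three partitions loaded, the 2001 search touches only one of them. In a real database the same pruning would come from per-period tables or files rather than a Python dict.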

Also, as I put it to Bob, moving the static search indexes onto other box(es) will keep the cache from getting wiped out on the production machine every time some archaeologist goes searching through ancient posts.

With the above structure, there is no practical reason that the entire message database couldn't be indexed for searching.



To: Green Receipt who wrote (18918)
From: SI Bob
6/27/2003 1:15:51 AM
 
I haven't really cared enough to find out, but my uneducated and barely-exposed guess is that a blob holds binary data (like images) and a clob holds a nearly limitless amount of character data. I could be way off-base, but it's my guess.

Similar to SQL Server's Text fields or what I remember as Memo fields in other databases.

I'll be really curious to find out, but I'm "guessing" that even with the enormous amount of data involved, if searches run on a separate, very powerful box, the relatively low cache hit rate might be a non-issue. Search is something most people can tolerate a bit of a wait for. I'd call anything under 5 seconds more than acceptable performance, and that might be possible with tons of fast memory, very fast processors, and very fast hard drives. Basically another Dell 4600.

Will a Codebase driver improve performance on subsequent hits even on a dynamic database? Will the existence of new messages with "INTC" in them make each retrieval basically a "first" one?
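One way to picture the question, as a rough Python sketch and not a statement about how a Codebase driver actually behaves: if the static (old) messages and the live (still growing) messages are searched separately, results for the static part can be cached per search term, and new posts only force a rescan of the live part. The class name CachedSearch and its fields are hypothetical.

```python
class CachedSearch:
    """Cache search results for static messages; always rescan live ones.

    Hypothetical illustration; messages are (id, text) pairs.
    """

    def __init__(self, static_msgs, live_msgs):
        self.static_msgs = static_msgs  # assumed never to change
        self.live_msgs = live_msgs      # list that keeps growing
        self._static_cache = {}         # term -> ids found in static part
        self.static_scans = 0           # how often the static part was scanned

    @staticmethod
    def _scan(msgs, term):
        return [msg_id for msg_id, text in msgs if term in text.lower()]

    def search(self, term):
        term = term.lower()
        if term not in self._static_cache:
            # Only the first search for a term pays for the static scan.
            self.static_scans += 1
            self._static_cache[term] = self._scan(self.static_msgs, term)
        # New messages never touch the cached static result.
        return self._static_cache[term] + self._scan(self.live_msgs, term)


live = [(3, "intc again")]
searcher = CachedSearch([(1, "INTC up"), (2, "AMD down")], live)
first = searcher.search("INTC")
live.append((4, "more INTC talk"))  # a new post arrives
second = searcher.search("INTC")
```

Under these assumptions, a new message with "INTC" in it shows up in the second search, yet the static part is scanned only once. Whether a given driver or database caches this way is a separate question.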