SI
SI
discoversearch

We've detected that you're using an ad content blocking browser plug-in or feature. Ads provide a critical source of revenue to the continued operation of Silicon Investor.  We ask that you disable ad blocking while on Silicon Investor in the best interests of our community.  If you are not using an ad blocker but are still receiving this message, make sure your browser's tracking protection is set to the 'standard' level.
Technology Stocks : Intel Corporation (INTC) -- Ignore unavailable to you. Want to Upgrade?


To: Joe NYC who wrote (113043)10/10/2000 11:47:57 PM
From: rudedog  Read Replies (1) | Respond to of 186894
 
Jozef - I'll take a stab at it. Modern relational databases are multi-threaded query engines working against data which is always in memory. How does that work? When a user or program makes a request for data, the query processor develops a query plan based on the statistical value of the indexes (i.e which index has the best statistical chance of finding the data with the fewest page reads) and then executes the query against data pages which are assumed to be in RAM. If the page is not in RAM, a page fault occurs and the query thread is suspended. Background processes then fetch the pages from data on disk and restart the query thread which delivers the result.

The bigger the memory, the more likely no page fault will occur and the faster the database will run. The more localized the data is, the more of it will be in RAM. Good indexes also help, as they are preferentially retained in memory.

The essence of a partitioned database is to determine a distribution of data which allows the relevant indexes and data for a section of the data to be consistently processed by a single instance of the database engine in a cluster. An example might be to distribute customer records based on a common variable, say last name, where the rough distribution of the data is predictable. A query designed to pull a single customer - say Bob Jones - will only be routed to the database engine with the data and indexes for last names beginning with "J". But a query against all customers - say a request for anyone who bought more than $100 in the last month - will be processed by all of the database engines in parallel, with each returning the data from the segment under its control. This is very efficient, as the engines in each cluster node only need to retain data in memory for their part of the partition, which makes flushing of the RAM data less likely. Also the available RAM for data is the sum of all RAM in the cluster, which allows very large memory configurations to be directly addressed, even in a 32 bit architecture. The current version of MS SQL supports the 36 bit PAE scheme. Likewise a single query can only run against a single thread even in an SMP system, but in a partitioned system, a thread will execute in any cluster node which has relevant data. In the CPQ system, for example, broad queries would execute in 24 parallel threads, each with its own large memory resident data.

Older database designs use index schemes which are optimized assuming that the data will have to live in a single RAM store, and with table structures which make the same assumptions. Although even these types of systems can benefit from a partitioned parallel query database engine, they really need to be upgraded to get the most benefit from the design.

Many of the database requirements on the web are ideal for parallel engines, since the records are often relatively small but large in number, which means that reasonable statistical predictions about data distribution can be made.

I hope this is not too long an answer - it is a complex subject and I just touched the surface.