Technology Stocks : Xilinx (XLNX)


To: Bilow who wrote (2160) 12/29/1998 7:18:00 AM
From: Bilow
 
Note on the new Virtex carry chains...

My third problem with the tool set is the fact that the mapper barfs on the new carry chains. (The first is the absence of parameterizable functions, and the second is the new CLB "slice" scheme, in particular the requirement that primitives be RLOCed with a particular slice. It would be cool if they could instead inherit their slice "offset" from higher hierarchy levels along with their row and column offsets...)

From the hardware standpoint, the new carry chains are an improvement on the 4000 series, and a huge improvement on the Altera 10KE carry logic. I should explain the fundamental differences between the three:

First, the Altera 10KE. When a (4-input) LUT is to be used as part of a carry chain, the LUT is split into two 3-input LUTs. One computes the carry to the next logic level, while the other provides the result for the given logic level. Now the carry-in from the previous level uses one input on those 3-input LUTs, so if the function desired is addition, the other two inputs on the 3-input LUTs will be used by the "A[n]" and "B[n]" inputs. With all three inputs used, there is no other functionality possible; you can't change the adder into a subtractor or something like that. One advantage of this technique is that you can make the carry chain do something other than arithmetic. In particular, the carry chain is the fastest way to get from one flip-flop to another, so in high speed design, you see the carry chain abused somewhat by the more imaginative engineers. It also forces the logic to occupy more or less adjacent locations, providing a sort of RLOC function (which Altera sorely lacks). (This technique was shown to me by a younger engineer recently, who I expect to go far, if he wants.)
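
The split-LUT arrangement described above can be sketched in Python. This is only a behavioral model under stated assumptions, not Altera's implementation: the names sum_lut3, carry_lut3 and ripple_add are hypothetical, and each 3-input LUT is written as a plain function of A[n], B[n] and the carry-in.

```python
def sum_lut3(a, b, cin):
    """3-input LUT giving the result bit for this logic level."""
    return a ^ b ^ cin


def carry_lut3(a, b, cin):
    """3-input LUT giving the carry to the next logic level (majority)."""
    return (a & b) | (a & cin) | (b & cin)


def ripple_add(a, b, width, cin=0):
    """Chain `width` split LUTs together; returns (sum, carry_out)."""
    total = 0
    carry = cin
    for n in range(width):
        a_n = (a >> n) & 1
        b_n = (b >> n) & 1
        total |= sum_lut3(a_n, b_n, carry) << n
        carry = carry_lut3(a_n, b_n, carry)
    return total, carry
```

Both LUTs consume all three of their inputs, which is why the model has no spare argument for a mode pin.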

The Xilinx 4000 series uses a dedicated carry chain that is separate hardware from the LUT. The carry computed from the carry chain is available to the 4-input LUT, but there is no need to use LUT resources to compute the carry-out. This means that there are more functions available than just addition or subtraction. Given that two inputs to the LUT are going to be A[n] and B[n], and that one input is going to be the carry-in, the other input is available as a mode pin. So you can choose between either of two functions. Incidentally, my guess is that dedicated carry chain logic was chosen by Xilinx because it is faster than a split LUT would be, which was the correct choice on their part, I am sure.

An example of a Xilinx 4000 series carry-chain LUT use would be an accumulator that is synchronously clearable. You use the mode pin so that in one case, it allows the LUT to compute an add, and in the other, it forces the output to zero. Another (common) use is to have a mode pin that converts the adder into a subtractor.
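
The clearable accumulator can be sketched the same way. This is a behavioral model only, with hypothetical names (acc_lut4, dedicated_carry, accumulate_step); the dedicated carry logic is assumed to behave like an ordinary full-adder carry, and the flip-flops are left out so that one call computes the value the register would load on the next clock.

```python
def acc_lut4(a, b, cin, mode):
    """4-input LUT: mode=1 computes the add bit, mode=0 forces zero."""
    return (a ^ b ^ cin) & mode


def dedicated_carry(a, b, cin):
    """Stand-in for the dedicated carry logic (assumed full-adder carry)."""
    return (a & b) | (cin & (a ^ b))


def accumulate_step(acc, data, mode, width):
    """Next accumulator value: acc + data when mode=1, zero when mode=0."""
    result = 0
    carry = 0
    for n in range(width):
        a_n = (acc >> n) & 1
        b_n = (data >> n) & 1
        result |= acc_lut4(a_n, b_n, carry, mode) << n
        carry = dedicated_carry(a_n, b_n, carry)
    return result
```

The carry is computed outside acc_lut4, so the fourth LUT input is free to act as the mode pin.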

The new Virtex carry chain also has dedicated logic to pass along the carry, but since the most common function of the carry is to complement the result of the current bit, they also provide that function with dedicated logic. This means that the carry-in is no longer an input to the 4-input LUT, and this frees up another input to the LUT. Given that two inputs are going to be A[n] and B[n], this means that we can have two mode pins, and choose any of four possible modes, instead of the two possible modes available in the 4000 series. There is another mode, which uses the "MULT_ADD", which I will discuss later.
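
A rough bit-level sketch of this arrangement follows. Several things here are assumptions rather than anything stated above: the names (lut4, virtex_slice_bit, carry_chain), the particular four functions loaded into the LUT, and the carry mux that passes CIN when the LUT output is 1 and A[n] otherwise. Subtraction is written in its two's-complement form, A + ~B + CIN.

```python
def lut4(a, b, m1, m0):
    """4-input LUT: two operand bits plus two mode pins (illustrative modes)."""
    mode = (m1 << 1) | m0
    if mode == 0:
        return a            # A + CIN
    if mode == 1:
        return a ^ b        # A + B + CIN
    if mode == 2:
        return a ^ (b ^ 1)  # A + ~B + CIN (two's-complement subtract)
    return 0                # A + A + CIN


def virtex_slice_bit(a, b, cin, m1, m0):
    """One bit: the LUT output is XORed with the carry by dedicated logic."""
    p = lut4(a, b, m1, m0)
    s = p ^ cin             # dedicated XOR, so CIN needs no LUT input
    cout = cin if p else a  # dedicated carry mux (assumed structure)
    return s, cout


def carry_chain(a, b, cin, m1, m0, width):
    """Ripple `width` bits through the chain; returns the result word."""
    result, carry = 0, cin
    for n in range(width):
        s, carry = virtex_slice_bit((a >> n) & 1, (b >> n) & 1, carry, m1, m0)
        result |= s << n
    return result
```

Because the carry-in never enters lut4, both m1 and m0 are available as mode pins.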

This Virtex carry mode (i.e. no MULT_ADD) has 5 obvious (maybe I overlooked something) arithmetic modes, and we can choose any four of them, in any order, for access with our two mode inputs. Here, A and B are the inputs, while S is the result, all variables in signed integer notation:

1) S[n:0] = A[n:0] + CIN
2) S[n:0] = A[n:0] + B[n:0] + CIN
3) S[n:0] = A[n:0] - B[n:0] - CIN
4) S[n:0] = A[n:0] + A[n:0] + CIN
5) S[n:0] = A[n:0] - A[n:0] - CIN = - CIN

The last mode gives all zeros or all ones, depending on the value of CIN. The fourth mode provides a left shift, with carry-in. But remember that we can only pick four of these five modes for any particular CLB slice.
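
The five modes can be checked at the word level with a short sketch. The function name carry_modes is hypothetical, and results are taken modulo 2^width so that a negative value shows up as its two's-complement bit pattern.

```python
def carry_modes(a, b, cin, width):
    """Return the five results, in list order, as `width`-bit patterns."""
    mask = (1 << width) - 1
    return [
        (a + cin) & mask,      # 1) A + CIN
        (a + b + cin) & mask,  # 2) A + B + CIN
        (a - b - cin) & mask,  # 3) A - B - CIN
        (a + a + cin) & mask,  # 4) A + A + CIN
        (a - a - cin) & mask,  # 5) -CIN
    ]
```

For example, carry_modes(5, 3, 1, 4) gives [6, 9, 1, 11, 15]: mode 4 yields 11, which is 5 shifted left with the carry-in in the low bit, and mode 5 yields 15, which is all ones in four bits.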

Notice that all of the above modes are based on A[n] having something added to or subtracted from it. The MULT_ADD feature allows us to kill off the A[n] input. It requires us to use one of our two mode inputs as the line that (when low) zeros the A[n] input. This means that there will still be four possible modes. Two will correspond to that mode line high, and can be chosen from the above list of five modes. The other two will correspond to the mode line low, and are just like the above, but have the first A[n:0] term zeroed:

1) S[n:0] = CIN
2) S[n:0] = B[n:0] + CIN
3) S[n:0] = -B[n:0] - CIN
4) S[n:0] = A[n:0] + CIN
5) S[n:0] = -A[n:0] - CIN
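
The second list follows from the first by zeroing the leading A term, which can be sketched as below. The names gated_modes and a_enable are hypothetical; a_enable stands for the mode line that zeros A[n] when low.

```python
def gated_modes(a, b, cin, width, a_enable):
    """Five modes with the first A term gated by the mode line."""
    mask = (1 << width) - 1
    first_a = a if a_enable else 0  # line low: first A term becomes zero
    return [
        (first_a + cin) & mask,      # 1) CIN when gated
        (first_a + b + cin) & mask,  # 2) B + CIN when gated
        (first_a - b - cin) & mask,  # 3) -B - CIN when gated
        (first_a + a + cin) & mask,  # 4) A + CIN when gated
        (first_a - a - cin) & mask,  # 5) -A - CIN when gated
    ]
```

With the line high the function reproduces the original five modes; with it low, only the second A in modes 4 and 5 survives.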

These are really cool choices.

The bestest thing about the freedom of these mode bits is that I can reduce the number of control lines I need by carefully choosing which modes I use. For instance, if I will always want to clear one register while another is performing an add, I can arrange for a single line to control both registers. This saves routing space.

I can only hope that when the Logiblox get out the door they will have the ability to create all the possible register choices.

My (slightly erroneous) calculation for the number of possible arithmetic functions:

(w/o MULT_ADD) 5 * 4! = 120.

(w/ MULT_ADD) 10 * 10 * 4! = 2400.

Total: 2520. (The error occurs because A[n:0] + CIN shows up in both lists.)
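
The arithmetic above can be reproduced in a few lines. The reading of the factors is an assumption: 5 is taken as the number of ways to pick four of the five modes, each 10 as the number of ways to pick two of five, and 4! as the orderings of the four chosen modes. The sketch repeats the same count, including the overlap noted above, rather than correcting it.

```python
from math import comb, factorial

# Without MULT_ADD: pick four of five modes, in any order.
without_mult_add = comb(5, 4) * factorial(4)

# With MULT_ADD: pick two modes from each list of five, then order all four.
with_mult_add = comb(5, 2) * comb(5, 2) * factorial(4)

total = without_mult_add + with_mult_add
print(without_mult_add, with_mult_add, total)
```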

Anyway, right now the problem I am having with the carry chains is that they are extremely sensitive to being distorted by the "logic trimmer". A logic trimmer removes unneeded logic, and the trimmer has a nasty tendency to screw up my logic if I have any constants input to it. It also screws up my logic if I fail to use my carry-outs. So I am faced with an unappetizing choice: either make special case macros for everywhere I add two numbers together that don't match in size (i.e. build an adder for the low 12 bits, and an incrementer for the high 4 bits), or muck around with a "TESTGND" signal that I arrange to always be zero, but that the mapper believes to be a real variable. Yech.
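
The special case macro mentioned above (an adder for the low 12 bits and an incrementer for the high 4 bits) can be sketched as follows. The function name add_mixed_width and the unsigned 16-bit plus 12-bit operand sizes are illustrative assumptions.

```python
def add_mixed_width(a16, b12):
    """Add a 12-bit value to a 16-bit value without constant inputs."""
    low = (a16 & 0xFFF) + (b12 & 0xFFF)  # 12-bit adder: A + B
    carry = low >> 12                    # carry out of the low section
    high = ((a16 >> 12) + carry) & 0xF   # 4-bit incrementer: A + CIN
    return (high << 12) | (low & 0xFFF)
```

The high section never sees the zero bits that would otherwise pad B out to 16 bits, so there is no constant for the trimmer to act on.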

Time for me to go to bed...

-- Carl