EDA Tools: Embedded-core design challenges DFT
When the design team for Motorola's 680X0 family of general-purpose microprocessors began creating a line of embedded processors, it posed a significant challenge to the design-for-test (DFT) team.
The ColdFire microprocessors faced severe market pressures different from those previously encountered. To meet the extremely competitive demands of the embedded-processor market, the design cycle had to be compressed to a matter of months, not years. On top of that, the target market price for most embedded applications is an order of magnitude less than for desktop microprocessors.
To produce a competitive processor core designed for high-performance, low-cost embedded applications, the ColdFire team had to reexamine and, in many cases, modify time-honored design techniques, especially for the DFT methodology. Most modern design methodologies focus on meeting two overall goals: reducing cycle time and optimizing design budgets for area, timing, power and so on. If these methodologies fail to address test, and the cost of test, then they address only half the problem.
Furthermore, DFT issues have a greater impact in embedded designs because a complex microprocessor core, which is difficult to test in the first place, is surrounded by control logic that blocks direct access between the core and package pins. Moreover, some of the ColdFire cores include embedded memory arrays, making test even more complicated.
Other, more general design issues compounded the DFT challenge. Gate sizes of 0.5 micron and below introduced new test-access and timing issues, notably those associated with the dominance of interconnect delay (the deep-submicron problem).
Considering these factors, the real DFT challenge is to provide test features in a silicon-optimized form and make these features accessible when the core is embedded in a customer's application. It was also necessary to accomplish this without noticeably impacting the design schedule. Because of these constraints, we decided to introduce and address testability issues as early in the design process as possible, while relying on a variety of sophisticated DFT techniques during implementation.
To ensure that DFT issues were fully addressed, we created and executed a "standard-optimized" test methodology. This methodology covers all test environments, including manufacturing-defect testing for general combinational and sequential logic. The strategy calls for on-chip test architectures for scan and memory test, with all of the test features organized, prioritized and controlled by a test-control unit.
Some of the ColdFire standard cores include embedded memory. In previous designs, Motorola had relied on direct-memory access for testing memories. This approach required wire routes from each memory array to the primary pins of the device, something that cannot be done when the ColdFire architecture is embedded in an application. The core itself has minimal access to the package pins. This means the memory arrays are doubly embedded, with no chip-package pins available to assign as memory-test pins.
We adopted a memory-BIST (built-in self-test) methodology to replace the direct-memory-access architecture. This shift eliminated that design-intensive approach, along with the routing and timing costs of bringing the memories' signals to the package pins.
Memory BIST also facilitated testing of doubly embedded memory arrays by requiring, at a minimum, only an invoke input signal and "done" and "fail" output signals. We used the Mentor Graphics Corp. MBIST Architect tool to create the BIST controller at the register-transfer level in Verilog and then synthesize it to logic.
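As an illustration, the sketch below shows a minimal memory-BIST controller built around the invoke/done/fail interface just described. It is a hedged sketch only: the module and signal names are invented for this example, it assumes a RAM with combinational read, and it runs a simple write-then-read pass rather than the full march algorithm a generated controller would implement.

    // Hypothetical, minimal memory-BIST controller illustrating the
    // invoke/done/fail interface. Assumes a combinational-read RAM; a
    // production controller would run a full march algorithm.
    module mbist_ctrl #(parameter AW = 8, DW = 16) (
      input  wire          clk,
      input  wire          rst_n,
      input  wire          invoke,   // pulse high to start self-test
      output reg           done,     // asserted when the test completes
      output reg           fail,     // sticky flag on any miscompare
      // interface to the memory under test
      output reg  [AW-1:0] addr,
      output reg  [DW-1:0] wdata,
      output reg           we,
      input  wire [DW-1:0] rdata
    );
      localparam [DW-1:0] PATTERN = {(DW/2){2'b01}};
      localparam IDLE = 2'd0, WRITE = 2'd1, READ = 2'd2, HALT = 2'd3;
      reg [1:0] state;

      always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
          state <= IDLE; addr <= 0; wdata <= 0; we <= 1'b0;
          done <= 1'b0; fail <= 1'b0;
        end else case (state)
          IDLE:  if (invoke) begin            // write pass: fill with pattern
                   addr <= 0; wdata <= PATTERN; we <= 1'b1;
                   done <= 1'b0; fail <= 1'b0; state <= WRITE;
                 end
          WRITE: if (addr == {AW{1'b1}}) begin
                   addr <= 0; we <= 1'b0; state <= READ;
                 end else addr <= addr + 1'b1;
          READ:  begin                        // read pass: compare each word
                   if (rdata != PATTERN) fail <= 1'b1;
                   if (addr == {AW{1'b1}}) state <= HALT;
                   else addr <= addr + 1'b1;
                 end
          HALT:  done <= 1'b1;                // hold done; reset to rerun
        endcase
      end
    endmodule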
There are two basic ways of implementing a memory-BIST test architecture. One method is to support one controller per array, known as BISTed memory arrays. No matter what is done with the memory array, the test circuitry moves with it as a single entity, which makes floor planning easier. The trade-off: more gates, fewer routes.
The other method creates one centralized chip-level controller that governs and tests multiple memories. The trade-off here is fewer gates, more routing. The choice of which of these methods to use depends on the number and size of the memory arrays, the expected routing congestion and the target-device geometry.
The memory-BIST logic was scanned so that it could be fault-tested and static-timing analyzed at the rated frequency. The BIST multiplexers could have a detrimental effect on the memory-access time, creating a frequency-limiting factor that could degrade the performance of the overall design. So instead of placing BIST multiplexers in the logic nearest to the memory, we placed the multiplexers one or more pipeline stages up the data path. The BIST controller accounts for the extra clock cycles when comparing the memory outputs, so no extra delays are added to the memory-access time for test or system operation.
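The fragment below sketches that placement, under assumed names. The BIST mux is registered one or more pipeline stages upstream of the memory rather than at the memory inputs, and the expected data is delayed through a matching pipeline so the controller's compare stays cycle-aligned:

    // Hedged sketch of the upstream BIST-mux placement; names invented.
    // The mux output is registered before the memory, so no extra delay
    // lands on the memory-access path; the expected value is delayed by
    // the same number of stages so the compare is cycle-aligned.
    module bist_mux_upstream #(parameter DW = 16, STAGES = 1) (
      input  wire          clk,
      input  wire          bist_mode,
      input  wire [DW-1:0] func_wdata,  // normal functional write data
      input  wire [DW-1:0] bist_wdata,  // write data from BIST controller
      input  wire [DW-1:0] mem_rdata,   // read data back from the memory
      output reg  [DW-1:0] mem_wdata,   // registered data into the memory
      output wire          miscompare   // reported back to the controller
    );
      reg [DW-1:0] expect_pipe [0:STAGES-1];
      integer i;

      always @(posedge clk) begin
        // Mux placed up the data path, not at the memory inputs.
        mem_wdata <= bist_mode ? bist_wdata : func_wdata;
        // Delay the expected data by the same number of stages.
        expect_pipe[0] <= bist_wdata;
        for (i = 1; i < STAGES; i = i + 1)
          expect_pipe[i] <= expect_pipe[i-1];
      end

      assign miscompare = bist_mode && (mem_rdata != expect_pipe[STAGES-1]);
    endmodule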
We used scan to test all general combinational and sequential logic. The key issue when creating a scan DFT environment was to lower test cost compared with using functional vectors. The scan approach should yield higher coverage, use fewer clock cycles and require less tester memory.
Therefore, our goal was to produce vectors with a shift depth of about 100 clock cycles. Furthermore, the scan interface and the internal scan circuitry had to operate at or above the functional frequency, because the core needed to be tested at speed.
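The basic building block implied here is the mux-D scan cell, sketched below with generic names: in scan mode the flop captures the serial scan input instead of its functional input, so the flops link into a shift register.

    // Generic mux-D scan cell: in shift mode the flop captures the serial
    // scan input instead of its functional data input.
    module scan_ff (
      input  wire clk,
      input  wire scan_en,  // 1 = shift, 0 = functional capture
      input  wire d,        // functional data input
      input  wire scan_in,  // serial input from the previous cell
      output reg  q         // functional output and next cell's scan_in
    );
      always @(posedge clk)
        q <= scan_en ? scan_in : d;
    endmodule

Chaining each cell's q to the next cell's scan_in forms the scan chain; with chains of roughly 100 cells, loading one test pattern costs about 100 shift cycles plus a capture cycle, matching the shift-depth goal noted above.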
Another way we cut test costs was to use only a single tester edge set. We achieved single-edge-set testing by architecting the scan so that the tester-pin timing in scan mode was identical to that of functional mode. To achieve that effect, we borrowed many of the functional pins for the scan interface, maintaining the functional-pin timing during scan mode.
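In essence, each borrowed pin is steered by a mode select, as in this illustrative fragment (the signal names are assumptions, not the actual ColdFire pin map):

    // Illustrative pin borrowing: in scan mode a functional pin carries
    // scan traffic with unchanged pin timing.
    module borrowed_pin (
      input  wire scan_mode,
      input  wire func_out,        // the pin's normal functional signal
      input  wire scan_chain_out,  // serial output of a scan chain
      output wire pad_out          // to the package pin
    );
      assign pad_out = scan_mode ? scan_chain_out : func_out;
    endmodule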
The scan input and output interfaces and the dynamic safe-shifting logic were modeled and synthesized, and the timing was analyzed with the rest of the chip to ensure that the scan interfaces could achieve at-speed timing. The biggest challenge was to make at-speed and single-edge-set scan a simple exercise for the design team, with minimal impact on the schedule. Because the scan enable was a critical timing path operating at functional speed, it was treated as a clock tree to manage the skew and delay across the chip.
The scan cells were inserted at the physical net-list level and connected in an optimal fashion based on physical location, ensuring the scan connections would operate at functional speed. This approach had the added benefit of minimizing metal usage and relieving routing congestion.
The scan verification process was done in several stages on the ColdFire products, with all test structures modeled in parallel with the functional design. Very early in the design process, a net-list of the evolving scan architecture was created and run through rapid DFT analysis.
The early-prototype net-list with scan made it possible to use our DFT tools, the Mentor Graphics DFT Advisor and FastScan, to check the compliance of the scan circuitry. The net-list was also used for early vector generation. As the design matured, we repeated that process to develop a prototype scan interface that ultimately became identical to the final interface, with multiple scan chains based on borrowed functional pins.
This rapid DFT analysis also allowed us to run practice net-lists during the overall DFT analysis and test-generation process. This standard procedure is applied to ensure subsequent design changes do not violate the scan rules and to assess the various scan parameters. We used FastScan to analyze these net-lists to verify several important parameters.
First, the tool was used to determine whether the process-control scripts were valid and correct; then we examined the fault coverage to ensure that it met specifications and to uncover any design problems. After that, the tool analyzed whether the vector sizing was within tester size and cost goals, giving us a good approximation of the ATPG tool's run-time.
As the overall design evolved toward its final architecture, the measurements also approached their final form. These measurements were often used during the design process to adjust the scan architecture itself. For example, as the design matured, the number of flip-flops to be included in the final design became apparent.
The scan architecture was also used for burn-in testing. This was done by connecting all the scan chains into a single chain outside the package and then driving this single scan chain from a pseudo-random pattern-generator LFSR (linear-feedback shift register) and capturing the output data in a multiple-input signature register.
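The sketch below shows one way such a pattern-generator and signature-register pair might look; the 16-bit width and feedback polynomial are illustrative choices, and with a single concatenated chain the signature register effectively has one input.

    // Hedged sketch of the burn-in arrangement: an LFSR drives the single
    // concatenated scan chain and a signature register compacts its output.
    // Width and polynomial (x^16 + x^15 + x^13 + x^4 + 1) are illustrative.
    module burn_in_prpg_misr (
      input  wire        clk,
      input  wire        rst_n,
      input  wire        chain_out,  // serial output of the scan chain
      output wire        chain_in,   // serial stimulus into the scan chain
      output reg  [15:0] signature   // compared against a golden signature
    );
      reg [15:0] prpg;

      // Pseudo-random pattern generator: maximal-length LFSR.
      always @(posedge clk or negedge rst_n)
        if (!rst_n) prpg <= 16'hFFFF;   // any non-zero seed
        else        prpg <= {prpg[14:0],
                             prpg[15] ^ prpg[14] ^ prpg[12] ^ prpg[3]};

      assign chain_in = prpg[15];

      // Signature register: the chain's output is folded into the LFSR
      // feedback each cycle. With one chain this degenerates to a
      // single-input signature register; multiple chains would each XOR
      // into a different stage of a true MISR.
      always @(posedge clk or negedge rst_n)
        if (!rst_n) signature <= 16'h0000;
        else        signature <= {signature[14:0],
                                  signature[15] ^ signature[14] ^
                                  signature[12] ^ signature[3] ^ chain_out};
    endmodule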
By combining a standard-optimized test methodology with an integrated set of DFT design tools that provided a full DFT strategy, we were able to create a wealth of test features and capabilities on each ColdFire product.