Good catch, Iceburg. It looks like Brocade, the 1999 King of the degraded loop, continues to give the fibre channel industry a bad reputation with its cheap PR stunts....and poor quality work. Fortunately, that kind of shoddy workmanship always backfires against the culprit in this industry since it involves the information lifeblood of their customers. Brocade will do well to keep on associating itself with the metro optical networking market -- the most crowded part of optical networking -- because once the backlash from the customer begins............
Tick, tock, tick, tock.<g>
The price of high-quality work is actually not that high. Outside of EMC and IBM, McDATA's average revenue per customer is about $70,000. This includes mandatory network planning, which typically covers application analysis, RAID level options (mirrored and parity-based), server consolidation, backup strategies, and expansion options. Inrange and QLogic are presumably selling their Directors in this manner.
More on the inherent limitations of inter-switch links:
Reliability Lives at the Heart
Because the SAN is the critical connection between servers and storage, redundancy is essential. Without it, servers and their mission-critical applications will fail if connections are broken. To ensure this high level of redundancy, every server and storage device must have at least two independent connections to the SAN. In some cases, storage devices may require more than two connections to accommodate high-volume traffic. Tape libraries and backup devices should also have dual connections to ensure uninterrupted backups ...........
Redundant SAN Configurations
There are two types of high-performance Fibre Channel SAN interconnect devices—Directors and fabric switches. A Director is a class of large switch, offering 32 or more ports, with high-availability and redundant features embedded throughout the design. A fabric switch provides 8 or more ports for connecting servers and storage with little or no built-in redundancy. Redundant SANs can be designed with multiple fabric switches or Directors, each offering different degrees of reliability and availability.
A Director-based SAN scales smoothly from 1 to 32+ ports. Multiple Directors can be inter-connected to support higher port counts. While multiple Directors are not required for redundancy, an individual Director offers built-in redundancy and non-disruptive serviceability of all components.
In contrast, if a critical component of a single fabric switch fails — such as a CPU or memory board — the entire switch must be replaced, resulting in downtime. Consequently, multiple fabric switches are required to achieve fault tolerance. Each server or storage device must be connected to two different switches, so that if a single switch fails, a server or storage device is not completely isolated. To carry traffic between these switches and ensure continuity of operations, the inter-switch links must also be redundant. Moreover, a spare switch must be on-hand for quickly replacing failed units.
The drawbacks of using inter-switch links are both reduced aggregate bandwidth for applications and fewer available switch ports for connecting servers or storage, as shown in Figure 2.
Figure 2 Multi-Switch Port Utilization
Number of    Total    Inter-Switch    Ports Available for
Switches     Ports    Link Ports      Servers and Storage
    2          32         12                  20
    3          48         18                  30
    4          64         24                  40
Director       32          0                  32
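For reference, the arithmetic behind Figure 2 can be reproduced with a short sketch. The figures of 16 ports per switch and six inter-switch link ports per switch are assumptions inferred from the table itself rather than stated outright:

    # Sketch of the Figure 2 arithmetic (assumed: 16-port fabric switches,
    # six ports per switch consumed by redundant inter-switch links).
    PORTS_PER_SWITCH = 16
    ISL_PORTS_PER_SWITCH = 6  # inferred from the table, not stated in the text

    def port_utilization(switches):
        total_ports = switches * PORTS_PER_SWITCH
        isl_ports = switches * ISL_PORTS_PER_SWITCH
        device_ports = total_ports - isl_ports  # left for servers and storage
        return total_ports, isl_ports, device_ports

    for n in (2, 3, 4):
        print(n, *port_utilization(n))  # -> 2 32 12 20, 3 48 18 30, 4 64 24 40

A 32-port Director, by contrast, needs no inter-switch links, so all 32 ports remain available for servers and storage.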
In addition to device robustness, scalability is another important consideration when designing a SAN. For example, a non-blocking fabric cannot be configured with more than four 16-port fabric switches using redundant inter-switch links — there are physically not enough ports on a 16-port switch. Growing beyond this point would require a configuration of five or more switches, resulting in awkward data path routing that extends across at least three of the switches. Furthermore, such a complex configuration is impractical to manage, maintain, upgrade, and troubleshoot using current web-based diagnostic and device management tools.
Planning for Failure
While redundancy ensures continuous operation despite a broken connection, the SAN design should also address the potential effects of a single component failure — namely, avoiding downtime due to performance degradation. Without consistent, true high availability, a SAN is ineffective. True high availability ensures predictable application behavior even during failures. These failure scenarios include: blocking caused by path failover software, port card failure on a Director, and switch failure.
Servers with redundant connections to the SAN typically run path failover software to detect broken SAN connections and redirect traffic over the remaining connections. While this software is extensively tested, it is still impossible to predict and test every circumstance, given the number of applications that may run simultaneously on any one server. When a SAN connection fails, the path failover software takes 60-90 seconds to reconfigure itself and re-route traffic. During this delay, some applications may fail due to the resulting blocking. For this reason, it is best to minimize the likelihood of invoking path failover software. And although a variety of path failover packages are on the market, the practicality of even using them in heterogeneous operating environments must still be evaluated.
The port card, where four SAN cables attach, is the only non-redundant component within a Director. If a port card fails, as many as four servers could be affected. Consequently, until the port card is replaced, the SAN's total throughput is reduced by 4 GB/s — potentially causing 12.5 percent blocking. Based on testing by C.L.A.M. Associates, the estimated time to replace a four-port card is one minute. For the calculations in this document, a replacement time of five minutes is used, allowing for the SAN manager's reaction time to the failure message. Best practices for avoiding performance degradation during port card failure or repair dictate that connections be spread across as many port cards as are available.
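The 12.5 percent figure follows from the port count rather than the link speed: a failed four-port card takes 4 of a 32-port Director's ports out of service. A minimal check, assuming a fully populated 32-port Director:

    # Worked check of the port card failure figure (assumed: fully populated
    # 32-port Director, one failed four-port card).
    director_ports = 32
    ports_per_card = 4
    blocking = ports_per_card / director_ports  # fraction of ports out of service
    print(f"{blocking:.1%}")  # -> 12.5%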
Although a fabric switch may offer redundant power supplies and cooling fans, the active switching components within the switch are not redundant. If one of those non-redundant components should fail, the entire switch must be replaced and the new one configured. This process affects all servers and storage ports that are directly attached, servers that access the directly attached storage, and servers that access a data path that is routed through that switch via an inter-switch link. Thus, a single switch failure could cause ten or more servers to undergo path failover. So, while redundant power supplies and cooling fans are important, they do not constitute the basis for a highly available SAN infrastructure. An organization must seriously consider this point in planning a SAN.
Until the switch is replaced and back on-line, the total throughput of the SAN is reduced, as shown in Figure 4. The reduced throughput creates a new potential for blocking, in addition to the existing inter-switch link blocking, because the full traffic load is now directed through the remaining ports. Based on testing by C.L.A.M. Associates, it takes approximately one hour to replace, configure, and bring on-line a 16-port fabric switch in ideal laboratory conditions.
Figure 4 Potential Blocking Resulting from Switch Failure
Switches in   Total   Ports   Original Server &   Remaining Server &   Potential Blocking
Config.       Ports   Lost    Storage Ports       Storage Ports        Perf. Loss
    2           32      16           20                   10                 50%
    3           48      16           30                   20                 33%
    4           64      16           40                   30                 25%
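The percentages in Figure 4 can likewise be reproduced by carrying over the Figure 2 assumption that each 16-port switch contributes ten server/storage ports and six inter-switch link ports:

    # Sketch of the Figure 4 arithmetic (assumed: each failed 16-port switch
    # takes ten server/storage ports out of service, per the Figure 2 split).
    DEVICE_PORTS_PER_SWITCH = 10  # assumption carried over from Figure 2
    PORTS_PER_SWITCH = 16

    def switch_failure(switches):
        original = switches * DEVICE_PORTS_PER_SWITCH
        remaining = original - DEVICE_PORTS_PER_SWITCH
        blocking = DEVICE_PORTS_PER_SWITCH / original  # potential perf. loss
        return original, remaining, blocking

    for n in (2, 3, 4):
        o, r, b = switch_failure(n)
        print(n, o, r, f"{b:.0%}")  # -> 2 20 10 50%, 3 30 20 33%, 4 40 30 25%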
(Source: "True High Availability - Downtime is not an option," mcdata.com)