CUTTING WAN LOSSES: Overcoming Packet Delivery Challenges That Can Jeopardize Disaster Recovery Initiatives

While packet loss and out-of-order packets are a mere nuisance when supporting typical data applications like file transfer and email, they are a very serious problem when performing data replication, backup and other disaster recovery (DR) functions across the Wide Area Network (WAN). In the former scenario, the applications can typically recover from lost or out-of-order packets by retransmitting the lost data. Performance might suffer, but the results are not catastrophic. Disaster recovery applications, however, do not have the same luxury. If packets are lost, throughput can drop so significantly that the backup/recovery processes cannot be completed in a reasonable timeframe (if at all). Packet loss and packet ordering are therefore the central packet delivery challenges for DR traffic. What can be done to overcome them?

KEY WAN CHARACTERISTICS

More and more enterprises are turning to Multi Protocol Label Switching (MPLS) and Internet Protocol Virtual Private Networks (IP-VPNs) as primary technologies for Wide Area Networking. For one, they provide an easy way of supporting a meshed topology, whereby all distributed locations can communicate easily with one another. In addition, because they leverage a shared infrastructure (such as a carrier backbone or the Internet), bandwidth can be delivered at relatively low price points.

The same thing that makes MPLS and IP VPN technologies so attractive, however, also creates some challenges. Most notably, shared routers can become oversubscribed in these environments, resulting in high levels of packet loss and out-of-order packet delivery. (Note: out-of-order packets often lead to packets eventually being dropped. As such, out-of-order delivery is often lumped under the larger umbrella of "packet loss," as in this article.)
These issues often go undetected because the typical service level agreement for MPLS allows the service provider to lose a few tenths of a percent of the packets and still meet its commitment. In addition, packet loss is typically calculated as the arithmetic mean of loss measurements taken over a month and across a large number of circuits. As such, the actual packet loss could be quite high for multiple hours of the day on several circuits even though the service provider has still met its contractual requirements.

THE BIGGEST LOSER

The effect of packet loss on WAN behavior has been widely analyzed. In MPLS and IP VPN environments, it is common to see averages of 0.5 percent packet loss with peaks reaching five percent. These problems can lead to excessive re-transmissions, which limit the effective throughput of data transfers across the WAN.
Figure 1: Impact of Packet Loss on Throughput

Even the slightest amount of loss can have a huge effect on a WAN. The problem is exacerbated as the amount of traffic being transferred increases, making packet loss a big issue on high capacity WAN links. For example, one percent loss on a typical WAN (with 100 ms latency) will result in a maximum throughput of about one Mbps, regardless of how big the WAN link actually is. This doesn't present a problem for typical office applications, but it is catastrophic to a replication application that needs several hundred Mbps of throughput to stay in sync.
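The one Mbps figure can be sanity-checked with a short calculation. The article does not say which model it relies on; the sketch below assumes the widely cited Mathis approximation for a single TCP flow (throughput is bounded by MSS / (RTT x sqrt(loss))), with the model's constant factor set to 1 and an assumed segment size of 1,460 bytes. The function name is hypothetical.

```python
import math


def max_tcp_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on single-flow TCP throughput in bits per second.

    Assumption: the simplified Mathis approximation, MSS / (RTT * sqrt(p)),
    with the model's constant factor taken as 1. Link capacity does not
    appear in the formula, which is why a bigger link does not help.
    """
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in the range (0, 1]")
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    return (mss_bytes * 8) / (rtt_s * math.sqrt(loss_rate))


# Assumed inputs: 1,460-byte segments, 100 ms latency, one percent loss.
bound = max_tcp_throughput_bps(mss_bytes=1460, rtt_s=0.100, loss_rate=0.01)
print(f"{bound / 1e6:.2f} Mbps")
```

Under these assumptions the bound comes out at roughly 1.17 Mbps, in line with the figure quoted above. Because the bound scales with the inverse square root of the loss rate, cutting loss by a factor of 100 raises it only tenfold.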
Figure 2: Effective throughput is reduced to < 5 Mbps with as little as 0.075% packet loss

TECHNIQUES FOR COPING WITH LOSS

Numerous WAN optimization techniques have been developed to mitigate the impact of packet loss, particularly on high bandwidth WAN connections. One such technique is Forward Error Correction (FEC). While FEC has long been used at the physical layer to ensure error-free transmission with a minimum of re-transmissions, it has recently been adapted to work at the network layer to improve the performance of applications such as data replication.

The basic premise of FEC is that an additional error recovery packet is transmitted for every 'n' packets that are sent. The additional packet enables the network equipment at the receiving end to reconstitute any one of the 'n' packets if it is lost, and hence negates the actual packet loss. The ability of the equipment at the receiving end to reconstitute lost packets depends on how many packets were lost and how many extra packets were transmitted. When one extra packet is carried for every ten normal packets (1:10 FEC), a one percent packet loss can be reduced to less than 0.09 percent. If one extra packet is carried for every five normal packets (1:5 FEC), a one percent packet loss can be reduced to less than 0.04 percent. Transmitting a 10 MB file without FEC would take a minimum of 22.3 seconds. Using a 1:10 FEC algorithm would reduce this to 2.1 seconds, and a 1:5 FEC algorithm would reduce it to 1.4 seconds.

Because FEC adds overhead, it must adapt to packet loss to be most effective. For example, if a WAN link is not experiencing packet loss, no extra FEC packets should be transmitted. When loss is detected, the algorithm should begin to carry extra packets and should increase the number of extra packets as the amount of loss increases.

Packet Order Correction (POC) is another technique that is vital to addressing packet delivery challenges.
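To make the FEC premise described above concrete, here is a minimal sketch of one way a single recovery packet can rebuild a single lost packet in a group of 'n'. It assumes a simple XOR parity scheme over equal-length packets; the article does not specify the coding scheme that real WAN optimization equipment uses, and the function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Build the one extra recovery packet for a group of n data packets."""
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one lost packet (marked None) from the parity packet."""
    missing = [i for i, pkt in enumerate(received) if pkt is None]
    if not missing:
        return list(received)
    if len(missing) > 1:
        # One parity packet carries enough information for one loss only.
        raise ValueError("cannot repair more than one loss per group")
    present = [pkt for pkt in received if pkt is not None]
    # XOR of the parity with every surviving packet yields the missing one.
    rebuilt = reduce(xor_bytes, present, parity)
    repaired = list(received)
    repaired[missing[0]] = rebuilt
    return repaired


# 1:5 FEC: five data packets protected by one parity packet.
group = [b"pkt0", b"pkt1", b"pkt2", b"pkt3", b"pkt4"]
parity = make_parity(group)
damaged = [b"pkt0", b"pkt1", None, b"pkt3", b"pkt4"]  # packet 2 lost in transit
print(recover(damaged, parity) == group)
```

The sketch also shows why the ratio matters: a smaller group (1:5 rather than 1:10) makes it less likely that two packets in the same group are lost, at the cost of more overhead, which is the trade-off that adaptive FEC tunes.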
The goal of POC is to overcome out-of-order packets. It works by tagging packets before they are sent across the WAN so that they can be re-sequenced on the far end of a WAN link, avoiding the re-transmissions that occur when packets arrive out of order. POC is performed in real time and across all IP flows (regardless of transport protocol), making it an effective WAN optimization tool.

Packet re-ordering and FEC can both be performed either in the router or in a separate appliance. If the network is supporting just typical data applications, either approach will work. However, data replication applications are more demanding than typical data applications. In a data replication application, the data flow is constant and the session never terminates. As such, a router's buffers have little chance of handling the volume of data required for FEC and POC. It is therefore typically recommended that FEC and POC be performed in a standalone WAN optimization appliance, especially in disaster recovery and other high volume environments.

SUMMARY

Most IP networks are designed to support tens or even hundreds of applications, each with varying characteristics. Some applications, such as VoIP, email, SQL and Citrix, require a modest amount of bandwidth for only a few minutes. Even most file and web transfers only send data for a bounded amount of time. Data replication and backup applications, on the other hand, are quite different. They require moving massive amounts of information on high-capacity WAN links. In addition, at start-up the typical data replication application spikes to line speed and keeps transmitting at that rate indefinitely. As such, Wide Area Networks must be designed to support these unique requirements. Packet loss and out-of-order packets are one area that requires special consideration, as these issues can severely limit effective throughput regardless of the actual size of the WAN link.
As a result, techniques such as packet re-ordering and adaptive FEC have become a major component of WAN optimization in disaster recovery scenarios. When these techniques are used in conjunction with other WAN optimization tools that address bandwidth and latency challenges (such as data reduction, compression and Quality of Service), enterprises have a complete arsenal for fast and reliable backup, replication and disaster recovery across the WAN.

Jeff Aaron is the director of product marketing at Silver Peak Systems.