next up previous
Next: 6.4 Flow Measurement Up: 6. Related work Previous: 6.2 Circuit switching in

Subsections

6.3.1 Optical Packet Switching (OPS)
6.3.2 Optical Burst Switching (OBS)
6.3.3 Performance of OPS/OBS


6.3 Packet switching in the optical domain

Chapters 4 and 5 and Section 6.2 have described two ways of using high-capacity all-optical circuit switches by integrating circuit-switched clouds with the rest of the Internet that uses packet switching. Several researchers have proposed all-optical packet-switched routers instead.

El-Bawab and Shin [68] give an overview of the state of the art in the technologies underlying all-optical packet switching, such as those for 3R regeneration (Mach-Zehnder interferometers based on semiconductor optical amplifiers (SOAs), soliton transmission, and self-pulsating distributed-feedback lasers), packet delineation and synchronization (fiber delay lines), packet header processing (optical-to-electronic (O/E) conversion, subcarrier multiplexing, and Michelson interferometers), optical buffering (fiber delay lines), optical space switching (SOAs and LiNbO3 crossconnects), and wavelength conversion (SOAs with cross-phase or cross-gain modulation, O/E/O conversion, and wave mixing).

El-Bawab and Shin state that major technological challenges need to be overcome before optical packet switching becomes viable. Many of the enabling technologies are still in the research and exploration stages, and so it is premature to build a commercial all-optical router. Buffering and per-packet processing are the basis of packet switching, and they remain the most important obstacles to the implementation of an optical router. Through reflection, refraction and diffraction, we know how to bend, multiplex and demultiplex light, but we (still) do not know how to store as much information in optics as in an electronic DRAM, or how to process information in optics as fast as in an electronic ASIC. Current efforts in high-speed optical storage and processing [109,151,178] are still too crude and complex to be usable. With current optical storage approaches, information degrades fairly rapidly (the longest holding times are around 1 ms), and these approaches can only be tuned for specific wavelengths. In other areas, such as signal regeneration, packet synchronization, space crossconnects and wavelength conversion, progress has been made, but scalability, reliability and cost remain unsolved issues. In any case, even if some of the technology on which optical packet switching depends is not yet available, one can still study its performance to see what could be achieved once the technology has been developed.

The family of solutions that does packet switching in optics can be further subdivided into two based on the size of the switching units: Optical Packet Switching (OPS) switches regular IP packets, whereas Optical Burst Switching (OBS) deals with ``bursts'', units that are larger and encapsulate several IP packets.

6.3.1 Optical Packet Switching (OPS)

Optical Packet Switching (OPS) [186,185] is the simplest and most natural extension of packet switching to optics. It consists of sending IP packets directly over an all-optical backbone. The biggest challenge for an optical packet switch is the lack of large buffers to hold packets during times of contention. As a rule of thumb, routers provide RTT×bandwidth worth of buffering [182] so that TCP congestion control works well. For an OC-192c link and an average packet length of 500 bytes, this is equivalent to a buffer space of 625,000 packets. In contrast, existing optical buffering techniques based on fiber delay lines can accommodate at most a few tens of packets. With such small buffers, the packet drop rate of an optical packet switch is quite high even at moderate loads.
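A back-of-the-envelope check of these numbers (a sketch; the 250 ms round-trip time is an assumption, chosen because it reproduces the 625,000-packet figure in the text):

```python
# Rule-of-thumb buffer sizing: RTT x bandwidth [182].
# Assumptions: OC-192c rounded to 10 Gbit/s, RTT = 250 ms,
# 500-byte average packets.
link_rate_bps = 10e9
rtt_s = 0.250
avg_packet_bytes = 500

buffer_bits = rtt_s * link_rate_bps                  # bandwidth-delay product
buffer_packets = buffer_bits / (avg_packet_bytes * 8)
print(int(buffer_packets))                           # 625000
```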

OPS tries to overcome the lack of buffers by combining two other techniques to resolve contention: wavelength conversion and deflection routing. If two packets arrive simultaneously and there are no local buffers left, the optical packet switch first tries to find a free wavelength in the same fiber; if it cannot find one, it tries another fiber that does not have contention. The number of wavelengths is expected to be between 4 and 512, and the number of neighboring nodes fewer than 10.

OPS has some shortcomings. One is that there is not much room to resolve contention: multiplying the options given by the three dimensions (fiber delay lines, wavelength conversion and path deflection) yields fewer than (10 - 50 packets/FDL)×(4 - 512 wavelengths/fiber)×(2 - 10 neighbors) = 80 - 256,000 options. This may seem close to the number of choices offered by the electrical buffers in a router (625,000 packets for a 10-Gbit/s link), but the number of degrees of freedom is in fact much smaller, since numerous dependencies limit the choice. Moreover, packets that are bounced to different paths may cause congestion in other wavelengths or other parts of the network, spreading local congestion across larger areas of the network. In addition, packets no longer follow the same path, and so they may arrive out of order, which TCP may interpret as losses due to congestion and thus throttle back its rate. Packet reordering within a TCP session also causes unnecessary retransmissions, prevents the congestion window from growing properly and degrades the quality of the RTT estimator in TCP [12,17].
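The bounds above follow from multiplying the three dimensions directly:

```python
# Bounds on the number of contention-resolution options: product of
# delay-line slots, wavelengths per fiber, and deflection neighbors.
lo = 10 * 4 * 2        # smallest values of each dimension
hi = 50 * 512 * 10     # largest values of each dimension
print(lo, hi)          # 80 256000
```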

A perceived problem of OPS is that IP packets are too short for some optical crossconnects to be reconfigured between them. A 40-byte packet takes only 32 ns to be received on an OC-192c link, and only 8 ns on an OC-768c link; by contrast, MEMS mirrors have tilting times of over 1 ms. For this reason, several researchers have proposed switching bigger units, called bursts, in an architecture called Optical Burst Switching.
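These serialization times follow directly from the packet size and the line rate (a sketch, with line rates rounded to 10 and 40 Gbit/s):

```python
# Serialization time of a 40-byte packet versus MEMS reconfiguration time.
packet_bits = 40 * 8
t_oc192c = packet_bits / 10e9        # ~32 ns on OC-192c
t_oc768c = packet_bits / 40e9        # ~8 ns on OC-768c
mems_tilt = 1e-3                     # mirror tilting time, over 1 ms

print(t_oc192c * 1e9, "ns;", t_oc768c * 1e9, "ns")
print(mems_tilt / t_oc192c)          # the mirror is ~31,000 packet-times slow
```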

6.3.2 Optical Burst Switching (OBS)

Optical Burst Switching (OBS) was proposed in [155,177]; it is a hybrid between packet switching and circuit switching. OBS pushes buffering to the edges of the network, where the switches are electronic, leaving no buffers in the optical core. At the ingress nodes of the backbone, OBS gathers data into bursts using large electronic buffers, until the node has enough data or a burst-formation timeout occurs. At this point, the burst is sent through the all-optical core. In general, the burst is preceded by an out-of-band signaling message that creates a lightweight circuit with an explicit or implicit teardown time, through which the burst is sent, as shown in Figure 6.3. If the circuit is successfully created, the burst traverses it, and the circuit is destroyed once the burst has finished.

Figure 6.3: Sample time diagram of a network using Optical Burst Switching.

If during circuit establishment there is no bandwidth left for the burst, the node can temporarily buffer the burst in the limited space of its local fiber delay lines, or it can try to deflect the burst's circuit to another wavelength or another fiber. If none of these options is available, the incoming burst is dropped at that node. From the point of view of the user flows, the behavior of OBS is closer to OPS than to traditional circuit switching. Under contention, OPS drops information from at least one active flow at an intermediate node; with traditional circuit switching, new flows are blocked (buffered) at the ingress, but old, active flows are unaffected. In traditional circuit switching, once a flow has been accepted, it is guaranteed a data rate and freedom from contention. For this reason, the end user does not perceive OBS as a circuit-switched network, but rather as a packet-switched one that switches large packets, as shown in Figure 6.3.

There are different types of OBS, distinguished essentially by their degree of signaling complexity. The high rate of burst formation in the core makes the proposals with the simplest signaling the most interesting, i.e., those with ``best-effort'' reservation that do not wait for confirmations. The two most popular flavors of OBS are Just-In-Time (JIT), which uses circuits of open-ended duration that are closed by an explicit ``release'' message from the ingress node, and Just-Enough-Time (JET), which specifies the circuit duration explicitly when the circuit is created [6].

With OBS, data is sent in batches, as opposed to being streamed as with regular IP or with traditional circuit switching, such as the proposals of Chapters 4 and 5. This affects TCP, which relies on packet timing to pace its transmissions. Moreover, with OBS delivery is best effort, and so a burst may be lost. Since TCP takes the loss of three consecutive packets as a sign of congestion, the loss of a long burst is expensive: it makes TCP sources throttle back their transmission rate, so the effect of the burst loss rate is amplified by TCP. These two interactions of OBS with TCP are only noticeable when bursts are very long, i.e., when each burst carries several packets belonging to the same user flow. TCP's flow and error control will thus set a limit on the maximum burst size, a limit that depends on the rates under consideration.

OBS uses electrical buffers at the ingress to aggregate regular IP packets destined to the same egress node into bursts. The aggregation reduces the number of forwarding decisions that the OBS nodes have to make, so that they can be made electronically. The trade-off is that OBS requires more buffering at the ingress of the optical backbone than the optical circuit-switching solutions, because IP packets in OBS have to wait until the next burst departs, whereas with circuit switching, packets belonging to active circuits are sent as soon as they arrive. Furthermore, in TCP Switching, the circuits have the same capacity as the access link, and hence they are not the bottleneck in the flow path; consequently, queueing at the circuit head is unusual.

6.3.3 Performance of OPS/OBS

We can use the ``end-user response time'' to compare the performance of these two related techniques. Let me start with OBS. According to [139], if we ignore retransmission timeouts and operate in the absence of window-size limitations, we can write the average throughput of TCP as:

Average throughput $ \propto \frac{1}{RTT\sqrt{2bp/3}}$

where RTT is the round-trip time, p is the packet drop probability and b is the number of packets acknowledged per ACK message. The first thing to notice is that the longer the burst, the more TCP data and acknowledgment packets get bundled together into OBS bursts, which increases the value of b. In addition, the small amount of buffering in OBS is not enough to resolve the contention among bursts, and so the drop rate is larger than with regular electronic packet switching. For example, for a system load of 50% and four wavelengths per link, the drop rates for open-loop traffic with OBS are between 0.1% and 2% [186,188], whereas the drop rates of electronic packet switching are typically several orders of magnitude lower. Using an M/M/k/k + d model, where k is the number of wavelengths per link and d the number of fiber delay lines, Yoo et al. [188] show that the drop rate decreases exponentially with the number of wavelengths, k.
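A numeric sketch of what this proportionality implies (the values of RTT, p and b below are illustrative assumptions, not measurements from the cited studies):

```python
import math

def tcp_throughput(rtt_s, p, b):
    # Average TCP throughput, up to a constant factor, ignoring
    # retransmission timeouts and window-size limits [139].
    return 1.0 / (rtt_s * math.sqrt(2 * b * p / 3))

# Electronic packet switching: low drop rate, delayed ACKs (b = 2).
base = tcp_throughput(0.100, p=1e-4, b=2)
# OBS-like conditions: higher drop rate, more ACKs bundled per burst.
obs = tcp_throughput(0.100, p=1e-2, b=8)
print(base / obs)       # throughput penalty = sqrt(400) ~ 20x
```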

Furthermore, the burst-formation time in OBS increases the RTT, which reduces the average throughput of TCP and, thus, increases the user response time. Simulations using ns-2 suggest that even with a long burst-formation latency of 50 ms, OBS leads to response times that are only about 10% slower than electronic packet switching, and so one can conclude that their user response times are comparable.

Figure 6.4: Topology used in [186] to simulate the effect of Optical Packet and Burst Switching on TCP. The core wavelengths were carrying bursty IP traffic in the background.

The previous arguments about the burst/packet losses in OBS/OPS seem to call into question the end-user performance of OBS/OPS even under moderate loads, because of the high losses in the unscheduled optical cloud. However, some authors [86,188] have analyzed and performed open-loop simulations of OPS/OBS with unscheduled optical clouds, and they have found that the losses of the system are acceptable if enough wavelengths are available. For example, with a system load of 50%, when the number of wavelengths per link went from 4 to 32, the packet loss rate went from 2% to $4\cdot10^{-5}$.
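The steep improvement with the wavelength count can be reproduced qualitatively with the Erlang-B blocking formula, i.e., an M/M/k/k system with no fiber delay lines (the d = 0 case of the model above); this is a sketch, not the exact model of [188]:

```python
def erlang_b(k, offered_load):
    """Blocking probability of an M/M/k/k system offered `offered_load` Erlangs."""
    b = 1.0
    for i in range(1, k + 1):
        b = offered_load * b / (i + offered_load * b)   # standard recursion on k
    return b

# 50% load per wavelength, so the offered traffic is 0.5 * k Erlangs.
for k in (4, 8, 16, 32):
    print(k, erlang_b(k, 0.5 * k))   # blocking falls steeply as k grows
```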

However, the closed-loop, additive-increase, multiplicative-decrease congestion control algorithm of TCP can overreact to the clustered losses of OBS/OPS, making TCP cut its transmission rate very aggressively. Moreover, the burst-formation time has an important impact on TCP throughput if it increases the connection RTT [64]. Yao et al. [186] have simulated what happens when an FTP session contends in an OPS/OBS unscheduled optical core, such as the one shown in Figure 6.4. Figure 6.5 shows the response time of file transfers of 1.6 Mbytes. One can see how the response time starts degrading at backbone loads of only 30%, and how, at backbone loads of only 50%, the response times of the FTP sessions using OPS are 12 to 20 times worse than in an unloaded network. Figure 6.5 also shows that OBS achieves better performance by aggregating packets into bursts, but the improvement is not enough to make the system usable under reasonable link loads. These results deserve a caveat, however: the system under consideration had only four wavelengths per link, so there is still room to improve performance by adding more wavelengths per link. Today it is possible to switch over 320 wavelengths [173].

Figure 6.5: Response time of FTP sessions in Optical Packet and Burst Switching using TCP, as shown in Figure 7 in [186]. The diagram on the left studies the effect of the TCP receiver window size (8, 32 and 64 Kbytes), and the diagram on the right the effect of the burst size (1, 10, 30 and 100 packets). The ``direct'' curve uses regular packet switching with large electronic buffers in all nodes. The other curves use OBS/OPS with fiber loops, wavelength conversion and fiber deflection to resolve contention.

There have been several proposals [188,186] to improve the dismal performance of OPS/OBS by creating several traffic classes with strict priorities, or by giving priority to through traffic when it contends with inbound traffic. The end result is that the high-priority class sees a network load much smaller than the total link load; it is as if all traffic of lower priority did not exist for the high-priority class. For eight wavelengths per link, the high-priority class gets acceptable performance (open-loop loss rate $\approx 4\cdot10^{-5}$), at the cost of heavily penalizing the low-priority class, which gets unacceptable performance, with loss rates of 20% for a total network load of 60%.

Even if, on average, link loads are low in the core of the network, this assumption does not hold on certain links (near hot spots) or at certain moments (e.g., after rerouting traffic following a link failure). Furthermore, hot spots and failures happen at unpredictable locations and times [90]. OBS/OPS would thus be unable to provide the maximum performance where it is needed the most, unless the network is heavily overprovisioned with many wavelengths per link.


Copyright © Pablo Molinero-Fernández 2002-3