Sigcomm 2002: Script for “Routers with a Single Stage of Buffering”



(Slide 1: Title: Routers with a Single Stage of Buffering)


(Slide 2: Outline)


We usually characterize a router by its capacity. The capacity is the sum of the rates on its line cards. But this full capacity is usually not available to us. In reality there are a number of constraints imposed by a given router architecture, such as head-of-line blocking, output blocking, and other structural limitations, which prevent us from having access to all of this capacity. So the available throughput is generally lower.


(Slide 3: How do we design routers)


We can cynically characterize the way we design routers today in the following way: first, we pick the capacity; next, we pick an architecture that we hope will provide this capacity with the technology we have available. For example, we might decide to use an input-queued or a combined input- and output-queued router. Then, and only then, we simulate the system, or use a probabilistic model, to find out how well it will perform. We know already that the system won't be perfect. We produce graphs like this that tell us how far we missed the target, and hope that customers will put up with (or shut up about) the unpredictable performance they get.


It won't be work conserving --- things like HOL blocking, heuristic scheduling algorithms, and other deficiencies compromise the throughput. All we will be trying to do is minimize how un-work-conserving it is. We can't guarantee that the router will be work conserving, although most routers are built in the hope that they are.


In a sense, this is an open loop design process. We pick the capacity target, our architecture, and then see how well it performs. The simulations and models don’t and can’t tell us how to architect the system.



(Slide 4: How can we re-design routers)


We’re trying to do something different.  (Pause)


We’re trying to find a model that helps us architect the router, by capturing the way individual packets are processed. Rather than launch packets into the system and hope they emerge sometime in the future, we’re going to assume that every packet has its own, pre-determined departure time that we must meet because we want to have a work conserving router. For example in a work-conserving FCFS router, the departure time of a packet is known as soon as it arrives. 


So, (animate) we will take the departure time as a requirement, pick our capacity target and architecture, and then see how we can build a router.



(Slide 5: Motivation: What is this talk about)


     So, in this talk I will describe a different way of thinking about a router: we start with the departure times of the packets, take them as a requirement, and figure out the resources we need to ensure that packets depart at their departure times.



(Slide 6: A look at Single Buffered Routers)


So a high-level view of a router looks like this. There are a number of memories in the router. The bandwidth of a single memory is the main bottleneck, so I will assume a model in which each memory need run at only the line rate.


A packet arrives and the router determines the output port to which the packet is destined. It holds the packet in the buffer until it departs. We know when it must leave. Its departure time is uniquely determined by the scheduling policy (for example, FCFS) and the need to be work conserving.
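As a small sketch of this bookkeeping (the function and variable names are mine, not from the talk): under work-conserving FCFS, a packet's departure time is simply the first future slot in which its output line is free.

```python
def fcfs_departure_time(arrival_slot, output, next_free):
    """Work-conserving FCFS: a packet departs at the first future slot
    in which its output line is free."""
    dt = max(arrival_slot + 1, next_free[output])  # no earlier than the next slot
    next_free[output] = dt + 1                     # the output line is busy at dt
    return dt

# Three packets for output 0, arriving in slots 0, 0 and 5:
nf = {0: 0}
assert fcfs_departure_time(0, 0, nf) == 1
assert fcfs_departure_time(0, 0, nf) == 2  # queued behind the first
assert fcfs_departure_time(5, 0, nf) == 6  # the line is idle again by then
```

So the departure time is fixed the moment the packet arrives, which is what lets us treat it as a requirement rather than an outcome.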


In our model, we assume that when a packet arrives it is written into a memory, and stays in that memory until its time to leave. Because each packet is buffered only once, we call this class of routers Single Buffered Routers. 


As shown here our arriving packet is buffered for 4 time slots because there are three packets waiting ahead of it for the same output.


So what is it that could prevent this packet from leaving at the right time? Perhaps it's sitting in a memory with another packet that needs to depart for a different output at the same time. The memory can't deliver both at once, so one of them will miss its departure time. Or perhaps we need to write an arriving packet into the memory, so the memory is busy receiving a packet, and our packet is again unable to leave. Either way, our packet misses its departure time, and the router is not work conserving. Our goal is to determine how many memories we need, how they are interconnected, and how fast they must run so that packets depart at the right time.


(Slide 7: Contents)


(Slide 8: An Abstract Model: Pigeonhole Principle)


- Really, our job is one of accounting. So in order to design our router, we're going to use a simple counting model. (Read the abstract model from slide).


(Speak colloquially)  - Use hand expressions for the pigeonhole.


So let's look at our model again. The basic operation is that when a pigeon arrives we are going to write it into a pigeonhole. Of course we need to know that the pigeonhole is not busy sending a pigeon, and we need to know that no other pigeon is going to that pigeonhole. So we can see that there are some constraints that will allow us, or prevent us, from writing to this pigeonhole.


So one way of looking at this is to ask: how many pigeonholes, P, do we need so that N arriving pigeons can each be immediately placed in a pigeonhole, and can depart at the right time?


(Slide 9: Solving the abstract model)


    1) No one else is writing to that memory at the same time.

    2) No one else is reading from that memory at that time.

    3) And this takes a little more thinking: at the departure time we are so interested in, no one else will be reading from (or writing to) that same memory.


 But note that since the departure time is in the future, the packets that will be written at that time have not yet arrived. In contrast, the packets that will be read from the memories at that time may already be buffered in the router. So we have three types of constraints here, and these constraints are deterministic. So we can apply the pigeonhole principle directly to analyze this.


So what it tells us is that roughly 3N-1 resources are needed. I will be more specific about that and give you some examples in a moment.
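A small simulation, under my own assumptions (random FCFS traffic, one read or one write per memory per time slot, greedy memory choice), illustrates why 3N-1 memories always leave at least one choice free: an arriving packet must avoid at most N-1 memories being written by other arrivals this slot, N memories being read this slot, and N-1 memories that must be read at its own departure time, i.e. at most 3N-2 in total.

```python
import random
from collections import defaultdict

def greedy_memory_assignment(N, P, slots, seed=0):
    """Assign each arriving packet to one of P single-ported memories so that
    no memory is written twice in a slot, read while being written, or read
    twice in a slot. Returns True if every packet found a free memory."""
    rng = random.Random(seed)
    next_free = [0] * N        # per-output FCFS departure-time bookkeeping
    reads = defaultdict(set)   # slot -> memories that must be read in that slot
    for t in range(slots):
        writes = set()         # memories already claimed by arrivals in slot t
        for _ in range(N):     # worst case: one arrival per input per slot
            out = rng.randrange(N)
            dt = max(t + 1, next_free[out])   # FCFS departure time
            next_free[out] = dt + 1
            # the three constraints: not written now, not read now, not read at dt
            busy = writes | reads[t] | reads[dt]   # at most 3N-2 memories
            free = [m for m in range(P) if m not in busy]
            if not free:
                return False
            writes.add(free[0])
            reads[dt].add(free[0])
    return True

# With P = 3N-1 memories the greedy choice never fails:
assert greedy_memory_assignment(N=4, P=11, slots=500)
```

Since the three busy sets together never cover more than 3N-2 memories, the assertion holds for any traffic pattern, not just this random one.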


(Slide 10: A take-away)


One thing to take away from this talk is this way of thinking about a router: determine the time at which each packet will depart, define the constraints on the packet, and then design the router so that packets depart at the time that they should.


 To try to convince you that this is a useful way of thinking about it, I will go through some examples. As far as we know, these two architectures have not been studied before.


(Slide 11: Contents)


The first example is an obvious way of thinking about a router, and that is the parallel shared-memory (PSM) router.


(Slide 12: PSM Router and animation that 7 routers don’t suffice)


    Explain in words what the router is. Mention multiple parallel memories. Mention 2NR. Say that memory is the main bottleneck. Mention that memories can be individually arbitrarily slow, we just need more of them.

Explain animation.  (When you mention the numbers 2,3 mention that it is N, N-1)


(Slide 13:  PSM Model: 3N-1 memories suffice)


Speak with animation.


(Slide 14: DSM Model)


Now I am going to explain a slightly more interesting router called the DSM router. The DSM router is one in which a number of line cards are attached to a rack, just like any other router. Packets arrive at the router, are buffered, and then depart.

- We have just moved the central memories to the line cards.

The interesting thing about the DSM router is that a packet does not necessarily get buffered on the line card on which it arrives. In fact, what happens is that when a packet arrives at an input, it traverses the switch fabric and is buffered in another memory that happens to be free. At the packet's departure time, it is read from that memory and sent out.


(Slide 15: Why is the DSM Router Interesting)


Why is this interesting? This is interesting for the following 4 reasons:


    1) The first one is that the bandwidth into and out of each memory need, in principle, be only the line rate.


    2) The second thing is that each memory is shared across the different inputs and outputs. This allows us to greatly reduce the amount of memory required overall in the router.


    3) Third, as we add more memories to the line card, we also add more buffering. This buffer is not used only by the packets arriving or departing at that line card, but is actually used by everyone else.


   4) The fourth reason this is interesting is that the second most commonly used router in the Internet, Juniper's M40, is built this way.


So let's look at how we can use the pigeonhole technique to analyze the DSM router.



(Slide 16: Animation for DSM Router)


- Mention the FIFO case for DSM.


(Slide 17: Previous work; which uses the counting/pigeon-hole principle)


Mention References for PPS, CIOQ.


So I have given two examples: the parallel shared-memory router and the DSM router. There has actually been a trend towards this kind of analysis. Mention CIOQ/PPS.


(Slide 18: Contents)


(Slide 19: Problem with non-FIFO scheduling policies)


- Now there is one thing that we have glossed over until now, which is the departure time of the packet. In a work-conserving FCFS router the departure time is actually obvious: the departure time of a packet destined to a specific output port is simply the first time in the future that the output is free. While FCFS is still widely used, there are many other scheduling policies, such as DRR, WRR, strict priority, etc. And in none of these cases do we know the departure time of a packet when it arrives.


Before we see how to solve this problem, let us first use an abstract model that can capture a number of these scheduling policies. This abstract model is what we call PIFO. PIFO is a scheduling policy, ->


(Slide 20: PIFO)


Give definition.


(Slide 21: Enabling QoS with PIFO queues)


PIFO includes a number of queueing policies. Here is an example of WFQ using PIFO...

-- WFQ slide - 3,2,1 - say one particular order...
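As a minimal sketch of the PIFO abstraction (the class and names are my own, not from the paper): an arriving packet may be pushed into any position of the queue, given by a rank, but departures always take the head, and a packet's relative order never changes once it is in the queue.

```python
import bisect
from itertools import count

class PIFO:
    """Push-In First-Out queue: arrivals are inserted at an arbitrary
    position determined by a rank; departures always remove the head."""
    def __init__(self):
        self._q = []
        self._seq = count()   # breaks rank ties in FCFS order
    def push(self, rank, packet):
        bisect.insort(self._q, (rank, next(self._seq), packet))
    def pop(self):
        return self._q.pop(0)[2]

# Strict priority as a PIFO: rank = priority class (lower departs first).
q = PIFO()
q.push(2, "best effort")
q.push(1, "voice")            # pushed in ahead of the earlier arrival
q.push(2, "best effort 2")
assert q.pop() == "voice"
assert q.pop() == "best effort"
```

WFQ fits the same mold: the rank is the packet's virtual finish time instead of a fixed priority class.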


"And you can probably see that the same is true for strict priorities as well."


The question is: given this PIFO model, how can we handle the constraint set? It turns out that we have extra constraints under PIFO, and in order to write each packet we need more memories.


(Slide 22: PSM -animation)


- Here is an example for the PSM router. (Explain)   

- The explanation is simplistic and there are more details for PIFO in the paper.


(Slide 23: How many memories are needed in a DSM)


 (Read from slide)


(Slide 24: Contents)


(Slide 25: Comparison of FIFO and PIFO)


(Read from Table in Slide.)


Can say that we can compare the cost of the DSM with FIFO against the DSM with PIFO. It is normally not so easy to compare FIFO with PIFO directly.


(Slide 26:Comparison with CIOQ)


Say that CIOQ is the most commonly used architecture today. It is interesting to compare these two routers; they are two different approaches. If both routers were built in an ideal manner, then we would be comparing the Cisco versus the Juniper approach.


- Now that we have looked at both we can see immediately what the cost is.


(Slide 27: Conclusion)


We started this work because we noticed that there was a class of routers that had not been analyzed before, for example the PSM and DSM routers. We noticed that we couldn't use the techniques we normally use to study routers. So we came up with this counting technique based on the pigeonhole principle, and we realized that it could be used to analyze any router with a single stage of buffering. Somewhat surprisingly, it can also be used to analyze the CIOQ router, which is not a single-buffered router. So if there is one contribution of this paper, we believe it is this counting technique.