Interop Presentation.

Slide 1: Hi, a very good morning everybody. My name is Sundar Iyer and I am representing Switchon Networks. My talk today is "Co-Processors and the Role of Specialized Hardware." We shall begin by looking at a few network applications and their system requirements. I will then present a study of the performance of these applications on legacy architectures. Finally, I shall make a case for the use of network co-processors based on these performance metrics.

Slide 2: Here is a typical network system, or box. Shown in this figure is a backplane, which contains a switch fabric, with many line cards hanging off the bus. Let's take a closer look at one of these. A line card usually consists of a line interface, fabric interface chips, memory, and at least one processing element called a network processor. The network processor has the job of receiving a packet, processing it, and sending it on its way to the switch fabric. It has to perform this task at very high speed in order to keep up with the data rate. For example, on a 2.5 Gbit/s interface, the network processor must process a 64-byte packet in about 200 ns. The question we ask is, "Can network processors keep pace with this?" In fact, I would like to ask a more precise question: "Can one do a suite of networking applications, such as (1) routing or forwarding, (2) NAT, (3) enterprise functions such as QoS and RMON, (4) load balancing, and (5) URL switching, at these rates?"

Slide 3: In order to answer that question, we did a study of some typical applications. The analysis was done on a processor with 800 MIPS of processing power, and we took the worst-case time taken by the processor for each application. Let's have a look at the throughput figures.
1. We start with forwarding. This looks fine, as it keeps up with OC-48 rates.
2. For address translation, the throughput falls considerably.
3. If we look at a typical enterprise router, which does, say, firewalling, QoS, routing, and a couple of RMON functions, we see that the throughput is lower than a gigabit.
4. And here comes the surprising part. When content-aware applications, such as load balancing and URL switching, come into play, the throughput is minimal.
Notice that we started off trying to achieve wire-speed rates, which for this example is a little more than 2.5 Gbit/s. What is the reason behind this phenomenon? It turns out that each of these applications shares a common feature, called content processing, and it is this which turns out to be the bottleneck. The goal of a co-processor is to raise the bar on these applications and eliminate this bottleneck.

Slide 4: At this stage I would like to reiterate that we shall be looking at enabling each of these applications at "wire speed." Our goal is to offload this bottlenecked task of content processing to dedicated co-processors, as shown here. These are two examples of co-processors which sit alongside the network processor.

Slide 5: For today's talk I shall concentrate on one co-processing task: content processing. So what is content processing? I shall illustrate with an example. Let's assume that we receive a packet on an interface. In order to understand what is to be done with the packet, it is necessary to identify what the packet contains. This involves searching for and extracting data, and then classifying that data to make a decision.
1. We can begin by authenticating the source MAC address by doing a layer 2 lookup.
2. A layer 3 lookup is done to identify the subnet from which the packet arrives. We identify that the packet arrives from the marketing network.
3. A layer 4 classification informs us that our VP of Marketing is accessing a web server outside. Now comes the tougher part, which can involve looking into the entire remainder of the packet.
4.
A content lookup tells us that the external server is Yahoo.
5. A further peek into the packet identifies that an audio file is being requested.
6. Finally, a thorough lookup lets us know that our VP craves the classic, American Pie.
7. We note that a certain external factor may influence our decision; in this case, the packet is being sent at 7 pm. We allow the packet based on the policies which are configured in the box. The moral of the story: "The boss is always right!"

Slide 6: So why do the applications shown in the previous slide perform progressively worse without assistance? Here's why. Every application that requires content processing is configured with specific policies. The mean and lean applications usually have simple policies configured. As we move towards data-intensive applications, the policies involved become progressively more complex. Specifically, applications such as load balancing and URL switching require both a large number of rules and highly complex rules.

Slide 7: I would like to touch upon the requirements of a content processor.
1. Programmability: the content processor should support a wide range of rule formats. This involves:
   - Dimensions: "How much can you look at?"
   - Number of policies: "How many can you look at?"
   - Different operations: "What can you do while looking at the packet?"
   - Priority: "What will you return to me?"
2. Speed: "How fast can you do that?"
3. Dynamic update: "How fast can I reconfigure that?" In many solutions, new policies need to be added and deleted on the fly, and hence update speeds should be fast.
4. Minimal CPU bandwidth: "Do I need to keep nagging you?"
5. Rule scalability: "Will you be around tomorrow?" A content processor should be scalable to next-generation speeds, i.e. OC-192 and OC-768.
6. Glueless interface and easy software integration: finally, ease of integration into a specific hardware architecture, together with easy software integration, helps in wide acceptability.
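The requirements above can be made concrete with a small software sketch. The rule format, field names, and matching predicates below are hypothetical, chosen only to illustrate dimensions (which fields a rule inspects), priority, and dynamic update; a real content processor would evaluate all rules in parallel hardware rather than in a sequential loop.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical rule format: each rule inspects a few packet fields
# ("dimensions") and carries a priority so that, when several rules
# match, the processor returns the highest-priority action.
@dataclass
class Rule:
    name: str
    priority: int                      # higher priority wins
    action: str                        # e.g. "allow", "deny"
    match: Dict[str, Callable[[bytes], bool]] = field(default_factory=dict)

    def matches(self, packet: Dict[str, bytes]) -> bool:
        # A rule matches only if every configured dimension matches.
        return all(name in packet and pred(packet[name])
                   for name, pred in self.match.items())

class ContentProcessor:
    def __init__(self) -> None:
        self.rules: List[Rule] = []    # the configured policy table

    def add_rule(self, rule: Rule) -> None:     # dynamic update: add
        self.rules.append(rule)

    def remove_rule(self, name: str) -> None:   # dynamic update: delete
        self.rules = [r for r in self.rules if r.name != name]

    def classify(self, packet: Dict[str, bytes]) -> str:
        # Return the action of the highest-priority matching rule.
        best = max((r for r in self.rules if r.matches(packet)),
                   key=lambda r: r.priority, default=None)
        return best.action if best else "default"
```

For example, a policy blocking audio downloads would be a high-priority rule whose predicate peeks into the URL field, while a broad "allow marketing subnet" rule sits at a lower priority.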
Slide 8: Finally, I would like to conclude with a simple architecture diagram of an NP/co-processor solution. It suffices to say that co-processors fill a niche in the network processor space and form an integral part of any vendor solution.
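As a closing back-of-the-envelope check, the wire-speed budget quoted earlier (a 64-byte packet on a 2.5 Gbit/s interface, processed by an 800 MIPS processor) works out as follows; the input figures come from the talk, and the arithmetic is only a sketch:

```python
LINK_RATE_BPS = 2.5e9   # 2.5 Gbit/s interface (Slide 2)
PACKET_BITS = 64 * 8    # minimum-size 64-byte packet
MIPS = 800e6            # processing power assumed in the study (Slide 3)

# Time on the wire for one minimum-size packet: this is the entire
# per-packet processing budget if the box is to run at wire speed.
budget_ns = PACKET_BITS / LINK_RATE_BPS * 1e9

# Instruction budget per packet at 800 MIPS.
instructions_per_packet = budget_ns * 1e-9 * MIPS

print(f"per-packet budget: {budget_ns:.1f} ns")                   # 204.8 ns
print(f"instructions per packet: {instructions_per_packet:.0f}")  # 164
```

Roughly 160 instructions per minimum-size packet leaves very little room for content-aware work such as URL parsing, which is why throughput collapses on these applications without a dedicated co-processor.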