You are here: Foswiki>NetFPGA/OneGig Web>ProjectsBJP (01 Sep 2010, Main.AdamC)EditAttach
This design is an implementation of Bounded Jitter Policy (BJP) algorithm [1] for NetFPGA platform. This implementation has been used in a paper titled "Preserving Pacing in Real Networks - An Experimental Study Using NetFPGA" [2].

Multiple stages of buffering, multiplexing and de-multiplexing of flows through the network can affect packet timings. Even a paced flow at the source of the network can become bursty at the destination. BJP aims to conserve the spacing between packets while they are traversing through a network.

Show Contents...Hide Contents...

Project summary

Status
beta version
Version
0.9
Authors
Hamed Tabatabaei (hamedty@cs.toronto.edu) Yashar Ganjali

Download

download source codes here.

Description

What is BJP

Figure 1

When a paced flow passes through a network of routers and switches, the inter-arrival times of packets can change significantly. Therefore, even if packets of a given flow are paced initially, they can become arbitrarily bursty as they go through the network, which means the benefits of pacing will diminish.

In order to solve this problem and preserve pacing of packets throughout the network, Beheshti et al. [1] introduced BJP algorithm, an extremely simple traffic shaping scheme which aims at retaining the smoothness of traffic no matter how many routers it passes. The basic idea behind BJP is to try to add a fixed delay to each packet at each router. In other words, whenever a packet "P" arrives to a given router at time "t", BJP will try to send the packet out at time "t+delta", for a predefined constant "delta". Clearly, this is not always feasible due to contention: different packets arriving at different ingress ports at a given time, "t", and destined to the same egress port need to depart at the same time "t+delta" which is not possible.

To solve this problem, BJP slightly reduces the delay for one of the incoming packets so that there is no contention: the first packet departs at time "t+delta-1" and the second one departs at "t+delta" (here we are assuming that time is slotted, and it takes one unit of time to transfer each packet). At the same time, BJP marks the first packet so that the reduction in delay is compensated later on one of the subsequent routers (if one exists). The goal is to keep the expected delay of a packet which has gone through "N" routers equal to "N*delta" despite of minor setbacks resulting from contention. If a packet cannot be scheduled within the time interval [t+delta-alpha, t+delta] (where "alpha" is another predefined parameter) BJP simply drops the packet.

The above figure illustrates an example of a simple network with ordinary routers and BJP routers. (a) Two packets arrive simultaneously at two input ports. (b) Regular routers send packets out back-to-back immediately.(c) BJP routers try to delay both packets for delta units of time. Since there is congestion, packets are sent out back-to-back, but one of them is marked so that another router on the path delays it with delta-1 rather than delta units of delay.

How does this design works

The major differences between this design and the standard pipeline of NetFPGA are (i) in our design every packet is stored in the SRAM at the arrival time, whereas in the standard design packets are passed through pipeline completely. And all of the stuffs like routing and queuing in switch are performed using packet pointers and some information from their header. But in NetFPGA reference design, packets themselves were passed through packets pipeline. This feature enabled us to do many computation and processes by operating on links rate. (ii) Also in our design every switch activity is logged and log entries are sent-out in a form of packets to PCI slot.

As clear in the image, in our BJP design the first module is input arbiter. The input arbiter continuously fetches ingress ports and when a packet arrives, the exact arrival time will be recorded and the payload of the packet will be copied to SRAM. The input arbiter can serve all of the ingress ports simultaneously; it means when a packet arrives, it should not wait for packets from other ports to be served. The input arbiter also routes the packet based on some bits from packet header using as hop-count. The input arbiter creates an entry for the packet which includes the packet pointer, and passes it to the scheduler of selected port.

For each egress port there is a scheduler which performs BJP procedure. Each scheduler has a table for incoming entries from the input arbiter. When the entry of received packet arrives at a scheduler, the scheduler runs BJP algorithm to find a place for this entry in its scheduling table and if it cannot (based on BJP procedure), the entry (the packet) will be dropped.

Well before sending time of each packet, the schedulers send the entry of the packet to their senders. The senders receive packet pointers from schedulers, read packets from SRAM and exactly at the scheduled time send the packets out.

Also there is a timer module in our design. The timer slots the time domain and also it is connected to all of the other modules to aware them of current time and time-slot number.

When a packet arrives or departs the NetFPGA, there will be a log entry generated either by the input arbiter or the senders. The exact arrival time or departing time of packets will be written in log entries. These log entries will be collected in the logger module. The logger sends out several of log entries in format of a packet to CPU ports of NetFPGA. We used these logs as our measurement tool.

Since there are four senders in this design, we need a SRAM read arbiter, then all of the senders can read from SRAM simultaneously (In the current version we use NetFPGA default SRAM arbiter, then only two egress ports are functional and SRAM reading via "regread" command is disabled).

In the current BJP implementation we use hop counting to choose egress port of each packet. In the header of each packet we specify a field to keep the number of hops each packet should pass and after that the packet would be dropped on the port MAC1 (else would be forwarded on the MAC0) . Also BJP needs a field in the packet header to place the current lag of the packet from desired time. We use first 64 bits of each packet for this purpose.

Limitations in this version

This version uses NetFPGA SRAM-arbiter. Therefore there is only two functional egress port (instead of four) and sram reading via regread command is disabled.

Some Details on Hardware Design

(*many of below parameters can easily be changed by editing the parameter section in the "user_data_path.v")

  1. As it is mentioned above, the BJP slots the time domain. This procedure performed in the timer module. By default each time slot is 1524 clock cycles equal to 12192 ns. These parameters are accessible under the name of SLOT_SIZE.
  2. By default normal delay for each packet in this BJP procedure is 10 time slots.
  3. This design uses first 64 bits of packet headers as below (which are basically using for the destination and source MAC addresses):
    BITS: 63-60 59-55 54-12 11-0
    FIELD: Hop Counting BJP Lag Field UNUSED Packet ID
    PARAMETER: ROUTE_LENGTH LAG_LENGTH PACKET_ID_LENGTH

    Hop Counting: When a packet arrives at the router if this field is larger than zero, the packet would be routed to the port MAC0. And if it is 0, the packet would be routed to the MAC1. At the leaving time of packet the hop counting would be decreased by 1. (note that you can easily change this algorithm by editing the verilog source codes. We set as above for simplicity in our tests)

    BJP Lag Field: During BJP procedure if any packet can not be sent out on the desired time of the BJP procedure, the lag from desired time would be written in this field to be used and compensated by next BJP router.

    Packet ID: This field used only for logging proposes. The ID of the arrived or departed packet is recorded in all of log entries.

    The above fields are used as below in normal Ethernet packets:

    BITS: 63-16 15-0
    FIELD: Destination MAC Address Source MAC Address
  4. The input arbiter module and all of the senders generate log entries for each packet which departs or arrives, as below:
    NUMBER OF BITS: 12 19 19 4 5 2
    FIELD: Packet ID Arrival (Departure)Time Address(Pointer) of the Packet in SRAM Hop Count BJP Lag Arrived (Departed)Port Number
    PARAMETER: PACKET_ID_LENGTH MAIN_TIME_BITS SRAM_ADDR_WIDTH ROUTE_LENGTH LAG_LENGTH PORT_BITS
    The width of these fields can be changed by changing listed parameters. The log entries are collected at the logger module and several of them (by default 10; set by LOG_PACKET_SIZE) are sent out on the CPU0 port, which can be collected by tcpdump command on the nf2c0 port.
  5. The modules within the design (the input arbiter, schedulers and senders) are communicating using the below structure:
    NUMBER OF BITS: 19 5 1
    FIELD: Address(Pointer) of the Packet in SRAM BJP Lag The LSB of the Number of Slot Time Which Packet Arrived in
    PARAMETER: SRAM_ADDR_WIDTH LAG_LENGTH

    The last field is to recognize if the packet arrived at the current time slot or previous one (to be used in schedulers for more accurate timings)

Usage

Download the bit-file on the board:

nf2_download nf2_top_par.bit

Turn the nf2c0 port up:

ifconfig nf2c0 up

Make your desired network by connecting ports of the NetFPGA board to your network, or building your own topology with your NetFPGA routers.

Tcpdump the nf2c0 port to dump the log packets:

tcpdump -i nf2c0 -w test.pcap

Use the log reader software to read the log files. The "read.py" is located in {project_directory}/sw/python, and written in python to read the pcap files. The pcap library for python should be installed.

python read.py test

This command reads test.pcap file and translate it to test.txt; test.txt would be like:

Packet_ID   Time   Pointer_Addr   Hop_Count   BJP_Lag   Port_Number
1      51963   1          14      0   0      packet with ID# 1 and SRAM pointer address 1 arrived @ time 51963 on port 0 (eth0) with initial lag of 0 and 14 remaining hop to pass
2      54959   190      14      0   0
3      57955   379      14      0   0
4      60951   568      14      0   0
5      63947   757      14      0   0
6      66943   946      14      0   0
1      68129   1          14      0   0      packet with ID# 1 and SRAM pointer address 1 leaved the board @ time 68129 on port 0 [because hop count is larger than 0] with lag of 0 and 13 remaining hop to pass
7      69939   1135    14      0   0
2      71177   190      14      0   0

To get real time, multiply the time column by 8 ns. For example, the very first entry means packet with ID 1 arrived at time 415704 ns and left at time 545032 ns. Therefore the packet spent 129,328 ns in the router (+some fixed delays), which is equal to 10.6 time slots. The length of each time slot is 12,192 ns and the normal delay (delta) was set to 10 time slots. The 0.6 extra delay is due to the fact that BJP routers work with slotted time. Hence, the arrival of a packet at BJP procedure is always the start of the next time slot (this concept was completely explained in [2]).

Tests

Debugging

In order to debug our design in ModelSim, we include our verification test in {project_directory}/verif directory. This test can be used to debug custom changes in the design by tracking the incoming and outgoing packets.

Functional

In this section we explain the usage of BJP in a simple network using three NetFPGA cards and run a step-by-step test. We capture the log files, analyze them, and demonstrate the functionality of BJP by plotting relevant graphs using MATLAB.

1- Connect three NetFPGA cards as shown in the figure:

2- Download the packet generator bit-file on the first card and BJP bit-file on second an third cards:

(1st machine): nf2_download packet_generator.bit

(2nd & 3rd machines): nf2_download bjp.bit

3- Bring the nf2c0 ports of BJP switches up, and run tcpdump command to catch the log packets:

(BJP #1): ifconfig nf2c0 up; tcpdump -i nf2c0 -w bjp1.pcap

(BJP #2): ifconfig nf2c0 up; tcpdump -i nf2c0 -w bjp2.pcap

4- There are two pcap files in {project_directory}/regress/pcap directory named "hop_14.pcap" and "hop_1.pcap". These two files contain 400 packets which hop_counts on them are set to 14 and 1. And the packet ID fields in these packets are set from 1 to 400 in an increasing order (by running tcpdump command with read option one can take a look to these pcap files). Therefor, if we inject these two files from Traffic Generator on the ports 2 and 3, 800 packets (if there is no drop) will be sent out to BJP switch #1 on port 0 (because hop counts are larger than 0). On the BJP switch #2, 400 packets will be sent out on the port 0 with hop count of 12 and 400 packets will be sent out on the port 1 with the hop count of 15 (-1 on 4 bits).

./packet_generator.pl -q2 hop_1.pcap -q3 hop_14.pcap

And we expect to receive 400 packets on port 0 and 400 on port 1 as well (if there was no drop in BJP procedure)[in this case we had one packet loss]:

Transmit statistics:
================
MAC Queue 2:
Packets: 400
Completed iterations: 1
MAC Queue 3:
Packets: 400
Completed iterations: 1
Receive statistics:
===============
sh: bc: command not found
sh: bc: command not found
Argument "" isn't numeric in division (/) at ./packet_generator.pl line 1137.
Argument "" isn't numeric in addition (+) at ./packet_generator.pl line 1137.
MAC Queue 0:
Packets: 399
Bytes: 600096
Time: 0.000000000 s
Rate: 0 Kbps (packet data only)
Rate: 0 Kbps (including preamble/inter-packet gap)
sh: bc: command not found
sh: bc: command not found
Argument "" isn't numeric in division (/) at ./packet_generator.pl line 1137.
Argument "" isn't numeric in addition (+) at ./packet_generator.pl line 1137.
MAC Queue 1:
Packets: 400
Bytes: 601600
Time: 0.000000000 s
Rate: 0 Kbps (packet data only)
Rate: 0 Kbps (including preamble/inter-packet gap)
sh: bc: command not found
sh: bc: command not found
Argument "" isn't numeric in division (/) at ./packet_generator.pl line 1137.
Argument "" isn't numeric in addition (+) at ./packet_generator.pl line 1137.
MAC Queue 2:
Packets: 0
sh: bc: command not found
sh: bc: command not found
Argument "" isn't numeric in division (/) at ./packet_generator.pl line 1137.
Argument "" isn't numeric in addition (+) at ./packet_generator.pl line 1137.
MAC Queue 3:
Packets: 0

5- Now interrupt tcpdump commands and they should collect 160 packets ([800 packets arrived in + 800 packets sent out] / 10 entry per log packet) or less if there were drops in the previous stages:

tcpdump: WARNING: nf2c0: no IPv4 address assigned
tcpdump: listening on nf2c0, link-type EN10MB (Ethernet), capture size 96 bytes
^C160 packets captured
160 packets received by filter
0 packets dropped by kernel

6- Use the python read file to convert pcap files to readable log files. From the directory {project_directory}/sw/python, run:

python read.py bjp1

python read.py bjp2

to convert "bjp1.pcap" and "bjp2.pcap" to "bjp1.txt" and "bjp2.txt". These two TXT files should look like:

bjp1.txt:

Packet_ID    Time    Pointer_Addr    Hop_Count    BJP_Lag    Port_Number
1    485177    393217    14    0    3
1    485177    262145    1    0    2
2    488799    393406    14    0    3
2    489673    262334    1    0    2
3    491669    262523    1    0    2
3    492420    393595    14    0    3
4    494290    262712    1    0    2
4    496041    393784    14    0    3
5    499037    262901    1    0    2
5    499662    393973    14    0    3
1    499753    262145    1    1    0      this packet sent out with 1 lag (sooner), because two packets arrived at the same time.
1    501277    393217    14    0    0
6    501908    263090    1    0    2
6    503282    394162    14    0    3
2    504325    393406    14    0    0

bjp2.txt:

Packet_ID    Time    Pointer_Addr    Hop_Count    BJP_Lag    Port_Number
1    381681    262145    0    1    2
1    383205    262334    13    0    2
2    386254    262523    13    0    2
2    387778    262712    0    0    2
3    389302    262901    0    0    2
3    390826    263090    13    0    2
4    392350    263279    0    0    2
4    393874    263468    13    0    2
5    395398    263657    13    1    2
5    396922    263846    0    0    2
1    398637    262334    13    0    0
1    398637    262145    0    0    1
6    399969    264035    0    0    2
6    401493    264224    13    0    2
2    401685    262523    13    0    0

If there are two entry with the same packet ID and address, the first one is showing the packet arrival and the second one is showing the packet departure.

7- Beyond this point we use MATLAB m-files in {project_directory}/sw/matlab to analyze the results.

In {project_directory}/sw/matlab directory, there are several scripts to plot the results. The main file is "log_proc.m" which reads bjp*.txt in MATLAB and plots some useful graphs based on them (note that when these files brought in MATLAB, the log_proc.m changes the time unit from clock-cycle to nano seconds.)

Two of these graphs are shown below. First the CDF of packet inter-arrival times in BJP#1 router ingress ports.

And the second one is the overall delay of packets from source to destination of the flow hop_14.

Related Works

Based on this design, a study performed in University of Toronto. The results of that study is presented at [2].

References

[1] N. Beheshti, Y. Ganjali, A. Goel, and N. McKeown, "Obtaining High Throughput in Networks with Tiny Buffers," Proceedings of 16th International Workshop on Quality of Service (IWQoS), Enschede, Netherlands: 2008.

[2] S. H. Tabatabaei, Y. Ganjali, "Preserving Pacing in Real Networks - An Experimental Study Using NetFPGA," in NetFPGA Workshop 2010.

-- Main.HamedTabatabaei - 29 May 2010

Topic revision: r22 - 01 Sep 2010 - 22:46:39 - Main.AdamC