Weighted Round-Robin Scheduling Module

Lou R. Rohan
Douglas R. Taube
CS/COE 535
Fall 2001
Washington University in Saint Louis

Introduction

This paper presents the design and implementation of a weighted round robin module for intended implementation on a Field-Programmable-Port-Extender (FPX) attached to a Washington University Gigabit Switch (WUGS). The queuing strategy used in this design is based upon input queuing (IQ). IQ structure has been demonstrated to require the least memory bandwidth for buffering ATM cells [1]. Two types of queuing implementations are active in the module. The queues are presented in First-In-First-Out (FIFO) modules to preserve cell order as they are passed through the module. Also, incoming cells are queued on a per-virtual-circuit (per-VC) basis. Queuing in this manner enables bandwidth allocation [1]. Secondly, the queues are assigned a weight to allow for priority scheduling. This method improves Quality of Service (QoS) enforcement and reduces head-of-line (HOL) blocking issues [1].

Background

Applications can be classified into two basic service groups, best-effort service and guaranteed service. A significant deployement of ATM network technology provides the basis of managed and guaranteed service but Internet traffic patterns do not adhere to the standard. Under most Internet traffic patterns, resources are not explicitly reserved, end systems do not pace their transmissions, and most network equipement does not enforce resource usage limits [2][3].

Description of the Project

This project implements a dynamic queuing scheme that intends to provide resource sharing, improve QoS enforcement and reduce HOL blocking. The queuing scheme is designed to register VC upon call establishment by means of a control cell. The VC registration follows in the implementation framework of the NCHARGE control software for FPX.

The module is designed for the FPX to provide IQ traffic per-VC. Each VC can have an assigned priority level (i.e. weight) and a weighted round-robin (WRR) scheduler is implemented to reduce HOL blocking.

Components from the course machine problem 5 were reused for this project. By reusing the SDRAM interface, the cell pacing component, and the push/pull cell components, the overall development of the project managed to the new modules introduced for WRR.

Major Components

Block Diagram

Top Level Block Diagram of the Fair Queueing Module

Module Assignments

Lou Rohan - Queue Selector, Queue Manager
Douglas Taube - Output Scheduler

Details of Implementation

GetCell : This is a state machine. Upon receiving a SOC_in, it gets the complete cell in 32 bits at a time and stores it in temporary registers which are 64 bit wide. Thus it receives the 14 words cell on 32 bit interface and formats it into a 7 word cell with 64 bit words [5].
PushCell: This is a state machine. Once an input cell is completely assembled into the temporary registers, it pushes the cell into the CellFIFO, 64 bits at a time. First this state machine makes a request to the CellFIFO module to push the cell and waits for the grant. The request is kept high until a grant is received.Once the grant is received, the Push state machine pushes the cell into the CellFIFO [5].
Queue Selector : This consists of three main components: a finite state machine that detects control cells on VC 35, a process that selects the input queue for the Queue Manager and registers new VCs, and a process that updates the priority list when control cells indicate a priority registration or a priority change.

Internal components and interconnects of the Queue Selector

The Queue Selector is responsible for responding to two opcodes from a control cell, Add/Change VC Weight, and Change Pacing Value. When an Add/Change Weight command is received, the VC is compared to the list of registered VCs. If the VC is already registered, then the weight is found in the weight list and updated. If the VC is not registered, it does not have a designated queue for storage, so an empty queue is assigned and a corresponding empty weight is updated with the new VC's weight.

Layout of the control cell
Queue Manager : This is a combination of eight finite state machines that are selected by either an incoming cell with InputQueue or an outgoing cell with NextQueue. Each finite state machine controls a FIFO queue that resides in SDRAM. A base address, or offset is calculated inside the FIFO queue finite state machine so that the different queues do not overrun one another.

The logic for selecting the proper finite state machine for the proper read/write command follows this process:
- If ReadRequest is high, then NextQueue, coming from the Output Scheduler, is demultiplexed from a 3 bit vector to selecting the proper FIFO queue finite state machine for output.
- If ReadRequest is not high, then if WriteRequest is high, the InputQueue is demultiplexed from a 3 bit vector to selecting the proper FIFO queue finite state machine for input.
The BUSY signal informs the Output Scheduler when any of the FIFO queue finite state machines are either reading or writing a cell from memory.

Internal components and interconnects in the Queue Manager
Queue Finite State Machine: This is the finite state machine that interfaces with the SDRAM Controller [4], writing or reading cells if it selected by the Queue Selector or the Output Scheduler. The only thing that differs from one queue to another is the base address of the FIFO queue that it stores in SDRAM.

Bubble diagram for each of the FIFO queue finite state machines
Output Scheduler: The output scheduler determines the next queue to be serviced. To select the next queue, the output scheduler requires the overall system to be in a non-volatile state. This occurs when the any read or write events have ended. To complete a full service round, each priority is considered. Any queues with a registered weight greater than or equal to the current priority of the round are scheduled for service. Under worst-case scenarios, a full service round is completed in O(nw) time where n is the number of queues and w is the maximum priority. In order to reduce the occurrences of the worst case scenario, speed-ahead logic was introduced to skip priority levels higher than that of the heaviest queue weight.
PullCell : This is a state machine which pulls a cell out of the FIFO. Then it waits for a grant from the Cell FIFO and upon receiving the grant it starts getting the data from the Cell FIFO 64 bits at a time and puts it in the temporary storage. It is just the read counterpart of the PullCell state machine described above [5].
SendCell : This is a state machine which sends the cell on the output interface of the module. It reads the cell stored in the temporary registers by the PullCell and sends it on the module output 32 bits at a time. It is the Pull counterpart of the PushCell state machine described above [5].

References

H. Duan, J. W. Lockwood, S. M. Kang, J. D. Will, A High-performance OC-12/OC-48 Queue Design Prototype for Input-buffered ATM Switches, IEEE Infocom '97, Kobe, Japan, April 7-11, 1997, pp 20-28.
Y. Chen, J. S. Turner, Design of a Weighted Fair Queuing Cell Scheduler for ATM Networks, GLOBECOM '98.
S. Keshav, An Engineering Approach to Computer Networking, Addison-Wesley, pp. 238-255, 1997.
Sarang Dharmapurikar and John Lockwood, Synthesizable Design of a Multi-Module Memory Controller, Washington University, Department of Computer Science, Technical Report WUCS-01-26, October, 2001.
CS 535 course materials and notes, J. W. Lockwood, Washington University, Fall 2001.
CS 533 course materials and notes, K. Wong, Washington University, Spring 2000.

Results

Application Results

The following results were generated in simulation using the ModelSim tool. A test file of ATM formatted cells was sent through the module. The input and output order of the cells is noted by VCI number. Per the goals of the project, traffic was divided by the module into separate queues. Each queue was registered with a priority (or weight) to determine the amount of service it would receive relative to other queues.

In the first example, only two VC were introduced to the module. The modules cell pacing rate was sent to allow all cells to be buffered before any cells were transmitted. Cells arriving on VCI 23 were assigned a weight of 2 while cells arriving on VCI 27 were given a weight of 8. Under the weighted round-robin implementation, VCI 27 will receive 8 rounds of service for every two rounds of service VCI 23 receives.

Under first-come-first-service (FCFS)scheduling conditions, the cells are scheduled to complete according to their arrival time. This would cause the higher priority cells to be blocked from service until the earlier arriving, lower priority cells complete transmission. The graphs demonstrate that the module time-slice divides the burst traffic based upon the weight of the queue in ues.

The next diagram shows seven VC transmitting cells to the module. The weights assigned to each VCI in the test file start at three on VCI 23 and increase by one for a weight of 9 on VCI 29.

Similar to the previous example, the bursts of traffic are divided according to the weight of the queue associated to the VCI. Again, FCFS scheduling would have protected the burst traffic by blocking other VC from transmitting and provided no means for priority service for specified VC.

Synthesis and Place-and-Route Results

The project was synthesized for the Xilinx Virtex-E XCV1000E device with a frequency target of 10 MHz. The synthesis indicates that the current design of the module requires 25% of the available slices on the FPGA device. The greatest delay in the module is 25 nanoseconds indicating a 40 MHz frequency of the module, which is within the 10MHz target. Additional development in the queue management component will quickly reduce both the size and delay of the implementation.

Conclusions

In this paper we have described the design, and implementation of a weighted round robin module for FPX. The module was designed for input to the WUGS and provides per-VC weighted round robin scheduling and reduces HOL blocking by using SDRAM based FIFO queues and a WRR scheduler. Dynamic scheduling constraints are managed by the modules support of the NCHARGE framework.

Computer simulation results verify the operation of the WRR scheduler to distribute link utilization to multiple queues based upon and establish priority.