Weighted Round-Robin Scheduling Module
- Lou R. Rohan
- Douglas R. Taube
- CS/COE 535
- Fall 2001
- Washington University in Saint Louis
Introduction
This paper presents the design and implementation of a weighted round robin module
for intended implementation on a Field-Programmable-Port-Extender (FPX) attached to
a Washington University Gigabit Switch (WUGS). The queuing strategy
used in this design is based upon input queuing (IQ). IQ structure has been
demonstrated to require the least memory bandwidth for buffering ATM cells [1].
Two types of queuing implementations are active in the module. The queues are
presented in First-In-First-Out (FIFO) modules to preserve cell order as they are
passed through the module. Also, incoming cells are queued on a per-virtual-circuit
(per-VC) basis. Queuing in this manner enables bandwidth allocation [1]. Secondly,
the queues are assigned a weight to allow for priority scheduling. This method
improves Quality of Service (QoS) enforcement and reduces head-of-line
(HOL) blocking issues [1].
Background
Applications can be classified into two basic service groups, best-effort service and guaranteed
service. A significant deployement of ATM network technology provides the basis of managed and
guaranteed service but Internet traffic patterns do not adhere to the standard. Under most Internet
traffic patterns, resources are not explicitly reserved, end systems do not pace their transmissions,
and most network equipement does not enforce resource usage limits [2][3].
Description of the Project
This project implements a dynamic queuing scheme that intends to provide resource sharing, improve
QoS enforcement and reduce HOL blocking. The queuing scheme is designed to register VC upon call
establishment by means of a control cell. The VC registration follows in the implementation
framework of the NCHARGE control software
for FPX.
The module is designed for the FPX to provide IQ traffic per-VC. Each VC can have an assigned
priority level (i.e. weight) and a weighted round-robin (WRR) scheduler is implemented to reduce HOL
blocking.
Components from the course machine problem 5 were reused for this project. By reusing
the SDRAM interface, the cell pacing component, and the push/pull cell components, the overall
development of the project managed to the new modules introduced for WRR.
Major Components
-
Block Diagram
Top Level Block Diagram of the Fair Queueing Module
Module Assignments
- Lou Rohan - Queue Selector, Queue Manager
- Douglas Taube - Output Scheduler
Details of Implementation
- GetCell : This is a state machine. Upon receiving a SOC_in, it gets
the complete cell in 32 bits at a time and stores it in temporary registers
which are 64 bit wide. Thus it receives the 14 words cell on 32 bit interface
and formats it into a 7 word cell with 64 bit words [5].
- PushCell: This is a state machine. Once an input cell is completely
assembled into the temporary registers, it pushes the cell into the CellFIFO,
64 bits at a time. First this state machine makes a request to the CellFIFO
module to push the cell and waits for the grant. The request is kept high
until a grant is received.Once the grant is received, the Push state machine
pushes the cell into the CellFIFO [5].
- Queue Selector : This consists of three main components: a finite
state machine that detects control cells on VC 35, a process that selects the
input queue for the Queue Manager and registers new VCs, and a process that
updates the priority list when control cells indicate a priority registration
or a priority change.
Internal components and interconnects of the Queue Selector
The Queue Selector is responsible for responding to two opcodes from a control
cell, Add/Change VC Weight, and Change Pacing Value. When an Add/Change
Weight command is received, the VC is compared to the list of registered VCs.
If the VC is already registered, then the weight is found in the weight
list and updated. If the VC is not registered, it does not have a designated
queue for storage, so an empty queue is assigned and a corresponding
empty weight is updated with the new VC's weight.
Layout of the control cell
- Queue Manager : This is a combination of eight finite state
machines that are selected by either an incoming cell with
InputQueue or an outgoing cell with NextQueue. Each finite state
machine controls a FIFO queue that resides in SDRAM. A base address, or
offset is calculated inside the FIFO queue finite state machine so that the
different queues do not overrun one another.
The logic for selecting the proper finite state machine for the
proper read/write command follows this process:
- If ReadRequest is high,
then NextQueue, coming from the Output Scheduler, is demultiplexed from
a 3 bit vector to selecting the proper FIFO queue finite state machine for output.
- If ReadRequest is not high, then if WriteRequest is high, the InputQueue is
demultiplexed from a 3 bit vector to selecting the proper FIFO queue finite
state machine for input.
The BUSY signal informs the Output Scheduler when any of the FIFO queue
finite state machines are either reading or writing a cell from memory.
Internal components and interconnects in the Queue Manager
- Queue Finite State Machine: This is the finite state machine that interfaces
with the SDRAM Controller [4], writing or reading cells if it selected by the Queue
Selector or the Output Scheduler. The only thing that differs from one
queue to another is the base address of the FIFO queue that it stores in
SDRAM.
Bubble diagram for each of the FIFO queue finite state machines
- Output Scheduler: The output scheduler determines the next queue
to be serviced. To select the next queue, the output scheduler requires the
overall system to be in a non-volatile state. This occurs when the any read
or write events have ended. To complete a full service round, each priority
is considered. Any queues with a registered weight greater than or equal to
the current priority of the round are scheduled for service. Under worst-case
scenarios, a full service round is completed in O(nw) time where n is the
number of queues and w is the maximum priority. In order to reduce the
occurrences of the worst case scenario, speed-ahead logic was introduced to
skip priority levels higher than that of the heaviest queue weight.
- PullCell : This is a state machine which pulls a cell out of the
FIFO. Then it waits for a grant from the Cell FIFO and upon receiving the
grant it starts getting the data from the Cell FIFO 64 bits at a time and puts
it in the temporary storage. It is just the read counterpart of the PullCell
state machine described above [5].
- SendCell : This is a state machine which sends the cell on the
output interface of the module. It reads the cell stored in the temporary
registers by the PullCell and sends it on the module output 32 bits at a time.
It is the Pull counterpart of the PushCell state machine described above [5].
References
- H. Duan, J. W. Lockwood, S. M. Kang, J. D. Will,
A High-performance OC-12/OC-48 Queue Design Prototype for
Input-buffered ATM Switches, IEEE Infocom '97,
Kobe, Japan, April 7-11, 1997, pp 20-28.
- Y. Chen, J. S. Turner, Design of a Weighted Fair Queuing Cell Scheduler for ATM Networks, GLOBECOM '98.
- S. Keshav, An Engineering Approach to Computer Networking, Addison-Wesley, pp. 238-255, 1997.
- Sarang Dharmapurikar and John Lockwood,
Synthesizable Design of a Multi-Module Memory Controller,
Washington University, Department of Computer Science, Technical Report WUCS-01-26, October, 2001.
- CS 535 course materials and notes, J. W. Lockwood, Washington University, Fall 2001.
- CS 533 course materials and notes, K. Wong, Washington University, Spring 2000.
Results
Application Results
The following results were generated in simulation using the ModelSim tool. A test file of
ATM formatted cells was sent through the module. The input and output order of the cells is
noted by VCI number. Per the goals of the project, traffic was divided by the module into
separate queues. Each queue was registered with a priority (or weight) to determine the amount
of service it would receive relative to other queues.
In the first example, only two VC were introduced to the module. The
modules cell pacing rate was sent to allow all cells to be buffered before
any cells were transmitted. Cells arriving on VCI 23 were assigned a weight
of 2 while cells arriving on VCI 27 were given a weight of 8. Under the
weighted round-robin implementation, VCI 27 will receive 8 rounds of service
for every two rounds of service VCI 23 receives.
Under first-come-first-service (FCFS)scheduling conditions, the
cells are scheduled to complete according to their arrival time. This would
cause the higher priority cells to be blocked from service until the earlier
arriving, lower priority cells complete transmission. The graphs demonstrate
that the module time-slice divides the burst traffic based upon the weight of
the queue in ues.
The next diagram shows seven VC transmitting cells to the module. The weights
assigned to each VCI in the test file start at three on VCI 23 and increase by
one for a weight of 9 on VCI 29.
Similar to the previous example, the bursts
of traffic are divided according to the weight of the queue associated to the
VCI. Again, FCFS scheduling would have protected the burst traffic by blocking
other VC from transmitting and provided no means for priority service for specified
VC.
Synthesis and Place-and-Route Results
The project was synthesized for the Xilinx Virtex-E XCV1000E device with a
frequency target of 10 MHz. The synthesis indicates that the current design of
the module requires 25% of the available slices on the FPGA device. The greatest
delay in the module is 25 nanoseconds indicating a 40 MHz frequency of the module, which
is within the 10MHz target. Additional development in the queue management component
will quickly reduce both the size and delay of the implementation.
Conclusions
In this paper we have described the design, and implementation of a weighted round robin
module for FPX. The module was designed for input to the WUGS and provides per-VC
weighted round robin scheduling and reduces HOL blocking by using SDRAM based FIFO
queues and a WRR scheduler. Dynamic scheduling constraints are managed by the modules
support of the NCHARGE framework.
Computer simulation results verify the operation of the WRR scheduler to distribute
link utilization to multiple queues based upon and establish priority.