DiSp: An Architecture for supporting
Differentiated Services in the Internet1
Anshul Kantawala (anshul at arl.wustl.edu)
Samphel Norden (samphel at arl.wustl.edu)
Ken Wong (kenw at arl.wustl.edu)
Guru Parulkar (guru at arl.wustl.edu)
Applied Research Laboratory
Washington University in St. Louis
St. Louis Mo. 63130, USA
Ph: 314-935-4855, Fax: 314-935-7302
1 This work was supported in part by NSF grant
ANI-9714698 and by Intel.
Abstract
In this paper, we propose DiSp (Differentiated Services over IP), a
new framework for supporting differentiated services over the Internet.
DiSp is different from the current IETF proposal for DiffServ but still
maintains the goals of DiffServ by moving complexity from the internal
routers out to the edge routers of DiffServ clouds and Autonomous Systems.
DiSp supports three classes of services: real-time, statistical bandwidth
and best-effort. The admission control policy for the real-time and statistical
flows allows fixed delay bound guarantees to be given to QoS applications.
We also discuss how our architecture can easily support important applications
such as Virtual Private Networks.
1. Introduction
The current Internet supports only best-effort service irrespective of
the characteristics of the application that uses the service. But applications
such as IP telephony, video-on-demand, video-conferencing and other real-time
applications require end-to-end QoS (Quality of Service) support. Furthermore,
different applications require different transmission guarantees. For example,
video-on-demand applications can tolerate large delays but require bandwidth
guarantees. However, IP telephony is delay intolerant and requires
more comprehensive guarantees on bandwidth and delay. Thus, there
is a need to support service discrimination by explicit resource allocation
and scheduling in the network. Current research on QoS based networks has
resulted in the development of the Integrated Services (IntServ) architecture
using the RSVP signaling protocol for signaling per-flow requirements to
the network. IntServ is used to quantify these QoS requirements using an
admission-control-based approach. However, IntServ suffers from
scalability,
complexity
and deployment problems.
These deficiencies have led to the development of the beginnings of
an alternative QoS delivery model known as Differentiated Services (DiffServ)
[1,
2]. To address the scalability
issue, DiffServ aggregates flows into service classes rather than maintaining
per flow state. Furthermore, QoS requirements are specified out-of-band
removing the necessity for a signaling protocol such as RSVP. Packet classification
is based on the setting of bits in the TOS byte of the IP header.
Flow aggregation in DiffServ has several beneficial consequences.
First, DiffServ routers map a large number of flows to a small number of
per-hop behaviors. Thus, instead of every router having to manage individual
flows, only the edge routers need to be concerned with QoS. Second,
aggregation facilitates the construction of ``end-to-end'' services by
linking multiple autonomous domains together using simplified service agreements
at the boundaries of the domains.
DiffServ is still in its infancy and has not yet matured into a service
framework that can satisfy the diverse application requirements. There
are many issues to be resolved: 1) precise service class definitions, 2)
admission control policies, 3) strategies for policing and shaping of aggregate
flows and 4) congestion handling mechanisms. Our architectural framework
attempts to tackle some of these issues and supports two fundamental service
enhancements: 1) receiver subscriptions, and 2) statistical bandwidth guarantees.
We describe a new architecture called DiSp (Differentiated Services
over IP), that builds on the basic DiffServ idea of flow aggregation to
provide user-controlled traffic services. DiSp has four key features: 1)
it has three service classes: real-time (RT), statistical-bandwidth (SB)
and best-effort (BE), with detailed profile specifications; 2) it has mechanisms
for policing and shaping aggregate flows; 3) although real-time flows are
treated in an aggregate manner, DiSp provides service guarantees on a per-flow
basis; (We note that, in keeping with the DiffServ ideals, DiSp does not
maintain per-flow state information in ANY router and uses simple priority
scheduling mechanisms among the three classes.) 4) DiSp uses efficient
monitoring mechanisms that can provide accurate feedback for congestion
control and overall network management. The use of our proposed signaling
protocol facilitates third party negotiations which are essential for network
configuration, management and provisioning.
Our goal is to define and support a model that allows the seamless integration
of our proposed DiffServ architecture with IntServ, since both these models
are complementary. We highlight the effectiveness of our approach by considering
a challenging task of resource allocation for real-time applications using
Virtual Private Networks (VPN).
The rest of the paper is organized as follows. In Section 2, we
will present and motivate our proposed approach. Section 3 deals with details
of our proposed architecture, followed by the details of the admission
control algorithm in Section 4. We discuss our congestion control policies
in Section 5 and support for individual real-time multicast flows in Section
6. We then describe issues regarding resource allocation for the
statistical bandwidth class and a target application, VPN, in Section 7.
We finally present related work and conclude the paper in Sections 8 and
9 respectively.
2. Overview of DiSp
DiSp supports two fundamental service enhancements: 1) receiver subscriptions,
and 2) statistical bandwidth guarantees. With receiver subscriptions, each
receiver (or set of receivers) negotiates for a fixed delay bound on a
flow originating from a remote source. Unlike other DiffServ proposals
[1, 4] which are based
on source reservations, our model emphasizes support for applications (e.g.
Video on Demand (VOD), stock quotes) where receiver reservations are more
appropriate. In these real-time applications, the sender should not be
forced to reserve and pay for resources when there are no receivers present.
Other motivations for adding receiver-based reservation control can be
found in [5].
With statistical bandwidth guarantees, each AS can negotiate an aggregate
bandwidth profile for its high bandwidth flows. One of the hard problems
for supporting such a service class is admission control and resource reservation
for the aggregate flows without prior knowledge of the routes taken by
individual flows within the aggregate. We envisage such a service class
to be useful for providing QoS support for VPNs. DiSp supports three service
classes:
-
The Real-Time (RT) class provides support for flows requiring fixed
delay bounds. Each edge router polices and smooths out real-time flows
using a modified token bucket with queue. Real-time flows are admitted
on a per-flow basis. DiSp uses QoS routing and admission control to ensure
that RT flows do not encounter congestion (unless there is a hardware failure).
DiSp's signaling protocol notifies receivers when there is a need for renegotiation
or re-admission. Renegotiation can occur in two situations:
-
When a flow cannot be admitted, and the application can tolerate a lower
QoS.
-
When a flow's delay bound cannot be met (because of congestion due to a
hardware failure).
-
The Statistical Bandwidth (SB) class provides support for delay-tolerant
applications requiring a minimum bandwidth reservation. Statistical bandwidth
flows are paced out at the aggregate bandwidth. A fixed size input queue
controls the maximum allowable burst.
-
The Best-Effort (BE) class provides best-effort service similar
to the current Internet. BE TCP flows are monitored on a per-link basis
and an explicit congestion notification message is used to notify the source
of congestion. This message includes a ``current congestion window size''
parameter which is calculated by the link monitor.
Flow scheduling is performed in strict priority order: 1) RT (highest),
2) SB, 3) BE. Thus, routers (edge and internal) need only maintain three
output queues per link and do not need complex fair queueing algorithms.
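The three-queue strict-priority service described above can be sketched as follows (a minimal Python sketch; the class and method names are ours, not part of any DiSp implementation):

```python
from collections import deque

class PriorityScheduler:
    """Strict-priority scheduler over the three DiSp classes."""
    CLASSES = ("RT", "SB", "BE")  # highest to lowest priority

    def __init__(self):
        self.queues = {c: deque() for c in self.CLASSES}

    def enqueue(self, cls, packet):
        self.queues[cls].append(packet)

    def dequeue(self):
        # Serve the highest-priority non-empty queue; BE traffic is
        # served only when both the RT and SB queues are empty.
        for c in self.CLASSES:
            if self.queues[c]:
                return self.queues[c].popleft()
        return None
```

Because BE is served only when RT and SB are empty, each router keeps just these three queues per output link and needs no per-flow fair-queueing state.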
3. Architecture
Our architecture is shown in Figure 1
and consists of:
-
Edge routers are responsible for policing and shaping flows in different
QoS classes and setting the TOS bytes accordingly.
-
Internal routers are responsible for scheduling packets in strict
priority among the service classes. Internal routers also monitor best-effort
traffic and generate explicit congestion notification messages during congestion.
-
The Network Operations Center (NOC) is responsible for admission control and renegotiation of
service levels.
The connecting Autonomous Systems (ASs) are responsible for marking
packets according to their respective classes. If an edge-router encounters
an unmarked packet, it is treated as part of a BE flow. Thus, DiSp provides
backward compatibility for legacy IP networks. Since DiSp handles policing
and shaping of flows in an aggregated manner, we rely on the connecting
ASs to provide flow isolation from misbehaving flows within the aggregate.
For example, an AS could be viewed as an IntServ cloud providing per flow
QoS internally to all its flows while negotiating aggregate profiles for
flows transiting through the DiffServ cloud.
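The default-to-BE treatment of unmarked packets can be illustrated with a small classification sketch (hypothetical Python; the two-bit codepoint layout below is purely illustrative, since DiSp does not fix a TOS bit assignment here):

```python
def classify(tos_byte):
    """Map a packet's TOS byte to a DiSp service class.

    Assumed layout (illustrative only): the two high-order bits
    select the class. Unmarked packets (e.g. tos_byte == 0) and any
    unrecognized codepoint default to best-effort, which is what
    gives DiSp its backward compatibility with legacy IP traffic.
    """
    codepoint = (tos_byte >> 6) & 0b11
    return {0b11: "RT", 0b10: "SB"}.get(codepoint, "BE")
```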
Signaling Protocol in DiSp : SPiD
A crucial component of the DiSp architecture is SPiD (pronounced ``speed''),
the signaling protocol employed by DiSp. The current IETF proposal for
DiffServ does not incorporate a signaling protocol. This decision was based
on scalability concerns. RSVP has a high overhead: its two-phase
approach sets up per-flow state information in each
router when performing resource reservation. There have been proposals
for modifying RSVP for use with aggregated flows [7].
However, aggregation introduces a host of issues (e.g. maintaining per-flow
guarantees, isolating flows) which would add to the complexity of RSVP.
Also, with regard to multicast, RSVP suffers from problems of handling
QoS reservations with heterogeneous reservation styles. SPiD is a lightweight,
efficient signaling protocol with the following key features:
-
Admission control between NOC and user. Renegotiation if the flow experiences
congestion.
-
Interaction between NOC and edge routers for setting up Service Level Agreements
(SLA).
-
Congestion notification messages for different traffic classes.
-
Cooperation with the source and/or receiver to allow a choice of traffic shaping
mechanisms to be used at the edge router for RT flows.
SPiD Control Messages
We envisage the following types of control messages that will be used by
SPiD.
-
Congestion notification to BE TCP flows with window size
-
Negotiation/renegotiation of SLA
-
Notification of congestion for RT flows to edge routers
-
Management related event notification to NOC
-
Monitoring instructions from NOC to routers
-
Sender and receiver oriented traffic management facilities
Thus, while SPiD is lightweight, it offers an enhanced set of features
to support both hard and soft bandwidth guarantees in DiffServ. In addition,
it provides support for network management which is another key component
of our DiSp architecture.
Control Traffic
SPiD has several control messages which must receive transmission guarantees
to prevent performance degradation. DiSp uses a separate minimum spanning
tree control network with statically reserved bandwidth to avoid delays.
Profile specification
Each AS can specify a profile for each RT flow and each SB flow aggregate.
Note that RT flows are delay sensitive whereas SB flows are bandwidth sensitive.
An RT profile specifies a delay bound for a particular flow through four
parameters:
dmax: Maximum tolerable delay between packets
RRT: Minimum bandwidth
Pmax: Maximum packet size
Ploss: Acceptable packet loss probability during severe congestion
An RT profile is specified for each flow and stored in the ingress
router of an ISP.
Each AS specifies a single aggregate profile for its SB flow
to an ISP. This profile is stored in the ingress router of the ISP receiving
the SB flow. An SB profile specifies the minimum bandwidth guarantee for
a flow aggregate (not an individual flow) through two parameters:
RSB: Minimum bandwidth
B: Maximum burst size
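For concreteness, the RT and SB profiles can be captured in simple records like the following (a Python sketch; the field names mirror the paper's parameters, and the token_interval helper anticipates the policer of the Edge Router Internals section, which issues one token every dmax/(hop count) seconds — the hop count itself would be supplied by admission control):

```python
from dataclasses import dataclass

@dataclass
class RTProfile:
    """Per-flow real-time profile (one per RT flow, stored at the ISP ingress)."""
    d_max: float   # maximum tolerable delay between packets (seconds)
    r_rt: float    # minimum bandwidth (bits/s)
    p_max: int     # maximum packet size (bits)
    p_loss: float  # acceptable loss probability during severe congestion

@dataclass
class SBProfile:
    """Aggregate statistical-bandwidth profile (one per AS, not per flow)."""
    r_sb: float    # minimum bandwidth for the aggregate (bits/s)
    b: int         # maximum burst size (bits)

def token_interval(profile: RTProfile, hop_count: int) -> float:
    """Token generation period for the edge-router policer:
    one token of size p_max every d_max / hop_count seconds."""
    return profile.d_max / hop_count
```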
State information storage
Each ingress router of an ISP stores profiles for each real-time flow and
each high bandwidth flow aggregate for each connected AS. Although policing
of real-time flows from a particular AS will be done in an aggregated manner
(as described in the Edge Router Internals
section), the edge-router has to adjust the policer according to the individual
flow's acceptable loss rate and selectively drop packets in times of severe
congestion when the guarantees cannot be met.
The NOC maintains a centralized database that is used for admission
control. The database maintains information about the reserved bandwidth,
delay and a list of real-time flows with their associated ingress edge-router
for each link of each router within the DiSp network. The parameters stored
for each link i, with capacity C and n RT flows include:
CRT |
Fraction of C reserved for RT flows |
CSB |
Fraction of C reserved for SB flows |
SiPimax |
Sum of max. packet lengths of all RT flows on the link |
{A1, A2, ... , An} |
List of RT flows associations Ai, where Ai =
< srci,dsti,routeri > |
Each RT flow association Ai identifies the flow (srci,dsti)
and the ingress router (routeri) hosting the flow.
Each internal router also stores a running count of number of active
best-effort flows on each link. This information is used by the signaling
protocol to provide explicit congestion window size feedback to the best-effort
sources in times of congestion.
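The NOC's per-link record described above might be represented as follows (a hypothetical Python sketch; the field names correspond to the parameters CRT, CSB, Σi Pimax and the association list {A1, ..., An}, but the record layout is our own assumption):

```python
from dataclasses import dataclass, field

@dataclass
class LinkState:
    """One entry per router link in the NOC's centralized admission database."""
    capacity: float   # C, link capacity (bits/s)
    c_rt: float       # fraction of C reserved for RT flows
    c_sb: float       # fraction of C reserved for SB flows
    sum_p_max: int = 0  # sum of max packet lengths of RT flows on this link
    rt_flows: list = field(default_factory=list)  # (src, dst, ingress_router)

    def add_rt_flow(self, src, dst, ingress_router, p_max):
        """Record a newly admitted RT flow association on this link."""
        self.rt_flows.append((src, dst, ingress_router))
        self.sum_p_max += p_max
```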
Edge-router internals
The main function of an edge-router is policing and shaping of real-time
and high bandwidth flows (Figure 2). For each
input link in the edge-router, a modified token bucket scheme is used to
police real-time flows. High bandwidth flows are policed using a fixed
size queue and a pacer. Each output link has three output queues, one for
each service class, which are served in strict priority order: 1)RT, 2)
SB, 3) BE.
-
Real-time flow policer: DiSp uses a token bucket with parameters
r and b (Figure 3). Here r denotes a set of timers in which the ith timer
expires every Ti seconds, where Ti = dimax/(hop count) for flow i, and
a queue of size b corrects any jitter that real-time packets may have
suffered in the AS network. Packets arriving from real-time flows are
queued in FIFO order if there is no token to service them instantly.
Incoming packets are dropped if the queue is full, thus policing RT flows
with respect to their reservations. When a packet is dispatched on link i,
a token of size Pimax, the largest packet in flow i, is removed from the
token bucket. In other words, a token of size Pimax is generated every
dimax/(hop count) seconds for a real-time flow i, the token supply is
reduced by Pimax every time a packet is sent out, and packets are sent
only if the token supply is at least Pimax. This scheme allows
the router to police and pace the real-time flows as an aggregate bundle
instead of being forced to use a separate token bucket and queue for each
real-time flow, thus reducing the overhead at edge-routers. If all flows
conform to their reservations, all RT delay guarantees will be met. The
only drawback of this scheme is that it cannot isolate non-conforming RT
flows, a responsibility of the AS egress routers.
-
Dynamic, flexible shaping of real-time flows: As part of an SLA,
an AS can negotiate a set of shaping policies for RT flows. These policies
are stored in a table at the edge-router which is indexed by some bits
in the TOS byte. For non-RT flows, these bits are ignored. For enhanced
flexibility, renegotiation mechanisms dynamically update shaping policies
for RT flows.
-
Statistical bandwidth flow policer/shaper: High bandwidth flows are
policed using a queue of size equal to B, the maximum allowable burst size.
Packets arriving when the queue is full are dropped, thus preventing high
bandwidth flows from exceeding their reservations during congestion. To
smooth out the burstiness in high bandwidth flows, packets are paced out
at rate R, the aggregate bandwidth for all high bandwidth flows. For example,
consider a reservation of R = 5Mb/s and B = 10MB for an input link with
a packet of size 16Kb at the head of the queue. After servicing the packet,
the pacer sets a timer to expire at t = tcurrent + 3.2 ms (3.2
ms = 16 Kb / 5 Mb/s). The next packet is serviced only after this timer
expires thus making sure that the traffic does not exceed the total allocated
bandwidth for that service class.
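The policing and pacing behavior described above can be sketched as follows. This is a simplified, single-flow Python sketch, not the aggregate implementation: time is passed in explicitly instead of read from a clock, and the class and function names are ours. RTPolicer implements the token-bucket-with-queue (one token of size Pimax per period dimax/(hop count)); sb_service_time reproduces the pacing arithmetic of the SB example (16 Kb at 5 Mb/s gives 3.2 ms):

```python
from collections import deque

class RTPolicer:
    """Token bucket with jitter-correction queue for one RT flow."""
    def __init__(self, p_max, period, queue_limit):
        self.p_max = p_max              # token size: largest packet (bits)
        self.period = period            # d_max / hop_count (seconds)
        self.queue_limit = queue_limit  # jitter-correction queue size b
        self.tokens = p_max             # start with one token's worth
        self.queue = deque()
        self.last = 0.0                 # time of the last token refill

    def _refill(self, now):
        # One token of size p_max is generated every `period` seconds.
        n = int((now - self.last) / self.period)
        self.tokens += n * self.p_max
        self.last += n * self.period

    def arrive(self, packet_bits, now):
        """Process one arrival; return the list of packets released now."""
        self._refill(now)
        out = []
        # Drain the FIFO backlog first.
        while self.queue and self.tokens >= self.p_max:
            out.append(self.queue.popleft())
            self.tokens -= self.p_max
        if self.tokens >= self.p_max:        # token available: send at once
            self.tokens -= self.p_max
            out.append(packet_bits)
        elif len(self.queue) < self.queue_limit:
            self.queue.append(packet_bits)   # wait in FIFO for a token
        # else: queue full -> packet dropped (flow is policed)
        return out

def sb_service_time(packet_bits, rate_bps):
    """SB pacing delay before the next packet may be served."""
    return packet_bits / rate_bps
```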
4. Admission Control
The DiSp admission control algorithm ensures that delay and bandwidth guarantees
can be met for all accepted connections. The admission control procedure
is almost the same for both real-time and statistical bandwidth flows.
The difference is that delay and jitter for statistical bandwidth flows are
not checked. When a new connection request is made to an edge router,
two tests are performed for RT flows. First, DiSp checks for sufficient
bandwidth along the route selected for the RT flow. Second, it checks
that the sum of the end-to-end delay and jitter bounds exceeds the total
worst-case delay accumulated by the packet over all hops along the route. Once
a flow satisfies these two checks, all flow associations < src, dst,
ingress-router > on the QoS route are added to the NOC database. The parameter
that maintains the overall bandwidth of the service class is updated. Finally,
the maximum packet size for the flow is computed and updated if larger.
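The two admission tests can be sketched as follows (hypothetical Python; links are represented as dicts carrying the NOC parameters C and CRT plus the currently reserved RT bandwidth, and the dict keys are our own naming, not part of the paper):

```python
def admit_rt_flow(route_links, flow_bw, per_hop_worst_delay,
                  delay_bound, jitter_bound):
    """Admission test for one RT flow along a selected QoS route.

    route_links: [{'capacity': C, 'c_rt': fraction, 'reserved_rt': bits/s}]
    per_hop_worst_delay: worst-case delay (s) at each hop on the route.
    """
    # Test 1: sufficient RT bandwidth on every link of the route.
    for link in route_links:
        available = link["capacity"] * link["c_rt"] - link["reserved_rt"]
        if flow_bw > available:
            return False
    # Test 2: the delay + jitter bound must cover the summed
    # worst-case per-hop delays along the route.
    if delay_bound + jitter_bound < sum(per_hop_worst_delay):
        return False
    # Accepted: commit the reservation on each link.
    for link in route_links:
        link["reserved_rt"] += flow_bw
    return True
```

On acceptance, the NOC would additionally record the flow association and update the per-link maximum packet size, as described above.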
5. Congestion Handling
Real-time and statistical bandwidth flows will not experience any congestion
during normal operation, but may experience some congestion when a link
goes down and the flows are re-routed. For this particular scenario, we
re-route the real-time and statistical bandwidth flows to the next hop
router using an alternate path. For example, suppose link i, which connects
router Rj to router Rk, goes down.
The NOC tries to find an alternate path from Rj to Rk
that can accommodate the bandwidth and delay requirements of all flows
on link i. If such a route cannot be found, the NOC will signal the originating
ASs of the respective flows indicating a need for renegotiation or readmission
of the affected flows.
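The NOC's failure handling reduces to a feasibility search over candidate alternate paths, sketched below (hypothetical Python; summarizing each candidate path by a (bandwidth, delay) pair is our assumption, since the paper does not specify how the NOC enumerates alternate routes):

```python
def find_alternate_path(candidate_paths, required_bw, required_delay):
    """Pick an alternate path from Rj to Rk for the failed link's flows.

    candidate_paths: {path_id: (available_bw_bps, delay_s)}.
    Returns the first path meeting the aggregate bandwidth and delay
    requirements, or None, in which case the NOC signals the
    originating ASs to renegotiate or re-admit the affected flows.
    """
    for path_id, (bw, delay) in candidate_paths.items():
        if bw >= required_bw and delay <= required_delay:
            return path_id
    return None
```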
Best-effort flows can encounter congestion even during normal operation
because of the aggressive windowing strategy employed by TCP. DiSp monitors
active best-effort flows and provides explicit window size parameter feedback
to flows going through a congested link. Using the Smart Port Card (SPC)
card with an embedded ATM Port Interconnect Chip (APIC) [8],
DiSp can snoop all BE flows on a link at gigabit rates and monitor the
currently available bandwidth for BE flows. Each router stores the
number of active best-effort flows and the source address of each flow
for each link. If the link experiences congestion, the router sends feedback
messages to the host indicating a smaller TCP congestion window size (the
currently available bandwidth divided by the number
of active flows). We plan to add this enhancement
to the current TCP protocol to be able to handle such feedback messages
and adjust the congestion window size accordingly. This scheme is an enhancement
of the ECN (Explicit Congestion Notification) mechanism proposed for TCP
[6]. We will also experiment with varying
the holding times of the new congestion window size before allowing TCP
to resume its normal congestion window control algorithm.
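A fair-share reading of this window-size computation is sketched below (hypothetical Python: each active BE flow is granted an equal share of the currently available bandwidth, converted to a window via the bandwidth-delay product; the RTT and MSS scaling are our assumptions and do not appear in the paper):

```python
def ecn_window_feedback(available_bw_bps, n_active_flows, rtt_s,
                        mss_bytes=1460):
    """Suggested TCP congestion window (in segments) for one BE flow.

    Each of the n active flows on the congested link gets an equal
    share of the available bandwidth; the per-flow window is that
    share's bandwidth-delay product, floored to whole segments and
    clamped to at least one segment.
    """
    share_bps = available_bw_bps / n_active_flows
    window_bytes = share_bps * rtt_s / 8   # bits/s * s -> bits -> bytes
    return max(1, int(window_bytes // mss_bytes))
```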
6. Multicast Support
Since IETF's DiffServ deals with aggregate flows, resource provisioning
for individual multicast flows is not supported. Using the receiver based
subscription enhancement, DiSp provides baseline support for multicast
RT flows. Consider a multicast group M with source S1 and receivers
D1 and D2, as shown in Figure
4. When a new receiver D3 wants to join M, only resources
from R4 to D3 are reserved, since there already exists
a virtual path from S1 to D1. If the QoS requirements
for D3 are different from the other receivers, the flow S1
to D3 will be considered as a new RT flow and go through the
admission control and resource reservation process. Non-RT multicast flows
will be treated as BE flows. Providing multicast support for aggregated
SB flows is an open issue.
7. Applications for Statistical Bandwidth Class
There are two major issues concerning the statistical bandwidth class:
-
What application(s) can make use of this class?
Currently most of the multimedia applications require a fixed delay
bound from the network which cannot be guaranteed using the statistical
bandwidth class. Other applications such as web surfing and ftp are very
bursty, short duration flows which are best served using the best-effort
model.
-
How do we provision resources for this class?
Since we admit flows of this class in an aggregated manner, we do not
have detailed source-destination information regarding each flow in the
aggregate. Flows belonging to one aggregate bundle will in most cases diverge
to different receivers and thus exit the network at different points. Thus,
reserving resources for an aggregate bundle is a hard problem. The obvious
solution of reserving bandwidth on all links for each aggregate leads to
an extremely inefficient solution!
One application that helps point towards a feasible solution to the above
issues is providing VPNs for corporations. Since a VPN is a fairly static
network connecting remote sites of a company, DiSp can use this information
to reserve resources along the paths connecting the sites. Currently, VPNs
only provide a secure network, but no QoS guarantees. Using the statistical
bandwidth class, DiSp can provide a VPN with some minimum bandwidth guarantee
between the remote sites. The issues that need to be addressed for such
an application are:
-
Provisioning bandwidth in the VPN
One possibility is to create one-to-many multicast groups between each
remote site and all the others, compute QoS routes for each multicast group
and reserve the minimum bandwidth as specified along these routes.
-
Efficient use of network resources
We plan to evaluate two approaches for admission control and resource
reservation for statistical bandwidth flows.
-
Over-booking Use an over-booking ratio while admitting new flows.
Under this scheme, new flows would specify peak bandwidth values but would
receive an average bandwidth equal to the (peak rate/overbooking ratio).
-
Empirical heuristic Measure effects of new flows (which have a random
set of destinations) on the link utilizations of the DiffServ cloud.
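The over-booking approach can be sketched as follows (hypothetical Python; testing admission against the SB share of a single bottleneck link is our simplification of the aggregate provisioning problem described above):

```python
def sb_effective_rate(peak_rate_bps, overbooking_ratio):
    """A new SB flow specifies its peak rate but is provisioned at
    peak / ratio, so the aggregate can admit more flows than strict
    peak-rate allocation would allow."""
    return peak_rate_bps / overbooking_ratio

def admit_sb_flow(link_capacity_sb, admitted_rates, peak_rate_bps, ratio):
    """Admit if the summed effective rates, including the new flow,
    still fit in the SB share of the link. admitted_rates is the
    mutable list of effective rates of already-admitted flows."""
    eff = sb_effective_rate(peak_rate_bps, ratio)
    if sum(admitted_rates) + eff > link_capacity_sb:
        return False
    admitted_rates.append(eff)
    return True
```

With an overbooking ratio of 4, for example, a 10 Mb/s SB share admits five flows of 8 Mb/s peak rate before refusing further requests.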
8. Related Work
Current research in DiffServ has resulted in a number of IETF drafts that
attempt to tackle the issues of defining service classes, per-hop behaviors,
integration of DiffServ with IntServ and so on. In this section we discuss
how our work complements and builds on the existing research in this area
and highlight some of the differences between our proposed scheme and the
traditional view of DiffServ.
Differences from traditional DiffServ proposals
IETF DiffServ was essentially developed to avoid any complex signaling
and to allow out-of-band negotiation. Thus the setting of bits in the ToS byte
implicitly decides the kind of service a flow receives at a router.
However, there is a need to use a signaling protocol that can perform the
negotiation between the user and ISP, allow the service profiles to be
disseminated to the various edge routers for indicating admitted flows,
allow users to renegotiate the profile (subscriptions), and perform
congestion notification. Our proposed signaling protocol
SPiD performs
the above functions in an efficient manner. We have also utilized an admission
policy based approach that can guarantee the services demanded by the various
flows.
DiffServ also does not clearly define the types of service classes that
are provided. While some of the proposals [4]
discuss Premium and Assured service classes, no characterizing
parameters are defined apart from bandwidth. We have proposed the use of
three service classes (Real-time, Statistical-Bandwidth and Best-effort)
and defined their QoS parameters to provide greater flexibility to the
user in terms of being able to specify delay and delay jitter, in addition
to the standard bandwidth parameter. We also propose the use of a separate
control network for which a static spanning tree route is maintained. All
control messages are thus guaranteed a minimum bandwidth and do not suffer
from problems of insufficient bandwidth due to admission of higher class
flows like real-time flows. Related to this, we also utilize QoS routes
for choosing the best possible route. By restricting ourselves to a simple
three queue approach, we are removing much of the complexity that is involved
in using some compute-intensive Weighted Fair Queueing mechanism at the
edge router for scheduling the flows.
Although edge routers in DiSp treat flows from different service classes
as aggregates (which is similar to traditional DiffServ), we enforce specification
and admission of individual real-time flows rather than flow aggregations.
In general, aggregation of diverse real-time flow specifications is meaningless.
However, we police and shape real-time flows as an aggregate.
Also, DiSp's admission control is explicit whereas IETF DiffServ's
is implicit (see Section 3).
Multicast flows are treated by DiSp in the same way as any other flow.
Thus, DiSp does not require any separate mechanism to handle multicast
flows. A new receiver joining a multicast group is handled similar to any
other new flow. With heterogeneous multicast traffic, diverse profiles
(based on receiver requirements) can be easily supported.
9. Conclusions
We have proposed a new framework for supporting differentiated services
over the Internet. This approach is different from the current IETF proposal
for DiffServ while still maintaining the goals of DiffServ. Our architectural
framework allows distribution of complexity to the edge routers as well
as the AS routers. We have proposed services to support real-time and statistical
service classes apart from the usual best-effort class. The admission control
policy for the real-time and statistical flows allows hard guarantees to
be given to QoS applications.
References
1. D. Clark and J. Wroclawski. "An approach to service
allocation in the internet", Internet Draft, July 1997.
2. S. Blake, D. Black, M. Carlson, E. Davies,
Z. Wang, and W. Weiss. "An architecture for differentiated services", Internet
Draft, August 1998.
3. G. Parulkar, D. Schmidt, E. Kraemer, J. Turner,
and A. Kantawala. "An architecture for monitoring, visualization and control
of gigabit networks", IEEE Network, 11(5):34-43, October 1997.
4. K. Nichols, V. Jacobson, and L. Zhang. "A
two-bit differentiated services architecture for the internet", Internet
Draft, November 1997.
5. B. Ohlman. "Receiver control in differentiated
services", Internet Draft, March 1998.
6. S. Floyd. "TCP and Explicit congestion notification", ACM Computer
Communications Review, 24(5):10-23, October 1994.
7. R. Guerin, S. Blake, and S. Herzog. "Aggregating RSVP based QoS requests",
Internet Draft, November 1997.
8. Z. Dittia, G. Parulkar, and J. R. Cox. "The
APIC Approach to High Performance Network Interface Design: Protected DMA
and Other Techniques," IEEE INFOCOM 97, Kobe, Japan, 1997.