Solution Overview

In a multicast environment, loss of a packet in the network results in failure to deliver a copy of the packet to all receivers located in the subtree rooted at the branch sprigging from the point of loss. A natural solution to recover the packet is shown in Fig. 2. Two steps are involved:

Step 1: A single request escapes the loss subtree, sent from the closest requestor below the loss to the closest replier above the loss.
Step 2: The replier sends a single reply, which is multicast to the loss subtree.

Note that both steps require some knowledge of topology. We claim that this solution is optimal because, (a) only one request escapes the loss subtree and only one reply is generated, (b) replies are visible only within the affected branch, and (c) requests and replies traverse the shortest possible distance. We attempt to duplicate these steps in our scheme. Since topology information is required, we turn to the network for help. Routers, with their knowledge of topology, are the ideal candidates to provide services to navigate a request upstream until a willing replier is found and multicast replies to the subtree that experienced loss.

In order to provide the above services, we introduce three concepts, depicted in Fig. 3. First, we use routing to calculate a leader, called a replier, for each subtree. We use the replier to respond to requests made by other endnodes in the repliersub tree, thus creating a hierarchy. Second, we use routing to define a turning point. A turning point is the point in the topology at which a request moving upstream is moved downward towards the replier. When a replier sends a retransmission, the retransmission moves up the multicast tree until it reaches the turning point. The router at the turning point performs a directed multicast to multicast the retransmission to the subtree defined by the turning point. We use these three concepts (electing repliers, defining turning points, and using directed multicast for retransmissions) to closely approximate the optimal recovery scenario depicted in Fig. 2.

It is important to see why these three ideas help our scheme do well with respect to all five challenges described in the introduction. The use of a replier hierarchy prevents request implosion (as in tree-based schemes). Duplicate replies are eliminated by allowing only one replier to respond. Our scheme has near-optimal recovery latency, which is typically much smaller than the delay to unicast a retransmission from the source, because back-off delays are not needed and replies come from nearby repliers. The use of turning points and directed multicast help isolate recovery to the subtree affected by the fault. Finally, membership and topology changes are easily adapted to by routers as group membership changes by calculating new multicast trees and repliers for each subtree, if needed, without endnode involvement.

Research Plan Introduction Infocom Paper Presentation People ARL