An abridged JavaScript implementation is shown below.
To demonstrate the data structure in the web app, start with the following script.
let f = new Forest(); f.fromString('{a(b(c d) e(f(g)) h) i(j k(l m) n)}'); f.cut(5); f.combineGroves(1,9); log(f.toString());
The $\fromString$ operation specifies two trees with roots $a$ and $i$. In the string argument, each node is shown with its descendants immediately following it, enclosed by parentheses. So, initially $a$ has children $b$, $e$ and $h$, with the first two each defining subtrees with two additional descendants. The $\cut$ operation removes $e$'s subtree from $a$ and the $\combineGroves$ operation pairs $i$'s tree with the remainder of $a$'s tree to form the grove shown below.
{[a(b(c d) h) i(j k(l m) n)] e(f(g))}
Adding
f.cut(11); f.link(11,6);
produces
{[a(b(c d) h) i(j n)] e(f(g k(l m)))}
Adding an argument of 0 to the $\toString$ method causes it to omit the tree structure, showing each grove or stand-alone tree as just a list of vertices in prefix order.
{[a b c d h i j n] [e f g k l m]}
The inserted vertex $c$ becomes the leftmost leaf in the right subtree of its specified predecessor $n$.
Deleting a vertex $v$ from a tree is easy if $v$ has no children. Just cut the link to its parent. It is almost as easy if $v$ has one child, since that child can simply take $v$'s place in the tree. If $v$ has two children, its position in the tree is first swapped with the “rightmost vertex” in its left subtree. This vertex is its immediate predecessor in the tree. At that point, it has at most one child, making the deletion easy. This is illustrated below.
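A minimal sketch of this deletion procedure is shown below. It is independent of the abridged implementation that appears later and assumes a hypothetical node object with item, left, right and parent fields; in the two-child case it exchanges items with the predecessor, which leaves the same sequence of items as exchanging positions would.
// Rightmost vertex in the subtree rooted at u.
function rightmost(u) {
  while (u.right) u = u.right;
  return u;
}
// Remove v from its tree. If v has two children, it first trades places
// (here, items) with the rightmost vertex of its left subtree, its
// immediate predecessor, so that the vertex finally unlinked has at
// most one child.
function deleteVertex(v) {
  if (v.left && v.right) {
    const pred = rightmost(v.left);
    [v.item, pred.item] = [pred.item, v.item];
    v = pred;                           // now unlink the predecessor
  }
  const child = v.left || v.right;      // v has at most one child
  if (child) child.parent = v.parent;
  if (v.parent) {
    if (v.parent.left === v) v.parent.left = child;
    else v.parent.right = child;
  }
  v.left = v.right = v.parent = null;
  return child;                         // vertex that took v's place (possibly null)
}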
The operation $\join(t_1,u,t_2)$ is easy. Simply make $t_1$ the left child and $t_2$ the right child of $u$. The operation $\split(u)$ is similarly easy if $u$ is a tree root. If not, the split is accomplished by starting at item $u$ and proceeding up the tree, building two subtrees $L$ and $R$, where $L$ contains the vertices to the left of $u$ and $R$ contains the vertices to its right. This is illustrated below.
Observe that the first step completes a split of the subtree of vertex $c$'s parent $b$. The next step completes a split of the next subtree up the path to the root. The third completes the split for the whole tree.
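A minimal sketch of the two operations is shown below, using the same hypothetical node fields (left, right, parent) as the deletion sketch; it is not the library's implementation, which also maintains other per-tree state.
// Make t1 and t2 the left and right subtrees of u (either may be null).
function join(t1, u, t2) {
  u.left = t1;  if (t1) t1.parent = u;
  u.right = t2; if (t2) t2.parent = u;
  u.parent = null;
  return u;
}
// Split at u: walk up from u, adding each vertex on the path (together
// with its off-path subtree) to L if it precedes u, or to R if it follows u.
function split(u) {
  let L = u.left, R = u.right;
  let v = u, p = u.parent;
  while (p) {
    const gp = p.parent;
    if (p.right === v) L = join(p.left, p, L);   // p and its left subtree precede u
    else               R = join(R, p, p.right);  // p and its right subtree follow u
    v = p; p = gp;
  }
  if (L) L.parent = null;
  if (R) R.parent = null;
  u.left = u.right = u.parent = null;
  return [L, u, R];
}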
Binary trees are often used to implement search trees in which each vertex $u$ is assigned a value $\key[u]$ and the left-to-right order of the search tree vertices matches the increasing order of the key values, as illustrated below.
The operation $\search(k,t,key)$ uses the provided key values to locate a vertex with the specified key. It starts at the root and at each step compares the key of the current vertex to the target key $k$, selecting the left or right subtree based on that comparison. In the example above, $\search(5,c,key)$ follows the path $[c,a,i,h,e]$ from the root.
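A sketch of the search loop is shown below; here the key argument is assumed to be a function mapping a vertex to its key value, which may differ from how the library represents the key mapping.
// Return a vertex with key k in the tree with root t, or null if none exists.
function search(k, t, key) {
  let u = t;
  while (u && key(u) !== k) {
    u = (k < key(u)) ? u.left : u.right;
  }
  return u;
}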
The operation $\insertByKey(u,t,key)$ is intended for use in this situation. It starts by using the provided keys to find a location where $u$ can be inserted as a new leaf, while ensuring that the order of the tree's vertices remains consistent with the key values. In the example above, if $\key[m]=7$ the operation $\insertByKey(m,c,key)$ would search the tree looking for a viable location, eventually inserting $m$ as the right child of $e$.
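A corresponding sketch of the insertion logic is shown below, under the same assumptions; ties are sent to the right here, which the library may handle differently.
// Insert u as a new leaf of the tree with root t, at a position
// consistent with the key values; returns the unchanged root.
function insertByKey(u, t, key) {
  let p = t;
  while (true) {
    if (key(u) < key(p)) {
      if (!p.left) { p.left = u; break; }
      p = p.left;
    } else {
      if (!p.right) { p.right = u; break; }
      p = p.right;
    }
  }
  u.parent = p;
  u.left = u.right = null;
  return t;
}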
The operation $\rotate(x)$ is illustrated below.
Notice that the operation moves $x$ one step closer to the root and moves its parent one step further away. If applied in a systematic fashion, rotations can be used to reduce the maximum depth of a tree. In particular, the $\BalancedForest$ data structure (see next sub-section) uses rotations to ensure that the tree depth is never more than twice the log of the tree size.
Rotations are often used in pairs. In particular, when $x$ is an outer grandchild, a double-rotation at $x$ consists of a rotation at its parent followed by a rotation at $x$, as illustrated below.
When $x$ is an inner grandchild, both rotations are done at $x$.
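The sketch below captures both the single and the double rotation, again on the hypothetical node objects used in the earlier sketches; the library versions also maintain additional per-tree state.
// Rotate x up over its parent p; the subtree of x on the far side of
// the rotation is transferred to p.
function rotate(x) {
  const p = x.parent, gp = p.parent;
  if (p.left === x) {                   // right rotation
    p.left = x.right; if (x.right) x.right.parent = p;
    x.right = p;
  } else {                              // left rotation
    p.right = x.left; if (x.left) x.left.parent = p;
    x.left = p;
  }
  p.parent = x; x.parent = gp;
  if (gp) {
    if (gp.left === p) gp.left = x; else gp.right = x;
  }
}
// Double rotation at x: rotate first at the parent when x is an outer
// grandchild, otherwise rotate twice at x.
function rotate2(x) {
  const p = x.parent, gp = p.parent;
  const outer = (gp.left === p) === (p.left === x);
  if (outer) { rotate(p); rotate(x); }
  else       { rotate(x); rotate(x); }
}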
An abridged JavaScript implementation appears below.
Notice that most methods that modify a tree's structure accept an optional function argument called $\refresh$. These functions can be used to assist in balancing the data structure and/or in maintaining client data that may be affected by changes to the tree structure.
To demonstrate the data structure, start with the following script.
let bf = new BinaryForest(); bf.fromString('{[(a b -) *c (d e f)] [(g h (- i j)) *k l] [m *n (o p r)]}'); bf.insertAfter(17,11,8); log(bf.toString()); log(bf.toString(0));
The $\fromString$ operation specifies three trees with roots $c$, $k$ and $n$. Note that in this case, each tree is surrounded by square brackets and the roots are highlighted with an asterisk (brackets are omitted for single-vertex trees). Also, for any non-leaf node a “missing child” is shown explicitly. The $\insertAfter$ operation inserts $q$ into the tree with root $k$, just after $h$. The $\toString$ operations produce the output below.
{[(a b -) *c (d e f)] [(g h (q i j)) *k l] [m *n (o p r)]}
{[a b c d e f] [g h q i j k l] [m n o p r]}
Adding
bf.split(9); bf.join(3,9,14);
divides the tree containing $i$ into three parts and then joins the trees with roots $c$ and $n$ to $i$. This yields
{[((a b -) c (d e f)) *i (m n (o p r))] [g *h q] [j *k l]}
Adding two rotate operations at $e$ produces
{[((a b -) c d) *e (f i (m n (o p r)))] [g *h q] [j *k l]}
The $\BalancedForest$ data structure is designed to handle such applications efficiently by limiting the depth of its trees to at most $2\lg n $. There are many ways to keep trees balanced. The $\BalancedForest$ uses a method described in [Bayer72, Tarjan87], which assigns a $\rank$ to every node, and ensures that the ranks satisfy two properties. $$ \rank(u) \leq \rank(p(u)) \leq \rank(u) +1 \\ \rank(u) < \rank(p^2(u)) $$ where $p^2(u)$ is the grandparent of $u$. Vertices with fewer than two children are assigned a rank of 1. An example of such a tree is shown below.
One can show by induction that any node of rank $k$ has height at most $2k-1$ and has at least $2^k-1$ descendants. This implies that the maximum depth of any node is $\leq 2 \lfloor\lg (n+1)\rfloor$.
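As a small illustration (not part of the library), the two rank conditions and the rank-1 rule for vertices with fewer than two children can be checked directly at a vertex, assuming a numeric rank field on each hypothetical node object.
// Return true if the rank conditions hold at u.
function ranksOkAt(u) {
  const p = u.parent, gp = p && p.parent;
  if ((!u.left || !u.right) && u.rank !== 1) return false;            // rank-1 rule
  if (p && !(u.rank <= p.rank && p.rank <= u.rank + 1)) return false; // first condition
  if (gp && !(u.rank < gp.rank)) return false;                        // second condition
  return true;
}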
Operations that modify a tree must be extended to ensure that the conditions on the ranks are maintained. Following an insertion of a node $x$, one must compare $\rank(x)$ with $\rank(p^2(x))$. If $\rank(x)$ is smaller, nothing further needs to be done. If the two ranks are equal, but $x$'s aunt (the sibling of its parent) has smaller rank, the rank conditions can be satisfied by doing one or two rotations. Specifically, if $x$ is an outer grandchild, a single rotation at $p(x)$ completes the process. If $x$ is an inner grandchild, a double-rotation at $x$ is required. This is illustrated below.
If $\rank(x)=\rank(p^2(x))=\rank(\aunt(x))$, $\rank(p^2(x))$ is incremented and the checking procedure is repeated at $p^2(x)$.
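A sketch of this checking procedure is shown below; it assumes the rotate and rotate2 helpers sketched earlier and treats a missing aunt as having rank 0.
// Restore the rank conditions after x has been inserted (or promoted).
function insertFixup(x) {
  let gp = x.parent && x.parent.parent;
  while (gp && x.rank === gp.rank) {             // second condition violated at x
    const p = x.parent;
    const aunt = (gp.left === p) ? gp.right : gp.left;
    if ((aunt ? aunt.rank : 0) < gp.rank) {      // fix with one or two rotations
      const outer = (gp.left === p) === (p.left === x);
      if (outer) rotate(p); else rotate2(x);
      return;
    }
    gp.rank += 1;                                // promote and re-check higher up
    x = gp;
    gp = x.parent && x.parent.parent;
  }
}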
The procedure for balancing after a deletion is a bit more involved. It starts at the vertex $x$ that took the place of the deleted node. Let $r$ be its rank. There are four cases to consider. (Case 1) If $x$'s parent has rank $r+2$, then there is a violation of the first rank condition that needs to be fixed. If $x$'s sibling also has rank $r+2$, a rotation is performed at the sibling, as illustrated below.
This rotation does not fix the rank violation at $x$, but because $x$'s new sibling has rank $r+1$, the violation can now be addressed using one of the three remaining cases.
(Case 2) If $x$'s sibling has rank $r+1$ and both of its children have rank $r$, then $\rank(p(x))$ is decremented, eliminating the rank violation at $x$, but possibly creating a new violation between $p(x)$ and its parent. In this case, the checking procedure is repeated, with $p(x)$ replacing $x$.
(Case 3) If $x$'s sibling has rank $r+1$ and $x$'s nephew (the more distant child of $x$'s sibling) has rank $r+1$, a rotation is performed at the sibling and the ranks of $x$'s parent and its new grandparent are changed to $r+1$ and $r+2$ respectively.
(Case 4) Finally, if $x$'s sibling has rank $r+1$ and $x$'s niece (the nearer child of the sibling) has rank $r+1$, a double rotation is performed at the niece and the ranks of $x$'s parent and its new grandparent are changed to $r+1$ and $r+2$.
Note that the last two cases eliminate the violation at $x$ and create no new violations, so the rebalancing process ends at that point. Also, if an application of the first case is followed by an application of the second case, there can be no remaining violation of the rank condition at that point.
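The four cases are sketched below, again assuming the rotate and rotate2 helpers from the earlier sketches; x is the vertex that took the deleted vertex's place (possibly missing, treated as rank 0) and p is its parent. This is only an outline of the procedure described above, not the library code.
// Rank of a possibly missing vertex.
function rankOf(u) { return u ? u.rank : 0; }
// Restore the rank conditions after a deletion; x may be null.
function deleteFixup(x, p) {
  while (p && rankOf(p) === rankOf(x) + 2) {     // first condition violated at x
    const r = rankOf(x);
    let sib = (p.left === x) ? p.right : p.left;
    if (rankOf(sib) === r + 2) {                 // case 1
      rotate(sib);                               // x's new sibling has rank r+1
      sib = (p.left === x) ? p.right : p.left;
    }
    const niece  = (p.left === x) ? sib.left  : sib.right;  // nearer child of sib
    const nephew = (p.left === x) ? sib.right : sib.left;   // farther child of sib
    if (rankOf(niece) === r && rankOf(nephew) === r) {      // case 2
      p.rank -= 1;                               // may create a violation at p
      x = p; p = p.parent;
    } else if (rankOf(nephew) === r + 1) {       // case 3
      rotate(sib);
      p.rank = r + 1; sib.rank = r + 2;          // sib is now x's grandparent
      return;
    } else {                                     // case 4: rank(niece) == r+1
      rotate2(niece);
      p.rank = r + 1; niece.rank = r + 2;        // niece is now x's grandparent
      return;
    }
  }
}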
The operation $\join(t_1,i,t_2)$ can be performed as before if $\rank(t_1)=\rank(t_2)$. In this case, $\rank(i)$ is set to $\rank(t_1)+1$. If $\rank(t_1) \neq \rank(t_2)$, things get more complicated. If $\rank(t_1) > \rank(t_2)$, the first step is to follow right pointers from $t_1$, looking for the first vertex $v$ with $\rank(v)=\rank(t_2)$. The subtree at $v$ is then joined to $i$ and $t_2$, and the resulting subtree takes the original place of $v$ in $t_1$. This is illustrated below.
Since this procedure may produce a violation of the second rank condition, the rebalancing procedure used for inserts is applied at this point, starting at $i$.
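A sketch combining the pieces described above is shown below, using the plain join and the insertion fixup from the earlier sketches; the case $\rank(t_2) > \rank(t_1)$ is symmetric and follows left pointers from $t_2$.
// Join balanced trees t1 and t2 at vertex i; returns the new root.
function balancedJoin(t1, i, t2) {
  if (t1.rank === t2.rank) {
    i.rank = t1.rank + 1;
    return join(t1, i, t2);
  }
  if (t1.rank > t2.rank) {
    let v = t1;                                  // follow right pointers from t1 ...
    while (v.rank > t2.rank) v = v.right;        // ... to the first v with rank(v) == rank(t2)
    const p = v.parent;
    i.rank = v.rank + 1;
    const sub = join(v, i, t2);                  // join v's subtree, i and t2 ...
    p.right = sub; sub.parent = p;               // ... and put the result in v's place
  } else {
    let v = t2;                                  // symmetric case
    while (v.rank > t1.rank) v = v.left;
    const p = v.parent;
    i.rank = v.rank + 1;
    const sub = join(t1, i, v);
    p.left = sub; sub.parent = p;
  }
  insertFixup(i);                                // repair any rank violation at i
  let root = i;                                  // rebalancing may have moved the root
  while (root.parent) root = root.parent;
  return root;
}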
The split operation is done as before, but uses the balanced version of the join operation in place of the original.
Here is the abridged JavaScript implementation.
Observe how the $\refresh$ function arguments are used to implement the re-balancing.
To demonstrate the data structure, start with the following script.
let bf = new BalancedForest(); bf.fromString('{[a b c d e] [h i j k l m n o p q r]}'); log(bf.toString(0xc));
The $\fromString$ operation specifies two lists that are implemented as trees by the data structure. Vertices that do not appear in either list ($f$ and $g$, in the example) are treated as singleton lists. However, with the inclusion of the appropriate format argument, the $\toString$ method can show the detailed tree structure as shown below.
{[(a b c) *d:2 e] [(h i:2 j) *k:3 (l m:2 ((n o p) q:2 r))]}
Note that ranks of one are omitted. Adding the operations
let [t1,t2] = bf.split(15); bf.join(4,15,t1);
splits the second tree at $o$ and then joins the tree with root $d$ to the first tree produced by the $\split$, yielding
{[(((a b c) d:2 e) o:3 (h i:2 j)) *k:3 (l m:2 n)] [p *q:2 r]}
Deleting $d$ from the first tree yields
{[(((a b -) c:2 e) o:3 (h i:2 j)) *k:3 (l m:2 n)] [p *q:2 r]}
and re-inserting it after $a$ yields
{[(((a d b) c:2 e) o:3 (h i:2 j)) *k:3 (l m:2 n)] [p *q:2 r]}
The key to this approach is a restructuring operation called a splay. A splay at a vertex $u$ consists of a series of double rotations at $u$ that move $u$ to the root, or to within one step of it. If needed, a final single rotation is used to bring $u$ to the root. An example is shown below.
To understand why the splay is helpful, observe that any double-rotation at a node $u$ brings all descendants of $u$ at least one step closer to the root of the search tree. Consequently, if the nearest common ancestor of $u$ with a node $v$ has depth $d$ before the splay, then after the splay, the depth of $v$ is reduced by at least $\lfloor d/2 \rfloor -2$. For ancestors of $u$, this implies that their depth is roughly halved by the splay operation. Also note that the more expensive the splay is, the greater benefit it delivers.
When a search operation reaches a vertex with the target key value, a splay is performed at that vertex, improving the performance of future operations while only increasing the cost of the search by a constant factor. Similarly, a splay is performed after a vertex is inserted or deleted. No splay is performed during a join, but a split is implemented by first performing a splay at the split vertex $u$ and then simply separating the two subtrees of the new root, $u$.
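A sketch of the splay, and of the splay-based split, is shown below, assuming the rotate and rotate2 helpers sketched earlier.
// Move u to the root of its tree by repeated double rotations, with a
// final single rotation if u ends up one step from the root.
function splay(u) {
  while (u.parent && u.parent.parent) rotate2(u);
  if (u.parent) rotate(u);
  return u;                                      // u is now the root
}
// Split at u: splay u to the root, then detach its two subtrees.
function splaySplit(u) {
  splay(u);
  const L = u.left, R = u.right;
  if (L) L.parent = null;
  if (R) R.parent = null;
  u.left = u.right = null;
  return [L, u, R];
}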
An amortized complexity analysis can be used to show that the time required to do a sequence of $m$ operations on a collection of splay trees is $O(n + m\log n)$, or $O(m\log n)$ in the common case where $m\geq n$. The analysis uses a set of fictitious “credits” that are allocated for each operation and used to “pay” for computational steps. Credits that are not needed to pay for a particular operation can be saved and used to pay for later operations. The total number of credits allocated to a sequence of operations then serves as an upper bound on the number of computational steps. To ensure that there are always enough credits on hand, credits are allocated so as to satisfy the following credit policy.
For each vertex $u$, maintain $\rank(u)= \lfloor \lg (\textrm{the number of descendants of }u)\rfloor$ credits.
To determine the number of credits that must be allocated to a splay operation at vertex $u$, let's first determine the number needed for one double-rotation. Let $v$ be the parent of $u$ and $w$ its grandparent and let $\rank'$ denote the rank values after the rotation (note that $\rank'(u)=\rank(w)$). The number of credits needed to satisfy the credit policy following a double rotation is \begin{eqnarray*} (\rank'(u)-\rank(u)) &+& (\rank'(v)-\rank(v)) + (\rank'(w)-\rank(w)) \\ &=&(\rank'(w)+\rank'(v)) - (\rank(u)+\rank(v)) \\ &\leq& 2(\rank(w) - \rank(u)) \end{eqnarray*} If $\rank(u) < \rank(w)$ and $3(\rank(w) - \rank(u))$ credits are allocated for the double rotation, there will be at least one credit available to pay for the operation, while still satisfying the credit policy.
Now consider the case where $\rank(u)=\rank(w)=r$. If $u$ is an inner grandchild, then after the double-rotation, $v$ and $w$ are its children, meaning that at least one of them has a smaller rank after the operation. This implies that there is one less credit needed to satisfy the credit policy after the operation, and this credit can be used to pay for it. If $u$ is an outer grandchild, then $u$ has at least $2^r$ descendants and $w$ has fewer than $2^r$ descendants outside of $u$'s subtree. This implies that after the operation $w$ has fewer than $2^r$ descendants, making $\rank'(w) < r$.
So again, one less credit is needed to satisfy the credit policy and that credit can be used to pay for the operation. Summarizing, if $3(\rank(w)-\rank(u))$ credits are allocated to each double rotation, the operation can be paid for while continuing to satisfy the credit policy.
If a single rotation is required to complete a splay, the number of credits needed to satisfy the credit policy after the operation is $$ (\rank'(u)-\rank(u)) + (\rank'(v)-\rank(v)) =\rank'(v) - \rank(u) \leq \rank(v) - \rank(u) $$ If $3(\rank(v) - \rank(u))+1$ credits are allocated for the rotation, it can be paid for while still satisfying the credit policy. These observations are summarized in the following lemma.
Lemma. Let $u$ be a vertex in a tree with root $v$. If $3(\rank(v)-\rank(u))+1$ credits are allocated to a splay operation at $u$, the operation can be fully paid for while continuing to satisfy the credit policy.
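For example, in a splay tree on $n=1000$ vertices the root has rank $\lfloor \lg 1000 \rfloor = 9$, so the lemma allocates at most $3\cdot 9+1=28$ credits to a splay at any vertex.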
Note that this implies that at most $3\lfloor \lg n \rfloor+1$ credits are needed for each splay. The next step in the analysis of splay trees is to determine the number of credits that must be allocated to each operation in addition to those required for splays. For an insert operation, each vertex along the path from the inserted vertex to the root acquires a new descendant and consequently, the ranks will increase for those vertices that have exactly $2^k-1$ descendants before the insertion. There are at most $\lfloor \lg n \rfloor+1$ such vertices, so the total number of credits needed for the insert (including the splay) is $\leq 4(\lfloor \lg n \rfloor)+3$.
The join requires an additional $\lfloor \lg n \rfloor+1$ credits to satisfy the credit policy at the new tree root, and since no splay is involved, $\lfloor \lg n \rfloor+2$ credits are sufficient to pay for the operation while continuing to satisfy the credit policy.
The other operations require just one credit beyond those needed for the splay, so the total number of credits required for $m$ operations is $O(m\log n)$. The initialization can be paid for with $n$ credits and no additional credits are needed to satisfy the credit policy initially. Since every operation is paid for with allocated credits and the number of credits allocated is $O(n + m \log n)$, the time required for the operations is also $O(n + m \log n)$.
The splay lemma is also true using a more general definition of the $\rank$. Specifically, if each vertex $u$ is assigned some arbitrary weight $w(u)$ and $tw(u)$ is defined as the total weight of the subtree with root $u$, then $\rank(u)$ can be defined as $\lfloor \lg tw(u) \rfloor$. The argument used earlier still works with this definition. The more general version is used in a later chapter to analyze the performance of the dynamic trees data structure.
An abridged JavaScript implementation is shown below.
To demonstrate splay trees, run the following script.
let sf = new SplayForest(); sf.fromString('{[((((((((a b -) c -) d -) e -) f -) g -) h -) i -) *j -]}'); sf.splay(1); log(sf.toString()); sf.splay(3); log(sf.toString()); sf.splay(5); log(sf.toString());
The $\fromString$ operation creates a highly unbalanced tree and each splay improves the balance, as shown below.
{[- *a (((((- b c) d e) f g) h i) j -)]}
{[(- a b) *c (((- d e) f (g h i)) j -)]}
{[((- a b) c d) *e ((- f (g h i)) j -)]}
In most situations, balanced binary trees are more efficient than splay trees, since they require far fewer rotations. While a single rotation does take constant time, it is a fairly expensive operation. Still, there are some situations where splay trees can outperform explicitly balanced trees.