User:Qunwangcs157/sandbox
This page is about proof of O(log*n) Amortized Time [1] of Union Find[2][3][4]
Statement:
Proof
[edit]Fact1: As the find function follows the path along to the root, the rank of node it encounters is strictly increasing
- Proof: claim that as Find and Union operations are applied to the data set, this fact remains true over time. Initially when each node is the root of its own tree, it's trivially true. The only case when the rank of a node might be changed is when the Union by Rank operation is applied. In this case, a tree with smaller rank will be attached to a tree with greater rank, rather than vice versa. And during the find operation, all nodes visited along the path will be attached to the root, which has larger rank than its children, so this operation won't change this fact either.
Fact2: A node u which is root of a subtree with rank r has at least 2r nodes.
- Proof: Initially when each node is the root of its own tree, it's trivially true. Assume that a node u with rank r has at least 2r nodes. Then when two tree with rank r Unions by Rank and form a tree with rank r + 1, the new node has at least 2r + 2r = 2r + 1 nodes
Fact3: The maximum number of nodes of rank r is ≤ n/2r.
- Proof: Since from fact2 we know that a node u which is root of a subtree with rank r has at least 2r nodes. We will get the maximum number of nodes of rank r when all nodes with rank r is root of a tree that has exactly 2r nodes. In this case, the number of nodes of rank r is n/2r
For convenince, we define "bucket" here: a bucket is a set that contains vertices with particular ranks.
We create some buckets and put vertices into buckets according to their ranks. That is, vertices with rank 0 goes into the zeroth bucket, vertices with rank 1 goes into the first bucket, vertices with rank 2 and 3 goes into the second bucket, vertices with rank B to 2B - 1 goes into the Bth bucket.
We can make two observations about the buckets.
- 1. The total number of buckets is ≤ log*n
- Proof: When we go from one bucket to the next, we add one more two to the power, that is, the next bucket to [B, 2B - 1] will be [2B, 22B - 1]
- 2. The maximum number of elements in bucket [B, 2B – 1] is ≤ 2 * n/2B
- Proof: The maximum number of elements in bucket [B, 2B – 1] is ≤ n/2B + n/2B+1 + n/2B+2 + … + n/22^B – 1 ≤ 2 * n/2B
Let T1 = (link to the root)
T2 = (number of links traversed where the buckets are different)
T3 = (number of links traversed where the buckets are different)
Then the total cost of m finds T = T1 + T2 + T3
T1 = O(m). T2 = O(mlog*n),
For T3, suppose we are traversing from u to v, where u and v have rank in [B, 2B - 1]. From fact1, we know that times we traversed a link (u,v) where u and v were in the same bucket is is <= 2B - 1 - B, which is <= 2B.
Therefore, T3 =
From Observation 1 and 2, we can conclude that T3 is <= <= 2nlog*n
Therefore, T = T1 + T2 + T3 = O(mlog*n)
References
[edit]- ^ Raimund Seidel, Micha Sharir. "Top-down analysis of path compression", SIAM J. Computing 34(3):515–525, 2005
- ^ Tarjan, Robert Endre (1975). "Efficiency of a Good But Not Linear Set Union Algorithm". Journal of the ACM. 22 (2): 215–225. doi:10.1145/321879.321884.
- ^ Hopcroft, J. E.; Ullman, J. D. (1973). "Set Merging Algorithms". SIAM Journal on Computing. 2 (4): 294–303. doi:10.1137/0202024.
- ^ Robert E. Tarjan and Jan van Leeuwen. Worst-case analysis of set union algorithms. Journal of the ACM, 31(2):245–281, 1984.