Data Structures for Muggles

Prologue

All the following examples will be shown in the C programming language or pseudocode.

These are the notes I took while taking the course at NCKU in 2024. Blablabla…

Finally, I would like to declare that almost every photo I use comes from the handouts of my course at NCKU, provided by the professor. If any photo comes from another source, I will give proper credit in the caption or description of the image.

Complexity

Space Complexity

  • The amount of memory that it needs to run to completion.
  • $S(P)=c+S_P(I)$
  • Total space requirement = Fixed space requirement + Variable space requirements
  • We usually care more about the Variable space requirements.

For example, consider the following code.

Iterative

float sum(float list[], int n)
{
    float tempsum=0;
    int i;
    for (i = 0; i < n; i++){
        tempsum += list[i];
    }
    return tempsum;
}

In this code, we DO NOT copy the array list; C passes all parameters by value (the array is passed as a pointer), so the variable space requirement is $S_{sum}(I)=0$. Here’s another example; let’s look at the code first.

Recursive

float rsum(float list[], int n)
{
    if (n) return rsum(list,n-1)+list[n-1];
    return 0;
}

In this case, each recursive call requires K bytes for the 2 parameters and the return address. If the initial length of list[] is N, the variable space requirement is $S_{rsum}(N)=N\times K$ bytes.

Time Complexity

  • The amount of computer time that it needs to run to completion.
  • $T(P)=c+T_P(I)$
  • Total time requirement = Compile time + Execution time
  • We usually care more about the Execution time.

For example, this is a segment whose execution time is independent of the instance characteristics.

float sum(float list[], int n)
{
    float tempsum=0;
    int i;
    for (i = 0; i < n; i++){
		tempsum += list[i];
    }
    return tempsum;
}

Time Complexity

On the other hand, a segment whose execution time depends on the instance characteristics looks like this.

float rsum(float list[], int n)
{
    if (n){
         return rsum(list,n-1)+list[n-1];   
    }
    return 0;
}

Time Complexity

For given parameters, the computing time might still differ between inputs. So, we should know the following cases.

  • Best-case count: Minimum number of steps that can be executed.
  • Worst-case count: Maximum number of steps that can be executed.
  • Average count: Average number of steps executed.

Here, I’m going to use insertion sort to show you the difference between the best case & the worst case. If you don’t know what it is, you can go check it out here. The following is a snippet of the insertion sort algorithm in C (a is the array and t is the element being inserted).

for (j = i - 1; j >= 0 && t < a[j]; j--){
	a[j + 1] = a[j];
}

Insertion Sort
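For context, here is a minimal complete insertion sort built around that snippet; the function name and signature are my own, not from the handouts.

void insertionSort(int a[], int n)
{
    for (int i = 1; i < n; i++) {
        int t = a[i]; // Element to insert into the sorted prefix a[0 : i-1]
        int j;
        for (j = i - 1; j >= 0 && t < a[j]; j--) {
            a[j + 1] = a[j]; // Shift larger elements one slot to the right
        }
        a[j + 1] = t; // Drop t into its final position
    }
}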

Worst-case

  • a[0 : i-1] = [1, 2, 3, 4] and t = 0
    • 4 compares
  • a[0 : i-1] = [1, 2, 3, …, i] and t = 0
    • i compares

For a list in decreasing order, the total compares will be

$1+2+3+4+\dots+(n-1)=\frac{n(n-1)}{2}=\frac{1}{2}n^2-\frac{1}{2}n$

Best-case

  • a[0 : i-1] = [1, 2, 3, 4] and t = 5
    • 1 compare
  • a[0 : i-1] = [1, 2, 3, …, i] and t = i + 1
    • 1 compare

For a list in increasing order, the total compares will be

$1+1+1+\dots+1=n-1$

Asymptotic Complexity

Sometimes determining exact step counts is difficult and not very useful, so we use Big-O notation to represent space and time complexity. Here’s a question: which algorithm is better, the one with $\frac{1}{2}n^2-\frac{1}{2}n$ or the one with $n^2+3n-4$? The answer is that both of them are $O(n^2)$, so their performances are similar.

Now, let’s jump back to the example in the previous part. What is the time complexity of insertion sort in Big-O representation?

  • Worst case
    • $\frac{1}{2}n^2-\frac{1}{2}n\Rightarrow O(n^2)$
  • Best case
    • $n-1\Rightarrow O(n)$

Why Algorithms Are Important

This is a table of how various functions grow with $n$.

How various functions grow with n

As you can see, the exponential function grows explosively, which means a small increase in n can increase the computing time a lot. In fact, on a computer performing 1 billion ($10^9$) steps per second, $2^{40}$ steps require 18.3 mins, $2^{50}$ steps require 13 days, and $2^{60}$ steps require about 36.6 years! That’s why improving the algorithm can sometimes be more useful than improving the hardware.

Summary

  • Space and time complexity
    • Best case, worst case, and average
    • Variable and fixed requirements
  • Asymptotic complexity
    • Big-O
  • Example:
    • Insertion sort
    • Selection sort

More information about complexity will be introduced when I take the algorithm course next year, or you can search for other sources if you really want to learn more about it.

Arrays

Intro

An array is a collection of elements stored at contiguous memory locations. Each element can be accessed by an index, which is the position of the element within the array. It’s the most fundamental structure and the base of many other structures.

1-D Array

1-D Array

This is a 1-D array with length 10. We can access the first element by index 0 (zero-indexing). But what should we do if the memory size we need changes and is not always the same? Will a fixed-size array suit all kinds of situations?

The answer is no, and that’s why we need something called Dynamic Memory Allocation. When we need an area of memory and do not know the size in advance, we can call the function malloc() and request the amount we need. The following is a code implementation of a dynamic 1-D array storing integers.

int size;
scanf("%d", &size); // Ask an arbitrary positive number as the size of the array
int *array = (int *)malloc(size * sizeof(int)); // Finish the memory allocation

// Do operations to the array here

free(array); // Free the memory after use
  • The free() function deallocates an area of memory allocated by malloc().

So, that’s how to implement a dynamic 1-D array in C. Now, here comes another question. What if we want to store data like a student’s ID and exam scores? Should we store it in a 1-D array?

2-D Array

To answer the question above, we can choose a 2-D array as the data structure to store something like that. C uses an array-of-arrays representation for multidimensional arrays. The following is an example diagram showing how things work in a 2-D array in C.

2-D Array

To be more specific, it shows that the first dimension of the array (blue part) stores the pointers that point to the second-dimension arrays (orange part), which store integers or other data types. The following is example code for a dynamic 2-D array.

int rows, cols;

// Get the dynamic size
scanf("%d", &rows);
scanf("%d", &cols);

// Allocate the first dimension (blue part in the diagram above)
int **array = (int **)malloc(rows * sizeof(int *));
// For every row, allocate space for its columns (orange part)
for (int i = 0; i < rows; i++) {
    array[i] = (int *)malloc(cols * sizeof(int));
}
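When the array is no longer needed, the memory should be released in the reverse order of allocation; here is a short sketch matching the code above.

// Free each row first, then the array of row pointers itself
for (int i = 0; i < rows; i++) {
    free(array[i]);
}
free(array);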

Row-Major and Column-Major

To understand what row-major and column-major mean, let’s look at this example. Suppose we have a matrix called A.

$A=\left[\begin{array}{lll}1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9\end{array}\right]$

  • Column-major
    • $A=[1, 4, 7, 2, 5, 8, 3, 6, 9]$
    • Elements of the columns are contiguous in memory
    • Top to bottom, left to right
    • Matlab
    • Fortran
  • Row-major
    • $A=[1, 2, 3, 4, 5, 6, 7, 8, 9]$
    • Elements of the rows are contiguous in memory
    • Left to right, top to bottom
    • C
    • C++
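To see row-major order in action, here is a small sketch (the variable names are mine) showing that for a contiguous C array, a[i][j] sits at flat offset i * cols + j.

#include <stdio.h>

int main(void)
{
    int a[3][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
    int *flat = &a[0][0]; // The same 9 ints viewed as one row-major block
    int i = 1, j = 2, cols = 3;

    // Row-major: a[i][j] lives at flat offset i * cols + j
    printf("%d %d\n", a[i][j], flat[i * cols + j]); // Prints: 6 6
    return 0;
}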

Stacks

Intro

The stack is another fundamental structure, and it’s the opposite of the queue (which will be mentioned later). It has the following properties:

  • Last-In-First-Out (LIFO)
  • Like a tennis ball box
  • Basic operations
    • Push: Insert an element to the top of the stack
    • Pop: Remove and return the top element in the stack
    • Peek/Top: Return the top element in the stack
    • IsEmpty: Return true if the stack is empty
    • IsFull: Return true if the stack is full

The following is a diagram of the push and pop operations of stacks.

Push and Pop in Stack Operations. Source: https://www.programiz.com/dsa/stack

Application: Balanced Brackets

  • Check whether the brackets are balanced.
  • Examples:
    • “]” → False
    • “(a+b)*c-d” → True
  • Steps:
    • Scan the expression from left to right
    • When a left bracket, “[”, “{”, or “(”, is encountered, push it onto the stack.
    • When a right bracket, “]”, “}”, or “)”, is encountered, pop the top element from the stack and check whether they match.
      • If the right bracket and the popped element do not match, return False.
    • After scanning the whole expression, if the stack is not empty, return False.
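Here is a minimal sketch of these steps in C; the function name and the fixed stack capacity are my own simplifications.

#include <stdbool.h>

// Return true if every bracket in expr is properly matched and nested
bool isBalanced(const char *expr)
{
    char stack[256]; // Assumed maximum nesting depth for this sketch
    int top = -1;

    for (; *expr; expr++) {
        char c = *expr;
        if (c == '(' || c == '[' || c == '{') {
            stack[++top] = c; // Push every left bracket
        } else if (c == ')' || c == ']' || c == '}') {
            if (top == -1) return false; // Right bracket with nothing to match
            char left = stack[top--];    // Pop and check the pair
            if ((c == ')' && left != '(') ||
                (c == ']' && left != '[') ||
                (c == '}' && left != '{'))
                return false;
        }
    }
    return top == -1; // Leftover left brackets mean unbalanced
}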

Dynamic Stacks

If we want to dynamically allocate the size of the stack, we should still use malloc(). Here’s an example using dynamic arrays.

  • Headers and declarations
int *stack;
int capacity = 1; // Initial capacity
int top = -1; // Representing empty stack
  • Create stack
void createStack() {
    stack = (int*)malloc(capacity * sizeof(int));
    if (stack == NULL) {
        printf("Memory allocation failed\n");
        exit(1);
    }
}
  • Array doubling when stack is full
void stackFull() {
    capacity *= 2;  // Double the capacity
    stack = (int*)realloc(stack, capacity * sizeof(int));
    if (stack == NULL) {
        printf("Memory allocation failed\n");
        exit(1);
    }
    printf("Stack expanded to capacity: %d\n", capacity);
}
  • Check if the stack is empty
int isEmpty() {
    return top == -1;
}
  • Push
void push(int value) {
    if (top == capacity - 1) {  // If the stack is full, double it
        stackFull();
    }
    stack[++top] = value;
}
  • Pop
int pop() {
    if (isEmpty()) {
        printf("Stack underflow. No elements to pop.\n");
        return -1; 
    }
    return stack[top--];
}
  • Peek (or Top)
int peek() {
    if (isEmpty()) {
        printf("Stack is empty.\n");
        return -1; 
    }
    return stack[top];
}

Queues

Intro

As I mentioned, queues are the opposite of stacks. Although both are linear, ordered lists, they have totally different properties.

  • First-In-First-Out (FIFO)
  • New elements are added at the rear end
  • Old elements are deleted at the front end.
  • Basic operations
    • Add: Insert element at the rear of a queue
    • Delete: Remove element at the front of a queue
    • IsFull: Return true if the queue is full. (rear == MAX_QUEUE_SIZE - 1)
    • IsEmpty: Return true if the queue is empty (front == rear)

The following is a diagram of insertion into and deletion from a queue.

Operations of the Queue

Now suppose we have a queue that contains 5 elements but has length 6 (null stands for no element there).

$Q = [A, B, C, D, E, null]$

After we delete an element from the front and add another one at the rear, it will be like this.

$Q = [null, B, C, D, E, F]$

The queue will be regarded as full since rear == MAX_QUEUE_SIZE - 1, but we know that some empty space actually remains. So, how do we fix it? Think about what happens when the customer at the front of a line leaves: what do the other customers do?

They move forward. So, we should shift the other elements forward after a deletion. That way, the queue becomes

$Q = [B, C, D, E, F, null]$

Voila! We fixed it! But if we do this at every deletion, the program will be way too slow. So here’s the next question: is there a better way?

Circular Queue

In a circular queue, the front and the rear have some differences compared with the normal queue.

  • front: One position counterclockwise from the position of the front element.
  • rear: The position of the rear element.

This is what it looks like.

Circular Queue

The addition (insertion) in a circular queue has the following steps:

  1. Move rear one position clockwise:
    rear = (rear + 1) % MAX_QUEUE_SIZE

  2. Put element into queue[rear]

Insertion in a Circular Queue

And the deletion steps are below:

  1. Move front one position clockwise:
    front = (front + 1) % MAX_QUEUE_SIZE

  2. Return and delete the value of queue[front]

Deletion in a Circular Queue

In a circular queue, how do we know the queue is full? Think about what happens if we keep adding elements until the queue is full: where do the front and rear end up? The answer is that as elements keep being added, the rear keeps moving clockwise and finally equals the front.

On the other hand, we can also use this to determine whether the queue is empty, because if elements keep being deleted, the front keeps moving clockwise until it equals the rear.

So if the conditions to determine full and empty are the same (front == rear), how can we tell which is which?

Here are some of the solutions, but there are still many other ways to conquer this.

  1. Keep a boolean flag such as lastMoveIsFront to record whether the last move was on the front or the rear.
  2. Keep another variable count and modify it every time we delete or add something.
  3. Reserve an empty slot in the circular queue, so that when the queue is full, the rear is at the position just before the front (because one empty slot is reserved), avoiding confusion with the condition rear == front, which indicates an empty queue (see the sketch below).
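Here is a minimal sketch of solution 3 in C; MAX_QUEUE_SIZE, the int element type, and the error handling are assumptions of this sketch.

#define MAX_QUEUE_SIZE 8

int queue[MAX_QUEUE_SIZE];
int front = 0, rear = 0; // front sits one position counterclockwise from the front element

int isEmpty() { return front == rear; }
// Full when advancing rear would land on front: one slot stays reserved
int isFull() { return (rear + 1) % MAX_QUEUE_SIZE == front; }

void addQ(int item)
{
    if (isFull()) return; // Or report overflow
    rear = (rear + 1) % MAX_QUEUE_SIZE; // Move rear one position clockwise
    queue[rear] = item;
}

int deleteQ(void)
{
    if (isEmpty()) return -1; // Or report underflow
    front = (front + 1) % MAX_QUEUE_SIZE; // Move front one position clockwise
    return queue[front];
}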

Linked Lists

Intro

First, I want to talk about deletion and insertion in the sequential representation, which was used a lot in the previous parts. When we delete or insert things in a queue or an array, the operations always create empty spaces (see the picture below for details). That means we have to move many elements many times to maintain a sequential data structure.

We need to move elements very often

So, to avoid this waste, we want to store data dispersedly in memory, just like the picture below.

Store Data Dispersedly

But now we face another problem. How do we know the order of the data if we store sequential data like this? We use a linked list! Let’s start talking about linked lists right away. Here are its features.

  • In memory, list elements are stored in an arbitrary order.
  • The fundamental unit is called a node.
  • In a normal linked list, a node contains a data field and a link (pointer).
  • The link is used to point to the next element.

To answer the previous question, we need to talk about why a linked list is called a linked list. It’s because every element in the list links to the very next one, so we can know the sequence even though we didn’t save the elements sequentially. To help you understand, the following graph illustrates how it works.

Linked List

Another common way to draw a linked list is like this.

Linked List by Arrows

Operations

Concept of Insertion

To insert K between B and C, we have to follow the steps below.

  • Get an unused node a.
  • Set the data field of a to K.
  • Set the link of a to point to C.
  • Set the link of B to point to a.

Insertion of Linked List

Concept of Deletion

To delete C in the linked list, we have to follow the steps below.

  • Find the element that precedes C.
  • Set the link of the element to the position of D.

Deletion in Linked List

Define A Node in C

typedef struct listNode *listPointer;
typedef struct listNode {
    char data;
    listPointer link;
} listNode;

Node

Get(0)

desiredNode = first; // get the first node
return desiredNode->data;

Get(0)

Get(1)

desiredNode = first->link; // get the second node
return desiredNode->data;

Get(1)

Delete “C”

  • Step 1: find the node before the node to be removed
beforeNode = first->link;

Step 1 of Deletion

  • Step 2: save pointer to node that will be deleted
deleteNode = beforeNode->link;

Step 2 of Deletion

  • Step 3: change pointer in beforeNode
beforeNode->link = beforeNode->link->link;
free(deleteNode);

Step 3 of Deletion

Insert “K” before “A”

  • Step 1: get an unused node, set its data and link fields
MALLOC(newNode, sizeof(*newNode)); // MALLOC: malloc-with-check macro from the course handouts
newNode->data = 'K';
newNode->link = first;

Step 1

  • Step 2: update first
first = newNode;

Step 2

Insert “K” after “B”

Insert After

beforeNode = first->link;
MALLOC(newNode, sizeof(*newNode));
newNode->data = 'K';
newNode->link = beforeNode->link;
beforeNode->link = newNode;

Circular Linked List

  • The last node points to the first node.

Circular Linked List

Doubly Linked List

In a general linked list, finding an element always starts at the beginning of the list, but that’s not the case in a doubly linked list!

  • Right link: forward direction
  • Left link: backward direction

Doubly Linked List

Node in Doubly Linked List

typedef struct node *nodePointer;
typedef struct node {
    nodePointer llink;
    element data; // element is a user-defined data type
    nodePointer rlink;
} node;

Node in Doubly Linked List

Insert “K” after “B”

newNode->llink = node;
newNode->rlink = node->rlink;

Step 1

node->rlink->llink = newNode;
node->rlink = newNode;

Step 2

Linked Stacks & Queues

  • Linked stack
    • Push: add to a linked list
    • Pop: delete from a linked list
  • Linked queue
    • AddQ: add to the rear of a linked list
    • DeleteQ: delete the front of a linked list

Linked Stack

Linked Queue

Summary

  • Linked lists
  • Doubly linked lists
  • Doubly circular linked lists
  • Operations: Get, Insert, and Delete
  • Applications:
    • Linked stacks
    • Linked queues

Trees & Binary Trees

Intro

A tree is a data structure that looks like a genealogy, so a lot of its terms are similar to terms for a family. Now, let’s introduce some terms about trees.

  • Degree of a node
    • number of subtrees
  • Degree of a tree
    • maximum of the degree of the nodes in the tree
  • Height (depth) of a tree
    • maximum level of any node in the tree
  • Leaf (or Terminal)
    • nodes having degree zero
  • Internal node
    • not leaf & not root
  • Children
    • roots of subtrees of a node X
  • Parent
    • a node X is the parent of the roots of its subtrees
  • Sibling
    • children of the same parent
  • Ancestor
    • all the nodes along the path from the root to the node

Tree

List Representation

If we want to represent a tree in a list format, we can use this method. The following is an example of a tree and its list representation.

Tree

Its list representation will be $\text{(A (B (E (K, L), F), C (G), D (H (M), I, J)))}$.

Binary Trees

Binary trees are trees that have the following characteristics.

  • Degree-2 tree
  • Recursive definition
    • Each node has left subtree and/or right subtree.
    • Left subtree (or right subtree) is a binary tree.

This is a picture of a binary tree.

Binary Tree

Full Binary Trees

A full binary tree is a tree that has height $h$ and $2^h-1$ nodes. This is a graph of a full binary tree.

Full Binary Tree

Please note that there is exactly one full binary tree of each height. If we number the nodes in a full binary tree from top to bottom, left to right, then we have the following properties.

  • Let $n$ be the number of nodes in a FULL binary tree
    • The parent of node $i$ is node $floor(\frac{i}{2})$
    • The left child of node $i$ is node $2i$
    • The right child of node $i$ is node $2i+1$

Skewed Binary Trees

  • At least one node at each of the first $h$ levels
  • Minimum number of nodes for a height $h$

Skewed Binary Tree

Number of Nodes and Height

  • Let $n$ be the number of nodes in a binary tree whose height is $h$.
    • $h\le n\le 2^h-1$
    • $\log_2(n+1)\le h$
    • The height $h$ of a binary tree is at least $\log_2(n+1)$.

Complete Binary Trees

How to create a complete binary tree? Just follow the steps and check out the graph below.

  1. Create a full binary tree which has at least $n$ nodes.
  2. Number the nodes sequentially.
  3. The binary tree defined by the nodes numbered 1 through $n$ is the n-node complete binary tree.

Complete Binary Tree

Binary Tree Representations

Array Representation

  • Number the nodes using the numbering scheme for a full binary tree. The node that is numbered i is stored in tree[i].

Array Representation

Since the numbers are positioned as in a FULL binary tree, sometimes memory is wasted. For example, if we create a 4-level right-skewed binary tree, the array will have length 15, but only 4 slots are used.

Worst Case for Required Space

Linked Representation

  • Each binary tree node is represented as an object whose data type is TreeNode.
  • The space required by an n node binary tree is n * (space required by one node).
typedef struct node *treePointer;
typedef struct node {
    char data;
    treePointer leftChild, rightChild;
} node;

Node

Linked Representation

Binary Tree Traversal

  • Visiting each node in the tree exactly once
  • A traversal produces a linear order for the nodes in a tree
    • LVR, LRV, VLR, VRL, RVL, and RLV
      • L: moving left
      • V: visiting the node
      • R: moving right
    • LVR: Inorder
    • VLR: Preorder
    • LRV: Postorder

Inorder

void inOrder(treePointer ptr){
    if (ptr != NULL){
        inOrder(ptr->leftChild);
        visit(ptr);
        inOrder(ptr->rightChild);
    }
}

Inorder

Preorder

void preOrder(treePointer ptr){
    if (ptr != NULL){
        visit(ptr);
        preOrder(ptr->leftChild);
        preOrder(ptr->rightChild);
    }
}

Preorder

Postorder

void postOrder(treePointer ptr){
    if (ptr != NULL){
        postOrder(ptr->leftChild);
        postOrder(ptr->rightChild);
        visit(ptr);
    }
}

Postorder

Difference Between Inorder, Preorder & Postorder

This is an illustration from GeeksforGeeks.

Comparison between different order. Source: GeeksforGeeks

Level-Order

  • Visiting the nodes following the order of node numbering scheme (sequential numbering)

Level-Order

Iterative Traversal Using Stack

The following is an inorder example.

  • Push the root into the stack
  • Push left children into the stack until reaching a null node
  • Pop the top node from the stack
    • If there’s no node in the stack, break
  • Push the right child of the popped node into the stack
  • Go back to step 2 from the right child

Here’s a graph to help you understand the workflow better.

Inorder Iterative Traversal Using Stack
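The workflow above can be turned into code; here is a sketch reusing treePointer and visit() from the traversal section, with a fixed-size pointer stack as a simplifying assumption.

void iterInOrder(treePointer root)
{
    treePointer stack[100]; // Assumed capacity for this sketch
    int top = -1;
    treePointer node = root;

    while (1) {
        while (node) { // Push left children until reaching a null node
            stack[++top] = node;
            node = node->leftChild;
        }
        if (top == -1) break; // No node left in the stack: done
        node = stack[top--];  // Pop the top node and visit it
        visit(node);
        node = node->rightChild; // Continue from its right child
    }
}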

Convert Sequences Back to Trees

If we are given a sequence of data and we try to turn it back into a tree, how should we do it? The following is an example of turning a very short data sequence back into trees.

Example

We can clearly see that if we are only given a sequence in 1 specific order, we cannot know on which side (left or right) the original tree grows. But if we are given 2 different orders, one of them being inorder, then we can reconstruct the tree by inference.


Step 1

Step 2

Step 3

Binary Search Trees

Intro

The binary search tree is, of course, a binary tree. But besides this, it has some unique properties that make it a binary search tree.

  • Each node has a (key, value) pair
  • Keys in the tree are distinct
  • For every node x
    • all keys in the left subtree are smaller than the key in x
    • all keys in the right subtree are larger than the key in x
  • The subtrees are also binary search trees.

Operations

Search(root, key)

  • k == root’s key, terminate
  • k < root’s key, check left subtree
  • k > root’s key, check right subtree
  • Time complexity is $O(height)=O(n)$, where $n$ is the number of nodes

Search

Recursive

if (!root) return NULL;
if (k == root->data.key)
    return root->data;
if (k < root->data.key)
    return search(root->leftChild, k);
return search(root->rightChild, k);

Variable space requirement: $O(height) = O(n)$, where $n$ is the number of nodes.

Iterative

while (tree) {
    if (k == tree->data.key)
        return tree->data;
    if (k < tree->data.key)
        tree = tree->leftChild;
    else
        tree = tree->rightChild;
}
return NULL;

Insert(root, key, value)

  1. Search the tree
    • Matched: Do nothing
    • No match: Obtain LastNode, which is the last node during the search.
  2. Add new node
    • Create a new node with (key, value)
    • If key > the key of LastNode, add the new node as right child
    • If key < the key of LastNode, add the new node as left child

Insert
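The two steps above can be sketched in C; the bstNode type and the function signature are my own assumptions, since the handouts’ exact declarations aren’t shown here.

#include <stdlib.h>

typedef struct bstNode {
    int key, value;
    struct bstNode *leftChild, *rightChild;
} bstNode;

void insert(bstNode **root, int key, int value)
{
    bstNode *last = NULL, *cur = *root;
    while (cur) { // Search, remembering the last node visited
        if (key == cur->key) return; // Matched: do nothing
        last = cur;
        cur = (key < cur->key) ? cur->leftChild : cur->rightChild;
    }
    bstNode *node = malloc(sizeof(*node)); // Create the new (key, value) node
    node->key = key;
    node->value = value;
    node->leftChild = node->rightChild = NULL;
    if (!last) *root = node; // Empty tree: the new node becomes the root
    else if (key > last->key) last->rightChild = node;
    else last->leftChild = node;
}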

Delete(key)

There are 4 cases in the deletion.

  • No element with delete key
  • Element is a leaf
  • Element is a degree-1 node
  • Element is a degree-2 node

Since the 1st case means we don’t do anything, we will start from the 2nd case.

Delete a leaf

Delete a Leaf

Delete a Degree-1 Node

Link the single child of the DeletedNode to the parent of DeletedNode.

Delete a Degree-1 Node

Delete a Degree-2 Node

There are two ways to do it. We can replace the deleted node with

  1. Largest pair in its left subtree
  2. Smallest pair in its right subtree

These two pairs must be leaf nodes or degree-one nodes. Why? Because the largest pair in the left subtree cannot have a right child (a right child would hold an even larger key), and likewise the smallest pair in the right subtree cannot have a left child.

Step 1

Step 2

Rank of a Node in a BST

Rank of node $x$

  • The number of nodes whose key values are smaller than that of $x$
  • The position of $x$ in inorder
  • Like the index of an array

Heaps

Intro

Imagine a program that records patients waiting in a hospital. If the data structure we use is a queue (FIFO) and a dying patient arrives, we cannot let them see the doctor first.

So that’s why we need a priority queue! But how do we implement one? Let’s dive into heaps right away!

Source: https://www.udemy.com/course/js-algorithms-and-data-structures-masterclass/

Min Tree & Max Tree

First, we should know what min trees and max trees are to establish the concepts we will use later.

Min Tree & Max Tree

Min Heap & Max Heap

Min Heap & Max Heap

Array Representation

Array Representation of a Heap

Operations

Here I use a max heap as the example, but it’s very similar to the min heap, so you can try that yourself!

Insert(key)

  1. Create a new node while keeping the heap a complete binary tree.
  2. Compare the key with its parent.
    • If the key < parent, pass
    • If the key > parent, swap places with the parent and recursively compare with the parent’s parent until the heap matches the definition (the key value in any node is the maximum value in its subtree); a code sketch follows below

Step 1

Step 2

Step 3

This process is also called the bubbling up process, since it looks like a bubble rising to the water surface.
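Here is a sketch of the bubbling up process against the 1-indexed array representation shown earlier; heap[], heapSize, and the fixed capacity are assumptions of this sketch (no overflow check).

int heap[100]; // 1-indexed: heap[1] is the root
int heapSize = 0;

void insertHeap(int key)
{
    int i = ++heapSize; // New node goes into the next slot, keeping the tree complete
    while (i > 1 && key > heap[i / 2]) { // Bubble up while key beats its parent
        heap[i] = heap[i / 2]; // Move the parent down
        i /= 2;
    }
    heap[i] = key; // Final position for the new key
}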

After we know how to insert things into a heap, we also care about its speed. The time complexity of insertion is $O(\log{n})$, where $n$ is the heap size (so the height is about $\log n$).

Delete() or Pop()

  • Removing the root of the heap
    • Root is the min element in a min heap
    • Root is the max element in a max heap

The following are the steps for this operation.

  1. Remove the last node and insert it into the root
  2. Move the node to a proper position
    • Find the child containing the max key value and exchange positions (a sketch follows the figures below)

Step 1

Step 2

Step 3

Step 4
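And here is a matching sketch of the deletion, reusing heap[] and heapSize from the insertion sketch (the empty-heap check is omitted for brevity).

int deleteMax(void)
{
    int max = heap[1];           // The root holds the maximum
    int last = heap[heapSize--]; // Remove the last node; it will be re-placed
    int i = 1, child = 2;
    while (child <= heapSize) {
        // Pick the larger of the two children
        if (child < heapSize && heap[child + 1] > heap[child]) child++;
        if (last >= heap[child]) break; // last has found its proper position
        heap[i] = heap[child]; // Move the larger child up
        i = child;
        child *= 2;
    }
    heap[i] = last;
    return max;
}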

And the time complexity of deletion is $O(\log{n})$, where $n$ is again the heap size.

Leftist Heaps (Leftist Trees)

Intro

In the previous part, we used the hospital situation as an example to explain why we need a priority queue, and now the situation becomes more complicated. In that hospital, 2 doctors are on call, as in the following graph.

Hospital

But when doctor1 goes off duty, the 2 priority queues should be melded together for doctor2. To do this, we need leftist heaps!

Extended Binary Trees

Before we step into leftist trees, I need to introduce extended binary trees, a transformation of binary trees. To turn a binary tree into an extended binary tree, we simply add an external node in place of every empty subtree (every null link) in the original binary tree, and we obtain an extended binary tree! This is an example.

Extended Binary Tree

Function: Shortest(x)

Let’s say x is a node in an extended binary tree; then the function shortest(x) is the length of a shortest path from x to an external node.

Shortest

And here’s the formula to calculate shortest(x) for every internal node x (shortest(x) is 0 for all external nodes x).

$shortest(x)=1+min(shortest(leftChild(x)), shortest(rightChild(x)))$

Example

Function: Count(x)

This one is a little easier to understand. Count means the number of internal nodes. So for every internal node x, we can calculate count(x) by the following formula (count(x) is 0 for all external nodes x).

$count(x)=1+count(leftChild(x))+count(rightChild(x))$

Example

Height-Biased Leftist Trees (HBLT)

The following is the definition of an HBLT:

  • An extended binary tree
  • For every internal node $x$, $shortest(leftChild(x))\geq shortest(rightChild(x))$

In the following graph, B is an HBLT while A isn’t.

B is a HBLT while A isn't

Weight-Biased Leftist Trees (WBLT)

Following is the definition of a WBLT:

  • An extended binary tree
  • For every internal node $x$, $count(leftChild(x))\geq count(rightChild(x))$

In the following graph, the left is a WBLT while the right isn’t.

Example

Properties of Leftist Trees

0x00

  • The rightmost path is a shortest path from the root to an external node; its length is shortest(root).

Example

0x01

  • The number of internal nodes is $n\geq 2^{shortest(root)}-1$.

That’s because the first shortest(root) levels of nodes constitute a full binary tree, and for a full binary tree with shortest(root) levels, the total number of nodes is the following.

$\displaystyle\sum_{i=1}^{shortest(root)}2^{i-1}=2^{shortest(root)}-1$

And that’s why $n$ must be at least $2^{shortest(root)}-1$. Also, we can infer the following from this inequality.

$shortest(root)\le\log_2(n+1)$

0x02

  • The length of the rightmost path is $O(\log n)$, where $n$ is the number of nodes in a leftist tree.

Leftist Trees as Priority Queue

Remember what I said at the start of this part? We need to make a priority queue that is easy to merge with another one! Since we have learned min heaps and max heaps before, and a leftist tree is actually a type of heap, we have the following types of leftist trees.

  • Min Leftist Tree
    • Leftist tree that is a min tree. Used as a min priority queue.
  • Max Leftist Tree
    • Leftist tree that is a max tree. Used as a max priority queue.

Operations

Given the limited space, I will only use min leftist trees as the examples, and all the operations here are described for min leftist trees. Nevertheless, they can all be applied to max leftist trees with some small modifications.

Insert(x, T1)

  • Create a min leftist tree T2 containing only x
  • Merge T1 and T2

The following is the animation of inserting 13 into a min leftist tree.

Insert

DeleteMin(T)

  • Get subtrees of root, T_left and T_right
  • Delete the original root
  • Merge T_left and T_right

The following is the animation of deleting the minimum (6 in this case) from a min leftist tree.

DeleteMin

Meld(T1, T2) or Merge(T1, T2)

  • Phase 1: Top-down process
    • Maintains the property of a min tree
    • Go down along the rightmost paths of T1 and T2, comparing their roots
    • For the tree with the smaller root, go down to its right child
      • If there is no right child, attach the other tree as the right subtree
      • Else, compare again
  • Phase 2: Bottom-up process
    • Maintains the property of a leftist tree
    • Climb up through the rightmost path of the new tree
      • If a node does not meet the definition of a leftist tree (HBLT or WBLT), interchange its left and right subtrees
  • Time complexity is $O(\log m)$
    • The length of the rightmost path is $O(\log n)$, where $n$ is the number of nodes in a leftist tree
    • A merge operation moves down and climbs up along the rightmost paths of the two leftist trees
    • $m$ is the total number of elements in the 2 leftist trees
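Both phases can be written compactly as one recursive function: the calls walk down the rightmost paths (phase 1), and the swaps on the way back up restore the leftist property (phase 2). The node layout below is my own assumption for this sketch of a min HBLT.

typedef struct lnode {
    int key;
    int shortest; // shortest(x); external (NULL) children count as 0
    struct lnode *left, *right;
} lnode;

static int shortestOf(lnode *t) { return t ? t->shortest : 0; }

lnode *meld(lnode *a, lnode *b)
{
    if (!a) return b; // Attaching to an empty subtree
    if (!b) return a;
    if (b->key < a->key) { lnode *tmp = a; a = b; b = tmp; } // Smaller root stays on top
    a->right = meld(a->right, b); // Phase 1: go down the rightmost path
    // Phase 2: swap children if the leftist (HBLT) property is violated
    if (shortestOf(a->left) < shortestOf(a->right)) {
        lnode *tmp = a->left;
        a->left = a->right;
        a->right = tmp;
    }
    a->shortest = shortestOf(a->right) + 1;
    return a;
}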

The following is the animation of deleting the minimum from a min leftist tree; please focus on the merge operation.

Merge

Initialize()

  • Create $n$ single-node min leftist trees and place them in a FIFO queue
  • Repeatedly remove two min leftist trees from the FIFO queue, merge them, and put the resulting min leftist tree back into the FIFO queue
  • The process terminates when only 1 min leftist tree remains in the FIFO queue
  • Time complexity is $n+2\times(1\times\frac{n}{2}+2\times\frac{n}{4}+3\times\frac{n}{8}+\dots)=O(n)$

Disjoint Sets

Intro

Let’s forget about the hospital scene; you are now a police officer. You know the relationships among 10 gangsters, and No.1 & No.9 are suspects. How do you tell if they’re in the same gang? That’s why we need disjoint sets! We can use this data structure to store data like this.

There should be no element that appears in 2 different sets, which means a unique element should only appear once across the disjoint sets. Here are 2 examples.

$\begin{aligned} &S_1=\{0, 6, 7, 8\}\quad S_2=\{1, 4, 9\}\quad S_3=\{2, 3, 5\}\quad\text{Disjoint sets}\\ &S_1=\{0, 6, 7, 8, 9\}\quad S_2=\{0, 1, 4, 9\}\quad S_3=\{1, 2, 3, 5\}\quad\text{Not disjoint sets} \end{aligned}$

Data Representation of Disjoint Sets

To represent a disjoint set, we link the nodes from the children to the parent, and each root has a pointer to the set name.

Disjoint Set

Operations

Find(i)

  • Find the set containing the targeted element
  • Start at the node representing element $i$ and climb up the tree until the root is reached, then return the element in the root
  • Use an integer array to store the parent of each element
  • Time complexity depends on the level of $i$

Image

while (parent[i] >= 0){
    i = parent[i]; // move up the tree
}
return i;

Union(i, j)

  • Combine 2 disjoint sets into 1
  • $i$ & $j$ are the roots of different trees, so $i\ne j$
  • For the tree representation, we set the parent field of one root to point to the other root
  • That is, we make one tree a subtree of the other: parent[j] = i
  • Time complexity is $O(1)$

$G_1=\{1, 3\}\quad G_2=\{6, 7, 8, 9, 10\}\quad G_3=G_1\cup G_2=\{1, 3, 6, 7, 8, 9, 10\}$

Union

Sequence of Union-Find Operations

$\begin{aligned} &union(1, 0), find(0)\\ &union(2, 1), find(0)\\ &\quad\quad\quad\quad\vdots \\ &union(N-1, N-2), find(0) \end{aligned}$

For each find(0), we trace from 0 to the root.

Image

  • Time complexity
    • Time to initialize parent[i] = 0 for all elements is $O(n)$
    • $n-1$ unions, each taking $O(1)$, so the total is $O(n)$
    • $n-1$ finds, where the $i$-th takes $O(i)$ steps, so the total is $\displaystyle\sum^n_{i=2}i=O(n^2)$

How do we avoid creating such a degenerate tree? Let’s see in the next part of this chapter!

Weight Rule for Union(i, j)

  • Make the tree with fewer elements a subtree of the other tree
  • The count of the new tree is the sum of the counts of the trees that are united

Weight Rule
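A common way to implement the weight rule is to store the negated element count in each root’s parent slot (so parent[i] < 0 marks a root); here is a sketch under that assumption.

// i and j must be roots of different trees; parent[root] holds -(element count)
void weightedUnion(int parent[], int i, int j)
{
    int total = parent[i] + parent[j]; // Sum of the two negative counts
    if (parent[i] > parent[j]) { // Tree i has fewer elements
        parent[i] = j;           // Make i a subtree of j
        parent[j] = total;
    } else {
        parent[j] = i; // Make j a subtree of i
        parent[i] = total;
    }
}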

Height Rule for Union(i, j)

  • Make the tree with smaller height a subtree of the other tree
  • The height increases only when two trees of equal height are united

Height Rule

Collapsing Rule

Lemma 1

  • Suppose we start with single-element trees and perform unions using either the height rule or the weight rule

  • The height of a tree with $p$ elements is at most $floor(\log_2p)+1$

  • Processing an intermixed sequence of $u-1$ unions and $f$ finds takes $O(u+f\log u)$ time

  • For the $u-1$ unions part, we generate a tree with $u$ nodes

  • For the $f$ finds part, each find requires at most $floor(\log_2u)+1$ steps, so the total is at most $f\times[floor(\log_2u)+1]$

Improving Find(i) with Collapsing Rule

Collapsing rule means:

  • Make all nodes on the find path point to the tree root
  • Pay some more effort this time, so that future Find() operations save time
  • Slower this time, faster next time

Here’s an example: suppose we apply the sequence $find(7), find(7), find(7), find(7), find(7)$ to this tree.

Tree

  • Without the collapsing rule
    • Each $find(7)$ requires climbing up 3 times
    • The total is $5\times 3=15$ moves
  • With the collapsing rule
    • The first $find(7)$ requires climbing up 3 times
    • Each remaining $find(7)$ only needs to climb up once
    • The total is $3+4\times 1=7$ moves (see the sketch below)
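Here is a sketch of find with the collapsing rule, matching the parent-array convention used above (negative values mark roots): one pass locates the root, and a second pass makes every node on the path point to it.

int collapsingFind(int parent[], int i)
{
    int root = i;
    while (parent[root] >= 0) { // First pass: climb to the root
        root = parent[root];
    }
    while (parent[i] >= 0) { // Second pass: collapse the path
        int next = parent[i];
        parent[i] = root; // Point this node directly at the root
        i = next;
    }
    return root;
}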

Lemma 2 (By Tarjan and Van Leeuwen)

Let $T(f, u)$ be the maximum time required to process any intermixed sequence of $f$ finds and $u$ unions, assuming that $u\geq\frac{n}{2}$.

$k_1\times(n+f\times\alpha(f+n,n))\le T(f,u)\le k_2\times(n+f\times\alpha(f+n,n))$

where $k_1$ and $k_2$ are constants, $n$ is the number of elements, and $\alpha(f+n, n)$ is the inverse Ackermann function.

  • Ackermann function
    • A function of 2 parameters whose value grows very, very fast
  • Inverse Ackermann function
    • A function of 2 parameters whose value grows very, very slow

Time Complexity

The bounds in Lemma 2 apply when we start with singleton sets and use either the weight or height rule for unions and the collapsing rule for finds. The resulting time complexity is practically $O(n+f)$, which is very efficient.

Graphs

Credits