Insertion in a B-tree


B-trees are self-balancing tree data structures that maintain sorted data and allow for efficient insertion, deletion, and searching operations. In a B-tree, each node can hold multiple keys and children, making it efficient for systems that need to handle large amounts of data, such as databases and file systems.

In this section, we will dive deep into the process of insertion in a B-tree, covering the core concepts, steps, and a Python code implementation to demonstrate the process.


Properties of a B-tree

Before we dive into the insertion process, let's review the key properties of a B-tree of order m:

  1. Every node can contain a maximum of m - 1 keys and can have up to m children.
  2. Every non-leaf node must have at least ⌈m / 2⌉ children.
  3. Leaf nodes can hold fewer than ⌈m / 2⌉ keys, but they are allowed to hold at least one key.
  4. All leaf nodes are at the same level (the tree is balanced).

When a new key is inserted into a B-tree, it might cause the node to exceed its maximum allowed number of keys. This triggers a split operation that propagates upwards if necessary.


Insertion Process in a B-tree

Step-by-step Insertion:

  1. Start at the Root:

    • Begin the insertion process at the root node of the B-tree.
  2. Find the Correct Leaf Node:

    • Traverse down the tree to find the appropriate leaf node where the new key should be inserted. This is done by comparing the key to be inserted with the keys already present in the node.
    • Depending on the comparison, move to the left, right, or middle child (if it's an internal node).
  3. Insert the Key in the Leaf Node:

    • If the leaf node where the key needs to be inserted has enough space (i.e., it contains fewer than m-1 keys), simply insert the key into the appropriate position and return.
  4. Split the Node if Full:

    • If the leaf node is full (contains m-1 keys), it must be split. The node is split into two nodes:
      • The middle key is promoted to the parent node.
      • The left half remains in the original node.
      • The right half goes into the new node.
    • This split may propagate upwards, and the parent node might also need to be split if it becomes full.
  5. Repeat Until the Root is Split:

    • If the root node is split, a new root node is created, and the height of the tree increases.

Detailed Algorithm for Insertion

The insertion algorithm for a B-tree can be divided into two main operations:

  1. Inserting into a Non-Full Node:

    • If the node where the key needs to be inserted has space, it can simply be added in the correct position while maintaining the order.
  2. Splitting a Full Node:

    • When a node becomes full (i.e., it has m-1 keys), the node is split into two, and the middle key is promoted to the parent node.
    • If the parent node also becomes full, the same process is repeated recursively.

Python Code for Insertion in a B-tree

Here’s a Python implementation that demonstrates how to insert a key into a B-tree.

class BTreeNode:
    def __init__(self, t, leaf=False):
        self.t = t  # Minimum degree (defines the range for the number of keys)
        self.leaf = leaf  # True if node is leaf, False otherwise
        self.keys = []  # List of keys in the node
        self.children = []  # List of child nodes

class BTree:
    def __init__(self, t):
        self.root = BTreeNode(t, True)  # Initialize root as a leaf node
        self.t = t  # Minimum degree of the B-tree

    # Insertion function that initiates the insertion process
    def insert(self, key):
        root = self.root
        # If the root node is full, split it and create a new root
        if len(root.keys) == (2 * self.t) - 1:
            new_node = BTreeNode(self.t, False)
            new_node.children.append(self.root)  # Make the current root a child of the new node
            self.split(new_node, 0)
            self.root = new_node

        # Insert the key into the non-full root node
        self.insert_non_full(self.root, key)

    # Function to split a full node
    def split(self, parent, index):
        full_node = parent.children[index]
        new_node = BTreeNode(self.t, full_node.leaf)
        mid = self.t - 1
        parent.keys.insert(index, full_node.keys[mid])  # Promote middle key to the parent
        parent.children.insert(index + 1, new_node)

        # Move the right half of the full_node to the new_node
        new_node.keys = full_node.keys[mid + 1:]
        full_node.keys = full_node.keys[:mid]

        # If it's not a leaf node, move the corresponding children
        if not full_node.leaf:
            new_node.children = full_node.children[mid + 1:]
            full_node.children = full_node.children[:mid + 1]

    # Function to insert a key into a node that is not full
    def insert_non_full(self, node, key):
        i = len(node.keys) - 1
        # If the node is a leaf, insert the key directly
        if node.leaf:
            while i >= 0 and key < node.keys[i]:
                i -= 1
            node.keys.insert(i + 1, key)
        else:
            # Find the child to which the key should be inserted
            while i >= 0 and key < node.keys[i]:
                i -= 1
            i += 1
            child = node.children[i]

            # If the child is full, split it
            if len(child.keys) == (2 * self.t) - 1:
                self.split(node, i)
                # After split, the key might be in the newly created child, so we adjust the index
                if key > node.keys[i]:
                    i += 1

            # Recursively insert the key into the child
            self.insert_non_full(node.children[i], key)

    # In-order traversal to print keys in the tree (sorted order)
    def inorder_traversal(self, node):
        if node:
            for i in range(len(node.keys)):
                if not node.leaf:
                    self.inorder_traversal(node.children[i])
                print(node.keys[i], end=" ")
            if not node.leaf:
                self.inorder_traversal(node.children[len(node.keys)])

# Example Usage
btree = BTree(3)  # Create a B-tree with minimum degree 3
keys = [10, 20, 5, 6, 12, 30, 7, 17]

for key in keys:
    btree.insert(key)

# In-order Traversal (prints keys in sorted order)
print("In-order Traversal of B Tree:")
btree.inorder_traversal(btree.root)

Output:

In-order Traversal of B Tree:
5 6 7 10 12 17 20 30

Explanation of the Code

  • BTreeNode Class: Represents a node in the B-tree. It contains a list of keys and a list of child nodes. Each node has a minimum degree t and a boolean flag leaf indicating if it’s a leaf node.

  • BTree Class: Represents the B-tree itself. It has a method insert to insert a key, and if the root is full, it creates a new root. The split method is used to split full nodes, and the insert_non_full method inserts keys into non-full nodes.

  • In-order Traversal: The inorder_traversal method prints the keys in sorted order. Since B-trees maintain sorted data, the in-order traversal of the tree will always print the keys in increasing order.