B-trees are self-balancing tree data structures that maintain sorted data and allow for efficient insertion, deletion, and searching operations. In a B-tree, each node can hold multiple keys and children, making it efficient for systems that need to handle large amounts of data, such as databases and file systems.
In this section, we will dive deep into the process of insertion in a B-tree, covering the core concepts, steps, and a Python code implementation to demonstrate the process.
Before we dive into the insertion process, let's review the key properties of a B-tree of order m
:
m - 1
keys and can have up to m
children.⌈m / 2⌉
children.⌈m / 2⌉
keys, but they are allowed to hold at least one key.When a new key is inserted into a B-tree, it might cause the node to exceed its maximum allowed number of keys. This triggers a split operation that propagates upwards if necessary.
Start at the Root:
Find the Correct Leaf Node:
Insert the Key in the Leaf Node:
m-1
keys), simply insert the key into the appropriate position and return.Split the Node if Full:
m-1
keys), it must be split. The node is split into two nodes:
Repeat Until the Root is Split:
The insertion algorithm for a B-tree can be divided into two main operations:
Inserting into a Non-Full Node:
Splitting a Full Node:
m-1
keys), the node is split into two, and the middle key is promoted to the parent node.Here’s a Python implementation that demonstrates how to insert a key into a B-tree.
class BTreeNode:
def __init__(self, t, leaf=False):
self.t = t # Minimum degree (defines the range for the number of keys)
self.leaf = leaf # True if node is leaf, False otherwise
self.keys = [] # List of keys in the node
self.children = [] # List of child nodes
class BTree:
def __init__(self, t):
self.root = BTreeNode(t, True) # Initialize root as a leaf node
self.t = t # Minimum degree of the B-tree
# Insertion function that initiates the insertion process
def insert(self, key):
root = self.root
# If the root node is full, split it and create a new root
if len(root.keys) == (2 * self.t) - 1:
new_node = BTreeNode(self.t, False)
new_node.children.append(self.root) # Make the current root a child of the new node
self.split(new_node, 0)
self.root = new_node
# Insert the key into the non-full root node
self.insert_non_full(self.root, key)
# Function to split a full node
def split(self, parent, index):
full_node = parent.children[index]
new_node = BTreeNode(self.t, full_node.leaf)
mid = self.t - 1
parent.keys.insert(index, full_node.keys[mid]) # Promote middle key to the parent
parent.children.insert(index + 1, new_node)
# Move the right half of the full_node to the new_node
new_node.keys = full_node.keys[mid + 1:]
full_node.keys = full_node.keys[:mid]
# If it's not a leaf node, move the corresponding children
if not full_node.leaf:
new_node.children = full_node.children[mid + 1:]
full_node.children = full_node.children[:mid + 1]
# Function to insert a key into a node that is not full
def insert_non_full(self, node, key):
i = len(node.keys) - 1
# If the node is a leaf, insert the key directly
if node.leaf:
while i >= 0 and key < node.keys[i]:
i -= 1
node.keys.insert(i + 1, key)
else:
# Find the child to which the key should be inserted
while i >= 0 and key < node.keys[i]:
i -= 1
i += 1
child = node.children[i]
# If the child is full, split it
if len(child.keys) == (2 * self.t) - 1:
self.split(node, i)
# After split, the key might be in the newly created child, so we adjust the index
if key > node.keys[i]:
i += 1
# Recursively insert the key into the child
self.insert_non_full(node.children[i], key)
# In-order traversal to print keys in the tree (sorted order)
def inorder_traversal(self, node):
if node:
for i in range(len(node.keys)):
if not node.leaf:
self.inorder_traversal(node.children[i])
print(node.keys[i], end=" ")
if not node.leaf:
self.inorder_traversal(node.children[len(node.keys)])
# Example Usage
btree = BTree(3) # Create a B-tree with minimum degree 3
keys = [10, 20, 5, 6, 12, 30, 7, 17]
for key in keys:
btree.insert(key)
# In-order Traversal (prints keys in sorted order)
print("In-order Traversal of B Tree:")
btree.inorder_traversal(btree.root)
In-order Traversal of B Tree:
5 6 7 10 12 17 20 30
BTreeNode Class: Represents a node in the B-tree. It contains a list of keys and a list of child nodes. Each node has a minimum degree t
and a boolean flag leaf
indicating if it’s a leaf node.
BTree Class: Represents the B-tree itself. It has a method insert
to insert a key, and if the root is full, it creates a new root. The split
method is used to split full nodes, and the insert_non_full
method inserts keys into non-full nodes.
In-order Traversal: The inorder_traversal
method prints the keys in sorted order. Since B-trees maintain sorted data, the in-order traversal of the tree will always print the keys in increasing order.