[Database] Implementation of an Extendible Hash Index

This blog documents how I implemented an extendible hash index for a database, from scratch, in modern C++. Hash tables sit at the core of many aspects of a database, from building indexes for fast lookups to the efficient implementation of the hash join executor.

Per CMU policy, Bustub solutions should not be made public. Hence, only code snippets and basic logic are shown here. If you need to see the code for a specific reason, email me: yy4108@nyu.edu.

I am again on a trip, but this time alone. I am waiting for my plane to Changsha. With nothing better to do now, let’s try to stay productive.

I believe that building a hash index is significantly easier than building a B+ tree index. Nevertheless, it takes considerable effort to get it working correctly. The main challenges are implementing the grow and shrink logic of the hash table and understanding what “extendible” really means.

RAII Read/Write Page Guards

One difficulty when I implemented the B+ tree index was that I always forgot to unpin/unlatch a page after use. Forgetting to unlatch a page causes correctness issues, potentially including deadlocks. Forgetting to unpin a page causes performance degradation, because the buffer pool manager can never evict a pinned page: forgotten pins mean fewer frames can be reused, and we effectively end up with a much smaller buffer pool.

To mitigate this issue, we can take advantage of RAII in modern C++. We want to make sure that once a page guard goes out of scope, the page it protects is automatically unpinned/unlatched.
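As a small usage sketch of where we are heading (FetchPageRead is the guarded fetch implemented later in this post):

{
  ReadPageGuard guard = bpm->FetchPageRead(page_id);
  // ... read from the guarded page ...
}  // guard goes out of scope here: RUnlatch() and UnpinPage() run in its destructor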

Let’s first implement a basic page guard; later we will extend it into a ReadPageGuard and a WritePageGuard that additionally hold read/write latches.

class BasicPageGuard {
 public:
  BasicPageGuard() = default;

  BasicPageGuard(BufferPoolManager *bpm, Page *page) : bpm_(bpm), page_(page) {}

  BasicPageGuard(const BasicPageGuard &) = delete;
  auto operator=(const BasicPageGuard &) -> BasicPageGuard & = delete;

  // ... move constructor, move assignment, Drop(), and the destructor are declared here as well ...

 private:
  BufferPoolManager *bpm_{nullptr};
  Page *page_{nullptr};
  bool is_dirty_{false};
};

In the header, we can see that BasicPageGuard has a default constructor and a simple custom constructor. The copy constructor and copy assignment operator are deleted: since we are implementing an RAII-style class, allowing copies would cause trouble (for example, two guards unpinning the same page), defeating the RAII semantics. RAII objects like this can only be moved, not copied.
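As a quick illustration of what deleting the copy operations buys us (FetchPageBasic is the guarded fetch added to the buffer pool manager later in this post):

BasicPageGuard g1 = bpm->FetchPageBasic(page_id);
// BasicPageGuard g2 = g1;          // does not compile: the copy constructor is deleted
BasicPageGuard g2 = std::move(g1);  // fine: ownership of the pin transfers to g2
// g1 is now empty, so only g2 will unpin the page when it is destroyed.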

Now let’s talk about how to enable the move semantics in the page guard class.

BasicPageGuard::BasicPageGuard(BasicPageGuard &&that) noexcept
    : bpm_(that.bpm_), page_(that.page_), is_dirty_(that.is_dirty_) {
  that.bpm_ = nullptr;
  that.page_ = nullptr;
  that.is_dirty_ = false;
}

auto BasicPageGuard::operator=(BasicPageGuard &&that) noexcept -> BasicPageGuard & {
  if (this != &that) {
    Drop();
    // Transfer ownership
    bpm_ = that.bpm_;
    page_ = that.page_;
    is_dirty_ = that.is_dirty_;

    // Invalidate the moved-from object
    that.bpm_ = nullptr;
    that.page_ = nullptr;
    that.is_dirty_ = false;
  }

  return *this;
}

void BasicPageGuard::Drop() {
  if (page_ != nullptr && bpm_ != nullptr) {
    bpm_->UnpinPage(page_->GetPageId(), is_dirty_);
  }
  bpm_ = nullptr;
  page_ = nullptr;
  is_dirty_ = false;
}

BasicPageGuard::~BasicPageGuard() { Drop(); }  // NOLINT

The most important thing when implementing move semantics is to make sure that the moved-from object is no longer usable. We set the parameter’s attributes to nullptr so that the page can no longer be reached through the moved-from object.

A similar procedure is followed when implementing the move assignment operator. Apart from setting the moved-from object’s attributes to nullptr, we also need to make sure that the resource *this currently owns is released first, which, for a page, means dropping it, as we can see from the implementation of the destructor.

ReadPageGuard::ReadPageGuard(ReadPageGuard &&that) noexcept : guard_(std::move(that.guard_)) {}

auto ReadPageGuard::operator=(ReadPageGuard &&that) noexcept -> ReadPageGuard & {
  if (this != &that) {
    Drop();
    guard_ = std::move(that.guard_);
  }
  return *this;
}

void ReadPageGuard::Drop() {
  if (guard_.page_ != nullptr && guard_.bpm_ != nullptr) {
    guard_.page_->RUnlatch();
    guard_.Drop();
    guard_.page_ = nullptr;
  }
}

ReadPageGuard::~ReadPageGuard() { Drop(); }  // NOLINT

The implementation of the basic page guard can easily be extended into a ReadPageGuard and a WritePageGuard. The only thing we need to pay attention to is that when dropping a read or write page guard, we also have to unlatch the page itself. Here I only include the ReadPageGuard; the WritePageGuard is very similar (a sketch follows). The page is automatically dropped when the guard goes out of scope.
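For completeness, here is a sketch of the write-side Drop, assuming Page exposes WUnlatch() as the counterpart of RUnlatch():

void WritePageGuard::Drop() {
  if (guard_.page_ != nullptr && guard_.bpm_ != nullptr) {
    guard_.page_->WUnlatch();  // release the write latch instead of the read latch
    guard_.Drop();             // unpin through the wrapped BasicPageGuard
    guard_.page_ = nullptr;
  }
}

WritePageGuard::~WritePageGuard() { Drop(); }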

auto BasicPageGuard::UpgradeRead() -> ReadPageGuard {
  if (page_ == nullptr || bpm_ == nullptr) {
    throw std::runtime_error("Cannot upgrade a BasicPageGuard that does not guard any page.");
  }

  ReadPageGuard read_guard(bpm_, page_);
  page_->RLatch();

  bpm_ = nullptr;
  page_ = nullptr;
  is_dirty_ = false;

  return read_guard;
}

auto BasicPageGuard::UpgradeWrite() -> WritePageGuard {
  if (page_ == nullptr || bpm_ == nullptr) {
    throw std::runtime_error("Cannot upgrade a BasicPageGuard that does not guard any page.");
  }

  WritePageGuard write_guard(bpm_, page_);
  page_->WLatch();

  bpm_ = nullptr;
  page_ = nullptr;
  is_dirty_ = false;

  return write_guard;
}

When upgrading a basic page guard to one that can read/write, we create a new ReadPageGuard/WritePageGuard, take the corresponding read/write latch, and reset the basic guard’s attributes to nullptr. The logic is very similar for the read and write versions.
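As a usage sketch, a caller that allocates a fresh page (for example, a new bucket) and then needs exclusive access might look like this; NewPageGuarded is shown in the next snippet:

page_id_t new_page_id;
BasicPageGuard basic_guard = bpm->NewPageGuarded(&new_page_id);
WritePageGuard write_guard = basic_guard.UpgradeWrite();
// basic_guard is now empty; write_guard holds both the pin and the write latch
// and releases them automatically when it goes out of scope.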

Now that we have implemented the page guards, we can update the buffer pool manager so that it hands out pages through them. This way a page is automatically dropped when its guard goes out of scope.

auto BufferPoolManager::FetchPageBasic(page_id_t page_id) -> BasicPageGuard {
  Page *fetched_page = FetchPage(page_id);
  return {this, fetched_page};
}

auto BufferPoolManager::FetchPageRead(page_id_t page_id) -> ReadPageGuard {
  Page *fetched_page = FetchPage(page_id);
  fetched_page->RLatch();
  return {this, fetched_page};
}

auto BufferPoolManager::FetchPageWrite(page_id_t page_id) -> WritePageGuard {
  Page *fetched_page = FetchPage(page_id);
  fetched_page->WLatch();
  return {this, fetched_page};
}

auto BufferPoolManager::NewPageGuarded(page_id_t *page_id) -> BasicPageGuard {
  Page *new_page = NewPage(page_id);
  return {this, new_page};
}

This part should be relatively easy: fetch the page (or allocate a new one), take the corresponding read/write latch if necessary, and return a constructed page guard.
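One hedged caveat: FetchPage returns nullptr when the buffer pool cannot bring the page in, so a slightly more defensive variant of FetchPageRead might check before latching:

auto BufferPoolManager::FetchPageRead(page_id_t page_id) -> ReadPageGuard {
  Page *fetched_page = FetchPage(page_id);
  if (fetched_page != nullptr) {
    fetched_page->RLatch();  // only latch a page we actually obtained
  }
  return {this, fetched_page};
}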

Extendible Hash Table Pages

Now that I am back in Shanghai, my flight to New York is tomorrow. It is with some trepidation that I set foot on the dangerous soil of NYC. I am too worried to do anything else, so let us document what we have accomplished. The rule is the same: for Bustub, only basic logic is shown, in accordance with the open-source agreement. Full code will not be provided, only the high-level ideas. My repo is private; if you want to see the code, email me with a reason.

Before going on to implement the extendible hashing algorithm, we first need to define the hash table pages. There are three types of hash table pages to define:

  • Hash table header page
  • Hash table directory page
  • Hash table bucket page

How does an extendible hash table work?

These three page types form a lookup hierarchy. To look up a key, we first hash it, then use the hash value to find the corresponding directory page through the header page. Next, again using the hashed key, we follow the entry in the directory page to the corresponding bucket page. With a linear scan (or something smarter) of the bucket page, which stores the actual key/value pairs (for an index, the key and a reference to the tuple), we can locate the data we want.

The most challenging part is not defining the pages, but growing and shrinking the directory. So this part is still relatively easy.

Header Page

void ExtendibleHTableHeaderPage::Init(uint32_t max_depth) {
  max_depth_ = max_depth;
  // 0xff in every byte marks every directory slot as an invalid page id.
  memset(directory_page_ids_, 0xff, sizeof(directory_page_ids_));
}

auto ExtendibleHTableHeaderPage::HashToDirectoryIndex(uint32_t hash) const -> uint32_t {
  if (max_depth_ == 0) {
    return 0;
  }
  // Use the top max_depth_ bits of the hash as the directory index.
  uint32_t shift = 32 - max_depth_;
  uint32_t directory_index = hash >> shift;
  return directory_index;
}

auto ExtendibleHTableHeaderPage::GetDirectoryPageId(uint32_t directory_idx) const -> uint32_t {
  return directory_page_ids_[directory_idx];
}

void ExtendibleHTableHeaderPage::SetDirectoryPageId(uint32_t directory_idx, page_id_t directory_page_id) {
  directory_page_ids_[directory_idx] = directory_page_id;
}

auto ExtendibleHTableHeaderPage::MaxSize() const -> uint32_t { return 1 << max_depth_; }

The implementation, as I said, is fairly straightforward. Above, I give the high-level idea: we use the most significant (top) bits of the hash to determine which directory page we need. After getting the index, we look up the corresponding page id in a simple array. Later, when we use this code, we follow these steps (a sketch of the flow follows the list):

  1. Call the hash function to get the hash value of the key
  2. Call HashToDirectoryIndex() to get the directory index
  3. Call GetDirectoryPageId() to get the page id of the directory page
  4. Use the buffer pool manager we implemented before to fetch the actual directory page
  5. Repeat a very similar process within the directory page
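Putting these steps together, here is a hedged sketch of the top half of a lookup. The Hash() helper, the As<T>() accessor on the page guards, GetBucketPageId(), and INVALID_PAGE_ID are assumptions about the surrounding interfaces rather than code shown above:

// Hash the key and walk header -> directory -> bucket.
uint32_t hash = Hash(key);  // assumed hash helper

ReadPageGuard header_guard = bpm_->FetchPageRead(header_page_id_);
auto *header = header_guard.As<ExtendibleHTableHeaderPage>();  // assumed accessor

uint32_t directory_idx = header->HashToDirectoryIndex(hash);
page_id_t directory_page_id = header->GetDirectoryPageId(directory_idx);
if (directory_page_id == INVALID_PAGE_ID) {
  return false;  // nothing has hashed to this prefix yet
}

ReadPageGuard directory_guard = bpm_->FetchPageRead(directory_page_id);
auto *directory = directory_guard.As<ExtendibleHTableDirectoryPage>();

uint32_t bucket_idx = directory->HashToBucketIndex(hash);
page_id_t bucket_page_id = directory->GetBucketPageId(bucket_idx);  // assumed getter
// ... fetch the bucket page the same way and run Lookup() on it ...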

Directory Page

The directory page is a little more complex: we now need to take growing and shrinking into consideration. We have a global depth and an array of local depths. Let’s differentiate their usage below.

Global depth is used to find the bucket page. We build a mask from the global depth; masking the hash value with it gives the index of the corresponding bucket page.

Here is an example of the usage of global depth.

auto ExtendibleHTableDirectoryPage::HashToBucketIndex(uint32_t hash) const -> uint32_t {
  uint32_t mask = (1 << global_depth_) - 1;
  uint32_t bucket_index = hash & mask;
  return bucket_index;
}

Local depth is the depth of an individual bucket. When a bucket is full, we split it and increment its local depth; if its local depth already equals the global depth, we must increment the global depth first (doubling the directory). Local depth is what makes the hash table extendible: the table grows bucket by bucket as needed and, conversely, can shrink once every local depth drops below the global depth.

Here is how local depth is used in the directory page.

auto ExtendibleHTableDirectoryPage::GetSplitImageIndex(uint32_t bucket_idx) const -> uint32_t {
  uint8_t local_depth = local_depths_[bucket_idx];
  uint32_t depth_bit = 1 << (local_depth - 1);
  uint32_t split_image_index = (bucket_idx ^ depth_bit) & ((1 << global_depth_) - 1);
  return split_image_index;
}

void ExtendibleHTableDirectoryPage::IncrGlobalDepth() {
  if (global_depth_ >= max_depth_) {
    return;
  }

  for (int i = 0; i < (1 << global_depth_); i++) {
    bucket_page_ids_[(1 << global_depth_) + i] = bucket_page_ids_[i];
    local_depths_[(1 << global_depth_) + i] = local_depths_[i];
  }
  global_depth_++;
}

auto ExtendibleHTableDirectoryPage::CanShrink() -> bool {
  uint32_t num_buckets = 1 << global_depth_;
  bool can_shrink = true;
  for (uint32_t i = 0; i < num_buckets; i++) {
    if (local_depths_[i] == global_depth_) {
      can_shrink = false;
      break;
    }
  }
  return can_shrink;
}

There are more functions in the directory page, but here we only focus on the high-level idea.
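To make the growth path more concrete, here is a hedged sketch of what happens when an insert finds its target bucket full. The helpers GetLocalDepth(), GetGlobalDepth(), IncrLocalDepth(), SetLocalDepth(), and SetBucketPageId() are assumptions about the rest of the directory page interface, and the redistribution of existing entries is only outlined:

// The bucket at bucket_idx is full, so it has to be split.
if (directory->GetLocalDepth(bucket_idx) == directory->GetGlobalDepth()) {
  directory->IncrGlobalDepth();  // double the directory before the bucket can split
}

// Allocate a page for the new bucket (the "split image").
page_id_t new_bucket_page_id;
BasicPageGuard new_bucket_guard = bpm_->NewPageGuarded(&new_bucket_page_id);

// Both the old and the new bucket now live one level deeper.
directory->IncrLocalDepth(bucket_idx);
uint32_t split_idx = directory->GetSplitImageIndex(bucket_idx);
directory->SetLocalDepth(split_idx, directory->GetLocalDepth(bucket_idx));
directory->SetBucketPageId(split_idx, new_bucket_page_id);

// ... rehash every entry of the old bucket, move the ones that now map to
// split_idx into the new bucket (when the new local depth is still below the
// global depth, more than one directory slot may need repointing), then retry
// the insert ...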


Bucket Page

The bucket page is relatively easy: it is where the actual key/value pairs are stored. Before diving into details, we can take a look at the member variables of this class.

 private:
  uint32_t size_;
  uint32_t max_size_;
  MappingType array_[HTableBucketArraySize(sizeof(MappingType))];

The most important data structure in a bucket page is array_: this is where the actual kv pairs are stored. Everything else follows easily from this fact.

When initializing the bucket page (a minimal sketch follows the list):

  • set the size to 0
  • set the max size according to the parameter
  • populate the array_ with empty kv pairs
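Here is a minimal sketch of Init following those three steps, assuming the key/value types are plain, trivially copyable structs so that zeroing the array is safe:

template <typename K, typename V, typename KC>
void ExtendibleHTableBucketPage<K, V, KC>::Init(uint32_t max_size) {
  size_ = 0;
  max_size_ = max_size;
  // Start from a clean slate of empty kv slots.
  memset(array_, 0, sizeof(array_));
}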

When doing a lookup:

  • A linear scan suffices
    template <typename K, typename V, typename KC>
    auto ExtendibleHTableBucketPage<K, V, KC>::Lookup(const K &key, V &value, const KC &cmp) const -> bool {
      for (uint32_t i = 0; i < size_; i++) {
        if (cmp(array_[i].first, key) == 0) {
          value = array_[i].second;
          return true;
        }
      }
      return false;
    }

When doing an insert (a sketch follows the list):

  • Return false if the bucket is already at max capacity (so that the caller can grow the hash table)
  • Return false if there is a duplicate key
  • Otherwise insert the kv pair at index size_ (then increment size_) and return true
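A hedged sketch consistent with those rules (the signature mirrors Lookup() above; treat it as an illustration rather than the definitive implementation):

template <typename K, typename V, typename KC>
auto ExtendibleHTableBucketPage<K, V, KC>::Insert(const K &key, const V &value, const KC &cmp) -> bool {
  if (size_ == max_size_) {
    return false;  // full: let the caller split the bucket
  }
  for (uint32_t i = 0; i < size_; i++) {
    if (cmp(array_[i].first, key) == 0) {
      return false;  // duplicate key
    }
  }
  array_[size_++] = std::make_pair(key, value);
  return true;
}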

When removing: find the matching entry, then shift everything that comes after it one slot forward and decrement size_ (a sketch follows).
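A corresponding hedged sketch of Remove (again, the exact signature is an assumption modeled on Lookup() above):

template <typename K, typename V, typename KC>
auto ExtendibleHTableBucketPage<K, V, KC>::Remove(const K &key, const KC &cmp) -> bool {
  for (uint32_t i = 0; i < size_; i++) {
    if (cmp(array_[i].first, key) == 0) {
      for (uint32_t j = i; j + 1 < size_; j++) {
        array_[j] = array_[j + 1];  // close the gap left by the removed entry
      }
      size_--;
      return true;
    }
  }
  return false;  // key not present
}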


