[Database] Bustub Buffer Pool Manager Implementation

Implement a buffer pool manager for a DBMS, from scratch, in C++.

Per CMU policy, Bustub solution should not be made public. Hence, only code snippets and logic are shown here. If you need to see the code, email me: yy4108@nyu.edu.

I am on my train to Qingdao right now, to see a beautiful girl. I am not so sure about how I feel, excited, but at the same time nervous. It is even freezing in Shanghai recently, let alone Qingdao. It is my first time, travelling so far, just to see someone.

I want to be productive on a train, but it is not easy. Writing something about buffer pool manager will probably calm me down.

We will follow the guides and instruction of Andy Pavlo, and his CMU 15-445 course, to implement a buffer pool manager for a database management system, in C++.

First, let me prove myself with a Gradescoppe screenshot.

You can indeed be convinced that I know what I will be talking about.

LRU-K Replacement Policy

Implementing an eviction policy is always the first step in any buffer pool manager implementation. The reason being that we always need to figure out a way to decide which page (frame) to evict when the buffer pool is full.

Per the instruction of CMU, here we will be implementing an LRU-K policy. First let’s take a look at what is expected in an LRU replacer.

The LRU-K algorithm evicts a frame whose backward k-distance is maximum of all frames in the replacer. Backward k-distance is computed as the difference in time between current timestamp and the timestamp of kth previous access. A frame with fewer than k historical accesses is given +inf as its backward k-distance. When multiple frames have +inf backward k-distance, the replacer evicts the frame with the earliest overall timestamp (i.e., the frame whose least-recent recorded access is the overall least recent access, overall, out of all frames).

The maximum size for the LRUKReplacer is the same as the size of the buffer pool since it contains placeholders for all of the frames in the BufferPoolManager. However, at any given moment, not all the frames in the replacer are considered to be evictable. The size of LRUKReplacer is represented by the number of evictable frames. The LRUKReplacer is initialized to have no frames in it. Then, only when a frame is marked as evictable, replacer’s size will increase.

With the basic idea in mind, let’s discuss how to implement in detail.

1
LRUKReplacer::LRUKReplacer(size_t num_frames, size_t k) : replacer_size_(num_frames), k_(k) {}

First, implement the constructor. num_frames is the number of total frames available, and the k_ here should be the number of evictable frames.

The main logic of LRU-K is in the Evict function.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
auto LRUKReplacer::Evict(frame_id_t *frame_id) -> bool {
std::lock_guard<std::mutex> lock(latch_);

if (node_store_.empty()) {
// there is no evictable frame
return false;
}

size_t max_distance = 0;
frame_id_t candidate = -1;
bool found_infinite = false;

for (auto &pair : node_store_) {
auto &node = pair.second;
if (!node.is_evictable_ || node.history_.size() < k_) {
// if it has less than k history elements, its k distance is set to +inf
if (node.is_evictable_ && node.history_.size() < k_) {
// there hasn't been an element whose k distance is +inf
if (!found_infinite) {
max_distance = std::numeric_limits<size_t>::max();
candidate = node.fid_;
found_infinite = true;
} else {
// based on LRU, if there are multiple frames with +inf k distance
// evict the one with the smallest timestamp
if (node.history_.front() < node_store_[candidate].history_.front()) {
candidate = node.fid_;
}
}
}
// continue the execution, skip this node
continue;
}

size_t distance = current_timestamp_ - node.history_.front();

if (distance > max_distance) {
max_distance = distance;
candidate = node.fid_;
}
}

if (candidate != -1) {
*frame_id = candidate;
node_store_.erase(candidate);
curr_size_--;
return true;
}
return false;
}

The logic is simple:

  1. check all nodes that are marked evictable by the replacer.
  2. If it has less than k elements, then its k distance is set to infinite (the most likely to be replaced).
  3. If we have multiple +inf distance nodes, then fall back to the original LRU policy. Just pick the oldest among these +inf nodes. (This is the purpose that found_infinite is servering)
  4. Otherwise, if all evictable pages have more than k access records, follow the LRU-K policy, and pick the oldest.

RecordAccess function is also important. It should be called anytime we touch a frame (read/write it from the buffer pool manager).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
void LRUKReplacer::RecordAccess(frame_id_t frame_id, [[maybe_unused]] AccessType access_type) {
std::lock_guard<std::mutex> lock(latch_);

// If frame id is invalid (ie. larger than replacer_size_), throw an exception.
if (frame_id >= static_cast<int32_t>(replacer_size_)) {
throw Exception(ExceptionType::OUT_OF_RANGE, "frame id is out of range");
}

++current_timestamp_;

// Check if the frame_id exists in node_store_.
auto it = node_store_.find(frame_id);
if (it == node_store_.end()) {
// If not found, initialize a new LRUKNode and insert it into node_store_.
LRUKNode new_node;
new_node.fid_ = frame_id;
new_node.history_.push_back(current_timestamp_);
new_node.k_ = k_; // Assuming k_ is a property you want to set during initialization.
node_store_.emplace(frame_id, std::move(new_node));
} else {
// If found, update the existing node.
auto &node = it->second;
if (node.history_.size() >= k_) {
node.history_.pop_front(); // Ensure only the last k timestamps are kept.
}
node.history_.push_back(current_timestamp_);
}
}
  1. fist check if the page that you are trying to access is actually legal. If not, raise an exception.
  2. If the LRU node is already in the Hash Map, just update the node history (evict the oldest if there is more than K accesses)
  3. if not found, init a new node element in the hash map.

SetEvictable is simple, you just need to check if the frame_id you are setting is legal and can actually be found in the hash map. Then you set evictablity based on the parameter. Finally, remember to update the current_size_, which represents the number of evictable pages.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
void LRUKReplacer::SetEvictable(frame_id_t frame_id, bool set_evictable) {
std::lock_guard<std::mutex> lock(latch_);

// If frame id is invalid (ie. larger than replacer_size_), throw an exception.
if (frame_id >= static_cast<int32_t>(replacer_size_)) {
throw Exception(ExceptionType::OUT_OF_RANGE, "frame id is out of range");
}

auto it = node_store_.find(frame_id);
if (it == node_store_.end()) {
throw Exception(ExceptionType::OUT_OF_RANGE, "Frame ID does not exist, cannot find the frame.");
}

auto &node = it->second;
if (node.is_evictable_ != set_evictable) {
node.is_evictable_ = set_evictable;
curr_size_ += set_evictable ? 1 : -1;
}
}

The last componenet of an LRU is a Remove function, which removes everything from the replacer related with a certain page.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
oid LRUKReplacer::Remove(frame_id_t frame_id) {
std::lock_guard<std::mutex> lock(latch_);

// If frame id is invalid (ie. larger than replacer_size_), throw an exception.
if (frame_id >= static_cast<int32_t>(replacer_size_)) {
throw Exception(ExceptionType::OUT_OF_RANGE, "frame id is out of range");
}

auto it = node_store_.find(frame_id);
if (it == node_store_.end()) {
return;
}

auto &node = it->second;
if (node.is_evictable_) {
curr_size_--;
}
node_store_.erase(it);
}
  1. check legal
  2. find the actual element
  3. update the hash map

Disk Scheduler and Manager

The disk scheduler can be used by other components (in this case, your BufferPoolManager in Task #3) to queue disk requests, represented by a DiskRequest struct (already defined in src/include/storage/disk/disk_scheduler.h). The disk scheduler will maintain a background worker thread which is responsible for processing scheduled requests.

The disk scheduler will utilize a shared queue to schedule and process the DiskRequests. One thread will add a request to the queue, and the disk scheduler’s background worker will process the queued requests. We have provided a Channel class in src/include/common/channel.h to facilitate the safe sharing of data between threads, but feel free to use your own implementation if you find it necessary.

This is actually quite easy, and I think the main challenge is to familarize yourself with the Promise and Future in C++ 17. (Cool stuff, but reminds me deeply of the trauma I had with JavaScript’s Promise and Future).

The main challenge is to implement a thread safe channel construct, which has been provided already. The rest is very straightforward, as long as you are familiar with the move semantics.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
void DiskScheduler::Schedule(DiskRequest r) { request_queue_.Put(std::make_optional(std::move(r))); }

void DiskScheduler::StartWorkerThread() {
while (true) {
auto request = request_queue_.Get();
if (!request.has_value()) {
break;
}
if (request->is_write_) {
disk_manager_->WritePage(request->page_id_, request->data_);
} else {
disk_manager_->ReadPage(request->page_id_, request->data_);
}
request->callback_.set_value(true);
}
}
1
2
3
4
struct DiskRequest {
// some other code
std::promise<bool> callback_;
};

Since the DiskRequest contains a Promise object, it cannot be copied, but only moved to the shared queue. After the thread has finished its job, it sets the flag in the callback Promise.

Buffer Pool Manager

This is the main challenge. It is intricate, and delicate, and very hard to debug. It took me several hours to finally get it right (and I an still not sure which part that I touched made things come to the right track).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
auto BufferPoolManager::NewPage(page_id_t *page_id) -> Page * {
std::lock_guard<std::mutex> lock(latch_);

frame_id_t frame_id = -1;

if (!free_list_.empty()) {
// get the page directly from the free list
frame_id = free_list_.front();
free_list_.pop_front();
} else if (replacer_->Evict(&frame_id)) {
Page &page = pages_[frame_id];
if (page.IsDirty()) {
FlushPage(page.GetPageId());
}
page_table_.erase(page.GetPageId());
} else {
return nullptr;
}

*page_id = AllocatePage();

Page &page = pages_[frame_id];
page.ResetMemory();
page.page_id_ = *page_id;
page.is_dirty_ = false;
page.pin_count_ = 1;

page_table_[*page_id] = frame_id;

replacer_->Remove(frame_id);
replacer_->RecordAccess(frame_id, AccessType::Unknown);
replacer_->SetEvictable(frame_id, false);
return &page;
}

This function allocates a new page in the buffer pool manager. The logic is that:

  1. If there is a page in the free list, allocate directly.
  2. otherwise we need to evict a oage from the LRU replacer (Do remember to detach the original evicted page, delete it from page table, and remove it from the replacer, otherwise the LRU-K policy will not be functioning correctly)
  3. don’t forget that if the page is dirty, we need to flush it to disk.
  4. initialize the new page with a pin count, a dirty flag, a new page id(unique to this sepcific buffer pool manager), and touch the page with the recordAccess.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
auto BufferPoolManager::FetchPage(page_id_t page_id, [[maybe_unused]] AccessType access_type) -> Page * {
std::lock_guard<std::mutex> lock(latch_);

auto page_table_it = page_table_.find(page_id);

if (page_table_it != page_table_.end()) {
// page is in the buffer pool
// printf("Fetching page first phase, page found in buffer pool\n");
frame_id_t frame_id = page_table_it->second;
Page &page = pages_[frame_id];
page.pin_count_++;
replacer_->RecordAccess(frame_id, access_type);
replacer_->SetEvictable(frame_id, false);
return &page;
}

printf("Fetching page first phase, page not found in buffer pool\n");

// page is not in the buffer pool
frame_id_t frame_id = -1;

if (!free_list_.empty()) {
// get the page directly from the free list
frame_id = free_list_.front();
free_list_.pop_front();
printf("get the page from the free list\n");
} else if (replacer_->Evict(&frame_id)) {
printf("evict the page from the replacer, frame id is %d\n", frame_id);
Page &page = pages_[frame_id];
if (page.IsDirty()) {
printf("the page evicted is dirty\n");
FlushPage(page.GetPageId());
}
page_table_.erase(page.GetPageId());

// printf("the page is getting del\n");
// DeletePage(page.GetPageId());
// printf("deletion done\n");
} else {
return nullptr;
}

// reset the metadata of the new page
Page &page = pages_[frame_id];
page.ResetMemory();
page.page_id_ = page_id;
page.pin_count_ = 1; // Pin the new page.
page.is_dirty_ = false;

// schedule a read request
DiskRequest read_request;
read_request.is_write_ = false;
read_request.data_ = page.data_;
read_request.page_id_ = page_id;

// create the Promise and get its future
auto promise = disk_scheduler_->CreatePromise();
auto future = promise.get_future();

// set the Promise to the callback
read_request.callback_ = std::move(promise);

// schedule the read request
disk_scheduler_->Schedule(std::move(read_request));

// wait for the read request to finish
future.wait();

page_table_[page_id] = frame_id;

printf("Fetching page last phase\n");

replacer_->Remove(frame_id);
replacer_->RecordAccess(frame_id, access_type);
replacer_->SetEvictable(frame_id, false);
return &page;

return nullptr;
}

FetchPage implementation shares exactly the same logic as the NewPage, the only difference being that after fetching a free page, we need to also fetch the page content from the disk, using the Promise and future. Here we only care about correctness, so wait until IO is finished.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
auto BufferPoolManager::UnpinPage(page_id_t page_id, bool is_dirty, [[maybe_unused]] AccessType access_type) -> bool {
std::lock_guard<std::mutex> lock(latch_);
frame_id_t frame_id = -1;

auto it = page_table_.find(page_id);
if (it == page_table_.end()) {
return false;
}

frame_id = it->second;

Page &page = pages_[frame_id];

if (page.pin_count_ <= 0) {
return false;
}

page.pin_count_--;

if (page.pin_count_ == 0) {
replacer_->SetEvictable(frame_id, true);
}

if (is_dirty) {
page.is_dirty_ = true;
// FlushPage(page_id);
}

return true;
}

The thing that I got wrong with this unpin funciton initially is that here we are not writing to disk, we are only unpinning. To achieve good performance, we want to delay IO as much as possible, instead of writing to the disk everyting we unpins.

Flush page is easy, just write to disk, regardless of the dirty bit. The dirty bit will be checked by upper layers. But we do need to clear the dirty flag after writing to disk.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
auto BufferPoolManager::FlushPage(page_id_t page_id) -> bool {
// std::lock_guard<std::mutex> lock(latch_);
auto it = page_table_.find(page_id);
if (it == page_table_.end()) {
return false;
}

frame_id_t frame_id = it->second;
Page &page = pages_[frame_id];

DiskRequest write_request;
write_request.is_write_ = true;
write_request.data_ = page.GetData();
write_request.page_id_ = page.GetPageId();

// create the Promise and get its future
auto promise = disk_scheduler_->CreatePromise();
auto future = promise.get_future();

// set the Promise to the callback
write_request.callback_ = std::move(promise);

// schedule the write request
disk_scheduler_->Schedule(std::move(write_request));

// wait for the write request to finish
future.wait();
page.is_dirty_ = false;

// page_table_.erase(it);
return true;
}

DeletePage

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
auto BufferPoolManager::DeletePage(page_id_t page_id) -> bool {
std::lock_guard<std::mutex> lock(latch_);

auto it = page_table_.find(page_id);
if (it == page_table_.end()) {
return true;
}

frame_id_t frame_id = it->second;
// printf("deletePage, the page id is %d, and the frame id is %d\n", page_id, frame_id);
Page &page = pages_[frame_id];

if (page.pin_count_ > 0) {
return false;
}

if (page.IsDirty()) {
FlushPage(page_id);
}

page.ResetMemory();
page.page_id_ = INVALID_PAGE_ID;
page.pin_count_ = 0;
page.is_dirty_ = false;
page_table_.erase(it);
// replacer_->SetEvictable(frame_id, true);
replacer_->Remove(frame_id);
free_list_.push_back(frame_id);
DeallocatePage(page_id);
return true;
}

DeletePage is also easy, just remember to

  1. not delete if the page is still pinned.
  2. flush the page if it is dirty.
  3. clear all page table and LRU replacer entry.
  4. add the page back to the freelist.

Ending Remark

It looks easy, but the logic can be complicated, a little. And it is very hard to debug. Nevertheless, implementing a buffer pool manager is important for me to understand how a disk backed dbms really worked internally. A good lesson to learn, I would say.

Author

Yuncheng Yao

Posted on

2023-12-18

Updated on

2024-01-08

Licensed under