Hunter's blog

Implementing a Key-Value Store

I've been interested in databases, specifically key-value stores, for a while now. After mulling over the idea, I thought that I understood the idea well enough to try implementing a simple key-value store. I've decided to write about my thought process while the memory is fresh.

I'll be using the Rust programming language, but any language should work just as well. The format of our database is simple: we will store the length of the key followed by the key, followed by the length of the value and the value. As an example, if we wanted to store "key":"value", it would look like this: 3key5value. This is simplified as we actually store the number as an unsigned 64-bit integer, so the length would have a lot of leading zeros.

So, the first step is to open the file we will be using. This is pretty straightforward:

pub struct Db {
    file: Mutex<File>,
}

impl Db {
    pub fn open(path: Path) -> Result<Self> {
        let file = File::open(path)?;
        Ok(Db {
            file: Mutex::new(file),
        })
    }
}
We store the file behind a Mutex to avoid dealing with concurrency issues.

Now that we've opened our database, we need to be able to read and write to it. For this we're going to make use of the byteorder crate, which makes it easy to write a u64 to a file. The same thing could be done with bitshifting, but that's easy to get wrong. We'll implement writing first:
impl Db {
    pub fn write(&mut self, key: &[u8], value: &[u8]) -> Result<()> {
        let key_len = key.len() as u64;
        let value_len = value.len() as u64;

        // Acquire the mutex
        let mut file = self.file.lock().unwrap();

        // Write the key length and the key
        file.write_u64::<LittleEndian>(key_len)?;
        file.write_all(key)?;

        // Write the value length and the value
        file.write_u64::<LittleEndian>(value_len)?;
        Ok(file.write_all(value)?)
    }
}
Not too much to talk about. We want to make sure that there is only one instance of a key in the database, so we'll need to check if the key exists and replace its value if it does. This is more complex so we'll add it later.

Now that we can write to the database, we can implement reading from it. This is a little more complex, but it still isn't too bad.
impl Db {
    pub fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>> {
        let key_len = key.len() as u64;
        // The current position in the file
        let mut pos = 0;
        let mut file = self.file.lock().unwrap();
        let file_len = file.metadata()?.len();
        // Make sure that we're reading from the start of the file
        file.seek(SeekFrom::Start(pos))?;

        while pos < file_len {
            // Read the length of the next key in the database
            let len = file.read_u64::<LittleEndian>()?;
            // We don't read the next key unless it has the same length as our key
            if len == key_len {
                // Read the next key
                let mut buf = vec![0; len as usize];
                file.read_exact(<mut buf)?;
                // Check if this is the key we're looking for
                if key == buf.as_slice() {
                    // Read the value's length from the file
                    let value_len = file.read_u64::<LittleEndian>()?;
                    // Read the value from the file
                    let mut value = vec![0; value_len as usize];
                    file.read_exact(&mut value)?;
                    return Ok(Some(value));
                }
            }

            // This key wasn't a match, advance our position to the next key

            // Advance to the location of this key's value
            pos += mem::size_of::<u64>() as u64 + len;
            file.seek(SeekFrom::Start(pos))?;

            // Read the value's length
            let value_len = file.read_u64::()?;
            pos += mem::size_of::<u64> as u64 + value_len;
            // Move the file pointer to the location of the next key
            file.seek(SeekFrom::Start(pos))?;
        }

        // No matching key was found
        Ok(None)
    }
}
That's quite a bit of code, but all that we're doing is iterating over the keys in our database until we find a match.

Now we can write a simple test to make sure it's working!
fn main() {
    let mut db = Db::open("test.db").unwrap();
    db.write("key", "value").unwrap();
    db.write("key2", "value2").unwrap();
    // Make sure that we get back what we put in
    assert_eq!(db.get("key").unwrap(), Some(b"value".to_vec()));
    assert_eq!(db.get("key2").unwrap(), Some(b"value2".to_vec()));
    assert_eq!(db.get("key3").unwrap(), None);
}
If we run this and then run hexdump -C test.db we can see that it looks like this:
........key.....
...value........
key2........value2
The dots are the length of the string that follows.

Now we can add key overwriting to our write function:
let mut pos = 0;
let file_len = file.metadata()?.len();
// Move to the start of the file
file.seek(SeekFrom::Start(pos))?;

while pos < file_len {
    let len = file.read_u64::()?;
    if len = key_len {
        let mut buf = vec![0; len as usize];
        file.read_exact(&mut buf)?;

        if key == buf.as_slice() {
            pos += mem::size_of::<u64>() as u64 + len;
            let old_value_len = file.read_u64::<LittleEndian>()?;
            pos += mem::size_of::<u64>() as u64 + old_value_len;
            // Read the file after this key-value pair
            let mut tail = vec![0; (file_len - pos) as usize];
            file.read_exact(&mut tail)?;
            pos -= mem::size_of::<u64>() as u64 + old_value_len;
            file.seek(SeekFrom::Start(pos))?;
            file.write_u64::<LittleEndian>(value_len)?;
            file.write_all(value)?;
            file.write_all(&tail)?;
            return Ok(());
        }
    }

    // Advance to the next key
    pos += mem::size_of::<u64>() as u64 + len;
    file.seek(SeekFrom::Start(pos))?;

    let value_len = file.read_u64::<LittleEndian>()?;
    pos += mem::size_of::<u64>() as u64 + value_len;
    file.seek(SeekFrom::Start(pos))?;
}
                
We add this right after the assignment to file. A lot of this is really similar to our get function. We could even split a good chunk of both functions into a helper function. But from here we have a working key-value store!

There are a lot of really simple improvements we could make. A simple improvement is to return the old value when we replace a value. I've already done this; you can look at my full version of this key-value store here. We could also implement Iterator, which would allow us to work over every key. For more advanced improvements, we might memory-map the file or use a data structure such as a BTree. Resources on implementing database engines seem hard to find but here are a few that I've found:
If you know of any other good resources, feel free to shoot me an email!