Making a HashMap of LinkedLists

Posted in youfolio-interview-questions by Christopher R. Wirz on Thu Jul 19 2012


Note: This post actually went live after my NDA with YouFolio expired and has been modified after the original authorship date. I doubt these interview questions will still be used as the architecture has probably changed a lot.

YouFolio is an online portfolio (e-portfolio) company working with universities and students to showcase their skills and help them land their dream jobs by making their content search engine optimized and verifiable. As an early stage company, it has a strong focus on problem solving and therefore bases its interview questions on real-life YouFolio solutions. If an interviewee solves a problem more efficiently than YouFolio's current implementation, they are often invited to join the team.

Note: YouFolio has had a strong focus on proving concepts work and getting Search Engine Optimized (no database hits and minimal disk I/O for all SEO-d pages!) over algorithmic efficiency. For this reason, YouFolio is seeking talent that can help the company (and its code base) scale.

YouFolio's content database is growing at incredible rates. We store the content in uniformly distributed key-value pairs so it is easy to shard and horizontally scale across several similar servers. The number of servers is a known quantity, N, and the unique ID of each entry resolves to an integer.

We want to be able to interactiwith this data in an easy-to-use manner. Here is the API that the team has chosen. As a side note, it's written in C++ (which gets compiled into a PHP extension).



/**
 * A collection allows for Putting and Getting of data based on its key
 */
template<class T>
class Collection {
    public:
        /**
         * The default constructor 
         */
        Collection();
    
        /**
         * Adds or replaces a key value pair
         * @param key the key of the item to add
         * @param value the value of the item
         */
        void Put(int key, T value);
        
        /**
         * Gets a value based on a key
         * @param key the key
         * @returns the value or null/default
         */
        T Get(int key);
};

The goal is to implement this API in a manner that balances insert penalty and read speed. In our testing, for our size of data, we have found that a HashMap of linked lists works best. Here are the unit tests that should pass upon implementation...


int main()
{
    printf("Example:\n");

    Collection<int> c;
    c.Put(1,2);
    printf("c.Get(1) = %d (should be 2)\n", c.Get(1));
    printf("c.Get(2) = %d (should be 0)\n", c.Get(2));
    c.Put(1,3);
    printf("c.Get(1) = %d (should be 3)\n", c.Get(1));
    printf("c.Get(2) = %d (should be 0)\n", c.Get(2));
    c.Put(2,2);
    printf("c.Get(1) = %d (should be 3)\n", c.Get(1));
    printf("c.Get(2) = %d (should be 2)\n", c.Get(2));
    
    return 0;
}

While a linked list is O(n), a HashMap is a lookup with a time complexity of O(1). Therefore if a HashMap has N linked lists, the actual time complexity is O(n / N). Because N is a constant, it is still technically the same "Big O" complexity, but we know in practice that this approach is about N times faster.

This scales horizontally. Now it's time to implement the code. For high-speed mission critical operations YouFolio uses C/C++:


#include <stdio.h>

/**
 * An item stores data in Key Value pairs
 * and forms a Linked List
 */
template<class T>
struct LinkedItem {
    /**
     * The key
     */
    int key;
    
    /**
     * The value
     */
    T value;
    
    /**
     * Next item in the linked list
     */
    LinkedItem<T>* next = NULL;
    
    /**
     * The last item in the linked list
     */
    LinkedItem<T>* Last(){
        if (this->next == NULL){
            return this;
        }
        return next;
    }
    
    /**
     * The last item less than the key
     */
    LinkedItem<T>* Last(int key){
        if (this->next == NULL){
            return this;
        }
        if (this->next->key > key){
            return this;
        }
        return next;
    }
    
    /**
     * Finds an item based on its key (or returns NULL)
     */
    LinkedItem<T>* Find(int key){
        if (this->key == key){
            return this;
        }
        if (this->key > key){
            // Assuming the list is ordered
            return NULL;
        }
        if (this->next != NULL){
            return this->next->Find(key);
        }
        return NULL;
    }
};


/**
 * A collection holds several linked lists,
 * accessing them through a hash table
 * (N can be tuned for performance)
 */
template<class T, int N>
class Collection{
    public:
        /**
         * The collection of linked item collections
         */
        LinkedItem<T>  *LinkedLists[N];
        
        /**
         * The default constructor (sets list to a collection of NULL)
         */
        Collection(){
            for(int i = 0; i < N; i++){
                this->LinkedLists[i] = NULL;
            }
        }
    
        /**
         * Adds or replaces a key value pair
         * @param key the key
         * @param value the value
         */
        void Put(int key, T value){
            int index = key % N;

            if (LinkedLists[index] == NULL){
				// Add the new item
                LinkedItem<T>* item = new LinkedItem<T>();
                item->value = value;
                item->key = key;
                
                // If there are no items, this is the first
                LinkedLists[index] = item;
                return;
            }
            
            LinkedItem<T>* found = LinkedLists[index]->Find(index);
            if (found != NULL){
                found->value = value;
                return;
            }
            
            LinkedItem<T>* item = new LinkedItem<T>();
            item->value = value;
            item->key = key;
            
            LinkedItem<T>* last = LinkedLists[index]->Last(key);
            if (last->key > key){
                LinkedLists[index] = item;
                item->next = last;
                return;
            }
            item->next = last->next;
            last->next = item;
        }
        
        /**
         * Gets a value based on a key
         * @param key the key
         * @returns the value or null/default
         */
        T Get(int key){
            int index = key % N;
            if (LinkedLists[index] == NULL){
                // If there are no items, this is the 0
                return (T)NULL;
            }
            LinkedItem<T>* found = LinkedLists[index]->Find(key); 
            if (found == NULL){
                // If there are no items, this is the 0
                return (T)NULL;
            }
            return found->value;
        }
};

Time to invoke the code...


int main()
{
    printf("Alright, let's test this...\n");
    printf("First we make a new collection on the stack.\n");
    Collection<int, 15> c;
    printf("Now we add {1,2} using .Put(1,2)\n");
    c.Put(1,2);
    printf("Now we try c.Get(1) = %d\n", c.Get(1));
    printf("Now we try c.Get(2) = %d\n", c.Get(2));
    printf("Now we add {1,3} using .Put(1,3)\n");
    c.Put(1,3);
    printf("Now we try c.Get(1) = %d\n", c.Get(1));
    printf("Now we try c.Get(2) = %d\n", c.Get(2));
    printf("Now we add {2,2} using .Put(2,2)\n");
    c.Put(2,2);
    printf("Now we try c.Get(1) = %d\n", c.Get(1));
    printf("Now we try c.Get(2) = %d\n", c.Get(2));
    
    return 0;
}

If correct, we should be able to Put and Get the values as expected above.

First we make a new collection on the stack.
Now we add {1,2} using .Put(1,2)
Now we try c.Get(1) = 2
Now we try c.Get(2) = 0
Now we add {1,3} using .Put(1,3)
Now we try c.Get(1) = 3
Now we try c.Get(2) = 0
Now we add {2,2} using .Put(2,2)
Now we try c.Get(1) = 3
Now we try c.Get(2) = 2

Success! Want to try for a more efficient algorithm? Here's a link to get started.