Skip to content
Home ยป Hash Table

Hash Table

    The hash table is perfect solution to store dictionary where each data is uniquely identified using a key. The dictionaries are implemented using arrays. The key are stored with the object itself and each data element has its own index value. Therefore, data is stored in the array as key-value pair.

    The hash table uses a hash function which is a mathematical function to compute a hash value H(k) which is the index L of the data element in the array. The hash function uses key k_{i} to compute  H(k_{i}) = L_{i}.

    The hash table allows three types of operations.

    1. Insertion of a new value.
    2. Deletion of an existing value.
    3. Search for a value in the array.

    Direct Addressing Scheme

    The hash table uses a numerical key or characters that are converted to numeric equivalent for computing the hash value H_{k} = L where L is the index of data element in the array.

    Suppose U is the universal set of all keys from where we use k_{i} to map L_{i} from an array of size \lvert U \rvert It means every k_{i} is mapped to every  L_{i} \in U. Also, no two keys have same hash value.

    This type assigning a unique value to each an every data element in array is called direct-addressing.

    Example

    In this example, our universe U = { all English letters } from where we get our keys k_{i}. The size of the array is A[0: L-1] where \lvert U \rvert is equal to N, the size of the array.

    Each letter is a key is converted to a number equivalent to its position in the English alphabets. For example, A = 1, B = 2, and so on.

    KeyHash FunctionHash Value
    AH(A)1
    BH(B)2
    CH(C)3
    DH(D)4
    TH(T)20
    VH(V)22
    XH(X)24
    YH(Y)25
    ZH(Z)26
    Table 1 English letters as key

    The direct-addressing is fine as long as the size of the data elements are small in number. Therefore, using a unique key for each an every element is impractical and difficult to maintain. We can use a small subset of keys K from universal set of keys U where K < U.

    Another problem with direct addressing is computer memory, we cannot store all the values of array because memory size is limited.

    Finally, the direct addressing requires you to use the whole array regardless of whether you want to store anything or not. The empty spaces are wasted.

    Hash Function

    If we are to use a small subset of keys such that K = { k_{0}, k_{1} , . . . , k_{n-1}} maps all elements of array A[0: N-1] where N is the size of the array. Each array position has an index value L_{i}.

    Figure 1 - Hash function takes the key value and compute the index of the array
    Figure 1 – Hash function takes the key value and compute the index of the array

    Hash function is a mathematical function such as a mod function which is a popular method of computing hash value. If range of the array is A[0: N-1], for any key k_{i},

    \begin{aligned}
    &H(k_{i}) = L_{i}\\
    &H(k_{i}) = k_{i} \hspace{2 mm} MOD \hspace{2 mm} N
    \end{aligned}

    The MOD function takes a value as key and gives remainder as output. In this way, more than one key value or index will have same hash value.

    Figure 2 - The key is 24 and hash value is 4 which corresponds to index in the array.
    Figure 2 – The key is 24 and hash value is 4 which corresponds to index in the array.

    In the figure above, we have key k = 24 and N = 10 which is size of the array. A MOD operation on key 24 gives 4 which is the location of the data element in the array.

    Similarly, if key = 34, then hash value is 4again. Therefore, we want a hash function which always gives same results for a key. The key values with same hash value are called synonyms.

    Synonyms

    We can treat hash value or index in an array as a bucket holding slots for each synonym. So, a bucket b can be can have s[0: m-1] slots. Such type of representation requires a two-dimensional array where a row indicates buckets with slots.

    Figure 3 - Each row is a  bucket with m number of slots
    Figure 3 – Each row is a bucket with m number of slots

    The figure above shows that the first column of the two dimensional array is buckets and other columns are slots.

    Example

    Let K = { 20, 54, 32, 64, 21, 87, 42, 60, 77} be set of keys and size of array be N = 10. The hash function we are going to use is H(k) = k MOD N.

    Figure 4 - Each row is a bucket that has 3 slots to store data elements with same hash value.
    Figure 4 – Each row is a bucket that has 3 slots to store data elements with same hash value.

    Each of the row in above figure has 3 slots per bucket to store values that have same hash values. for example,

    \begin{aligned}
    &77\hspace{2 mm} MOD \hspace{2 mm}10 = 7\\
    &87\hspace{2 mm} MOD \hspace{2 mm} 10 = 7
    \end{aligned}

    Since, two slot have same value, it will cause problem during hash table operations – insertion, deletion and search.

    Collision

    When hash value is computed using hash function and if two keys k_{i} and k_{j} has same hash value then it cause a collision.

    \begin{aligned}
    &H(k_{i}) = H(k_{j})
    \end{aligned}

    Therefore, there are two important method of solving the problem of managing synonyms.

    1. Linear open addressing
    2. Chaining

    In the future posts, we will talk about each of these methods in detail.