# Hash Table

The hash table is perfect solution to store dictionary where each data is uniquely identified using a key. The dictionaries are implemented using arrays. The key are stored with the object itself and each data element has its own index value. Therefore, data is stored in the array as key-value pair.

The hash table uses a hash function which is a mathematical function to compute a hash value $H(k)$ which is the index L of the data element in the array. The hash function uses key $k_{i}$ to compute $H(k_{i}) = L_{i}$.

The hash table allows three types of operations.

1. Insertion of a new value.
2. Deletion of an existing value.
3. Search for a value in the array.

The hash table uses a numerical key or characters that are converted to numeric equivalent for computing the hash value $H_{k} = L$ where $L$ is the index of data element in the array.

Suppose $U$ is the universal set of all keys from where we use $k_{i}$ to map $L_{i}$ from an array of size $\lvert U \rvert$ It means every $k_{i}$ is mapped to every $L_{i} \in U$. Also, no two keys have same hash value.

This type assigning a unique value to each an every data element in array is called direct-addressing.

### Example

In this example, our universe U = { all English letters } from where we get our keys $k_{i}$. The size of the array is $A[0: L-1]$ where $\lvert U \rvert$ is equal to $N$, the size of the array.

Each letter is a key is converted to a number equivalent to its position in the English alphabets. For example, A = 1, B = 2, and so on.

The direct-addressing is fine as long as the size of the data elements are small in number. Therefore, using a unique key for each an every element is impractical and difficult to maintain. We can use a small subset of keys $K$ from universal set of keys $U$ where $K < U$.

Another problem with direct addressing is computer memory, we cannot store all the values of array because memory size is limited.

Finally, the direct addressing requires you to use the whole array regardless of whether you want to store anything or not. The empty spaces are wasted.

## Hash Function

If we are to use a small subset of keys such that $K = { k_{0}, k_{1} , . . . , k_{n-1}}$ maps all elements of array $A[0: N-1]$ where $N$ is the size of the array. Each array position has an index value $L_{i}$. Figure 1 – Hash function takes the key value and compute the index of the array

Hash function is a mathematical function such as a mod function which is a popular method of computing hash value. If range of the array is $A[0: N-1]$, for any key $k_{i}$,

\begin{aligned}
&H(k_{i}) = L_{i}\\
&H(k_{i}) = k_{i} \hspace{2 mm} MOD \hspace{2 mm} N
\end{aligned}

The MOD function takes a value as key and gives remainder as output. In this way, more than one key value or index will have same hash value. Figure 2 – The key is 24 and hash value is 4 which corresponds to index in the array.

In the figure above, we have key $k = 24$ and $N = 10$ which is size of the array. A MOD operation on key $24$ gives $4$ which is the location of the data element in the array.

Similarly, if $key = 34$, then hash value is $4$again. Therefore, we want a hash function which always gives same results for a key. The key values with same hash value are called synonyms.

## Synonyms

We can treat hash value or index in an array as a bucket holding slots for each synonym. So, a bucket $b$ can be can have $s[0: m-1]$ slots. Such type of representation requires a two-dimensional array where a row indicates buckets with slots.

The figure above shows that the first column of the two dimensional array is buckets and other columns are slots.

### Example

Let $K = { 20, 54, 32, 64, 21, 87, 42, 60, 77}$ be set of keys and size of array be $N = 10$. The hash function we are going to use is $H(k) = k MOD N$. Figure 4 – Each row is a bucket that has 3 slots to store data elements with same hash value.

Each of the row in above figure has 3 slots per bucket to store values that have same hash values. for example,

\begin{aligned}
&77\hspace{2 mm} MOD \hspace{2 mm}10 = 7\\
&87\hspace{2 mm} MOD \hspace{2 mm} 10 = 7
\end{aligned}

Since, two slot have same value, it will cause problem during hash table operations – insertion, deletion and search.

## Collision

When hash value is computed using hash function and if two keys $k_{i}$ and $k_{j}$ has same hash value then it cause a collision.

\begin{aligned}
&H(k_{i}) = H(k_{j})
\end{aligned}

Therefore, there are two important method of solving the problem of managing synonyms.

2. Chaining

In the future posts, we will talk about each of these methods in detail.  