Hashing is superefficient. It takes O(1) time to search, insert, and delete an element using hashing. An element can be found in O(log n) time in a well-balanced search tree. Is there a more efficient way to search for an element in a container? This chapter introduces a technique called hashing. You can use hashing to implement a map or a set to search, insert, and delete an element in O(1) time.
What Is Hashing?
Hashing uses a hashing function to map a key to an index. Before introducing hashing, let us review map, which is a data structure that is implemented using hashing. Recall that a map (introduced in Section) is a container object that stores entries. Each entry contains two parts: a key and a value. The key, also called a search key, is used to search for the corresponding value. For example, a dictionary can be stored in a map, in which the words are the keys and the definitions of the words are the values.
A map is also called a dictionary, a hash table, or an associative array.
The Java Collections Framework defines the java.util.Map interface for modeling maps. Three concrete implementations are java.util.HashMap, java.util.LinkedHashMap, and java.util.TreeMap. java.util.HashMap is implemented using hashing, java.util.LinkedHashMap using LinkedList, and java.util.TreeMap using red-black trees.
If you know the index of an element in the array, you can retrieve the element using the index in O(1) time. So does that mean we can store the values in an array and use the key as the index to find the value? The answer is yes—if you can map a key to an index. The array that stores the values is called a hash table. The function that maps a key to an index in the hash table is called a hash function. As shown in Figure below, a hash function obtains an index from a key and uses the index to retrieve the value for the key. Hashing is a technique that retrieves the value using the index obtained from the key without performing a search.
How do you design a hash function that produces an index from a key? Ideally, we would like to design a function that maps each search key to a different index in the hash table. Such a function is called a perfect hash function. However, it is difficult to find a perfect hash function. When two or more keys are mapped to the same hash value, we say that a collision has occurred. Although there are ways to deal with collisions, which are discussed later in this chapter, it is better to avoid collisions in the first place. Thus, you should design a fast and easy-to-compute hash function that minimizes collisions.