Implementing Set Using Hashing

Paul Ngugi - Jul 27 - - Dev Community

A hash set can be implemented using a hash map. A set (introduced in the post) is a data structure that stores distinct values. The Java Collections Framework defines the java.util.Set interface for modeling sets. Three concrete implementations are java.util.HashSet, java.util.LinkedHashSet, and java.util.TreeSet. java.util.HashSet is implemented using hashing, java.util.LinkedHashSet using LinkedList, and java.util.TreeSet using red-black trees.

You can implement MyHashSet using the same approach as for implementing MyHashMap. The only difference is that key/value pairs are stored in the map, while elements are stored in the set.

We design our custom Set interface to mirror java.util.Set and name the interface MySet and a concrete class MyHashSet, as shown in Figure below.

Image description

The code below shows the MySet interface.

Image description

The code below implements MyHashSet using separate chaining.

package demo;
import java.util.LinkedList;

public class MyHashSet<E> implements MySet<E> {
    // Define the default hash-table size. Must be a power of 2
    private static int DEFAULT_INITIAL_CAPACITY = 4;

    // Define the maximum hash-table size. 1 << 30 is same as 2^30
    private static int MAXIMUM_CAPACITY = 1 << 30;

    // Current hash-table capacity. Capacity is a power of 2
    private int capacity;

    // Define default load factor
    private static float DEFAULT_MAX_LOAD_FACTOR = 0.75f;

    // Specify a load factor used in the hash table
    private float loadFactorThreshold;

    // The number of entries in the map
    private int size = 0;

    // Hash table is an array with each cell being a linked list
    private LinkedList<E>[] table;

    /** Construct a set with the default capacity and load factor */
    public MyHashSet() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_MAX_LOAD_FACTOR);
    }

    /** Construct a set with the specified initial capacity and
     * default load factor */
    public MyHashSet(int initialCapacity) {
        this(initialCapacity, DEFAULT_MAX_LOAD_FACTOR);
    }

    /** Construct a set with the specified initial capacity
     * and load factor */
    public MyHashSet(int initialCapacity, float loadFactorThreshold) {
        if(initialCapacity > MAXIMUM_CAPACITY)
            this.capacity = MAXIMUM_CAPACITY;
        else
            this.capacity = trimToPowerOf2(initialCapacity);

        this.loadFactorThreshold = loadFactorThreshold;
        table = new LinkedList[capacity];
    }

    @Override /** Remove all of the entries from this map */
    public void clear() {
        size = 0;
        removeElements();
    }

    @Override /** Return true if the element is in the set */
    public boolean contains(E e) {
        int bucketIndex = hash(e.hashCode());
        if(table[bucketIndex] != null) {
            LinkedList<E> bucket = table[bucketIndex];
            for(E element: bucket)
                if(element.equals(e))
                    return true;
        }

        return false;
    }

    @Override /** Add an element to the set */
    public boolean add(E e) {
        if(contains(e)) // Duplicate element not stored
            return false;

        if(size + 1 > capacity * loadFactorThreshold) {
            if(capacity == MAXIMUM_CAPACITY)
                throw new RuntimeException("Exceeding maximum capacity");

            rehash();
        }

        int bucketIndex = hash(e.hashCode());

        // Create a linked list for the bucket if not already created
        if(table[bucketIndex] == null) {
            table[bucketIndex] = new LinkedList<E>();
        }

        // Add e to hashTable[index]
        table[bucketIndex].add(e);

        size++; // Increase size

        return true;
    }

    @Override /** Remove the element from the set */
    public boolean remove(E e) {
        if(!contains(e))
            return false;

        int bucketIndex = hash(e.hashCode());

        // Create a linked list for the bucket if not already created
        if(table[bucketIndex] != null) {
            LinkedList<E> bucket = table[bucketIndex];
            for(E element: bucket)
                if(e.equals(element)) {
                    bucket.remove(element);
                    break;
                }
        }

        size--; // Decrease size

        return true;
    }

    @Override /** Return true if the set contain no elements */
    public boolean isEmpty() {
        return size == 0;
    }

    @Override /** Return the number of entries in this set */
    public int size() {
        return size;
    }

    @Override /** Return an iterator for the elements in this set */
    public java.util.Iterator<E> iterator() {
        return new MyHashSetIterator(this);
    }

    /** Inner class for iterator */
    private class MyHashSetIterator implements java.util.Iterator<E> {
        // Store the elements in a list
        private java.util.ArrayList<E> list;
        private int current = 0; // Point to the current element in list
        private MyHashSet<E> set;

        /** Create a list from the set */
        public MyHashSetIterator(MyHashSet<E> set) {
            this.set = set;
            list = setToList();
        }

        @Override /** Next element for traversing? */
        public boolean hasNext() {
            if(current < list.size())
                return true;

            return false;
        }

        @Override /** Get current element and move cursor to the next */
        public E next() {
            return list.get(current++);
        }

        /** Remove the current element and refresh the list */
        public void remove() {
            // Delete the current element from the hash set
            set.remove(list.get(current));
            list.remove(current); // Remove current element from the list
        }
    }

    /** Hash function */
    private int hash(int hashCode) {
        return supplementalHash(hashCode) & (capacity - 1);
    }

    /** Ensure the hashing is evenly distributed */
    private static int supplementalHash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    /** Return a power of 2 for initialCapacity */
    private int trimToPowerOf2(int initialCapacity) {
        int capacity = 1;
        while(capacity < initialCapacity) {
            capacity <<= 1; // Same as capacity *= 2. <= is more efficient
        }

        return capacity;
    }

    /** Remove all entries from each bucket */
    private void removeElements() {
        for(int i = 0; i < capacity; i++) {
            if(table[i] != null) {
                table[i].clear();
            }
        }
    }

    /** Rehash the map */
    private void rehash() {
        java.util.ArrayList<E> list = setToList(); // Copy to a list
        capacity <<= 1; // Same as capacity *= 2. <= is more efficient
        table = new LinkedList[capacity]; // Create a new hash table
        size = 0; // Reset size to 0

        for(E element: list) {
            add(element); // Add from the old table to the new table
        }
    }

    /**Copy elements in the hash set to an array list */
    private java.util.ArrayList<E> setToList() {
        java.util.ArrayList<E> list = new java.util.ArrayList<>();

        for(int i = 0; i < capacity; i++) {
            if(table[i] != null) {
                for(E e: table[i]) {
                    list.add(e);
                }
            }
        }

        return list;
    }

    @Override /** Return a string representation for this map */
    public String toString() {
        java.util.ArrayList<E> list = setToList();
        StringBuilder builder = new StringBuilder("[");

        // Add the elements except the last one to the string builder
        for(int i = 0; i < list.size() - 1; i++) {
            builder.append(list.get(i) + ", ");
        }

        // Add the last element in the list to the string builder
        if(list.size() == 0)
            builder.append("]");
        else
            builder.append(list.get(list.size() - 1) + "]");

        return builder.toString();
    }
}

Enter fullscreen mode Exit fullscreen mode

The MyHashSet class implements the MySet interface using separate chaining. Implementing MyHashSet is very similar to implementing MyHashMap except for the following differences:

  1. The elements are stored in the hash table for MyHashSet, but the entries (key/value pairs) are stored in the hash table for MyHashMap.
  2. MySet extends java.lang.Iterable and MyHashSet implements MySet and overrides iterator(). So the elements in MyHashSet are iterable.

Three constructors are provided to construct a set. You can construct a default set with the default capacity and load factor using the no-arg constructor (lines 26–28), a set with the specified capacity and a default load factor (lines 32–34), and a set with the specified capacity
and load factor (lines 38–46).

The clear method removes all elements from the set (lines 49–52). It invokes removeElements(), which clears all table cells (line 190). Each table cell is a linked list that stores the elements with the same hash code. The removeElements() method takes O(capacity) time.

The contains(element) method checks whether the specified element is in the set by examining whether the designated bucket contains the element (lines 55–65). This method takes O(1) time.

The add(element) method adds a new element into the set. The method first checks if the element is already in the set (line 69). If so, the method returns false. The method then checks whether the size exceeds the load-factor threshold (line 72). If so, the program invokes rehash() (line 76) to increase the capacity and store elements into the new larger hash table.

The rehash() method first copies all elements in a list (line 197), doubles the capacity (line 198), creates a new hash table (line 199), and resets the size to 0 (line 200). The method then copies the elements into the new larger hash table (lines 202–204). The rehash method takes O(capacity) time. If no rehash is performed, the add method takes O(1) time to add a new element.

The remove(element) method removes the specified element in the set (lines 95–114). This method takes O(1) time.

The size() method simply returns the number of elements in the set (lines 122–124). This method takes O(1) time.

The iterator() method returns an instance of java.util.Iterator. The MyHashSetIterator class implements java.util.Iterator to create a forward iterator. When a MyHashSetIterator is constructed, it copies all the elements in the set to a list (line 141). The variable current points to the element in the list. Initially, current is 0 (line 135), which points to the first element in the list. MyHashSetIterator implements the methods hasNext(), next(), and remove() in java.util.Iterator. Invoking hasNext() returns true if current < list.size(). Invoking next() returns the current element and moves current to point to the next element (line 153). Invoking remove() removes the current element in the iterator from the set.

The hash() method invokes the supplementalHash to ensure that the hashing is evenly distributed to produce an index for the hash table (lines 166–174). This method takes O(1) time.

Table below summarizes the time complexity of the methods in MyHashSet.

Image description

The code below gives a test program that uses MyHashSet.

Image description

The program creates a set using MyHashSet (line 7) and adds five elements to the set (lines 8–12). Line 8 adds Smith and line 12 adds Smith again. Since only nonduplicate elements are stored in the set, Smith appears in the set only once. The set actually has four elements. The program displays the elements (line 14), gets its size (line 15), checks whether the set contains a specified element (line 16), and removes an element (line 18). Since the elements in a set are iterable, a foreach loop is used to traverse all elements in the set (lines 20–21). Finally, the program clears the set (line 23) and displays an empty set (line 24).

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .