Introduction to HashMap
Principle
Its in essence a advanced array. It’s implementation is based on an array
- Key is unique and values can be duplicated
key -> hash function -> index -> value
If we can make sure hash function has time complexity of O(1), then we can achieve O(1) time complexity for search, insert and delete operations.
Hash Collision
Its possible that two different keys generate the same index, this is called hash collision.
- Two keys can generate the same index
- Two common ways to resolve collision:
- Chaining: use a linked list to store multiple values at the same index
- Open addressing: find the next available index
Naturally, to avoid hash collision(if we already have an awesome hash function, there are too many kv pairs in the HashMap), we want to have a large array size (resizing). But this will lead to memory waste. So we need to find a balance between size of array and number of kv pairs. This is called load factor.
\[\text{Load Factor} = \frac{\text{Number of Elements}}{\text{Size of Array}}\]Resizing happens when load factor exceeds a certain threshold.
In addition, the load factor for chaining can be infinite, but for open addressing, it must be less than 1.
Additional Notes
- Dont rely on iterative order of HashMap
- Key should be immutable(something that never change)
Implementation
Chaining
TODO
Open addressing
TODO
LinkedHashMap
HashMap is unordered, so what if we want to maintain the insertion order? We can use
LinkedHashMap, which is a combination of HashMap and Doubly Linked List(A Must).
ArrayHashMap
What if we want to add randomKey() function to HashMap, which returns a random key from the HashMap. We can use ArrayHashMap, which is a combination of HashMap and Dynamic Array.
It’s a bit subtle. The Dynamic array will be list of kv nodes. The HashMap will have key to index mapping.
To avoid the cost of O(n) when deleting an element from the dynamic array, we can swap the element to be deleted with the last element, and then pop the last element. This way, we only need to update the index of the swapped element in the HashMap.