Hash Tables
- data structure that allows you to access data efficiently
- in a perfect hash table, no matter how large the data set, any item can be accessed in O(1), constant time
- holds elements in a series of slots or buckets
- each slot has a unique identifying key
- usually have a number of slots equal to a prime number
- to put elements in a hash table efficiently, you perform a hash
- a hash takes a number or group of characters and performs an arithmetic or logical transformation to produce
a value
- modulo division is a very simple hashing technique
- example:
- have: 86, 6, 22, 38, 68
_____________________________ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | _____________________________ | | | | | | | | _____________________________
- to hash 86, perform % 7, where 7 is the number of digits in the hash table
- 86 % 7 = 2, so it is placed in slot 2
- 6 % 7 = 6, 22 % 7 = 1, 38 % 7 = 3, 68 % 7 = 5
_____________________________ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | _____________________________ | |22 |86 |38 | |68 |6 | _____________________________
- when want to find a number you don't have to perform a search, you just hash the number and that gives you it's location
- a hash takes a number or group of characters and performs an arithmetic or logical transformation to produce
a value
- collisions occur if more than one value should be in the same spot
- example: want to add 45 to the above example hash table
- 45 % 7 = 3, but 38 is already in 3
- to resolve the collision place 45 in the next empty spot --- 4
_____________________________ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | _____________________________ | |22 |86 |38 |45 |68 |6 | _____________________________
- this process causes extra complexity, but they are still one of the most efficient structures for storing large data sets
- example: want to add 45 to the above example hash table
- the goal of hash table construction is to use the correct number of slots and a hash function that produces the
fewest collisions
- making number of slots equal to prime numbers helps keep numbers from clumping around common factors
- Python's dictionaries use hash tables behind the scenes