Hash Tables

  • data structure that allows you to access data efficiently
    • in a perfect hash table, no matter how large the data set, any item can be accessed in O(1), constant time
  • holds elements in a series of slots or buckets
  • each slot has a unique identifying key
  • usually have a number of slots equal to a prime number
  • to put elements in a hash table efficiently, you perform a hash
    • a hash takes a number or group of characters and performs an arithmetic or logical transformation to produce a value
      • modulo division is a very simple hashing technique
      • example:
        • have: 86, 6, 22, 38, 68
        _____________________________  
        | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
        _____________________________
        |   |   |   |   |   |   |   |
        _____________________________
        • to hash 86, perform % 7, where 7 is the number of digits in the hash table
          • 86 % 7 = 2, so it is placed in slot 2
        • 6 % 7 = 6, 22 % 7 = 1, 38 % 7 = 3, 68 % 7 = 5
        • _____________________________
          | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
          _____________________________
          |   |22 |86 |38 |   |68 |6  |
          _____________________________
        • when want to find a number you don't have to perform a search, you just hash the number and that gives you it's location
  • collisions occur if more than one value should be in the same spot
    • example: want to add 45 to the above example hash table
      • 45 % 7 = 3, but 38 is already in 3
      • to resolve the collision place 45 in the next empty spot --- 4
      •  _____________________________
         | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
         _____________________________
         |   |22 |86 |38 |45 |68 |6  |
         _____________________________
    • this process causes extra complexity, but they are still one of the most efficient structures for storing large data sets
  • the goal of hash table construction is to use the correct number of slots and a hash function that produces the fewest collisions
    • making number of slots equal to prime numbers helps keep numbers from clumping around common factors
  • Python's dictionaries use hash tables behind the scenes

Copyright © 2022