Regular Expressions

  • a sequence of characters that defines a search pattern
  • many programming languages and operating systems support regular expressions
  • can use regular expressions with grep
    • it takes a regular expression and the path of the file to search
    • prints each line where it finds a match
  • Python has a built-in module (re) that lets you use regular expressions
    • example:
      import re
      
      line = "Beautiful is better than ugly."
      matches = re.findall("Beautiful", line)
      print(matches)
      # prints ['Beautiful']
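      • can pass re.IGNORECASE as the third argument (flags) to re.findall to make the search case-insensitive
      • can pass re.MULTILINE as the third argument so ^ and $ match at the start and end of every line, not only at the start and end of the whole string (findall already searches the whole string either way)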
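  • ^pattern - matches the pattern only if it's at the beginning of a line (in Python, only at the start of the whole string unless re.MULTILINE is set)
  • pattern$ - matches the pattern only if it's at the end of a line (the same re.MULTILINE rule applies in Python)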
  • [] brackets make a character class, which matches any ONE of the characters inside --- [abc] will match a, b, or c
  • [[:digit:]] - matches any digit (a POSIX class used by grep) --- with $ echo '123?34 hello?' | grep '[[:digit:]]' grep matches only the digits, not the other characters, but it prints the whole line because grep is a line search
  • \d - matches any single digit in Python (Python's re doesn't support [[:digit:]]) --- with line = '123?34 hello?', re.findall(r'\d', line) returns ['1', '2', '3', '3', '4']
  • \ - escapes a special character so it's matched literally --- \$ matches an actual $ sign instead of meaning "match only at the end of the line"
  • * - repeats the character just before it zero or more times --- in the string 'two twoo not too.', the regex two* matches 'tw' followed by any number of o's, so we get two and twoo (too doesn't match because it has no 'tw')
    • * is greedy, it will try to match as much text as it can
  • . - matches any single character except a newline --- in the string '__hello__there', the regex __.*__ matches a double underscore, then any characters, then another double underscore, so it finds '__hello__' (underscores included)
    • because * is greedy, in '__hi__bye__I__!' it matches everything from the first __ to the last __, giving '__hi__bye__I__'
    • __.*?__ - putting ? after * makes it non-greedy (lazy), so it matches as few characters as possible --- grep's basic and extended regex have no non-greedy matching, but Python does --- so in Python this regex on '__one__ __two__ __three__' matches '__one__', '__two__', and '__three__' separately instead of the whole string
  • can use non-greedy matching to make Mad Libs games
  • (see more about using ? and * in the Wildcards section of the command_line notes)
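
A minimal Python sketch of the grep behavior described in the notes. It only approximates what grep does and is not how grep is actually implemented. grep_lines and grep_file are hypothetical helper names. Like grep, the sketch tests each line with re.search and keeps the whole line when any part of it matches.

```python
import re

def grep_lines(pattern, text, flags=0):
    """Return every line of text that contains a match for pattern.

    Like grep, the whole line is kept, not just the part that matched.
    """
    regex = re.compile(pattern, flags)
    return [line for line in text.splitlines() if regex.search(line)]

def grep_file(pattern, path, flags=0):
    """Hypothetical file version: read the file at path and grep its lines."""
    with open(path, encoding="utf-8") as f:
        return grep_lines(pattern, f.read(), flags)

sample = "first line\n123?34 hello?\nno numbers here"
for line in grep_lines(r"\d", sample):
    print(line)  # prints: 123?34 hello?
```

Passing re.IGNORECASE as flags gives roughly what grep -i does.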
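
A quick sketch of the re.findall flags and the ^ and $ anchors, run on a made-up two-line string. It shows that re.IGNORECASE ignores case. It also shows that without re.MULTILINE, ^ and $ only match at the very start and end of the whole string.

```python
import re

text = "Beautiful is better than ugly.\nbeautiful is better than ugly."

# Case-sensitive by default: only the capitalized word matches.
print(re.findall("Beautiful", text))                  # ['Beautiful']
# re.IGNORECASE (the third argument, flags) ignores case.
print(re.findall("Beautiful", text, re.IGNORECASE))   # ['Beautiful', 'beautiful']

# Without re.MULTILINE, ^ and $ only match at the start/end of the whole string.
print(re.findall(r"^\w+", text))                      # ['Beautiful']
print(re.findall(r"ugly\.$", text))                   # ['ugly.']
# With re.MULTILINE, ^ and $ match at the start/end of every line.
print(re.findall(r"^\w+", text, re.MULTILINE))        # ['Beautiful', 'beautiful']
print(re.findall(r"ugly\.$", text, re.MULTILINE))     # ['ugly.', 'ugly.']

# Flags can be combined with |.
print(re.findall(r"^beautiful", text, re.IGNORECASE | re.MULTILINE))
# ['Beautiful', 'beautiful']
```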
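
A sketch comparing character classes and \d in Python. The sample strings are made up, apart from '123?34 hello?' from the notes. Each class matches a single character, so findall returns one item per character unless a + is added.

```python
import re

line = "123?34 hello?"

# [abc] matches ONE character that is a, b, or c.
print(re.findall("[abc]", "cab stack"))    # ['c', 'a', 'b', 'a', 'c']
# Adding + matches a run of one or more of those characters.
print(re.findall("[abc]+", "cab stack"))   # ['cab', 'ac']

# \d matches a single digit, so findall returns each digit on its own.
print(re.findall(r"\d", line))             # ['1', '2', '3', '3', '4']
# \d+ matches runs of consecutive digits instead.
print(re.findall(r"\d+", line))            # ['123', '34']
# [0-9] is a digit class that also works in grep.
print(re.findall("[0-9]+", line))          # ['123', '34']
```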
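
A sketch of escaping, using a made-up string with dollar amounts. re.escape() is part of Python's re module and adds a backslash before each special character in a literal string, which saves escaping by hand.

```python
import re

text = "Lunch was $12 and coffee was $4."

# Unescaped, $ is the end anchor, so nothing can follow it and this finds nothing.
print(re.findall(r"$\d+", text))    # []
# Escaped, \$ matches a literal dollar sign.
print(re.findall(r"\$\d+", text))   # ['$12', '$4']

# re.escape() adds the backslashes for you when searching for literal text.
literal = re.escape("$4.")
print(literal)                      # \$4\.
print(re.findall(literal, text))    # ['$4.']
```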
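
A sketch of repetition, the . wildcard, and greedy vs. non-greedy matching in Python, mostly using the example strings from the notes. The extra 'tw two twooo' string is made up to show that * also allows zero repeats.

```python
import re

# * repeats the character right before it zero or more times: 'tw' plus any number of o's.
print(re.findall("two*", "two twoo not too."))   # ['two', 'twoo']
print(re.findall("two*", "tw two twooo"))        # ['tw', 'two', 'twooo']

# . matches any character except a newline; .* is greedy and grabs as much as it can.
print(re.findall("__.*__", "__hello__there"))    # ['__hello__']
print(re.findall("__.*__", "__hi__bye__I__!"))   # ['__hi__bye__I__']

# A ? after * makes it non-greedy: each match stops at the first closing __.
print(re.findall("__.*?__", "__hi__bye__I__!"))  # ['__hi__', '__I__']
print(re.findall("__.*?__", "__one__ __two__ __three__"))
# ['__one__', '__two__', '__three__']
```

Note how the non-greedy version skips 'bye' in the second string: after '__hi__' closes, the next match has to start at the following __.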
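
One way the Mad Libs idea could look, as a minimal sketch. Assumptions: the template marks each blank as __word__, and fill_mad_lib is a hypothetical helper that takes the answers as a list instead of asking for them with input(). The non-greedy __.*?__ is what keeps each blank a separate match. A greedy __.*__ would swallow everything from the first blank to the last.

```python
import re

def fill_mad_lib(template, answers):
    """Replace each __placeholder__ in template with the next answer, in order.

    Hypothetical helper: the __word__ placeholder format is an assumption.
    """
    answer_iter = iter(answers)
    # Non-greedy .*? so each __...__ pair is its own match.
    return re.sub(r"__.*?__", lambda match: next(answer_iter), template)

template = "The __adjective__ __noun__ jumped over the __noun__."
print(re.findall(r"__.*?__", template))  # ['__adjective__', '__noun__', '__noun__']
print(fill_mad_lib(template, ["purple", "giraffe", "moon"]))
# The purple giraffe jumped over the moon.
```

For an interactive game, the answers could come from one input() call per placeholder found with re.findall. The answers list needs one entry per blank, otherwise next() raises StopIteration.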

Copyright © 2022