Regular Expressions
- a sequence of characters that define a search pattern
- many programming languages and operating systems support regular expressions
- can use regular expressions with `grep`
  - it takes a regular expression and the filepath of the file
  - prints the line/s where it finds a match
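As a rough illustration of the line-by-line behaviour described for `grep`, the sketch below mimics it with Python's `re` module. The function name `grep_lines` and the sample text are made up for this example; real `grep` reads from a file and has many more options.

```python
import re

def grep_lines(pattern, text):
    """Return every line of `text` that contains a match for `pattern`."""
    regex = re.compile(pattern)
    # Like grep, keep the whole line when any part of it matches.
    return [line for line in text.splitlines() if regex.search(line)]

# Hypothetical sample text standing in for the contents of a file.
sample = (
    "Beautiful is better than ugly.\n"
    "Explicit is better than implicit.\n"
    "Simple is better than complex."
)

for line in grep_lines("Simple", sample):
    print(line)
```

Only the third line contains `Simple`, so only that line is printed, and it is printed in full.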
- Python has a built-in library (`re`) that lets you use regular expressions
  - example:

    ```python
    import re

    line = "Beautiful is better than ugly."
    matches = re.findall("Beautiful", line)
    print(matches)
    ```

    prints `['Beautiful']`
  - can pass `re.IGNORECASE` as a third argument to the `findall` function to ignore case
  - pass `re.MULTILINE` as the third argument to `findall` so that `^` and `$` match at the start and end of every line, not just of the whole string
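A small sketch of how the two flags change what `findall` returns. The two-line sample string is made up for this example; `^` means "match only at the start", and flags are combined with `|`.

```python
import re

# Hypothetical two-line sample; the second line starts with a lowercase "b".
text = "Beautiful is better than ugly.\nbeautiful code is readable."

# Without a flag the search is case-sensitive.
case_sensitive = re.findall("Beautiful", text)

# re.IGNORECASE matches regardless of upper or lower case.
case_insensitive = re.findall("Beautiful", text, re.IGNORECASE)

# Without re.MULTILINE, ^ matches only at the start of the whole string.
start_of_string = re.findall("^beautiful", text, re.IGNORECASE)

# With re.MULTILINE, ^ also matches at the start of every line.
start_of_lines = re.findall("^beautiful", text, re.IGNORECASE | re.MULTILINE)

print(case_sensitive)    # ['Beautiful']
print(case_insensitive)  # ['Beautiful', 'beautiful']
print(start_of_string)   # ['Beautiful']
print(start_of_lines)    # ['Beautiful', 'beautiful']
```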
- `^pattern` - matches the pattern only if it's at the beginning of a line
- `pattern$` - matches the pattern only if it's at the end of a line
- can match any one of several characters by putting them in `[]` brackets --- `[abc]` will match a, b, or c
- `[[:digit:]]` - to find only the digits in a string --- with `$ echo '123?34 hello?' | grep '[[:digit:]]'` it would match the numbers in the line and not the other parts, but it would print the whole line because `grep` is a line search
- `\d` - to match all the digits in a string --- from `line = '123?34 hello?'` we would get `['1', '2', '3', '3', '4']`
- `\` - can escape characters in regex --- `\$` so it doesn't interpret the `$` as meaning match only at the end of the line
- `*` - for repetition --- for string `'two twoo not too.'` the regex `two*` will match 'tw' followed by any number of o's (including none), so we would get `two` and `twoo`
  - `*` is greedy, it will try to match as much text as it can
- `.` - matches any character (except a newline) --- for string `'__hello__there'` the regex `__.*__` will match any characters between two double underscores (including the underscores)
  - because `*` is greedy, in `'__hi__bye__I__!'` it would match everything between the first `__` and the last `__`
- `__.*?__` - non-greedy, matches as little text as possible --- grep's default syntax does not have non-greedy matching, but Python does --- so in Python this regex for `'__one__ __two__ __three__'` would match `'__one__'`, `'__two__'`, and `'__three__'` instead of the whole original string
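The patterns in these notes can be tried out with `re.findall`, as in the sketch below. It reuses the sample strings from the notes where possible; `"a big cab"` and `"cost: $5"` are made up for this example. Python's `re` does not support the `[[:digit:]]` class, so `\d` is used for digits instead.

```python
import re

line = "123?34 hello?"

# \d matches one digit at a time.
digits = re.findall(r"\d", line)

# [abc] matches a single a, b, or c.
letters = re.findall(r"[abc]", "a big cab")

# \$ matches a literal dollar sign instead of "end of line".
dollars = re.findall(r"\$", "cost: $5")

# ^ and $ anchor the match to the start and end.
at_start = re.findall(r"^123", line)
at_end = re.findall(r"hello\?$", line)

# o* means zero or more o's after "tw".
repeated = re.findall(r"two*", "two twoo not too.")

# Greedy .* grabs as much as it can; non-greedy .*? grabs as little as it can.
text = "__one__ __two__ __three__"
greedy = re.findall(r"__.*__", text)
non_greedy = re.findall(r"__.*?__", text)

print(digits)      # ['1', '2', '3', '3', '4']
print(letters)     # ['a', 'b', 'c', 'a', 'b']
print(dollars)     # ['$']
print(at_start)    # ['123']
print(at_end)      # ['hello?']
print(repeated)    # ['two', 'twoo']
print(greedy)      # ['__one__ __two__ __three__']
print(non_greedy)  # ['__one__', '__two__', '__three__']
```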
- can use non-greedy matching to make Mad Libs games
- (see more about using `?` and `*` in the Wildcards section of the command_line notes)
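One possible way to build the Mad Libs idea with non-greedy matching is sketched below. The function name `mad_lib`, the template, and the answers are all made up for this example, and the answers are passed in as a list rather than typed in by a player; it also assumes there is one answer for every placeholder.

```python
import re

def mad_lib(template, answers):
    """Replace each __PLACEHOLDER__ in `template` with the next answer."""
    words = iter(answers)
    # __.*?__ is non-greedy, so each placeholder is matched on its own
    # instead of everything from the first __ to the last __.
    return re.sub(r"__.*?__", lambda match: next(words), template)

# Hypothetical template and answers.
template = "The __ADJECTIVE__ cat sat on the __NOUN__."
story = mad_lib(template, ["sleepy", "mat"])
print(story)  # The sleepy cat sat on the mat.
```

With the greedy `__.*__` the whole span from `__ADJECTIVE__` to `__NOUN__` would be treated as a single placeholder, which is why the non-greedy form is needed here.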