Regular Expressions
- a sequence of characters that defines a search pattern
- many programming languages and operating systems support regular expressions
- can use regular expressions with `grep`
  - it takes a regular expression and the filepath of the file
  - prints the line/s where it finds a match
- Python has a built-in library (`re`) that lets you use regular expressions
  - example:

        import re

        line = "Beautiful is better than ugly."
        matches = re.findall("Beautiful", line)
        print(matches)

    prints `['Beautiful']`
  - can pass `re.IGNORECASE` as a third argument to the `findall` function to ignore case
  - can pass `re.MULTILINE` as the third argument to `findall` so that `^` and `$` match at the start and end of every line, not only at the start and end of the whole string
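A minimal sketch of the two flags, assuming a made-up two-line string (the variable names are only for illustration):

```python
import re

text = "Beautiful is better than ugly.\nbeautiful code is readable."

# Without a flag, matching is case-sensitive.
case_sensitive = re.findall("beautiful", text)

# re.IGNORECASE matches regardless of letter case.
any_case = re.findall("beautiful", text, re.IGNORECASE)

# Without re.MULTILINE, ^ anchors only at the start of the whole string.
first_only = re.findall("^beautiful", text, re.IGNORECASE)

# Flags combine with |; with re.MULTILINE, ^ anchors at the start of every line.
every_line = re.findall("^beautiful", text, re.IGNORECASE | re.MULTILINE)

print(case_sensitive)  # ['beautiful']
print(any_case)        # ['Beautiful', 'beautiful']
print(first_only)      # ['Beautiful']
print(every_line)      # ['Beautiful', 'beautiful']
```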
- `^pattern` - matches the pattern only if it's at the beginning of a line
- `pattern$` - matches the pattern only if it's at the end of a line
- can match any one of several characters by putting them in `[]` brackets --- `[abc]` will match a, b, or c
- `[[:digit:]]` - to find only the digits in a string --- `$ echo '123?34 hello?' | grep '[[:digit:]]'` would match the numbers in the line and not the other parts, but it would print the whole line because `grep` is a line search
- `\d` - to match all the digits in a string --- in Python, from `line = '123?34 hello?'` we would get `['1', '2', '3', '3', '4']`
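The anchors, bracket sets, and digit matching above can be tried in Python roughly as follows. This is only a sketch: the sample strings other than `'123?34 hello?'` are made up, and it uses `\d` throughout because Python's `re` module does not support POSIX classes like `[[:digit:]]`. The last step imitates grep's line search by keeping whole lines that contain a match.

```python
import re

line = "123?34 hello?"

# \d matches one digit at a time.
digits = re.findall(r"\d", line)

# A bracket set matches any single character listed inside it.
abc = re.findall(r"[abc]", "a big cab")

# ^ anchors at the beginning, $ at the end.
greeting = "hello world, hello"
at_start = re.search(r"^hello", greeting).start()
at_end = re.search(r"hello$", greeting).start()

# Imitating grep: keep every whole line that contains a digit.
lines = ["123?34 hello?", "no numbers here"]
matching_lines = [l for l in lines if re.search(r"\d", l)]

print(digits)          # ['1', '2', '3', '3', '4']
print(abc)             # ['a', 'b', 'c', 'a', 'b']
print(at_start)        # 0
print(at_end)          # 13
print(matching_lines)  # ['123?34 hello?']
```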
- `\` - can escape characters in regex --- `\$` so it doesn't interpret the `$` as meaning match only at the end of the line
- `*` - for repetition --- for string `'two twoo not too.'` the regex `two*` will match 'tw' followed by any number of o's (including none), so we would get `two` and `twoo`
  - `*` is greedy: it will try to match as much text as it can
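A short sketch of escaping and repetition in Python; the price string is an invented example, the `two*` string is the one from the notes:

```python
import re

# \$ matches a literal dollar sign instead of anchoring at the end of the line.
dollars = re.findall(r"\$", "cost: $5 or $10")

# o* means "zero or more o's", so two* needs "tw" and then takes every o it can.
repeats = re.findall(r"two*", "two twoo not too.")

# Zero o's is allowed as well, so "tw" on its own also matches.
zero_os = re.findall(r"two*", "tw")

print(dollars)  # ['$', '$']
print(repeats)  # ['two', 'twoo']
print(zero_os)  # ['tw']
```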
- `.` - matches any character (except a newline, by default) --- for string `'__hello__there'` the regex `__.*__` will match everything between two double underscores (including the underscores)
  - because `*` is greedy, in `'__hi__bye__I__!'` it would match everything between the first `__` and the last `__`
- `__.*?__` - non-greedy: will match as little text as possible --- grep's default regex syntax does not have non-greedy matching, but Python does --- so in Python this regex for `'__one__ __two__ __three__'` would match `'__one__'`, `'__two__'`, and `'__three__'` instead of the whole original string
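The difference between greedy and non-greedy matching can be checked with the strings from the notes; a minimal sketch:

```python
import re

# Greedy: .* runs from the first __ to the last __.
greedy = re.findall(r"__.*__", "__hi__bye__I__!")

text = "__one__ __two__ __three__"

# Greedy on this string swallows everything in one match.
greedy_all = re.findall(r"__.*__", text)

# Non-greedy: .*? stops at the nearest closing __.
lazy = re.findall(r"__.*?__", text)

print(greedy)      # ['__hi__bye__I__']
print(greedy_all)  # ['__one__ __two__ __three__']
print(lazy)        # ['__one__', '__two__', '__three__']
```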
- can use non-greedy matching to make Mad Libs games
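One possible way to build such a game is sketched below. The function name `fill_mad_lib`, the template, and the answers are all hypothetical, and the sketch assumes placeholders are written as `__WORD__`. It also uses a capturing group `( )` and `re.sub`, which these notes don't cover: the group picks out the text between the underscores, and `re.sub` swaps each match for the value the helper returns.

```python
import re

def fill_mad_lib(template, answers):
    """Replace each __PLACEHOLDER__ in template with answers[PLACEHOLDER]."""
    def replace(match):
        # group(1) is the text between the underscores, e.g. "NOUN".
        return answers[match.group(1)]
    # .*? is non-greedy, so each placeholder is matched separately.
    return re.sub(r"__(.*?)__", replace, template)

template = "The __ADJECTIVE__ fox jumped over the __NOUN__."
answers = {"ADJECTIVE": "sleepy", "NOUN": "keyboard"}
story = fill_mad_lib(template, answers)
print(story)  # The sleepy fox jumped over the keyboard.
```

In an interactive version the answers could come from `input()` instead of a fixed dictionary. With a greedy `.*` the pattern would treat everything from the first `__` to the last `__` as one placeholder, which is why the non-greedy form is needed here.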
- (see more about using `?` and `*` in the Wildcards section of the command_line notes)