Python Text Processing
- a string is like a list, each character is indexed
  - text_variable[index] to access a character
- strings know their length: use len(string_name)
- can iterate through a string by index
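A minimal sketch of the three points above (indexing, length, and iterating by index). The variable `word` and its value are made up for illustration.

```python
word = 'Test'  # hypothetical sample value

first = word[0]               # indexing starts at 0
last = word[len(word) - 1]    # the last valid index is len(word) - 1

# Iterate through the string by index, collecting each character.
chars_by_index = []
for i in range(len(word)):
    chars_by_index.append(word[i])
```

Indexing past the end (for example `word[len(word)]`) raises an IndexError, so `range(len(word))` is the usual way to stay in bounds.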
- Examples of functions you can call on strings, using x = 'this is a Test ' (note the trailing space):
  - 'must' know:
    - split --> x.split(' ') returns ['this', 'is', 'a', 'Test', ''] (the trailing space produces a final empty string)
    - upper --> x.upper() returns 'THIS IS A TEST '
    - lower --> x.lower() returns 'this is a test '
    - replace --> x.replace('is', 'lol') returns 'thlol lol a Test '
    - find --> x.find('is') returns 2
    - strip --> x.strip() returns 'this is a Test'
  - 'good' to know:
    - startswith --> x.startswith('th') returns True
    - endswith --> x.endswith('end') returns False
    - title --> x.title() returns 'This Is A Test '
    - isalpha --> x.isalpha() returns False
    - isdigit --> '521'.isdigit() returns True
    - isspace --> ' '.isspace() returns True (works with spaces, tabs, or newlines)
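A short sketch of how a few of these methods combine in practice, using the same x. The helper name `normalize_words` is hypothetical. It also shows two details worth knowing: split() with no argument splits on any run of whitespace, and find() returns -1 when the substring is absent.

```python
x = 'this is a Test '

# Splitting on a single space keeps an empty string for the trailing space.
parts_on_space = x.split(' ')   # ['this', 'is', 'a', 'Test', '']

# Splitting with no argument splits on whitespace runs and drops empty pieces.
parts_default = x.split()       # ['this', 'is', 'a', 'Test']


def normalize_words(text):
    """Hypothetical helper: trim, lowercase, then split into words."""
    return text.strip().lower().split()


words = normalize_words(x)      # ['this', 'is', 'a', 'test']

# find() returns -1 rather than raising when nothing matches.
missing = x.find('zzz')
```

Each call returns a new value and leaves x itself untouched, which is why the calls can be chained.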
- strings are not good at being modified, so usually you create a new string to work with them
  - example:

        def just_number(text):
            only_number = ''  # build a new string rather than trying to delete from the existing one
            for ch in text:
                if ch.isdigit():
                    only_number = only_number + ch
            return only_number

        raw_string = 'My phone number is 6508675309. Please call!'
        print(just_number(raw_string))  # prints 6508675309
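The same digit filter can also be sketched with str.join and a generator expression, which builds the new string in one step. This is an alternative phrasing, not the version from the notes, and the name `just_number_join` is made up here.

```python
def just_number_join(text):
    """Keep only the digit characters of text, joined into a new string."""
    return ''.join(ch for ch in text if ch.isdigit())


raw_string = 'My phone number is 6508675309. Please call!'
digits = just_number_join(raw_string)
print(digits)
```

Both versions leave raw_string unchanged and return a brand-new string.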
- characters are just a giant enumeration (an enumeration is a complete, ordered listing of all the items in a collection; the term is commonly used in mathematics and computer science to refer to a listing of all of the elements of a set)
  - big lookup table
    - ASCII
    - ASCII2
    - Unicode (bigger ASCII)
  - 'A' -> 'Z' are sequential
  - 'a' -> 'z' are sequential
  - '0' -> '9' are sequential
  - ord(ch) gives us the number associated with the character
  - functions which take strings (same example x as above):
    - len --> len(x) returns 15
    - ord --> ord('A') returns 65
    - hash --> hash(x) returns an integer such as 2466759895439727657 (the exact value changes between runs, since Python randomizes string hashes per process by default)
    - < --> 'abc' < 'zabc' returns True
    - == --> x == 'this is a Test ' returns True
    - in --> 'his' in x returns True
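Because the letters and digits are sequential in the lookup table, simple arithmetic on ord() values works. The sketch below also uses the built-in chr(), the inverse of ord(), which the notes do not mention. The helper names `digit_value` and `next_letter` are hypothetical.

```python
def digit_value(ch):
    """Convert a digit character to its integer value using its table position."""
    return ord(ch) - ord('0')


def next_letter(ch):
    """Return the character one position after ch in the lookup table."""
    return chr(ord(ch) + 1)


seven = digit_value('7')        # 7
after_a = next_letter('A')      # 'B'

# Uppercase letters come before lowercase letters in the table,
# so comparisons between strings are case-sensitive.
upper_before_lower = 'Z' < 'a'  # True, since ord('Z') is 90 and ord('a') is 97
```

Note that next_letter('Z') does not wrap around to 'A'; it simply returns whatever character follows 'Z' in the table.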
- Python strings are immutable
  - once a string has been created you cannot set characters
  - to change a string:
    - create a new string holding the new value you want to give it via concatenation
      - see earlier example with the function that only returned the numbers
    - reassigning the string variable (that's allowed)
      - example:

            x = 'abc'
            x[1] = 'z'  # TypeError: 'str' object does not support item assignment
            x = 'azc'   # can reassign the string

    - often build up new string through concatenation
      - example:

            def main():
                s1 = 'CS106'
                s2 = 'A'
                s3 = 'I got an ' + s2 + ' in ' + s1 + s2
                print(s3)

            main()  # prints I got an A in CS106A
- important consequence: if you pass a string to a function, you are guaranteed your string won't be changed
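A small sketch of that consequence. The function `shout` is a made-up example: it rebinds its own local name to a new string, so the caller's string is untouched. The try/except shows the TypeError raised by item assignment.

```python
def shout(text):
    # This creates a new string and rebinds the local name only;
    # the caller's string object is not modified.
    text = text.upper() + '!'
    return text


original = 'abc'
result = shout(original)

# Attempting to set a character in place fails because strings are immutable.
try:
    original[1] = 'z'
except TypeError as err:
    error_message = str(err)
```

After this runs, original is still 'abc' and result holds the new string 'ABC!'.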
- many string algorithms use the 'loop and construct' method
  - 3 examples that give the same result:

        def reverse_string(text):
            result = ''
            for i in range(len(text)):
                result = text[i] + result  # prepend each character, by index
            return result

        def reverse_string_v2(text):
            result = ''
            for ch in text:
                result = ch + result  # prepend each character, by value
            return result

        def reverse_string_v3(text):
            '''
            This uses the slice operator in a special way.
            With no start, no end, and a delta of -1, slice reverses.
            '''
            return text[::-1]
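Another sketch of the 'loop and construct' pattern, this time filtering instead of reversing. The function `remove_vowels` is a hypothetical example, not from the notes: it starts with an empty result and appends only the characters it wants to keep.

```python
def remove_vowels(text):
    """Build a new string containing every character of text except vowels."""
    result = ''
    for ch in text:
        # Lowercase the character for the check so 'A' and 'a' are both dropped.
        if ch.lower() not in 'aeiou':
            result = result + ch
    return result


no_vowels = remove_vowels('Hello World')
print(no_vowels)  # prints Hll Wrld
```

The shape is the same as the reverse examples: an empty starting string, one pass over the input, and a new string returned at the end.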