Python Text Processing
- a string is like a list, each character is indexed
- to access a character, use
text_variable[index]
- strings know their length, use
len(string_name)
- can iterate through a string by index
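A minimal sketch of indexing, length, and iteration by index. The variable names (`word`, `pairs`) are illustrative assumptions, not part of the notes above.

```python
word = 'hello'  # hypothetical sample string

first = word[0]              # indexing starts at 0
last = word[len(word) - 1]   # the last valid index is len(word) - 1

# Iterate by index, collecting (index, character) pairs.
pairs = []
for i in range(len(word)):
    pairs.append((i, word[i]))

print(first, last, len(word))
print(pairs)
```

Asking for an index equal to len(word) or larger raises an IndexError, which is why the loop uses range(len(word)).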
- Examples of functions you can call on strings using
x = 'this is a Test '
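- 'must' know:
- split-->
x.split(' ')
['this', 'is', 'a', 'Test', '']
(the trailing space in x produces the final empty string; x.split() with no argument gives ['this', 'is', 'a', 'Test'])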
- upper-->
x.upper()
'THIS IS A TEST '
- lower-->
x.lower()
'this is a test '
- replace-->
x.replace('is', 'lol')
'thlol lol a Test '
- find-->
x.find('is')
2
- strip-->
x.strip()
'this is a Test'
- 'good' to know:
- startswith--> x.startswith('th')
True
- endswith--> x.endswith('end')
False
- title--> x.title()
'This Is A Test '
- isalpha--> x.isalpha()
False
- isdigit--> '521'.isdigit()
True
- isspace--> ' '.isspace()
True
(works with spaces, tabs, or newlines)
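A short sketch comparing the two forms of split and showing that string methods can be chained, since each call returns a new string. The variable names other than x are assumptions made for illustration.

```python
x = 'this is a Test '

by_space = x.split(' ')       # splits at every single space, so the trailing space yields ''
by_whitespace = x.split()     # no argument: splits on runs of whitespace, drops empty pieces
cleaned = x.strip().lower()   # each method returns a new string, so calls can be chained

print(by_space)
print(by_whitespace)
print(cleaned)
print(x == 'this is a Test ')  # x itself is unchanged by any of the calls
```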
- strings cannot be modified in place, so you usually build a new string instead
- example:
def just_number(text):
    only_number = ''  # build a new string rather than trying to delete from the existing one
    for ch in text:
        if ch.isdigit():
            only_number = only_number + ch
    return only_number

raw_string = 'My phone number is 6508675309. Please call!'
print(just_number(raw_string))
prints
6508675309
- characters are just a giant enumeration (an enumeration is a complete, ordered listing of all the items in a collection; the term is commonly used in mathematics and computer science for a listing of all the elements of a set)
- big look up table
- ASCII
- ASCII2
- Unicode (bigger ASCII)
- 'A' -> 'Z' are sequential
- 'a' -> 'z' are sequential
- '0' -> '9' are sequential
- ord(ch) gives us the number associated with the character
- functions and operators which take strings (same example x as above):
- len-->
len(x)
15
- ord-->
ord('A')
65
- hash-->
hash(x)
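2466759895439727657
(the exact value differs between runs, because Python randomizes string hashes per process by default)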
- < -->
'abc' < 'zabc'
True
- == -->
x == 'this is a Test '
True
- in-->
'his' in x
True
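Because letters and digits sit at consecutive positions in the table, arithmetic on ord() values is meaningful. A minimal sketch follows; the helpers digit_value and next_letter are hypothetical names, and chr() (the built-in inverse of ord()) is used here although the notes above do not mention it.

```python
def digit_value(ch):
    """Hypothetical helper: turn a digit character into its integer value."""
    return ord(ch) - ord('0')

def next_letter(ch):
    """Hypothetical helper: the character one position later in the table."""
    return chr(ord(ch) + 1)

print(ord('a'), ord('z'))    # 97 122
print(digit_value('7'))      # 7
print(next_letter('A'))      # B
print('Zebra' < 'apple')     # True: 'Z' (90) comes before 'a' (97)
```

The last line shows that < compares strings by these numbers, so every uppercase letter sorts before every lowercase letter.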
- Python strings are immutable
- once a string has been created you cannot set characters
- to change a string:
- create a new string holding the new value you want to give it via concatenation
- see earlier example with the function that only returned the numbers
- reassigning the string variable (that's allowed)
- example:
x = 'abc'
x[1] = 'z'  # TypeError: 'str' object does not support item assignment
x = 'azc'   # can reassign the string
- often build up new string through concatenation
- example:
def main():
    s1 = 'CS106'
    s2 = 'A'
    s3 = 'I got an ' + s2 + ' in ' + s1 + s2
    print(s3)

main()
prints
I got an A in CS106A
- important consequence: if you pass a string to a function, you are guaranteed your string won't be changed
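A small sketch of that guarantee. The function shout and the variable names are assumptions made up for illustration.

```python
def shout(text):
    """Hypothetical helper: return an upper-cased copy with '!' appended."""
    text = text.upper() + '!'   # rebinds the local name only; the caller's string is untouched
    return text

greeting = 'hello'
result = shout(greeting)

print(greeting)  # hello
print(result)    # HELLO!
```

Inside shout, the assignment makes the local name text point at a new string; the original 'hello' object is never altered.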
- many string algorithms use the 'loop and construct' method
- 3 examples that give the same result:
def reverse_string(text):
    result = ''
    for i in range(len(text)):
        result = text[i] + result
    return result

def reverse_string_v2(text):
    result = ''
    for ch in text:
        result = ch + result
    return result

def reverse_string_v3(text):
    '''
    This uses the slice operator in a special way. With no start,
    no end, and a delta of -1, slice reverses.
    '''
    return text[::-1]