June 10, 2024
Introduction to Python with Professor Sarah Hunter was a workshop taken at ICPSR in the summer of 2024. This course covers the absolute basics.
Google Colab is the main software used; it is the one used during class.
Optional: Spyder, an IDE with an interface similar to RStudio.
Optional: Jupyter Notebooks
If you want to work with data that is sensitive or private, do NOT upload it to any cloud service. In this case, use Spyder and download/work with the data locally.
If I want to download Python locally, talk to a TA.
It is one of the most popular programming languages.
Python is used a lot by data analysts.
Web scraping is very common in Python.
Accessing APIs with Python works well too.
command(object)
ex: print("Hello World")
There are different types of objects.
Objects are assigned with =
Objects are defined by type:
scalar (cannot be subdivided)
non-scalar (has an internal structure that can be accessed)
Can find type in Python with type()
We can convert different types to other types
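A minimal sketch of checking and converting types. The variable names and values here are made up for illustration.

```python
# Scalar objects: each holds a single value
count = 3          # int
price = 2.5        # float
is_open = True     # bool

print(type(count))    # <class 'int'>
print(type(price))    # <class 'float'>
print(type(is_open))  # <class 'bool'>

# Converting between types
as_float = float(count)    # 3 -> 3.0
as_int = int(price)        # 2.5 -> 2 (the decimal part is dropped)
as_text = str(count)       # 3 -> "3"
print(as_float, as_int, as_text)  # 3.0 2 3
```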
Expressions = objects + operators
What are operators?
Python knows order of operations
Example code chunk:
# this is an example of assigning an object and adding operators to the objects.
pi = 3.14159
radius = 5
area = pi * radius**2 # pi is an object * another object (radius) raised to the second power
print(area)
78.53975
We can “rebind”. This assigns a new value to the object of the same name. The object is getting a new value. So be careful when you rebind!
note: need to use the print command. In R you could just type the object and it would print. This is not the same in Python.
62.800000000000004
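A small sketch of rebinding, using made-up values (side and square_area are only for illustration). It shows why rebinding needs care: anything computed from the old value is not updated automatically.

```python
radius = 5
print(radius)   # 5

radius = 10     # rebinding: the name radius now refers to 10
print(radius)   # 10

side = 4
square_area = side ** 2   # 16, computed from the old value of side
side = 6                  # rebind side
print(square_area)        # still 16; it must be recomputed to use the new side
```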
END DAY 1!
To make coffee, I first grab a Keurig coffee cup and put it in my Keurig. Once I start the Keurig, I wait for it to heat up, and it begins filling the cup. Afterward, I put my sugar-free vanilla creamer in the mug. I then mix it up and it is ready to be served.
What is the point of this?
Think about how many different tasks you have to go through.
Learning Python is similar. It is a series of small tasks.
We need to spell out each of those tasks that we do without thinking.
They all need to be in our code.
Strings must be in quotes.
defined: text, letter, character, space, digits, etc.
Use triple quotes for multiple lines of strings.
greeting = "Hello! How are you"
who = 'Anastasia'
print(greeting)
print(greeting + who + '?') # this is concatenating. notice the output.
print(greeting + " " + who + '?') # this is one way to fix the spacing issue
# we could also add a space to the object.
Hello! How are you
Hello! How are youAnastasia?
Hello! How are you Anastasia?
Let’s try an example of a multi line string:
This is a string. It is
spanning multiple lines.
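One way to produce output like the two lines above is a triple-quoted string. This is a reconstruction for illustration (the name multi_line is made up), not necessarily the code used in class.

```python
# Triple quotes let one string span several lines
multi_line = """This is a string. It is
spanning multiple lines."""
print(multi_line)
```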
We can combine strings with integers.
n_apples = 3
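print("I ate", n_apples, "apples.") # this is NOT concatenation. print() receives three separate arguments and n_apples is still an integer.
print("I ate", str(n_apples), "apples.") # str() converts the int to a string, but print() still receives three separate arguments. True concatenation uses +, e.g. "I ate " + str(n_apples) + " apples."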
# now try to assign a new object
sentence=("I ate ", n_apples, "apples.")
print(sentence)
type(sentence) # notice the type is not a string. We will discuss tuples later.
I ate 3 apples.
I ate 3 apples.
('I ate ', 3, 'apples.')
tuple
Allows a user to input a response.
Example:
We can go further:
# this will give you your age. Pretty fun. Note this code will not run on this website.
birth_yr = input("Type in your birth year:")
print('You are ' + str(2024 - int(birth_yr)) + ' years old.') # input() always returns a string, so int() converts the birth year to an integer for the subtraction, and str() converts the resulting age back to a string so it can be concatenated.
Used to compare two variables to one another
Used for binary outcomes. True or False?
var1 < var2
var1 >= var2
var1 > var2
var1 <= var2
var1 == var2
var1 != var2
These will help once we start talking about the control flow of a program.
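A short sketch of the comparison operators in action. The values assigned to var1 and var2 are made up.

```python
var1 = 3
var2 = 7

print(var1 < var2)    # True
print(var1 >= var2)   # False
print(var1 == var2)   # False (== compares, a single = assigns)
print(var1 != var2)   # True

# The result is itself a Boolean object and can be stored
is_smaller = var1 < var2
print(type(is_smaller))  # <class 'bool'>
```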
Logical operators on Booleans
Three special words act as logical operators: not, or, and.
not a
a or b
a and b
Examples:
# Python's Boolean values are written True and False. TRUE is R syntax, and no import is needed.
# how did you commute?
bike=True
bus=False
print(bike or bus)
print(bike and bus)
True
False
Example: (four) spaces
if <condition>:
    <expression>
    <expression>
Spaces/whitespace matter in Python!
the expressions should (by convention) be indented by 4 spaces or a Tab
that’s how Python understands that those are the expressions to be run if the condition is True
once the indentation is removed, Python goes back to evaluating every line.
Let’s use the modulus boolean as an example:
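A minimal sketch of an if statement built on the modulus operator. The value of number is made up, and this is not necessarily the example shown in class.

```python
number = 10  # hypothetical value; change it and rerun

# number % 2 is the remainder after dividing by 2
if number % 2 == 0:
    print(number, "is even")   # only runs when the condition is True

is_even_number = number % 2 == 0
print(is_even_number)  # True for 10
```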
Longer example of control flow:
elif is short for else if
if condition 1 is true, evaluate expression 1
if condition 1 is not true but condition 2 is true, evaluate expression 2.
Last expression is evaluated only if all the other conditions are False.
Basically Python hits the first condition that returns as True.
Further example:
number=0 # change this number and notice how the output changes.
if number > 0:
    print("positive number")
elif number == 0:
    print("Zero")
else:
    print("Negative number")
print("This statement is always executed") # notice the white space
Zero
This statement is always executed
Beware the Nested Statements!
how do you know which else belongs to which if?
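The answer is indentation: an else pairs with the if that sits at the same indentation level. A small sketch with made-up names (number, label):

```python
number = -2  # hypothetical value
if number >= 0:
    if number == 0:
        label = "zero"
    else:            # same indentation as "if number == 0", so it pairs with that if
        label = "positive"
else:                # same indentation as "if number >= 0", so it pairs with that if
    label = "negative"
print(label)  # negative
```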
Examples:
# program to display numbers from 1 to 5
# initialize the variables
i=1
n=5
# while loop from i = 1 to 5
while i <= n:
    print(i)
    i=i+1 # see what happens when you take this line out of the loop (it's not good: the loop never ends)
1
2
3
4
5
number=700
# the loop below keeps adding 1 until the number is divisible by 13.
while not number %13==0: # notice the not operator
    print(number, "is not divisible by 13.")
    number=number+1
print(number, "is divisible by 13.")
700 is not divisible by 13.
701 is not divisible by 13.
702 is divisible by 13.
useful when the number of iterations is known
the same result can be achieved with a while loop, but a for loop is easier
every time through the loop, <variable> takes on a new value (iterating through <iterable>)
<iterable> is usually range(<some_num>)
can also be a list
range(start, stop, step)
by default, start = 0 and step = 1
only stop is required
it will start at 0 and loop until stop - 1.
Python starts counting at ZERO NOT at one!
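A brief sketch of for loops over range() and over a list. The names and values are made up for illustration.

```python
# range(stop): counts 0, 1, 2 and stops before 3
for i in range(3):
    print(i)

first_three = list(range(3))
print(first_three)            # [0, 1, 2]

# range(start, stop, step)
evens = list(range(2, 10, 2))
print(evens)                  # [2, 4, 6, 8]

# Iterating over a list directly
for fruit in ["apple", "pear"]:
    print(fruit)
```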
break exits the loop it is in
remaining expressions in that loop are not evaluated
in nested loops, only the innermost loop is exited
1 1
1 2
1 3
2 1
3 1
3 2
3 3
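One pair of nested loops that would produce output like the lines above. This is a reconstruction for illustration, not necessarily the code used in class; the pairs list is added only to record what gets printed.

```python
pairs = []
for i in range(1, 4):          # i = 1, 2, 3
    for j in range(1, 4):      # j = 1, 2, 3
        if i == 2 and j == 2:
            break              # leaves only the inner (j) loop
        print(i, j)
        pairs.append((i, j))
# The outer loop keeps going, so the i = 3 rows are still printed
```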
lists are one of four built-in data types for storing collections of data
the others are tuples, dictionaries, and sets.
used to store multiple items in a single variable.
large lists require more computer power.
lists are always written inside square brackets []
items in a list don’t need to be of the same type.
lists are ORDERED
lists can contain the same element more than once (duplicates are allowed).
in Python, “methods” are functions that belong to an object
they only work with that object
Some list methods and related tools include:
append - adds an element to the end.
insert - adds an element at the specified position
reverse - reverses the order of the list
sort - sorts the list in place; the type of the elements determines how they are sorted.
index - returns the index of the first element with the specified value
sorted - a built-in function (not a method) that returns a new sorted list and leaves the original unchanged
extend - adds the elements of a list (or any iterable) to the end of the current list
+ - adds lists together without modifying the original lists.
del - a statement (not a method) that removes an element from a list.
note that no re-assignment is necessary
once append() is run, the list is modified in memory.
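A short sketch of the list tools above on a made-up list. Notice which lines modify fruits in place and which return a new list.

```python
fruits = ["pear", "apple"]          # hypothetical list
fruits.append("cherry")             # ['pear', 'apple', 'cherry']
fruits.insert(1, "banana")          # ['pear', 'banana', 'apple', 'cherry']
print(fruits.index("apple"))        # 2

alphabetical = sorted(fruits)       # new list; fruits is unchanged
fruits.sort()                       # sorts fruits itself, in place
fruits.reverse()                    # ['pear', 'cherry', 'banana', 'apple']

more = fruits + ["kiwi"]            # new list; fruits is unchanged
fruits.extend(["fig", "date"])      # modifies fruits in place
del fruits[0]                       # removes 'pear'
print(fruits)   # ['cherry', 'banana', 'apple', 'fig', 'date']
print(more)     # ['pear', 'cherry', 'banana', 'apple', 'kiwi']
```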
avoid “.” (dots) in object names because the dot has a special meaning in Python (it accesses methods and attributes).
END DAY 2
Write a script that checks whether a number is even.
Lists can be sliced with the following syntax:
[start:stop:step]
start at start (default is zero)
stop one index before stop (default is the length of the list)
step specifies how many indices to jump.
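A quick sketch of slicing on a made-up list of letters:

```python
letters = ["a", "b", "c", "d", "e", "f"]   # hypothetical list

print(letters[1:4])    # ['b', 'c', 'd']  (indices 1, 2, 3)
print(letters[:3])     # ['a', 'b', 'c']  (start defaults to 0)
print(letters[3:])     # ['d', 'e', 'f']  (stop defaults to the length)
print(letters[::2])    # ['a', 'c', 'e']  (every second item)
print(letters[::-1])   # ['f', 'e', 'd', 'c', 'b', 'a']  (a negative step walks backward)
```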
ordered sequence of items
a type of object.
unlike lists, tuples are immutable
They are typically created with parentheses ()
Example:
used to conveniently swap variable values
used to return more than one value from a function, since it conveniently packages many values of different type into one object.
not super common TBH. Probably won’t use much. But they are just something to be aware of.
Tuples have two methods
count()
index()
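A small sketch of the tuple features listed above. All names and values are made up, and min_and_max is a hypothetical helper written only for this illustration.

```python
point = (3, 4, 3)            # hypothetical tuple
print(point.count(3))        # 2 (how many times 3 appears)
print(point.index(4))        # 1 (position of the first 4)

# Swapping two variables
a, b = 1, 2
a, b = b, a
print(a, b)                  # 2 1

# Returning more than one value from a function
def min_and_max(values):
    return min(values), max(values)   # packed into one tuple

lowest, highest = min_and_max([5, 2, 9])
print(lowest, highest)       # 2 9

# Tuples are immutable: item assignment raises a TypeError
try:
    point[0] = 10
except TypeError as err:
    print("cannot modify a tuple:", err)
```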
Sets do not order items
sets store unique elements - no duplicates
uses hashing to efficiently store and retrieve
great for quick lookups (checking whether an item is in a set takes little time)
sets created with curly {} braces
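A minimal sketch of creating and using a set, with made-up data:

```python
commutes = {"bike", "bus", "bike", "car"}   # the duplicate "bike" is dropped
print(len(commutes))          # 3
print("bus" in commutes)      # True (fast membership test)

commutes.add("train")
commutes.add("bus")           # already present, so nothing changes
print(len(commutes))          # 4

# Removing duplicates from a list by converting it to a set
unique_numbers = set([1, 2, 2, 3, 3, 3])
print(unique_numbers == {1, 2, 3})   # True
```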
Additional Set example:
Difference between sets and Lists:
Sets:
Lists:
Defined: text, letter, character, space, digits, etc.
created with single or double quotes (the opening and closing quotes must match)
strings can also be created with triple quotes.
startswith()
endswith()
capitalize() capitalizes the first character
title() capitalizes the first character in every word
upper() capitalizes everything
lower() converts string to all lowercase.
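A quick sketch of these string methods on a made-up string. Each method returns a new string and leaves the original unchanged.

```python
course = "intro to python"             # hypothetical string
print(course.startswith("intro"))      # True
print(course.endswith("R"))            # False
print(course.capitalize())             # Intro to python
print(course.title())                  # Intro To Python
print(course.upper())                  # INTRO TO PYTHON
print("LOUD".lower())                  # loud
print(course)                          # intro to python (the original is unchanged)
```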
Dictionaries are objects in Python that contain both key and value pairs:
Values
any type (mutable and immutable)
can be duplicates
can be lists, other dictionaries, any type
keys
must be unique
must be immutable type (int, float, string, tuple, bool)
values are looked up by key, not by position; since Python 3.7 a dictionary remembers the order in which keys were inserted, unlike a set.
{key: value, key: value, key: value…}
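indexing by key, e.g. d[key] (dictionaries have no .index() method)
.keys() - returns the keys
.values() - returns the values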
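A short sketch of building and reading a dictionary. The dictionary contents are made up for illustration.

```python
weights = {"Sedan": 1500, "SUV": 2000}   # hypothetical dictionary
print(weights["Sedan"])                  # 1500 (look up a value by its key)

weights["Bicycle"] = 7                   # add a new key:value pair
weights["SUV"] = 2100                    # keys are unique, so this overwrites

print(list(weights.keys()))              # ['Sedan', 'SUV', 'Bicycle']
print(list(weights.values()))            # [1500, 2100, 7]

for vehicle, weight in weights.items():  # loop over key:value pairs
    print(vehicle, weight)
```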
reusable pieces of code
functions are not run until they are called/invoked somewhere.
function characteristics:
has a name
has parameters
has a docstring (optional but recommended)
has a body
returns something
Saving bits of code to be used later.
“def” is the keyword used to define the function
name of function comes after “def”
then, in (), come the parameters/arguments
def is_even(i): # is_even is the name of the function. i is what we input for the function to evaluate.
    """
    Input: i is a positive integer
    Returns True if i is even, otherwise False
    """
    return i % 2 == 0
is_even(5) # call the function is_even, which checks whether there is a remainder after dividing by 2. If there is none, the number is even.
# returns a boolean (False or True) based on the input.
is_even(4)
True
the docstring, enclosed in """, provides info on how to use the function to the end user.
the docstring can be called with help()
Be cautious of the variable scope issue.
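A minimal sketch of the scope issue: a variable created inside a function does not exist outside it. The function add_tax and its 10% rate are made up for illustration.

```python
def add_tax(price):
    """Return price plus a hypothetical 10% tax."""
    total = price * 1.10      # total exists only inside the function
    return total

result = add_tax(100)
print(round(result, 2))       # 110.0

try:
    print(total)              # total was never defined outside add_tax
    total_is_visible = True
except NameError:
    print("total is not defined outside add_tax")
    total_is_visible = False

print(add_tax.__doc__)        # the docstring text that help(add_tax) displays
```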
return can only be used inside a function
there can be multiple return statements in a function
only one of them will be used each time the function is invoked
once return is hit, the function’s scope is exited and nothing else in the function is run
Write a function that tests if number is divisible by 6:
def divisible_check(x):
    if x % 6 == 0:
        return "this number is divisible by 6"
    elif x % 6 != 0:
        return "this number is not divisible by 6"
    else:
        return "undefined"
divisible_check(108) # change the number in the parentheses to test the output.
'this number is divisible by 6'
Write a function that creates a dictionary within the function. This function will take a sentence, assign each word as a key, and the value will correspond with the number of times that word appears in sentence.
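def word_freq(sentence):
    words_list=sentence.split() # split the sentence on whitespace
    freq={}
    for word in words_list:
        if word in freq: # word seen before: add one to its count
            freq[word] += 1
        else: # first time we see this word
            freq[word] = 1
    return freq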
quote = '''Let me tell you the story when the level 600 school gyatt walked
passed me, I was in class drinking my grimace rizz shake from ohio during my
rizzonomics class when all of the sudden this crazy ohio bing chilling gyatt got
sturdy, past my class. I was watching kai cenat hit the griddy on twitch.
This is when I let my rizz take over and I became the rizzard of oz. I screamed,
look at this bomboclat gyatt'''
word_freq(quote)
{'Let': 1,
'me': 1,
'tell': 1,
'you': 1,
'the': 5,
'story': 1,
'when': 3,
'level': 1,
'600': 1,
'school': 1,
'gyatt': 3,
'walked': 1,
'passed': 1,
'me,': 1,
'I': 5,
'was': 2,
'in': 1,
'class': 2,
'drinking': 1,
'my': 4,
'grimace': 1,
'rizz': 2,
'shake': 1,
'from': 1,
'ohio': 2,
'during': 1,
'rizzonomics': 1,
'all': 1,
'of': 2,
'sudden': 1,
'this': 2,
'crazy': 1,
'bing': 1,
'chilling': 1,
'got': 1,
'sturdy,': 1,
'past': 1,
'class.': 1,
'watching': 1,
'kai': 1,
'cenat': 1,
'hit': 1,
'griddy': 1,
'on': 1,
'twitch.': 1,
'This': 1,
'is': 1,
'let': 1,
'take': 1,
'over': 1,
'and': 1,
'became': 1,
'rizzard': 1,
'oz.': 1,
'screamed,': 1,
'look': 1,
'at': 1,
'bomboclat': 1}
Why do we use the lm() command in R?
why not just use the formula (X’X)^-1 X’y?
the lm command is a function.
Python modules are files (.py) that (mainly) contain function definitions
they allow us to organize, distribute code; to share and reuse others’ code.
keep code coherent and self-contained.
one can import modules or some functions from modules.
instead of below
we could create a module that contains this function:
Try this example instead:
Today's date: 2024-09-26
We are basically bringing in packages and incorporating the functions contained within them to use for our code.
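A small sketch of the two import styles using standard-library modules. This is an illustration, not necessarily the example used in class.

```python
import math                      # import a whole module
from datetime import date        # import one name from a module

print(math.sqrt(16))             # 4.0 (use the module name as a prefix)

today = date.today()             # no prefix needed for a name imported directly
print("Today's date:", today)    # prints the current date as YYYY-MM-DD
```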
shorthand code that can replace for/while loops and if/else statements
comprehensions provide simple syntax to achieve it in a single line.
can be used for lists, sets, and dictionaries
Overall: makes code shorter and easier to read
With for loop:
numbers = [1,2,3,4,5,6,7,8,9,10]
new_list=[]
for number in numbers:
    new_list.append(number)
print(new_list)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
with list comprehension:
numbers = [1,2,3,4,5,6,7,8,9,10]
new_list = [num for num in numbers] # this is the exact same thing as the loop above. Just more condensed
print(new_list)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
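Comprehensions become more useful once a transformation or a condition is added. A short sketch for lists, sets, and dictionaries, with made-up variable names:

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# List comprehension with a transformation and an if filter
even_squares = [num ** 2 for num in numbers if num % 2 == 0]
print(even_squares)      # [4, 16, 36, 64, 100]

# Set comprehension: curly braces, duplicates collapse
remainders = {num % 3 for num in numbers}
print(remainders == {0, 1, 2})   # True

# Dictionary comprehension: key: value pairs
cubes = {num: num ** 3 for num in numbers[:3]}
print(cubes)             # {1: 1, 2: 8, 3: 27}
```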
END DAY 3!
Write a function that, given a dictionary consisting of vehicles and their weights in kilograms, constructs a list of the names of vehicles with weight below 2000 kilograms. Use list comprehension to achieve this.
With list comprehension:
d={"Sedan": 1500, "SUV":2000, "Pickup": 2500, "Minivan":1600, "Van":2400, "Semi":13600, "Bicycle":7, "Motorcycle":110}
get_lighter_vehicles=[vehicle for vehicle in d if d[vehicle]<2000] # looping over a dictionary yields its keys (the vehicle names)
print(get_lighter_vehicles)
['Sedan', 'Minivan', 'Bicycle', 'Motorcycle']
Without list comprehension:
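One possible version with an ordinary for loop, wrapped in a function as the exercise asks. The limit parameter and the name lighter are assumptions added for illustration.

```python
def get_lighter_vehicles(vehicles, limit=2000):
    """Return the names of vehicles weighing less than limit kilograms."""
    lighter = []
    for name, weight in vehicles.items():
        if weight < limit:
            lighter.append(name)
    return lighter

d = {"Sedan": 1500, "SUV": 2000, "Pickup": 2500, "Minivan": 1600,
     "Van": 2400, "Semi": 13600, "Bicycle": 7, "Motorcycle": 110}
print(get_lighter_vehicles(d))   # ['Sedan', 'Minivan', 'Bicycle', 'Motorcycle']
```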
Let’s first talk about how the internet works.
Clients & Servers:
data (web pages) lives on servers
browsers, apps, etc. are clients
clients send requests to servers
servers serve the necessary files to users
To request data from these servers we use the “requests” library in Python
allows us to send requests to servers
need internet connection
Example:
import requests
r = requests.get('https://www.python.org/')
r.status_code
# you should get 200, which means the request succeeded
# any other code usually means something went wrong with the request.
200
If I were to run the following code:
This would print out the HTML code for the entire webpage. While this may seem scary, this is actually great! Because HTML is a markup language with a predictable structure, knowing just a little HTML lets me pick and choose which parts of the webpage I want. Below is some basic information about what HTML documents contain:
style information, including links to CSS files
JavaScript scripts and links to JavaScript files
HTML tags (just add “<>” around these: head, li, div, img, etc.)
classes, ids, toggle buttons, many more
navigation bar, side bar, footer.
How do we parse through all of this code? We use a parser.
a parser is software that recognizes the structure of an HTML document
allows the extraction of certain parts of the code
the “BeautifulSoup” library serves that purpose
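To show the idea of a parser without installing anything, the sketch below swaps in Python's built-in html.parser module instead of BeautifulSoup (which offers a more convenient interface for the same job). The page string and the ListItemParser class are made up for illustration.

```python
from html.parser import HTMLParser

# A tiny hard-coded page (hypothetical) so the example runs without internet
page = """
<html><body>
  <ul>
    <li>First item</li>
    <li>Second item</li>
  </ul>
  <div>Not a list item</div>
</body></html>
"""

class ListItemParser(HTMLParser):
    """Collects the text found inside <li> tags."""
    def __init__(self):
        super().__init__()
        self.inside_li = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.inside_li = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.inside_li = False

    def handle_data(self, data):
        if self.inside_li:
            self.items.append(data.strip())

parser = ListItemParser()
parser.feed(page)
print(parser.items)   # ['First item', 'Second item']
```

The parser recognizes the tags and lets us keep only the pieces we asked for (here, the list items) while ignoring the rest of the page.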
Application Programming Interfaces (APIs) provide structured data.
they allow for the building of applications
separate design from content
access the data directly
GET (get/retrieve data from server)
POST (add/create data on server)
PUT (update/replace data on server)
DELETE (delete data from server)
Many governmental agencies, newspapers, and common data sources have public APIs that can be accessed from R or Python
requests typically start with an endpoint defined by the host (server)
For example:
Wikipedia provides one endpoint
YouTube provides many endpoints, depending on what one is working with.
Format of parameters
?param1=value1&param2=value2&param3=value3…
parameters are how we define what we want from the API.
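A minimal sketch of how an endpoint and parameters combine into a request URL. The endpoint and parameter names are hypothetical placeholders, and no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameters, for illustration only
endpoint = "https://api.example.com/search"
params = {"param1": "value1", "param2": "value2", "param3": "value3"}

query_string = urlencode(params)      # joins key=value pairs with &
url = endpoint + "?" + query_string
print(url)
# https://api.example.com/search?param1=value1&param2=value2&param3=value3

# With the requests library, the same dictionary can be passed directly:
# r = requests.get(endpoint, params=params)
```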
Follow example in pdf documentation for class.
END DAY 4!
@online{neilon2024,
author = {Neilon, Stone},
title = {ICPSR - {Introduction} to {Python}},
date = {2024-06-10},
url = {https://stoneneilon.github.io/notes/ICPSR_Intro_to_Python/},
langid = {en}
}