Week 2: Expanding on the Basics

Video 2.1: Compound Data Types

Video Outline

  1. Set Types
  2. Mapping Types

Dictionaries

Dictionaries are mutable mapping structures that are:

  • Collections of arbitrary objects that associate keys with values (key-value pairs)
  • Keys are required to be a hashable type (strings, tuples, integers, etc.)
  • Values are references to other objects of any type (numeric values, lists, dictionaries, etc.)
  • Dynamically resizable
  • Heterogeneous in nature: keys and values can be of different types
  • Ordered by insertion (guaranteed since Python 3.7), although entries are accessed by key, not by position
  • Implemented using a hash table for fast lookup of keys to values
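The properties above can be checked directly. Below is a minimal sketch; the dictionary `student` and its contents are made-up example data. It shows keys of several hashable types, values of several object types, the dictionary growing as entries are added, and iteration following insertion order (guaranteed since Python 3.7).

```python
# Hypothetical example data: keys of several hashable types,
# values of several object types
student = {'name': 'Ada', 42: [90, 85], (1, 2): {'room': 'A'}}

# Dictionaries grow dynamically as new keys are assigned
student['year'] = 2

# Iteration follows insertion order (guaranteed since Python 3.7)
print(list(student))  # ['name', 42, (1, 2), 'year']

# Lookup is by key: the tuple (1, 2) is hashed to find its value
print(student[(1, 2)]['room'])  # A
```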

Dictionary Literals

[Figure: Dictionary literals. Source: Learning Python, 2013]
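A few standard ways to build the same dictionary, as a sketch (the keys and values are illustrative only). The literal syntax, `dict(...)` with keyword arguments, `dict(zip(...))`, `dict` over a list of pairs, and `dict.fromkeys(...)` are all built into Python; this is not necessarily the exact set of forms shown in the book's table.

```python
# Literal syntax
d1 = {'name': 'Bob', 'age': 40}

# Keyword arguments to the dict constructor (keys must be valid identifiers)
d2 = dict(name='Bob', age=40)

# Pairing up a sequence of keys with a sequence of values
d3 = dict(zip(['name', 'age'], ['Bob', 40]))

# A list of (key, value) tuples
d4 = dict([('name', 'Bob'), ('age', 40)])

print(d1 == d2 == d3 == d4)  # True

# Every key mapped to the same starting value
counts = dict.fromkeys(['a', 'b', 'c'], 0)
print(counts)  # {'a': 0, 'b': 0, 'c': 0}

# An empty dictionary
empty = {}
```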

Dictionary Operations

Basic dictionary operations include:

  • Accessing a value with D[key] and modifying a key-value pair with D[key] = new_value.

  • Getting the number of entries (len), checking key membership (in), and creating a list of the keys (list).

In [ ]:
# Creating a dictionary literal 
test_averages={'test1':84,'test2':76,'test3':94}
In [ ]:
# Accessing a value 
print(test_averages['test1'])
In [ ]:
# Changing a key's value
test_averages['test3'] = 96
In [ ]:
# Now we see the update to the dictionary 
test_averages
In [ ]:
# 'in' checks whether a key is in the dictionary 
print('test2' in test_averages)

# 'Bob' is not a key in the test_averages dictionary
print('Bob' in test_averages)

# This is false because 96 is a value in the dictionary and not a key!
print(96 in test_averages)
In [ ]:
# Number of (key,value) entries in the dictionary 
len(test_averages)
In [ ]:
# Create a list of keys. Use the .keys method to retrieve all the keys 
# in the dictionary 
keys = list(test_averages.keys())
print(keys)
print('---')

# Create a list of values. Use the .values method to retrieve all the values 
# in the dictionary 
values = list(test_averages.values())
print(values)

Mutability of Dictionaries

Dictionaries are mutable, so you can change, expand, and shrink them in place without making new dictionaries.
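One way to confirm that these changes happen in place is to compare the object's id before and after. A minimal sketch, using a made-up dictionary named `pantry`:

```python
pantry = {'eggs': 3, 'spam': 2}
original_id = id(pantry)

pantry['ham'] = 1     # expand: add a new key
pantry['spam'] = 5    # change: overwrite an existing value
del pantry['eggs']    # shrink: remove a key

# Same object throughout: no new dictionary was created
print(id(pantry) == original_id)  # True
print(pantry)  # {'spam': 5, 'ham': 1}
```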

In [ ]:
# Let's define a new dictionary 
D = {'eggs': 3, 'spam':2, 'ham': 1}
D
In [ ]:
# Change an entry, we already saw this above
D['ham'] = ['grill','bake','fry']
D
In [ ]:
# Delete an entry in the dictionary 
# Syntax: del dictionary_name[key]
del D['eggs']
D
In [ ]:
# Adding a new entry uses the same syntax as updating an entry 
# We are going to add 'brunch' as a key with the value 'Bacon'
D['brunch'] = 'Bacon'
D
In [ ]:
### For loop idioms with dictionaries 
test_averages={'test1':84,'test2':76,'test3':94}

# For loop 'in' idiom returns all the keys in the dictionary 
for key in test_averages:
    print(key)
In [ ]:
# Use the .items() method to return the key-value pair entries 
for key,value in test_averages.items():
    fmt_str = f'Key = {key}, value = {value}'
    print(fmt_str)

Dictionary methods

[Figure: Dictionary methods. Source: Learning Python, 2013]

Documentation: https://docs.python.org/3/library/stdtypes.html#typesmapping
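A sketch of a few commonly used dictionary methods, all documented in the Python standard library reference (not necessarily the exact set listed in the table). The dictionary `prices` is made-up data.

```python
prices = {'eggs': 3, 'spam': 2}

# get: returns a default instead of raising KeyError for a missing key
print(prices.get('ham'))      # None
print(prices.get('ham', 0))   # 0

# pop: removes a key and returns its value
spam_price = prices.pop('spam')
print(spam_price, prices)     # 2 {'eggs': 3}

# update: merges another dictionary in place (existing keys are overwritten)
prices.update({'eggs': 4, 'toast': 1})
print(prices)                 # {'eggs': 4, 'toast': 1}

# setdefault: returns the value for a key, inserting the default if missing
prices.setdefault('jam', 5)
print(prices['jam'])          # 5
```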

Dictionary View Objects

The objects returned by dict.keys(), dict.values() and dict.items() are view objects:

  • They provide a dynamic view on the dictionary’s entries: when the dictionary changes, the view reflects these changes.

  • Because views do not copy the entries, creating one takes only a small, fixed amount of memory and processor time.
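To contrast a live view with a one-time copy, here is a minimal sketch (the `menu` dictionary is made-up data). The view picks up a key added later; the list built with `list(menu)` does not.

```python
menu = {'eggs': 2, 'spam': 500}

keys_view = menu.keys()      # live view over the dictionary's keys
keys_snapshot = list(menu)   # independent list copied right now

menu['bacon'] = 1

print(keys_view)       # dict_keys(['eggs', 'spam', 'bacon'])
print(keys_snapshot)   # ['eggs', 'spam']
print(len(keys_view))  # 3
```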

In [ ]:
dishes = {'eggs':2,'sausage':1,'bacon':1,'spam':500}
In [ ]:
# keys, values and items are view objects over the dishes dictionary
keys = dishes.keys() 
values = dishes.values() 
items  = dishes.items() 

print(keys)
print(values)
print(items)
In [ ]:
# View objects are dynamic and reflect dictionary changes 

# Lets delete the 'eggs' entry 
del dishes['eggs']

# Notice that all three views no longer include 'eggs' or its value 
print(keys)
print(values)
print(items)

Sets

Sets are mutable, unordered collections of unique (no duplicates), immutable values. Sets are also:

  • Created using either {...} or set(iterable)
  • Compatible with the len() function and the in operator, which return the number of items and test membership.
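A minimal sketch of duplicate removal, `len`, and `in` (the `tags` data is made up). It also shows one easy-to-miss detail: `{}` creates an empty dictionary, so an empty set must be written `set()`.

```python
# Duplicates are dropped when the set is built
tags = set(['python', 'code', 'python', 'data'])
print(len(tags))          # 3
print('code' in tags)     # True
print('java' in tags)     # False

# {} creates an empty dictionary, not a set; use set() instead
empty = set()
print(type({}), type(empty))  # <class 'dict'> <class 'set'>
```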
In [ ]:
# Defining a set using '{...}' syntax 
colors={'red','black','white','blue'}
colors 
In [ ]:
# Defining a set using the 'set(iterable)' syntax 
colors2=set(["red","blue","black","blue","blue"])

# Notice there are no duplicates: {'black', 'blue', 'red'} (display order may vary)
colors2

Set Theory Operations

Sets are fundamentally mathematical in nature and contain operations based on set theory. They allow the following operations:

  • Union (union() or |): A set containing all elements that are in either set (or both).

  • Difference (difference() or -): A set that consists of elements that are in one set but not the other; A - B keeps the elements of A that are not in B.

  • Intersection (intersection() or &): A set that consists of all elements that are in both sets.
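The three operations on two small sets, as a sketch. `sorted(...)` is used only so the printed output has a predictable order, since sets themselves are unordered. Note that difference depends on which operand comes first.

```python
A = set('abcde')
B = set('bdxyz')

print(sorted(A | B))  # ['a', 'b', 'c', 'd', 'e', 'x', 'y', 'z']
print(sorted(A & B))  # ['b', 'd']

# Difference depends on the order of the operands
print(sorted(A - B))  # ['a', 'c', 'e']
print(sorted(B - A))  # ['x', 'y', 'z']
```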

In [ ]:
# set('abcde') creates a set of single-character strings 'a','b','c','d','e',
# and set('bdxyz') creates another set of 'b','d','x','y','z'
A = set('abcde')
B = set('bdxyz')

print(A)
print("--")
print(B)
In [ ]:
# Union Operation 
new_set = A | B 
print(new_set)
print('---')
new_set = A.union(B) # Same operation as above but using method 
print(new_set)
In [ ]:
# Difference Operation 
new_set = A - B            # elements in A that are not in B
print(new_set)
print('---')
new_set = B.difference(A)  # note the order: this is B - A, not A - B
print(new_set)
In [ ]:
# Intersection Operation 
new_set = A & B 
print(new_set)
print('---')
new_set = A.intersection(B) # same operation as above but using method 
print(new_set)

Set Methods

The set object provides methods that support set changes, in-place unions, and deletions of items from the set.

Documentation: https://docs.python.org/3/library/stdtypes.html#set
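Two methods for deleting items behave differently when the item is missing: `remove` raises `KeyError`, while `discard` silently does nothing. A short sketch with made-up data:

```python
letters = {'a', 'b', 'c'}

letters.discard('z')   # no error: discard ignores missing items
letters.remove('a')    # removes 'a'

try:
    letters.remove('z')  # remove raises KeyError for a missing item
except KeyError as err:
    print('KeyError:', err)

print(sorted(letters))  # ['b', 'c']
```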

In [ ]:
letters={'a','b','c'} 
In [ ]:
# Add a new item to the set 
letters.add('d') 
letters
In [ ]:
# Merge: this is an in-place union 
letters.update(set(['x','y','a','b']))
letters 
In [ ]:
# Delete one item (raises KeyError if the item is not in the set) 
letters.remove('a') 
letters 
In [ ]:
# Simple iteration through a set using a for-loop 
letters = {'a','b','c'} 
for item in letters: 
    print(item)

Immutable Constraints on Sets

  • Sets can only contain immutable (a.k.a. “hashable”) object types.
  • Lists and dictionaries cannot be embedded in sets, but tuples can if you need to store compound values
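A sketch of the hashability rule using the built-in `hash()`: a tuple of immutable values hashes fine and can go in a set, a list cannot, and a tuple that *contains* a list is also unhashable. The names `point` and `points` are illustrative only.

```python
# Tuples of immutable values are hashable, so they can go in a set
point = (1, 2, 3)
print(hash(point) == hash((1, 2, 3)))  # True

points = {point, (4, 5, 6)}

# Lists are mutable and unhashable: adding one raises TypeError
try:
    points.add([7, 8, 9])
except TypeError as err:
    print('TypeError:', err)

# A tuple is only hashable if everything inside it is hashable
try:
    hash((1, [2]))
except TypeError:
    print('a tuple containing a list is unhashable')
```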
In [ ]:
S = {1.23}
In [ ]:
# Running this code cell raises a TypeError because lists are
# not hashable types 
S.add([1,2,3])
In [ ]:
# Running this code cell raises a TypeError because dictionaries are
# not hashable types 
S.add({'a':1})
In [ ]:
# The same rule applies to dictionary keys: they must be hashable, so
# a key cannot be a list or another dictionary (this raises a TypeError).
d = {'a': 1, 'b':2}
d[['a']] = 3
In [ ]:
# However, tuples work because they are immutable (and their items are hashable). 
S.add((1,2,3))
S
In [ ]:
# Sets themselves are mutable too, and so cannot be nested 
# in other sets directly (this raises a TypeError).
S.add(set((1,2,3)))
S
In [ ]:
# Use the built-in 'frozenset' function to create an immutable set, which
# can be embedded in other sets. It supports the same operations as a set,
# except the ones that modify it in place.
f_set = frozenset([1,2,3])
print(f_set) 

print("---")
S.add(f_set)
print(S)

Video 2.2: Python Objects & References

Video Outline

  1. Overview of Python objects
  2. Referencing and copying Python objects
  3. Different ways to assign names to objects

Python Objects, References & Assignments

All data in Python takes the form of objects: pieces of memory that hold a value and a set of associated operations.

  • For example, numeric objects (int, float, complex) come with their associated arithmetic operations (+, -, /).

Types Live With Objects, Not Variables

  • Variables in Python are referred to as names or identifiers
  • Name binding is the association of a name with an object (value)

    [Figure: Name binding. Source: Learning Python, 2013]

  • A name does not uniquely identify an object: several names can refer to the same object.
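A minimal sketch of both points: two names bound to one object, and one name that refers to objects of different types over time (the type belongs to the object, not the name). All names here are illustrative.

```python
a = [1, 2, 3]
b = a            # a second name bound to the same list object
print(a is b)    # True: two names, one object

x = 42           # x refers to an int object...
print(type(x))   # <class 'int'>
x = 'forty-two'  # ...now x refers to a str object; the name itself has no type
print(type(x))   # <class 'str'>
```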


[Figure: Types live with objects, not variables. Source: Learning Python, 2013]

Shared References

Setting a variable to a new value does not alter the original object, but rather causes the variable to reference an entirely different object.

In [ ]:
x = 10 
y = x 
x = 20 
y

Be careful when working with mutable objects and in-place changes:

In [ ]:
x = [1,2,3]
y = x 
x.append(4)
print(y)
print(x)

Identity

The built-in id(...) function returns the identity of an object: an integer that is guaranteed to be unique and constant for the lifetime of the object. (Two objects whose lifetimes do not overlap may have the same id.)

In [ ]:
x = "MPCS"
print(id(x)) # Unique integer value for the object referred to by x

In the CPython interpreter (i.e., the one we are using in this class), the id is the address of the memory location storing the object.

Objects having the same value can have different identities:

In [ ]:
fruit1 = ('Apples', 4)
fruit2 = ('Apples', 4)

print(f'Fruit1 id = {id(fruit1)}\nFruit2 id = {id(fruit2)}')

Equality vs. Identity

There are two different ways of testing for "equality":

  • Equality operator (==): returns True if two objects are equal (i.e., have the same value)
  • Identity operator (is): returns True if two objects have the same identity:
          a is b <==> id(a) == id(b) 
In [ ]:
a = [1, 2, 3]
b = [1, 2, 3]
a == b 
In [ ]:
print(id(a))
print(id(b))
print(a is b) # The id values are different 

Object Creation

Each time you generate a new value in your script by running an expression, Python creates a new object (i.e., a chunk of memory) to represent that value. -- Learning Python 2013

This is not entirely true: CPython caches and reuses some immutable objects to save time and memory:

In [ ]:
# CPython caches small integers (from -5 through 256)
a = 1000 
b = 1000 

# Usually False here: two separate integer objects with different ids.
# (In a script, the compiler may reuse a single constant, so it can print True.)
print(a is b)  

a = 100 
b = 100 

# However, small integer objects are cached by CPython, 
# so a and b refer to the same object 
print(a is b) 
In [ ]:
# CPython also reuses (interns) some short strings, such as
# identifier-like literals, so this usually evaluates to True
str1 = 'MPCS'
str2 = 'MPCS'
str1 is str2 

Copying Objects

If y = x does not make a copy, how does one get a copy?

  • The copy module provides functions for generating shallow and deep copies:
    • Shallow copy: constructs a new compound object and then inserts references into it of the objects found in the original.
    • Deep copy: constructs a new compound object and then, recursively, inserts copies into it from the objects found in the original object.
In [ ]:
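# Shallow Copy example 
import copy
x = [[1, 2], [3, 4]]
y = copy.copy(x)
print(x is y)        # False: y is a new outer list
print(y[0] is x[0])  # True: the inner lists are shared
In [ ]:
# Deep copy example 
z = copy.deepcopy(x)
z[0] is x[0]         # False: the inner lists were copied too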

Slicing a list (mylist[:]) also gives you a shallow copy.
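Because a slice is only a shallow copy, changes to the outer list stay separate but changes to a shared inner list show up in both. A minimal sketch with made-up data:

```python
original = [[1, 2], [3, 4]]
shallow = original[:]   # new outer list, same inner lists

shallow.append([5, 6])  # changes only the new outer list
shallow[0].append(99)   # changes an inner list shared by both

print(original)  # [[1, 2, 99], [3, 4]]
print(shallow)   # [[1, 2, 99], [3, 4], [5, 6]]
```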

Garbage Collection

Whenever a name is assigned to a new object, the space held by the prior object is reclaimed if it is not referenced by any other name or object. This automatic reclamation of objects' space is known as garbage collection -- Learning Python 2013

Behind the scenes, an object has two header fields:

  • Type designator: The object's type (a pointer to an object of type type)
  • Reference counter: Count of names/objects referencing the object

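In CPython, the interpreter reclaims an object's memory as soon as the object's reference count drops to zero. (A separate cycle-detecting garbage collector handles groups of objects that only reference each other.)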

  • Reclaimed memory is free to be used for future objects.
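The reference counter can be observed with `sys.getrefcount`, as a CPython-specific sketch. The absolute number it reports includes a temporary reference held by the call itself, so only the *change* in the count is meaningful here; the names `data` and `alias` are illustrative.

```python
import sys

data = [1, 2, 3]
before = sys.getrefcount(data)   # includes a temporary reference from the call

alias = data                     # a second name: the count goes up by one
after_alias = sys.getrefcount(data)

del alias                        # removing the name: the count goes back down
after_del = sys.getrefcount(data)

print(after_alias - before)  # 1
print(after_del == before)   # True
```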

Deleting Names

Use the del statement to explicitly remove a previously defined variable (name).

In [ ]:
x = ['Intel', 'AMD', 'Apple']
y = x
del x
y
In [ ]:
# Trying to access x will cause an error because you deleted the name.
x
In [ ]:
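# Deleting a name does not delete the object; it only decreases the
# reference count of the associated object. The list still exists
# because y refers to it. Below, del removes items from the list
# itself (an in-place change), which is different from deleting a name.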
del y[1:]
y

Multiple Assignment Summary

[Figure: Assignment statement forms. Source: Learning Python, 2013]
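The common forms of assignment statements can be sketched as follows (all values are made up). This is a sketch of standard Python syntax, not a reproduction of the book's table.

```python
# Tuple assignment (matched by position)
a, b = 1, 2

# Swapping values without a temporary variable
a, b = b, a
print(a, b)  # 2 1

# Sequence assignment works with any iterable of the right length
c, d, e = 'xyz'
print(c, d, e)  # x y z

# Extended unpacking: the starred name collects the rest into a list
first, *rest = [1, 2, 3, 4]
print(first, rest)  # 1 [2, 3, 4]

# Multiple-target assignment: both names refer to the same object
x = y = []
print(x is y)  # True

# Augmented assignment
count = 10
count += 5
print(count)  # 15
```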

In [ ]:
# Deep Nesting example 
((lst,num),y), letter  = [((['a','b'],3), 4), 'a'] 
print(lst)
print(num) 

Video 2.3: Functions (cont.)

Video Outline

  1. Different ways to define a function header
  2. Different ways to call a function with arguments

Functions (cont.)

A function is executed by calling it.

  • Call the function by writing its name followed by any arguments, enclosed in parentheses.
In [ ]:
import math 

def distance(pt1,pt2):
    x1,y1 = pt1 
    x2,y2 = pt2 
    return math.sqrt(math.pow(x2-x1,2) + math.pow(y2-y1,2))

distance((1,2),(3,4))

Argument Matching: Function Header

Python provides multiple ways to define arguments: you can pass arguments by name, give them default values, and use collectors for extra arguments.

By default, arguments are matched by position, from left to right, and you must pass exactly as many arguments as there are argument names (i.e., parameters) in the function header.

In [ ]:
# Example of Positionally required arguments 
def func(a,b,c):
    print(a,b,c) # prints: 1 Hey! Bye 

func(1, "Hey!", "Bye")

Python also supports default arguments: you give a parameter a value in the function header. Callers are not required to provide values for default arguments.

In [ ]:
# In this example, arguments b and c are default arguments. When 
# calling 'func', you are not required to provide values for them. If
# a value is not given, the argument is assigned its default value. 
def func(a, b=1, c='spam'):
    print(a,b,c)

func(33) # Calling func, a=33, b = 1, c='spam'
func(33,23) # Calling func, a = 33, b = 23, c = 'spam'

Argument Matching: Function Header

You must place default arguments after non-default arguments.

In [ ]:
# This causes a SyntaxError: 'b' has a default value but 'c' does not.
# All parameters without defaults must be placed before parameters
# with defaults. 
def func(a,b='spam',c):
    print(a,b,c)    

Argument Matching: Caller

The caller can provide arguments by position or by keyword:

  • Positional arguments: matched from left to right.

  • Keywords: matched by argument name (name=value)

In [ ]:
def func(a, b=1, c='spam'):
    print(a,b,c)
In [ ]:
# Calling the function 'func' with all positional arguments 
func(4,4,4) # a = 4, b = 4, c = 4 
In [ ]:
# Calling 'func' with only the first two positional arguments.
# 'c' is assigned its default value
func(4,4) # a = 4, b = 4, c = 'spam'
In [ ]:
# Calling 'func' with only one positional argument.
# 'b' and 'c' are assigned their default values
func(4) # a = 4, b = 1, c = 'spam'
In [ ]:
# This causes a TypeError because 'a' MUST be given a value: it is
# not a default argument, so 'func' must take in at least one value
func() 
In [ ]:
# You can also assign a value to an argument by using keyword argument
# syntax. 
func(a='Bob',b='Sally',c='Joe')
In [ ]:
# Does not have to be in the same order specified in the function 
# header 
func(b='Sally', c='Joe', a='Bob')
In [ ]:
# HOWEVER, you must provide a value for every argument that does NOT have
# a default, via a mixture of positional or keyword arguments. 
func(4, c='Ni')
In [ ]:
# This causes a TypeError since we did not give 'a' a value, and 'a'
# is not a default argument
func(b='Sally',c='Joe')

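Collecting Arbitrary Arguments

Python allows functions to collect arbitrarily many positional or keyword arguments:

  • The collecting parameter is preceded by one or two asterisks
  • *args collects extra positional arguments into a tuple; **kwargs collects extra keyword arguments into a dictionary.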
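Both collectors can appear in the same header. A minimal sketch with a hypothetical function `describe` that simply returns what it received:

```python
def describe(first, *args, **kwargs):
    # args is a tuple of extra positional arguments,
    # kwargs is a dict of extra keyword arguments
    return first, args, kwargs

print(describe(1, 2, 3, color='red', size=10))
# (1, (2, 3), {'color': 'red', 'size': 10})

print(describe(1))
# (1, (), {})
```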
In [ ]:
# The reduce_add function takes in any number of objects; 
# args is a tuple of the objects passed in. 
def reduce_add(*args):
    s = 0 
    for x in args:
        s += x 
    return s 

# args = (1.0, 2.0, 3.43)
print(reduce_add(1.0,2.0,3.43))

# args = (0, 0, 3.43)
print(reduce_add(0,0,3.43))

Collecting Positional Arguments

A *args collector can follow ordinary and default parameters. Positional arguments fill those parameters first; only the leftovers are collected into args.

In [ ]:
def reduce_add2(a,b=2, *args):
    # a and b are matched first; only the leftover
    # positional arguments (collected in args) are summed
    s = 0
    for x in args:
        s += x
    return s

# a = 0, b = 0, args = (3.43,), prints: 3.43
print(reduce_add2(0,0,3.43))

# a = 0, b = 3.43, args = (), prints: 0 (nothing left to sum)
print(reduce_add2(0,3.43))

# a = 1, b = 2, args = (1, 3.43), prints: 4.43 
print(reduce_add2(1,2,1,3.43)) 

Collecting Keyword Arguments

Using ** allows callers to pass key/value pairs as individual keywords. They are collected as a dictionary.

In [ ]:
def make_dict(arg="Bob", **kwargs):
    print(kwargs)
    for key, value in kwargs.items():
        print(f'{key} is {value}')

# arg = "Bob", 
# **kwargs = {'name': 'John', 'course': 'Python', 'age': 25}
make_dict(name='John', course='Python', age=25)

# arg = "Sally", 
# **kwargs = {'name': 'John', 'course': 'Python', 'age': 25}
make_dict(name = 'John', course='Python', age=25, arg='Sally')

# arg = 1, 
# **kwargs = {'name': 'John', 'course': 'Python', 'age': 25}
make_dict(1, name='John', course='Python', age=25)

Unpacking arguments

Callers can also use:

  • *iterable: to pass all objects in the iterable object as individual positional arguments

  • **dict: to pass all key/value pairs in dict as individual keyword arguments.

In [ ]:
def double(x, y, z):
    return 2*x, 2*y, 2*z

point = (3, -2, 7)

# x = 3, y = -2, z = 7
print(double(*point)) # prints: (6, -4, 14)
In [ ]:
point = [2,2,2]

# x = 2, y = 2, z = 2
print(double(*point)) # prints: (4, 4, 4)
In [ ]:
point = [2,2,2,44]
#print(double(*point)) # TypeError: too many positional arguments given
In [ ]:
def double(x, y, z):
    return 2*x, 2*y, 2*z

point = {'x': 0, 'y': 3, 'z': 4}
# x = 0, y = 3, z = 4
print(double(**point)) # prints: (0, 6, 8)
In [ ]:
point = {'t': 0, 'y': 3, 'z': 4}
#print(double(**point)) # TypeError: 't' is not a parameter of double (and x gets no value)
In [ ]:
point = {'x': 0, 'y': 3, 'z': 4, 'p': 3}
#print(double(**point)) # TypeError: 'p' is not a parameter of double

Argument Matching Summary

[Figure: Argument matching summary. Source: Learning Python, 2013]

Week 2 Discussion Session

Agenda

  1. Announcements
  2. Demo: Running Python files as Programs vs Modules
  3. Demo: Command-line Arguments
  4. File I/O
  5. Python Interpreter Explained
  6. Open Discussion

File I/O

Often when programming, you may need to read some data from a file. A common pattern for working with file data is the following:

  1. Read the contents of a file (or files) into the program (sometimes referred to as "loading into memory").

    • This means we read the contents of a file and place them in a data structure for further processing.
  2. Manipulate the data in some way.

  3. (Optional) Write the data back to disk.

Note: When working with large files, it may be more efficient to selectively access parts of the file (to avoid reading the whole dataset into memory); however, we will not worry about that case.

Reading from a File

The file uchicago-emails.txt contains the following:

bob@uchicago.edu 
philb@uchicago.edu 
sally@uchicago.edu 
rebeccaw@uchicago.edu 
joan@uchicago.edu

This is a text file with five lines. Files can also contain binary data, but we will not work with that format for now.

To access the contents of a file, we first need to open it:

In [1]:
f = open('uchicago-emails.txt')

f is a file object. Here are some common operations on file objects.

Reading the entire contents of the file

In [2]:
emails = f.read() 
In [12]:
emails
Out[12]:
'bob@uchicago.edu \nphilb@uchicago.edu \nsally@uchicago.edu \nrebeccaw@uchicago.edu \njoan@uchicago.edu'

As you can see, .read() returns the entire contents of the file as a single string.

In [5]:
print(emails) 
bob@uchicago.edu 
philb@uchicago.edu 
sally@uchicago.edu 
rebeccaw@uchicago.edu 
joan@uchicago.edu

When reading from a file, the file object keeps track of the current position. In this case, the position has already reached the end of the file (EOF), so calling read() again does not return the file's contents.

In [6]:
more_data = f.read()
In [8]:
print(more_data)

When you reach EOF, read() returns the empty string.
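To read the contents again without reopening the file, you can move the position back to the start with `seek(0)`. A self-contained sketch: since `uchicago-emails.txt` may not exist on your machine, it writes a throwaway file (a hypothetical `demo.txt` in a temporary directory) first.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as f:
    f.write('line one\nline two\n')

with open(path) as f:
    first = f.read()     # reads everything; the position is now at EOF
    second = f.read()    # nothing left: returns ''
    f.seek(0)            # move the position back to the start
    third = f.read()     # reads everything again

print(repr(second))      # ''
print(first == third)    # True
```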

Closing a File

You should close a file when you no longer need it by calling the close() method on the file object.

Note: Once you close the file, you can no longer read from or write to it.

In [9]:
f.close() 

Why is it important to close a file?

  1. It frees resources associated with the file
  2. When writing to a file, it ensures the actual contents get written to disk.
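A short sketch of what "no longer have access" means in practice: the file object's `closed` attribute becomes True, and any further I/O on it raises `ValueError`. The file `notes.txt` is a hypothetical throwaway file created in a temporary directory.

```python
import os
import tempfile

# Throwaway file for the demo
path = os.path.join(tempfile.mkdtemp(), 'notes.txt')
with open(path, 'w') as f:
    f.write('hello\n')

f = open(path)
f.close()
print(f.closed)  # True

try:
    f.read()     # any I/O on a closed file raises ValueError
except ValueError as err:
    print('ValueError:', err)
```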

You can also use the with statement to ensure that the file is closed once you're done with it.

Note: This is the more common way to work with a file in Python.

In [10]:
with open('uchicago-emails.txt') as f: 
    emails = f.read() 
    print(emails.split())
['bob@uchicago.edu', 'philb@uchicago.edu', 'sally@uchicago.edu', 'rebeccaw@uchicago.edu', 'joan@uchicago.edu']

At the end of the with block, file f is automatically closed.

Reading lines within a File

A file object is iterable, so you can use a for loop to iterate over its lines:

In [11]:
with open('uchicago-emails.txt') as f:
    for line in f: 
        print(line)
bob@uchicago.edu 

philb@uchicago.edu 

sally@uchicago.edu 

rebeccaw@uchicago.edu 

joan@uchicago.edu

Why the extra empty line?

  • Each line includes a newline at the end, and print() adds a newline as well.

When reading lines from a file, we will usually want to use the strip() method to remove any leading and trailing whitespace (including newlines):

In [13]:
with open("uchicago-emails.txt") as f:
    for line in f:
        print(line.strip())
bob@uchicago.edu
philb@uchicago.edu
sally@uchicago.edu
rebeccaw@uchicago.edu
joan@uchicago.edu

Loading data into a data structure

In [14]:
emails = []
with open("uchicago-emails.txt") as f:
    for line in f:
        email = line.strip()
        emails.append(email)
In [15]:
emails
Out[15]:
['bob@uchicago.edu',
 'philb@uchicago.edu',
 'sally@uchicago.edu',
 'rebeccaw@uchicago.edu',
 'joan@uchicago.edu']

Writing data to a file

You can use the same open(...) function to write to a file by passing an additional argument (the mode) with the string 'w'.

In [16]:
with open("names.txt", "w") as f:
    f.write("Bob\n")
    f.write("Phil\n")
    f.write("Sally\n")
    f.write("Rebecca\n")
    f.write("Joan\n")    

Very important: If you open an existing file in write mode, it will wipe all its contents! If you open a file that doesn't exist, it will create the file.

You can use open('a_file.txt', 'a') to append to a file instead.

In [17]:
with open("names.txt", 'a') as f:
    f.write("Tim\n")   
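The difference between the 'w' and 'a' modes in one self-contained sketch. It uses a throwaway `names.txt` in a temporary directory (not the course's file): opening with 'w' a second time erases the earlier contents, while 'a' keeps them and writes at the end.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'names.txt')  # throwaway file

with open(path, 'w') as f:   # creates the file
    f.write('Bob\n')

with open(path, 'w') as f:   # opening again with 'w' erases 'Bob'
    f.write('Phil\n')

with open(path, 'a') as f:   # 'a' keeps existing contents and writes at the end
    f.write('Tim\n')

with open(path) as f:
    print(f.read().splitlines())  # ['Phil', 'Tim']
```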

Tip: print(..., file=f) writes to the file and adds the newline for you.

In [18]:
with open("names2.txt", "w") as f:
    print("Bob", file=f)
    print("Sally", file=f)
    print("Joe", file=f)

Reading, Manipulating, and Writing Data

In [19]:
# Load data into a data structure (a list of strings)
emails = []
with open("uchicago-emails.txt") as f:
    for line in f:
        email = line.strip()
        emails.append(email)
        
# Transform the data
cnetids = []
for email in emails:
    cnetid, domain = email.split("@")
    cnetids.append(cnetid)
    
# Write the data
with open("uchicago-cnetids.txt", "w") as f:
    for cnetid in cnetids:
        print(cnetid, file=f)

Other Useful File Methods

[Figure: Other useful file methods. Source: Learning Python, 2013]
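A sketch of a few other standard file-object methods (not necessarily the exact set shown in the table): `writelines`, `readline`, and `readlines`. It writes a throwaway `emails.txt` to a temporary directory so it does not depend on the course files.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'emails.txt')  # throwaway file

with open(path, 'w') as f:
    # writelines writes each string as-is; it does not add newlines
    f.writelines(['bob@uchicago.edu\n', 'sally@uchicago.edu\n', 'joan@uchicago.edu\n'])

with open(path) as f:
    first = f.readline()     # one line, including its trailing newline
    rest = f.readlines()     # the remaining lines as a list of strings

print(repr(first))  # 'bob@uchicago.edu\n'
print(rest)         # ['sally@uchicago.edu\n', 'joan@uchicago.edu\n']
```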

Interpreter Explained

What’s happening within the interpreter?

  1. The interpreter compiles source code to byte code

    • Source code: the Python code placed in .py files
    • Byte code: a lower-level, platform-independent representation of the source code
    • To speed up later runs, the compiled byte code of imported modules is cached in byte-code files (.pyc) stored in the __pycache__ directory.
  2. Next, the interpreter routes the byte code to the Python Virtual Machine (PVM)

    • The PVM is a loop that iterates through the byte code instructions, executing each one
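Both steps can be observed from Python itself, as a sketch: a function's compiled byte code lives on its code object, the standard `dis` module prints the instructions the PVM steps through, and the built-in `compile()` performs the source-to-byte-code step explicitly. The function `add` is a hypothetical example, and the exact opcode names printed by `dis` vary between Python versions.

```python
import dis

def add(a, b):
    return a + b

# The compiled byte code is stored on the function's code object as raw bytes
print(type(add.__code__.co_code))  # <class 'bytes'>

# dis lists the byte code instructions the PVM executes one by one
# (the exact opcode names vary between Python versions)
dis.dis(add)

# compile() turns source text into a code object (byte code)...
code = compile('result = 2 + 3', '<demo>', 'exec')
namespace = {}
exec(code, namespace)       # ...which the PVM then runs
print(namespace['result'])  # 5
```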


[Figure. Source: Learning Python, 2013]

Interpreted vs. Compiled Languages

Languages such as C and C++ are compiled languages, which means their source code is translated into machine (native) code before execution.

  • Machine code runs directly on the CPU.
  • Machine code is specific to the architecture (Intel, Arm, etc.) of that machine.

Interpreted languages (Python, Java, etc.) instead "compile" to bytecode.

  • Once translated, bytecode is run by a virtual machine.
  • Unlike machine code, bytecode is portable to other architectures; only the virtual machine itself has to be built for each platform.