Numpy
Numpy is a Python library that supports multi-dimensional arrays.
You need to import Numpy before you can use it. It is traditional to give the library a shorter name using the import-as mechanism:
import numpy as np
Once this import is done, you can use functions from the numpy
library using np as the qualifier.
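As a small illustration of the qualifier, the sketch below calls two Numpy functions through the np prefix. The functions chosen here (np.arange and np.sqrt) are only examples and are not otherwise needed for this lab.

```python
import numpy as np

# Every Numpy function is reached through the np prefix.
evens = np.arange(0, 10, 2)   # array holding 0, 2, 4, 6, 8
root = np.sqrt(16)            # square root computed by Numpy

print(evens)
print(root)
```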
Getting started
To get started, open up a terminal and navigate (cd) to your
cs121-aut-16-username directory. Run git pull upstream master
to collect the lab materials and git pull to sync with your
personal repository. The lab6 directory contains a file named
lab6.py.
This file includes a function, read_file, that takes the name of a
CSV file as an argument and returns a list of the column names and a
two-dimensional array of data. It also includes a call to this function
that loads the training data from the city dataset for PA #5.
Fire up ipython3 and run lab6.py to get started. This will print
out some output which you can ignore for now.
One-dimensional arrays in numpy
We’ll start by looking at one-dimensional arrays in Numpy. Unlike
Python lists, all of the values in a Numpy array must have the same
type. We can create a one-dimensional numpy array from a list using
the function np.array. For example,
In [10]: a1 = np.array([10, 20, 30, 40])
In [11]: a1
Out[11]: array([10, 20, 30, 40])
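To see the same-type rule in action, here is a minimal sketch that inspects the dtype attribute of two arrays. The variable names ints and mixed are made up for this illustration. When a list mixes integers and floats, Numpy converts every element to a float so that the array has a single type.

```python
import numpy as np

ints = np.array([10, 20, 30, 40])
mixed = np.array([10, 20.5, 30])   # one float forces every element to be a float

print(ints.dtype)    # an integer type (the exact width depends on the platform)
print(mixed.dtype)   # float64
print(mixed)         # the integers 10 and 30 are stored as 10.0 and 30.0
```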
We can compute the length and shape of this array as follows:
In [12]: len(a1)
Out[12]: 4
In [13]: a1.shape
Out[13]: (4,)
And we can access/update the ith element of the array using the [] notation:
In [14]: a1[0]
Out[14]: 10
In [15]: a1[2]
Out[15]: 30
In [17]: a1[2] = 50
In [18]: a1
Out[18]: array([10, 20, 50, 40])
Operations on numpy arrays are element-wise. For example, the expression:
In [23]: a1*2
Out[23]: array([ 20, 40, 100, 80])
yields a new numpy array where the ith element of the result is equal
to the ith element of a1 times 2. Similarly,
In [25]: a1
Out[25]: array([10, 20, 50, 40])
In [26]: a2 = np.array([100, 200, 300, 400])
In [27]: a1+a2
Out[27]: array([110, 220, 350, 440])
yields a new array where the ith element is the sum of the ith
elements of a1 and a2.
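It may help to contrast this behavior with plain Python lists, where * and + mean repetition and concatenation rather than arithmetic. The sketch below uses hypothetical variable names chosen for this comparison.

```python
import numpy as np

values = [10, 20, 50, 40]
arr = np.array(values)

list_times_two = values * 2        # list repetition: 8 elements
array_times_two = arr * 2          # element-wise product: 4 elements

list_plus = values + [1, 2]        # list concatenation: 6 elements
array_plus = arr + np.array([1, 2, 3, 4])   # element-wise sum: 4 elements

print(list_times_two)
print(array_times_two)
print(list_plus)
print(array_plus)
```

Note that element-wise addition of two arrays, as used here, expects the arrays to have matching lengths.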
Numpy also provides useful methods for operating on arrays, such as
sum and mean:
In [28]: a1.sum()
Out[28]: 120
In [29]: a1.mean()
Out[29]: 30.0
which add up the values in the array and compute its mean, respectively. These operations can also be written using notation that looks more like a function call:
In [32]: np.mean(a1)
Out[32]: 30.0
In [33]: np.sum(a1)
Out[33]: 120
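Element-wise operations and methods such as mean can be combined to compute a statistic without writing a loop. As an illustration only, the sketch below computes the mean absolute deviation of a1, a statistic that is not part of this lab; np.abs takes the absolute value of each element.

```python
import numpy as np

a1 = np.array([10, 20, 50, 40])

deviations = a1 - a1.mean()        # subtracts the mean (30.0) from each element
mad = np.abs(deviations).mean()    # mean of the absolute deviations

print(deviations)
print(mad)   # 15.0
```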
Task 1: Write a function:
def var(y):
that computes the variance of y, where y is a numpy array. We
will define variance to be:
where \(\bar y\) denotes the mean of all y's. Your solution should not include an explicit loop.
Then run it on graffiti, which contains the graffiti column
from the city data set, and garbage, which contains the garbage
column from the city data set. Here’s the output of our
implementation:
GRAFFITI 698987.140898
GARBAGE 4511.14890401
Two-dimensional arrays
One-dimensional arrays are useful, but the real power of numpy becomes more apparent when working with data that looks more like a matrix. For example, here’s a matrix represented using a list-of-lists:
m = [[0, 1, 4, 9],
[16, 25, 36, 49],
[64, 81, 100, 121],
[144, 169, 196, 225],
[256, 289, 324, 361],
[400, 441, 484, 529]]
We can convert this data into a two-dimensional array as follows:
In [34]: b = np.array(m)
where the value of b will be:
In [34]: b
array([[ 0, 1, 4, 9],
[ 16, 25, 36, 49],
[ 64, 81, 100, 121],
[144, 169, 196, 225],
[256, 289, 324, 361],
[400, 441, 484, 529]])
Accessing elements of a 2D numpy array can be done using the same
syntax as a 2D list, that is, the expression b[i][j] will yield
the jth element of the ith row of b. More conveniently, you can
use a tuple to access the elements of a numpy array. That is, the
expression b[i, j] will also yield the jth element of the ith row
of b.
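A quick check that the two notations agree is sketched below. To keep the example self-contained, b is rebuilt here with np.arange and reshape as a shortcut; this is an assumption of the sketch and produces the same values as np.array(m).

```python
import numpy as np

# Same contents as the matrix of squares: 0, 1, 4, ..., 529 in 6 rows of 4.
b = (np.arange(24) ** 2).reshape(6, 4)

chained = b[2][3]    # index the row first, then the element within that row
tupled = b[2, 3]     # index row and column in one step

print(chained, tupled)   # both are 121
```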
Numpy arrays also support slicing and other more advanced forms of
indexing. For example, the expression b[1:4] will yield:
In [35]: b[1:4]
array([[ 16, 25, 36, 49],
[ 64, 81, 100, 121],
[144, 169, 196, 225]])
that is, rows 1, 2, and 3 of b. The expression b[1:4, 2:4] will
yield columns 2 and 3 from rows 1, 2, and 3 of b:
In [36]: b[1:4, 2:4]
array([[ 36, 49],
[100, 121],
[196, 225]])
As with list slicing, a colon (:) can be used to indicate
that you wish to include all the indices in a particular dimension.
For example, b[:,2:4] will yield a slice of b with columns 2
and 3 from all the rows.
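The sketch below shows the bare colon in both positions. As an assumption of the sketch, b is rebuilt with np.arange and reshape so the example runs on its own, and np.array_equal is used to compare two arrays element by element.

```python
import numpy as np

b = (np.arange(24) ** 2).reshape(6, 4)   # same values as b in the text

cols = b[:, 2:4]     # columns 2 and 3 from every row
rows = b[1:4, :]     # rows 1, 2, and 3 with every column

print(cols.shape)    # (6, 2)
print(rows.shape)    # (3, 4)
print(np.array_equal(rows, b[1:4]))   # True: the trailing colon can be omitted
```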
In addition to slicing, you can also specify a list of indices as an
index. For example, the expression b[:, [1,3]] will yield
columns 1 and 3 from b:
In [37]: b[:, [1,3]]
array([[ 1, 9],
[ 25, 49],
[ 81, 121],
[169, 225],
[289, 361],
[441, 529]])
One thing to keep in mind with Numpy arrays: you will lose a dimension if you specify a single column or row as an index. For example, notice that the results of the following two expressions are both one-dimensional arrays:
In [38]: b[1,:]
Out[38]: array([16, 25, 36, 49])
In [39]: b[:,1]
Out[39]: array([ 1, 25, 81, 169, 289, 441])
If you wish to retain the dimension, you can use list indexing:
In [40]: b[:,[1]]
Out[40]:
array([[ 1],
[ 25],
[ 81],
[169],
[289],
[441]])
In [41]: b[[1], :]
Out[41]: array([[16, 25, 36, 49]])
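Comparing the shape of each result makes the difference easier to see. The sketch below rebuilds b with np.arange and reshape (an assumption made so the example is self-contained) and also shows that a slice of length one, such as 1:2, keeps the dimension in the same way as a one-element list.

```python
import numpy as np

b = (np.arange(24) ** 2).reshape(6, 4)   # same values as b in the text

print(b[1, :].shape)      # (4,)   single row index: one dimension is lost
print(b[:, 1].shape)      # (6,)   single column index: one dimension is lost
print(b[[1], :].shape)    # (1, 4) list index: still two-dimensional
print(b[:, [1]].shape)    # (6, 1) list index: still two-dimensional
print(b[:, 1:2].shape)    # (6, 1) length-one slice: still two-dimensional
```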
Task 2: Write expressions to extract the following subarrays of b,
which is defined for you in lab6.py:
- rows 0, 1, and 2.
- rows 0, 1, and 5
- columns 0, 1, and 2
- columns 0, 1, and 3
- columns 0, 1, and 2 from rows 2 and 3.
Task 3: We have imported the linear_regression function from PA #5
in lab6.py. Write code to call linear_regression using
columns 2 (RODENTS) and 3 (GARBAGE) as the value for X and column
7 (CRIME_TOTALS) as the value for Y. This function expects a
two-dimensional numpy array for the value of X and a
one-dimensional numpy array for the value of Y.
Hint: you can do this task in a single line of code.
The result should be:
array([ 141.5257303 , 0.54039996, 16.43439557])
Task 4: Write code to call linear_regression using column 0
(GRAFFITI) as the value for X and column 7 (CRIME_TOTALS) as the
value for Y.
The result should be:
array([ 1.07474236e+03, 5.60420145e-01])
Other useful operations
You can find the number of dimensions, shape, and number of
elements in a numpy array using the ndim, shape, and size
attributes, respectively.
In [42]: b.ndim
Out[42]: 2
In [43]: b.shape
Out[43]: (6, 4)
In [44]: b.size
Out[44]: 24
As noted above, you can compute the mean of the elements using the
mean method.
In [54]: b.mean()
Out[54]: 180.16666666666666
You can also compute the per-column mean and the per-row mean using the mean method by specifying an axis. Passing 0 averages down the rows and yields one value per column; passing 1 averages across the columns and yields one value per row:
In [55]: b.mean(0)
Out[55]: array([ 146.66666667, 167.66666667, 190.66666667, 215.66666667])
In [56]: b.mean(1)
Out[56]: array([ 3.5, 31.5, 91.5, 183.5, 307.5, 463.5])
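The axis argument works the same way for other reductions such as sum, and it can be passed by keyword for readability. The sketch below rebuilds b with np.arange and reshape (an assumption made so the example is self-contained) and checks the shape of each result.

```python
import numpy as np

b = (np.arange(24) ** 2).reshape(6, 4)   # same values as b in the text

col_means = b.mean(axis=0)   # one mean per column: 4 values
row_means = b.mean(axis=1)   # one mean per row: 6 values
col_sums = b.sum(axis=0)     # one sum per column: 4 values

print(col_means.shape)   # (4,)
print(row_means.shape)   # (6,)
print(col_sums[0])       # 880, the sum of 0, 16, 64, 144, 256, 400
```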
When Finished
When finished with the lab, please check in your work (assuming you are inside the lab directory):
git add lab6.py
git commit -m "Finished with lab6"
git push
No, we’re not grading this; we just want to look for common errors.