from IPython.display import HTML, Image
HTML(open("notes.css", "r").read())
allowExceptions = True # enable to allow the notebook to flag exceptions
from demo_array import demoArray # helpful package for demos (see below)
This lecture is really about numpy
.
Web Site:
Book:
Bressert, SciPy and NumPy, O'Reilly, 2013.
In the beginning, there were Python lists and tuples, which were way too slow for numerical computation.
Shortly after the beginning, there was Numeric
, which added the
array
class.
1-D arrays look like lists, except:
they are of fixed size
all elements are of the same type or class
Meanwhile, Numeric
development fell off and another group
developed a package numarray
, which was mostly compatible with
Numeric.
A few years later, Travis Oliphant was working on SciPy
(Scientific
Python) and decided to unify everything. The result was numpy
, which is now
almost universally used for Python arrays and number crunching in general.
numpy
is mostly compatible with Numeric
and numarray
,
has a lot of speedups, and (mainly) is currently maintained.
Under the hood, numpy
uses the standard (and optimized) BLAS and LINPACK libraries
for array and linear algebra manipulation.
numpy
makes Python code almost as fast as C or Fortran. It is now nearly universally used for big data in Python.
It gives you all the power of MATLAB without:
licensing fees
antiquated (i.e. FORTRAN) syntax
bizarre modular conventions
1-based arrays
and you're using the same Python you use for the rest of your application.
Arrays have arbitrary dimensionality (number of indices).
In C: a[i][j]
can be either an 2D C array reference or a double
pointer reference.
In Python, it would be a reference to the j
th element of the i
th
element of list a[][]
.
One problem is, there's no guarantee that all elements of a[][]
have
the same size:
a = [[1,2,3], [4], [5,6]]
So array
s have a shape
attribute, which gives its size in
each dimension.
This is a fairly standard way to import numpy:
import numpy as np
np.set_printoptions(precision=3, linewidth=128) # for this notebook, keep float precision small
The underlying (effectively abstract) base class is ndarray
, but you never use this explicitly. Here are some functions that initialize arrays for you.
zeros(shape[, dtype])
dtype
refers to the "data type" of the array.
a = np.zeros((3,4), dtype=int)
a
print(a)
Note the subtle difference between lists and arrays and ndarray.__str__()
vs ndarray.__repr__()
: no commas in the latter.
a.__class__
Note:
shape
is a tuple of integers (e.g. (3,4)
)It becomes a the shape
attribute of the ndarray
:
a.shape
len(a.shape)
is the number of dimensions in a
. It can be any
positive integer.
shape[d]
is the number of rows, columns, or whatever in dimension d
.
dtype
is a type identifier:
float, float32, float64
int, int16, int32, ...
long, ...
complex, ...
bool, ...
note: Types are more C-like than Python-like. They are not infinite precision.
All elements of an ndarray
have the same fixed type, although if dtype
is object
, you can put arbitrary Python objects in the array. (This is generally not done, as it slows down Numpy.)
The number of elements in an array is fixed. (Although its shape is not -- see below.)
c = np.zeros((1, 2, 3), 'float128')
c
Other filled array creation functions:
ones(shape[, dtype])
np.ones((5,2,3), float)
empty(shape[, dtype])
If you're going to be initializing all of the array elements yourself, you can save a bit of time by allocating the array without initialization using empty()
:
np.empty((2,2))
array(sequence[, dtype])
Create arrays explicitly from Python sequences with array()
:
np.array(
((1,2,3),
(4,5,6))
)
Think of it as a conversion function.
Remember the dtype
("data type") argument if you want a specfic type:
np.array(((1,2,3),(4,5,6)), dtype=np.float)
np.array(((1,2.0,3),(4,5j,6)))
Those of you who have been missing single-precision floats:
a = np.array(((1,2,3),(4,5,6)), dtype=np.float32)
a
You can always print out an array's dtype
.
print(a.dtype)
ones()
, zeros()
, empty()
, and array()
are all examples of factory functions. You never actually instantiate ndarray
.
ndarray
Attributes¶ndarray
s have several attributes. Here are some useful ones.
flat
In Py3K, this returns an iterator.
if a
is N
x M
x Q
, a.flat has N*M*Q
elements
a = np.array(((1, 2, 3), (4, 5, 6)))
a.flat
for el in a.flat:
print(el)
list(a.flat)
ndim
is the number of array dimensions (i.e., the length of the shape
attribute)a
a.ndim
T
is the transpose of the ndarray
. (This only does something for 2D ndarray
s.)
a
a.T
Other attributes tell you things about the internal storage of the ndarray
, which can be useful in optimizing your code for large data.
ndarray
s¶Start again with a typical a
:
a = np.array(((1, 2, 3),
(4, 5, 6)))
a
Normal scalar arithmetic operations work element-by-element, seemingly in parallel:
a + a
a - a
4 * a
a * a
This is not the matrix multiplication you learned about in linear algebra class. That's done differently.
a / a
Note the Python3 convention of int
/ int
-> float
.
a**2
Arrays are objects and have methods such as transpose:
a.transpose()
or, in shorthand:
a.T
Here's what happens when you try to combine two arrays of different shapes. (There's an exception to this we'll talk about later.)
if allowExceptions:
a + a.T
We'll talk about what "broadcast" means later.
Comparison also works element-by-element:
a
b = np.array(((3, 2, 1),
(6, 5, 4)))
b
a <= b
Because of operator semantics, we can't use logical operators:
if allowExceptions:
a <= b and a - b > b
So the and
operator complains, but there are functions that apply and
element-by-element:
np.logical_and(a <= b, a - b > b)
Let's suppose you want to see if there are any elements in a
greater than 3. You can do this:
a > 3
But if you want to use this in a conditional, this won't work:
if a > 3:
print("a exceeds 3")
The trouble here is that you're not thinking about what you want. Do you mean if any value of a
exceeds 3? If so, use:
if (a > 3).any():
print("at least one element of a exceeds 3")
Or do you mean if all values of a
exceed 3? If so, use:
if (a > 3).all():
print("all values of a exceed 3")
else:
print("not all values of a exceed 3")
As usual, for numerical quantities, zero is False
and anything else is True
.
c = np.array((1, 0, 0, 3, 9))
c.any()
There's an arange()
function that works like range()
np.arange(0, 10)
Unlike range()
, it permits floating point:
np.arange(0.0, 1.0, 0.125)
but it doesn't always do what you expect:
np.arange(0.0, 1.0, 0.2)
np.arange(0.0, 1.0, 0.1999999999999)
A little roundoff error in the "step" (third) argument can produce a substantially different result.
You're often better off using linspace()
, which provides the number of samples you want and guarantees to include the endpoints.
np.linspace(0.0, 1.0, 5)
(This reflects couth programming practice of never letting a loop be governed by incrementing floating point values.)
Start again with a 2 x 3 array a
:
a = np.array(range(1,7))
a.shape = (2, 3)
a
a.shape
Recall the transpose:
a.T
Contrast this with
a.shape = (3,2)
a
or even
a.shape = (6,1)
a
... you don't even have to keep the same dimensionality ...
a.shape = (3,2,1)
a
... as long as the number of elements remains constant:
a.shape = (3,1)
Reshaping ndarray
s like this is fast: No actual data is copied, the shape is just changed.
print(' a:\n', a)
print()
print('hex(id(a)):', hex(id(a)))
print()
a.shape = (3,2)
print('reshaped a:\n', a)
print()
print('hex(id(a)):', hex(id(a)))
But if you want a copy of the array with a different shape, use resize()
:
print(' a:\n', a)
aResized = np.resize(a, (2,3))
print(' aResized:\n', aResized)
print(' hex(id(a)):', hex(id(a)))
print('hex(id(aResized)):', hex(id(aResized)))
print(' aResized is a:', aResized is a)
1-D ndarrays
are indexed by int
s, starting at 0, just like other Python sequences:
if 1:
b = np.array((43, 4, 17, -5))
else:
b = [43, 4, 17, -5]
b[2]
and slicing works the same way ...
c = b[1:3]
c
... up to a point:
b[2] = -6
c
c[0] = 999
b
See the difference? Unlike strings, tuples, and lists, slicing an ndarray
does not create a copy: It just makes a reference back to the original ndarray
. This saves time and memory and in array processing, this is often what you want anyway.
So here are some more array access details:
If this were a list of lists, you'd want a[i][j]
, as in C/C++.
For a 2-D array a
, a[i,j]
is the entry of a
on row i
, column j
(i=0
, j=0
at the upper
left).
Python interprets a[i,j]
as "given the object a, invoke its
__getindex__()
method with the tuple (i,j)
. This is true regardless of
the number of dimensions, assuming it's greater than 1.
Slicing works with multidimensional arrays. To see the effect of slicing, let's introduce a helper function demoArray
, exported by the demo_array
module available in the course demos:
Aside: In C/C++ we'd use a[i][j]
for a 2D array.
if 1: # use a demo array
a = demoArray('a', (4,5))
else: # try the following with a real (int) array
a = np.arange(20)
a.shape = (4,5)
print(a)
Leaving off an index implies the whole row:
b = a.copy() # We'll copy a[] to leave it unchanged.
b
b[0] = b[2]
print(b)
And we can also use ":" to specify exactly which dimension we want to use all of:
b = a.copy()
b[:,0] = b[:,2] # copy column 2 to column 1
print(b)
As always, a negative index has the size of the array (in that dimension) added to it:
b = a.copy()
b[:,0] = b[:,-1] # copy last column to first columns
print(b)
Ellipses allow you to slice off the last dimension of the array without knowing how many dimensions there are:
b = demoArray('b', (2,5,2))
b
print(b[..., 1])
Which is the same as:
print(b[:,:,1])
Broadcasting applies scalar operations to every element of an array.
a = np.arange(16*8)
a.shape = (16,8)
print(' a:\n', a)
print('2*a:\n', 2*a)
Types may be promoted as in C:
a+1.0
We may broadcast assignment values:
%%time
b = a.copy()
b[1:-1,1:-1] = -1 # set the "interior" of b to -1
print(b)
We can broadcast a scalar value to a slice:
b = a.copy()
b[0] = 0
b
%%time
b = a.copy()
b[:,2] = -1
b
When using numpy, minimize for
loops. Use array operations instead. Setting the interior of an array to -1 with the following ...
%%time
b = a.copy()
(n, m) = b.shape
for i in range(1, n-1):
for j in range(1, m-1):
b[i, j] = -1
print(b)
... has the same result as above, but is slower and not really taking advantage of the power of numpy
slicing. In general, using an explicit for
loop on a numpy array is unpythonic.
Numpy provides "universal functions": versions of functions normally found in the math
module that act on arrays element-by-element:
b = a.copy()
np.sin(b)
b = a.copy()
np.exp(-b)
Ufuncs always return a new array, they never fill the old one in.
ndarray
Methods¶For what follows, let's declare a couple of vectors (arrays with one dimension):
v0 = np.array((1, 2, 3))
v1 = np.array((4, 5, 6))
dot()
computes an inner (or "dot") productv0.dot(v1)
Is this correct?
This is also how matrix-vector multiplication works. ("dot" is more properly referred to as "inner product".):
a3x3 = np.arange(9) # warning: singular
a3x3.shape = (3,3)
print(a3x3)
print(a3x3.dot(v0))
More consistently with math notation than a dot()
method is the @
operator, which means about the same thing. (Implemented by the __matmul__()
magic method):
v0 @ v1
print(a3x3 @ v0)
tolist()
converts an array to a list. Avoid this when you can: It can be expensive.
a.tolist()
numpy
Functions¶Here are some other functions in numpy
that do useful things
fromfunction(f, shape)
creates an array with shape shape
by evaluating f(i,j,...)
for each tuple of indicies (i,j,...)
def f(i,j):
return 100 * i + j
np.fromfunction(f, (5,5), dtype=int)
center = (2,2)
def paraboloid(x, y):
z = (x-center[0])**2 + (y-center[1])**2
return z
paraboloid(1,1)
print(np.fromfunction(paraboloid, (5,5)))
In both of the above examples, note that the functions look just like normal functions. That's because they are. (Duck typing strikes again!)
When we call paraboloid()
via fromfunction()
they are only called once, not for every array element. Instead fromfunction()
is passing them arrays x
and y
and numpy
knows what to do with arrays as part of expressions.
Duck typing strikes again!
Aside: To visualize large arrays, you can display them as images using pylab
.
%pylab inline
center = 200,200
parabolicImage = np.fromfunction(paraboloid, (200,200))
imshow(parabolicImage, cmap='plasma')
identity(n)
returns an n
x n
identity matrix
np.identity(8)
where(cond, x, y)
array form of a conditional
for each (i,j,...)
if cond(i, j, ...):
a[i, j, ...] = x(i, j, ...)
else
a[i, j, ...] = y(i, j, ...)
a
mask = np.greater(a, 37)
mask
np.where(mask, a, -a)
numpy.linalg
Subpackage: Linear Algebra¶This has all of the stuff you've forgotten from the linear algebra class you took. It uses the standard BLAS
and LAPACK
packages (written in very optimized C). Import it with:
import numpy.linalg as la
a3x3 = np.arange(9)
a3x3.shape = (3,3)
norm(v)
returns the ($L^2$) norm of a vector
v0
la.norm(v0)
It's the same as...
(v0 @ v0)**0.5
inv(array)
returns the inverse of the array.
a = np.array((
(2, 1, 0),
(1, 2, 1),
(0, 1, 2)
))
aInv = la.inv(a)
aInv
a @ aInv
solve(a, b)
gives x
where a @ x = b
(i.e. $\mathbf{a}\mathbf{x}=\mathbf{b}$)
x = la.solve(a, v0)
x
a.dot(x) - v0
eigvals(a)
gives a
's eigenvalues
la.eigvals(a)
eig(a)
gives the eigenvalues and eigenvectors of a
as a tuple
la.eig(a)
lstsq(a, b)
computes the "least-squares" solution x
to a.dot(x) = b
(i.e. $\mathbf{a}\mathbf{x}=\mathbf{b}$). Unlike solve()
, this works
even if a
is not square.
a = np.array(((1, 2, 3), (4, 5, 6)))
b = np.array((-1, 1))
(x, resids, rank, singvals) = la.lstsq(a, b, rcond=None)
FYI, the return tuple is
x
the residuals
the rank (number of independent dimensions)
the singular values
numpy.matrix
Subpackage: Matrices¶separate class Matrix
subclass of ndarray
strictly 2D
focusses on linear algebra operations
ongoing issue among numpy
cognoscenti: Should matrix
be deprecated?
matrix-specific design is convenient
installed base
much overlap with (say) linalg
I say:
Use numpy
unless you find something in matrix
you can't get elsewhere.
numpy.ma
Subpackage: Masked Arrays¶Allows "holes" in arrays to deal with missing or sparse data.
Associates a boolean validity array with a conventional array.
numpy.random
Subpackage: Random Arrays¶rand(d0, d1, ...)
returns an array of arbitrary dimensionality filled with random floats between 0 and 1 (inclusive)
np.random.rand(5, 5)
randint(low[, high, size])
returns an array of ints from low (inclusive) to high (exclusive)
np.random.randint(5, size=(5,5))
There are a lot of other packages in numpy
and others which use numpy
. Be sure to check the PyPI repository before you start developing your own.