
In operations between NumPy arrays (ndarray) of different shapes, broadcasting automatically converts the shapes so that they match.

This article describes the following contents.

  • Broadcasting rules in NumPy
  • Broadcasting examples in NumPy
    • Examples of 2D array
    • Examples of 3D array
  • Cases that cannot broadcast
  • Functions to get the broadcasted array
    • Broadcast an array to a specified shape: np.broadcast_to()
    • Broadcast multiple arrays: np.broadcast_arrays()

The official NumPy documentation also explains broadcasting in detail.

Use reshape() or np.newaxis if you want to change an ndarray to an arbitrary shape yourself.

Rules of Broadcasting

Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:

  • Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
  • Rule 2: If the shape of the two arrays does not match in any dimension, the array with a shape equal to 1 in that dimension is stretched to match the other shape.
  • Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
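The three rules can be sketched in a few lines of plain Python. This is only an illustration, not how NumPy implements broadcasting internally; broadcast_shape is a hypothetical helper name, and np.broadcast_shapes (available in NumPy 1.20 and later) is used only as a cross-check.

```python
import numpy as np
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast shape of two shapes, or raise ValueError."""
    result = []
    # Walk both shapes from the trailing (right) dimension. Missing leading
    # dimensions are filled in with size 1 (rule 1).
    for m, n in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if m == n or n == 1:
            result.append(m)   # sizes agree, or the second shape is stretched (rule 2)
        elif m == 1:
            result.append(n)   # the first shape is stretched (rule 2)
        else:
            # sizes differ and neither is 1 (rule 3)
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(reversed(result))

print(broadcast_shape((2, 3), (3,)))      # (2, 3)
print(broadcast_shape((3, 1), (3,)))      # (3, 3)
print(np.broadcast_shapes((3, 1), (3,)))  # (3, 3)
```

Calling broadcast_shape((3, 2), (3,)) raises ValueError, because the trailing sizes 2 and 3 differ and neither is 1.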

Note that the number of dimensions of ndarray can be obtained with the ndim attribute and the shape with the shape attribute.

To make these rules clear, let’s consider a few examples in detail.

Broadcasting example 1

Let’s look at adding a two-dimensional array to a one-dimensional array:

import numpy as np

M = np.ones((2, 3))
a = np.arange(3)

Let’s consider an operation on these two arrays. The shapes of the arrays are

  • M.shape = (2, 3)
  • a.shape = (3,)

We see by rule 1 that the array a has fewer dimensions, so we pad it on the left with ones:

  • M.shape -> (2, 3)
  • a.shape -> (1, 3)

By rule 2, we now see that the first dimension disagrees, so we stretch this dimension to match:

  • M.shape -> (2, 3)
  • a.shape -> (2, 3)

The shapes match, and we see that the final shape will be (2, 3):

M + a
array([[ 1.,  2.,  3.],
       [ 1.,  2.,  3.]])

Broadcasting example 2

Let’s take a look at an example where both arrays need to be broadcast:

a = np.arange(3).reshape((3, 1))
b = np.arange(3)

Again, we’ll start by writing out the shape of the arrays:

  • a.shape = (3, 1)
  • b.shape = (3,)

Rule 1 says we must pad the shape of b with ones:

  • a.shape -> (3, 1)
  • b.shape -> (1, 3)

And rule 2 tells us that we upgrade each of these ones to match the corresponding size of the other array:

  • a.shape -> (3, 3)
  • b.shape -> (3, 3)

Because the result matches, these shapes are compatible. We can see this here:

a + b
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])
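Combining a column with a row in this way is a common idiom for "outer" operations. The sketch below rebuilds the same sum, applies the same pattern to multiplication, and compares both against np.add.outer and np.outer. The variable names are chosen here for illustration.

```python
import numpy as np

a = np.arange(3)
b = np.arange(3)

# A (3, 1) column combined with a (3,) row gives a (3, 3) grid.
outer_sum = a[:, np.newaxis] + b
outer_product = a[:, np.newaxis] * b

print(outer_product)
# [[0 0 0]
#  [0 1 2]
#  [0 2 4]]

print(np.array_equal(outer_sum, np.add.outer(a, b)))   # True
print(np.array_equal(outer_product, np.outer(a, b)))   # True
```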

Broadcasting example 3

Now let’s take a look at an example in which the two arrays are not compatible:

M = np.ones((3, 2))
a = np.arange(3)

This is only slightly different from the first example: the matrix M is transposed. How does this affect the calculation? The shapes of the arrays are

  • M.shape = (3, 2)
  • a.shape = (3,)

Again, rule 1 tells us that we must pad the shape of a with ones:

  • M.shape -> (3, 2)
  • a.shape -> (1, 3)

By rule 2, the first dimension of a is stretched to match that of M:

  • M.shape -> (3, 2)
  • a.shape -> (3, 3)

Now we hit rule 3: the final shapes do not match, so these two arrays are incompatible, as we can observe by attempting this operation:

M + a
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-9e16e9f98da6> in <module>()
----> 1 M + a

ValueError: operands could not be broadcast together with shapes (3,2) (3,) 

Note the potential confusion here: you could imagine making a and M compatible by, say, padding a’s shape with ones on the right rather than the left. But this is not how the broadcasting rules work! That sort of flexibility might be useful in some cases, but it would lead to potential areas of ambiguity. If right-side padding is what you’d like, you can do this explicitly by reshaping the array (we’ll use np.newaxis, which inserts a new axis of length 1 when used in an index):

a[:, np.newaxis].shape
(3, 1)
M + a[:, np.newaxis]
array([[ 1.,  1.],
       [ 2.,  2.],
       [ 3.,  3.]])
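There are several equivalent ways to add the trailing axis. The short sketch below lists a few common ones; the variable names are illustrative only.

```python
import numpy as np

a = np.arange(3)

col_newaxis = a[:, np.newaxis]          # index with np.newaxis
col_none = a[:, None]                   # np.newaxis is an alias for None
col_reshape = a.reshape(-1, 1)          # -1 lets NumPy infer the length
col_expand = np.expand_dims(a, axis=1)  # insert an axis at position 1

for col in (col_newaxis, col_none, col_reshape, col_expand):
    print(col.shape)  # (3, 1) each time
```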

Also, note that while we’ve been focusing on the + operator here, these broadcasting rules apply to any binary ufunc. For example, here is the logaddexp(a, b) function, which computes log(exp(a) + exp(b)) with more precision than the naive approach:

np.logaddexp(M, a[:, np.newaxis])
array([[ 1.31326169,  1.31326169],
       [ 1.69314718,  1.69314718],
       [ 2.31326169,  2.31326169]])
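As a further illustration, the same shapes work with other binary operations, such as np.maximum and the comparison operator >. This is a small sketch that recreates the M and a used in this example:

```python
import numpy as np

M = np.ones((3, 2))
a = np.arange(3)

# Element-wise maximum of a (3, 2) array and a (3, 1) column.
print(np.maximum(M, a[:, np.newaxis]))
# [[1. 1.]
#  [1. 1.]
#  [2. 2.]]

# Comparisons broadcast in the same way and return a boolean array.
print(M > a[:, np.newaxis])
# [[ True  True]
#  [False False]
#  [False False]]
```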

Broadcasting in Practice

We’ll now take a look at a couple of simple examples of where broadcasting can be useful.

Centering an array

We know that ufuncs allow NumPy users to avoid explicitly writing slow Python loops. Broadcasting extends this ability. One commonly seen example is centering an array of data. Imagine you have an array of 10 observations, each of which consists of 3 values. Using the standard convention, we’ll store this in a 10×3 array:

X = np.random.random((10, 3))

We can compute the mean of each feature using the mean aggregate across the first dimension:

Xmean = X.mean(0)
Xmean
array([ 0.53514715,  0.66567217,  0.44385899])

And now we can center the X array by subtracting the mean (this is a broadcasting operation):

X_centered = X - Xmean

To double-check that we’ve done this correctly, we can check that the centered array has near-zero mean:

X_centered.mean(0)
array([  2.22044605e-17,  -7.77156117e-17,  -1.66533454e-17])

To within machine precision, the mean is now zero.
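Centering along the other axis needs one extra step. The sketch below centers each observation (row) instead of each feature; the seeded generator is an assumption made only so the run is repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only for repeatability
X = rng.random((10, 3))

# X.mean(axis=1) has shape (10,), which does not broadcast against (10, 3).
# keepdims=True keeps the reduced axis as size 1, giving shape (10, 1).
row_mean = X.mean(axis=1, keepdims=True)
X_row_centered = X - row_mean

print(row_mean.shape)                               # (10, 1)
print(np.allclose(X_row_centered.mean(axis=1), 0))  # True
```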

Plotting a two-dimensional function

One place that broadcasting is very useful is in displaying images based on two-dimensional functions. If we want to define a function z=f(x,y), broadcasting can be used to compute the function across the grid:

# x and y have 50 steps from 0 to 5
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 50)[:, np.newaxis]

z = np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

We’ll use Matplotlib to plot this two-dimensional array:

%matplotlib inline
import matplotlib.pyplot as plt
plt.imshow(z, origin='lower', extent=[0, 5, 0, 5],
           cmap='viridis')
plt.colorbar();

The result is a compelling visualization of the two-dimensional function.
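For comparison, the same grid can be built with explicit coordinate arrays from np.meshgrid. The sketch below wraps the formula in a helper f (a name chosen here) and checks that both routes give the same values. Broadcasting avoids materializing the two full coordinate grids.

```python
import numpy as np

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 50)

def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

# Broadcasting: (50,) combined with (50, 1) gives (50, 50).
z_broadcast = f(x, y[:, np.newaxis])

# meshgrid: two explicit (50, 50) coordinate arrays.
xx, yy = np.meshgrid(x, y)
z_meshgrid = f(xx, yy)

print(z_broadcast.shape)                     # (50, 50)
print(np.allclose(z_broadcast, z_meshgrid))  # True
```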

More broadcasting examples in NumPy

Examples of 2D array

2D array and 1D array

The following 2D and 1D arrays are used as examples. To make it easier to understand the result of the broadcast, one of them uses zeros() to set all the elements to 0.

import numpy as np

a = np.zeros((3, 3), dtype=int)  # np.int was removed in NumPy 1.24; use the built-in int
print(a)
# [[0 0 0]
#  [0 0 0]
#  [0 0 0]]

print(a.shape)
# (3, 3)

b = np.arange(3)
print(b)
# [0 1 2]

print(b.shape)
# (3,)

The shape of a 1D array is (3,) rather than (3) because a tuple with one element is written with a trailing comma.

The result of the addition of these two ndarray is as follows.

print(a + b)
# [[0 1 2]
#  [0 1 2]
#  [0 1 2]]

Let’s transform the array with a smaller number of dimensions (1D array b) according to the rules described above.

First, according to rule 1, the array is transformed from shape (3,) to (1, 3) by adding a new dimension of size 1 at the head. The reshape() method is used.

b_1_3 = b.reshape(1, 3)
print(b_1_3)
# [[0 1 2]]

print(b_1_3.shape)
# (1, 3)

Next, each dimension of size 1 is stretched according to rule 2, which takes the array from (1, 3) to (3, 3). The stretched part repeats the original values. np.tile() is used here to show the result.

print(np.tile(b_1_3, (3, 1)))
# [[0 1 2]
#  [0 1 2]
#  [0 1 2]]

Note that reshape() and np.tile() are used here for the sake of explanation, but if you want to get the broadcasted array, there are functions np.broadcast_to() and np.broadcast_arrays() for that purpose. See below.
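One practical difference is worth seeing: np.tile() builds a new array, while np.broadcast_to() returns a read-only view of the original data. The following sketch checks this with np.shares_memory and the strides attribute.

```python
import numpy as np

b = np.arange(3)

tiled = np.tile(b.reshape(1, 3), (3, 1))  # allocates a new (3, 3) array
view = np.broadcast_to(b, (3, 3))         # read-only view onto b

print(np.array_equal(tiled, view))  # True
print(np.shares_memory(tiled, b))   # False
print(np.shares_memory(view, b))    # True
print(view.strides[0])              # 0: every row reads the same memory
```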

2D array and 2D array

The result of addition with the 2D array of (3, 1) is as follows.

b_3_1 = b.reshape(3, 1)
print(b_3_1)
# [[0]
#  [1]
#  [2]]

print(b_3_1.shape)
# (3, 1)

print(a + b_3_1)
# [[0 0 0]
#  [1 1 1]
#  [2 2 2]]

In this case, since the number of dimensions is already the same, the array is stretched from (3, 1) to (3, 3) according to rule 2.

print(np.tile(b_3_1, (1, 3)))
# [[0 0 0]
#  [1 1 1]
#  [2 2 2]]

In the previous examples, only one of the arrays is converted, but there are cases where both are converted by broadcasting.

The following is the result of adding arrays whose shapes are (1, 3) and (3, 1).

print(b_1_3)
# [[0 1 2]]

print(b_1_3.shape)
# (1, 3)

print(b_3_1)
# [[0]
#  [1]
#  [2]]

print(b_3_1.shape)
# (3, 1)

print(b_1_3 + b_3_1)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

Both (1, 3) and (3, 1) are stretched to (3, 3).

print(np.tile(b_1_3, (3, 1)))
# [[0 1 2]
#  [0 1 2]
#  [0 1 2]]

print(np.tile(b_3_1, (1, 3)))
# [[0 0 0]
#  [1 1 1]
#  [2 2 2]]

print(np.tile(b_1_3, (3, 1)) + np.tile(b_3_1, (1, 3)))
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

The same applies if one of them is a 1D array.

c = np.arange(4)
print(c)
# [0 1 2 3]

print(c.shape)
# (4,)

print(b_3_1)
# [[0]
#  [1]
#  [2]]

print(b_3_1.shape)
# (3, 1)

print(c + b_3_1)
# [[0 1 2 3]
#  [1 2 3 4]
#  [2 3 4 5]]

The 1D array is converted as (4,) -> (1, 4) -> (3, 4), and the 2D array as (3, 1) -> (3, 4).

print(np.tile(c.reshape(1, 4), (3, 1)))
# [[0 1 2 3]
#  [0 1 2 3]
#  [0 1 2 3]]

print(np.tile(b_3_1, (1, 4)))
# [[0 0 0 0]
#  [1 1 1 1]
#  [2 2 2 2]]

print(np.tile(c.reshape(1, 4), (3, 1)) + np.tile(b_3_1, (1, 4)))
# [[0 1 2 3]
#  [1 2 3 4]
#  [2 3 4 5]]

Note that the dimension is stretched only when the original size is 1. Otherwise, it cannot be broadcasted, and an error is raised, as described below.

Examples of 3D array

Rule 1 applies even if the difference in the number of dimensions is two or more.

Using 3D and 1D arrays as examples, the addition results are as follows:

a = np.zeros((2, 3, 4), dtype=int)  # np.int was removed in NumPy 1.24
print(a)
# [[[0 0 0 0]
#   [0 0 0 0]
#   [0 0 0 0]]
# 
#  [[0 0 0 0]
#   [0 0 0 0]
#   [0 0 0 0]]]

print(a.shape)
# (2, 3, 4)

b = np.arange(4)
print(b)
# [0 1 2 3]

print(b.shape)
# (4,)

print(a + b)
# [[[0 1 2 3]
#   [0 1 2 3]
#   [0 1 2 3]]
# 
#  [[0 1 2 3]
#   [0 1 2 3]
#   [0 1 2 3]]]

The shape is changed as (4,) -> (1, 1, 4) -> (2, 3, 4).

b_1_1_4 = b.reshape(1, 1, 4)
print(b_1_1_4)
# [[[0 1 2 3]]]

print(np.tile(b_1_1_4, (2, 3, 1)))
# [[[0 1 2 3]
#   [0 1 2 3]
#   [0 1 2 3]]
# 
#  [[0 1 2 3]
#   [0 1 2 3]
#   [0 1 2 3]]]
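Several arrays with different numbers of dimensions can be combined in one expression. As an illustrative sketch (the names per_block, per_row, and per_col are made up here), each array below varies along a single axis of the (2, 3, 4) result:

```python
import numpy as np

a = np.zeros((2, 3, 4), dtype=int)

per_block = np.array([100, 200]).reshape(2, 1, 1)  # varies along axis 0
per_row = np.array([10, 20, 30]).reshape(1, 3, 1)  # varies along axis 1
per_col = np.arange(4)                             # varies along axis 2

result = a + per_block + per_row + per_col
print(result.shape)  # (2, 3, 4)
print(result[1, 2])  # [230 231 232 233]
```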

Cases that cannot broadcast

As mentioned above, a dimension is stretched only if its original size is 1. If the sizes of a dimension differ and neither of them is 1, the arrays cannot be broadcast, and an error is raised.

a = np.zeros((4, 3), dtype=int)  # np.int was removed in NumPy 1.24
print(a)
# [[0 0 0]
#  [0 0 0]
#  [0 0 0]
#  [0 0 0]]

print(a.shape)
# (4, 3)

b = np.arange(6).reshape(2, 3)
print(b)
# [[0 1 2]
#  [3 4 5]]

print(b.shape)
# (2, 3)

# print(a + b)
# ValueError: operands could not be broadcast together with shapes (4,3) (2,3) 

The same applies to the following case.

a = np.zeros((2, 3, 4), dtype=int)  # np.int was removed in NumPy 1.24
print(a)
# [[[0 0 0 0]
#   [0 0 0 0]
#   [0 0 0 0]]
# 
#  [[0 0 0 0]
#   [0 0 0 0]
#   [0 0 0 0]]]

print(a.shape)
# (2, 3, 4)

b = np.arange(3)
print(b)
# [0 1 2]

print(b.shape)
# (3,)

# print(a + b)
# ValueError: operands could not be broadcast together with shapes (2,3,4) (3,) 
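If you would rather test compatibility up front than catch the error during the operation, one possible approach is a small wrapper around np.broadcast_shapes (NumPy 1.20 and later). The helper name can_broadcast is hypothetical.

```python
import numpy as np

def can_broadcast(*shapes):
    """Return True if the given shapes broadcast together (hypothetical helper)."""
    try:
        np.broadcast_shapes(*shapes)
    except ValueError:
        return False
    return True

print(can_broadcast((4, 3), (2, 3)))     # False
print(can_broadcast((2, 3, 4), (3,)))    # False
print(can_broadcast((2, 3, 4), (3, 1)))  # True
```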

In this example, if a new dimension is added at the end, the array can be broadcasted.

b_3_1 = b.reshape(3, 1)
print(b_3_1)
# [[0]
#  [1]
#  [2]]

print(b_3_1.shape)
# (3, 1)

print(a + b_3_1)
# [[[0 0 0 0]
#   [1 1 1 1]
#   [2 2 2 2]]
# 
#  [[0 0 0 0]
#   [1 1 1 1]
#   [2 2 2 2]]]

To see whether two shapes can be broadcast, write them right-aligned and compare the sizes column by column.

NG (cannot be broadcast)
(2, 3, 4)
(      3)

OK
(2, 3, 4)
(   3, 1) -> (1, 3, 1) -> (2, 3, 4)

If the sizes differ when the shapes are right-aligned and compared vertically, one of them must be 1 for broadcasting to work.

For example, a color image is a 3D array whose shape is (height, width, 3) (the 3 stands for red, green, and blue), while a grayscale image is a 2D array whose shape is (height, width). An operation between each color channel of a color image and a grayscale image cannot be broadcast directly, even if the height and width are the same.

You need to add a dimension to the end of the grayscale image with np.newaxis, np.expand_dims(), and so on.

NG (cannot be broadcast)
(h, w, 3)
(   h, w)

OK
(h, w, 3)
(h, w, 1) -> (h, w, 3)
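A minimal sketch of the image case, using small synthetic arrays in place of real images (the sizes and the variable names color, gray, and masked are assumptions for illustration):

```python
import numpy as np

h, w = 4, 5                                     # small illustrative size
color = np.ones((h, w, 3))                      # stand-in for an RGB image
gray = np.linspace(0, 1, h * w).reshape(h, w)   # stand-in for a grayscale image

# color * gray would raise ValueError: (4, 5, 3) and (4, 5) do not line up.
masked = color * gray[:, :, np.newaxis]         # (h, w, 1) -> (h, w, 3)
masked_alt = color * np.expand_dims(gray, axis=-1)

print(masked.shape)                        # (4, 5, 3)
print(np.array_equal(masked, masked_alt))  # True
```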

Functions to get the broadcasted array

Broadcast an array to a specified shape: np.broadcast_to()

Use np.broadcast_to() to broadcast an ndarray to a specified shape.

The first argument is the original ndarray, and the second is a tuple or list indicating shape. The broadcasted ndarray is returned.

a = np.arange(3)
print(a)
# [0 1 2]

print(a.shape)
# (3,)

print(np.broadcast_to(a, (3, 3)))
# [[0 1 2]
#  [0 1 2]
#  [0 1 2]]

print(type(np.broadcast_to(a, (3, 3))))
# <class 'numpy.ndarray'>

An error occurs when specifying a shape that cannot be broadcasted.

# print(np.broadcast_to(a, (2, 2)))
# ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (3,) and requested shape (2,2)
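Keep in mind that the array returned by np.broadcast_to() is a read-only view of the original. The sketch below shows that writing to it fails, and that copy() gives an independent, writable array.

```python
import numpy as np

a = np.arange(3)
view = np.broadcast_to(a, (3, 3))

print(view.flags.writeable)  # False

try:
    view[0, 0] = 99          # writing to the read-only view fails
except ValueError as err:
    print("write failed:", err)

writable = view.copy()       # independent (3, 3) array
writable[0, 0] = 99
print(a)                     # [0 1 2]
```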

Broadcast multiple arrays: np.broadcast_arrays()

Use np.broadcast_arrays() to broadcast multiple ndarray.

Pass the arrays as separate arguments. A list of ndarray is returned (NumPy 2.0 and later return a tuple instead).

a = np.arange(3)
print(a)
# [0 1 2]

print(a.shape)
# (3,)

b = np.arange(3).reshape(3, 1)
print(b)
# [[0]
#  [1]
#  [2]]

print(b.shape)
# (3, 1)

arrays = np.broadcast_arrays(a, b)

print(type(arrays))
# <class 'list'>  (<class 'tuple'> in NumPy 2.0 and later)

print(len(arrays))
# 2

print(arrays[0])
# [[0 1 2]
#  [0 1 2]
#  [0 1 2]]

print(arrays[1])
# [[0 0 0]
#  [1 1 1]
#  [2 2 2]]

print(type(arrays[0]))
# <class 'numpy.ndarray'>

An error occurs when specifying a combination of arrays that cannot be broadcasted.

c = np.zeros((2, 2))
print(c)
# [[0. 0.]
#  [0. 0.]]

print(c.shape)
# (2, 2)

# arrays = np.broadcast_arrays(a, c)
# ValueError: shape mismatch: objects cannot be broadcast to a single shape
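If only the resulting shape is needed, the np.broadcast object reports it without building the broadcast arrays. The sketch below also unpacks the result of np.broadcast_arrays() directly; the names a_b and b_b are chosen here for illustration.

```python
import numpy as np

a = np.arange(3)
b = np.arange(3).reshape(3, 1)

# Unpacking works whether a list or a tuple is returned.
a_b, b_b = np.broadcast_arrays(a, b)
print(a_b.shape, b_b.shape)      # (3, 3) (3, 3)

# np.broadcast describes the result of broadcasting a and b.
print(np.broadcast(a, b).shape)  # (3, 3)
```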

Summary

Let’s review the rules for broadcasting in NumPy as two steps.

  1. Make the two arrays have the same number of dimensions.
    • If the numbers of dimensions of the two arrays are different, add new dimensions of size 1 to the head (left) of the shape of the array with fewer dimensions.
  2. Make each dimension of the two arrays the same size.
    • If the sizes of a dimension do not match, a dimension of size 1 is stretched to the size of the other array.
    • If the sizes of a dimension do not match and neither of them is 1, the arrays cannot be broadcast, and an error is raised.

Resources:

https://note.nkmk.me/en/python-numpy-broadcasting/

https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html

Amir Masoud Sefidian
Machine Learning Engineer
