
Understanding Contiguous vs Non-Contiguous Tensors in PyTorch


Tensor and View

A view uses the same chunk of data as the original tensor; it is just a different way to ‘view’ its dimensions.

Before discussing what contiguous and non-contiguous mean, we first need to understand the relationship between a tensor and a view in PyTorch.

A view is simply an alternative way to interpret the original tensor’s dimensions without making a physical copy in memory. For example, we can take a 1D tensor of 12 elements, i.e. [1,2,3,4,5,6,7,8,9,10,11,12], and then use .view(4,3) to change the shape of the tensor into a 4×3 structure.

import torch

x = torch.arange(1,13)
print(x)
>> tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

y = x.view(4,3)
print(y)
>>
tensor([[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9],
        [10, 11, 12]])

If you change the data in the original tensor x, the change also shows up in the view tensor y: instead of creating another copy of x, y reads its data from the same memory address as x. The reverse also holds: changing a value in the view changes the value in the original tensor, because the view and its original tensor share the same block of memory.

x = torch.arange(1,13)
y = x.view(4,3)
x[0] = 100
print(y)
>> 
tensor([[100,   2,   3],
        [  4,   5,   6],
        [  7,   8,   9],
        [ 10,  11,  12]])

x = torch.arange(1,13)
y = x.view(4,3)
y[-1,-1] = 1000
print(x)
>> tensor([   1,    2,    3,    4,    5,    6,    7,    8,    9,   10,   11, 1000])

A Sequence of Data Can be Viewed with Different Dimensions in a Contiguous manner

The tensor data is stored as a 1D data sequence. Technically, .view() is an instruction that tells the machine how to stride over the 1D data sequence and present a tensor view with the given dimensions.

Intuitively, you can imagine that .view() defines the new dimensions, say (2,2,3), as empty template boxes. The data is then consumed from the beginning of the 1D data sequence and filled into these boxes starting from the innermost spots, one by one. Once a box is full, filling moves on to the next box in the next dimension out. This follows a contiguous order.

Thus, you can use any combination of dimensions in view(), as long as the total number of boxes matches the number of elements in the 1D array, e.g., 2×2×3 = 6×2 = 12. You can just as well use (3,2,2) or (4,3), as long as the product of the dimensions equals the total number of elements.
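
To make the rule concrete, here is a small sketch (the variable names and the chosen shapes are only illustrative). It views the same 12 elements under several shapes, lets PyTorch infer one dimension with -1, and shows that a shape whose product is not 12 is rejected.

```python
import torch

x = torch.arange(1, 13)  # 12 elements stored as one 1D sequence

# Any shape whose sizes multiply to 12 is a valid view of x.
shapes = [(2, 2, 3), (3, 2, 2), (4, 3), (6, 2)]
views = [x.view(*shape) for shape in shapes]
for v in views:
    print(tuple(v.shape), v.numel())

# -1 asks PyTorch to infer that dimension from the element count.
inferred = x.view(2, -1)
print(tuple(inferred.shape))

# A shape whose product is not 12 cannot be satisfied.
try:
    x.view(5, 3)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("RuntimeError:", err)
```

Every view in the list reads the same underlying sequence; only the shape and strides differ.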

Strides

If you are unfamiliar with how the computer strides through memory (a 1D sequence) to form an N-dimensional tensor, you can read this post.

NumPy’s .strides attribute returns (N bytes to the next row, M bytes to the next column).
PyTorch’s .stride() method returns (N elements to the next row, M elements to the next column).
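
The difference between byte strides and element strides can be checked with a short sketch like the one below. It assumes NumPy is installed alongside PyTorch and pins both arrays to 64-bit integers, so each element takes 8 bytes.

```python
import numpy as np
import torch

a = np.arange(1, 13, dtype=np.int64).reshape(6, 2)
t = torch.arange(1, 13, dtype=torch.int64).view(6, 2)

print(a.strides)   # steps measured in bytes
print(t.stride())  # steps measured in elements

# Dividing NumPy's byte strides by the item size gives element strides.
numpy_element_strides = tuple(s // a.itemsize for s in a.strides)
print(numpy_element_strides)
```

With 8-byte elements, NumPy reports (16, 8) where PyTorch reports (2, 1) for the same layout.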

Let’s look at the strides in a 2D array.

# x is contiguous data. Recall that view() doesn't change the data arrangement in the original 1D tensor, i.e. the sequence from 1 to 12.
x = torch.arange(1,13).view(6,2)
x
>>
tensor([[ 1,  2],
        [ 3,  4],
        [ 5,  6],
        [ 7,  8],
        [ 9, 10],
        [11, 12]])

# Check stride
x.stride()
>> (2, 1)

The strides (2, 1) tell us: we need to step over 2 numbers in the 1D sequence to reach the next number along axis 0 (the next row), and 1 number to reach the next number along axis 1 (the next column).

What about the strides in a 3D array?

y = torch.arange(0,12).view(2,2,3)
y
>>
tensor([[[ 0,  1,  2],
         [ 3,  4,  5]],

        [[ 6,  7,  8],
         [ 9, 10, 11]]])

# Check stride
y.stride()
>> (6, 3, 1)

As the strides (6, 3, 1) indicate, starting from any position i in the 1D sequence, (i+6) moves one step along axis 0, (i+3) moves one step along axis 1, and (i+1) moves one step along axis 2. The position of index (A, B, C) in the 1D tensor is given by: A * 6 + B * 3 + C * 1

For example, in the above tensor, we deliberately chose a sequence of numbers ranging from 0 to 11, because each value then doubles as an indicator of its position in the 1D array.

  • Index (0, 0, 0)
    Position in 1D: 0 * 6 + 0 * 3 + 0 * 1 = 0
  • Index (1, 0, 0)
    Position in 1D: 1 * 6 + 0 * 3 + 0 * 1 = 6
  • Index (0, 1, 0)
    Position in 1D: 0 * 6 + 1 * 3 + 0 * 1 = 3
  • Index (0, 0, 1)
    Position in 1D: 0 * 6 + 0 * 3 + 1 * 1 = 1
  • Index (1, 1, 1)
    Position in 1D: 1 * 6 + 1 * 3 + 1 * 1 = 10
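
The position formula can be written as a small helper and checked against the tensor itself. This is a minimal sketch; `flat_offset` is a hypothetical name introduced here, not a PyTorch function.

```python
import torch

def flat_offset(index, strides):
    """Return the position in the 1D data sequence for an N-D index."""
    return sum(i * s for i, s in zip(index, strides))

y = torch.arange(0, 12).view(2, 2, 3)
flat = y.flatten()  # y is contiguous, so this matches the stored order

indices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
positions = [flat_offset(index, y.stride()) for index in indices]

for index, pos in zip(indices, positions):
    # Because the values are 0..11, each value equals its own position.
    print(index, "->", pos, "value:", y[index].item(), flat[pos].item())
```

The same helper works for any number of dimensions, since it simply pairs each index with its stride.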

Okay, we have now finished the introduction to contiguous views and learned how strides work in an N-dimensional tensor in PyTorch. Now let’s take a look at what non-contiguous data is like.

Non-Contiguous Data Structure: Transpose( )

First of all, transpose(axis1, axis2) simply swaps the way axis1 and axis2 stride.

# Initiate a contiguous tensor
x = torch.arange(0,12).view(2,2,3)
x
>>
tensor([[[ 0,  1,  2],
         [ 3,  4,  5]],

        [[ 6,  7,  8],
         [ 9, 10, 11]]])

x.stride()
>> (6,3,1)

# Now let's transpose axis 0 and 2, and see how the strides swap
y = x.transpose(0,2)
y
>>
tensor([[[ 0,  6],
         [ 3,  9]],

        [[ 1,  7],
         [ 4, 10]],

        [[ 2,  8],
         [ 5, 11]]])

y.stride()
>> (1,3,6)

Alright, y is x.transpose(0,2), which swaps how the x tensor strides along axis 0 and axis 2, so the resulting strides of y are (1,3,6). It means we need to jump 1 number to retrieve the next number along axis 0, jump 3 numbers to retrieve the next number along axis 1, and jump 6 numbers to get the next number along axis 2. (Striding formula: A * 1 + B * 3 + C * 6)

What makes transpose different is that the data sequence no longer follows a contiguous order. It does not fill in the sequential data one by one from the innermost dimension and move to the next dimension when that one is full. Instead, it jumps 6 numbers at a time in the innermost dimension, so it is not contiguous.
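
One way to convince yourself that transpose only swaps strides is to rebuild the same view by hand. The sketch below uses torch.as_strided for that; the sizes and strides passed in are simply the ones from the example above, with axes 0 and 2 swapped.

```python
import torch

x = torch.arange(0, 12).view(2, 2, 3)
y = x.transpose(0, 2)

# The transposed strides are x's strides with entries 0 and 2 swapped.
print(x.stride(), y.stride())

# Rebuild the same view by hand: same data, swapped size and stride.
manual = torch.as_strided(x, size=(3, 2, 2), stride=(1, 3, 6))
print(torch.equal(manual, y))

# Read the data in the order y walks it (innermost axis fastest).
walk_order = y.flatten().tolist()
print(walk_order)
```

The walk order jumps back and forth through the stored sequence 0..11, which is exactly what non-contiguous means here.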

Transpose( ) has a Non-Contiguous Data Structure but Still a View Not a Copy

transpose() still returns a view, not a copy, of the original tensor. It is therefore a non-contiguous ‘view’. It changes how the original data is strided, and any data modification on the original tensor affects the view, and vice versa.

# Change the value in a transpose tensor y
x = torch.arange(0,12).view(6,2)
y = x.transpose(0,1)
y[0,0] = 100
y
>>
tensor([[100,   2,   4,   6,   8,  10],
        [  1,   3,   5,   7,   9,  11]])

# Check the original tensor x
x
>>
tensor([[100,   1],
        [  2,   3],
        [  4,   5],
        [  6,   7],
        [  8,   9],
        [ 10,  11]])

Another way to peek at the 1D data sequence stored in a tensor is through the method .storage().

y.storage()
>>
 100
 1
 2
 3
 4
 5
 6
 7
 8
 9
 10
 11
[torch.LongStorage of size 12]

Check Contiguous and Non-Contiguous in Pytorch

PyTorch has a method .is_contiguous() that tells you whether a tensor is contiguous.

x = torch.arange(0,12).view(2,6)
x.is_contiguous()
>> True

y = x.transpose(0,1)
y.is_contiguous()
>> False

Convert a Non-Contiguous Tensor (or View) to Contiguous

PyTorch has a method .contiguous() that converts a non-contiguous tensor or view to a contiguous one.

z = y.contiguous()
z.is_contiguous()
>> True

It makes a copy of the original ‘non-contiguous’ tensor and then saves it to a new memory chunk following the contiguous order. We can observe it by its strides.

# This is contiguous
x = torch.arange(1,13).view(2,3,2)
x.stride()
>> (6, 2, 1)

# This is non-contiguous
y = x.transpose(0,1)
y.stride()
>> (2, 6, 1)

# This is a converted contiguous tensor with new strides
z = y.contiguous()
z.stride()
>> (4, 2, 1)

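One rule of thumb I use to tell whether a tensor / view is contiguous is to check whether the strides (A, B, C) satisfy A > B > C. If they don’t, at least one dimension skips a longer distance than the dimension above it, which makes the tensor non-contiguous. Note that this is only a quick sign, not a complete test: the precise condition is that each stride equals the product of the sizes of all dimensions after it (dimensions of size 1 aside), so a sliced tensor can have descending strides and still be non-contiguous.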
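
As a rough sketch of a stricter check than comparing stride magnitudes, the helper below computes the strides a freshly allocated tensor of the same shape would have and compares them with the actual strides. Both function names are hypothetical, and the check deliberately ignores the special treatment of size-1 dimensions, so prefer .is_contiguous() in real code.

```python
import torch

def contiguous_strides(shape):
    """Strides a freshly allocated row-major tensor of this shape would have."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))

def looks_contiguous(t):
    # Simplified check: ignores the special handling of size-1 dimensions.
    return tuple(t.stride()) == contiguous_strides(tuple(t.shape))

x = torch.arange(1, 13).view(2, 3, 2)
y = x.transpose(0, 1)
# Slicing keeps descending strides (4, 1) but skips elements in memory.
s = torch.arange(12).view(3, 4)[:, :2]

for name, t in [("x", x), ("y", y), ("s", s)]:
    print(name, tuple(t.stride()), looks_contiguous(t), t.is_contiguous())
```

The sliced tensor s is the interesting case: its strides are in descending order, yet it is not contiguous because its rows are 4 elements apart while each row holds only 2.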

We can also observe how the converted contiguous tensor z stores the data in a new order.

# y is a non-contiguous 'view' (remember that a view uses the original chunk of data in memory, but its strides, (2,6,1), imply 'non-contiguous').
y.storage()
>>
 1
 2
 3
 4
 5
 6
 7
 8
 9
 10
 11
 12
# z is a 'contiguous' tensor (not a view, but a new copy of the original data; notice that the order of the data is different). Its strides, (4,2,1), imply 'contiguous'.
z.storage()
>>
 1
 2
 7
 8
 3
 4
 9
 10
 5
 6
 11
 12

Difference Between view( ) and reshape( )

While both functions can change the dimensions of a tensor (basically, they are just different ways of striding over the 1D data), the main difference between the two is:

1/ view(): Does NOT make a copy of the original tensor. It changes the dimensional interpretation (striding) of the original data. In other words, it uses the same chunk of data as the original tensor, so it ONLY works when the requested shape can be expressed by striding over the existing data, which is always the case for contiguous data.

2/ reshape(): Returns a view when possible (e.g., when the data is contiguous). If that is not possible (e.g., the data is non-contiguous and the new shape cannot be expressed with strides), it copies the data into a contiguous chunk. As a copy, it takes up additional memory, and changes in the new tensor do not affect the values in the original tensor.

With contiguous data, reshape() returns a view.

# When data is contiguous
x = torch.arange(1,13)
x
>> tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

# Reshape returns a view with the new dimension
y = x.reshape(4,3)
y
>>
tensor([[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9],
        [10, 11, 12]])

# How do we know it's a view? Because an element change in the new tensor y affects the value in x, and vice versa
y[0,0] = 100
y
>>
tensor([[100,   2,   3],
        [  4,   5,   6],
        [  7,   8,   9],
        [ 10,  11,  12]])

print(x)
>>
tensor([100,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12])

Next, let’s see how .reshape() works on non-contiguous data.

# After transpose(), the data is non-contiguous
x = torch.arange(1,13).view(6,2).transpose(0,1)
x
>>
tensor([[ 1,  3,  5,  7,  9, 11],
        [ 2,  4,  6,  8, 10, 12]])

# reshape() works fine on non-contiguous data
y = x.reshape(4,3)
y
>>
tensor([[ 1,  3,  5],
        [ 7,  9, 11],
        [ 2,  4,  6],
        [ 8, 10, 12]])

# Change an element in y
y[0,0] = 100
y
>>
tensor([[100,   3,   5],
        [  7,   9,  11],
        [  2,   4,   6],
        [  8,  10,  12]])

# Check the original tensor, and nothing was changed
x
>>
tensor([[ 1,  3,  5,  7,  9, 11],
        [ 2,  4,  6,  8, 10, 12]])
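
Instead of modifying an element to find out whether reshape() returned a view or a copy, you can compare memory addresses. The sketch below assumes that equal data_ptr() values are a good enough signal for these examples; `shares_start_address` is a hypothetical helper name.

```python
import torch

def shares_start_address(a, b):
    # Hypothetical helper: equal data_ptr() means both start at the same element.
    return a.data_ptr() == b.data_ptr()

base = torch.arange(1, 13)

# Contiguous input: reshape can reuse the existing memory.
r1 = base.reshape(4, 3)
print(shares_start_address(base, r1))

# Non-contiguous input: this particular reshape has to copy.
t = base.view(6, 2).transpose(0, 1)
r2 = t.reshape(4, 3)
print(shares_start_address(t, r2))

r2[0, 0] = 100
print(base[0].item())  # base is untouched by the write to the copy
```

Comparing addresses has the advantage that it does not alter the data you are inspecting.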

Finally, let’s see whether view() can do the same reshaping on this non-contiguous data.
No, it can’t!

# After transpose(), the data is non-contiguous
x = torch.arange(1,13).view(6,2).transpose(0,1)
x
>>
tensor([[ 1,  3,  5,  7,  9, 11],
        [ 2,  4,  6,  8, 10, 12]])

# Try to use view on the non-contiguous data
y = x.view(4,3)
y
>>
-------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
----> 1 y = x.view(4,3)
      2 y

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
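
The error message hints that the problem is the combination of shape and strides rather than non-contiguity as such. The following sketch, using the same transposed tensor, catches the failure, shows one shape that view() can still produce on this non-contiguous tensor, and ends with the copy-first fallback. The specific shapes are illustrative choices.

```python
import torch

x = torch.arange(1, 13).view(6, 2).transpose(0, 1)  # shape (2, 6), strides (1, 2)

# view(4, 3) would need rows that straddle the two transposed rows.
try:
    x.view(4, 3)
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)

# Splitting the last axis (6 -> 3 x 2) keeps every piece inside one
# existing row, so this shape can still be expressed with strides alone.
v = x.view(2, 3, 2)
print(v.is_contiguous(), v.data_ptr() == x.data_ptr())

# Fallback that always works: copy to contiguous memory first.
c = x.contiguous().view(4, 3)
print(c)
```

In practice, calling reshape() gives you the same fallback automatically, at the cost of not knowing in advance whether you get a view or a copy.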

A contiguous tensor is a tensor whose elements are stored in memory in the same order in which its indices are traversed, without gaps between them. A newly created tensor is contiguous by default, and it can be viewed with different dimensions in a contiguous manner.

Transposing a tensor creates a view of the original tensor that, in general, follows a non-contiguous order.

tensor.contiguous()

It is defined as:

Tensor.contiguous(memory_format=torch.contiguous_format)

It returns a tensor that is contiguous in memory and contains the same data as the self tensor. If the self tensor is already contiguous, it returns the self tensor itself.
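
A quick way to see when contiguous() actually copies is to compare memory addresses before and after the call. This is a minimal sketch with illustrative variable names.

```python
import torch

a = torch.arange(6).view(2, 3)   # already contiguous
b = a.transpose(0, 1)            # non-contiguous view of a

a_c = a.contiguous()
b_c = b.contiguous()

# Already contiguous: no new memory is allocated.
print(a_c.data_ptr() == a.data_ptr())
# Non-contiguous: the data is copied into a fresh, reordered block.
print(b_c.data_ptr() == b.data_ptr())
print(tuple(b.stride()), tuple(b_c.stride()))
```

So calling contiguous() defensively on a tensor that is already contiguous costs no extra memory.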

Why do we use tensor.contiguous()?

The tensor.view() function generally needs a contiguous tensor to work on.

For example:

import torch
x = torch.tensor([[1, 2, 2],[2, 1, 3]])

x = x.transpose(0, 1)
print(x)
y = x.view(-1)
print(y)

In this code, we transpose the tensor x and then try to change its shape with the tensor.view() function.

Running this code raises the following error:

y = x.view(-1)
RuntimeError: view size is not compatible with input tensor's size and stride

To make tensor.view() work, we can first convert x to a contiguous tensor.

For example:

import torch
x = torch.tensor([[1, 2, 2],[2, 1, 3]])

x = x.transpose(0, 1)
print(x)
x = x.contiguous()
y = x.view(-1)
print(y)

Running this code prints:

tensor([[1, 2],
        [2, 1],
        [2, 3]])
tensor([1, 2, 2, 1, 2, 3])

In this example, we call x.contiguous() to get a contiguous tensor before calling x.view(), so x.view() works as expected.

is_contiguous() syntax

Tensor.is_contiguous()

It returns True if the Tensor is contiguous; False otherwise.

Let’s take a couple of examples to demonstrate how to use this function to check if a tensor is contiguous or non-contiguous.

Example 1

# import torch library
import torch

# define a torch tensor
A = torch.tensor([1. ,2. ,3. ,4. ,5. ,6.])
print(A)

# find a view of the above tensor
B = A.view(-1,3)
print(B)

print("id(A):", id(A))
print("id(A.view):", id(A.view(-1,3)))
# check if A or A.view() are contiguous or not
print(A.is_contiguous()) # True
print(A.view(-1,3).is_contiguous()) # True
print(B.is_contiguous()) # True

Output

tensor([1., 2., 3., 4., 5., 6.])
tensor([[1., 2., 3.],
        [4., 5., 6.]])
id(A): 80673600
id(A.view): 63219712
True
True
True

Example 2

# import torch library
import torch

# create a torch tensor
A = torch.tensor([[1.,2.],[3.,4.],[5.,6.]])
print(A)

# take transpose of the above tensor
B = A.transpose(0,1)
print(B)
print("id(A):", id(A))
print("id(A.transpose):", id(A.transpose(0,1)))

# check if A or A transpose are contiguous or not
print(A.is_contiguous()) # True
print(A.transpose(0,1).is_contiguous()) # False
print(B.is_contiguous()) # False

Output

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[1., 3., 5.],
        [2., 4., 6.]])
id(A): 63218368
id(A.transpose): 99215808
True
False
False
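
Note that id() in the examples above identifies Python objects, so different ids do not tell you whether two tensors share memory. A small sketch of a more direct check, assuming data_ptr() (the address of a tensor's first element) is sufficient for this case:

```python
import torch

A = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])
B = A.transpose(0, 1)

# id() identifies the Python wrapper objects, which are distinct.
print(id(A) == id(B))
# data_ptr() is the address of the first element each tensor reads.
print(A.data_ptr() == B.data_ptr())

B[0, 1] = 30.0   # write through the transposed view
print(A[1, 0].item())
```

The write through B shows up in A, confirming that the transpose is a view even though the two Python objects are different.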

Summary

  • A ‘view’ uses the same block of memory as the original tensor, so any change within this memory block affects all the views and the original tensor associated with it.
  • A ‘view’ can be contiguous or non-contiguous.
  • A non-contiguous tensor view can be converted to a contiguous one. This makes a copy, so the data is no longer associated with the original data chunk.
  • Stride position formula: given strides (A, B, C), the position of the index (j, k, v) in the 1D data array is (A*j + B*k + C*v).
  • Difference between view() and reshape():
    view() always returns a view. It cannot be applied to a ‘non-contiguous’ tensor / view whose strides are incompatible with the requested shape.
    reshape() can be applied to both ‘contiguous’ and ‘non-contiguous’ tensors / views. When possible, it returns a view; otherwise (typically when the data is non-contiguous), it makes a new copy.

Resources:

https://medium.com/analytics-vidhya/pytorch-contiguous-vs-non-contiguous-tensor-view-understanding-view-reshape-73e10cdfa0dd

Amir Masoud Sefidian
Machine Learning Engineer
