The following is an experiment exploring how PyTorch allocates GPU memory (video memory). I ran it in a Jupyter notebook inside VSCode. First, import PyTorch only:
import torch
Open Task Manager to check main memory and GPU memory usage:
Create a 1 GB tensor in GPU memory and assign it to a:
a = torch.zeros([256,1024,1024], device='cuda')
Check main and GPU memory:
Both main memory and GPU memory grow, and GPU memory grows by more than 1 GB. The extra space holds the CUDA context and other state that PyTorch needs at runtime; we ignore it here.
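As a sanity check on the 1 GB figure: torch.zeros defaults to float32, so a tensor of shape [256, 1024, 1024] holds 256×1024×1024 four-byte elements. A quick back-of-the-envelope computation (pure Python, no GPU needed):

```python
# Size of a float32 tensor of shape [256, 1024, 1024]:
# torch.zeros defaults to dtype float32, i.e. 4 bytes per element.
num_elements = 256 * 1024 * 1024
bytes_per_float32 = 4
size_bytes = num_elements * bytes_per_float32
size_gib = size_bytes / (1024 ** 3)
print(size_gib)  # 1.0 -> exactly 1 GiB
```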
Create another 1 GB tensor in GPU memory and assign it to b:
b = torch.zeros([256,1024,1024], device='cuda')
Check main and GPU memory:
This time main memory is unchanged and GPU memory grows by 1 GB, as expected. Next, move b to main memory:
b = b.to('cpu')
Check main and GPU memory:
Main memory grows by 1 GB, but GPU memory shrinks by only about 0.1 GB, as if the tensor had been copied to main memory rather than moved. In fact, PyTorch does copy the tensor to main memory, but it also records that the tensor's block of GPU memory is no longer in use. Now create another 1 GB tensor and assign it to c:
c = torch.zeros([256,1024,1024], device='cuda')
Check main and GPU memory:
Only GPU memory changes, and it grows by just about 0.1 GB. This shows that PyTorch does track which GPU blocks have been vacated: it does not release such a block back to the driver right away, but keeps it and reuses (overwrites) it the next time a new tensor is created. Next, run the same line again:
c = torch.zeros([256,1024,1024], device='cuda')
Main and GPU memory now look like this:
We clearly overwrote the tensor bound to c, yet GPU memory grows. Why? When PyTorch runs this line, it first finds a free location in GPU memory, builds the new 1 GB tensor there, and only then rebinds the name c. While the new tensor is being created, the old c still holds its 1 GB block, so PyTorch has no choice but to take another 1 GB of GPU memory for the new tensor before reassigning c. Afterwards the old block becomes free, but as noted above, PyTorch does not release it immediately; it keeps it cached for the next allocation, so the reported GPU memory does not shrink.
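The behavior above can be modeled with a toy caching allocator. This is a deliberate simplification of what PyTorch's CUDA caching allocator does, and all names here are invented for illustration: freed blocks go into a pool instead of back to the driver, a new allocation reuses a pooled block of the right size, and rebinding a name allocates before it frees.

```python
# Toy model of a caching allocator: freed blocks are pooled and
# reused, not returned to the driver. All names are hypothetical.
class ToyCachingAllocator:
    def __init__(self):
        self.reserved = 0      # what the "driver" (Task Manager) would see
        self.free_pool = []    # cached block sizes, available for reuse

    def alloc(self, size):
        if size in self.free_pool:   # reuse a cached block
            self.free_pool.remove(size)
        else:                        # grab new memory from the driver
            self.reserved += size
        return size

    def free(self, size):
        self.free_pool.append(size)  # cache it; reserved is unchanged

GB = 1
allocator = ToyCachingAllocator()

c = allocator.alloc(GB)       # c = torch.zeros(..., device='cuda')
print(allocator.reserved)     # 1: one block reserved

# c = torch.zeros(..., device='cuda') again:
new = allocator.alloc(GB)     # new tensor built while old c still lives
allocator.free(c)             # old block is freed only after rebinding
c = new
print(allocator.reserved)     # 2: reserved memory grew, as in the experiment

d = allocator.alloc(GB)       # d reuses the pooled block
print(allocator.reserved)     # 2: unchanged
```

In the second step the allocator holds two blocks at once, which is exactly the transient doubling observed in the experiment.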
To verify this conjecture, build another 1 GB tensor and assign it to d:
d = torch.zeros([256,1024,1024], device='cuda')
Main and GPU memory now look like this:
GPU memory is unchanged, because PyTorch builds the new tensor in the block freed by c in the previous step and then binds it to d. Note that deleting a variable also does not immediately release GPU memory:
del d
Main and GPU memory now look like this:
GPU memory is unchanged; the freed block is again kept cached, waiting to be reused.
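PyTorch exposes this caching behavior directly: torch.cuda.memory_allocated() counts memory held by live tensors, torch.cuda.memory_reserved() counts what the caching allocator holds from the driver (roughly what Task Manager shows), and torch.cuda.empty_cache() returns cached, unused blocks to the driver. A minimal sketch (guarded so it is a no-op without a GPU; a smaller 64 MB tensor is used here):

```python
import torch

# memory_allocated() = live tensor memory; memory_reserved() = memory
# the caching allocator holds from the driver, including cached blocks.
if torch.cuda.is_available():
    x = torch.zeros([16, 1024, 1024], device='cuda')  # 64 MB, float32
    del x
    # The block is cached, not returned, so reserved stays high
    # even though allocated has dropped.
    print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())
    # empty_cache() hands cached, unoccupied blocks back to the driver.
    torch.cuda.empty_cache()
```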
Continuing the experiment, build a 1 GB tensor directly in main memory and assign it to e:
e = torch.zeros([256,1024,1024], device='cpu')
Main and GPU memory now look like this:
Main memory grows by 1 GB, as expected. Now move e to GPU memory:
e = e.to('cuda')
Main and GPU memory now look like this:
Main memory shrinks by 1 GB, and GPU memory is unchanged because the new tensor reuses the block freed when d was deleted, as expected. Unlike GPU memory, main memory is released immediately.
From these experiments we learned that PyTorch does not immediately release the GPU memory of dead tensors; instead it caches freed blocks and reuses them for later allocations. Also, if you want to rebuild a large tensor in GPU memory, it is better to move it to main memory first, or delete it outright, before creating the new value; otherwise the operation briefly needs twice the memory and may run out of GPU memory.
The full experimental code is summarized below:
#%%
import torch
#%%
a = torch.zeros([256,1024,1024], device='cuda')
#%%
b = torch.zeros([256,1024,1024], device='cuda')
#%%
b = b.to('cpu')
#%%
c = torch.zeros([256,1024,1024], device='cuda')
#%%
c = torch.zeros([256,1024,1024], device='cuda')
#%%
d = torch.zeros([256,1024,1024], device='cuda')
#%%
del d
#%%
e = torch.zeros([256,1024,1024], device='cpu')
#%%
e = e.to('cuda')