Have you ever wondered why, when we write Python code for a project, we keep creating objects and using up memory, yet rarely clean that memory up ourselves? In theory the program should eventually run out of memory (overflow), but every time we start Python it runs "safe and sound". Is that really just because your computer has a lot of memory?
Not at all. Mature software has its own memory management and garbage collection mechanisms rather than relying solely on the hardware.
Python has a garbage collection mechanism too, and it is a question interviewers like to ask: how do Python's memory management and garbage collection work?
We often pay too much attention to the surface and ignore what lies underneath.
It is like driving: if all you know is refueling, inserting the key, pressing the accelerator, braking and turning the steering wheel, and you know nothing about the engine compartment, you are not a seasoned driver, and sooner or later you will spend a night stranded on the road.
Today, let's talk about how Python's memory management and garbage collection work, so that you understand Python better and are not caught out the next time the question comes up.
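In Python's C source code there is a circular doubly linked list called refchain. This list is remarkable: every object created in a Python program is added to refchain, so it keeps track of all objects. For example, creating the integer 18 and the string "Zhang San" adds both objects to the list. (Strictly speaking, CPython only maintains this list in special debug builds compiled with Py_TRACE_REFS, but it is a convenient model for the rest of this discussion.)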
Every object in refchain has an ob_refcnt field, a reference counter for that object. As the name suggests, it records how many references currently point to the object.
For example, if one name is bound to 18 and two names are bound to the same string "Zhang San", memory holds two values, 18 and "Zhang San", whose reference counters are 1 and 2 respectively.
When a value is referenced more than once, the data is not duplicated in memory; its reference counter is simply incremented by 1. When a reference to the object is removed, its reference counter is decremented by 1. If the reference counter reaches 0, the object is removed from the refchain linked list and destroyed in memory at the same time (ignoring special cases such as caching for now).
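The counting behaviour described above can be observed from Python itself. The following is a minimal sketch, assuming CPython; the Person class, the events list and the variable names are made up for the illustration. Because sys.getrefcount counts a temporary reference created by the call itself, only the differences between readings are meaningful.

```python
import sys

events = []


class Person:
    """Hypothetical class used only for this illustration."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        # Runs when the object is about to be destroyed.
        events.append(f"{self.name} destroyed")


p = Person("Zhang San")
base = sys.getrefcount(p)          # includes a temporary reference made by the call

alias = p                          # a second name for the same object
after_alias = sys.getrefcount(p)   # one more than base

del alias                          # removing a name gives the reference back
after_del = sys.getrefcount(p)     # back to base

del p                              # last reference gone: the counter reaches 0
print(after_alias - base, after_del - base, events)
```

In CPython the object is destroyed as soon as the last del runs, which is why the destructor message is already in events when the final line prints.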
Garbage collection based on reference counting is convenient and simple, but it has a problem with circular references: some data can never be reclaimed normally.
Suppose two list objects each hold a reference to the other, and del is then applied to both variable names. After the del operations no variable uses those two list objects any more, but because of the circular reference their reference counters are not 0. They end up in a state where they are never used again yet never destroyed. If a project contains a lot of such code, memory keeps being consumed until it runs out and the program crashes.
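Here is a small sketch of that situation, assuming CPython. The variable names and list contents are made up for the illustration. The gc calls are only there to make the leftover objects observable: automatic collection is switched off so that nothing cleans up the cycle behind our back, and gc.collect() then reports how many unreachable objects it found.

```python
import gc

gc.disable()           # keep the automatic collector out of the way for the demo
gc.collect()           # start from a clean state

v1 = [11, 22, 33]
v2 = [44, 55, 66]
v1.append(v2)          # v1 now references v2
v2.append(v1)          # v2 now references v1

del v1
del v2                 # both names are gone, but each list still references the other

found = gc.collect()   # number of unreachable objects the cycle collector found
print(found)

gc.enable()
```

Reference counting alone would never free these two lists, because each one keeps the other's counter above 0.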
To solve the circular reference problem, Python introduces the mark-and-sweep technique, which gives special treatment to objects that may contain circular references. The types that can form circular references are those that can nest other data: lists, tuples, dictionaries, sets, custom classes and so on.
Mark-and-sweep: a dedicated linked list stores the list, tuple, dictionary, set and custom-class objects. The objects in this linked list are then checked for circular references, and where one exists the reference counters of both objects involved are decremented by 1.
Generational collection: an optimization of the mark-and-sweep linked list. The objects that may contain circular references are split across 3 linked lists, called generations 0, 1 and 2. Each generation stores objects and has a threshold. When the threshold is reached, every object in the corresponding linked list is scanned, the counts contributed by circular references are each decremented by 1, and objects whose reference counter is then 0 are destroyed.
Note in particular that count and threshold mean different things for generation 0 than for generations 1 and 2.
Generation 0: count is the number of objects in the generation 0 linked list, and threshold is the limit on that number. Once it is exceeded, a generation 0 scan is performed.
Generation 1: count is the number of times the generation 0 linked list has been scanned, and threshold is the limit on that number. Once it is exceeded, a generation 1 scan is performed.
Generation 2: count is the number of times the generation 1 linked list has been scanned, and threshold is the limit on that number. Once it is exceeded, a generation 2 scan is performed.
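The per-generation counts and thresholds can be read from the standard gc module. This is a short sketch; the exact numbers depend on the interpreter version and on what the program has done so far, so no particular output is shown.

```python
import gc

thresholds = gc.get_threshold()   # (threshold0, threshold1, threshold2)
counts = gc.get_count()           # (count0, count1, count2)

for generation, (count, threshold) in enumerate(zip(counts, thresholds)):
    print(f"generation {generation}: count={count}, threshold={threshold}")
```

The thresholds can be changed with gc.set_threshold() if a program needs collections to run more or less often.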
The following walks through the detailed process of memory management and garbage collection, based on the underlying C implementation.
Step 1: when the object for age = 19 is created, it is added to the refchain linked list.
Step 2: when the object for num_list = [11, 22] is created, the list object is added both to refchain and to generation 0 of generations.
Step 3: when a newly created object pushes the number of objects on the generation 0 linked list of generations above the threshold of 700, the objects on the linked list are scanned and checked.
When generation 0 exceeds its threshold, the underlying code does not scan generation 0 right away. It first checks whether generations 2 and 1 have also exceeded their thresholds:
If neither generation 2 nor generation 1 has reached its threshold, generation 0 is scanned and the count of generation 1 is incremented by 1.
If generation 2 has reached its threshold, the three linked lists 2, 1 and 0 are spliced together for a full scan, and the counts of generations 2, 1 and 0 are reset to 0.
If generation 1 has reached its threshold, the two linked lists 1 and 0 are spliced together and scanned, and the counts of generations 1 and 0 are reset to 0.
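The decision about which generations to scan can be modelled in a few lines of Python. This is a toy model of the rules described above, not the C implementation: the function names are made up, and the thresholds of 10 for generations 1 and 2 are an assumption chosen for the example (only the 700 comes from the text).

```python
def choose_generation(counts, thresholds):
    """Return the oldest generation whose count exceeds its threshold, or None."""
    for generation in (2, 1, 0):
        if counts[generation] > thresholds[generation]:
            return generation
    return None


def collect(counts, thresholds):
    """Update counts in place for one pass; return the generations scanned."""
    generation = choose_generation(counts, thresholds)
    if generation is None:
        return []
    scanned = list(range(generation, -1, -1))   # e.g. [1, 0] for generation 1
    for g in scanned:
        counts[g] = 0                           # scanned generations start over
    if generation < 2:
        counts[generation + 1] += 1             # the next generation records this scan
    return scanned


thresholds = (700, 10, 10)   # 10 and 10 are assumed values for this example

counts = [701, 0, 0]
print(collect(counts, thresholds), counts)   # [0] [0, 1, 0]

counts = [701, 11, 0]
print(collect(counts, thresholds), counts)   # [1, 0] [0, 0, 1]

counts = [701, 11, 11]
print(collect(counts, thresholds), counts)   # [2, 1, 0] [0, 0, 0]
```

Checking the oldest generation first means that one pass over the spliced lists covers every younger generation as well.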
Scanning the spliced linked list is mainly about eliminating circular references and destroying garbage. The detailed process is:
First, scan the linked list and copy each object's reference counter into gc_refs, so that the original reference counter is protected.
Second, scan each object in the linked list again and check for circular references: for every reference it holds to another object in the list, decrement that object's gc_refs by 1.
Third, scan the linked list once more and move the objects whose gc_refs is 0 to the unreachable linked list; objects whose gc_refs is not 0 are promoted directly to the next generation's linked list.
Fourth, handle the destructors and weak references of the objects in the unreachable linked list. Objects that cannot be destroyed are promoted to the next generation's linked list; those that can be destroyed stay in this linked list.
A destructor here means the __del__ method: objects that define it need it to be executed before they are destroyed.
Weak references to these objects also need to be handled before destruction.
Finally, each object in unreachable is destroyed and removed from the refchain linked list (ignoring the caching mechanism).
At this point the garbage collection process is over.
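The scanning passes above can be sketched as a toy model over a small object graph. Everything here is hypothetical: objects are represented by names, refcounts stands in for ob_refcnt, and references lists which objects each one points to. The last pass also keeps any object that is reachable from a survivor, a detail the summary above glosses over but which, as far as I know, CPython's collector also takes care of.

```python
def find_unreachable(objects, refcounts, references):
    """Toy model of the scan.

    objects:    names of the container objects in the spliced linked list
    refcounts:  name -> real reference counter (ob_refcnt)
    references: name -> names of the objects it refers to
    """
    # Pass 1: copy every reference counter into gc_refs.
    gc_refs = {name: refcounts[name] for name in objects}

    # Pass 2: subtract the references that come from inside the list.
    for name in objects:
        for target in references.get(name, ()):
            if target in gc_refs:
                gc_refs[target] -= 1

    # Pass 3: objects with gc_refs > 0 are still referenced from outside;
    # they and everything reachable from them survive.
    alive = set()
    stack = [name for name in objects if gc_refs[name] > 0]
    while stack:
        name = stack.pop()
        if name in alive:
            continue
        alive.add(name)
        stack.extend(t for t in references.get(name, ()) if t in gc_refs)

    unreachable = [name for name in objects if name not in alive]
    survivors = [name for name in objects if name in alive]
    return unreachable, survivors


# a and b only reference each other; c is held by a variable and refers to d.
objects = ["a", "b", "c", "d"]
refcounts = {"a": 1, "b": 1, "c": 1, "d": 1}
references = {"a": ["b"], "b": ["a"], "c": ["d"]}

unreachable, survivors = find_unreachable(objects, refcounts, references)
print(unreachable)   # ['a', 'b']
print(survivors)     # ['c', 'd']
```

Working on a copy in gc_refs is what lets the scan subtract internal references without ever touching the real reference counters.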
As described above, when an object's reference counter reaches 0 it is destroyed and its memory is released. In practice Python is not quite so simple and crude, because repeatedly creating and destroying objects would make the program inefficient.
So Python introduces a "caching mechanism".
For example, when the reference counter reaches 0 the object is not really destroyed. Instead it is put into a linked list called free_list. When an object of that type is created later, no new memory is allocated; a previous object is taken from free_list, its internal value is reset, and it is used again.
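The idea of a free list can be illustrated with a small model written in Python. This only sketches the concept and is not how CPython implements it in C: the Slot and FreeList classes, their method names and the limit of 100 are all made up for the example.

```python
class Slot:
    """Hypothetical stand-in for a cached object."""

    def __init__(self, value):
        self.value = value


class FreeList:
    """Keeps up to max_size released objects around for reuse."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._free = []

    def create(self, value):
        if self._free:
            slot = self._free.pop()    # reuse a released object
            slot.value = value         # reset its internal value
            return slot
        return Slot(value)             # nothing cached: allocate a new one

    def release(self, slot):
        if len(self._free) < self.max_size:
            self._free.append(slot)    # keep it for later reuse
        # otherwise the slot is dropped and freed in the normal way


pool = FreeList(max_size=100)
first = pool.create(3.14)
pool.release(first)
second = pool.create(2.71)
reused = second is first
print(reused, second.value)   # True 2.71
```

Capping the list size bounds how much memory the cache itself can hold on to.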
float: maintains a free_list linked list that can cache at most 100 float objects.
int: not based on free_list. Instead it maintains a small_ints array holding commonly used values (the small integer pool), covering the range -5 <= value < 257. That is, reusing an integer in this range does not allocate new memory.
str: maintains a unicode_latin1[256] array that caches all single-character Latin-1 strings (which include the ASCII characters), so they are never created again later.
In addition, Python strings have an interning mechanism. For strings that contain only letters, digits and underscores (see Objects/codeobject.c in the source code), a string that already exists in memory is not created again and the original address is used instead. (Unlike objects held in a free_list, such strings are not kept in memory permanently; an existing one can only be reused while it is still in memory.)
list: maintains a free_list array that can cache at most 80 list objects.
tuple: maintains a free_list array with a capacity of 20. Each element of the array is a linked list, and each linked list can hold at most 2000 tuple objects. When a tuple is stored in free_list, the number of items the tuple can hold is used as the index to find the corresponding linked list in the free_list array, and the tuple is added to that linked list.
dict: maintains a free_list array that can cache at most 80 dict objects.
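Two of these caches are easy to observe from Python. The sketch below assumes CPython; the variable names and sample values are made up. The values are built at run time so that the compiler cannot merge equal constants, and sys.intern is used to request interning explicitly rather than relying on which strings the interpreter interns on its own.

```python
import sys

# Small integer pool: equal values in the cached range are the same object.
a = int("100")
b = int("50") + 50
same_small_int = a is b            # True in CPython

# String interning: two equal strings built separately...
s1 = "".join(["py", "thon_", "gc"])
s2 = "".join(["python", "_gc"])
before = s1 is s2                  # usually False: two separate str objects

# ...become one shared object once both are interned.
s1 = sys.intern(s1)
s2 = sys.intern(s2)
after = s1 is s2                   # True: both names refer to the interned string

print(same_small_int, after)
```

Identity checks with is are used here only to expose the caching; ordinary code should compare numbers and strings with ==.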