
As everyone knows, Python is an object-oriented language, and everything in Python is an object. So how do we tell whether two objects are the same object?

The == and is operators

You are probably familiar with both operators. Specifically, == compares the values of two objects to see whether they are equal, while is checks whether they are the same object; in other words, whether both names refer to the same memory address.

As noted above, everything in Python is an object, and every object has three elements: an id (unique identity), a type, and a value. The id can be obtained with the function id(obj). The is operator is therefore equivalent to comparing the ids of two objects, while == is equivalent to comparing their values.
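
A minimal sketch of the three elements, using throwaway lists (the names a, b, and c are chosen only for illustration). It shows that is agrees with comparing id() values, while == looks at the value:

```python
a = [1, 2]
b = a          # a second name for the same object
c = list(a)    # a new object with an equal value

print(type(a))             # the type element: <class 'list'>
print(id(a) == id(b))      # same identity, so this prints True

same_ab = a is b           # identity comparison
same_ac = a is c
equal_ac = a == c          # value comparison
print(same_ab, same_ac, equal_ac)  # True False True
```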

Let's look at a few examples.

>>> a = 'red'
>>> b = 'red'
>>> a == b
True

Here we created two objects, a and b, both holding the string 'red'. Unsurprisingly, == returns True.

>>> a = 256
>>> b = 256
>>> a == b
True
>>> a is b
True

>>> a = 257
>>> b = 257
>>> a == b
True
>>> a is b
False

Again, == reports equality whether a and b are 256 or 257. Strangely, though, the result of the is operation depends on the size of the value.

In fact, a is b is True only for integers from -5 to 256, because CPython caches the integers in this range for performance. Assigning an integer in that range (-5 to 256) does not create a new object; a pre-created cached object is used instead. Outside the cache range, two separate objects are allocated at different memory addresses, so is returns False.

If you are not convinced, print their ids and take a look.

>>> a = 256
>>> b = 256
>>> id(a)
4525792016
>>> id(b)
4525792016

>>> a = 257
>>> b = 257
>>> id(a)
4528947760
>>> id(b)
4528947856
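
The same check can be made without depending on how literals are compiled, by building the integers at run time. This is only a sketch and it assumes CPython; the cache range is an implementation detail, not something the language guarantees.

```python
# Build the integers at run time so the compiler cannot merge equal literals.
small_a = int("256")
small_b = int("256")
large_a = int("257")
large_b = int("257")

small_same = small_a is small_b   # both names refer to the cached 256 object
large_same = large_a is large_b   # two separately allocated 257 objects

print(small_same, large_same)                   # on CPython: True False
print(small_a == small_b, large_a == large_b)   # True True
```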

However, when you put the same code into a file in an editor and run it, you will be surprised to find that the result seems to conflict with the caching mechanism just described.


Didn't we say that only integers from -5 to 256 are cached? Why does is also return True for a larger number?

Judging from the result, the two large integers a and b must share the same memory address; otherwise is would not return True. The reason is that in interactive mode (the terminal window) each command is its own code block, and Python compiles and executes the input command by command. In a file, a function, a class, or the whole module is a code block that Python compiles as a unit, so a constant with the same value is created only once within the block, and a second use of the same value reuses the existing object.
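
The effect of code blocks can be reproduced from a single script with the built-in compile() and exec(). This is a rough sketch assuming CPython: the first run compiles all three statements as one block, the second compiles them one at a time, which approximates what the interactive prompt does. The namespace names and the "<demo>" filename are made up for the example.

```python
source = "a = 257\nb = 257\nsame = a is b\n"

# Compile the three statements as ONE code block, as happens for a file.
block_ns = {}
exec(compile(source, "<demo>", "exec"), block_ns)
same_in_one_block = block_ns["same"]

# Compile each statement separately, roughly like the interactive prompt.
line_ns = {}
for line in source.splitlines():
    exec(compile(line, "<demo>", "exec"), line_ns)
same_line_by_line = line_ns["same"]

print(same_in_one_block)   # True on CPython: one shared constant in the block
print(same_line_by_line)   # typically False on CPython, but not guaranteed
```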

Compiling here refers to CPython's step of compiling source code into bytecode.

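To sum up: when a and b are created separately with equal values, a is b returns True only if the value is a number or string that falls within the cached range (or, as just described, if both come from the same constant in one code block). Otherwise, whether a and b are int, str, list, tuple, set, or dict, a is b generally returns False.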

In fact, testing shows that Python does not cache strings that contain spaces. After a long search, I finally found an explanation in stringobject.h: the interpreter uses an intern mechanism to make string operations more efficient. When a string with the same value already exists in memory, it is reused instead of creating a new string object with the same value. This does not mean that every string is interned, though. Only strings that look like Python identifiers are cached:

This is generally restricted to strings that "look like" Python identifiers, although the intern() builtin can be used to force interning of any string.
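
In Python 3 the intern() builtin mentioned in that comment lives in the sys module as sys.intern(). A small sketch of forcing a string with a space to be interned; the strings are built at run time so that the compiler cannot merge them:

```python
import sys

# Two equal strings built at run time. They contain a space and are not
# interned automatically, so they are two distinct objects.
a = " ".join(["hello", "world"])
b = " ".join(["hello", "world"])

equal = a == b
same_before = a is b

# sys.intern returns the single canonical object for a given string value.
ia = sys.intern(a)
ib = sys.intern(b)
same_after = ia is ib

print(equal, same_before, same_after)  # True False True
```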

In addition, is is faster than ==. The is operator cannot be overloaded; it only compares the ids of the two objects. The == operator calls the object's __eq__ method, which for containers walks through the elements recursively and compares them one by one.
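
To illustrate the overloading point, here is a sketch with a hypothetical Version class (not part of the original sample code). Defining __eq__ changes what == means, while is keeps comparing identities:

```python
class Version:
    """Hypothetical value type used only for this illustration."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __eq__(self, other):
        # == is routed here; compare field by field.
        if not isinstance(other, Version):
            return NotImplemented
        return (self.major, self.minor) == (other.major, other.minor)


v1 = Version(3, 8)
v2 = Version(3, 8)

equal = v1 == v2      # uses Version.__eq__
same = v1 is v2       # identity only; no method can change this
print(equal, same)    # True False
```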

As an aside: if you know Java, you will notice that == in Python is similar to equals in Java, and is in Python is similar to the == comparison in Java.

Copying objects

Copying an object means creating a new object. Python has two copy modes: shallow copy and deep copy.

When the top-level object and all of its child elements are immutable, no copy takes place, because no new object is created. The difference between the two modes is that a shallow copy copies only the top-level object and not the child objects inside it, whereas a deep copy recursively copies the child objects inside the top-level object as well.

A shallow copy can be made with the type's own constructor, with slicing, or with the copy.copy() function.

a = [1, 'hello', [1, 2]]
b = list(a)

a[0] = 100
a[2][0] = 100
print(a)
print(b)

## Output:
[100, 'hello', [100, 2]]
[1, 'hello', [100, 2]]

For a mutable top-level object whose child is immutable, "modifying" the child actually rebinds the reference to a different object. In the example above, the integer 1 is not changed into 100; instead a[0] is made to point to 100, so b[0] is unaffected.

If the child is mutable, as a[2] is here, then because the copy is shallow, a[2] and b[2] both point to the same list object. After a[2][0] is set to 100, b[2][0] shows the change as well.

A shallow copy by slicing:

>>> a = [1, 2, 3]
>>> b = a[:]
>>> a == b
True
>>> a is b
False

For an immutable top-level object, however, no copy is made: both names point to the same object and no new object is created.

>>> a = (1, 2, 3)
>>> b = tuple(a)
>>> a == b
True
>>> a is b
True

As you can see, a shallow copy is safe when the elements are immutable, with no side effects; when the elements are mutable, watch out for side effects.
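
The three shallow-copy routes mentioned earlier (constructor, slice, and copy.copy()) can be compared side by side. A short sketch, reusing the list from the example above; the dictionary of copies is just a convenience for the loop:

```python
import copy

a = [1, 'hello', [1, 2]]
copies = {
    "constructor": list(a),
    "slice": a[:],
    "copy.copy": copy.copy(a),
}

# Mutate the nested list through the original.
a[2].append(3)

for name, b in copies.items():
    # Each copy is a new top-level list that still shares the nested list.
    print(name, b is a, b[2] is a[2], b)
```

Every line of output reports False for b is a and True for b[2] is a[2], and every copy shows the appended 3, which is exactly the side effect described above.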

A deep copy recursively copies the top-level object and its internal child objects, so the new object and the old one share nothing. Python provides the copy.deepcopy() function for deep copies.

import copy

a = [1, 'hello', [1, 2]]
b = copy.deepcopy(a)

a[0] = 100
a[2][0] = 100
print(a)
print(b)

## Output:
[100, 'hello', [100, 2]]
[1, 'hello', [1, 2]]

A deep copy copies the child objects as well as the top-level object, so a[2] and b[2] point to two different lists. Setting a[2][0] rebinds that slot to a new integer in a's list only and does not affect b[2].

>>> import copy
>>> a = (1, 2, 3)
>>> b = copy.deepcopy(a)
>>> a is b
True

>>> a = (1, 2, [1, 2])
>>> b = copy.deepcopy(a)
>>> a is b
False
>>> a[2][0] = 100
>>> a
(1, 2, [100, 2])
>>> b
(1, 2, [1, 2])

For an immutable object whose children are all immutable, a deep copy behaves like a shallow copy: both names point to the same memory address.

But if the children include a mutable object, the deep copy is no longer the original object, because the mutable child has been duplicated. In the example, a[2] and b[2] no longer point to the same list, so modifying a[2] does not affect b[2].

If the object being deep-copied contains a reference to itself, does the copy loop forever?

The answer is no. The deep copy function maintains a dictionary internally that records the objects already copied, keyed by their id. During copying, if the object to be copied is already in the dictionary, the stored copy is returned directly and no further recursion takes place.
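
A short sketch of the self-referencing case. The second half passes an explicit memo dictionary to copy.deepcopy() to peek at the bookkeeping; the exact contents of that dictionary are a CPython implementation detail and are shown here only as an illustration.

```python
import copy

a = [1, 2]
a.append(a)              # a now contains a reference to itself

b = copy.deepcopy(a)     # finishes without infinite recursion

is_new = b is not a          # the copy is a different list
keeps_cycle = b[2] is b      # the cycle points at the copy ...
not_old = b[2] is not a      # ... and not back at the original
print(is_new, keeps_cycle, not_old)  # True True True

# Pass our own memo dictionary to see what deepcopy recorded.
memo = {}
c = copy.deepcopy(a, memo)
memo_has_copy = memo[id(a)] is c     # original's id maps to its copy
print(memo_has_copy)
```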

Summary

This article covered comparing and copying objects. is compares whether the ids of two objects are equal, and == compares whether their values are equal. Small integers in [-5, 256] are cached and reused, and is is faster than ==. A shallow copy shares references to the child objects of the original, which can cause side effects; a deep copy creates new child objects and does not interfere with the original. Hopefully this gives you a deeper understanding of objects in Python.

Code address

Sample code: https://github.com/JustDoPython/python-100-day/tree/master/day-118

