As everyone knows, a Python program can be run directly with the python command (python.exe on Windows), as shown below:
python abc.py
Notice that python runs abc.py directly. Many readers may therefore think that python interprets and executes the abc.py source as it is, but it does not. If it really interpreted the source directly, it would be too slow to use. In fact, Python, like Java, works with bytecode. Java's bytecode is called Java bytecode, and Python's is called Python bytecode. When Python runs abc.py, it first compiles the source file into bytecode and then executes the bytecode. Of course, you can also generate a bytecode file directly (with the extension .pyc) and then execute that bytecode file.
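The compile-then-execute model can be observed from inside Python itself. The following is a minimal sketch using the built-in compile and exec functions; the source string and the "<demo>" file name are made up for illustration and are not part of the original example.

```python
# A minimal sketch: compile source text to a code object, then execute it.
source = "value = 20 + 1\n"

# compile() turns source text into a code object that holds the bytecode.
code = compile(source, "<demo>", "exec")   # "<demo>" is a placeholder name

print(type(code).__name__)   # the compiled result is a code object
print(len(code.co_code))     # number of raw bytecode bytes (version dependent)

# exec() runs the compiled bytecode; results land in the given namespace.
namespace = {}
exec(code, namespace)
print(namespace["value"])
```

The two steps are separate here only to make them visible; the python command performs both for you when it runs a script.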
Python programs are usually released as source code. If the code contains sensitive information and you do not want to publish the source, you can publish the bytecode instead. Of course, bytecode can also be decompiled. To make Python code more secure, you can build your own private Python environment, which we will discuss later.
Readers who have never worked with Python bytecode probably have plenty of questions, so read on!
1. How to view Python bytecode
Let's first look at Python bytecode, to prove that running a Python script really does compile the Python code into bytecode first and then execute the bytecode, rather than executing the Python source directly.
First look at the following code:
value = 20
def fun():
    global value
    value = 30
    name = 'Bill'
    print(value)
    print(name)
fun()
print(value)
import dis
dis.disassemble(fun.__code__)
This code defines a function fun that uses the global variable value and the local variable name, and prints the values of both. At the end it imports the dis module. This module has a disassemble function that prints the bytecode of a code object, such as the one stored in a function's __code__ attribute.
Running this code produces the following output:
30
Bill
30
  4           0 LOAD_CONST               1 (30)
              2 STORE_GLOBAL             0 (value)

  5           4 LOAD_CONST               2 ('Bill')
              6 STORE_FAST               0 (name)

  6           8 LOAD_GLOBAL              1 (print)
             10 LOAD_GLOBAL              0 (value)
             12 CALL_FUNCTION            1
             14 POP_TOP

  7          16 LOAD_GLOBAL              1 (print)
             18 LOAD_FAST                0 (name)
             20 CALL_FUNCTION            1
             22 POP_TOP
             24 LOAD_CONST               0 (None)
             26 RETURN_VALUE
As you can see, disassemble prints something that looks like assembly code. This is the human-readable form of Python bytecode, and each line corresponds to one bytecode instruction. So why look at bytecode? For application developers, the most direct benefit is a better understanding of how Python source code works.
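If you want the instructions as data rather than as printed text, the dis module also provides get_instructions. The sketch below is only an illustration under the assumption that a shortened version of fun is enough; the exact offsets, opcode numbers, and even some instruction names differ between Python versions, so your output will probably not match the listing above line for line.

```python
import dis

value = 20

def fun():
    global value
    value = 30
    name = 'Bill'

# Each Instruction carries the byte offset, the numeric opcode, its readable
# name, and a printable description of the operand.
rows = [(ins.offset, ins.opcode, ins.opname, ins.argrepr)
        for ins in dis.get_instructions(fun)]

for offset, opcode, opname, argrepr in rows:
    print(f"{offset:>4} {opcode:>4} {opname:<20} {argrepr}")

opnames = [row[2] for row in rows]
```

Working with the instruction objects makes it easy to search for a particular instruction instead of reading the listing by eye.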
For example, this code uses a global variable through the global keyword. So what does the global keyword actually do? The bytecode makes it easy to see.
The Python source contains two assignments:
value = 30
name = 'Bill'
Here value is a global variable and name is a local variable of the fun function. Compiled to Python bytecode, these two assignments become:
# value = 30
LOAD_CONST               1 (30)
STORE_GLOBAL             0 (value)

# name = 'Bill'
LOAD_CONST               2 ('Bill')
STORE_FAST               0 (name)
As the bytecode shows, each assignment statement is compiled into two bytecode instructions. Both start with LOAD_CONST, the instruction that loads a constant, because value and name are each assigned a constant; one is an integer and the other is a string. Python does not require you to declare a variable's type (values do have types, but the type is determined when the variable is used rather than declared), so LOAD_CONST is used no matter what type of constant is loaded.
The second instruction differs, though. For the global variable value, STORE_GLOBAL stores the constant into the variable; for the local variable name, STORE_FAST is used. The difference between the two instructions is where the value is stored: Python keeps global variables and local variables in different places, and each instruction saves the constant to the corresponding place.
From this we can tell that the global value statement is never executed as an instruction. It is just a switch: with global value, assignments to value use STORE_GLOBAL; without it, assignments to value use STORE_FAST.
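The "switch" behaviour can be checked by compiling the same assignment with and without the global statement. This is a small sketch; the function names with_global and without_global and the helper store_ops are made up for the comparison. It matches on the STORE_ prefix because some newer interpreters may use fused variants of STORE_FAST.

```python
import dis

def with_global():
    global value
    value = 30

def without_global():
    value = 30

def store_ops(func):
    """Return the names of the STORE_* instructions a function compiles to."""
    return [ins.opname for ins in dis.get_instructions(func)
            if ins.opname.startswith("STORE_")]

print(store_ops(with_global))      # the assignment targets the global scope
print(store_ops(without_global))   # the assignment targets a local slot
```

Neither function is ever called here; the difference is decided entirely at compile time.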
If you remove everything from the function except global value, no trace of global value shows up in the bytecode.
Look at the following Python code:
value = 20
def fun():
    global value
import dis
dis.disassemble(fun.__code__)
Running this code prints only the following two bytecode instructions:
LOAD_CONST               0 (None)
RETURN_VALUE
These two instructions simply give fun its default return value: if a function does not explicitly return a value, it returns None. There is no trace of global value.
2. Compiling Python code with Python code
When you run a script with the python command, the source is compiled into bytecode, but the compiled result for that script is not saved to a file; everything happens in memory. When part of the program runs repeatedly, what actually runs is the bytecode in memory. At release time, though, we would like to be able to publish something like Java's .class files. Python has a similar file type: the .pyc file.
Python source can be compiled into .pyc files both from Python code and from the command line. By default, however, Python tucks the .pyc files away in a default directory, and many IDEs (such as PyCharm) do not display it. This directory is __pycache__.
Now let's do an experiment. First create a demo.py file and enter the following code:
value = 20
print(value)
Now run the following code to compile demo.py into a .pyc file.
import py_compile
py_compile.compile('demo.py')
So easy: just two lines of code (one of which is the import statement) compile demo.py. If you run the program in an IDE, nothing appears to happen. Don't worry. Switch to the directory containing demo.py and you will see a new __pycache__ directory. Open it and you will find a file named demo.cpython-38.pyc. The file name may differ on your machine; the difference is in the trailing number. Here 38 means that I am using Python 3.8 (the patch version is not shown). If you are using Python 3.7, the generated file is demo.cpython-37.pyc.
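You do not have to hunt for the file by hand, because py_compile.compile returns the path of the .pyc file it wrote. The following sketch assumes a throwaway temporary directory (so it does not touch your project) and recreates the two-line demo.py from above.

```python
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, "demo.py")
    with open(source_path, "w") as f:
        f.write("value = 20\nprint(value)\n")

    # compile() returns the path of the .pyc file it wrote.
    pyc_path = py_compile.compile(source_path)
    pyc_name = os.path.basename(pyc_path)
    pyc_exists = os.path.exists(pyc_path)

    print(pyc_name)                                      # version-tagged name
    print(os.path.basename(os.path.dirname(pyc_path)))   # containing directory
```

The tag in the middle of the file name depends on the interpreter that ran the script, which is why it differs from machine to machine.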
Now open a console, go to the directory containing demo.cpython-38.pyc, and run python demo.cpython-38.pyc. It prints the same result as python demo.py. So when releasing a Python application, you can publish the .pyc files directly.
When compiling a Python file, the compile function accepts a second argument that gives the name of the .pyc file, so you can place the .pyc file in a specific directory. The code is as follows:
import py_compile
py_compile.compile('demo.py', 'demo.pyc')
Running this code generates a file named demo.pyc. Running python demo.pyc gives the result we expect.
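To convince yourself that demo.pyc really contains executable bytecode, you can also load it by hand instead of passing it to the python command. This is a sketch under one stated assumption: the .pyc header is 16 bytes long, which holds for CPython 3.7 and later; older versions use a shorter header. The temporary directory is only there to keep the experiment self-contained.

```python
import contextlib
import io
import marshal
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, "demo.py")
    target_path = os.path.join(workdir, "demo.pyc")
    with open(source_path, "w") as f:
        f.write("value = 20\nprint(value)\n")

    # The second argument (cfile) chooses where the .pyc file is written.
    py_compile.compile(source_path, target_path)

    with open(target_path, "rb") as f:
        header = f.read(16)        # assumed 16-byte header (CPython 3.7+)
        code = marshal.load(f)     # the rest is a marshalled code object

# Run the loaded bytecode and capture what it prints.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    exec(code, {})
output = captured.getvalue()
print(output, end="")
```

No source file is involved in the final exec call, which is the point of shipping .pyc files.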
If you have many Python scripts to compile, you can call compile repeatedly, or you can use the compile_dir function of the compileall module to recursively compile all Python script files in a given directory.
Now let's do an experiment. Create three levels of subdirectories under the current directory, aa/bb/cc, and create one or more Python script files in each. They do not need to contain any code (empty files will do), as shown in Figure 1.
Figure 1
Now run the following code to compile all the Python script files under the aa directory.
import compileall
compileall.compile_dir("aa")
When this code runs, it scans all the directories recursively and compiles every Python script file it finds, as shown in Figure 2.
Figure 2
Look at these directories: each one now has a __pycache__ directory containing the corresponding .pyc files.
If you do not want to compile all the directories recursively, use the second parameter of compile_dir (maxlevels) to set the recursion depth: 0 means the given directory only (no recursion), 1 means one level of subdirectories, and so on. For example, the following code compiles only the Python script files directly inside the aa directory.
import compileall
compileall.compile_dir("aa", 0)
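The effect of the recursion level can be verified in a scratch directory. In this sketch the script names a.py and c.py are hypothetical, the tree is built in a temporary directory, and importlib.util.cache_from_source is used to work out where each .pyc file is expected to appear.

```python
import compileall
import importlib.util
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    top_dir = os.path.join(workdir, "aa")
    deep_dir = os.path.join(top_dir, "bb", "cc")
    os.makedirs(deep_dir)

    top_script = os.path.join(top_dir, "a.py")
    deep_script = os.path.join(deep_dir, "c.py")
    for path in (top_script, deep_script):
        open(path, "w").close()      # empty scripts are enough

    # maxlevels=0: compile only files directly inside aa, no recursion.
    compileall.compile_dir(top_dir, maxlevels=0, quiet=1)
    top_compiled = os.path.exists(importlib.util.cache_from_source(top_script))
    deep_before = os.path.exists(importlib.util.cache_from_source(deep_script))

    # Default maxlevels: recurse into bb and cc as well.
    compileall.compile_dir(top_dir, quiet=1)
    deep_after = os.path.exists(importlib.util.cache_from_source(deep_script))

print(top_compiled, deep_before, deep_after)
```

Passing quiet=1 only suppresses the per-file listing; it does not change what gets compiled.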
3. Compiling Python scripts on the command line
The python command can also compile .py files into .pyc files. For example, to compile demo.py, use the following command:
python -m py_compile demo.py
The -m option runs a library module as a script; here it runs the py_compile module on demo.py. After this command runs, demo.cpython-38.pyc is generated in the __pycache__ directory under the current directory, and you can then execute this file directly with python.
If you want to recursively compile all the Python files in a directory, use the following command:
python -m compileall aa
This command recursively compiles all the Python files under the aa directory. If you also want to optimize the compiled result, add -O or -OO before -m (for example, python -O -m compileall aa). So what is the difference between these two optimization options?
Without an optimization option, with only -m, no optimization is applied; that is, the optimization level is 0. At this level, Python's built-in constant __debug__ is True; you can print its value in the Python shell. With -O, the optimization level is 1: __debug__ is set to False and assert statements are removed. With -OO, the optimization level is 2: in addition to setting __debug__ to False, docstrings are removed. Docstrings are Python's documentation comments, which can be used to generate API documentation automatically; they are the strings enclosed in three single or double quotation marks.
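The three optimization levels can be compared without leaving the interpreter, because the built-in compile function accepts an optimize argument with the same meaning. The sketch below is illustrative only; the sample function f and the helper run_at_level are made up for the demonstration.

```python
SOURCE = '''
def f():
    """Example docstring."""
    assert False, "debug check failed"
    return "ok"
'''

def run_at_level(level):
    """Compile SOURCE at the given optimization level and call f()."""
    namespace = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=level), namespace)
    f = namespace["f"]
    try:
        result = f()
    except AssertionError:
        result = "AssertionError"
    return result, f.__doc__

for level in (0, 1, 2):
    print(level, run_at_level(level))
```

At level 0 the assert fires, at level 1 it has been compiled away, and at level 2 the docstring is gone as well.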
The compile and compile_dir functions discussed in the previous section also have a parameter for setting the optimization level. Take compile: its optimize parameter sets the optimization level. The default is -1, which means using the optimization level of the running interpreter (as chosen by -O or -OO). It can also be set to 0 (no optimization), 1 (equivalent to -O), or 2 (equivalent to -OO). The following code compiles demo.py with optimization level 2. The optimize argument is passed by keyword, because the positional parameters that follow the output file name are dfile and doraise, not the optimization level.
py_compile.compile('demo.py', 'demo.pyc', optimize=2)
The optimization here does not actually optimize the Python bytecode; it removes debugging code and docstrings. Debugging code here mainly means assert statements and code guarded by __debug__, which typically prints the program's execution state to the console or a log. Shipping this code with the program makes it less efficient, because writing to the console or a log is slow relative to code that runs purely in memory.
If you compile .py files with optimization from the command line, -O generates demo.cpython-38.opt-1.pyc and -OO generates demo.cpython-38.opt-2.pyc.
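The same file naming can be reproduced from Python code by passing optimize to py_compile.compile and leaving the output path at its default. This is a minimal sketch in a temporary directory; the version tag in the printed names depends on your interpreter.

```python
import os
import py_compile
import tempfile

names = {}
with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, "demo.py")
    with open(source_path, "w") as f:
        f.write("value = 20\nprint(value)\n")

    for level in (0, 1, 2):
        # optimize is passed by keyword; cfile is left at the default location.
        pyc_path = py_compile.compile(source_path, optimize=level)
        names[level] = os.path.basename(pyc_path)

for level, name in names.items():
    print(level, name)
```

Because each level gets its own file name, optimized and unoptimized builds can sit side by side in the same __pycache__ directory.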
4. How to encrypt Python code
Although .py files can be compiled into .pyc files, .pyc files, like Java's .class files, are easy to decompile. A safer approach is to build a private Python compile-and-run environment. Put plainly, that means modifying the source code of the Python compiler. It sounds lofty, but it is not complicated: you only need to change some constants.
First download the Python source code, then find the following two files:
<Python source root>/Lib/opcode.py
<Python source root>/Include/opcode.h
Open these two files and take a look. A snippet of opcode.py looks like this:
def def_op(name, op):
    opname[op] = name
    opmap[name] = op

def name_op(name, op):
    def_op(name, op)
    hasname.append(op)

def jrel_op(name, op):
    def_op(name, op)
    hasjrel.append(op)

def jabs_op(name, op):
    def_op(name, op)
    hasjabs.append(op)

# Instruction opcodes for compiled code
# Blank lines correspond to available opcodes

def_op('POP_TOP', 1)
def_op('ROT_TWO', 2)
def_op('ROT_THREE', 3)
def_op('DUP_TOP', 4)
def_op('DUP_TOP_TWO', 5)
def_op('ROT_FOUR', 6)

def_op('NOP', 9)
def_op('UNARY_POSITIVE', 10)
def_op('UNARY_NEGATIVE', 11)
def_op('UNARY_NOT', 12)

def_op('UNARY_INVERT', 15)

def_op('BINARY_MATRIX_MULTIPLY', 16)
def_op('INPLACE_MATRIX_MULTIPLY', 17)
A snippet of opcode.h looks like this:
#define POP_TOP 1
#define ROT_TWO 2
#define ROT_THREE 3
#define DUP_TOP 4
#define DUP_TOP_TWO 5
#define ROT_FOUR 6
#define NOP 9
#define UNARY_POSITIVE 10
#define UNARY_NEGATIVE 11
#define UNARY_NOT 12
#define UNARY_INVERT 15
#define BINARY_MATRIX_MULTIPLY 16
#define INPLACE_MATRIX_MULTIPLY 17
#define BINARY_POWER 19
#define BINARY_MULTIPLY 20
As you can see, opcode.h defines a set of macros (which are constants), and opcode.py defines values with the same names as in opcode.h, with equal integer values. Readers who have worked on compilers can probably guess what these are: they are the instruction codes (opcodes) of Python bytecode, and compiled .pyc files are made up of these instructions. For example, the instruction used by for loops is defined as follows:
#define FOR_ITER 93
In other words, if the Python code contains a for loop, this instruction must appear. We can do an experiment. The following Python code contains one for loop:
# demo.py
def fun():
    for i in [1,2]:
        print(i)
Disassembling fun gives the following Python bytecode:
     0 SETUP_LOOP              20 (to 22)
     2 LOAD_CONST               1 ((1, 2))
     4 GET_ITER
     6 FOR_ITER                12 (to 20)
     8 STORE_FAST               0 (i)

    10 LOAD_GLOBAL              0 (print)
    12 LOAD_FAST                0 (i)
    14 CALL_FUNCTION            1
    16 POP_TOP
    18 JUMP_ABSOLUTE            6
    20 POP_BLOCK
    22 LOAD_CONST               0 (None)
    24 RETURN_VALUE
As you can see, the fourth line is the FOR_ITER instruction. Each instruction consists of 2 bytes: the first byte is the instruction itself and the second byte is its operand. The JUMP_ABSOLUTE instruction on line 11 of the listing (offset 18) is a jump instruction; FOR_ITER and JUMP_ABSOLUTE together form the loop. JUMP_ABSOLUTE jumps straight to offset 6, which is where the FOR_ITER instruction is located.
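The two-byte layout can be checked directly against the raw bytes of the code object. The sketch below assumes a CPython version that uses 2-byte instructions (3.6 or later); the offsets and the numeric value of FOR_ITER depend on the interpreter version, so the numbers printed on your machine may differ from those in the listing.

```python
import dis

def fun():
    for i in [1, 2]:
        print(i)

# Find the FOR_ITER instruction in the disassembly.
for_iter = next(ins for ins in dis.get_instructions(fun)
                if ins.opname == "FOR_ITER")

raw = fun.__code__.co_code
opcode_byte = raw[for_iter.offset]        # first byte: the instruction itself
operand_byte = raw[for_iter.offset + 1]   # second byte: the operand

print(for_iter.offset, opcode_byte, operand_byte)
print(dis.opname[opcode_byte])            # map the number back to its name
```

dis.opname and dis.opmap are the Python-side tables that correspond to the constants defined in opcode.py.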
The value of the FOR_ITER instruction is 93 in decimal, which is 5d in hexadecimal. Its operand here is 12, which is 0c in hexadecimal. (Why the operand is 12 is a property of the FOR_ITER instruction; see the Python bytecode documentation. It is outside the scope of this article, so I will not go into detail here.) The complete instruction is therefore 5d0c. Now compile demo.py to generate the corresponding .pyc file and open it with a tool that can display binary data. You will see the hexadecimal dump shown in Figure 1. In row 6 you can find 5d0c, which is the first instruction of the for loop.
Figure 1
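The decimal-to-hexadecimal step is easy to reproduce. This short sketch uses the two values quoted in the text (93 for FOR_ITER and 12 for its operand) and builds the byte pair to look for in a hex dump.

```python
FOR_ITER = 93      # opcode value quoted from the opcode.h excerpt
operand = 12       # operand shown in the disassembly

# Pack the opcode and operand into the 2-byte instruction and show it in hex.
instruction = bytes([FOR_ITER, operand])
hex_form = instruction.hex()
print(hex_form)

# Going the other way: hexadecimal text back to decimal values.
print(int("5d", 16), int("0c", 16))
```

The same conversion works for any other opcode you want to locate in a .pyc file.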
Readers can add another for loop, as follows:
# demo.py
def fun():
    for i in [1,2]:
        print(i)
    for i in [10,20]:
        print(i)
Looking at the .pyc file again, you will see what is shown in Figure 2. Rows 6 and 7 each contain a 5d0c instruction, which means the code contains two for statements.
Figure 2
Python bytecode decompilers are built on exactly these rules. But here is the catch: if 5d meant an if statement rather than a for loop, the existing decompilers would no longer work.
If the code contains an if statement, then depending on the situation it uses either the POP_JUMP_IF_FALSE instruction or the POP_JUMP_IF_TRUE instruction. These two instructions are defined in opcode.h as follows:
#define POP_JUMP_IF_FALSE 114
#define POP_JUMP_IF_TRUE 115
Suppose you have the following Python code:
if value:
    print('hello world')
This code uses the POP_JUMP_IF_FALSE instruction, so the .pyc file will contain 72 (114 in hexadecimal). But suppose we swap the values of FOR_ITER (93) and POP_JUMP_IF_FALSE (114), as follows:
#define FOR_ITER 114
#define POP_JUMP_IF_FALSE 93
A decompiler that follows the standard Python instruction table will then treat for as if and if as for, so the decompiled code comes out scrambled. The decompiler has no way of knowing how you swapped the instruction values. It is like Base64: standard Base64 encoding is not encryption, but if you randomly shuffle the standard Base64 alphabet and encode with the shuffled table, the result cannot be decoded with the standard table. Without the shuffled table, an attacker would have to try every permutation, and there are 64! of them, far too many to try in any practical amount of time. Modifying the Python source in this way is like shuffling the Base64 alphabet: it increases the difficulty and time needed to crack the code.
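The effect of swapping two opcode values can be simulated in a few lines. This is a toy model only: the two-entry tables and the helper names remap and decode are made up for illustration and have nothing to do with CPython's real dispatch code. The opcode values are the ones quoted in the text, and the last line prints the number of possible orderings of a 64-symbol alphabet for the Base64 comparison.

```python
import math

# Opcode values quoted in the article; a two-entry toy table, not CPython's.
STANDARD_NAMES = {93: "FOR_ITER", 114: "POP_JUMP_IF_FALSE"}
SWAP = {93: 114, 114: 93}

def remap(bytecode):
    """Rewrite the opcode byte of every 2-byte instruction, keep operands."""
    out = bytearray(bytecode)
    for i in range(0, len(out), 2):
        out[i] = SWAP.get(out[i], out[i])
    return bytes(out)

def decode(bytecode):
    """Decode opcode names the way a standard-table decompiler would."""
    return [STANDARD_NAMES.get(bytecode[i], "?")
            for i in range(0, len(bytecode), 2)]

standard = bytes([93, 12, 114, 4])   # FOR_ITER 12, POP_JUMP_IF_FALSE 4
private = remap(standard)

print(decode(standard))
print(decode(private))               # the names come out swapped
print(math.factorial(64))            # orderings of a 64-symbol alphabet
```

Applying remap a second time restores the original bytes, which is what a private interpreter with the swapped table effectively does when it runs the file.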
In addition, modifying the two files mentioned above is not enough. One more file needs to be modified; its path is:
<Python source root>/Python/opcode_targets.h
Open this file to see why it needs to be modified. A snippet of the file looks like this:
static void *opcode_targets[256] = {
    &&_unknown_opcode,
    &&TARGET_POP_TOP,
    &&TARGET_ROT_TWO,
    &&TARGET_ROT_THREE,
    &&TARGET_DUP_TOP,
    &&TARGET_DUP_TOP_TWO,
    &&TARGET_ROT_FOUR,
    &&_unknown_opcode,
    &&_unknown_opcode,
    &&TARGET_NOP,
    &&TARGET_UNARY_POSITIVE,
    &&TARGET_UNARY_NEGATIVE,
    &&TARGET_UNARY_NOT,
    ... ...
}
Clearly, this code defines a table of Python bytecode instructions, and the value of each macro defined in opcode.h is the index of the corresponding element in the opcode_targets array. C arrays are indexed from 0, so the first element of opcode_targets is a placeholder (&&_unknown_opcode). The value of POP_TOP in opcode.h is exactly 1, so it corresponds to the second element of opcode_targets.
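The index-equals-opcode layout can be mimicked in Python from the dis.opmap table. This is only an illustration of the idea, built from whatever opcodes your interpreter defines; the list name targets is made up, and the result is not the real contents of opcode_targets.h.

```python
import dis

# Build a 256-slot table where the slot index is the opcode value,
# mirroring the layout described for opcode_targets.
targets = ["_unknown_opcode"] * 256
for name, op in dis.opmap.items():
    if op < 256:                     # skip any pseudo-opcodes outside one byte
        targets[op] = "TARGET_" + name

pop_top = dis.opmap["POP_TOP"]
print(pop_top, targets[pop_top])
print(targets.count("_unknown_opcode"), "unused slots")
```

Every unused opcode value keeps its placeholder, which is why the real array contains runs of &&_unknown_opcode entries.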
Continue down the opcode_targets array and find TARGET_INPLACE_TRUE_DIVIDE, which corresponds to the INPLACE_TRUE_DIVIDE instruction, as shown in Figure 3.
Figure 3
Then find INPLACE_TRUE_DIVIDE in opcode.h. Its value is 29, which matches the element at index 29 of opcode_targets. TARGET_INPLACE_TRUE_DIVIDE is followed by a run of &&_unknown_opcode placeholders, which shows that there are many unused values after INPLACE_TRUE_DIVIDE. The definitions in opcode.h confirm this, as shown in Figure 4.
Figure 4
Clearly, the RERAISE instruction that follows INPLACE_TRUE_DIVIDE jumps straight to 48, so the gap is filled with &&_unknown_opcode placeholders; otherwise the instructions would not line up with their indexes.
So modifications to the Python source should follow these rules:
(1) Change opcode.py and opcode.h together and swap the same values in both; you cannot swap in only one of them;
(2) Then swap the corresponding entries in the opcode_targets array in opcode_targets.h as well; otherwise the instructions cannot be found.
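Since the first rule is easy to get wrong by hand, a small consistency check can help. The sketch below is a hypothetical helper, not part of the Python build: it parses "#define NAME VALUE" lines and "xxx_op('NAME', VALUE)" lines with regular expressions and reports names whose values disagree. The inline samples stand in for the real files; for an actual check you would read opcode.h and opcode.py from your source tree instead.

```python
import re

def parse_header(text):
    """Parse '#define NAME VALUE' lines into a {name: value} dict."""
    pattern = re.compile(r"^#define\s+(\w+)\s+(\d+)\s*$", re.MULTILINE)
    return {name: int(value) for name, value in pattern.findall(text)}

def parse_opcode_py(text):
    """Parse "def_op('NAME', VALUE)"-style lines into a {name: value} dict."""
    pattern = re.compile(r"^\w+\('(\w+)',\s*(\d+)\)", re.MULTILINE)
    return {name: int(value) for name, value in pattern.findall(text)}

def mismatches(header_text, py_text):
    """Return names whose values differ between the two files."""
    header, py = parse_header(header_text), parse_opcode_py(py_text)
    return sorted(n for n in header.keys() & py.keys() if header[n] != py[n])

# Tiny inline samples standing in for the real files.
header_sample = "#define FOR_ITER 114\n#define POP_JUMP_IF_FALSE 93\n"
py_good = "jrel_op('FOR_ITER', 114)\njabs_op('POP_JUMP_IF_FALSE', 93)\n"
py_bad = "jrel_op('FOR_ITER', 93)\njabs_op('POP_JUMP_IF_FALSE', 93)\n"

print(mismatches(header_sample, py_good))   # consistent swap
print(mismatches(header_sample, py_bad))    # only one file was changed
```

The check covers rule (1) only; the order of entries in opcode_targets.h still has to be verified separately.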
Once everything is changed, you can build Python. Just run the following commands:
./configure
make
make install
Finally, when you release the program, you need to ship your own compiled Python environment with it, because a standard Python environment can no longer run the .pyc files it generates.
Of course, there are many other ways to protect Python code, such as obfuscating it or converting it to C code. I will write a separate article on those topics later. That's all for today's sharing.