Can Python code be encrypted? Python bytecode tells you!

Programmer Lin Lin 2021-02-23 12:05:35
python code encrypted python bytecode


As everyone knows, a Python program can be run directly with the python command (python.exe on Windows), as shown below:

python abc.py

Seeing python execute abc.py directly, many readers assume that Python interprets the source text of abc.py as it goes. It does not. If it really did, it would be too slow to be usable. In fact, Python, like Java, is a bytecode language: Java's bytecode is called Java bytecode, and Python's is called Python bytecode. When Python runs abc.py, it first compiles the source file to bytecode and then executes the bytecode. You can also generate a bytecode file (extension .pyc) ahead of time and execute that file directly.
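The two steps described above (compile, then execute) can be reproduced by hand with the built-in compile and exec functions. This is only a minimal sketch: the source string and the file name abc.py are placeholders, and the interpreter does more work internally than these two calls.

```python
# A tiny stand-in for the contents of abc.py (placeholder source).
source = "value = 20\nprint(value)\n"

# Step 1: compile the source text into a code object, in memory only.
code = compile(source, "abc.py", "exec")

# The raw bytecode is stored as a bytes object on the code object.
print(type(code.co_code), len(code.co_code))

# Step 2: execute the compiled bytecode rather than the source text.
namespace = {}
exec(code, namespace)
```

The length of co_code depends on the Python version, which is why no expected output is given for it.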

Python programs are usually released as source code. If the code contains sensitive information that you do not want to publish as source, you can release bytecode instead. Of course, bytecode can also be decompiled. To make Python code more secure, you can build your own private Python environment, which we will discuss later.

Readers who have never worked with Python bytecode probably have plenty of questions by now, so read on!

1. How to view Python bytecode

Let's first look at some Python bytecode, to show that running a Python script really does compile the Python code to bytecode first and then execute the bytecode, rather than executing the source directly.

First, look at the following code:

    value = 20
    def fun():
        global value
        value = 30
        name = 'Bill'
        print(value)
        print(name)
    fun()
    print(value)
    import dis
    dis.disassemble(fun.__code__)

This code defines a function fun that uses the global variable value and the local variable name and prints both. At the end it imports the dis module. The disassemble function in this module prints the bytecode of a code object, such as the one stored in a function's __code__ attribute.

Running this code prints the following:

    30
    Bill
    30
      4           0 LOAD_CONST               1 (30)
                  2 STORE_GLOBAL             0 (value)

      5           4 LOAD_CONST               2 ('Bill')
                  6 STORE_FAST               0 (name)

      6           8 LOAD_GLOBAL              1 (print)
                 10 LOAD_GLOBAL              0 (value)
                 12 CALL_FUNCTION            1
                 14 POP_TOP

      7          16 LOAD_GLOBAL              1 (print)
                 18 LOAD_FAST                0 (name)
                 20 CALL_FUNCTION            1
                 22 POP_TOP
                 24 LOAD_CONST               0 (None)
                 26 RETURN_VALUE

As you can see, disassemble prints something that resembles assembly code. This is the human-readable form of Python bytecode, and each line corresponds to one bytecode instruction. So why look at bytecode? For application developers, the most direct benefit is a better understanding of Python source code.

For example, this code uses a global variable through the global keyword. What does the global keyword actually do? The bytecode makes it easy to see.

The Python source contains two assignments:

    value = 30
    name = 'Bill'

Here value is a global variable and name is a local variable of the fun function. Converted to Python bytecode, the two assignments look like this:

    # value = 30
    LOAD_CONST 1 (30)
    STORE_GLOBAL 0 (value)

    # name = 'Bill'
    LOAD_CONST 2 ('Bill')
    STORE_FAST 0 (name)

As the bytecode shows, each assignment statement becomes two bytecode instructions. Both assignments use LOAD_CONST, the instruction that loads a constant, because value and name are each assigned a constant: one an integer, the other a string. Because Python does not require you to declare a variable's type (values have types, but you do not declare them; the type is determined when the variable is used), the same LOAD_CONST instruction is used no matter what type of constant is loaded.

The second instruction is where they differ. The global variable value is assigned with STORE_GLOBAL, while the local variable name is assigned with STORE_FAST. The two instructions differ in where they store the value: Python keeps global variables and local variables in different places, and each instruction saves the constant to its own location.

From this we can conclude that the statement global value is never actually executed. It is just a switch: with global value, an assignment to value compiles to STORE_GLOBAL; without it, an assignment to value compiles to STORE_FAST.
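The switch behaviour can be checked directly by disassembling two functions that differ only in the global statement. This is a small sketch: with_global, without_global, and store_ops are illustrative names that do not appear in the original code.

```python
import dis

value = 20

def with_global():
    global value
    value = 30

def without_global():
    value = 30  # creates a local variable; the module-level value is untouched

def store_ops(func):
    """Return the names of the STORE_* instructions in func's bytecode."""
    return [ins.opname for ins in dis.get_instructions(func)
            if ins.opname.startswith("STORE_")]

print(store_ops(with_global))     # ['STORE_GLOBAL']
print(store_ops(without_global))  # ['STORE_FAST']
```

The only difference between the two functions is the global statement, and the only difference in the bytecode is the store instruction.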

If you remove everything from the function except global value, no trace of global value is left in the bytecode.

Consider the following Python code:

    value = 20
    def fun():
        global value
    import dis
    dis.disassemble(fun.__code__)

Running this code yields only the following two bytecode instructions:

    LOAD_CONST 0 (None)
    RETURN_VALUE

These two instructions give fun its default return value: a function that does not explicitly return a value returns None. There is no trace of global value anywhere.

2. Compiling Python code from Python code

When you run a script with the python command, Python compiles the source to bytecode, but the result is not saved to a file; everything happens in memory. If part of the program runs many times, what runs each time is the bytecode already in memory. For a release, however, we would like something like Java's .class files. Python has a similar file: the .pyc file.

Python source can be compiled to a .pyc file either from Python code or from the command line. By default Python is rather discreet about it: the .pyc file is written to a default directory, __pycache__, which many IDEs (such as PyCharm) do not display.

Let's try it. First create a file named demo.py and enter the following code:

    value = 20
    print(value)

Running the following code compiles demo.py and generates a .pyc file.

    import py_compile
    py_compile.compile('demo.py')

It's that easy: two lines of code (one of them the import statement) compile demo.py. If you run the program inside an IDE, nothing seems to happen. Don't worry. Switch to the directory containing demo.py and you will find a new __pycache__ directory. Open it and you will see a file named demo.cpython-38.pyc. The file name may differ on your machine in the final number: 38 means that I am using Python 3.8 (the patch version is not shown). If you use Python 3.7, the generated file is demo.cpython-37.pyc.
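If you are unsure which file name your own interpreter will produce, the standard library can tell you. The snippet below is a small sketch that uses importlib.util.cache_from_source; demo.py is just the example file name from above and does not need to exist.

```python
import importlib.util
import sys

# The tag that ends up in the file name, for example cpython-38 on CPython 3.8.
print(sys.implementation.cache_tag)

# The default .pyc location for demo.py without any optimization suffix.
pyc_path = importlib.util.cache_from_source("demo.py", optimization="")
print(pyc_path)
```

The printed values depend on the interpreter that runs the snippet.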

Now open a console, change to the directory containing demo.cpython-38.pyc, and run the command python demo.cpython-38.pyc. It prints the same result as python demo.py. So when you release a Python application, you can release the .pyc files directly.

When compiling a Python file, the compile function accepts a second parameter that gives the name of the .pyc file, so you can place the .pyc file in a directory of your choice. The code is as follows:

    import py_compile
    py_compile.compile('demo.py', 'demo.pyc')

Running this code generates a file named demo.pyc. Running the command python demo.pyc also gives the result we expect.
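The whole round trip (write demo.py, compile it to a chosen .pyc path, run the .pyc) can be scripted. The following is a sketch under some assumptions: it works in a temporary directory so that nothing in your project is touched, and it runs the .pyc with the same interpreter that compiled it, which is required because .pyc files are version specific.

```python
import pathlib
import py_compile
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp, "demo.py")
    src.write_text("value = 20\nprint(value)\n")

    # cfile chooses where the .pyc is written; compile() returns that path.
    pyc = py_compile.compile(str(src),
                             cfile=str(pathlib.Path(tmp, "demo.pyc")),
                             doraise=True)

    # Execute the bytecode file with the interpreter that produced it.
    result = subprocess.run([sys.executable, pyc],
                            capture_output=True, text=True, check=True)
    output = result.stdout.strip()

print(output)  # 20
```

Passing doraise=True makes compile raise an exception on a syntax error instead of only printing a message.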

If you have many Python scripts to compile, you can call compile repeatedly, or you can use the compile_dir function of the compileall module to recursively compile every Python script in a directory.

Let's try it. In the current directory create three nested subdirectories, aa/bb/cc, and put one or more Python script files in each (empty files will do), as shown in Figure 1.

                      Figure 1

Now run the following code to compile all Python script files under the aa directory.

    import compileall
    compileall.compile_dir("aa")

This code first scans all the directories recursively and then compiles every Python script file it finds, as shown in Figure 2.

                                  Figure 2

Look at these directories: each one now contains a __pycache__ directory with the corresponding .pyc files.

If you do not want to compile every directory recursively, use the second parameter of compile_dir (maxlevels) to limit the recursion depth: 0 means the given directory only (no recursion), 1 means one level of subdirectories, and so on. For example, the following code compiles only the Python script files directly inside aa.

    import compileall
    compileall.compile_dir("aa", 0)
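The effect of the recursion limit is easy to verify by counting the generated .pyc files. The sketch below rebuilds the aa/bb/cc layout in a temporary directory; the file name mod.py and the helper count_pyc are made up for the illustration.

```python
import compileall
import pathlib
import tempfile

def count_pyc(root):
    """Count every .pyc file below root, including those in __pycache__."""
    return len(list(pathlib.Path(root).rglob("*.pyc")))

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp, "aa")
    (root / "bb" / "cc").mkdir(parents=True)
    for folder in (root, root / "bb", root / "bb" / "cc"):
        (folder / "mod.py").write_text("")  # an empty script is enough

    # maxlevels=0: compile only the scripts directly inside aa.
    compileall.compile_dir(str(root), maxlevels=0, quiet=1)
    top_only = count_pyc(root)

    # Default depth: descend into bb and cc as well.
    compileall.compile_dir(str(root), quiet=1)
    whole_tree = count_pyc(root)

print(top_only, whole_tree)  # 1 3
```

The quiet=1 argument only suppresses the list of compiled files; it does not change what gets compiled.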

3. Compiling Python scripts on the command line

The python command can also compile a .py file to a .pyc file. For example, to compile demo.py, use the following command:

python -m py_compile demo.py

The -m option runs a library module as a script, here py_compile, which compiles demo.py. After you run this command, demo.cpython-38.pyc is generated in the __pycache__ directory under the current directory, and you can then execute this file directly with python.

If you want to compile all Python files in a directory recursively, use the following command:

python -m compileall aa

This command recursively compiles all Python files in the aa directory. To optimize the compiled result as well, add -O or -OO (for example, python -O -m compileall aa). So what is the difference between these two optimization options?

Without an optimization option, that is, with only -m, nothing is optimized: the optimization level is 0 and Python's built-in constant __debug__ is True (you can print its value in the Python shell). With -O the optimization level is 1: __debug__ is set to False and assert statements are removed. With -OO the optimization level is 2: in addition to setting __debug__ to False, Python also removes docstrings. Docstrings are Python's documentation comments, which can be used to generate API documentation automatically; they are the string literals, usually enclosed in three single or double quotation marks, at the top of modules, classes, and functions.

Finally, the compile and compile_dir functions also have a parameter that sets the optimization level. For compile, this is the optimize parameter (the fifth parameter, after file, cfile, dfile, and doraise). Its default value is -1, which means the optimization level of the running interpreter is used (as set by -O or -OO). It can also be set to 0 (no optimization), 1 (equivalent to -O), or 2 (equivalent to -OO). The following code compiles demo.py at optimization level 2.

py_compile.compile('demo.py', 'demo.pyc', optimize=2)

In fact, the optimization here does not optimize the bytecode instructions themselves; it strips debugging code and documentation. Debugging code here mainly means assert statements and code guarded by __debug__, which typically print information to the console or a log to show what the program is doing. Shipping such code with the program makes it less efficient, because writing to the console or a log is very slow compared with code that runs purely in memory.

If you compile a .py file from the command line with the -O option (for example, python -O -m py_compile demo.py), the generated file is demo.cpython-38.opt-1.pyc; with the -OO option, it is demo.cpython-38.opt-2.pyc.
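The three optimization levels can be compared without touching the command line, because the built-in compile function accepts the same optimize values. This is a sketch: the function fun and its body are invented for the demonstration, and load is a hypothetical helper.

```python
import importlib.util

source = '''
def fun():
    """Docstring of fun."""
    assert False, "only checked when __debug__ is True"
    return "ok"
'''

def load(optimize):
    """Compile source at the given optimization level and return fun."""
    namespace = {}
    exec(compile(source, "demo.py", "exec", optimize=optimize), namespace)
    return namespace["fun"]

for level in (0, 1, 2):
    fun = load(level)
    try:
        outcome = fun()
    except AssertionError:
        outcome = "AssertionError"
    print(level, outcome, fun.__doc__)

# The suffix that the optimization level adds to the .pyc file name.
print(importlib.util.cache_from_source("demo.py", optimization=1))
```

At level 0 the assert fires, at level 1 it has been removed, and at level 2 the docstring is gone as well, so fun.__doc__ is None.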

4. How to encrypt Python code

Although a .py file can be compiled to a .pyc file, a .pyc file, like a Java .class file, is easy to decompile. A safer approach is to build a private Python compile-and-run environment; put plainly, to modify the source code of the Python interpreter. That sounds lofty, but it is not complicated: you only change some constants.

First download the Python source code, then find the following two files:

    <Python source root>/Lib/opcode.py
    <Python source root>/Include/opcode.h

Open the two files and take a look. A code snippet from opcode.py looks like this:

    def def_op(name, op):
        opname[op] = name
        opmap[name] = op

    def name_op(name, op):
        def_op(name, op)
        hasname.append(op)

    def jrel_op(name, op):
        def_op(name, op)
        hasjrel.append(op)

    def jabs_op(name, op):
        def_op(name, op)
        hasjabs.append(op)

    # Instruction opcodes for compiled code
    # Blank lines correspond to available opcodes

    def_op('POP_TOP', 1)
    def_op('ROT_TWO', 2)
    def_op('ROT_THREE', 3)
    def_op('DUP_TOP', 4)
    def_op('DUP_TOP_TWO', 5)
    def_op('ROT_FOUR', 6)

    def_op('NOP', 9)
    def_op('UNARY_POSITIVE', 10)
    def_op('UNARY_NEGATIVE', 11)
    def_op('UNARY_NOT', 12)

    def_op('UNARY_INVERT', 15)

    def_op('BINARY_MATRIX_MULTIPLY', 16)
    def_op('INPLACE_MATRIX_MULTIPLY', 17)

A code snippet from opcode.h looks like this:

    #define POP_TOP 1
    #define ROT_TWO 2
    #define ROT_THREE 3
    #define DUP_TOP 4
    #define DUP_TOP_TWO 5
    #define ROT_FOUR 6
    #define NOP 9
    #define UNARY_POSITIVE 10
    #define UNARY_NEGATIVE 11
    #define UNARY_NOT 12
    #define UNARY_INVERT 15
    #define BINARY_MATRIX_MULTIPLY 16
    #define INPLACE_MATRIX_MULTIPLY 17
    #define BINARY_POWER 19
    #define BINARY_MULTIPLY 20

As you can see, opcode.h defines a set of macros (constants), and opcode.py defines values with the same names and the same integer values. Readers who have worked on compilers can probably guess what they are: the instruction codes (opcodes) of Python bytecode. A compiled .pyc file is made up of these instructions. For example, the instruction behind the for loop is defined as follows:

#define FOR_ITER 93

In other words, if Python code contains a for loop, this instruction must be present. We can test this with the following Python code, which contains one for loop:

    # demo.py
    def fun():
        for i in [1,2]:
            print(i)

The Python bytecode for this code is as follows:

      0 SETUP_LOOP              20 (to 22)
      2 LOAD_CONST               1 ((1, 2))
      4 GET_ITER
      6 FOR_ITER                12 (to 20)
      8 STORE_FAST               0 (i)

     10 LOAD_GLOBAL              0 (print)
     12 LOAD_FAST                0 (i)
     14 CALL_FUNCTION            1
     16 POP_TOP
     18 JUMP_ABSOLUTE            6
     20 POP_BLOCK
     22 LOAD_CONST               0 (None)
     24 RETURN_VALUE

As you can see, the fourth instruction is FOR_ITER. Each instruction consists of 2 bytes: the first byte is the instruction itself (the opcode) and the second byte is its operand. The JUMP_ABSOLUTE instruction near the end is a jump instruction; FOR_ITER and JUMP_ABSOLUTE together form the loop. JUMP_ABSOLUTE jumps straight to offset 6, which is where the FOR_ITER instruction is located.

The value of the FOR_ITER instruction is 93 in decimal, which is 5d in hexadecimal. Together with the operand 12 (0c in hexadecimal; why the operand is 12 follows from how FOR_ITER works, which is outside the scope of this article, so see the Python bytecode documentation), the complete instruction is 5d0c. Now compile demo.py to a .pyc file and open the .pyc file with a tool that can view binary data. You will see hexadecimal content like that in Figure 1, and in row 6 you can find 5d0c, the instruction that starts the for loop.

                                                          Figure 1

You can add another for loop, as in the following code:

    # demo.py
    def fun():
        for i in [1,2]:
            print(i)
        for i in [10,20]:
            print(i)

Look at the .pyc file again and you will see something like Figure 2. Rows 6 and 7 each contain the instruction 5d0c, which means the code contains two for statements.

                                                            Figure 2
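The same check can be made without a hex viewer by counting FOR_ITER instructions with the dis module. This is a sketch: one_loop, two_loops, and count_op are illustrative names, and the numeric value of FOR_ITER printed by your interpreter may differ from the 93 quoted above, because opcode numbers change between Python versions.

```python
import dis

def one_loop():
    for i in [1, 2]:
        print(i)

def two_loops():
    for i in [1, 2]:
        print(i)
    for i in [10, 20]:
        print(i)

def count_op(func, name):
    """Count how many times the instruction called name occurs in func."""
    return sum(1 for ins in dis.get_instructions(func) if ins.opname == name)

# Opcode number of FOR_ITER in the running interpreter, decimal and hex.
print(dis.opmap["FOR_ITER"], hex(dis.opmap["FOR_ITER"]))

print(count_op(one_loop, "FOR_ITER"), count_op(two_loops, "FOR_ITER"))  # 1 2
```

Each for statement contributes exactly one FOR_ITER instruction, which is the pattern a decompiler relies on.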

Python bytecode decompilers are built on these rules. The catch is this: if 5d no longer meant a for loop but an if statement, the existing decompilers would stop working properly.

If the code contains an if statement, then depending on the situation it uses the POP_JUMP_IF_FALSE or the POP_JUMP_IF_TRUE instruction. The two instructions are defined in opcode.h as follows:

    #define POP_JUMP_IF_FALSE 114
    #define POP_JUMP_IF_TRUE 115

Suppose you have the following Python code:

    if value:
        print('hello world')

This code uses the POP_JUMP_IF_FALSE instruction, so the .pyc file contains 72 (the hexadecimal representation of 114). But suppose we swap the 93 of FOR_ITER and the 114 of POP_JUMP_IF_FALSE, as follows:

    #define FOR_ITER 114
    #define POP_JUMP_IF_FALSE 93

A decompiler that follows the standard Python opcodes will then treat for as if and if as for, so the decompiled code comes out scrambled. The decompiler has no way of knowing how you swapped the instruction values. It is like Base64: encoding with the standard Base64 table is not encryption, but if you randomly shuffle the standard table and encode with the shuffled table, the result cannot be decoded with the standard table. Unless an attacker obtains the shuffled table, every permutation has to be tried, and there are 64! (64 factorial) of them, far too many to crack in a limited time. Modifying the Python source in this way is like shuffling the Base64 table: it increases the difficulty and the time needed to crack the code.
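The Base64 analogy can be made concrete with a few lines of Python. This is only a sketch of the idea: the seed 2021, the names PRIVATE, private_encode, and private_decode are all invented here, and a shuffled table is obfuscation rather than real cryptography.

```python
import base64
import math
import random
import string

# The standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# A private alphabet: the same 64 characters in a shuffled order.
shuffled = list(STANDARD)
random.Random(2021).shuffle(shuffled)  # fixed seed so the table is reproducible
PRIVATE = "".join(shuffled)

ENCODE = str.maketrans(STANDARD, PRIVATE)
DECODE = str.maketrans(PRIVATE, STANDARD)

def private_encode(data: bytes) -> str:
    """Encode with standard Base64, then map each character to the private table."""
    return base64.b64encode(data).decode("ascii").translate(ENCODE)

def private_decode(text: str) -> bytes:
    """Map characters back to the standard table, then decode."""
    return base64.b64decode(text.translate(DECODE))

secret = private_encode(b"hello world")
print(secret)
print(private_decode(secret))
print(math.factorial(64))  # number of possible 64-character tables
```

Only someone who knows PRIVATE can reverse the mapping directly, in the same way that only the private interpreter knows the swapped opcode numbers.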

Modifying the two files mentioned above is not enough, however. One more file must be modified; its path is as follows:

<Python source root>/Python/opcode_targets.h

Open this file to see why it has to be modified. A code fragment from the file is as follows:

    static void *opcode_targets[256] = {
        &&_unknown_opcode,
        &&TARGET_POP_TOP,
        &&TARGET_ROT_TWO,
        &&TARGET_ROT_THREE,
        &&TARGET_DUP_TOP,
        &&TARGET_DUP_TOP_TWO,
        &&TARGET_ROT_FOUR,
        &&_unknown_opcode,
        &&_unknown_opcode,
        &&TARGET_NOP,
        &&TARGET_UNARY_POSITIVE,
        &&TARGET_UNARY_NEGATIVE,
        &&TARGET_UNARY_NOT,
        ... ...
    }

Clearly, this code defines the Python bytecode instructions, and the value of each macro defined in opcode.h is an index into the opcode_targets array. C array indexes start at 0, so the first element of opcode_targets is a placeholder (&&_unknown_opcode), and POP_TOP, whose value in opcode.h is exactly 1, corresponds to the second element of the array.

Continuing through the opcode_targets array, find TARGET_INPLACE_TRUE_DIVIDE, which corresponds to the INPLACE_TRUE_DIVIDE instruction, as shown in Figure 3.

                                                        Figure 3

Then find the INPLACE_TRUE_DIVIDE instruction in opcode.h. Its value is 29, exactly the index of TARGET_INPLACE_TRUE_DIVIDE in opcode_targets. TARGET_INPLACE_TRUE_DIVIDE is followed by a run of &&_unknown_opcode placeholders, which shows that many values after INPLACE_TRUE_DIVIDE are unused. To confirm this, look at the other definitions in opcode.h, shown in Figure 4.

                                            Figure 4

Clearly, RERAISE, the instruction after INPLACE_TRUE_DIVIDE, jumps straight to 48, so several &&_unknown_opcode placeholders are needed in between; otherwise the corresponding instruction could not be found.

So modifying the Python source code should follow these rules:

(1) The changes to opcode.py and opcode.h must be made consistently: swap the values in both files, never in just one;

(2) The corresponding entries of the opcode_targets array in opcode_targets.h must be moved to match, otherwise the corresponding instruction cannot be found.
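The consistency requirement in these rules can be modelled in Python on copies of the tables that the dis module exposes. This is a sketch and not a patch for CPython: swap_opcodes is a hypothetical helper, and the demo swaps FOR_ITER with GET_ITER (instead of POP_JUMP_IF_FALSE) only because both names exist in every recent Python version.

```python
import dis

def swap_opcodes(name_a, name_b):
    """Return (opmap, opname) copies in which two opcode numbers are exchanged."""
    # Rule (1): the name -> number table (the role of opcode.py and opcode.h).
    opmap = dict(dis.opmap)
    opmap[name_a], opmap[name_b] = opmap[name_b], opmap[name_a]

    # Rule (2): the number -> name table must be reordered to match
    # (the role of the opcode_targets array).
    opname = list(dis.opname)
    for name, number in opmap.items():
        opname[number] = name
    return opmap, opname

private_opmap, private_opname = swap_opcodes("FOR_ITER", "GET_ITER")

number = private_opmap["FOR_ITER"]
print(number)                  # the private number of FOR_ITER
print(private_opname[number])  # FOR_ITER, as the private interpreter reads it
print(dis.opname[number])      # GET_ITER, as a standard tool reads it
```

If only one of the two tables were changed, the number would resolve to the wrong instruction even inside the private environment, which is the failure that rule (2) guards against.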

Once everything has been changed, you can build Python. Just run the following commands in the source root:

    ./configure
    make
    make install

Finally, when you release the program you must ship your own compiled Python environment with it, because a standard Python environment can no longer run the .pyc files we generate.

Of course, there are many other ways to protect Python code, for example obfuscating the code or converting Python code to C code. I will cover these in separate articles later. That's all for today.

Copyright notice
This article was written by [Programmer Lin Lin]. Please include a link to the original when reposting. Thank you.
https://pythonmana.com/2021/02/20210223115614929b.html
