An in-depth understanding of Python's type hints

Programmer Xiaoxin, 2022-06-23 17:53:38


Code without type hints:

def greeting(name):
    return 'Hello ' + name

Code with type hints:

def greeting(name: str) -> str:
    return 'Hello ' + name

The general format for hints usually looks like this:

def function(variable: input_type) -> return_type:
    ...

However, there is still a lot of confusion about what they are (in this article I call them hints) and how they benefit your code.

When I started to investigate and weigh whether type hints would be useful to me, I became very confused. So, as I usually do with things I don't understand, I decided to dig deeper, in the hope that this article will be useful to others as well.

As usual, if you want to comment on something you see, please feel free to submit a pull request.

How do computers compile our code

To understand what the Python core developers are trying to do with type hints, let's go down a few levels from Python, so that we better understand how computers and programming languages work.

The core job of a programming language is to process data using the CPU and to store the input and output in memory.

The CPU is quite dumb. It can carry out arduous tasks, but it only understands machine language, which at the lowest level is driven by electricity. Machine language is expressed in 0s and 1s.

To get to these 0s and 1s, we need to move from a high-level language to a low-level one, and this is where compiled and interpreted languages come in.

Programming languages are either compiled or interpreted (Python is interpreted and executed by an interpreter). Either way, the code is converted into lower-level machine code that tells the low-level components of the computer what the hardware should do.

There are several ways to translate code into something the machine understands: you can build a binary and have a compiler do the translation (C++, Go, Rust, etc.), or run the code directly and let an interpreter execute it. The latter is how Python (as well as PHP, Ruby and similar scripting languages) works.

How does the hardware know how to store these 0s and 1s in memory? The software, that is, our code, needs to tell the hardware how to allocate memory for the data. What kind of data is it? That is determined by the data types the language chooses.

Every language has data types. They are often the first thing you learn when you learn to program.

You may have seen a tutorial like this one (from Allen Downey's excellent book, "Think Like a Computer Scientist") that explains what they are. In short, they are different ways of representing data in memory.
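
To make "different ways of representing data in memory" a little more concrete, here is a minimal sketch using the standard-library struct module. It packs the value one first as a 4-byte integer and then as a 4-byte float. The little-endian format codes are my own choice for illustration and do not come from the tutorial mentioned above.

```python
import struct

# Pack the value 1 as a 4-byte little-endian signed integer.
as_int = struct.pack("<i", 1)

# Pack the value 1.0 as a 4-byte little-endian IEEE 754 float.
as_float = struct.pack("<f", 1.0)

print(as_int.hex())    # 01000000
print(as_float.hex())  # 0000803f
```

The two byte sequences differ even though both stand for "one", which is why the type has to be known before the bytes can be interpreted.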

Depending on the language, there will be strings, integers and other types. For example, Python's basic numeric data types include:

int, float, complex

There are also more advanced data types built from several basic ones. For example, a Python list can contain integers, strings, or both.

To know how much memory to allocate, the computer needs to know the type of the data being stored. Fortunately, Python has a function in the sys module, getsizeof, that tells us how many bytes an object of each data type takes up.

This wonderful answer gives approximate sizes for some "empty" data structures:

import sys
import decimal

d = {"int": 0,
     "float": 0.0,
     "dict": dict(),
     "set": set(),
     "tuple": tuple(),
     "list": list(),
     "str": "a",
     "unicode": u"a",
     "decimal": decimal.Decimal(0),
     "object": object(),
     }

# Create a new dict that can be sorted by size
d_size = {}
for k, v in sorted(d.items()):
    d_size[k] = sys.getsizeof(v)

sorted_x = sorted(d_size.items(), key=lambda kv: kv[1])
sorted_x

# Output (exact sizes vary with the Python version and platform):
[('object', 16),
 ('float', 24),
 ('int', 24),
 ('tuple', 48),
 ('str', 50),
 ('unicode', 50),
 ('list', 64),
 ('decimal', 104),
 ('set', 224),
 ('dict', 240)]

If we sort the results, we can see that by default the largest data structure is an empty dictionary, followed by the set. Compared to strings, integers take up little space.

This tells us how much memory the different types of data in a program occupy.
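
As a small, hedged illustration of the same function, the sketch below compares an empty list with a populated one. The exact byte counts depend on the Python version and platform, so only the relative comparison is meaningful here.

```python
import sys

empty = []
filled = [1, 2, 3]

# getsizeof reports the size of the container object itself,
# not the size of the objects it refers to.
print(sys.getsizeof(empty))
print(sys.getsizeof(filled))

# A list holding three references needs more room than an empty one.
print(sys.getsizeof(filled) > sys.getsizeof(empty))  # True
```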

Why should we care about this? Because some types are more efficient than others and better suited to particular tasks. There are also occasions when we need to check types strictly, to make sure they don't violate the constraints of our program.

But what are these types? Why do we need them?

This is where type systems come in.

An introduction to type systems

Long ago, people who did arithmetic by hand realized that, when proving equations, they could label numbers or other elements of an equation with "types" to eliminate many logical problems.

In the beginning, computer science basically relied on doing a lot of mathematics by hand, and some of those principles carried over. A type system, which assigns variables or elements to specific types, became a way to reduce the number of errors in a program.

Here are some examples:

  • If we write software for a bank, strings must not be used in the code that calculates the total amount in a user's account.
  • If we are processing survey data and want to know whether people do or don't do something, Boolean values representing yes or no are the most appropriate.
  • In a big search engine, we must limit the number of characters allowed in the search box, so we need to validate the type of certain strings.
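
The bank example in the list above can be sketched as follows. This is only an illustration under my own assumptions: the function name total_balance and the choice of Decimal for amounts are hypothetical and not part of the original examples.

```python
from decimal import Decimal
from typing import List


def total_balance(amounts: List[Decimal]) -> Decimal:
    """Sum account amounts, rejecting anything that is not a Decimal."""
    for amount in amounts:
        if not isinstance(amount, Decimal):
            raise TypeError(f"expected Decimal, got {type(amount).__name__}")
    return sum(amounts, Decimal("0"))


print(total_balance([Decimal("10.50"), Decimal("4.25")]))  # 14.75

try:
    # A string slipped into the amounts: the check refuses to add it.
    total_balance([Decimal("10.50"), "4.25"])
except TypeError as exc:
    print(exc)  # expected Decimal, got str
```

Here the check happens by hand at run time; the type systems discussed next are ways of automating this kind of check.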

In programming today, two kinds of type systems come up again and again: static and dynamic. Steve Klabnik writes:

In a static type system, the compiler examines the source code and assigns labels (called "types") to pieces of the code, then uses them to infer something about the program's behavior. In a dynamic type system, the compiler generates code to keep track of the sort of data the program uses (which, coincidentally, is also called its "type").

What does that mean? It means that for compiled languages, you need to specify the types in advance so that the compiler can check them at compile time and make sure the program makes sense.

This is perhaps the best explanation of the two that I have read recently:

I used to use statically typed languages, but for the past few years I have mainly used Python. The first experience was a little irritating: it felt like it just slowed me down, whereas Python would have let me do whatever I wanted, even if I occasionally made mistakes. It's a bit like giving instructions to someone who asks questions, rather than to someone who always agrees with you but may not have understood everything correctly.

One thing to note here: static and dynamic typing are closely related to, but not synonymous with, compiled and interpreted languages. A dynamically typed language (such as Python) can be compiled, and a statically typed language (such as Java) can be interpreted, for example with the Java REPL.

Data types in statically and dynamically typed languages

So what is the difference between data types in these two kinds of languages? With static typing, you have to define the types first. For example, if you use Java, your program might look like this:

public class CreatingVariables {
    public static void main(String[] args) {
        int x, y, age, height;
        double seconds, rainfall;
        x = 10;
        y = 400;
        age = 39;
        height = 63;
        seconds = 4.71;
        rainfall = 23;
        double rate = calculateRainfallRate(seconds, rainfall);
    }

    private static double calculateRainfallRate(double seconds, double rainfall) {
        return rainfall / seconds;
    }
}

Notice that at the beginning of this program we declared the types of the variables:

int x, y, age, height;
double seconds, rainfall;

Methods must also declare the types of the variables passed in, so that the code compiles correctly. In Java, you have to plan the types from the start so that the compiler knows what to check when it compiles the code.

Python, on the other hand, hides the types. Similar Python code looks like this:

def calculateRainfall(seconds, rainfall):
    return rainfall / seconds

x = 10
y = 400
age = 39
height = 63
seconds = 4.71
rainfall = 23
rate = calculateRainfall(seconds, rainfall)

How does this work under the hood?

How Python handles data types

Python is a dynamically typed language, which means it only checks the types of your variables when you run the program. As we saw in the code snippet above, you don't have to plan the types and memory allocation in advance.

Here is what happens:

In Python, CPython compiles the source code into a simpler form called bytecode. These instructions are similar to CPU instructions, but they are executed by virtual machine software rather than by the CPU. (This virtual machine does not imitate an entire operating system; it is just a simplified CPU execution environment.)
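
The bytecode can be inspected with the standard-library dis module. The following is a minimal sketch that reuses the greeting function from the start of the article. The exact instructions printed differ between CPython versions, so no listing is shown.

```python
import dis


def greeting(name):
    return 'Hello ' + name


# The compiled bytecode lives on the function's code object as raw bytes.
print(type(greeting.__code__.co_code))  # <class 'bytes'>

# dis prints a human-readable listing of those instructions.
dis.dis(greeting)
```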

When CPython compiles a program, how does it know the type of a variable if you never specify one? The answer is that it doesn't: it only knows that the variable is an object. Everything in Python is an object until it needs to act as a specific type, and that is when it gets checked.
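
A short sketch of this idea: the type belongs to the object a name currently refers to, and it can be looked up while the program runs. The variable name value is just a placeholder.

```python
value = 10
print(type(value).__name__)       # int
print(isinstance(value, object))  # True

# The same name can later refer to an object of another type.
value = 'ten'
print(type(value).__name__)       # str
print(isinstance(value, object))  # True
```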

For types like strings, Python assumes that anything surrounded by single or double quotes is a string. For numbers, Python has corresponding numeric types. If we try to perform an operation that Python cannot carry out on a certain type, Python tells us.

For example, it looks like this:

name = 'Vicki'
seconds = 4.71
name + seconds

TypeError Traceback (most recent call last)
<ipython-input-9-71805d305c0b> in <module>
----> 5 name + seconds
TypeError: must be str, not float

It tells us that we can't add a string and a floating-point number. Until the moment of execution, Python did not know that name was a string and seconds was a float.

To put it another way, this is where duck typing comes in: when we add, Python does not care about the type of the object. It only cares whether the addition method it calls returns something sensible, and if not, it raises an exception.
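
Here is a minimal sketch of that behavior. The Money class is hypothetical and exists only for this illustration. Python looks for an addition method on the objects involved and raises TypeError when none of them can handle the pair.

```python
class Money:
    """Hypothetical class used only for this illustration."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        # Returning NotImplemented tells Python this pair cannot be added here.
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self.cents + other.cents)


total = Money(150) + Money(275)  # works because Money defines __add__
print(total.cents)               # 425

try:
    Money(150) + 'a string'      # neither side can handle this pair
except TypeError as exc:
    print('TypeError:', exc)
```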

So what does that mean? If we write code the way we would in Java or C, we won't encounter any errors until the CPython interpreter executes the exact line that has the problem.
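
The sketch below shows how such a mistake can stay hidden. The function describe is a made-up example: its faulty line raises an error only when the branch containing it actually runs.

```python
def describe(seconds, verbose=False):
    if verbose:
        # This line is wrong (str + float), but nothing complains
        # until it is actually executed.
        return 'Elapsed: ' + seconds
    return 'done'


print(describe(4.71))  # done

try:
    describe(4.71, verbose=True)
except TypeError as exc:
    print('Only fails now:', exc)
```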

For teams that write a lot of code, this has proved inconvenient, because you are not dealing with just a few variables but with many classes that call each other, and you need to be able to check everything quickly.

If you can't write good tests that catch the bugs in a program before it goes into production, you can bring down the whole system.

More broadly, type hints bring a lot of benefits.

If you use complex data structures, or functions with many inputs, it becomes much easier to read the code again after a long time. If it's just a simple function with a single parameter, as in our example, it is easy either way.

But what if you're dealing with a code base with a lot of inputs, such as this example from the PyTorch documentation?

import torch.nn.functional as F

def train(args, model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        # Move the batch to the device the model lives on
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

What is model? We have to look further down in the code:

model = Net().to(device)

Wouldn't it be cool if we could specify it in the method signature, without having to look through the code? Something like this:

def train(args, model (type Net), device, train_loader, optimizer, epoch):

And what is device?

device = torch.device("cuda" if use_cuda else "cpu")

What is torch.device? It's a special PyTorch type. If we go to the documentation and other parts of the code, we can find out:

A :class:`torch.device` is an object representing the device on which a :class:`torch.Tensor` is or will be allocated.
The :class:`torch.device` contains a device type ('cpu' or 'cuda') and optional device ordinal for the device type. If the device ordinal is not present, this represents the current device for the device type; e.g. a :class:`torch.Tensor` constructed with device 'cuda' is equivalent to 'cuda:X' where X is the result of :func:`torch.cuda.current_device()`.
A :class:`torch.device` can be constructed via a string or via a string and device ordinal

Wouldn't it be better if we could annotate these, so that nobody has to look them up?

def train(args, model (type Net), device (type torch.device), train_loader, optimizer, epoch):
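
In real Python syntax, the wish above is written as name: Type. The sketch below is only an illustration under my own assumptions. Net and Device are hypothetical stand-in classes so that the example runs without PyTorch installed; in a real project the annotation would be torch.device. The types chosen for the other parameters are guesses.

```python
from typing import Any, Iterable, Tuple, get_type_hints


class Net:
    """Stand-in for the model class; hypothetical, not PyTorch's."""


class Device:
    """Stand-in for torch.device; hypothetical."""

    def __init__(self, kind: str) -> None:
        self.kind = kind


def train(args: Any, model: Net, device: Device,
          train_loader: Iterable[Tuple[Any, Any]],
          optimizer: Any, epoch: int) -> None:
    """Body omitted; only the signature matters here."""


# The hints can be read back from the function at run time.
hints = get_type_hints(train)
print(hints['model'].__name__)   # Net
print(hints['device'].__name__)  # Device
```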

There are many other examples.

So type hints are helpful to you as the programmer.

Type hints also help others read your code. Code with type hints is easier to read, because you don't have to go through the entire program as in the example above. Type hints improve readability.

So, what has Python done to move toward the kind of readability that statically typed languages offer?

Python's type hints

This is where type hints come in. They began as comments next to the code, and are called type annotations or type hints. I'll call them type hints. In other languages, annotations and hints mean something completely different.

In Python 2, people began adding hints to their code to indicate what the various functions returned.

That code looks like this:

users = []  # type: List[UserID]
examples = {}  # type: Dict[str, Any]
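
For comparison, here is a hedged sketch of the same declarations using the variable annotation syntax available since Python 3.6. UserID is not defined anywhere in the snippet above, so it is declared here as a hypothetical NewType, and the members list is made up for the example.

```python
from typing import Any, Dict, List, NewType

# UserID is a hypothetical type defined only for this sketch.
UserID = NewType('UserID', int)

# Comment style, as in the snippet above (the interpreter ignores it).
users = []  # type: List[UserID]

# Annotation syntax: the hint is part of the statement itself.
examples: Dict[str, Any] = {}
members: List[UserID] = [UserID(1), UserID(2)]

print(members)  # [1, 2]
```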

At first, type hints looked like comments. Later, Python gradually moved to a more uniform way of handling type hints, which began with function annotations:

Function annotations, both for parameters and return values, are completely optional.
Function annotations are nothing more than a way of associating arbitrary Python expressions with various parts of a function at compile-time.
By itself, Python does not attach any particular meaning or significance to annotations. Left to its own, Python simply makes these expressions available as described in Accessing Function Annotations below.
The only way that annotations take on meaning is when they are interpreted by third-party libraries. These annotation consumers can do anything they want with a function's annotations. For example, one library might use string-based annotations to provide improved help messages, like so:
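
To see what "associating arbitrary Python expressions with various parts of a function" means in practice, here is a small sketch of my own (it is not the example from the quoted text). The function area and its annotations are made up. Python stores the annotation values without interpreting them.

```python
def area(width: 'pixels', height: 2 + 3) -> 'square pixels':
    return width * height


# Python evaluates the annotation expressions and simply stores the results.
print(area.__annotations__)
# {'width': 'pixels', 'height': 5, 'return': 'square pixels'}

# The annotations have no effect on how the function runs.
print(area(2, 3))  # 6
```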

Then came PEP 484, which was developed together with mypy, a project out of Dropbox that checks the types when you run it over your program. Remember that types are not checked at run time. You only run into a problem if you try to call a method on an incompatible type, for example if you try to slice a dictionary or pop a value from a string.

From the implementation details:

While these annotations are available at run time through the usual __annotations__ attribute, no type checking happens at run time. Instead, the proposal assumes the existence of a separate offline type checker that users can run over their source code voluntarily. Essentially, such a type checker acts as a very powerful linter. (Individual users could of course employ a similar checker at run time for Design By Contract enforcement or JIT optimization, but those tools are not yet as mature.)
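
A minimal sketch of what "no type checking happens at run time" looks like. The function double is a made-up example: the interpreter records the hints but happily accepts an argument that contradicts them.

```python
def double(value: int) -> int:
    return value * 2


print(double(21))    # 42
print(double('ab'))  # abab (the hint says int, but nothing stops a str)

# The hints are still available for tools to read.
print(double.__annotations__)
```

A separate checker such as mypy is what would flag the second call.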

What does this look like in practice?

Type checking also means that you can get more out of an integrated development environment. For example, PyCharm offers code completion and checks based on the types, as does VS Code.

Type checking is useful in another way too: it can stop you from making silly mistakes. Here is a good example.

Here we want to add a name to a dictionary:

names = {'Vicki': 'Boykis',
         'Kim': 'Kardashian'}

def append_name(dict, first_name, last_name):
    dict[first_name] = last_name

# Nothing stops us from passing a number as the last name
append_name(names, 'Kanye', 9)

If we allow the program to do this, we'll end up with a bunch of malformed entries in the dictionary.

So how do we fix it?

from typing import Dict

names_new: Dict[str, str] = {'Vicki': 'Boykis',
                             'Kim': 'Kardashian'}

def append_name(dic: Dict[str, str], first_name: str, last_name: str):
    dic[first_name] = last_name

# The float passed as last_name is what mypy complains about below
append_name(names_new, 'Kanye', 9.7)

Running it through mypy:

(kanye) mbp-vboykis:types vboykis$ mypy
error: Argument 3 to "append_name" has incompatible type "float"; expected "str"

We can see that mypy does not allow this type. It makes sense to include mypy in the test stage of a continuous integration pipeline.

Type hints in integrated development environments

One of the biggest benefits of using type hints is that your IDE can give you the same kind of autocompletion you get with statically typed languages.

For example, say you have a piece of code that simply wraps the two functions used above into classes.

from typing import Dict

class rainfallRate:
    def __init__(self, hours, inches):
        self.hours = hours
        self.inches = inches

    def calculateRate(self, inches: int, hours: int) -> float:
        return inches / hours

class addNametoDict:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name
        self.dict = {}

    # self added so the method can be called on an instance
    def append_name(self, dict: Dict[str, str], first_name: str, last_name: str):
        dict[first_name] = last_name

Conveniently, now that we've added the types, the IDE shows us what is expected when we call a method of the class.

Getting started with type hints

mypy has some good advice on introducing types into a code base:

1. Start small – get a clean mypy build for some files, with few hints
2. Write a mypy runner script to ensure consistent results
3. Run mypy in Continuous Integration to prevent type errors
4. Gradually annotate commonly imported modules
5. Write hints as you modify existing code and write new code
6. Use MonkeyType or PyAnnotate to automatically annotate legacy code
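
Step 2 in the list above could be sketched roughly as follows. This is an assumption-laden illustration, not mypy's recommended script. The helper names build_mypy_command and run_mypy are hypothetical, the path 'src' is a placeholder, and the sketch assumes mypy is installed for the current interpreter so that python -m mypy works.

```python
import subprocess
import sys
from typing import List, Sequence


def build_mypy_command(paths: Sequence[str], strict: bool = False) -> List[str]:
    """Build the command line; -m mypy runs mypy with the current interpreter."""
    command = [sys.executable, '-m', 'mypy']
    if strict:
        command.append('--strict')
    command.extend(paths)
    return command


def run_mypy(paths: Sequence[str], strict: bool = False) -> int:
    """Run mypy and return its exit status (0 means no errors were reported)."""
    completed = subprocess.run(build_mypy_command(paths, strict))
    return completed.returncode


# Show the arguments that would be passed for a hypothetical 'src' directory.
print(build_mypy_command(['src'], strict=True)[1:])
# ['-m', 'mypy', '--strict', 'src']
```

Keeping the options in one place means every developer and the CI job invoke the checker the same way.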

To start using type hints in your own code, it helps to understand the following points.

First, if you're using anything beyond strings, integers, Booleans and Python's other basic types, you need to import the typing module.

Second, the module makes several complex types available:

Dict, Tuple, List, Set, etc.

For example, Dict[str, float] means you want to check for a dictionary whose keys are strings and whose values are floats.

There are also types called Optional and Union.
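
A brief sketch of these three types in use. The dictionary contents and the function names lookup and to_inches are made up for the example. Optional[float] means "a float or None", and Union lists several acceptable types.

```python
from typing import Dict, Optional, Union

# Made-up sample data: keys are strings, values are floats.
rainfall_by_city: Dict[str, float] = {'Seattle': 37.5, 'Phoenix': 8.0}


def lookup(city: str) -> Optional[float]:
    """Return the rainfall for a city, or None if it is unknown."""
    return rainfall_by_city.get(city)


def to_inches(value: Union[int, float, str]) -> float:
    """Accept a number or a numeric string and return a float."""
    return float(value)


print(lookup('Seattle'))   # 37.5
print(lookup('Atlantis'))  # None
print(to_inches('23'))     # 23.0
```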

Third, this is the form a type hint takes:

import typing

def some_function(variable: type) -> return_type:
    ...

If you want to go deeper into type hints, many smart people have written tutorials. Here is the best introductory tutorial, which also shows you how to set up a test environment.

So, how do you decide? To use them or not?

Should you use type hints?

It depends on your use case. As Guido and the mypy documentation say:

The goal of mypy is not to convince everyone to write statically typed Python. Static typing is entirely optional, now and in the future. The goal of mypy is to give Python programmers more options and to make Python a more competitive alternative to other statically typed languages in large projects, thereby improving programmer productivity and software quality.

Because of the overhead of setting up mypy and thinking through the types you need, type hints don't make sense for a small code base (for example, in a Jupyter notebook). What counts as a small code base? Conservatively, probably anything under 1k lines of code.

For large code bases, where you work with others, package your code, and need version control and a continuous integration system, type hints make sense and can save a lot of time.

In my opinion, type hints are becoming more and more common. Over the next few years, it won't hurt to get a head start on using them, even in places where they are still uncommon.

Copyright notice: this article was written by [Programmer Xiaoxin]. Please include a link to the original when reposting. Thank you.