This blog is just a place where I record articles in my spare time and publish them for readers. If anything here infringes on your rights, please let me know and I will delete it.
This article is entirely my own work, with no reference to or plagiarism of anyone else's articles. I insist on originality!!
Hello. This is the Python Crawlers: From Beginner to Giving Up series of articles. I am SunriseCai.
Writing a Python crawler takes three steps, and each step corresponds to one article.
This article introduces the first step of writing a Python crawler: requesting a web page.
requests is an HTTP client library for Python. Writing crawlers in Python would be hard without it, and it is the highlight of this chapter.
First, enter the following command in a cmd window to install the requests module used for making network requests:
pip install requests
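Once the installation finishes, you can confirm that the module imports correctly, for example:

```python
import requests

# Confirm the module is installed and print its version
print(requests.__version__)
```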
This article covers only the basic use of requests; for more details, see the official requests documentation.
The requests module offers many request methods. Here we cover only the two most common ones, namely GET and POST requests.
Method | Description
---|---
`requests.get()` | Requests the specified page and returns the response body
`requests.post()` | Submits data to the specified resource for processing (e.g. submitting a form)
The first step is to import the requests module.
import requests
A successful example:
resp = requests.get('https://www.baidu.com')
print(resp.status_code)  # 200: a status code of 200 means the request succeeded
A failed example:
resp = requests.get('https://www.douban.com/')  # Douban homepage
print(resp.status_code)  # 418: this status code clearly shows the request did not succeed; let's look at how to deal with it
An example (a POST login request to Douban):
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36'
}
data = {
'name': 'xxx',      # username
'password': 'xxx'   # password
}
# attach headers and data to the request
resp = requests.post('https://accounts.douban.com/j/mobile/login/basic', data=data, headers=headers)
print(resp.status_code)  # 200: the request succeeded
print(resp.text)  # text gives the response body as text
# {"status":"success","message":"success","description":"Processed successfully"...}
Now let's look at the result of the request after adding request headers (to disguise our identity):
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36'
} # this is a Chrome browser request header
r = requests.get('https://www.douban.com/', headers=headers)  # Douban homepage
print(r.status_code)  # 200: with headers attached, the request succeeds
You can also add parameters such as cookies and referer. They are not introduced in detail here; later articles will explain how to use them.
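As a quick preview, here is a sketch of how cookies and a Referer header can be attached. The cookie name `bid` and its value are made up for illustration; by preparing the request without sending it, we can inspect what would actually go on the wire:

```python
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36',
    'Referer': 'https://www.douban.com/',  # the page the request claims to come from
}
cookies = {'bid': 'example'}  # hypothetical cookie name and value

# Prepare the request without sending it, to inspect the outgoing headers
req = requests.Request('GET', 'https://www.douban.com/', headers=headers, cookies=cookies)
prepared = req.prepare()
print(prepared.headers['Cookie'])   # bid=example
print(prepared.headers['Referer'])  # https://www.douban.com/
```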
There are three main methods for reading the response content:
r = requests.get('http://www.xxx.com')
Method | Description | Used for
---|---|---
`r.text` | Returns the response body as decoded text | Text content
`r.content` | Returns the response body as raw bytes | Images, audio, video, etc.
`r.json()` | Parses the response body as JSON and returns the data as key-value pairs | JSON-formatted pages
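A small sketch of the three methods, using baidu.com as in the earlier examples (the encoding assignment just makes sure the text decodes correctly):

```python
import requests

resp = requests.get('https://www.baidu.com')
resp.encoding = 'utf-8'   # tell requests how to decode the raw bytes

print(resp.text[:60])     # str: the decoded HTML text
print(resp.content[:60])  # bytes: the raw body, e.g. for saving an image to disk

# resp.json() only works when the body actually is JSON, such as an API
# response; calling it on an HTML page raises an error.
```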
Status codes fall into 5 main categories:
Status code | Description
---|---
1** | Informational: the request has been received; continue processing
2** | Success: the request was successfully received, understood, and accepted
3** | Redirection: further action must be taken to complete the request
4** | Client error: the request contains a syntax error or cannot be fulfilled
5** | Server error: the server failed to fulfill a valid request
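In practice you usually check for the 2** range before using a response. A small sketch, again using baidu.com:

```python
import requests

resp = requests.get('https://www.baidu.com')

if resp.status_code == requests.codes.ok:  # requests.codes.ok == 200
    print('request succeeded')

# Alternatively, raise_for_status() raises requests.HTTPError for any
# 4** or 5** response and does nothing for a successful one:
resp.raise_for_status()
```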
Viewing the response headers:
resp = requests.get('https://www.baidu.com')
print(resp.headers)
# {'Accept-Ranges': 'bytes', 'Cache-Control': 'no-cache'...} the returned data is a dict
Reading a single field of the response header, for example Cache-Control:
resp = requests.get('https://www.baidu.com')
print(resp.headers['Cache-Control']) # no-cache
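Note that `resp.headers` is a case-insensitive dictionary (requests' `CaseInsensitiveDict`), so the capitalization of the field name does not matter:

```python
import requests

resp = requests.get('https://www.baidu.com')

# All three lookups return the same value:
print(resp.headers['Cache-Control'])
print(resp.headers['cache-control'])
print(resp.headers.get('CACHE-CONTROL'))
```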
The other fields of the response header can be read in the same way.
(The following example is adapted from the official requests documentation.)
The target web page (Douban Movie Top 250) looks like this:
The request code:
import requests
url = 'https://movie.douban.com/top250'
headers = {
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36'
}
resp = requests.get(url=url, headers=headers)
print(resp.text)
The return value looks like this:
<!DOCTYPE html>
<html lang="zh-cmn-Hans" class="">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="renderer" content="webkit">
<meta name="referrer" content="always">
<meta name="google-site-verification" content="ok0wCgT20tBBgo9_zat2iAcimtN4Ftf5ccsh092Xeyw" />
<title>
Douban Movie Top 250
</title>
<body>
<ol class="grid_view">
<li>
<div class="item">
<div class="pic">
<em class="">1</em>
<a href="https://movie.douban.com/subject/1292052/">
<img width="100" alt="The Shawshank Redemption" src="https://img3.doubanio.com/view/photo/s_ratio_poster/public/p480747492.jpg" class="">
</a>
</div>
<div class="info">
<div class="hd">
<a href="https://movie.douban.com/subject/1292052/" class="">
<span class="title">肖申克的救赎</span>
<span class="title"> / The Shawshank Redemption</span>
<span class="other"> / 月黑高飞(港) / 刺激1995(台)</span>
</a>
<span class="playable">[Playable]</span>
</div>
<div class="bd">
<p class="">
Director: Frank Darabont   Starring: Tim Robbins /...<br>
1994 / USA / Crime Drama
</p>
<div class="star">
<span class="rating5-t"></span>
<span class="rating_num" property="v:average">9.7</span>
<span property="v:best" content="10.0"></span>
<span>1758111 ratings</span>
</div>
<p class="quote">
<span class="inq">Hope sets people free.</span>
</p>
</div>
</div>
</div>
</li>
...... (remainder omitted)
</body>
</html>
Finally, if anything in this article is unclear, please bear with me; I also recommend reading the official requests documentation.
To summarize this chapter: we installed the requests module, made GET and POST requests, added request headers to disguise our identity, and read the response content, status codes, and headers.
The next article is titled "Python Crawlers: From Beginner to Giving Up 05 | The First Step of Parsing Pages".