When you crawl with Python, you will inevitably run into a website's anti-crawling measures, so you need to disguise your crawler in several ways. Without further ado, here is the practical part.
First, import the required modules:
from fake_useragent import UserAgent
import random
import requests
Python3 anti-anti-crawling step 1 ---- set a random User-Agent:
ua = UserAgent()
# print(ua.random)
header = {
    'User-Agent': ua.random,
}
# print(header)
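To sanity-check that a random User-Agent is really being sent, you can call an echo service such as httpbin.org. A minimal sketch; the test URL is my choice, not part of the original code:

# httpbin echoes back the request headers it received
response = requests.get("https://httpbin.org/headers", headers=header, timeout=10)
print(response.json()["headers"]["User-Agent"])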
Python3 anti-anti-crawling step 2 ---- carry a Cookie so the crawler is logged in (where to find the cookie is shown in the picture below):
cookie = " Readers find it on the website cookie"
# Combine with the UA above
header = {
    'User-Agent': ua.random,
    'Cookie': cookie,
}
# print(header)
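Note that requests can also take the cookie as a dict via its cookies parameter instead of a raw header string. A minimal sketch with hypothetical placeholder values:

# Hypothetical placeholders; copy the real name/value pairs from your browser
cookie_dict = {"session_id": "your_session_value"}
response = requests.get("https://www.baidu.com/", headers={'User-Agent': ua.random}, cookies=cookie_dict, timeout=10)
print(response.status_code)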
Python3 anti-anti-crawling step 3 ---- set a random proxy IP:
# Usable proxy IPs (note: requests matches proxies keys by lowercase scheme, so use 'https')
ip_list = [
    {'https': '120.83.103.87:9999'},
    {'https': '120.83.109.33:9999'},
    {'https': '1.199.30.247:9999'},
    {'https': '58.253.155.189:9999'},
    {'https': '120.84.101.75:9999'},
    {'https': '163.204.241.125:9999'},
    {'https': '175.155.137.30:1133'},
    {'https': '58.253.158.156:9999'},
    {'https': '58.253.156.8:9999'},
    {'https': '112.85.164.168:9999'},
    {'https': '120.83.109.113:9999'},
    {'https': '1.198.73.43:9999'},
    {'https': '163.204.242.153:9999'},
    {'https': '1.197.204.143:9999'},
    {'https': '117.91.130.15:9999'},
    {'https': '171.11.179.158:9999'},
]
IP = random.choice(ip_list)
# print(IP)
header = {
    'User-Agent': ua.random,
    'Cookie': cookie,
}
# Here the url uses Baidu as an example
url = "https://www.baidu.com/"
response = requests.get(url, headers=header, proxies=IP)
# The cookie above has not been set yet; set your own cookie first before moving on
print(response.text)
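Free proxies go stale quickly, so it is worth testing a proxy before trusting it. A minimal sketch, using httpbin.org/ip as an assumed test endpoint (not part of the original code):

def proxy_alive(proxy, timeout=5):
    # True if the proxy can reach the test endpoint within the timeout
    try:
        requests.get("https://httpbin.org/ip", proxies=proxy, timeout=timeout)
        return True
    except requests.exceptions.RequestException:
        return False

live_ips = [p for p in ip_list if proxy_alive(p)]
print(len(live_ips), "of", len(ip_list), "proxies are usable")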
Putting it all together, the polished Python anti-anti-crawling version:
from fake_useragent import UserAgent
import random
import requests
# Set global variables
# Usable proxy IPs
ip_list = [
    {'https': '120.83.103.87:9999'},
    {'https': '120.83.109.33:9999'},
    {'https': '1.199.30.247:9999'},
    {'https': '58.253.155.189:9999'},
    {'https': '120.84.101.75:9999'},
    {'https': '163.204.241.125:9999'},
    {'https': '175.155.137.30:1133'},
    {'https': '58.253.158.156:9999'},
    {'https': '58.253.156.8:9999'},
    {'https': '112.85.164.168:9999'},
    {'https': '120.83.109.113:9999'},
    {'https': '1.198.73.43:9999'},
    {'https': '163.204.242.153:9999'},
    {'https': '1.197.204.143:9999'},
    {'https': '117.91.130.15:9999'},
    {'https': '171.11.179.158:9999'},
]
# Set the cookie information; if you have several cookies, you can rotate them randomly just like the IPs, as sketched below
cookie = "paste the cookie you found on the website here"
# Set a random UA
ua = UserAgent()
UA = ua.random
# Set the request headers; you can paste your complete request headers (headers) here
header = {
    'User-Agent': UA,
    'Cookie': cookie,
}
# Visit the website
url = "https://www.baidu.com/"
tag = True  # loop flag
while tag:
    IP = random.choice(ip_list)
    response = requests.get(url, headers=header, proxies=IP)
    if response.status_code == 200:  # a 200 status code means the request succeeded
        print(response.text)
        tag = False
    else:
        ip_list.remove(IP)
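One caveat about the loop above: a dead proxy usually raises a connection error rather than returning a non-200 status, and the loop never checks whether ip_list has run empty. A minimal defensive variant under those assumptions (the function name is mine, not from the original):

def fetch_with_random_proxy(url, header, ip_list, timeout=5):
    # Try random proxies until one succeeds or the list is exhausted
    while ip_list:
        IP = random.choice(ip_list)
        try:
            response = requests.get(url, headers=header, proxies=IP, timeout=timeout)
            if response.status_code == 200:
                return response
        except requests.exceptions.RequestException:
            pass  # treat connection errors like a failed proxy
        ip_list.remove(IP)
    return None  # every proxy failed

response = fetch_with_random_proxy(url, header, list(ip_list))
if response is not None:
    print(response.text)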
Python3 anti-anti-crawling step 4 ---- use selenium to simulate a browser for crawling:
This module is more involved, so I plan to write a separate blog post just for it.
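Until then, a minimal sketch of the idea, assuming Selenium 4 with Chrome and a matching chromedriver installed:

from selenium import webdriver

driver = webdriver.Chrome()  # needs chromedriver available on PATH
driver.get("https://www.baidu.com/")
print(driver.page_source[:200])  # first 200 characters of the rendered page
driver.quit()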
Python3 anti-anti-crawling step 5 ---- use the scrapy framework for crawling:
This module is harder; if readers need it, I will write a separate blog post explaining in detail how to use the scrapy framework for crawling.
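For readers who want a preview, the skeleton of a Scrapy spider looks roughly like this (a minimal sketch; the spider name is mine, and you would run it with scrapy runspider):

import scrapy

class BaiduSpider(scrapy.Spider):
    name = "baidu"
    start_urls = ["https://www.baidu.com/"]

    def parse(self, response):
        # Scrapy calls parse() once per downloaded page
        yield {"title": response.css("title::text").get()}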
That is all for the code. If anything is unclear, readers can send me a private message or leave a comment below, and I will reply as soon as I can :)