Precautions for Python Crawlers

Active smiling face 2021-11-25 10:18:32

When crawling with Python you will inevitably run into a site's anti-crawling measures, so your crawler needs several layers of disguise. Without further ado, here is the practical part.

First, import the required modules:

from fake_useragent import UserAgent
import random
import requests

Python3, step 1 of dealing with anti-crawling: set a random User-Agent:

ua = UserAgent()
# print(ua.random)
header = {'User-Agent': ua.random}
# print(header)
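One caveat with fake_useragent: it loads its User-Agent strings from a bundled or downloaded dataset, and initialization can fail in offline environments. A minimal sketch of a defensive wrapper, assuming a small hardcoded fallback list (the UA strings below are illustrative examples, not a maintained list):

```python
import random

# Illustrative fallback UA strings; replace with current ones as needed.
FALLBACK_UAS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/15.1 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:94.0) Gecko/20100101 Firefox/94.0",
]

def random_user_agent():
    """Return a random UA string, preferring fake_useragent when it works."""
    try:
        from fake_useragent import UserAgent
        return UserAgent().random
    except Exception:  # library missing or its UA dataset failed to load
        return random.choice(FALLBACK_UAS)

header = {'User-Agent': random_user_agent()}
```

This way the crawler keeps running even when the library cannot fetch its data.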

Python3, step 2 of dealing with anti-crawling: carry a login Cookie (the original post included a screenshot showing where to find the cookie in the browser's developer tools):

[Screenshot: locating the Cookie value in the browser]

cookie = "paste the cookie you found on the target website here"
# Combine with the User-Agent from above
header = {
    'User-Agent': ua.random,
    'Cookie': cookie,
}
# print(header)
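Instead of sending the raw `Cookie` header, you can also split it into a dict and pass it through the `cookies=` parameter of requests, which keeps individual cookies editable. A small sketch, with made-up cookie names and values for illustration:

```python
def cookie_header_to_dict(raw_cookie):
    """Split a raw 'Cookie' request-header string into a name -> value dict."""
    pairs = (item.strip().split("=", 1)
             for item in raw_cookie.split(";") if "=" in item)
    return {name: value for name, value in pairs}

# Hypothetical example values; paste your own cookie string here.
raw = "sessionid=abc123; csrftoken=xyz789; theme=dark"
cookies = cookie_header_to_dict(raw)
# requests can then send them without a manual Cookie header:
# requests.get(url, headers={'User-Agent': ua.random}, cookies=cookies)
```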

Python3, step 3 of dealing with anti-crawling: use a random proxy IP:

# Free proxy IPs (sample addresses from the original post; free proxies go
# stale quickly, so replace them with working ones before running)
ip_list = [
    {'https': '120.83.103.87:9999'},
    {'https': '120.83.109.33:9999'},
    {'https': '1.199.30.247:9999'},
    {'https': '58.253.155.189:9999'},
    {'https': '120.84.101.75:9999'},
    {'https': '163.204.241.125:9999'},
    {'https': '175.155.137.30:1133'},
    {'https': '58.253.158.156:9999'},
    {'https': '58.253.156.8:9999'},
    {'https': '112.85.164.168:9999'},
    {'https': '120.83.109.113:9999'},
    {'https': '1.198.73.43:9999'},
    {'https': '163.204.242.153:9999'},
    {'https': '1.197.204.143:9999'},
    {'https': '117.91.130.15:9999'},
    {'https': '171.11.179.158:9999'},
]
IP = random.choice(ip_list)  # requests expects lowercase scheme keys in proxies
# print(IP)
header = {
    'User-Agent': ua.random,
    'Cookie': cookie,
}
# Baidu is used as the example URL here
url = "https://www.baidu.com/"
response = requests.get(url, headers=header, proxies=IP)
# The cookie above is still a placeholder; set your own cookie before running this
print(response.text)
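Because free proxies die quickly, it can help to filter the list before crawling. A minimal sketch of a health check, assuming a hypothetical `working_proxies` helper and short timeouts so dead proxies are skipped fast:

```python
import requests

def working_proxies(proxy_list, test_url="https://www.baidu.com/", timeout=3):
    """Return only the proxies that answer a quick test request in time."""
    alive = []
    for proxy in proxy_list:
        try:
            r = requests.get(test_url, proxies=proxy, timeout=timeout)
            if r.status_code == 200:
                alive.append(proxy)
        except requests.RequestException:
            pass  # dead, slow, or refusing proxy: skip it
    return alive

# Usage: ip_list = working_proxies(ip_list)
```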

Putting it all together, the tidied-up anti-anti-crawling version:

from fake_useragent import UserAgent
import random
import requests

# Global settings
# Free proxy IPs (sample addresses; replace with working proxies before running)
ip_list = [
    {'https': '120.83.103.87:9999'},
    {'https': '120.83.109.33:9999'},
    {'https': '1.199.30.247:9999'},
    {'https': '58.253.155.189:9999'},
    {'https': '120.84.101.75:9999'},
    {'https': '163.204.241.125:9999'},
    {'https': '175.155.137.30:1133'},
    {'https': '58.253.158.156:9999'},
    {'https': '58.253.156.8:9999'},
    {'https': '112.85.164.168:9999'},
    {'https': '120.83.109.113:9999'},
    {'https': '1.198.73.43:9999'},
    {'https': '163.204.242.153:9999'},
    {'https': '1.197.204.143:9999'},
    {'https': '117.91.130.15:9999'},
    {'https': '171.11.179.158:9999'},
]
# Set the cookie. If you have many cookies, you can pick one at random per
# request, just like the proxy IP.
cookie = "paste the cookie you found on the target website here"
# Set a random User-Agent
ua = UserAgent()
UA = ua.random
# Request headers; you can paste your complete browser headers here instead
header = {
    'User-Agent': UA,
    'Cookie': cookie,
}
# Target URL
url = "https://www.baidu.com/"
# Keep trying proxies until one succeeds or none are left
tag = True
while tag and ip_list:
    IP = random.choice(ip_list)
    response = requests.get(url, headers=header, proxies=IP)
    if response.status_code == 200:  # 200 means the request succeeded
        print(response.text)
        tag = False
    else:
        ip_list.remove(IP)  # drop the dead proxy and try another
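The retry loop above can also be expressed with the retry machinery that requests already ships (via urllib3), which adds exponential backoff and status-code-based retries. A sketch under the assumption that a shared session fits your crawl; `make_session` is a hypothetical helper name:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(total_retries=3, backoff=0.5):
    """Build a requests.Session that retries failed requests automatically."""
    session = requests.Session()
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,  # waits 0.5s, 1s, 2s, ... between attempts
        status_forcelist=(429, 500, 502, 503, 504),  # retry on these statuses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session

# Usage: session.get(url, headers=header, proxies=IP) retries transparently.
```

A session also reuses connections across requests, which is noticeably faster for large crawls.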

Python3, step 4 of dealing with anti-crawling: use selenium to drive a real browser:
This module is more involved; the blogger plans a separate post dedicated to it.

Python3, step 5 of dealing with anti-crawling: crawl with the scrapy framework:
This module is harder still. If readers need it, the blogger will write a separate post explaining in detail how to build crawlers with scrapy.

That's all the code. If anything is unclear, readers can message the blogger privately or comment below; the blogger will reply as soon as possible. I am Active smiling face.

Copyright notice
This article was written by [Active smiling face]. Please include the original link when reposting. Thanks.
https://pythonmana.com/2021/11/20211109005853877E.html
