Learn Python on the Sly, Then Stun Everyone (Day 11)

Python a meow 2020-11-11 17:54:47


 

No offense intended by the title; I just find that ad slogan very amusing.

Table of contents

  • Preface
  • Welcome to our circle
  • Bypassing login verification with cookies: cookies and sessions
  • Disclaimer
  • What are cookies? What is a session?
  • How "remember my login status" works
  • POST requests
  • First steps in practice
  • A turn for the better
  • Put the cookies in the cookie jar
  • Automation: selenium
  • A quick demo first
  • Code walkthrough
  • Environment setup
  • A quick selenium explainer
  • Setting up the browser engine
  • What can selenium do?
  • Why is selenium so capable?

 

Preface

Previous installment: Learn Python on the Sly, Then Stun Everyone (Day 10)

About the last installment: it wasn't great, and I know it. So this time I've prepared a lot of fun things for you (evil laugh). Hey, come along and do it with me.

If I can do it, so can you!!!

 

A quick plug (if you're a complete beginner, read the following section):

Welcome to our circle

I've set up a Python learning and Q&A group; interested friends are welcome: if you're having trouble studying, or are looking for a Python learning community, you can join our Python circle, QQ group 947618024, and claim some Python learning materials. It will save you a lot of time and spare you a lot of problems.


 This series assumes you have some C or C++ basics, because I learned Python after picking up a smattering of C++.
 This series assumes you can use a search engine; look up the word 'module' and the like yourself. It also assumes you have your own editor and interpreter; the previous article already made a recommendation.
 Also, about this series' table of contents: to be honest, I'm fond of the two Primer Plus books, so I follow their chapter structure.
 This series also focuses on developing your hands-on skills. After all, I can't spell out every bit of knowledge, so the ability to meet your own needs is especially important. The holes I bury in these articles are not traps; they are the exercises I leave you. Please show off your powers and fend for yourselves.

Bypassing login verification with cookies: cookies and sessions

Disclaimer

You saw the title. Excited, no? "Are we going to steal accounts today?" Hey, get your black hat ready.
Hello? Hello! Wake up, wake up, you're drooling. We are law-abiding citizens; how could we do such a thing?

I'll only teach you how to bypass login verification when someone has already ticked "remember my account and password". As for how you get yourself into that situation, that has nothing to do with me; let the record show I said so, haha.

Blog friends who read the post I published two days ago, "Crawl your own photos", may still remember the process. Did it raise any doubts? Such a fiddly procedure, with human intervention at every step; where did the machine go? If you don't log in, don't save anything, and don't visit the site, how do you get the cookies?

To the friends who asked this question (there really were some): I can only say you have a good head on your shoulders, but don't wander off track. Those are fair questions, and there are technical means to solve them. But if we just let the crawler log in to its own account, can't it already do plenty? The tools are in your hands.


What are cookies? What is a session?

cookie: On the web, HTTP requests are stateless. That is, even after the first connection to the server and a successful login, on the second request the server still doesn't know which user is making the current request. Cookies solve this problem: when the browser visits a site, the site stores a small set of data on the client. When the user sends a second request, the cookie data stored during the previous request is automatically carried along to the server, and the server can identify the current user from the data the browser carries.

In general, a web page keeps some local data used to verify the next visit; this is commonly used for login verification and remembering state.

session: a session is a HashTable-like structure stored on the server side that holds user data. When the browser sends its first request, the server automatically creates a HashTable plus a Session ID that uniquely identifies it, and sends the ID to the browser in the response. When the browser sends its second request, it puts the Session ID from the previous response into the request. The server extracts the Session ID from the request, matches it against all its Session IDs, and finds the corresponding HashTable.

Analogous to the client's local cookie, the session is the server's "cookie". It can achieve the same things: interactive login verification and remembering state.

How "remember my login status" works

So now we know: if, when the Session ID is sent to the client via a cookie, its validity is set to one year, then for the next year, whenever the client visits my site it will send that Session ID back to the server, and the server will use it to restore the stored key-value HashTable from memory or a database.

However, the session on the server isn't actually kept that long. After a certain time, the server-side session is destroyed to relieve pressure on the server. Once the data on the server is destroyed, even if the client still holds the cookie, there's no way to "remember my login status" anymore.

Therefore, this method only lets a cookie skip login verification for a short period; how long the local cookie remains usable depends mainly on the session lifetime configured on the server.
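To make the cookie round trip concrete, here is a minimal sketch using requests. The site httpbin.org is just a public echo service I'm using for illustration; any site that sets cookies behaves the same way.

import requests

# A Session object plays the role of the browser: it keeps a cookie jar
# and automatically re-sends stored cookies on every later request.
session = requests.Session()

# First request: the server sets a cookie (httpbin echoes one back for us).
session.get('https://httpbin.org/cookies/set/sessionid/abc123')

# Second request: the cookie is carried along automatically, which is
# exactly how a server recognizes a returning user.
response = session.get('https://httpbin.org/cookies')
print(response.json())  # {'cookies': {'sessionid': 'abc123'}}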


POST requests

What is a POST request? If you've never heard of POST requests, then think back to GET requests.

Actually, both POST and GET can carry parameters, but a GET request's parameters are shown right in the URL.

A POST request's parameters are not displayed directly; they are hidden away. For private information like an account and password, you should use a POST request.

Usually, GET requests are used to fetch web data, for example the requests.get() we learned earlier. POST requests are used to submit data to a page, for example form data (an account and password are exactly the kind of data a web form holds).
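A minimal sketch of the difference, again using the public echo service httpbin.org purely for illustration:

import requests

# GET: the parameters end up visible in the URL itself.
r = requests.get('https://httpbin.org/get', params={'q': 'python'})
print(r.url)  # https://httpbin.org/get?q=python

# POST: the parameters travel in the request body, not in the URL.
r = requests.post('https://httpbin.org/post', data={'user': 'miao', 'pwd': 'secret'})
print(r.url)             # https://httpbin.org/post  (nothing appended)
print(r.json()['form'])  # {'pwd': 'secret', 'user': 'miao'}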


First steps in practice

Open the CSDN login page and fill in your details: https://passport.csdn.net/login?code=public

 

Tick the boxes that need ticking, make the right choices, then click Log In.

 

Guess which packet it is. Be smart about it: you can see that after the login succeeds, the panel on the right is still loading packets, so the login packet must be near the front.
Once you click Log In, the request goes out, and logging in has to happen first, so look at the first few packets. I spotted that "doLogin" at a glance; open it up.

 


See? It's a POST request.

 

Look at what's inside. See that pile of set-cookie headers? Nothing more to it; I just wanted to point them out, hahaha.

 

There, I've marked it for you.
About that mention above: what I actually want to say is, open up different websites and you may find your cookies tucked away in some little corner.

And it's not just the cookies; the account and password are in there too:

 

Now let's try logging in a different way: by sending the request ourselves.

import requests
# Import requests.
url = 'https://www.csdn.net/'
# Assign the URL you want to log in to.
headers = {
    'origin': 'https://passport.csdn.net',
    # Origin of the request; we don't actually need this parameter here, it's just for demonstration.
    'referer': 'https://passport.csdn.net/login',
    'User-Agent': '(omitted)'
}
# Add request headers. As mentioned before, the headers simulate normal browser access to avoid anti-crawler measures.
data = {
    "loginType": "1",
    "pwdOrVerifyCode": "(your password)",
    "userIdentification": "(your account)"
}
# Wrap the login parameters in a dictionary and assign it to data.
login_in = requests.post(url, headers=headers, data=data)
print(login_in)

Well... the return value is 403. Awkward.
All right, all right.


A turn for the better

Oh! After trying again and again, I finally logged in successfully:

import requests
from bs4 import BeautifulSoup
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36',
    'Connection': 'keep-alive',
    'accept': 'application/json, text/plain, */*',
    # 'Cookie': cookie,
    'referer': '(the address of your "My Blog" page)'
}
url = '(the same address as the referer above)'
data = {
    "loginType": "1",
    "pwdOrVerifyCode": "(your password)",
    "userIdentification": "(your account)"
}
# Wrap the login parameters in a dictionary and assign it to data.
login_in = requests.post(url, headers=header, data=data)
print(login_in)

Excellent, this time the return value is 200.

What's next? Next, find a blog post, leave a comment on it, and we're done.

cookies = login_in.cookies
# Extract the cookies: read the cookies attribute of the response object (login_in) to get the login cookies, and assign them to the variable cookies.
url_1 = '(find this yourself)'
# The URL of the article we want to comment on.
data_1 = {
    'content': 'test',
    'articleId': '(fill in yourself)'
}
# Wrap the comment parameters in a dictionary.
comment = requests.post(url_1, headers=header, data=data_1, cookies=cookies)
# Use requests.post to send the comment, passing in the article URL, headers, the comment parameters, and the cookies; assign the result to comment.
# "Using" the cookies just means adding the cookies=cookies parameter to the POST request.
print(comment.status_code)
# Print comment's status code; if it equals 200, our comment succeeded.

Go by the status code; sometimes the site is slow and it takes a day before the comment shows up.

If you still can't see the comment after waiting a day, don't panic. As I said, it was probably cut by backend moderation.
Don't worry; we'll have a better way later on.

Put the cookies in the cookie jar

Forget it; to keep things intuitive, I'd better pull the cookies out with a fresh piece of code.

import requests
from bs4 import BeautifulSoup
cookie = '''(paste the cookie string copied from Chrome here)'''
header = {
    'User-Agent': '(put your own)',
    'Connection': 'keep-alive',
    'accept': '(put your own)',
    'Cookie': cookie,
    'referer': '(put your own blog home address)'
}
url = 'https://blog.csdn.net/python_miao?spm=1010.2135.3001.5113'  # in the CSDN personal center, the address of the js that loads the name
session = requests.session()
response = session.get(url, headers=header)
print(type(session.cookies))
# Print the type of the cookies; session.cookies is the login cookies.

Excellent. The result: <class 'requests.cookies.RequestsCookieJar'>

I'm afraid that can't be stored as text directly; someone should give it a try.

But take a closer look: doesn't this cookie jar look like a dictionary in string form?

 

Try it yourself. I'll just say this: you can actually have a go without converting it to a string first; if that fails, we'll deal with it later.
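If you want a nudge for that exercise, here is one possible sketch, assuming you want to park the cookies in a JSON file; requests.utils has helpers for converting between a RequestsCookieJar and a plain dict:

import json
import requests

session = requests.session()
session.get('https://www.csdn.net/')  # any request that collects some cookies

# Convert the RequestsCookieJar to a plain dict, then dump it as JSON text.
cookie_dict = requests.utils.dict_from_cookiejar(session.cookies)
with open('cookies.json', 'w') as f:
    json.dump(cookie_dict, f)

# Later: load the JSON text back and rebuild a cookie jar from the dict.
with open('cookies.json') as f:
    session.cookies = requests.utils.cookiejar_from_dict(json.load(f))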


Of course, there are other ways to get cookies, but mine is the most direct.


Automation: selenium

Websites these days aren't stupid. Which login doesn't ask you for a captcha? Very few.
So you'd have to type in the captcha by hand. Of course, some people say: machine learning, crack the captcha. Good idea; go try it.

There are also websites, and I'm sure you've met them, that are sneaky and convoluted; good luck crawling those.

Not to mention the sites whose URLs are encrypted, or the ones that simply forbid crawlers.

Good. Now let's see how many of these obstacles the new technique we're about to meet, selenium, can help us get past.

A quick demo first

Let me give you a rough demo: open the browser, open a blog post, then close it. As for fancier operations, we'll show those in code later:

 

Code walkthrough


# Local Chrome browser setup
from selenium import webdriver
import time
driver = webdriver.Chrome()
driver.get('https://mp.toutiao.com/profile_v4/graphic/articles')
time.sleep(2)
driver.get('https://www.toutiao.com/i6887003700720566795/')
time.sleep(2)
driver.close()

There's the code; you can try it, but for most of you it won't run, because most of you haven't set up the environment yet.

Environment setup

Good. No environment? No need to worry; everything will fall into place.

First, you need the Chrome browser. I keep saying it, and I suspect you still haven't downloaded it.
Second, check your Chrome version. This is very important, because each generation of browser matches a generation of driver, and a mismatch is a bit of a headache.
Next, download a driver from http://npm.taobao.org/mirrors/chromedriver/
and pick the one for your version.

After downloading, unzip it and put the driver in the same directory as your Python installation. If you're not sure which one that is, put a copy in every directory that looks like a Python install directory.

Good. Now open PyCharm again and run the earlier code.

Oh, by the way, you also have to download the selenium package; it's a bit big.
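A minimal sketch of the whole setup, assuming selenium 3.x (the version current when this was written; in selenium 4 the executable_path argument moved into a Service object), with a placeholder driver path:

# Install the package first:  pip install selenium
from selenium import webdriver

# If chromedriver is not on your PATH or next to python.exe, you can
# point selenium at it explicitly (selenium 3.x style):
driver = webdriver.Chrome(executable_path=r'C:\path\to\chromedriver.exe')
driver.get('https://www.baidu.com')
print(driver.title)  # if this prints, the environment is wired up correctly
driver.quit()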


I won't go into too many operations today, just the opening moves. This article is already past 8,000 words, so I'm saving all the fun for the next one.

Now let's talk through the lines of code above and get off to a good start. That said, some of you will no doubt go look things up yourselves afterwards.


A quick selenium explainer

Setting up the browser engine

# Step one: import the modules; more on these below
from selenium import webdriver
import time
driver = webdriver.Chrome()  # take control of Chrome; if the driver is missing, this raises an error right away
driver.get('')  # command Chrome: all right, pal, open this page for me (fill in a URL)
time.sleep(2)  # the browser is a bit slow, or the network is; either way there's a delay, so wait two seconds
driver.get('https://lion-wu.blog.csdn.net/article/details/109244401')  # open another one
time.sleep(2)  # same as above
driver.close()  # all right, playtime's over, shut it down

What can selenium do?

Let me put it this way: the snippet above sets the Chrome browser up as the engine and assigns it to the variable driver. driver is an instantiated browser. You'll keep seeing it later on, which makes sense, because we want to command this instantiated browser to do things for us.

Got that?
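As a taste of commanding the browser, here is a minimal sketch; the locator (Baidu's search box is named 'wd') is just an example, so swap in whatever site and element you like:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()
driver.get('https://www.baidu.com')

# Locate the search box by its name attribute, type a query, and submit it.
box = driver.find_element(By.NAME, 'wd')
box.send_keys('python')
box.submit()

time.sleep(2)
print(driver.title)  # the title of the results page
driver.quit()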

Why is selenium so capable?

selenium simplifies the problems we've already run into: it makes crawling dynamic pages as easy as crawling static ones.

The kind of page we could handle directly with BeautifulSoup at the very beginning is a static page. We can point BeautifulSoup at that type of page because the page source contains all of the page's information; therefore, the URL in the address bar is also the URL of the page source.

Later, we started to get into more complex pages. If I remember correctly, we started with grabbing comments from CSDN; that's when we first encountered JSON.
Then came QQ Music: the data we wanted wasn't in the HTML source but in JSON, so you could no longer use the address-bar URL directly and had to find the real URL of the JSON data. That's a dynamic page.

No matter where the data lives, the browser is constantly making all kinds of requests to the server. When those requests complete, they together make up the rendered page source shown in the developer tools' Elements panel.

When a page has complex interactions or convoluted URL-encryption logic, that's where selenium comes in: it actually opens a browser, waits until all of the data has been loaded into Elements, and then lets you crawl the page as if it were a static page.
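That last point is the whole trick, and a minimal sketch of it looks like this (the URL and parser are just examples):

from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()
driver.get('https://www.csdn.net/')
time.sleep(3)  # crude wait for the dynamic content to finish loading

# page_source is the rendered HTML, i.e. what Elements shows, so
# BeautifulSoup can now parse it like any static page.
soup = BeautifulSoup(driver.page_source, 'html.parser')
print(soup.title.string)
driver.quit()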

With so many advantages, selenium naturally has some drawbacks too.
Because it really runs your local browser, opening the browser and waiting for the page to finish rendering takes time; selenium inevitably sacrifices some speed and resources. But at least it's no slower than a human, so whether the wait is worth it is up to you. Young friend: better to pause for three minutes than to fight for one second.
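Incidentally, if the fixed time.sleep(2) waits above feel crude, selenium ships an explicit-wait helper; a minimal sketch, where the URL and locator are just examples:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.csdn.net/')

# Block for up to 10 seconds, returning as soon as the <body> tag is
# present, instead of always sleeping a fixed amount of time.
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.TAG_NAME, 'body'))
)
print(element.tag_name)
driver.quit()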


That's it for today; I'll leave you some suspense.

 

One last word: if you want to learn Python, get in touch. I have my own set of Python learning materials and a study roadmap; anyone who wants them can join QQ group 947618024 to claim them.

The material in this article comes from the Internet; if there is any infringement, please contact me and it will be removed.

Copyright notice
This article was created by [Python a meow]. Please include a link to the original when reposting. Thank you.
