You're going to learn Python on the sly, and then you'll be stunned (day 11)

Python a meow 2020-11-11 17:54:47


The title is not meant to offend anyone; I just think this ad slogan is very interesting.

Table of contents

  • Preface
  • Welcome to our circle
  • Bypassing login verification with cookies: cookies and session
  • Statement
  • What are cookies? What is a session?
  • How "Remember my login status" works
  • POST requests
  • The first step in practice
  • A twist in the road
  • Put the cookies in the cookie jar
  • Automation: selenium
  • A quick demo first
  • Code display
  • Environment configuration
  • A simple explanation of selenium
  • Set up the browser engine
  • What can selenium do?
  • Why is selenium so capable?



Previous article: You have to learn Python (Day 10)

About the last article: it wasn't very good, and I know it. So I have prepared a lot of interesting things for this one (evil grin). Come on, follow along with me.

If I can do it, you can too!


A quick plug (if you are a beginner, read the following paragraph):

This series assumes you have some C or C++ basics, because I came to Python after learning only a little C++.
This series assumes you can use Baidu to look things up, for example what the word 'module' means. I also suggest you have your own editor and compiler; the earlier articles already recommended some.
As for the layout of this series: to be honest, I prefer the two Primer Plus books, so I follow their chapter structure.
This series also focuses on building your hands-on skills. I can't teach you everything, so being able to solve your own problems matters a great deal. The gaps I leave in the articles are not traps; they are exercises I left for you. Show what you can do and work through them yourself.

Bypassing login verification with cookies: cookies and session


Did the title get you excited? Calm down. Are we stealing accounts today? Hey, get your black hat ready.
Hello, hello, wake up, wake up, you're drooling. We are law-abiding citizens; how could we do something like that?

I will only teach you how to bypass login verification when the account owner has ticked "Remember the account password". How you end up in that situation has nothing to do with me. Consider this my formal statement, ha ha.

Readers who saw the article I posted two days ago, "Crawl your own photos", may still remember the process and may have wondered: the whole operation was so troublesome and needed human intervention at every step, so what was the machine even doing? If you don't log in, don't tick "remember me", and don't visit the website, where would the cookies come from?

Anyone who can ask that question (and some really did) has a good head on their shoulders, but don't go astray. There are technical means to answer these questions, and once we let the crawler log in to our own account, can't it do a great many things? The tools are in your hands.

What are cookies? What is a session?

cookie: HTTP requests are stateless. Even after the first connection to the server and a successful login, the server still doesn't know which user is sending the second request. Cookies solve this problem: when the browser visits a website, the site stores a small set of data on the client side. When the user sends the next request, the browser automatically attaches the cookie data saved from the previous response, and the server identifies the current user from the data the browser carries.

In short, a cookie is a piece of data that a website keeps locally in the browser and checks on the next visit. It is commonly used for login verification and for remembering state.

session: A session is a HashTable-like structure stored on the server that holds a user's data. When the browser sends its first request, the server creates a HashTable and a Session ID that uniquely identifies it, and sends the Session ID to the browser in the response. On the second request, the browser puts that Session ID into the request. The server extracts it, compares it against all the Session IDs it holds, and finds the matching HashTable.

A session is the server-side counterpart of the client's local cookie. It serves the same purpose: login verification and remembering state.
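To make the idea concrete, here is a minimal sketch of the server side of that exchange, assuming the "HashTable" is simply a Python dict. The class name SessionStore and its methods are hypothetical names made up for this illustration; real web frameworks have their own session machinery.

```python
import secrets


class SessionStore:
    """Toy server-side session table: Session ID -> that user's data."""

    def __init__(self):
        self._sessions = {}

    def create(self, **user_data):
        # Generate a hard-to-guess Session ID and store the user's data under it.
        session_id = secrets.token_hex(16)
        self._sessions[session_id] = dict(user_data)
        return session_id  # the server would send this back in a cookie

    def lookup(self, session_id):
        # Return the data for a known Session ID, or None for an unknown one.
        return self._sessions.get(session_id)


store = SessionStore()
sid = store.create(user="alice")      # first request: the login succeeds
print(store.lookup(sid))              # a later request carrying the same ID
print(store.lookup("not-a-real-id"))  # an ID the server never issued
```

The browser never sees the user data itself, only the Session ID, which is why whoever holds that ID is treated as the logged-in user.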

How "Remember my login status" works

From this we can see: if the server sets the validity period to one year when it sends the Session ID to the client in a cookie, then for the next year the client will send this Session ID back on every visit, and the server will use it to restore the key-value HashTable from memory or from a database.

However, the session on the server is not kept forever. After a certain time it is destroyed to reduce the load on the server. Once the data on the server is gone, the client's cookie can no longer "remember my login status".

So skipping login verification with a cookie only works for a limited time, and how long the local cookie stays useful depends mainly on the session lifetime configured on the server.
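The two lifetimes can be sketched side by side. This is only an illustration under assumptions: the cookie name session_id, the helper names, and the 30-minute server lifetime are all made up here, and a real site chooses its own values.

```python
from http.cookies import SimpleCookie

ONE_YEAR = 365 * 24 * 60 * 60  # seconds


def build_set_cookie(session_id, max_age=ONE_YEAR):
    """Build the Set-Cookie header line a server could send to the browser."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["max-age"] = max_age
    return cookie.output(header="Set-Cookie:")


def session_is_alive(created_at, now, server_ttl):
    """True while the server still keeps the session (all values in seconds)."""
    return now - created_at < server_ttl


print(build_set_cookie("abc123"))
# The cookie may last a year, but a 30-minute server-side lifetime wins:
print(session_is_alive(created_at=0, now=60 * 60, server_ttl=30 * 60))
```

In this sketch the browser would happily keep sending the cookie for a year, yet one hour after login the server no longer has anything to match it against.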

POST requests

What is a POST request? If you haven't heard of it, think of the GET request first.

Both POST and GET can carry parameters, but the parameters of a GET request are shown in the URL.

The parameters of a POST request are not displayed directly; they are tucked away in the request body. Private information such as an account name and password should be sent with a POST request.

Usually a GET request is used to fetch web data, for example the requests.get() we learned earlier. A POST request is used to submit data to a web page, for example form data (an account name and password are web form data).
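The difference is easy to see without sending anything. The sketch below uses the standard library's urllib instead of requests so that it runs offline; the example.com addresses and the field names are placeholders, not a real login interface.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get(url, params):
    """GET: the parameters are appended to the URL as a query string."""
    return Request(url + "?" + urlencode(params))


def build_post(url, form):
    """POST: the parameters travel in the request body, not in the URL."""
    return Request(url, data=urlencode(form).encode("utf-8"))


get_req = build_get("https://example.com/search", {"q": "python"})
post_req = build_post("https://example.com/login", {"user": "me", "pwd": "secret"})

# Nothing is sent here; the Request objects are only inspected.
print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

Note that POST only keeps the parameters out of the URL. It is HTTPS, not the choice of POST, that protects them in transit.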

The first step in practice

Open the CSDN login page and fill in your personal information:

Tick what should be ticked, select what should be selected, then click log in.

Guess which packet it is. Think it through: after you log in successfully, packets are still loading on the right, so the login packet must be near the front.
Clicking log in sends the signal, and the first thing that happens must be the login itself, so look at the first few packets. I spotted "doLogin" at a glance, so open it.

See, it's a POST.

And what's inside? See the bunch of set-cookies? Nothing more to say; I'm just pointing it out, ha ha.

There, I circled it for you.
What I really wanted to say above is: open it up and look around. On different websites you may find your cookies tucked away in some small corner.

In fact, it's not just the cookies; the account and password are there too:

Let's try another way to log in: logging in with parameters.

import requests
# Import requests.
url = ''
# Assign the URL you want to log in to to url.
headers = {
    # A header naming the source of the request could also go here; this example does not need it, it is only mentioned for demonstration.
    'User-Agent': 'omitted'
}
# Add the request header. As mentioned before, the request header simulates a normal browser visit so that anti-crawler checks don't reject us.
data = {
    "loginType": "1",
    "pwdOrVerifyCode": "password",
    "userIdentification": "account number"
}
# Encapsulate the login parameters into a dictionary and assign it to data.
login_in = requests.post(url, headers=headers, data=data)
# Send the login request with requests.post and assign the response to login_in.
print(login_in.status_code)
# Print the status code of the response.

Well, the return value is 403. That was hasty...
All right, all right.

A twist in the road

After trying again and again, I finally logged in successfully:

import requests
from bs4 import BeautifulSoup
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36',
    'Connection': 'keep-alive',
    'accept': 'application/json, text/plain, */*',
    #'Cookie': cookie,
    'referer': ''  # the address of your "My blog" page
}
url = header['referer']  # the same address as the referer above
data = {
    "loginType": "1",
    "pwdOrVerifyCode": '',  # yours
    "userIdentification": ''  # yours
}
# Encapsulate the login parameters into a dictionary and assign it to data.
login_in = requests.post(url, headers=header, data=data)
print(login_in.status_code)

Excellent, the return value this time is 200.

What's next? Find a blog post, leave a comment on it, and we're done.

cookies = login_in.cookies
# How to extract the cookies: read the cookies attribute of the response object (login_in) to get the login cookies, and assign them to the variable cookies.
url_1 = ''  # find this yourself
# The URL of the article we want to comment on.
data_1 = {
    'content': 'test',
    'articleId': ''  # fill in yourself
}
# Encapsulate the comment parameters into a dictionary.
comment = requests.post(url_1, headers=header, data=data_1, cookies=cookies)
# Use requests.post to send the comment request with these arguments: the article URL, the headers, the comment parameters, and the cookies. Assign the response to comment.
# To send the cookies, pass cookies=cookies in the post request.
print(comment.status_code)
# Print the status code of comment. If it equals 200, the comment request succeeded.

Go by the status code; sometimes the comment only shows up after a day or so.

If the comment still hasn't appeared after a day, don't worry: as I said, it was probably removed by the site's backend.
Don't worry, we'll have a better way later.

Put the cookies in the cookie jar

To keep things concrete, I'll reuse the code from the earlier ID-photo example.

import requests
from bs4 import BeautifulSoup
cookie = '''* Paste the cookie information copied from Chrome here *'''
header = {
    'User-Agent': 'put your own',
    'Connection': 'keep-alive',
    'accept': 'put your own',
    'Cookie': cookie,
    'referer': 'put your own blog home address'
}
url = ''  # in the CSDN personal center: the address of the js that loads the user name
session = requests.session()
response = session.get(url, headers=header)
# Print the type of the cookies; session.cookies holds the login cookies.
print(type(session.cookies))

Excellent, it turns out to be: <class 'requests.cookies.RequestsCookieJar'>

I'm afraid an object of this type can't be written straight into a text file. Who wants to give it a try?

But take a closer look: don't these cookies look like a dictionary turned into a string?


Do the conversion yourself. I'll just say this: you can first try storing it without converting it to a string, and convert it only if that doesn't work.

Of course, there are other ways to get cookies, but my method is the most direct.
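One way to approach the exercise is sketched below, assuming it is enough to keep each cookie's name and value. RequestsCookieJar is a subclass of the standard library's http.cookiejar.CookieJar, so the demo builds a plain CookieJar by hand and runs without logging in anywhere; the same functions should accept the jar you get from a requests session. The function names, the cookie names, and the file name are all hypothetical.

```python
import json
import os
import tempfile
from http.cookiejar import Cookie, CookieJar


def jar_to_dict(jar):
    """Flatten a cookie jar into a plain {name: value} dictionary."""
    return {cookie.name: cookie.value for cookie in jar}


def save_cookies(jar, path):
    """Write the cookies to a text file as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(jar_to_dict(jar), f)


def load_cookies(path):
    """Read the {name: value} dictionary back from the file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def make_demo_cookie(name, value):
    # Hypothetical helper: builds a cookie by hand so the demo needs no login.
    return Cookie(
        version=0, name=name, value=value, port=None, port_specified=False,
        domain="example.com", domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True, secure=False, expires=None,
        discard=True, comment=None, comment_url=None, rest={},
    )


jar = CookieJar()
jar.set_cookie(make_demo_cookie("UserName", "me"))
jar.set_cookie(make_demo_cookie("UserToken", "abc123"))

path = os.path.join(tempfile.mkdtemp(), "cookies.txt")
save_cookies(jar, path)
print(load_cookies(path))
```

The dictionary that load_cookies returns can be passed as the cookies= argument of a requests call. Domain, path, and expiry are dropped in this sketch, which is fine for a single site but loses information a full cookie jar would keep.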

Automation: selenium

Websites these days aren't stupid. Which login doesn't ask for a verification code? Very few.
So you have to type in the verification code by hand. Of course, some people suggest cracking the captcha with machine learning. Good idea; give it a try.

There are also websites, and I think you've met them, that are so sneaky and intricate that crawling them gets you nowhere.

Not to mention websites whose URLs are encrypted, or that forbid crawlers outright.

Good. Now let's look at the new technology we're about to meet, selenium, and see how many of these obstacles it can help us overcome.

A quick demo first

Here is a rough demonstration: open the browser, open a blog, then close it. The fancier operations will be shown in code later:


Code display

# Local Chrome Browser settings
from selenium import webdriver
import time
driver = webdriver.Chrome()

Here's the code. You can try it, but for most of you it won't work, because most of you haven't set up the environment.

Environment configuration

Don't worry if you have no environment yet; we'll get there step by step.

First, you need the Google Chrome browser. I keep mentioning it, but I suspect you still haven't downloaded it.
Second, check the version of your Chrome browser. This is very important, because each browser version goes with its own driver version, and if they don't match, things get troublesome.
Next, download a driver:
Choose the one that matches your version.
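If comparing version strings by eye feels error-prone, a small check like the following can help. It is only a sketch under an assumption: that it is enough for the first number (the major version) of the browser and the driver to agree. The function names and the driver version strings are made up for illustration; the browser version is the one from the User-Agent shown earlier.

```python
def major_version(version):
    """Return the first number of a dotted version string as an int."""
    return int(version.strip().split(".")[0])


def driver_matches_browser(browser_version, driver_version):
    """Rule of thumb (an assumption): the major versions must agree."""
    return major_version(browser_version) == major_version(driver_version)


print(driver_matches_browser("86.0.4240.183", "86.0.4240.22"))
print(driver_matches_browser("86.0.4240.183", "87.0.4280.20"))
```

You can read your own browser version from Chrome's "About" page and compare it with the version in the driver's download name.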

After downloading, unzip it and put the driver in the same directory where Python is installed. If you don't know which directory that is, put a copy in every directory where you suspect Python is installed.

Good. Now open PyCharm again and run the previous code.

Oh, by the way, you also have to install the selenium package. It's a little big.

I won't cover many operations today; this is just a start. The article is already past 8,000 words, so I'll save the fun for the next one.

Now let's go through the lines of code above and get off to a good start; otherwise some of you will just go and look it up yourselves.

A simple explanation of selenium

Set up the browser engine

# First step: import the modules. Nothing more to say here.
from selenium import webdriver
import time
driver = webdriver.Chrome() # Take control of the Chrome browser. If the driver is missing, this line raises an error.
driver.get('') # Tell Chrome: open this page for me.
time.sleep(2) # The browser or the network is a little slow, so wait two seconds.
driver.get('') # Open another page.
time.sleep(2) # Same as above.
driver.close() # All done, close the browser.

What can selenium do?

Put simply, the code above sets the Chrome browser as the engine and assigns it to the variable driver. driver is an instantiated browser, and you will see it everywhere from here on. That makes sense, because we want to control that browser instance to do things for us.

Got it?

Why is selenium so capable?

selenium simplifies the problems we ran into before: crawling dynamic pages becomes as easy as crawling static ones.

The pages we handled directly with BeautifulSoup at the beginning were static web pages. BeautifulSoup works on this type of page because the page source contains all of the page's information, so the URL in the address bar is the URL of the page source.

Later we moved on to more complex pages. If I remember correctly, it started when we scraped comments from CSDN; that was when we first came into contact with json.
Then came QQ Music, where the data to crawl is not in the HTML source but in json. You can't use the URL in the address bar directly; you have to find the real URL of the json data. That is a dynamic web page.
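The contrast can be shown with a toy example. Everything here is made up for illustration: the two page snippets, the json text, and the helper names. The standard library's html.parser stands in for BeautifulSoup so the sketch runs without extra packages.

```python
import json
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the non-empty text found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def visible_text(html):
    """Return the list of text pieces present in the page source."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


def song_names(json_text):
    """Pull the song names out of the (hypothetical) json response."""
    return [song["name"] for song in json.loads(json_text)["songs"]]


STATIC_PAGE = "<ul><li>Song A</li><li>Song B</li></ul>"
DYNAMIC_PAGE = '<ul id="songs"></ul><script src="app.js"></script>'
JSON_RESPONSE = '{"songs": [{"name": "Song A"}, {"name": "Song B"}]}'

print(visible_text(STATIC_PAGE))   # the data is in the page source
print(visible_text(DYNAMIC_PAGE))  # the source is an empty shell
print(song_names(JSON_RESPONSE))   # the data arrives separately as json
```

Parsing the source of the dynamic page finds nothing, because the list is only filled in after the browser fetches the json.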

Wherever the data lives, the browser is constantly sending all kinds of requests to the server. When these requests complete, together they make up the fully rendered page source shown in the Elements panel of the developer tools.

When page interaction is complex or the URL encryption logic is convoluted, selenium comes into its own. It opens a real browser, waits for all the data to load into Elements, and then lets you crawl the page as if it were a static one.

With so many advantages, selenium naturally has some shortcomings too.
Because it actually runs your local browser, opening the browser and waiting for the page to render takes time, so selenium inevitably sacrifices speed and uses more resources. Still, it is at least no slower than a person. So be patient, young people: better to wait three minutes than to rush for one second.

That's it for now. I'll leave some suspense.


The material in this article comes from the Internet. If there is any infringement, please contact us to have it removed.

This article was written by [Python a meow]. Please include a link to the original when reposting. Thank you.

