Today I'll show you how to save the historical articles of your favorite WeChat official account as PDF files on your local machine. A few days ago a friend asked me again whether it was possible to download the articles of a certain official account, because he liked them very much. The problem is that historical articles can only be browsed inside WeChat, where they can't be sorted; some of the earlier ones took him a long time to find, and since he could only read a few articles at a time, he had to repeat the whole search the next time. Painful just to think about.

The scraping approach

I looked around online, and there are three ways to do it:

  1. Connect the phone to a computer and use Fiddler to capture the request and response packets, then simulate the requests to download articles in batches.
  2. Use the Sogou browser, or the wechatsogou Python module, to search for the official account and then batch-download its articles.
  3. Use the official account platform itself. This requires you to log in to the platform, but the rest is relatively simple.

Overall the last way is the simplest, so next I'll take the third approach as an example and show you how to batch-download the articles.

Getting the Cookie

First, log in to the official account platform. After logging in, you will be redirected to the account's home page, as shown below:


Then open the browser's developer tools on the current page and refresh. All kinds of requests appear under the Network tab; pick any request URL and you can see the request information shown in the figure below, which includes the request's Cookie.


Next we need to copy the Cookie, convert it into a JSON-format string, and save it to a text file for the later requests. This takes a little Python code: create a file gen_cookies.py and write the following:

# gen_cookies.py
import json

# Cookie string copied from the browser
cookie_str = "pgv_pvid=9551991123; pac_uid=89sdjfklas; XWINDEXGREY=0; pgv_pvi=89273492834; tvfe_boss_uuid=lkjslkdf090; RK=lksdf900; ptcz=kjalsjdflkjklsjfdkljslkfdjljsdfk; ua_id=ioje9899fsndfklsdf-DKiowiekfjhsd0Dw=; h_uid=lkdlsodifsdf; mm_lang=zh_CN; ts_uid=0938450938405; mobileUV=98394jsdfjsd8sdf; ...middle part omitted... EXIV96Zg=sNOaZlBxE37T1tqbsOL/qzHBtiHUNZSxr6TMqpb8Z9k="

cookie = {}
# Walk through the cookie string item by item
for cookies in cookie_str.split("; "):
    # Split on the first "=" only, so values that contain "=" stay intact
    cookie_item = cookies.split("=", 1)
    cookie[cookie_item[0]] = cookie_item[1]

# Write the cookies to a local file as JSON
with open('cookie.txt', "w") as file:
    file.write(json.dumps(cookie))
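As a quick sanity check, here is the same parsing logic run against a dummy cookie string (the values below are placeholders I made up, not real credentials). Splitting on the first "=" only matters because cookie values such as base64 strings can themselves contain "=":

```python
import json

# Dummy cookie string with placeholder values (not a real session)
cookie_str = "ua_id=abc123; mm_lang=zh_CN; sig=aGVsbG8="

cookie = {}
for item in cookie_str.split("; "):
    key, value = item.split("=", 1)  # split on the first "=" only
    cookie[key] = value

print(json.dumps(cookie))
# → {"ua_id": "abc123", "mm_lang": "zh_CN", "sig": "aGVsbG8="}
```

The resulting dictionary is exactly the shape that requests accepts later through its cookies= parameter.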

OK, with the Cookie written to a file, let's look at where to find the article links of an official account.

Getting the article links

On the home page of the official account platform, click the Material Management menu on the left to enter the material management page, then click the New Graphic Material button on the right, as shown below:


On the new graphic material page, click Hyperlink:


In the Edit Hyperlink dialog that pops up, click to select another official account's link:


Here we can search by keyword for the official account we want. For example, searching for "Python technology" gives the following results:


Click the first result, the Python Technology account, and we can see every article it has published:


There are only five articles per page, and 31 pages in all. Now open the developer tools again and click the Next button below the list. In the Network tab you will see a request sent to the server; let's analyze its parameters.
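Before reading the raw request, it helps to pin down the paging arithmetic implied by those numbers: with five articles per page, page j starts at offset (j - 1) * 5, and covering every article takes ceil(total / 5) requests. A minimal sketch (names like PAGE_SIZE and the example article count are mine, not from the platform API):

```python
import math

PAGE_SIZE = 5          # articles returned per request
total_articles = 153   # example count; the real number comes from the API response

# Offset of the first article on each page, as used by the "begin" parameter
offsets = [(page - 1) * PAGE_SIZE for page in range(1, 4)]
print(offsets)  # → [0, 5, 10]

# Number of requests needed to cover every article
print(math.ceil(total_articles / PAGE_SIZE))  # → 31
```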


From the request parameters we can roughly work out what each one means: begin is the index of the first article to return, count is how many to return per request, fakeid is the unique ID of the official account, and token is obtained with the help of the Cookie. Knowing that, we can write some Python code to loop over the requests. Create a file gzh_download.py with the following code:

# gzh_download.py
import requests
import json
import re
import time
import pdfkit

# Read the cookie saved earlier
with open("cookie.txt", "r") as file:
    cookie = file.read()
cookies = json.loads(cookie)

url = "https://mp.weixin.qq.com"
# Request the official account platform
response = requests.get(url, cookies=cookies)
# Extract the token from the redirected URL
token = re.findall(r'token=(\d+)', str(response.url))[0]

# Set the request headers
headers = {
    "Referer": "https://mp.weixin.qq.com/cgi-bin/appmsg?t=media/appmsg_edit_v2&action=edit&isNew=1&type=10&token=" + token + "&lang=zh_CN",
    "Host": "mp.weixin.qq.com",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
}

# Loop over the first 10 pages of articles
for j in range(1, 11):
    begin = (j - 1) * 5
    # Request the article list for the current page
    requestUrl = "https://mp.weixin.qq.com/cgi-bin/appmsg?action=list_ex&begin=" + str(begin) + "&count=5&fakeid=MzU1NDk2MzQyNg==&type=9&query=&token=" + token + "&lang=zh_CN&f=json&ajax=1"
    search_response = requests.get(requestUrl, cookies=cookies, headers=headers)
    # Parse the returned JSON
    re_text = search_response.json()
    msg_list = re_text.get("app_msg_list")
    # Walk through the articles on the current page
    for i in msg_list:
        # Convert each article link to a PDF in the current directory
        pdfkit.from_url(i["link"], i["title"] + ".pdf")
    # Requests that come too fast may be blocked by WeChat, so wait 10 seconds
    time.sleep(10)
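One practical caveat: the PDF path is built from the article title, and titles can contain characters such as "/" that are invalid in file names, which would make the write fail. A small helper along these lines (my addition, not part of the original script) sanitizes titles before saving:

```python
import re

def safe_filename(title: str) -> str:
    """Replace characters that are invalid in file names with underscores."""
    return re.sub(r'[\\/:*?"<>|]', "_", title).strip()

# Example: a title containing a slash becomes a valid file name
print(safe_filename("Python/WeChat tips") + ".pdf")  # → Python_WeChat tips.pdf
```

The download loop can then call pdfkit.from_url(i["link"], safe_filename(i["title"]) + ".pdf") instead.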

OK, the code above is all we need. Converting a URL to PDF uses the pdfkit module, which in turn requires the wkhtmltopdf tool to be installed first. Its official site (the address is given at the end of the article) offers builds for multiple operating systems; download and install the one for yours. I won't go into details here.


After installing wkhtmltopdf, run pip3 install pdfkit to install the Python module. Once that's done, run python gzh_download.py to start the program and see how it works.


It looks like a success. This tool is really quite handy.


This article showed how to analyze the official account platform to find the links to every article an official account has published, batch-download them, and save them locally as PDF files. A small amount of Python code is enough to fetch and convert the articles; give it a try if you're interested.

References