picture

Today, I'll introduce how to turn the historical articles of your favorite official account into a love letter.  PDF   Save to local . A few days ago, a friend asked again , Can you download some articles from a certain official account? , Because he likes this article very much , However, the historical articles can not be sorted because they are viewed on wechat , Some earlier articles took a long time to find , And often do not read a few articles at a time , We'll have to do it again next time , It's painful to think about it .

The idea of grasping

Now I've looked online , There are three ways to realize it :

  1. Connect to computer through mobile phone , utilize Fiddler Capture packet to get request and return message , Then the batch download is realized by message simulation request .
  2. Through Sogou browser or wechatsogou This Python modular , After searching the public number , Realize batch download .
  3. Through official account platform , This requires you to log on to the official account platform. , The rest is relatively simple .

On the whole, the last way is the simplest , Next, take the third way as an example , For you to introduce how to achieve the purpose of batch download .

obtain Cookie

First, we will log onto the official account platform. , After landing, it will jump to the official account. , Here's the picture :

 picture

Then we open the browser developer tool on the current page , Refresh next page , You can see all kinds of requests in the network , Let's make a request here url, Then you can see the network request information in the figure below , It contains the requested Cookie Information .

 picture

Next we need to put Cookie Information is copied and transformed into Json Save the format string to a text file , For later link request . I need to write a paragraph here Python Code to process , New file gen_cookies.py Write the code as follows :

# gen_cookies.py
import json
# Copied from browser Cookie character string cookie_str = "pgv_pvid=9551991123; pac_uid=89sdjfklas; XWINDEXGREY=0; pgv_pvi=89273492834; tvfe_boss_uuid=lkjslkdf090; RK=lksdf900; ptcz=kjalsjdflkjklsjfdkljslkfdjljsdfk; ua_id=ioje9899fsndfklsdf-DKiowiekfjhsd0Dw=; h_uid=lkdlsodifsdf; mm_lang=zh_CN; ts_uid=0938450938405; mobileUV=98394jsdfjsd8sdf; \…… Middle part omitted \ EXIV96Zg=sNOaZlBxE37T1tqbsOL/qzHBtiHUNZSxr6TMqpb8Z9k="
cookie = {}# Traverse cookie Information for cookies in cookie_str.split("; "):    cookie_item = cookies.split("=")    cookie[cookie_item[0]] = cookie_item[1]# take cookies Write to local file with open('cookie.txt', "w") as file:    #   write file    file.write(json.dumps(cookie))

Okay , take Cookie After writing to file , Next, let's talk about where we can find the article link of a company .

Get links to articles

On the homepage of public number management platform, click the material management menu on the left , Enter the material management page , Then click the new graphic material button on the right , Here's the picture :

 picture

Enter the new graphic material page , Then click on the hyperlink here :

 picture

In the edit hyperlink pop-up box , Click to select other official account connections. :

 picture

Here we can go through the search , Input keyword search we want to find official account. , For example, we search here "Python technology ", You can see the following search results :

 picture

Then click on the first Python Official account of Technology , Here we can see all the articles published in the history of the official account. :

 picture

We see that there are only five articles on each page , It's divided up 31 page , Now let's open our own developer tool , Then click the next button below the list , In the network, you will see a request sent to the service , Let's analyze the parameters of this request .

 picture

By request parameters , We can probably analyze the meaning of the parameters , begin Which article does it start with ,count How many at a time ,fakeId The uniqueness of the corresponding public number Id,token It's through Cookie Information . Okay , If we know that, we can use it Python Write a piece of code to traverse the request , New file gzh_download.py, The code is as follows :

# gzh_download.py# Introduce modules import requestsimport jsonimport reimport randomimport timeimport pdfkit
# open cookie.txtwith open("cookie.txt", "r") as file:    cookie = file.read()cookies = json.loads(cookie)url = "https://mp.weixin.qq.com"# Request public number platform response = requests.get(url, cookies=cookies)# from url In order to get tokentoken = re.findall(r'token=(\d+)', str(response.url))[0]# Set request access header information headers = {    "Referer": "https://mp.weixin.qq.com/cgi-bin/appmsg?t=media/appmsg_edit_v2&action=edit&isNew=1&type=10&token=" + token + "&lang=zh_CN",    "Host": "mp.weixin.qq.com",    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",}
# Before loop traversal 10 Page articles for j in range(1, 10, 1):    begin = (j-1)*5    # Request the current page to get the list of articles    requestUrl = "https://mp.weixin.qq.com/cgi-bin/appmsg?action=list_ex&begin="+str(begin)+"&count=5&fakeid=MzU1NDk2MzQyNg==&type=9&query=&token=" + token + "&lang=zh_CN&f=json&ajax=1"    search_response = requests.get(requestUrl, cookies=cookies, headers=headers)    # Get the return list Json Information    re_text = search_response.json()    list = re_text.get("app_msg_list")    # Traverses the list of articles on the current page    for i in list:        # Convert article links pdf Download to current directory        pdfkit.from_url(i["link"], i["title"] + ".pdf")    # Too fast request may be asked by wechat , Here we go 10 Seconds to wait    time.sleep(10)

Okay , The above code is enough , Here will be URL Turn into PDF Is used in pdfkit Module , To use this, you need to install it first wkhtmltopdf This tool , The official website address is given at the end of the article , Support for multiple operating systems , Download and install by yourself , No more details here .

 picture

After the installation , It needs to be implemented again pip3 install pdfkit Command to install this module . installed , Now let's do it python gzh_download.py Command to start the program to see how the effect .

 picture  picture  picture

It seems to be a success , This tool is still very powerful .

summary

This article introduces how to analyze the function of official account platform. , Find links to all articles that can be accessed to a public official account. , So that you can download all the articles of official account in batches. , And turn to PDF The purpose of saving the format locally . Through here Python Write a small amount of code to achieve the capture and conversion of the article , You can try it if you are interested .

Reference resources

https://wkhtmltopdf.org/downloads.html