I used Python to scrape the hot comments on the Wu incident from Weibo: talking tech while following the gossip

Five packs of spicy strips! 2021-09-15 12:40:59


Hello everyone, I'm Spicy.

I've written about crawling Weibo comments before. Recently, though, the Wu story has been huge, and the whole internet has been following it. It only came to a close yesterday, when the police issued a statement. So today I'd like to walk through capturing Weibo comments and, along the way, catch up on the whole story of this incident.

Crawl target

Website: Weibo


Result preview


Tools used

Development environment: Windows 10, Python 3.7

Development tools: PyCharm, Chrome

Libraries: requests, re, csv

Project approach

Find the post whose comments you want to read


The request headers need to carry some basic configuration data:

headers = {
    "referer": "",
    "cookie": "",
    "user-agent": ""
}
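
As a rough sketch of how these three fields might be filled in, the helper below builds the dictionary for a given post. The name build_headers and its parameters are made up for illustration. The cookie and user-agent values are placeholders that would have to be copied from your own logged-in browser session.

```python
def build_headers(post_id, cookie, user_agent):
    """Return request headers for the mobile Weibo comment endpoint.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    return {
        # The referer is the detail page of the post being crawled.
        "referer": "https://m.weibo.cn/detail/{}".format(post_id),
        # Copied from the browser's developer tools after logging in.
        "cookie": cookie,
        "user-agent": user_agent,
    }


# Placeholder values; substitute the ones from your own session.
headers = build_headers("4661850409272066", "<your cookie>", "<your user-agent>")
print(headers["referer"])
```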

Find the comment data that the post loads dynamically


Use a packet capture tool to locate the request that returns the comment data


The Weibo URL contains the post id. mid is also the post id, and max_id is the max_id value found in each JSON response. It follows no predictable pattern.

https://m.weibo.cn/comments/hotflow?id=4661850409272066&mid=4661850409272066&max_id=5640809315785878&max_id_type=0
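
To make the parameters concrete, the sketch below splits the URL above with the standard library's urllib.parse. The variable names are only illustrative.

```python
from urllib.parse import parse_qs, urlsplit

url = ("https://m.weibo.cn/comments/hotflow?id=4661850409272066"
       "&mid=4661850409272066&max_id=5640809315785878&max_id_type=0")

# parse_qs maps each parameter name to a list of values; keep the first one.
params = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}

for name in ("id", "mid", "max_id", "max_id_type"):
    print(name, "=", params[name])
```

Here id and mid carry the same post id, while max_id is the cursor taken from the previous page's JSON.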


Extract the current max_id from the response, and you have the request URL for the next page.
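
This paging step can be sketched as a small function that reads max_id from one page of JSON and returns the URL of the next page. The payload below is a hand-written stand-in that contains only the field the article relies on (data.max_id), not real API output. The names NEXT_URL and next_page_url are hypothetical, and treating a max_id of 0 as the last page is an assumption.

```python
NEXT_URL = ("https://m.weibo.cn/comments/hotflow?id={post_id}"
            "&mid={post_id}&max_id={max_id}&max_id_type=0")


def next_page_url(post_id, payload):
    """Build the next page's URL from one decoded JSON page, or None at the end."""
    max_id = payload.get("data", {}).get("max_id")
    # Assumption: a missing or zero max_id means there are no more pages.
    if not max_id:
        return None
    return NEXT_URL.format(post_id=post_id, max_id=max_id)


# Hand-written sample, not a real response.
sample_page = {"data": {"max_id": 5640809315785878, "data": []}}
print(next_page_url("4661850409272066", sample_page))
```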

A brief walkthrough of the source code

import csv
import re
import time

import requests

# First page of hot comments; id and mid are both the post id.
start_url = "https://m.weibo.cn/comments/hotflow?id=4661850409272066&mid=4661850409272066&max_id_type=0"
# Later pages add the max_id returned by the previous page.
next_url = "https://m.weibo.cn/comments/hotflow?id=4661850409272066&mid=4661850409272066&max_id={}&max_id_type=0"
# Copied from a browser session; replace the referer and cookie with values from your own session.
headers = {
    "referer": "https://m.weibo.cn/detail/4638585665621278",
    "cookie": "SUB=_2A25Nq-BcDeRhGeBG7VUW-SnEyjyIHXVvV4AUrDV6PUJbkdAKLULFkW1NRhXYfC2JIAilAAFJ_-2diWZ1ZEACRZ5K; SCF=AgGUxHxg_ZjvVbYikCOVICTc-a4gDcEtR02fexDZstBq_XKr3s1Rp9CxdS4y4k4IvDQ2eIgTTyJg73pcUmvYRKc.; _T_WM=58609113785; WEIBOCN_FROM=1110006030; MLOGIN=1; M_WEIBOCN_PARAMS=oid%3D4638585665621278%26luicode%3D20000061%26lfid%3D4638585665621278%26uicode%3D20000061%26fid%3D4638585665621278; XSRF-TOKEN=06ed3f",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
}
count = 0


def csv_data(row):
    # Append one row; utf-8-sig keeps Chinese text readable when the file is opened in Excel.
    with open("wb1234.csv", "a", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerow(row)


def get_data(url):
    global count
    # Loop over pages instead of recursing, so a long thread cannot hit the recursion limit.
    while url:
        print(url)
        try:
            response = requests.get(url, headers=headers, timeout=10).json()
            max_id = response["data"]["max_id"]
        except Exception:
            # Retry once with max_id_type=1; give up if that fails as well.
            if url.endswith("max_id_type=1"):
                break
            url = url.split("type")[0] + "type=1"
            continue

        for item in response["data"]["data"]:
            count += 1
            create_time = item["created_at"]
            # Keep only the Chinese characters of the comment text.
            text = "".join(re.findall("[\u4e00-\u9fa5]", item["text"]))
            user_id = item["user"]["id"]
            user_name = item["user"]["screen_name"]
            csv_data([count, create_time, user_id, user_name, text])

        # A max_id of 0 is treated as the last page.
        if not max_id:
            break
        url = next_url.format(max_id)
        time.sleep(2)


if __name__ == "__main__":
    fileheader = ["id", "comment_time", "user_id", "user_name", "comment_text"]
    csv_data(fileheader)
    get_data(start_url)
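
The regular expression in the script keeps only characters in the range U+4E00 to U+9FA5, which is how markup and emoji are stripped from each comment. The sketch below shows the effect on a made-up comment. The helper name keep_chinese and the sample string are illustrative, not taken from real crawled data.

```python
import re

# Same character class as in the script: CJK ideographs from U+4E00 to U+9FA5.
CHINESE_ONLY = re.compile("[\u4e00-\u9fa5]")


def keep_chinese(raw_text):
    """Drop everything except Chinese characters (tags, emoji, Latin text, digits)."""
    return "".join(CHINESE_ONLY.findall(raw_text))


# Made-up comment for illustration, not real crawled data.
sample = '支持警方通报!<span class="url-icon"><img alt="[good]"></span> 666'
print(keep_chinese(sample))
```

Punctuation, digits, and Latin letters are dropped as well, so a comment such as "666" ends up as an empty string in the CSV.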

Previous posts:


I used Python to change the class beauty's boot password, and after logging in again I discovered her secret!

I used Python to collect the class beauty's space data set. Besides her pretty photos, I found another of her secrets!

My roommate's crush on the class beauty went nowhere, so I crawled a website and sent it to him for an instant cure. Men's happiness is so simple [Once a day, forget your first love]

Copyright notice
This article was written by [Five packs of spicy strips!]. Please include a link to the original when reposting. Thank you.
https://pythonmana.com/2021/09/20210909161448783d.html
