To automatically collect Bilibili danmaku (bullet comments), I built a tool in Python

zeroing 2021-02-21 17:02:28

Hello everyone, I'm Xiao Zhang!

In the article "Making a Word Cloud Video with Python: Watching a Dancer Through Word Clouds" I briefly introduced how to crawl Bilibili danmaku: once you find the video's cid parameter, you can capture all the danmaku in that video. The idea is simple, but it still feels tedious. If some day I want to collect the danmaku of another Bilibili video, I have to start from scratch: find the cid parameter, write the code, and repeat the same monotonous steps.

So I wondered whether this could be done in a single step: in the future, enter the link of the video you want to crawl, and the program identifies the video and downloads its danmaku automatically.

Result

With this in mind, I wrote a small tool with PyQt5. Just provide the target video URL and the path of the target txt file, and the program automatically collects the danmaku under the video and saves them to that txt file. Let's take a look at the preview:


PS: WeChat official accounts limit the number of frames in an animated image, so I cut part of the GIF when making it, and the preview may not look very smooth.

The tool is split into two parts, the UI and data collection. It uses the following Python libraries:

import requests
import re
from PyQt5.QtWidgets import *
from PyQt5 import QtCore
from PyQt5.QtGui import *
from PyQt5.QtCore import QThread, pyqtSignal
from bs4 import BeautifulSoup

UI Interface

The UI is built with PyQt5. It contains two buttons (Start download, Save to), a QLineEdit control for entering the video link, and a debug console window.


The code is as follows:

    def __init__(self, parent=None):
        super().__init__(parent)  # initialise the Qt base class before using its methods
        self.setWindowTitle("Bilibili danmaku collection")
        self.setWindowIcon(QIcon('pic.jpg'))  # window icon
        self.top_label = QLabel("Author: Xiao Zhang\nWeChat account: Xiao Zhang Python")
        self.label = QLabel("Bilibili video URL")
        self.editline1 = QLineEdit()  # input box for the video link
        self.pushButton = QPushButton("Start download")
        self.pushButton.setEnabled(False)  # disabled until a URL has been entered
        self.Console = QListWidget()  # debug console
        self.saveButton = QPushButton("Save to")
        self.layout = QGridLayout()
        self.savepath = None  # target txt path, set by the "Save to" button

Once the URL field is non-empty and the target text file path has been set, the tool can move on to the data collection module.


The code that implements this behaviour:

    def syns_lineEdit(self):
        # Enable the download button once the URL field is non-empty
        if self.editline1.text():
            self.pushButton.setEnabled(True)

    def savePushbutton(self):
        # getSaveFileName returns a (path, selected_filter) tuple
        savePath = QFileDialog.getSaveFileName(self, 'Save Path', '/', 'txt(*.txt)')
        if savePath[0]:  # a txt file path was chosen
            self.savepath = str(savePath[0])  # remember it for the download step
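
The snippet above enables the button as soon as the URL field has text, while the description asks for both a URL and a save path. A minimal sketch of that combined check, written as a plain function so it runs without Qt; the name `can_start_download` and the placeholder URL are hypothetical and not part of the original tool:

```python
def can_start_download(url_text, save_path):
    """Return True only when a URL was typed and a save path was chosen."""
    has_url = bool(url_text and url_text.strip())  # ignore whitespace-only input
    has_path = bool(save_path)                     # None or "" means no path yet
    return has_url and has_path


# Placeholder URL for illustration only, not a real video.
placeholder_url = "https://www.bilibili.com/video/BVxxxxxxxxxx"

print(can_start_download("", None))                    # False
print(can_start_download(placeholder_url, None))       # False
print(can_start_download(placeholder_url, "out.txt"))  # True
```

In the tool, the result of such a check could be passed to `self.pushButton.setEnabled(...)` from both the text-changed handler and the save-path handler.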

Data collection

After the program gets the URL, the first step is to request that URL and extract the video's cid parameter (a string of digits) from the page.
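
As a rough illustration of this step, the sketch below pulls a cid out of page text with a regular expression. It assumes the page source embeds the value as `"cid":<digits>` inside a JSON blob; that layout, the function name `extract_cid`, and the sample fragment are assumptions for the example, and the real page may look different.

```python
import re

# Assumption: the id appears in the page source as "cid":<digits>.
CID_PATTERN = re.compile(r'"cid":(\d+)')


def extract_cid(page_html):
    """Return the first cid found in the page source, or None if there is none."""
    match = CID_PATTERN.search(page_html)
    return match.group(1) if match else None


# Made-up fragment standing in for a real page; the number is not a real cid.
sample_html = '<script>window.__INITIAL_STATE__={"aid":1,"cid":123456789,"title":"demo"}</script>'

print(extract_cid(sample_html))      # 123456789
print(extract_cid("<html></html>"))  # None
```

Returning None instead of indexing the match list directly keeps the caller from crashing when the page does not contain the expected field.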


The cid parameter is then used to construct the URL of the danmaku API, and the text is collected in the usual way with the requests and bs4 packages.


The data collection code:

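    # url here is the danmaku API address built from cid
    res = requests.get(url)
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'lxml')
    items = soup.find_all('d')  # each <d> tag holds one danmaku
    # the with block closes the txt file even if writing fails
    with open(self.savepath, 'w+', encoding='utf-8') as f:
        for item in items:
            text = item.text
            f.write(text + '\n')  # one danmaku per line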
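
The article does not spell out the API address, so the following is only a sketch under assumptions: it supposes the danmaku are served as XML at `https://comment.bilibili.com/{cid}.xml`, and it parses a made-up sample offline with the standard library instead of bs4. The helper names `build_danmaku_url` and `parse_danmaku` and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: this endpoint pattern is not stated in the article.
DANMAKU_URL_TEMPLATE = "https://comment.bilibili.com/{cid}.xml"


def build_danmaku_url(cid):
    """Build the (assumed) danmaku API URL for a given cid."""
    return DANMAKU_URL_TEMPLATE.format(cid=cid)


def parse_danmaku(xml_text):
    """Return the text of every <d> element in the danmaku XML."""
    root = ET.fromstring(xml_text)
    return [d.text or "" for d in root.iter("d")]


# Made-up sample that mimics the shape of the response: one <d> per danmaku.
sample_xml = """<i>
  <d p="1.5,1,25,16777215">first comment</d>
  <d p="3.2,1,25,16777215">second comment</d>
</i>"""

print(build_danmaku_url("123456789"))
print("\n".join(parse_danmaku(sample_xml)))
```

Keeping URL construction and parsing in separate functions makes it possible to test the parsing on saved samples without touching the network.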

The cid parameter is not stored in an ordinary HTML tag, so I extract it with a regular expression (the re module). This step is fairly resource-intensive, so to keep the UI responsive it runs in a separate thread:

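    class Parsetext(QThread):
        trigger = pyqtSignal(str)  # signal used to pass the result back to the UI

        def __init__(self, text, parent=None):
            super().__init__(parent)  # initialise QThread before using it
            self.text = text  # page source to parse

        def __del__(self):
            self.wait()  # let the thread finish before the object is destroyed

        def run(self):
            print('Parsing -----------{}'.format(self.text))
            # raw string avoids accidental escape sequences in the pattern
            result_url = re.findall(r'.*?"baseUrl":"(.*?)","base_url".*?', self.text)[0]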
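
The same idea can be sketched without Qt so that it runs anywhere: a worker thread performs the regex match and hands the result back through a queue. Here `threading.Thread` and `queue.Queue` stand in for QThread and pyqtSignal; this is a swapped-in technique, not the author's code, and the function name and sample text are made up for illustration.

```python
import queue
import re
import threading


def parse_in_background(page_text, pattern, results):
    """Run the regex off the main thread and put the first match (or None) on the queue."""
    def worker():
        found = re.findall(pattern, page_text)
        # Guard against an empty result instead of indexing [0] blindly.
        results.put(found[0] if found else None)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


results = queue.Queue()
# Made-up text containing the two keys the pattern looks for.
sample_text = '{"baseUrl":"https://example.com/a.m4s","base_url":"https://example.com/a.m4s"}'

thread = parse_in_background(sample_text, r'"baseUrl":"(.*?)","base_url"', results)
thread.join()                # a GUI would keep handling events instead of blocking here
first_match = results.get()
print(first_match)           # https://example.com/a.m4s
```

In a real PyQt5 application the queue would be replaced by emitting the signal from `run`, so the result arrives in the UI thread through a connected slot.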


That is all for this article. I hope it helps you in your work or study.

Thank you for reading, see you next time~

Source code

To get the source code used in this article, follow the WeChat account Xiao Zhang Python and reply with the keyword 210217 in the chat.

