Hello everyone, I'm Xiao Zhang!

In the article 《Making a Word Cloud Video with Python: Seeing a Girl Dance in the Word Clouds》 I briefly introduced how to crawl danmaku (bullet comments) from Bilibili: once you find the video's cid parameter, you can capture all the danmaku in that video. The idea is simple, but it still feels tedious. If a few days later I want to collect the danmaku of another Bilibili video, I have to start from scratch: look up the cid parameter, write the code, and repeat the same monotonous routine.

So I wondered whether this could be done in one step, so that capturing a video's danmaku takes a single action: enter the link of the video you want to crawl, and the program identifies and downloads the danmaku automatically.

Result

With that in mind, I wrote a small tool with PyQt5. You only provide the target video url and the path of the target txt file; the program automatically collects the danmaku under the video and saves the data to that txt file. Here is a preview:


PS: WeChat official accounts limit the number of frames in an animated image, so I cut part of it out when making the animation, and the result may not look very smooth.

The tool has two parts, the UI and the data collection. It uses the following Python libraries:

import requests
import re
from PyQt5.QtWidgets import *
from PyQt5 import QtCore
from PyQt5.QtGui import *
from PyQt5.QtCore import QThread, pyqtSignal
from bs4 import BeautifulSoup

UI

The UI is built with PyQt5. It contains two buttons (Start download, Save to), a QLineEdit control for entering the video link, and a debug console window.


The code is as follows:

    def __init__(self, parent=None):
        super().__init__(parent)  # initialize the Qt base class before touching widgets
        self.setWindowTitle("Bilibili danmaku collector")
        self.setWindowIcon(QIcon('pic.jpg'))  # window icon
        self.top_label = QLabel("Author: Xiao Zhang\nWeChat: Xiao Zhang Python")
        self.label = QLabel("Bilibili video url")
        self.editline1 = QLineEdit()
        self.pushButton = QPushButton("Start download")
        self.pushButton.setEnabled(False)  # disabled at startup
        self.Console = QListWidget()
        self.saveButton = QPushButton("Save to")
        self.layout = QGridLayout()
        self.savepath = None

Once the url is non-empty and the target text file path has been set, the program can enter the data collection module.


The code that implements this:

    def syns_lineEdit(self):
        if self.editline1.text():
            self.pushButton.setEnabled(True)  # enable the download button

    def savePushbutton(self):
        savePath = QFileDialog.getSaveFileName(self, 'Save Path', '/', 'txt(*.txt)')
        if savePath[0]:  # a txt file path was chosen
            self.savepath = str(savePath[0])  # store the path

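
The handler shown here enables the download button as soon as the url field is non-empty, while the text describes two conditions (a url and a save path). A minimal sketch of checking both is given below; `can_start_download` is a hypothetical helper, not part of the original tool.

```python
def can_start_download(url, save_path):
    """Return True only when a url was entered and a save path was chosen.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    has_url = bool(url and url.strip())  # ignore whitespace-only input
    has_path = bool(save_path)           # None or '' means no path chosen yet
    return has_url and has_path
```

One way to use it would be to call `self.pushButton.setEnabled(can_start_download(self.editline1.text(), self.savepath))` from both handlers, so the button state follows either change.
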
Data collection

Once the program has the url, the first step is to request that page and extract the video's cid parameter (a string of digits) from it.

The cid parameter is then used to build the danmaku API url, after which the text is collected in the usual way with requests and bs4.
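
As a rough sketch of these two steps, the snippet below pulls a cid out of page source with a regular expression and builds a danmaku url from it. Both the `"cid":<digits>` pattern and the endpoint template are assumptions for illustration; the article does not show the exact pattern or url it uses, and the function names are hypothetical.

```python
import re

# Assumption: the page source embeds the id in the form "cid":<digits>.
CID_PATTERN = re.compile(r'"cid":(\d+)')

# Assumption: a commonly cited danmaku endpoint; the article does not name its url.
DANMAKU_API = "https://comment.bilibili.com/{cid}.xml"


def extract_cid(page_html):
    """Return the first cid found in the page source as a string, or None."""
    match = CID_PATTERN.search(page_html)
    return match.group(1) if match else None


def build_danmaku_url(cid):
    """Fill the cid into the (assumed) danmaku endpoint template."""
    return DANMAKU_API.format(cid=cid)
```

Returning `None` when nothing matches lets the caller report a bad link in the debug console instead of failing with an `IndexError`.
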


The data collection code:

res = requests.get(url)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'lxml')
items = soup.find_all('d')  # each danmaku comment sits in a <d> tag
with open(self.savepath, 'w+', encoding='utf-8') as f:  # open the txt file
    for item in items:
        text = item.text
        f.write(text + '\n')  # one comment per line

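
To try the parsing step without a network request, here is a small self-contained sketch. It swaps bs4 for the standard library's `xml.etree.ElementTree`, and the sample document is an assumed shape of the response (comments in `<d>` elements under one root), not data captured from the site.

```python
import xml.etree.ElementTree as ET

# Assumed response layout: one root element with a <d> element per comment.
SAMPLE_XML = """<i>
  <d p="1.5,1,25,16777215">first comment</d>
  <d p="3.0,1,25,16777215">second comment</d>
</i>"""


def parse_danmaku(xml_text):
    """Return the text of every <d> element, in document order."""
    root = ET.fromstring(xml_text)
    return [d.text or "" for d in root.iter("d")]


def save_danmaku(comments, path):
    """Write one comment per line to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as f:
        for comment in comments:
            f.write(comment + "\n")
```

Calling `save_danmaku(parse_danmaku(SAMPLE_XML), "danmaku.txt")` would write the two sample comments to `danmaku.txt`.
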
The cid parameter is not stored in an ordinary HTML tag, so I extract it with a regular expression (re). This step is fairly memory-intensive, so to keep it from slowing down the UI's response, it runs in a separate thread.

class Parsetext(QThread):
    trigger = pyqtSignal(str)  # signal used to send results back to the UI

    def __init__(self, text, parent=None):
        super().__init__(parent)
        self.text = text

    def __del__(self):
        self.wait()  # assumed body (empty in the original): wait for the thread to finish

    def run(self):
        print('Parsing -----------{}'.format(self.text))
        result_url = re.findall(r'.*?"baseUrl":"(.*?)","base_url".*?', self.text)[0]


Okay, that's all for this article. I hope it helps you in your work or study.

Thanks for reading, see you next time!

The source code for

