[bilibili] Downloading bilibili videos with a Python crawler

SunriseCai 2020-11-13 11:28:44


I write this blog in my spare time to record articles; they are published here for readers only. If anything here infringes your rights, please let me know and I will delete it.

Preface

A few days ago a friend said he wanted to read an article about downloading videos, so here we go with bilibili.
I haven't studied bilibili in depth, so the article may be a bit rough; please go easy on me.
It only scratches the surface, but it does achieve the goal of downloading videos.

About video quality:

State           Description
Not logged in   The captured video and audio URLs carry the parameter mid=0
Logged in       The captured video and audio URLs carry mid=xxx (a string of digits)

If you are not logged in, you cannot choose the video quality.
If you open a video while logged in and select HD, then every video requested with your mid value will be in HD (unless the video itself has no HD version).
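Since the login state shows up in the captured URLs, it can be checked programmatically. A minimal sketch, assuming `mid` appears as an ordinary query parameter as the table above describes; the helper names and the example URLs are hypothetical:

```python
from urllib.parse import urlparse, parse_qs


def read_mid(url):
    """Return the `mid` query parameter of a captured stream URL, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("mid")
    return values[0] if values else None


def looks_logged_in(url):
    """Following the table above: mid=0 means not logged in, any other value means logged in."""
    mid = read_mid(url)
    return mid is not None and mid != "0"


# Made-up URLs, for illustration only
print(looks_logged_in("https://example.com/stream.m4s?deadline=1&mid=0"))         # False
print(looks_logged_in("https://example.com/stream.m4s?deadline=1&mid=12345678"))  # True
```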

1. Approach

Here is the overall approach:

  1. Finding the audio and video URLs is the key.
  2. Download the video and audio separately, then merge them (that is, combine picture and sound into one file).

Enough talk, let's get started!

2. Analysis

The video in the screenshot below is used as the example. I use Firefox here because it shows the packet capture conveniently.
[Screenshot: the example video]
First, open the developer tools and capture the packets, as shown below:
[Screenshots: capturing packets in the developer tools]

In the Media category of the capture, all the packets are mp4, ranging from 107 bytes to 2.30 MB. A few interesting patterns show up:

  1. The 107-byte packets are requested with the OPTIONS method.
  2. The larger packets are requested with GET.
  3. There are two URLs that keep repeating (probably one for video and one for audio).
  4. In the Range request header, each request picks up right after the last byte of the previous one: request 1 is 0-973, request 2 is 974-1653, and so on.
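The contiguous Range pattern in point 4 can be reproduced with a small helper. This is only an illustration: the browser chose its own chunk sizes (974 bytes, then 680 bytes), while the hypothetical `byte_ranges` below uses one fixed size. The `bytes=start-end` form, with an inclusive end byte, is standard HTTP.

```python
def byte_ranges(total_size, chunk_size):
    """Split a file of total_size bytes into contiguous HTTP Range header values.

    Each range starts one byte after the previous one ended, like the
    captured 0-973, 974-1653, ... sequence. Range ends are inclusive.
    """
    if total_size <= 0 or chunk_size <= 0:
        raise ValueError("total_size and chunk_size must be positive")
    ranges = []
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1  # inclusive last byte
        ranges.append(f"bytes={start}-{end}")
        start = end + 1  # next range begins right after this one
    return ranges


print(byte_ranges(2500, 1000))
# ['bytes=0-999', 'bytes=1000-1999', 'bytes=2000-2499']
```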

So let's request one of these URLs with code and see what happens.
[Screenshot: the request code]
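The request code was only shown as a screenshot, so here is a rough equivalent. It is a sketch, not the original code: it uses the standard library's `urllib.request`, the function names are hypothetical, and the header values are assumptions (bilibili's CDN is commonly reported to reject requests without a bilibili `Referer`), so copy the headers from your own captured request if they differ.

```python
import urllib.request

# Assumed headers; replace them with the ones from your captured request if needed.
BASE_HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Referer": "https://www.bilibili.com/",
}


def make_range_request(url, start, end):
    """Build a GET request for bytes start..end (inclusive) of url."""
    headers = dict(BASE_HEADERS, Range=f"bytes={start}-{end}")
    return urllib.request.Request(url, headers=headers)


def fetch_range(url, start, end, path):
    """Download one byte range of url, write it to path and return the byte count."""
    with urllib.request.urlopen(make_range_request(url, start, end)) as resp:
        data = resp.read()
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

With one of the captured URLs, `fetch_range(url, 0, 1024000, "test.mp4")` corresponds to the second attempt described in the text.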
Running the code produces a 1 KB test.mp4 file, which obviously won't open.
[Screenshot]
Why? I suspected the file was too small, so I changed the Range header to 0-1024000 and ran the code again.
This time I got a 1001 KB mp4 file that actually plays! But the video has no sound, which confirms that bilibili serves video and audio as separate files.
[Screenshot]

As noted above, most of the captured packets come from just two URLs, so now let's request the other one.

The code only changes the URL; the request headers stay the same.
[Screenshot]
(A screenshot can't carry sound, but this file does have audio.) That completes this step.
[Screenshot]
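With both URLs known, each stream can also be fetched in one go. A sketch under assumptions: the CDN is assumed to honour an open-ended `Range: bytes=0-` (standard HTTP for "from byte 0 to the end"), the headers are guesses to be replaced by the ones you captured, and the helper names are hypothetical. The `.m4s` and `.mp3` file names simply match the merge function in section 2.1; ffmpeg identifies the real format from the file contents.

```python
import shutil
import urllib.request

# Assumed headers; replace them with the ones from your captured request if needed.
HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Referer": "https://www.bilibili.com/",
}


def download_stream(url, path):
    """Download the whole resource at url to path and return path."""
    # "bytes=0-" asks for everything from byte 0 to the end of the file
    req = urllib.request.Request(url, headers=dict(HEADERS, Range="bytes=0-"))
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)  # copy in chunks instead of holding it all in memory
    return path


def download_video_and_audio(video_url, audio_url, title):
    """Save the video-only and audio-only streams under the names the merge step expects."""
    video_path = download_stream(video_url, f"{title}.m4s")
    audio_path = download_stream(audio_url, f"{title}.mp3")
    return video_path, audio_path
```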


At this point the packet capture is done and we know where the video and audio come from. That leaves one question: where do these long URLs come from?
Look at the page source!
I was stuck for a while because I forgot the simplest approach: viewing the page source. (It took me a long time to find out, but the URLs are right there in the source!)

Video URL:
[Screenshot: the video URL in the page source]
Audio URL:
[Screenshot: the audio URL in the page source]
From here, extracting the video and audio URLs from the page source with a regular expression is no problem.
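A minimal sketch of that regex step. It assumes, based on how bilibili video pages looked around the time of writing, that the page embeds a `window.__playinfo__=` JSON object whose `data.dash.video` and `data.dash.audio` lists carry `baseUrl` fields; check this against your own page source before relying on it. `extract_stream_urls` is a hypothetical name.

```python
import json
import re

# Assumed page structure: <script>window.__playinfo__={...}</script>
PLAYINFO_RE = re.compile(r"window\.__playinfo__\s*=\s*(\{.*?\})\s*</script>", re.S)


def extract_stream_urls(html):
    """Return (video_url, audio_url) from the play info embedded in the page source."""
    match = PLAYINFO_RE.search(html)
    if match is None:
        raise ValueError("window.__playinfo__ not found in page source")
    dash = json.loads(match.group(1))["data"]["dash"]
    # Take the first entry of each list; other entries may be other qualities or codecs
    video_url = dash["video"][0]["baseUrl"]
    audio_url = dash["audio"][0]["baseUrl"]
    return video_url, audio_url
```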


Remaining work:

  • The video and audio still need to be merged, which can be done with ffmpeg or with Format Factory.

2.1 Merging audio and video with ffmpeg

2.1.1 Installing ffmpeg

ffmpeg Windows builds (Zeranoe, a third-party build site; the official ffmpeg site is https://ffmpeg.org): https://ffmpeg.zeranoe.com/builds/

Open the page and download the build that matches your computer.
[Screenshot]

After downloading, unzip the archive; its contents look like this:
[Screenshot]
Then add the path of the bin directory inside it to the system Path environment variable, as shown below:
[Screenshot]
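To confirm from Python that the PATH change took effect (open a new terminal first, since already-running ones keep the old PATH), the standard library's `shutil.which` is enough. A small sketch; `ffmpeg_path` and `ffmpeg_version` are hypothetical helper names:

```python
import shutil
import subprocess


def ffmpeg_path(name="ffmpeg"):
    """Return the full path of the executable found on PATH, or None."""
    return shutil.which(name)


def ffmpeg_version(name="ffmpeg"):
    """Return the first line of `ffmpeg -version`, or None if it is not on PATH."""
    path = ffmpeg_path(name)
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    print(ffmpeg_version() or "ffmpeg not found on PATH")
```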


Here is what the ffmpeg options used below mean:

Option      Description
-c copy     Copy all streams (no re-encoding)
-i          Input file
-loglevel   Logging level to use
quiet       Value for -loglevel: no log output

If you're interested, the Chinese translation of the ffmpeg documentation is a good way to deepen your understanding.

I also use the subprocess module, which spawns child processes, connects to their input/output/error pipes, and obtains their return codes.

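import subprocess


def merge_video_and_audio(video_name):
    """
    Merge the downloaded video and audio streams into one file with ffmpeg.
    :param video_name: the video title, used as the base file name
    :return: None
    """
    cmd = [
        'ffmpeg', '-loglevel', 'quiet',  # global option first: no log output
        '-i', f'{video_name}.m4s',       # input 1: video-only stream
        '-i', f'{video_name}.mp3',       # input 2: audio-only stream
        '-c', 'copy',                    # copy both streams without re-encoding
        f'{video_name}.mp4',             # output file
    ]
    # run() waits for ffmpeg to finish; check=True raises CalledProcessError on failure
    subprocess.run(cmd, check=True)
    print(f'{video_name}.mp4 merged!')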

What it looks like after running the code:

  • The merged mp4 video has been created successfully.

[Screenshot: the merged mp4 file]

2.2 Merging audio and video with Format Factory

PS: a different video is used as the example here.
1. Download the audio and video:
[Screenshot]
2. Open Format Factory:
[Screenshot]
3. Use the audio/video mux function:
[Screenshot]
Finally, wait for the mux to finish.
The code above can be downloaded here: https://github.com/SunriseCai/spiderCode

3. Afterword

If you're interested, you can improve the code. Some suggestions:

  1. Add a feature for searching videos by a manually entered keyword.
  2. Use PyQt5 to build a GUI bilibili video downloader.

If you improve the code, remember to send me a copy!

Copyright notice
This article was written by [SunriseCai]. If you repost it, please include a link to the original. Thank you.
