A multithreaded web crawler in Python (needs improvement)

VisIf 2021-04-07 22:01:03
Python BeautifulSoup def lxml tqdm


# -*- coding: utf-8 -*-
# @time: 2021/4/4 7:58
# @name: Vislf
import queue
import threading
import time

import requests
from bs4 import BeautifulSoup
from tqdm import tqdm


# Writer thread: collects parsed chapters from the queue, then writes them in order
def write_txt(url_list, Q):
    print('Writer thread started')
    # Collect one result per chapter; the last element of each item is the chapter index
    for i in tqdm(range(len(url_list))):
        va = Q.get()
        url_list[va[-1]].append(va[:-1])
    print('Writing text to file')
    # Mode 'w' truncates any existing file; the with block closes it when done
    with open(book_name, 'w', encoding='utf-8') as f:
        for i in tqdm(range(len(url_list))):
            f.write(url_list[i][1])  # chapter title
            f.write('\n\n\n')
            f.write('\n'.join(url_list[i][-1]))  # chapter paragraphs
            f.write('\n\n\n')


# Download thread: fetches one chapter page and parses its text
def get_txt(n, q):
    try:
        req = requests.get(url=n[0], timeout=30)
        req.encoding = 'utf-8'
        bf = BeautifulSoup(req.text, 'lxml')
        texts = bf.find('div', id='content')
        content = texts.text.strip().split('\xa0' * 4)
    except Exception as e:
        # Always enqueue a result so the writer thread never waits forever
        content = ['[download failed: %s]' % e]
    content.append(n[2])
    q.put(content, 1)


# Main thread
def main():
    print('Program started:')
    q = queue.Queue(2048)
    threads = []
    url_list_data = []
    ks_time = time.time()
    # Fetch the chapter index page
    req = requests.get(url=target)
    req.encoding = 'utf-8'
    chapter_bs = BeautifulSoup(req.text, 'lxml')
    chapters = chapter_bs.find('div', id='list').find_all('a')
    # Build one [url, title, index] entry per chapter
    for i in range(len(chapters)):
        url_list_data.append([server + chapters[i].get('href'), chapters[i].get_text(), i])
    # Start one download thread per chapter
    for i in tqdm(range(len(url_list_data))):
        t = threading.Thread(target=get_txt, args=(url_list_data[i], q))
        threads.append(t)
        t.start()
    # Start the writer thread
    writer = threading.Thread(target=write_txt, args=(url_list_data, q))
    writer.start()
    # Wait for every download thread, then for the writer
    for t in threads:
        t.join()
    writer.join()
    print('Program finished')
    sj = time.time() - ks_time
    print(sj)


if __name__ == "__main__":
    server = 'https://www.xsbiquge.com'
    book_name = '诡秘之主.txt'
    target = 'https://www.xsbiquge.com/15_15338/'
    main()
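
The script above starts one thread per chapter, which is probably the main thing that needs improvement. One possible direction, offered only as a sketch and not as part of the original script, is a bounded thread pool from the standard library's concurrent.futures. The names fetch_all, safe_fetch, fake_fetch and the default of 16 workers are all hypothetical choices for illustration; the download function is passed in as a parameter so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=16):
    """Run fetch(url) for every URL on a bounded thread pool.

    Results come back in the same order as urls, so no chapter index has to
    be carried through a queue. A failed download is returned as the
    exception object instead of aborting the whole run.
    """
    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception as exc:  # keep going if one chapter fails
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order
        return list(pool.map(safe_fetch, urls))


if __name__ == "__main__":
    # Stand-in for a real download function (hypothetical), no network needed
    def fake_fetch(url):
        if url.endswith("3"):
            raise ValueError("simulated failure")
        return "text of " + url

    results = fetch_all(["ch1", "ch2", "ch3"], fake_fetch, max_workers=2)
    print(results)
```

In the crawler, fetch could be a function that downloads and parses a single chapter URL, and the returned list could then be written to the file in one pass, with entries that are exceptions reported or retried.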
Copyright notice
This article was written by [VisIf]. Please include a link to the original when reposting. Thank you.
https://my.oschina.net/u/4201451/blog/5011515
