Scraping Zhihu Articles with Python: Collecting the 60-Second News Digest

GreenSunIT 2021-04-07 23:17:01
Python crawler article scraping news


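Preface

I noticed that many people need a news API, so I searched around and found a Zhihu user who posts a news digest every day. That made me want to write a news crawler. To turn it into an API you can simply add the flask module; for now, this post covers only the crawler itself.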

Target site

URL: https://www.zhihu.com/people/mt36501
This URL lists all of the account's posts, but I only want today's, so the results still need to be filtered.
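The "today only" filter comes down to building a date label such as 4月7日 and comparing it with the start of each title. Below is a minimal sketch of that check. It assumes titles begin with the date followed by a full-width comma, which is what the split in the code further down relies on. The helper names today_label and is_todays_digest and the sample title are hypothetical. The label is built from the month and day integers because the "%#m" and "%#d" strftime codes only work on Windows.

```python
import datetime


def today_label(day=None):
    """Return a date label without zero padding, e.g. '4月7日'."""
    day = day or datetime.date.today()
    return f"{day.month}月{day.day}日"


def is_todays_digest(title, day=None):
    """Return True when the title starts with the label for the given day."""
    # Assumption: the date is the part before the first full-width comma.
    return title.split(",")[0].strip() == today_label(day)


# Hypothetical title, used only to illustrate the comparison.
sample_day = datetime.date(2021, 4, 7)
print(is_todays_digest("4月7日,星期三,每日新闻简讯", sample_day))
```

Passing the day explicitly, as in the usage line, keeps the check easy to test; leaving it out falls back to the current date.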

Writing the code

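# Import the libraries we need
import re
import time

import requests

# Target URL
url = 'https://www.zhihu.com/people/mt36501'
# Spoofed request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/70.0.3538.102 Safari/537.36 Edge/18.18362',
    'Accept': 'image/png, image/svg+xml, image/*; q=0.8, */*; q=0.5',
}
# Request the page and keep the returned HTML
resp = requests.get(url, headers=headers, timeout=10).text
# Today's date without zero padding, e.g. "4月7日"
# (the original "%#m月%#d日" strftime format only works on Windows)
now = time.localtime()
now_time = f'{now.tm_mon}月{now.tm_mday}日'
# Extract the title blocks
h2 = re.findall(r'<h2 class="ContentItem-title">.*?</h2>', resp, re.S)
# Check every title, because the account sometimes posts things unrelated to the news
for item in h2:
    # Extract the link
    links = re.findall(r'href="(.*?)"', item, re.S)
    # Extract the title text
    titles = re.findall(r'Title">(.*?)</a>', item, re.S)
    # Skip the entry if either one is missing
    if not links or not titles:
        continue
    link = links[0]
    # The article date is the part of the title before the first full-width comma
    date = titles[0].split(',')[0]
    # Compare the article date with today's date
    if date == now_time and link != '':
        # The date is today: request the article page and get its content
        con_resp = requests.get('https:' + link, headers=headers, timeout=10).text
        # Unescape a few HTML entities, then keep only the paragraphs we want
        con_resp = con_resp.replace('&#34;', '"').replace('&amp;', '&')
        p = re.findall(r'<p>(.*?)</p>', con_resp, re.S)
        # Skip the 1st and 3rd paragraphs and any empty ones, then join the
        # remaining news items with blank lines.
        # (The original test `sum == 1 | sum == 3` parsed as the chained
        # comparison `sum == (1 | sum) == 3`, so it only skipped the 3rd.)
        news = [s for n, s in enumerate(p, start=1) if n not in (1, 3) and s != '']
        text = '\n\n'.join(news)
        print(text)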
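The parsing steps above can be tried offline, without requesting Zhihu. This is a minimal sketch. The two HTML samples, the example.invalid links, and the function names find_article_link and join_news are all hypothetical, and the real page markup may differ or change. It reuses the same three regular expressions as the script and also shows why the skip test needs `or` rather than `|`.

```python
import re

# Hypothetical fragment of the profile page; the real markup may differ.
SAMPLE_LIST_HTML = '''
<h2 class="ContentItem-title"><a href="//example.invalid/p/111" data-name="Title">4月7日,星期三,每日新闻简讯</a></h2>
<h2 class="ContentItem-title"><a href="//example.invalid/p/110" data-name="Title">4月6日,星期二,每日新闻简讯</a></h2>
<h2 class="ContentItem-title"><span>no link here</span></h2>
'''

# Hypothetical article body: intro, item, unrelated line, empty paragraph, item.
SAMPLE_ARTICLE_HTML = '<p>intro</p><p>1、first item</p><p>ad</p><p></p><p>2、second item</p>'


def find_article_link(list_html, date_label):
    """Return the full link of the first title starting with date_label, or None."""
    blocks = re.findall(r'<h2 class="ContentItem-title">.*?</h2>', list_html, re.S)
    for block in blocks:
        links = re.findall(r'href="(.*?)"', block, re.S)
        titles = re.findall(r'Title">(.*?)</a>', block, re.S)
        if not links or not titles:
            continue  # not a linked article title
        if titles[0].split(',')[0] == date_label:
            return 'https:' + links[0]  # links are assumed protocol-relative
    return None


def join_news(article_html, skip=(1, 3)):
    """Join <p> contents with blank lines, skipping 1-based positions in skip and empty paragraphs."""
    paragraphs = re.findall(r'<p>(.*?)</p>', article_html, re.S)
    kept = [p for n, p in enumerate(paragraphs, start=1) if n not in skip and p != '']
    return '\n\n'.join(kept)


print(find_article_link(SAMPLE_LIST_HTML, '4月7日'))
print(join_news(SAMPLE_ARTICLE_HTML))

# `|` binds tighter than `==`, so the first test is `n == (1 | n) == 3`.
n = 1
print(n == 1 | n == 3)   # False
print(n == 1 or n == 3)  # True
```

Keeping the parsing in functions that take HTML strings means the network request can be added separately, and the filters can be checked against saved pages when the site layout changes.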
Copyright notice
This article was written by [GreenSunIT]. Please include a link to the original when reposting. Thank you.
https://www.cnblogs.com/greensunit/p/14630111.html
