Introduction to Python: 1500 visits in 10 minutes

To accompany carp 2021-04-06 23:16:33


Here is the result:

[image]

Without further ado, here is the code:

# author : sunzd
# date : 2019/9/01
# position : beijing
from fake_useragent import UserAgent
from bs4 import BeautifulSoup
from urllib import request
from urllib import error
import re
import time


def html_request(url):
    """Download a page and return its HTML, or None on failure."""
    if url is None:
        return None
    print("download html is :{0}".format(url))
    # If the URL contains Chinese characters, it must be percent-encoded first.
    # Simulate browser behavior by sending a random User-Agent header.
    headers = {'User-Agent': str(UserAgent().random)}
    req = request.Request(url, headers=headers)
    try:
        html = request.urlopen(req).read().decode('utf-8')
    except error.URLError as e:
        # HTTPError carries a status code; URLError carries a reason.
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
        return None
    # print(html)
    return html


def html_parser(url, html):
    """Print each article's title, type and link, and request each link once."""
    if url is None or html is None:
        return
    # pattern = '<main>(.+?)</main>'  # <main> is followed by '\n', so the re.S modifier is needed to make '.' match any character
    # articles = re.compile(pattern, re.S).findall(html)
    # articles = articles[0]
    pattern_art = '<div class="article-item-box csdn-tracking-statistics" data(.+?)</div>'
    # print(articles)
    articles = re.compile(pattern_art, re.S).findall(html.replace('\n', ''))
    print(len(articles))
    for article in articles:
        soup = BeautifulSoup(article, 'html.parser')
        title = soup.find('a', attrs={'target': '_blank'})
        # print(title)
        # Strip spaces and the type badge from the title. The badge strings were
        # machine-translated in the source; '原' (original) and '转' (repost) are assumed here.
        title_text = title.text.replace(' ', '').replace("原", "").replace("转", "")
        print("Article title: {0}\nArticle type: {1}".format(title_text, title.span.text))
        print("Article link: {0}".format(title.attrs['href']))
        # Requesting the article page is what registers a visit.
        html_request(title.attrs['href'])
        infors = soup.find('div', attrs={'class': 'info-box d-flex align-content-center'})
        # for infor in infors.p.next_siblings:  # next_siblings excludes the node itself, so the first <p> node would be dropped
        # for infor in infors.children:
        #     if infor == ' ':  # a whitespace string also counts as a child, so filter it out
        #         continue
        #     # print("======{0}".format(infor))
        #     if infor.span:  # only the <span> node information is needed
        #         print("{0}".format(infor.span.text))
    pattern_next = '<li class="js-page-next js-page-action ui-pager ui-pager-disabled">'
    disabled_next = re.compile(pattern_next).findall(html)
    # print(html)
    print("Last page: {0}----{1}".format(len(disabled_next), disabled_next))
    # True when the disabled "next" button is present, i.e. this is the last page.
    return len(disabled_next) > 0


if __name__ == '__main__':
    name = 'your_own_name'  # replace with your own CSDN user name
    page = 1
    url = "https://blog.csdn.net/" + name + "/article/list/" + str(page) + '?'
    # page wraps back to 1 after page 6, so this loop runs until interrupted (Ctrl+C).
    while page < 7:
        html = html_request(url)
        # print(html)
        is_last_page = html_parser(url, html)
        page += 1
        if page > 6:
            page = 1
        url = "https://blog.csdn.net/" + name + "/article/list/" + str(page) + '?'
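
The comment in html_request says that a URL containing Chinese characters needs to be encoded, but the script never does so. Below is a minimal sketch of one way to handle it with the standard library's urllib.parse.quote. The helper name encode_url and the example user name are hypothetical and do not come from the original script.

```python
from urllib.parse import quote


def encode_url(url):
    """Percent-encode non-ASCII characters in a URL (hypothetical helper)."""
    # Characters listed in `safe` are left untouched so the URL structure
    # survives; '%' is included so already-encoded sequences are not re-encoded.
    return quote(url, safe=":/?&=#%")


# The user name below is a made-up example.
print(encode_url("https://blog.csdn.net/张三/article/list/1?"))
# prints https://blog.csdn.net/%E5%BC%A0%E4%B8%89/article/list/1?
```

With a helper like this, html_request could call encode_url(url) before building the Request. Plain ASCII URLs pass through unchanged.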
Copyright notice
This article was written by [To accompany carp]. Please include a link to the original when reposting. Thank you.
https://pythonmana.com/2021/04/20210406230254857I.html
