Python crawler: scraping the Qidian Chinese Network popularity ranking Top 100 (quick start, a must for beginners!)

Homo sapiens 2021-01-22 16:49:39

This post shows how to use a Python crawler to scrape the Qidian Chinese Network popularity ranking Top 100. I hope you get a feel for the charm of web crawlers along the way! Let's start by opening https://www.qidian.com/all/, which takes you to the Qidian Chinese website.

Based on the URL pattern and the number of pages we need, we can first build the list of page URLs with a list comprehension: ['http://a.qidian.com/?page={}'.format(str(i)) for i in range(1,6)]. Here range(1,6) yields the page numbers 1 through 5.
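As a small sketch of that step, the URL list can be built and inspected on its own before any request is sent. The names `BASE_URL`, `PAGE_COUNT` and `build_page_urls` are made up for illustration; only the URL pattern and the page range come from the article.

```python
# URL pattern taken from the article; the constant names are illustrative.
BASE_URL = 'http://a.qidian.com/?page={}'
PAGE_COUNT = 5  # mirrors range(1, 6) in the article


def build_page_urls(page_count=PAGE_COUNT):
    """Return one listing URL per page, numbered from 1."""
    return [BASE_URL.format(page) for page in range(1, page_count + 1)]


urls = build_page_urls()
for url in urls:
    print(url)
```

Printing the list first is a cheap way to confirm the page numbers are right before the crawler starts hitting the site.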

The complete code is shown below:

"""
@File    : Get the Qidian Chinese Network popularity ranking Top100.py
@Time    : 2019/10/21 22:31
@Author  : Fengming Jingjun
@Software: PyCharm
Please credit the original author when reposting.
Creating content takes effort; for sharing only.
"""
# Import the required libraries
import time

import requests
import xlwt
from lxml import etree

# List that collects the scraped rows
all_info_list = []


def get_info(url):
    """Scrape one listing page and append one row per novel to all_info_list."""
    html = requests.get(url)
    selector = etree.HTML(html.text)
    # Locate the outer list, then loop over the <li> entry of each novel on the page
    infos = selector.xpath('//ul[@class="all-img-list cf"]/li')
    # Walk the entries and extract the details of each novel
    for info in infos:
        # Title
        title = info.xpath('div[2]/h4/a/text()')[0]
        # Author
        author = info.xpath('div[2]/p[1]/a[1]/text()')[0]
        # Category 1
        style1 = info.xpath('div[2]/p[1]/a[2]/text()')[0]
        # Category 2
        style2 = info.xpath('div[2]/p[1]/a[3]/text()')[0]
        # Combined category
        style = style1 + style2
        # Completion status
        complete = info.xpath('div[2]/p[1]/span/text()')[0]
        # Synopsis of the novel
        introduce = info.xpath('div[2]/p[2]/text()')[0].strip()
        # Add the row to the list
        all_info_list.append([title, author, style, complete, introduce])
    # Pause briefly after each page
    time.sleep(1)


# Program entry point
if __name__ == '__main__':
    urls = ['http://a.qidian.com/?page={}'.format(str(i)) for i in range(1, 6)]
    for url in urls:
        get_info(url)
        time.sleep(5)
    # Define the header row
    header = ['title', 'author', 'style', 'complete', 'introduce']
    # Create the workbook
    book = xlwt.Workbook(encoding='utf-8')
    # Create the sheet
    sheet = book.add_sheet('Sheet1')
    # Write the header into row 0, one column per field
    for col, name in enumerate(header):
        sheet.write(0, col, name)
    # Loop over the scraped rows and write them into the xls table, starting at row 1
    for row_index, row in enumerate(all_info_list, start=1):
        for col, data in enumerate(row):
            sheet.write(row_index, col, data)
            # Print each value to check the result
            print(data)
    # Once all data is written, save the workbook to a local path
    book.save('qidianxiaoshuo.xls')
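One fragile spot in the script is the `[0]` index applied to every XPath result: an XPath query returns a list, and indexing an empty list raises `IndexError`, so a single novel with a missing field (for example, no second category link) would stop the whole run. A minimal sketch of a guard follows. The helper name `first_text` is hypothetical and not part of the original script, and plain lists stand in for XPath results so the example runs without network access.

```python
def first_text(nodes, default=''):
    """Return the first item of an XPath result list, stripped of whitespace.

    Falls back to `default` when the list is empty instead of raising IndexError.
    """
    return nodes[0].strip() if nodes else default


# Plain lists stand in for the lists that info.xpath(...) would return.
found = first_text(['  Some Title  '])
missing = first_text([], default='N/A')
print(repr(found), repr(missing))
```

In the script, a call such as `first_text(info.xpath('div[2]/p[1]/a[3]/text()'))` could then replace the direct `[0]` lookup, assuming an empty string is an acceptable placeholder for a missing field.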

Verifying the result:

Because the final results were saved to a local xls file, you only need to open that file to see them!

(Screenshot: contents of the qidianxiaoshuo.xls file)
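The layout of the file (header in row 0, one novel per row below it) can also be checked without opening a spreadsheet program. The sketch below is only an illustration: `RecordingSheet` is a hypothetical stand-in that exposes the same `write(row, col, value)` call the script uses on the xlwt sheet, `write_table` is a made-up helper name, and the sample row is invented data, not real scraping output.

```python
class RecordingSheet:
    """Hypothetical stand-in for a worksheet: stores every write in a dict."""

    def __init__(self):
        self.cells = {}

    def write(self, row, col, value):
        self.cells[(row, col)] = value


def write_table(sheet, header, rows):
    """Write `header` to row 0 and each item of `rows` to the rows below it."""
    for col, name in enumerate(header):
        sheet.write(0, col, name)
    for row_index, row in enumerate(rows, start=1):
        for col, value in enumerate(row):
            sheet.write(row_index, col, value)


header = ['title', 'author', 'style', 'complete', 'introduce']
# Invented sample data, used only to show where each value lands.
sample_rows = [['Book A', 'Author A', 'Fantasy', 'Ongoing', 'Intro A']]

sheet = RecordingSheet()
write_table(sheet, header, sample_rows)
print(sheet.cells[(0, 0)], sheet.cells[(1, 1)])
```

Because `write_table` only relies on a `write(row, col, value)` method, the same function should also accept the sheet returned by `book.add_sheet(...)` in the script.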

If your spreadsheet is filled with the scraped novels, congratulations on your success! Fun, isn't it? That's all for this post. Don't forget to follow; Xiaojun will keep publishing more simple and fun techniques ٩(๑>◡<๑)۶

This article is part of the Tencent Cloud self-media sharing program. You are welcome to join and share as well.

Copyright notice
This article was created by [Homo sapiens]. Please include a link to the original when reposting. Thank you.
https://pythonmana.com/2021/01/20210122164410660Z.html
