A Simple Introduction to Python Web Scraping

柒是幸运 2022-09-09 01:45:44 Views: 318

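import csv
import re

import requests

# Send a browser-like User-Agent so the request looks like a normal page visit.
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}

# Compile the pattern once, outside the loop. re.S lets "." match newlines,
# so a single match can span the whole <li> block of one movie.
# The Chinese text in the pattern ("主演" = starring, "人评价" = people rated)
# must match the page content and is left unchanged.
obj = re.compile(
    r'<li>.*?<div class="item">.*?<span class="title">(?P<name>.*?)</span>.*?<div class="bd">.*?<div class="bd">.*?'
    r'<p class="">(?P<director>.*?)&nbsp;.*?主演(?P<main>.*?)...<br>'
    r'.*?<span class="rating_num" property="v:average">(?P<score>.*?)</span>.*?'
    r'<span>(?P<num>.*?)人评价</span>', re.S)

count = 0
# url = 'https://movie.**.com/top250'
# The site name is masked with asterisks in the URL and the file name;
# substitute real values before running.
with open("**top250.csv", mode="a", newline='', encoding='utf-8-sig') as f:
    csvwriter = csv.writer(f)
    # The list is paginated 25 entries per page: start=0, 25, ..., 225.
    for start in range(0, 250, 25):
        url = f'https://movie.***.com/top250?start={start}&filter='
        resp = requests.get(url, headers=header)
        contents = resp.text
        for i in obj.finditer(contents):
            dic = i.groupdict()
            dic['director'] = dic['director'].strip()
            dic['score'] = '\n' + '评分:' + dic['score']  # "评分" = rating
            dic['main'] = '主演' + dic['main']  # "主演" = starring
            dic['num'] = dic['num'] + '人评价'  # "人评价" = people rated
            csvwriter.writerow(dic.values())
        count += 1
        print("over!!!!" + str(count))  # one message per finished page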

A simple scrape of a certain website
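
To see how the named groups in the pattern pull fields out of the page, here is a minimal offline sketch. The HTML snippet is made up for illustration and is much simpler than the real page, and PATTERN is a reduced, hypothetical version of the pattern in the script above (only the name, score and num groups). parse_movies is likewise a helper name chosen here, not something from the original script.

```python
import re

# Hypothetical, simplified HTML; the real page has far more markup per entry.
SAMPLE_HTML = """
<li>
  <div class="item">
    <span class="title">Example Movie</span>
    <span class="rating_num" property="v:average">9.1</span>
    <span>12345人评价</span>
  </div>
</li>
<li>
  <div class="item">
    <span class="title">Another Movie</span>
    <span class="rating_num" property="v:average">8.7</span>
    <span>678人评价</span>
  </div>
</li>
"""

# Reduced pattern (an assumption for this demo): three named groups,
# joined by lazy ".*?" gaps. re.S lets "." cross line breaks.
PATTERN = re.compile(
    r'<span class="title">(?P<name>.*?)</span>.*?'
    r'<span class="rating_num" property="v:average">(?P<score>.*?)</span>.*?'
    r'<span>(?P<num>.*?)人评价</span>',
    re.S,
)


def parse_movies(html):
    """Return one dict per match, keyed by the named groups."""
    return [m.groupdict() for m in PATTERN.finditer(html)]


movies = parse_movies(SAMPLE_HTML)
for movie in movies:
    print(movie)
```

Each match yields a dict whose keys are the group names, which is why the script can pass dic.values() straight to csvwriter.writerow.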

The scraped results:
[Image]
The scraped data after tidying up:
[Image]
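
The post does not show how the data was tidied, so the following is only one possible sketch. It assumes rows shaped roughly like the script's output, with leftover HTML entities, non-breaking spaces and stray line breaks; clean_field, clean_row and the sample row are hypothetical names and data introduced for illustration.

```python
import html
import re


def clean_field(text):
    """Decode HTML entities and collapse runs of whitespace to single spaces."""
    # html.unescape turns "&nbsp;" into a non-breaking space (\xa0).
    text = html.unescape(text).replace('\xa0', ' ')
    return re.sub(r'\s+', ' ', text).strip()


def clean_row(row):
    """Apply clean_field to every column of one CSV row."""
    return [clean_field(field) for field in row]


# Made-up row for the demo, not real scraped data.
raw_row = [
    'Example Movie',
    '导演: Some Director&nbsp;&nbsp;',
    '主演: Some Actor /',
    '\n评分:9.1',
    '12345人评价',
]
cleaned = clean_row(raw_row)
print(cleaned)
```

A pass like this could be run over each row before csvwriter.writerow, or afterwards over the rows read back with csv.reader.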

Hahaha, web scraping is so cool.
The names of the websites involved have been replaced with **.

Copyright notice: this article was written by [柒是幸运]. Please include a link to the original when reposting. Thank you. https://blog.csdn.net/m0_46599939/article/details/126675631