This blog is just where I record articles in my spare time, published for readers only. If anything here infringes on your rights, please let me know and I will remove it.
This article is entirely my own work, with no copying from or reference to anyone else's articles. I insist on originality!
Hello! This is the "Python Crawler: From Beginner to Giving Up" series of articles. I am SunriseCai.
It's best enjoyed together with the accompanying video!
This article shows how to use a crawler to download the skins of every hero in League of Legends.
League of Legends hero library: https://lol.qq.com/data/info-heros.shtml
Open the League of Legends hero library and you will see pages like the figures below.
As the screenshots show, there are quite a lot of hero skins to collect!
So, the next step is to download these images with code.
As mentioned above, the starting page for this article is https://lol.qq.com/data/info-heros.shtml.
Open the page in a browser and press F12 to enter the developer tools. Inspecting the page structure, the links to the secondary (per-hero) pages appear to live inside the <li> tags. Perfect!
Request code for the home page:
import requests

url = 'https://lol.qq.com/data/info-heros.shtml'
headers = {
    'User-Agent': 'Mozilla/5.0'
}

def get_hero_list():
    res = requests.get(url, headers=headers)
    if res.status_code == 200:
        print(res.text)
    else:
        print('request failed, status code:', res.status_code)

get_hero_list()
After running the code above, I found that the response contains no <li> content at all. What's going on? The <li> contents are most likely loaded asynchronously via XHR, so let's capture the traffic and take a look!
Requesting the home page again, under XHR there is a hero_list.js file, which is exactly what its name suggests: the list of heroes.
The URL of hero_list.js is https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js
Click it and you will find it is exactly what we need!
Note the heroId field highlighted in the red box: this is the Id appended to the end of the secondary-page url mentioned above.
Type the hero_list.js address into the browser, as shown below:
Very nice. The request code is also very simple: just replace the url in the code above with https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js.
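The body that comes back appears to be plain JSON. Here is a minimal parsing sketch using a hand-made sample payload in place of a live request; the field names (hero, heroId, name, title) follow the response structure used later in this article, and the sample values are illustrative only:

```python
import json

# Hand-made sample mimicking the shape of hero_list.js (assumed structure)
sample = '''
{
  "hero": [
    {"heroId": "1", "name": "The Dark Child", "title": "Annie"},
    {"heroId": "2", "name": "The Berserker", "title": "Olaf"}
  ]
}
'''

data = json.loads(sample)
for item in data['hero']:
    # Each entry carries the heroId plus the hero's name and title
    print(item['heroId'], item['name'], item['title'])
```

The same `json.loads` call works on the real response text once you fetch it with requests.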
Here we use the Dark Child, Annie, as an example. Notice that Annie has 13 skins in total.
Capturing the traffic again, there is a 1.js file whose data corresponds exactly to Annie's 13 skins.
The URL of the 1.js file is https://game.gtimg.cn/images/lol/act/img/js/hero/1.js
The URL of the 2.js file is https://game.gtimg.cn/images/lol/act/img/js/hero/2.js
Of course, the Id at the end of the url is each hero's heroId.
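So every hero's skin data can be reached by splicing the heroId into one URL template. A quick sketch (the heroIds here are examples taken from the hero list):

```python
# URL template for the per-hero skin data files
skin_url_template = 'https://game.gtimg.cn/images/lol/act/img/js/hero/{}.js'

# Build the per-hero URLs from a few sample heroIds
for hero_id in ['1', '2']:
    print(skin_url_template.format(hero_id))
```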
Open the 1.js URL in a browser, as shown in the figure below:
Notice that 1.js contains several image urls; the resolutions they correspond to are as follows:
name | resolution (px) |
---|---|
mainImg | 980x500 |
iconImg | 60x60 |
loadingImg | 308x560 |
videoImg | 130x75 |
sourceImg | 1920x470 |
This article uses mainImg for the download demo.
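One detail worth handling before downloading: some skin entries may carry an empty mainImg (and '/' in a skin name would break the file path). A small sketch over a hand-made sample in the shape of 1.js (assumed structure, illustrative values):

```python
import json

# Hand-made sample mimicking the "skins" array of 1.js (assumed structure)
sample = '''
{
  "skins": [
    {"name": "Annie", "mainImg": "https://game.gtimg.cn/images/lol/act/img/skin/big1000.jpg"},
    {"name": "Panda Annie", "mainImg": ""}
  ]
}
'''

data = json.loads(sample)
for item in data['skins']:
    if item['mainImg']:                     # skip entries without a main image
        name = item['name'].replace('/', '')  # '/' is illegal in file names
        print(name, item['mainImg'])
```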
At this point, let's sort out the overall approach:
import json
import requests

url = 'https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js'
headers = {
    'User-Agent': 'Mozilla/5.0'
}

def get_hero_list():
    """
    :return: print each hero's heroId, name and title
    """
    res = requests.get(url, headers=headers)
    if res.status_code == 200:
        data = json.loads(res.text)
        for item in data['hero']:
            hero_id = item['heroId']
            name = item['name']
            title = item['title']
            print(hero_id, name, title)
    else:
        print('request failed, status code:', res.status_code)

get_hero_list()
import json
import requests

skinUrl = 'https://game.gtimg.cn/images/lol/act/img/js/hero/{}.js'
headers = {
    'User-Agent': 'Mozilla/5.0'
}

def get_skin_url(Id):
    """
    :param Id: hero Id, used to splice the url
    :return:
    """
    res = requests.get(skinUrl.format(Id), headers=headers)
    if res.status_code == 200:
        data = json.loads(res.text)
        for item in data['skins']:
            url = item['mainImg']
            name = item['name'].replace('/', '')  # '/' is illegal in file names
            print(url, name)
    else:
        print('request failed, status code:', res.status_code)

get_skin_url(1)  # 1 is Annie's heroId
# -*- coding: utf-8 -*-
# @Time    : 2020/1/28 21:12
# @Author  : SunriseCai
# @File    : YXLMSpider.py
# @Software: PyCharm

import os
import json
import time
import requests

"""League of Legends skin crawler"""


class YingXLMSpider(object):
    def __init__(self):
        self.onePageUrl = 'https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js'
        self.skinUrl = 'https://game.gtimg.cn/images/lol/act/img/js/hero/{}.js'
        self.headers = {
            'User-Agent': 'Mozilla/5.0'
        }

    def get_heroList(self):
        """
        :return: get each hero's heroId and name
        """
        res = requests.get(url=self.onePageUrl, headers=self.headers)
        if res.status_code == 200:
            data = json.loads(res.text)
            for item in data['hero']:
                Id = item['heroId']
                title = item['title']
                self.get_skin_url(Id, title)
        else:
            print('request failed, status code:', res.status_code)

    def get_skin_url(self, Id, folder):
        """
        :param Id: hero Id, used to splice the url
        :param folder: folder named after the hero
        :return:
        """
        url = self.skinUrl.format(Id)
        res = requests.get(url, headers=self.headers)
        if res.status_code == 200:
            data = json.loads(res.text)
            for item in data['skins']:
                url = item['mainImg']
                name = item['name'].replace('/', '')  # '/' is illegal in file names
                self.download_picture(url, name, folder)
        else:
            print('request failed, status code:', res.status_code)

    def download_picture(self, url, name, folder):
        """
        :param url: skin image url
        :param name: skin name
        :param folder: target folder
        :return:
        """
        # Create the folder if it does not exist
        if not os.path.exists(folder):
            os.makedirs(folder)
        # Only download when the url is non-empty and the image is not already
        # on disk (this is what makes the crawler resumable after a restart)
        if url != '' and not os.path.exists('%s/%s.jpg' % (folder, name)):
            time.sleep(1)  # be polite: at most one request per second
            res = requests.get(url, headers=self.headers)
            with open('%s/%s.jpg' % (folder, name), 'wb') as f:
                f.write(res.content)
            print('%s.jpg downloaded' % name)

    def main(self):
        self.get_heroList()


if __name__ == '__main__':
    spider = YingXLMSpider()
    spider.main()
Let's look at the results:
Finally, a summary of this chapter:
The next article is called "Python Crawler: From Beginner to Giving Up 09 | Crawler in Action: Downloading NetEase Cloud Music".