Project description: this project uses requests and re (regular expressions) to crawl the skin posters of every hero in League of Legends.


Crawler

    Ideas

Open the League of Legends website (http://lol.qq.com/) and you can see that its content is loaded dynamically. The returned source code contains no links to the hero pages, only the JavaScript function that generates the hero list, so the poster image links cannot be collected with the usual crawling approach. Observation shows, however, that every hero poster address has the format "http://ossweb-img.qq.com/images/lol/web201310/skin/big" + skin number + ".jpg". The skin number is simply the hero id followed by 3 digits, and those 3 digits identify which skin poster it is. So once we have the ids of all heroes, we can build the links to all of their posters.


Get the hero names and ids

Open the League of Legends website and analyze its source code. This leads to the link "http://lol.qq.com/biz/hero/champion.js", and champion.js turns out to contain exactly what we need: the names and ids of all heroes.
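The extraction step described above can be sketched as follows. The article only shows the content of champion.js as a picture, so the layout used here (a "keys" object that maps hero id to hero name) is an assumption, and SAMPLE_JS and parse_heroes are hypothetical names chosen for illustration.

```python
import json
import re

# Assumed shape of champion.js (not confirmed by the article): a JavaScript
# assignment containing a "keys" object that maps hero id -> hero name.
SAMPLE_JS = (
    'if(!LOLherojs)var LOLherojs={champion:{}};'
    'LOLherojs.champion={"keys":{"266":"Aatrox","103":"Ahri"},"data":{}};'
)


def parse_heroes(js_text):
    """Return {hero_id: hero_name} extracted from the champion.js text."""
    # Capture the flat {...} object after "keys": (it has no nested braces).
    match = re.search(r'"keys"\s*:\s*(\{[^{}]*\})', js_text)
    if match is None:
        raise ValueError('no "keys" object found; the file layout may differ')
    # The captured object is valid JSON, so json.loads can decode it directly.
    return json.loads(match.group(1))


heroes = parse_heroes(SAMPLE_JS)
for hero_id, hero_name in heroes.items():
    print(hero_id, hero_name)
```

In a real run the text would come from something like requests.get("http://lol.qq.com/biz/hero/champion.js").text instead of SAMPLE_JS. If the live file is laid out differently, the regular expression has to be adjusted.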

The content of that file looks like this:

[picture: content of champion.js]

Build the poster links

As noted above, every hero poster address has the format "http://ossweb-img.qq.com/images/lol/web201310/skin/big" + skin number + ".jpg", where the skin number is the hero id followed by 3 digits. The hero ids obtained in the previous step can therefore be spliced into poster download links, which we store in an array.
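The splicing can be sketched as below. The base address and the "hero id + 3 digits" rule come from the text. The function name poster_links and the max_skins limit are assumptions: the article does not say how many skins each hero has, so the sketch generates candidate links that still have to be checked when downloading.

```python
BASE_URL = "http://ossweb-img.qq.com/images/lol/web201310/skin/big"


def poster_links(hero_ids, max_skins=20):
    """Build candidate poster links for every hero id.

    max_skins is an assumed upper bound on skins per hero; links for
    skins that do not exist must be skipped at download time.
    """
    links = []
    for hero_id in hero_ids:
        for skin in range(max_skins):
            # Skin number = hero id followed by a zero-padded 3-digit index.
            links.append(f"{BASE_URL}{hero_id}{skin:03d}.jpg")
    return links


for link in poster_links(["266"], max_skins=3):
    print(link)
```

A fixed upper bound keeps the sketch simple at the cost of some wasted requests. A crawler that knew the real skin count per hero could generate only valid links.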

Download the hero skin posters

Download the posters one by one, naming each image file after its hero.
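A minimal sketch of this download step is shown below. It is not the original code (which the article only shows as a picture). The names fetch_with_requests and save_posters, the 10-second timeout, and the "hero name + counter" file naming are all assumptions made for illustration.

```python
import os


def fetch_with_requests(url):
    """Return the image bytes, or None if the poster does not exist."""
    # Third-party package; imported here so the rest of the sketch can be
    # used (for example with a fake fetcher) without it being installed.
    import requests

    response = requests.get(url, timeout=10)
    return response.content if response.status_code == 200 else None


def save_posters(posters, out_dir, fetch=fetch_with_requests):
    """Save each (hero_name, url) pair as <hero_name><n>.jpg in out_dir.

    Returns the list of file paths that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    counts = {}  # posters saved so far per hero, used to number the files
    saved = []
    for hero_name, url in posters:
        data = fetch(url)
        if data is None:
            continue  # no poster at this address, skip it
        counts[hero_name] = counts.get(hero_name, 0) + 1
        path = os.path.join(out_dir, f"{hero_name}{counts[hero_name]}.jpg")
        with open(path, "wb") as image_file:
            image_file.write(data)
        saved.append(path)
    return saved
```

Calling save_posters(pairs, "posters"), where pairs holds (hero name, link) tuples, would download with requests. Passing a different fetch function makes it possible to try the logic without touching the network.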

Code

[picture: source code of the crawler]

Crawling results

[picture: the downloaded posters]

The LoL official website is loaded dynamically, so parsing its pages directly would be troublesome. By analyzing the pattern in the hero skin poster links instead, we found the rule behind them and greatly reduced the workload of this project.