This blog is just where I record articles in my spare time and publish them for readers. If anything here infringes on your rights, please let me know and I will delete it.
This article is entirely my own work; nothing is referenced or copied from other people's articles. Originality above all!!
Hello, this is the Python Crawlers: From Beginner to Giving Up series of articles. I am SunriseCai.
This article pairs well with the accompanying video.
This article mainly introduces how to use Selenium to write a crawler that fetches product listings from JD.com (Jingdong Mall).
With Selenium, you can conveniently simulate human browser operations. In other words, Selenium drives the browser to execute your commands.
First, I suggest heading to the Selenium official documentation for systematic learning. There is a lot to Selenium, so please use the official docs to study it thoroughly.
Getting a Selenium environment running takes only three steps:
（1）pip install selenium
（2）Download a webdriver; this article uses Chrome for the demonstration.
（3）On the webdriver download page, find the driver matching your Chrome version. (Only the leading version number needs to match; it does not have to be identical.)
There are two ways to make the driver available:
（1）Put it in the Scripts folder under the Python install directory.
（2）Specify the webdriver path at execution time.
For example, I put chromedriver.exe in the root directory of the F: drive, so when running Selenium I specify the webdriver with executable_path:

```python
from selenium import webdriver

url = 'http://www.baidu.com'
browser = webdriver.Chrome(executable_path=r'F:\chromedriver.exe')
browser.get(url)
```
Extending the code above, again using Baidu as the example:

```python
from selenium import webdriver

url = 'http://www.baidu.com'
browser = webdriver.Chrome(executable_path=r'F:\chromedriver.exe')  # create a Chrome browser object
browser.get(url)                      # request the url
browser.save_screenshot('baidu.jpg')  # take a screenshot
browser.quit()                        # quit the browser
```
After the code runs, you will find a new picture named baidu.jpg in the local folder, as shown below.
Open the JD homepage, type the product you want to search for into the search box, then click Search. Here we use masks as the demo product.
Get the XPath expression for the search box, as shown below:
The code is as follows ：
```python
from selenium import webdriver


class JdSpider(object):
    def __init__(self):
        self.url = 'https://www.jd.com/'
        options = webdriver.ChromeOptions()
        options.add_argument("--start-maximized")  # maximize the window; optional
        self.browser = webdriver.Chrome(executable_path=r'F:\chromedriver.exe', options=options)

    def __del__(self):
        self.browser.close()

    def search_goods(self):
        self.browser.get(self.url)
        self.browser.find_element_by_xpath('//*[@id="key"]').send_keys('masks')  # type the search term
        self.browser.find_element_by_xpath('//*[@id="search"]/div/div/button').click()  # click the search button


if __name__ == '__main__':
    spider = JdSpider()
    spider.search_goods()
```
Running the code gives the following result:
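Incidentally, you can get a feel for what an expression like `//*[@id="key"]` selects by trying it on a toy document. The snippet below is a made-up simplification (JD's real markup differs), using Python's stdlib ElementTree, which supports only a limited XPath subset:

```python
import xml.etree.ElementTree as ET

# A made-up, simplified stand-in for a search form; JD's real markup differs.
snippet = '<div><form><input id="key" type="text" /></form></div>'

root = ET.fromstring(snippet)
# .//*[@id='key'] means: any descendant element whose id attribute is "key".
box = root.find(".//*[@id='key']")
print(box.tag)  # -> input
```

In the browser, Selenium evaluates full XPath against the live DOM, but the selection logic is the same: match by attribute, then act on the matched element.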
The next things to extract are: the product name, the number of reviews, and the shop name.
I won't walk through the parsing here; it's left as homework!!!
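As a starting hint for the homework, the extraction pattern (loop over result items, pull out name, review count, and shop) can be sketched on a static snippet with the stdlib. The markup and class names below are invented for illustration; inspect JD's real result page in DevTools to find the actual XPath expressions:

```python
import xml.etree.ElementTree as ET

# A hypothetical, simplified stand-in for JD search-result entries;
# the real page markup differs and must be inspected in DevTools.
sample = """
<ul>
  <li>
    <p class="name">Disposable mask 50pcs</p>
    <p class="commit">200000+ reviews</p>
    <p class="shop">Some Health Store</p>
  </li>
  <li>
    <p class="name">KN95 mask 20pcs</p>
    <p class="commit">50000+ reviews</p>
    <p class="shop">Another Store</p>
  </li>
</ul>
"""

root = ET.fromstring(sample)
goods = []
for li in root.findall('li'):  # one <li> per product entry
    goods.append({
        'name': li.find("p[@class='name']").text,
        'comments': li.find("p[@class='commit']").text,
        'shop': li.find("p[@class='shop']").text,
    })

for item in goods:
    print(item['name'], '|', item['comments'], '|', item['shop'])
```

With Selenium the loop looks the same in spirit: `find_elements_by_xpath` returns the list of entries, and you query each entry for its sub-elements.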
For the complete code, you can refer to this blog post:
python Introduction to crawlers * selenium Crawl all the commodity information of JD. Click through; yes, that's right, it's another of my casual posts.
Finally, to summarize this chapter:
The next article will be 《Python Crawlers: From Beginner to Giving Up 12 | Conclusion》.