This blog is just where I record articles in my spare time and publish them for readers. If anything here infringes on your rights, please let me know and I will delete it.
This article is entirely my own work; nothing is referenced or copied from other people's articles. Originality above all!!
Hello, this is the Python Crawlers: From Beginner to Giving Up series of articles. I am SunriseCai.
This article pairs well with the accompanying video.
This article mainly introduces how to use Selenium to write a crawler that fetches product listings from JD.com (Jingdong Mall).
With Selenium, you can conveniently simulate human browser operations. In other words, Selenium drives the browser to execute your commands.
First, I suggest heading to the Selenium official documentation for systematic learning. There is a lot to Selenium, so please use the official docs to study it thoroughly.
Getting a Selenium environment running takes only three steps:
（1）pip install selenium
（2）Download a webdriver; this article uses Chrome for the demonstration.
（3）On the webdriver download page, find the driver matching your Chrome version. (Only the leading version number needs to match; it does not have to be identical.)
There are two ways to make the driver available:
（1）Put it in the Scripts folder under the Python install directory.
（2）Specify the webdriver path at execution time.
For example, I put chromedriver.exe in the root directory of the F: drive, so when running Selenium I specify the webdriver with executable_path:

```python
from selenium import webdriver

url = 'http://www.baidu.com'
browser = webdriver.Chrome(executable_path=r'F:\chromedriver.exe')
browser.get(url)
```
Extending the code above, again using Baidu as the example:

```python
from selenium import webdriver

url = 'http://www.baidu.com'
browser = webdriver.Chrome(executable_path=r'F:\chromedriver.exe')  # create a Chrome browser object
browser.get(url)                      # request the url
browser.save_screenshot('baidu.jpg')  # take a screenshot
browser.quit()                        # quit the browser
```
After the code runs, you will find a new picture named baidu.jpg in the local folder, as shown below.
Open the JD homepage, type the product you want to search for into the search box, then click Search. Here we use masks as the demo product.
Get the XPath expression for the search box, as shown below:
The code is as follows ：
```python
from selenium import webdriver


class JdSpider(object):
    def __init__(self):
        self.url = 'https://www.jd.com/'
        options = webdriver.ChromeOptions()
        options.add_argument("--start-maximized")  # maximize the window; optional
        self.browser = webdriver.Chrome(executable_path=r'F:\chromedriver.exe', options=options)

    def __del__(self):
        self.browser.close()

    def search_goods(self):
        self.browser.get(self.url)
        self.browser.find_element_by_xpath('//*[@id="key"]').send_keys('masks')  # type the search term
        self.browser.find_element_by_xpath('//*[@id="search"]/div/div/button').click()  # click the search button


if __name__ == '__main__':
    spider = JdSpider()
    spider.search_goods()
```
Running the code gives the following result:
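Incidentally, you can get a feel for what an expression like `//*[@id="key"]` selects by trying it on a toy document. The snippet below is a made-up simplification (JD's real markup differs), using Python's stdlib ElementTree, which supports only a limited XPath subset:

```python
import xml.etree.ElementTree as ET

# A made-up, simplified stand-in for a search form; JD's real markup differs.
snippet = '<div><form><input id="key" type="text" /></form></div>'

root = ET.fromstring(snippet)
# .//*[@id='key'] means: any descendant element whose id attribute is "key".
box = root.find(".//*[@id='key']")
print(box.tag)  # -> input
```

In the browser, Selenium evaluates full XPath against the live DOM, but the selection logic is the same: match by attribute, then act on the matched element.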
The next things to extract are: the product name, the number of reviews, and the shop name.
I won't walk through the parsing here; it's left as homework!!!
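As a starting hint for the homework, the extraction pattern (loop over result items, pull out name, review count, and shop) can be sketched on a static snippet with the stdlib. The markup and class names below are invented for illustration; inspect JD's real result page in DevTools to find the actual XPath expressions:

```python
import xml.etree.ElementTree as ET

# A hypothetical, simplified stand-in for JD search-result entries;
# the real page markup differs and must be inspected in DevTools.
sample = """
<ul>
  <li>
    <p class="name">Disposable mask 50pcs</p>
    <p class="commit">200000+ reviews</p>
    <p class="shop">Some Health Store</p>
  </li>
  <li>
    <p class="name">KN95 mask 20pcs</p>
    <p class="commit">50000+ reviews</p>
    <p class="shop">Another Store</p>
  </li>
</ul>
"""

root = ET.fromstring(sample)
goods = []
for li in root.findall('li'):  # one <li> per product entry
    goods.append({
        'name': li.find("p[@class='name']").text,
        'comments': li.find("p[@class='commit']").text,
        'shop': li.find("p[@class='shop']").text,
    })

for item in goods:
    print(item['name'], '|', item['comments'], '|', item['shop'])
```

With Selenium the loop looks the same in spirit: `find_elements_by_xpath` returns the list of entries, and you query each entry for its sub-elements.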
For the complete code, you can refer to this blog post:
python Introduction to crawlers * selenium Crawl all the commodity information of JD. Click through; yes, that's right, it's another of my casual posts.
Finally, to summarize this chapter:
The next article will be 《Python Crawlers: From Beginner to Giving Up 12 | Conclusion》.