In the era of big data, data analysis first needs a data source. The trickle of data your own company produces is nowhere near enough to analyze. Only by learning to crawl, pulling relevant and useful data from external websites, can you give the boss a basis for business decisions. And maybe one day, you will be the boss.
Speaking of bosses, a pretty young lady next to me got excited and blurted out: "In your IT world, isn't the most handsome one that search-engine boss, Boss Li?"
I was a little unconvinced, and a little unhappy, but what could I do? When it comes to web crawlers, his (Boss Li's) technology really is better than mine. He knows how to use crawlers: every day they comb through massive amounts of Internet information, crawl the high-quality pages, and record them in his database. When a user types keywords into the search engine, the engine analyzes and processes those keywords, finds the relevant pages, sorts them according to certain ranking rules, and presents the results to the user.
The thought that ranking makes money, and Boss Li doesn't give me a cent of it, made me say to the young lady: "Fine, I won't talk to you. I'm going to talk to my old irons about how web crawlers work. You little fangirl, go see your Boss Li."
- What is a crawler
A web crawler is also called a web spider, an Internet ant, or a web robot. It crawls data from the network according to rules we set. The crawled results may contain HTML code, JSON data, images, audio, or video. The programmer then filters that data according to the actual requirements, extracting what is useful and storing it.
To put it plainly: use the Python programming language to simulate a browser, visit a specified website, take the returned result, filter and extract the data you need according to your rules, and store it for later use.
Old irons who have read my 《Day 10 | Master Python in 12 Days: File Operations》 and 《Day 11 | Master Python in 12 Days: Database Operations》 should already know that data usually ends up in a file or a database.
- Crawling process
How a user accesses network data through a browser: open the browser -> enter the URL -> the browser submits the request -> the web page code is downloaded -> it is rendered as a page.
A crawler program does something similar: specify the URL, simulate a browser sending a request (to get the page code) -> extract the useful data -> store it in a file or database.
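The three steps above can be sketched with nothing but Python's standard library. Everything here is illustrative: the URL is a placeholder, the regex assumes titles live in `<h2>` tags, and the demo runs on a small sample string instead of a live site.

```python
import re
import sqlite3
import urllib.request

# Step 1: fetch -- simulate a browser by sending a GET request with a
# User-Agent header. (Placeholder URL; not called in this offline demo.)
def fetch(url):
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

# Step 2: extract -- pull article titles out of the HTML with a regex
# (assumes, for illustration, that titles are wrapped in <h2> tags).
def extract_titles(html):
    return re.findall(r"<h2[^>]*>(.*?)</h2>", html, re.S)

# Step 3: store -- save the titles into a SQLite database.
def store(titles, db_path=":memory:"):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS titles (text TEXT)")
    conn.executemany("INSERT INTO titles VALUES (?)", [(t,) for t in titles])
    conn.commit()
    return conn

# Demo on a sample page instead of a real request:
sample = "<h2>First post</h2><p>body</p><h2>Second post</h2>"
titles = extract_titles(sample)
conn = store(titles)
count = conn.execute("SELECT COUNT(*) FROM titles").fetchone()[0]
print(titles, count)  # → ['First post', 'Second post'] 2
```

In real code you would call `fetch()` on the target URL and feed its result to the extract and store steps.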
For crawler programming, Python is recommended, because Python's crawler libraries are simple and easy to use, and the built-in standard library alone can cover most needs. With it you can:
(1) use an HTTP library (such as the built-in urllib) to make a request to the target site, sending a Request (including request headers and request body);
(2) parse the server's Response using built-in libraries (html, json, regular expressions);
(3) Store the required data in a file or database .
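Step (2) in action: the built-in `json` and `re` modules are enough to parse a typical response. The JSON body and the HTML fragment below are made up for the demo.

```python
import json
import re

# A made-up JSON body, shaped like what an API might return.
body = '{"status": "ok", "items": [{"title": "Python basics", "views": 120}]}'
data = json.loads(body)                      # built-in json parses the body
titles = [item["title"] for item in data["items"]]

# A made-up HTML fragment, parsed with a built-in regular expression.
html = '<a href="/post/1">First post</a>'
links = re.findall(r'href="([^"]+)"', html)  # built-in re extracts attributes

print(titles, links)  # → ['Python basics'] ['/post/1']
```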
If Python's built-in libraries are not enough, you can run `pip install <library name>` to quickly download a third-party library and use it.
- Locating what to crawl
When writing crawler code, you often need to specify the node or path to crawl. If I told you that the Chrome browser can give you that node or path in a couple of clicks, would you go install it on your computer right away?
If you haven't installed it yet, go ahead and install it first.
On the page, press the F12 key to open the developer tools and show the source code. Select the node you want, right-click it and choose 【Inspect】 to locate it in the code; then right-click that code and choose 【Copy】-【Copy selector】 or 【Copy XPath】 to copy the node's selector or path.
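Once you have copied a path, your code navigates to that node. Real pages are rarely well-formed XML, so in practice people use a third-party parser (lxml or BeautifulSoup), but the idea can be sketched with the standard library on a tiny well-formed page; the page and the path below are made up for the demo.

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed page standing in for real site HTML (made up).
page = """
<html>
  <body>
    <div id="content">
      <h1 class="title">Hello crawler</h1>
    </div>
  </body>
</html>
"""

root = ET.fromstring(page)
# Chrome's "Copy XPath" would give something like /html/body/div/h1;
# ElementTree uses the same path, written relative to the root element.
node = root.find("./body/div/h1")
print(node.text)  # → Hello crawler
```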
Okay, that's all Lao Chen has to say about crawler principles. If you found it helpful, I hope you old irons will share and like this article so more people can see it. Your shares and likes are the greatest encouragement for Lao Chen to keep creating and sharing.