Before looking at what anti-crawler measures are, let's first look at what a crawler is.
What is a crawler
The Internet is full of useful data. With some patience and the right technical means, you can collect a lot of valuable data. The "technical means" here is the web crawler.
A crawler is a program that automatically fetches the content of web pages. Search engines such as Google and Baidu, for example, run huge crawler systems every day that crawl data from websites around the world so that users can search it.
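As a rough illustration of the core step of a crawler, the sketch below parses a page and collects the links it would visit next. It uses only the Python standard library. The page content, the `example.com` URLs, and the names `LinkExtractor` and `extract_links` are made up for this example; a real crawler would download each page over HTTP and repeat the process.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the absolute URL of every <a href="..."> found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the list of links a crawler would queue from this page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content; a real crawler would fetch this over HTTP.
SAMPLE_PAGE = """
<html><body>
  <a href="/news/1.html">First story</a>
  <a href="https://example.com/about">About</a>
</body></html>
"""

links = extract_links(SAMPLE_PAGE, "https://example.com/index.html")
print(links)
```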
Malicious crawlers not only consume a large share of a website's traffic, which can keep users with real needs from reaching the site, but may also leak the site's key information and disrupt the normal operation of the website or app.
So for websites with high-value data, developers put technical measures in place against web crawlers.
If you want to implement a simple crawler example, you can refer to the article I wrote before:
Common anti-crawler measures
Generally speaking, anti-crawler methods can be classified by their characteristics into information-verification, dynamic-rendering, text-obfuscation, and behavior-verification anti-crawlers, among others.
Of these, text obfuscation is the most interesting and behavior verification is the most difficult.
Text obfuscation anti-crawler
Simply put, text obfuscation keeps crawlers from obtaining the important text data in a web application. The premise of any anti-crawler measure is that it must not affect the user's normal browsing and reading. Text that is visibly scrambled would be obvious to readers, so developers usually implement the obfuscation through font mappings instead.
For example: the text mapping used by the Autohome forum.
Here some special characters are rendered through a font mapping, so a web crawler cannot get the complete data directly when it collects the page, while normal users can still read the text as usual.
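The idea behind font mapping can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the scheme any particular site uses: the sensitive characters (digits here) are replaced in the page source by private-use code points, and a custom font would draw the original glyphs for them. The table `ENCODE`, the choice of code points starting at U+E000, and the function names are hypothetical.

```python
# Characters the site wants to hide from crawlers (assumption: digits only).
SENSITIVE_CHARS = "0123456789"

# Hypothetical table: real character -> private-use code point in the page source.
# A custom web font would draw the real glyph for each of these code points.
ENCODE = {ch: chr(0xE000 + i) for i, ch in enumerate(SENSITIVE_CHARS)}

# The reverse table is what a crawler has to reconstruct from the font file.
DECODE = {code: ch for ch, code in ENCODE.items()}


def obfuscate(text):
    """What the server puts in the HTML: sensitive characters are swapped out."""
    return "".join(ENCODE.get(ch, ch) for ch in text)


def deobfuscate(text):
    """What a crawler must do once it has recovered the mapping."""
    return "".join(DECODE.get(ch, ch) for ch in text)


page_text = obfuscate("Price: 12800 yuan")
print(repr(page_text))         # the digits appear as private-use escapes
print(deobfuscate(page_text))  # readable again once the mapping is known
```

A crawler that reads only the HTML sees the private-use characters; the browser shows normal text because the font supplies the glyphs.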
Dynamic rendering anti-crawler
As technology keeps iterating, more and more websites have moved from static data loading to dynamic data loading, and dynamic loading is increasingly accompanied by data encryption.
Dynamic data loading is easy to understand: the browser first loads the general framework of the page, then sends asynchronous requests to fill in the data. Encrypting the request parameters when sending those requests blocks very low-level crawler scripts.
For example: Red man data set (JS parameter encryption).
Here the key parameters are verified when an asynchronous request is sent, which directly intercepts the most basic crawler requests. A crawler has to reproduce the parameter encryption process in order to get the data normally.
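A minimal sketch of this kind of parameter check follows. It uses an HMAC signature plus a timestamp as a simple stand-in for the parameter encryption described above; real sites use their own, usually obfuscated, JavaScript routines. The secret key, the parameter names `ts` and `sign`, and the 60-second window are assumptions made for the example.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical secret embedded in the site's JavaScript and known to the server.
SECRET_KEY = b"demo-secret"


def _signature(params):
    """Sign the parameters in a canonical (sorted) order."""
    canonical = urlencode(sorted(params.items()))
    return hmac.new(SECRET_KEY, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_params(params, timestamp):
    """Client side: attach a timestamp and a signature to the request parameters."""
    payload = dict(params, ts=str(timestamp))
    payload["sign"] = _signature(payload)
    return payload


def verify_params(payload, now, max_age=60):
    """Server side: reject requests that are unsigned, tampered with, or stale."""
    received = dict(payload)
    sign = received.pop("sign", None)
    if sign is None or "ts" not in received:
        return False
    if abs(now - int(received["ts"])) > max_age:
        return False
    return hmac.compare_digest(sign, _signature(received))


signed = sign_params({"page": 1, "keyword": "python"}, timestamp=1000)
print(verify_params(signed, now=1010))                               # signed request
print(verify_params({"page": 1, "keyword": "python"}, now=1010))     # naive crawler
```

A script that simply replays the visible URL parameters fails the check; it passes only after reproducing the signing step.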
Behavior verification anti-crawler
Behavioral CAPTCHAs are a popular kind of CAPTCHA. Taken literally, verification is completed through the user's actions, without the need to read distorted images or text. There are two common kinds: drag and click (touch).
For example: the 12306 login CAPTCHA (click-based behavior verification).
The user identifies the images and makes a selection, and the site uses that selection to judge whether the request comes from a normal user. This blocks low-tech crawler programs.
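The server-side part of a click CAPTCHA can be sketched as a simple geometric check. This is only an illustration under assumptions: the target regions are axis-aligned, non-overlapping boxes given as `(left, top, width, height)`, clicks are `(x, y)` points, and the function names are made up. Production systems typically also look at signals such as timing and pointer movement, which are not modelled here.

```python
def inside(point, box):
    """Return True if the click point lies within the box."""
    x, y = point
    left, top, width, height = box
    return left <= x < left + width and top <= y < top + height


def clicks_match_targets(clicks, targets):
    """Accept only if every click hits a target and every target is hit."""
    if len(clicks) != len(targets):
        return False
    hit = set()
    for point in clicks:
        matches = [i for i, box in enumerate(targets) if inside(point, box)]
        if not matches:
            return False  # a click landed on a wrong image
        hit.update(matches)
    return len(hit) == len(targets)


# Hypothetical challenge: the correct images occupy these two regions.
TARGETS = [(0, 0, 70, 70), (140, 70, 70, 70)]

print(clicks_match_targets([(35, 30), (170, 100)], TARGETS))  # both targets hit
print(clicks_match_targets([(35, 30), (90, 30)], TARGETS))    # second click misses
```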
Finally, crawlers and anti-crawlers are a battle of wits among Internet developers. As website developers we should master crawler technology, and we need to learn even more about how to implement anti-crawler measures.
If you want to study further, keep following along. Next, we will publish a series of concrete anti-crawler solutions for websites.
Thank you for your attention ~
If you need more Python source code, you can get it from my git repository. It also has Java and big-data code that you are welcome to take, and it will keep being updated.
For beginners, I also wrote some introductory Python material in the README that you can check out.
Official account: Java Architects Alliance, be a versatile coder