Preface

I noticed that many people want a news API, so I went looking and found a Zhihu user who publishes a news bulletin every day. That gave me the idea to write a news crawler. If you want to turn it into an API, you can add the Flask module; here I'll only write the crawler part for now.
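For the API part, the author suggests Flask. To keep the sketch free of third-party dependencies, here is the same idea using the standard library's `http.server` instead. The function `get_today_news` and the `/news` path are hypothetical placeholders; in practice `get_today_news` would run the crawler and return the bulletin text.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def get_today_news():
    """Hypothetical stand-in for the crawler; should return today's bulletin text."""
    return "example news text"


class NewsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Single endpoint: /news returns the bulletin as JSON
        if self.path != "/news":
            self.send_error(404)
            return
        body = json.dumps({"news": get_today_news()}, ensure_ascii=False).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr
        pass


def make_server(port=8000):
    """Create (but do not start) the server; call serve_forever() on it to run."""
    return HTTPServer(("127.0.0.1", port), NewsHandler)
```

To run it, call `make_server().serve_forever()` and open `http://127.0.0.1:8000/news`. With Flask the handler would shrink to a single decorated function, which is why the author mentions it.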

Target site

Website: https://www.zhihu.com/people/mt36501

This page lists all of the user's posts, but I only want today's bulletin, so the posts have to be filtered by date.
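The date filter can be isolated into two small helpers, which makes it easy to test without touching the network. This is a sketch that assumes the bulletin titles start with the date in the form `12月5日` (month/day, no leading zeros) followed by a comma; the helper names `today_label` and `is_today` and the sample titles are hypothetical.

```python
import datetime
import re


def today_label(day=None):
    """Build the month/day prefix assumed to start each bulletin title, e.g. '12月5日'.

    Building the string by hand avoids strftime's platform-specific
    no-leading-zero flags (%#m on Windows, %-m on glibc).
    """
    day = day or datetime.date.today()
    return f"{day.month}月{day.day}日"


def is_today(title, day=None):
    """Return True if the part of the title before the first comma matches the date.

    Both the ASCII comma and the full-width comma '，' are accepted.
    """
    date_part = re.split(r"[,，]", title, maxsplit=1)[0].strip()
    return date_part == today_label(day)
```

For example, with `d = datetime.date(2020, 12, 5)`, `today_label(d)` gives `12月5日`, and `is_today("12月5日，星期六", d)` is true while a post without a date prefix is rejected.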

Start writing code

# Import the libraries we need
import html
import re
import time
from urllib.parse import urljoin

import requests

# Target site
url = 'https://www.zhihu.com/people/mt36501'
# Simulated browser request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/70.0.3538.102 Safari/537.36 Edge/18.18362',
    'Accept': 'image/png, image/svg+xml, image/*; q=0.8, */*; q=0.5',
}
# Request the profile page and get its HTML
resp = requests.get(url, headers=headers, timeout=10).text
# Pull out every post title block
h2 = re.findall(r'<h2 class="ContentItem-title">.*?</h2>', resp, re.S)
# Today's date in the form the titles use, e.g. "12月5日" (month/day, no leading zeros).
# strftime's %#m / %#d only work on Windows, so build the string by hand.
lt = time.localtime()
now_time = f'{lt.tm_mon}月{lt.tm_mday}日'
# Check every title, because the account sometimes posts things that are not the news bulletin
for block in h2:
    # Extract the link
    link = re.findall(r'href="(.*?)"', block, re.S)[0]
    # Extract the title
    title = re.findall(r'Title">(.*?)</a>', block, re.S)
    # Skip entries without a title
    if not title:
        continue
    # The date is the part of the title before the first comma (ASCII or full-width)
    title = re.split(r'[,，]', title[0])[0]
    # Only go on if the article date is today's date
    if title == now_time and link != '':
        # Fetch the article; the href is protocol-relative ("//..."), so resolve it against the page URL
        con_resp = requests.get(urljoin(url, link), headers=headers, timeout=10).text
        # Keep only the paragraph text and decode HTML entities such as &quot; and &amp;
        p = [html.unescape(s) for s in re.findall(r'<p>(.*?)</p>', con_resp, re.S)]
        news = []
        # Go through each paragraph: skip the 1st and 3rd (not news items) and empty ones
        for index, para in enumerate(p, start=1):
            if index in (1, 3) or para == '':
                continue
            news.append(para)
        # Join the news items with a blank line between them
        text = '\n\n'.join(news)
        print(text)
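The paragraph-selection step is the fiddliest part of the crawler, so it can be pulled out into a pure function and checked offline. This is a sketch, assuming (as the crawler intends) that the 1st and 3rd paragraphs of the article are not news items; the name `extract_news`, its `skip` parameter, and the sample HTML are hypothetical.

```python
import html
import re


def extract_news(article_html, skip=(1, 3)):
    """Return the bulletin text from an article page.

    Takes every <p>...</p>, decodes HTML entities, drops empty paragraphs
    and the 1-based positions listed in `skip`, then joins the rest
    with a blank line between items.
    """
    paragraphs = re.findall(r"<p>(.*?)</p>", article_html, re.S)
    kept = []
    for position, para in enumerate(paragraphs, start=1):
        text = html.unescape(para)
        if position in skip or text == "":
            continue
        kept.append(text)
    return "\n\n".join(kept)


sample = "<p>intro</p><p>1. first item</p><p>banner</p><p></p><p>2. Tom &amp; Jerry</p>"
print(extract_news(sample))
```

Using `"\n\n".join` instead of tracking the last index avoids a trailing blank line, and testing against a saved copy of a real article page is a cheap way to catch layout changes on Zhihu's side.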
