Crawling everyone's location data with Python: Tencent location big data
Tencent location big data
Website
https://heat.qq.com/
What the site offers
Tencent location big data publishes aggregate data from Tencent's location services, with no user information attached (location information only). From this site we can obtain the number of positioning events, anywhere in the world, generated by products that use Tencent's location service (WeChat, QQ, Tencent Maps, JD, Meituan, and others), which makes it useful for population estimation, business analysis, and research.
Data analysis
POST request
Open the browser's developer tools and look for the POST request. Note that the site sends a getXingyunPoints POST request only once every five minutes, so you may have to wait up to five minutes for it to appear.
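Before automating anything, it helps to reproduce the captured request body by hand. A minimal sketch, assuming the two field names (count and rank) seen in the capture, where count is the number of chunks per cycle and rank selects one chunk:

```python
import json

# Field names taken from the captured getXingyunPoints request;
# the values here are illustrative assumptions.
payload = {'count': 4, 'rank': 0}
body = json.dumps(payload)  # the site sends the payload as a JSON string
print(body)
```

Sending this body to the endpoint is what the full script below does with requests.post.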
Parsing the request
Once the POST request appears, click it to inspect the details. Each getXingyunPoints cycle actually consists of four POSTs. In the response, the field we care about is locs, a long string of values separated by commas (",").
Every three consecutive values form one record: latitude, longitude, and a count. Following geographic convention, latitude and longitude keep two decimal places, i.e. the xxx.xx format, but they arrive as integers scaled by 100. The data looks like this:
3220,11895,2,3075,11535,2,......
which reads as:
latitude 32.20, longitude 118.95, count 2
latitude 30.75, longitude 115.35, count 2
…
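The triple layout above can be turned into records with a short helper. A minimal sketch (parse_locs is a hypothetical name, not part of the site's API):

```python
def parse_locs(locs):
    """Split the comma-separated locs string into (lat, lon, count) triples.

    Coordinates arrive as integers scaled by 100, e.g. 3220 -> 32.20.
    """
    fields = locs.split(",")
    records = []
    for i in range(0, len(fields) - 2, 3):
        lat = int(fields[i]) / 100      # latitude, two decimal places
        lon = int(fields[i + 1]) / 100  # longitude
        count = int(fields[i + 2])      # number of positioning events
        records.append((lat, lon, count))
    return records

print(parse_locs("3220,11895,2,3075,11535,2"))
# [(32.2, 118.95, 2), (30.75, 115.35, 2)]
```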
Python code
"""
@author: food C++ chicken Java Jiangzl
@Description: Used to crawl Tencent location big data information , Once again , Tencent location big data has never been said to be accurate data , It's not the complete data , Hair paper Think twice
"""
import requests
import json
import pandas as pd
import time
def get_TecentData(count=4, rank=0, increNum=0): # The default is from rank from 0 Start (tecent once post Meeting post The four time )
url = 'https://xingyun.map.qq.com/api/getXingyunPoints'
content = ''
paload = {
'count': count, 'rank': rank}
response = requests.post(url, data=json.dumps(paload))
datas = response.text
dictdatas = json.loads(datas) # dumps Yes, it will dict Turn it into str Format ,loads Yes, it will str Turn it into dict Format
locs = dictdatas["locs"] # re-extract content( This needs further analysis to extract longitude and latitude and positioning times )
locss = locs.split(",")
temp = [] # Make a makeshift container
for i in range(int(len(locss) / 3)):
lat = locss[0 + 3 * i] # Get latitude
lon = locss[1 + 3 * i] # Get longitude
count = locss[2 + 3 * i]
# Get Shaanxi data --- Get data from every place , Just change here
# Take a chestnut -- The requested metadata is an integer , North latitude 10 To 20 Between degrees
# Namely 1000<int(lat)<2000
if(3142<int(lat)<3935 and 10529<int(lon)<11115):
temp.append([int(lat) / 100, int(lon) / 100, count]) # Storing data in containers : latitude , Longitude and positioning times
# Data collation
result = pd.DataFrame(temp)
result.dropna()
result.columns = ['lat', 'lon', 'count']
result.to_csv('TecentData'+str(increNum)+'.txt', mode='a', index=False) # model="a",a It means append, We can get the data all the way to TecentData.txt Middle append
if __name__ == '__main__':
    # To generate a new file every so often, uncomment the lines below
    # and change the 0 passed to get_TecentData to k
    # time.sleep(number) -- number is the interval in seconds; for one fetch per minute use sleep(60)
    # while (1):
    # for k in range(1000000):
    for i in range(4):
        get_TecentData(4, i, 0)
        # time.sleep(60)
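Once a few files have accumulated, the per-request rows can be summed per grid cell to get total positioning counts. A sketch using hypothetical sample rows in the same lat/lon/count layout the script writes (in practice you would read them back with pd.read_csv):

```python
import pandas as pd

# Hypothetical sample rows mimicking two requests covering the same cell
rows = [(32.20, 118.95, 2), (32.20, 118.95, 3), (30.75, 115.35, 2)]
df = pd.DataFrame(rows, columns=['lat', 'lon', 'count'])

# Sum positioning counts per (lat, lon) grid cell across requests
totals = df.groupby(['lat', 'lon'], as_index=False)['count'].sum()
print(totals)
```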
Additional notes
When parsing the data, watch the run time and filter what you keep. Without filtering, each request pulls about 10 MB; fetching once per second works out to roughly 36 GB per hour, which can fill an average disk quickly.
If you have any questions, feel free to message me privately or leave a comment; we can explore this together and learn from each other.