Step by step: use Python to crawl and store data, and automatically visualize it in Excel!

Watch the corner. 2020-11-14 11:21:30


Hello everyone. Last time we talked about how to build a crawler applet with a GUI in Python. Following up on that hot topic, this article continues the NBA crawler GUI and explores how to crawl data from the Hupu NBA official site, write it to Excel, and automatically generate a line chart there at the same time.

This article is divided into two parts:

  • Crawl the player pages of the Hupu NBA official site to collect player data.
  • Clean and organize the crawled player data, then visualize it.

The project mainly uses the following Python modules:

  • requests
  • bs4
  • xlsxwriter
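
If any of these are missing, they can be installed from PyPI (note that bs4 ships in the beautifulsoup4 package, and the lxml parser and xlsxwriter module used below are separate installs):

pip install requests beautifulsoup4 lxml XlsxWriter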

 



 

   The crawler part

The crawler logic is organized as follows:

  • Inspect the source of URL1 to find each team name and its corresponding URL2.
  • Inspect the source of URL2 to find each player and the corresponding URL3.
  • Inspect the source of URL3 to find the player's basic information and game data, then filter and store them.

In fact, a crawler simply operates on HTML, and HTML's structure is very simple: a big box holds smaller boxes, the smaller boxes hold boxes smaller still, nested layer upon layer.
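
As a minimal sketch of that nesting idea (the HTML fragment below is made up for illustration, not Hupu's actual markup), BeautifulSoup walks the nested boxes with a CSS descendant selector, exactly as the functions below do:

from bs4 import BeautifulSoup

html = '<div><ul><li><span><a href="/players/lakers">Lakers</a></span></li></ul></div>'
soup = BeautifulSoup(html, 'lxml')
# Descend through the nested boxes with a CSS descendant selector
for a in soup.select('div ul li span a'):
    print(a.get_text(), a.get('href'))  # Lakers /players/lakers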

The target URLs are as follows:

  • URL1:http://nba.hupu.com/players/
  • URL2 (take the Lakers as an example):https://nba.hupu.com/players/lakers
  • URL3 (take LeBron James as an example):https://nba.hupu.com/players/lebronjames-650.html

First, import the required modules:

from bs4 import BeautifulSoup
import requests
import xlsxwriter
import os

Looking at the source of URL1, you can see each team name and its corresponding URL2 under a span tag, inside <span class><a href="...">; from there you trace its parent box and grandparent box. The pages that follow work the same way. The structure is shown below:

 

Here, the requests and bs4 modules can be used to index into the page purposefully and get the list of team names.

def Teamlists(url):
    TeamName=[]
    TeamURL=[]
    GET=requests.get(url)  # request the team list page (URL1)
    soup=BeautifulSoup(GET.content,'lxml')
    lables=soup.select('html body div div div ul li span a')  # team name links
    for lable in lables:
        ballname=lable.get_text()
        TeamName.append(ballname)
        print(ballname)
    teamname=input("Please enter the name of the team you want to query:")  # can be replaced by a key value in the GUI
    c=TeamName.index(teamname)
    for item in lables:
        HREF=item.get('href')
        TeamURL.append(HREF)
    URL2=TeamURL[c]  # the chosen team's page URL
    return URL2
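
A quick usage sketch (assuming the page structure is unchanged; the prompt expects a team name exactly as printed):

URL2 = Teamlists('http://nba.hupu.com/players/')
print(URL2)  # e.g. https://nba.hupu.com/players/lakers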

With the team's URL2 in hand, observe the content of that page: each player's name sits under an a tag, <a target="_blank" href=...>, which also stores the corresponding player's URL3, as pictured:

Here again, the requests and bs4 modules are used to index into the page and get the list of player names and the corresponding URL3.

# User-defined function to get the team roster and each player's URL
def playerlists(URL2):
    PlayerName=[]
    PlayerURL=[]
    GET2=requests.get(URL2)  # request the team roster page
    soup2=BeautifulSoup(GET2.content,'lxml')
    lables2=soup2.select('html body div div table tbody tr td b a')  # player name links
    for lable2 in lables2:
        playername=lable2.get_text()
        PlayerName.append(playername)
        print(playername)
    name=input("Please enter the player name:")  # can be replaced by a key value in the GUI
    d=PlayerName.index(name)
    for item2 in lables2:
        HREF2=item2.get('href')
        PlayerURL.append(HREF2)
    URL3=PlayerURL[d]  # the chosen player's page URL
    return URL3,name
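
Chained onto the previous step (again a sketch; the roster names printed come from the live page):

URL3, name = playerlists(URL2)
print(URL3)  # e.g. https://nba.hupu.com/players/lebronjames-650.html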

Now we have the player's URL3. Observing the content of that page, the player's basic information sits under p tags, while the regular-season and playoff career statistics sit under td tags, as pictured:

As before, the requests and bs4 modules are used to index into the page, extract the player's basic information and career data, and filter and store the regular-season and playoff statistics to obtain the data list.

def Competition(URL3):
    data=[]
    GET3=requests.get(URL3)
    soup3=BeautifulSoup(GET3.content,'lxml')
    lables3=soup3.select('html body div div div div div div div div p')  # basic info paragraphs
    lables4=soup3.select('div div table tbody tr td')  # stat table cells
    for lable3 in lables3:
        introduction=lable3.get_text()
        print(introduction)  # basic player information
    for lable4 in lables4:
        competition=lable4.get_text()
        data.append(competition)
    # The marker strings below are the stat-table section headers as rendered
    # on the page (translated here; on the live Chinese site they are Chinese)
    for i in range(len(data)):
        if data[i]=='Career regular season averages':
            a=data.index(data[i+31])  # first cell of the regular-season block
    del(data[:a])  # drop everything before the regular-season stats
    for x in range(len(data)):
        if data[x]=='Career playoff averages':
            b=data.index(data[x])
    del(data[b:])  # drop the playoff section onward
    return data
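
A sketch of calling it, together with the layout the visualization below depends on: judging from player_chart, the returned flat list holds 18 cells per logical row, with the first 18 being the column headers and each subsequent 18 one season of averages.

data = Competition(URL3)
print(data[:18])        # the header row (18 column names)
print(len(data) // 18)  # header row plus number of seasons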

Through the crawler above we obtain the following data, which provides the material for the visualization and is easy to bind to GUI key events later:

  • Get the standard names of all NBA teams;
  • Get the standard names of all players on a specified team;
  • Get a specified player's basic information plus regular-season and playoff data.

   The visualization part

The idea: create a folder, then create the table and line chart.

A custom function creates the folder using the os module and returns the path of the created folder. The code is as follows:

def file_add(path):  # path here is bound to the Statictext in the GUI interface
    creatpath=path+'\\Basketball'
    try:
        if not os.path.isdir(creatpath):
            os.makedirs(creatpath)
    except:
        print("Folder already exists")
    return creatpath
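
A quick usage sketch (the desktop path is hypothetical):

creatpath = file_add('C:\\Users\\me\\Desktop')
print(creatpath)  # C:\Users\me\Desktop\Basketball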

Next, a custom function uses the xlsxwriter module to create an Excel workbook under creatpath, write the data into it, and build the line chart at the same time. The code is as follows:

def player_chart(name,data,creatpath):
    # The workbook name is the player name + 'chart'
    EXCEL=xlsxwriter.Workbook(creatpath+'\\'+name+'chart.xlsx')
    worksheet=EXCEL.add_worksheet(name)
    bold=EXCEL.add_format({'bold':1})
    headings=data[:18]  # the first 18 cells are the column headers
    worksheet.write_row('A1',headings,bold)  # write the header row
    num=len(data)//18  # number of 18-cell rows (header plus seasons)
    for i in range(1,num):
        worksheet.write_row('A'+str(i+1),data[i*18:(i+1)*18])  # one season per row
    chart_col = EXCEL.add_chart({'type': 'line'})  # create a line chart
    chart_col.add_series({
        'name': '='+name+'!$R$1',  # series name: the header of column R (points)
        'categories': '='+name+'!$A$2:$A$'+str(num),  # category labels: the season column
        'values': '='+name+'!$R$2:$R$'+str(num),  # data range: column R, matching the categories
        'line': {'color': 'red'}, })  # line properties
    # Set the chart title and the x/y axis captions
    chart_col.set_title({'name': name+' career regular season average points'})
    chart_col.set_x_axis({'name': 'Year'})
    chart_col.set_y_axis({'name': 'Average points'})
    chart_col.set_style(1)  # chart style
    worksheet.insert_chart('A14', chart_col, {'x_offset':25, 'y_offset':3,})  # insert the chart into the worksheet with an offset
    EXCEL.close()
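
Putting both parts together, a minimal end-to-end run might look like this (the output path is hypothetical; the two input() prompts pick the team and player):

URL2 = Teamlists('http://nba.hupu.com/players/')  # choose a team at the prompt
URL3, name = playerlists(URL2)                    # choose a player at the prompt
data = Competition(URL3)                          # scrape and trim the stats
creatpath = file_add('C:\\Users\\me\\Desktop')    # create the output folder
player_chart(name, data, creatpath)               # write the workbook and chart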

The resulting data table, taking James as an example:

Open the auto-generated Excel file and the corresponding line chart is displayed directly, with no need for any manual rework!

Combining part one's crawler with part two's visualization, you can get an up-to-date summary of a player's regular-season and playoff data, along with a live line chart of his career.

 


Copyright notice
This article was created by [Watch the corner.]. Please include a link to the original when reposting. Thank you.
