Python data visualization: the full practical guide!

Python cola 2021-04-06 16:57:41


Hello everyone! Today let's take a look at the main Python libraries for data visualization and all the types of charts that can be made with them. We'll also see which library is recommended for each case and the unique features of each one.

We'll start with the most basic visualization, inspecting the data directly, then move on to charts, and finish with interactive charts.


Datasets

We will use two datasets that suit the visualizations shown in this article. They can be downloaded from the link below.

Datasets: github.com/albertsl/dat

Both datasets contain internet-search popularity data for three terms related to artificial intelligence (data science, machine learning and deep learning), extracted from search engines.

The data come in two files: temporal.csv and mapa.csv. The first, which we will use most in this tutorial, contains popularity data for the three terms over time (from 2004 to 2020). In addition, I added a categorical variable (1 and 0) to demonstrate charts that use categorical variables.

The mapa.csv file contains popularity data by country/region. We will use it for the final visualization, the map.

Pandas

Before introducing more complex methods, let's start with the most basic way of visualizing data: we will simply use pandas to look at the data and understand how they are distributed.

The first thing to do is look at a few sample rows to see which columns the data contain, what information they hold, and how the values are encoded.


import pandas as pd
df = pd.read_csv('temporal.csv')
df.head(10) #View first 10 data rows


With the describe command, we can see how the data are distributed: maximum, minimum, mean...


df.describe()


With the info command, we can see what type of data each column contains. We may discover a column that looks numeric when inspected with head, but holds string-formatted values further down, so the whole column ends up encoded as a string.


df.info()


By default, pandas limits the number of rows and columns it displays. This bothers many programmers, who want to be able to see all of the data.

With the commands below we can raise those limits and display the whole dataset. Be careful with this option on large datasets, as they may have trouble being displayed.


pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

With Pandas styles, we can get much more information when looking at a table. First, we define a format dictionary so that the numbers are displayed in a readable way (a given number of decimals, dates and times in a certain format, percentages, currency, etc.). Don't worry: this only changes how the data are displayed, not the data themselves, so it won't cause any problems later.

To give an example of each type, I added currency and percentage symbols, even though they don't make sense for this data.


format_dict = {'data science':'${0:,.2f}', 'Mes':'{:%m-%Y}', 'machine learning':'{:.2%}'}
#We make sure that the Month column has datetime format
df['Mes'] = pd.to_datetime(df['Mes'])
#We apply the style to the visualization
df.head().style.format(format_dict)

We can use color to highlight the maximum and minimum values.


format_dict = {'Mes':'{:%m-%Y}'} #Simplified format dictionary with values that do make sense for our data
df.head().style.format(format_dict).highlight_max(color='darkgreen').highlight_min(color='#ff0000')


We can use a color gradient to display the data values.


df.head(10).style.format(format_dict).background_gradient(subset=['data science', 'machine learning'], cmap='BuGn')


We can also use bars to display the data values.


df.head().style.format(format_dict).bar(color='red', subset=['data science', 'deep learning'])


We can also combine the functions above to produce more complex visualizations.


df.head(10).style.format(format_dict).background_gradient(subset=['data science', 'machine learning'], cmap='BuGn').highlight_max(color='yellow')


Pandas profiling

Pandas profiling is a library that generates interactive reports from our data: we can see the distribution of the data, the types of the columns, and possible problems. It's very easy to use; with just three lines we can generate a report that can be sent to anyone and used even by people who don't know programming.


from pandas_profiling import ProfileReport
prof = ProfileReport(df)
prof.to_file(output_file='report.html')


Matplotlib

Matplotlib is the most basic library for visualizing data graphically. It includes just about every chart type we can think of. Being basic doesn't mean it isn't powerful; many of the other visualization libraries we'll discuss are built on top of it.

A Matplotlib chart has two main components: the Axes (the lines that delimit the chart area) and the Figure (the container where we draw the axes, titles and everything outside the axes area). Now let's create the simplest possible chart:


import matplotlib.pyplot as plt
plt.plot(df['Mes'], df['data science'], label='data science') #The parameter label is to indicate the legend. This doesn't mean that it will be shown, we'll have to use another command that I'll explain later.

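The same chart can also be drawn with the object-oriented interface, which makes the Figure/Axes split described above explicit. A minimal sketch, equivalent to the plt.plot call above:

fig, ax = plt.subplots()  #the Figure is the container, the Axes is the plotting area
ax.plot(df['Mes'], df['data science'], label='data science')
ax.set_xlabel('Date')
ax.set_ylabel('Popularity')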

We can plot several variables in the same chart and compare them.


plt.plot(df['Mes'], df['data science'], label='data science')
plt.plot(df['Mes'], df['machine learning'], label='machine learning')
plt.plot(df['Mes'], df['deep learning'], label='deep learning')


It's not clear which variable each color represents. We'll improve the chart by adding a legend and titles.


plt.plot(df['Mes'], df['data science'], label='data science')
plt.plot(df['Mes'], df['machine learning'], label='machine learning')
plt.plot(df['Mes'], df['deep learning'], label='deep learning')
plt.xlabel('Date')
plt.ylabel('Popularity')
plt.title('Popularity of AI terms by date')
plt.grid(True)
plt.legend()


If you are running Python from a terminal or a script, call plt.show() after the plotting functions we wrote above. If you are using a Jupyter Notebook, add %matplotlib inline at the beginning of the notebook and run it before making any chart.
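A minimal sketch of both options (the magic line only applies inside a notebook cell):

#From a terminal or a .py script: draw, then show
plt.plot(df['Mes'], df['data science'])
plt.show()  #opens a window with the chart

#In a Jupyter Notebook, run this once in a cell before plotting:
#%matplotlib inline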

We can draw several charts in a single figure. This is very useful for comparing charts or for sharing several chart types in one image.


fig, axes = plt.subplots(2,2)
axes[0, 0].hist(df['data science'])
axes[0, 1].scatter(df['Mes'], df['data science'])
axes[1, 0].plot(df['Mes'], df['machine learning'])
axes[1, 1].plot(df['Mes'], df['deep learning'])


We can plot each variable with a different marker style:


plt.plot(df['Mes'], df['data science'], 'r-')
plt.plot(df['Mes'], df['data science'] * 2, 'bs')
plt.plot(df['Mes'], df['data science'] * 3, 'g^')


Now let's look at examples of the different chart types that can be made with Matplotlib. Let's start with a scatter plot:


plt.scatter(df['data science'], df['machine learning'])


Bar chart example:


plt.bar(df['Mes'], df['machine learning'], width=20)


Histogram example:


plt.hist(df['deep learning'], bins=15)


We can add text to a chart, positioning it in the same coordinate units used by the plot. The text can even include special characters written in TeX notation.

We can also add annotations that point to specific points on the chart.


plt.plot(df['Mes'], df['data science'], label='data science')
plt.plot(df['Mes'], df['machine learning'], label='machine learning')
plt.plot(df['Mes'], df['deep learning'], label='deep learning')
plt.xlabel('Date')
plt.ylabel('Popularity')
plt.title('Popularity of AI terms by date')
plt.grid(True)
plt.text(x='2010-01-01', y=80, s=r'$\lambda=1, r^2=0.8$') #Coordinates use the same units as the graph
plt.annotate('Notice something?', xy=('2014-01-01', 30), xytext=('2006-01-01', 50), arrowprops={'facecolor':'red', 'shrink':0.05})


Seaborn

Seaborn is a library built on top of Matplotlib. Basically, it gives us nicer-looking charts and functions to create complex chart types with just one line of code.

We import the library and initialize the chart style with sns.set(); without this command, the charts will keep the same default style as Matplotlib. We start with one of the simplest charts, a scatter plot:


import seaborn as sns
sns.set()
sns.scatterplot(x='Mes', y='data science', data=df)


We can include more than two variables in the same chart by using color and size. We can also create a separate panel for each value of the categorical column:


sns.relplot(x='Mes', y='deep learning', hue='data science', size='machine learning', col='categorical', data=df)


One of the most popular charts available in Seaborn is the heat map. It is often used to show all the correlations between the variables in a dataset:


sns.heatmap(df.corr(), annot=True, fmt='.2f')


Another very popular chart is the pair plot, which shows the relationships between all pairs of variables. Be careful with this function if you have a large dataset, since it draws all the data points once for every pair of columns, so processing time grows quickly as the dimensionality of the data increases.


sns.pairplot(df)

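If the dataset is very large, one simple workaround (not from the original article) is to draw the pair plot on a random sample of the rows, for example:

sns.pairplot(df.sample(frac=0.5, random_state=0))  #pair plot on a 50% random sample of the rows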

Now let's make a pair plot that breaks the chart down by the values of the categorical variable.


sns.pairplot(df, hue='categorical')


The joint plot is very useful: it shows a scatter plot of two variables together with their histograms, so we can also see how they are distributed:


sns.jointplot(x='data science', y='machine learning', data=df)


Another interesting chart is the violin plot:


sns.catplot(x='categorical', y='data science', kind='violin', data=df)


Just as with Matplotlib, we can create several charts in a single image:


fig, axes = plt.subplots(1, 2, sharey=True, figsize=(8, 4))
sns.scatterplot(x="Mes", y="deep learning", hue="categorical", data=df, ax=axes[0])
axes[0].set_title('Deep Learning')
sns.scatterplot(x="Mes", y="machine learning", hue="categorical", data=df, ax=axes[1])
axes[1].set_title('Machine Learning')


Bokeh

Bokeh is a library for generating interactive charts. We can export them to an HTML document and share them with anyone who has a web browser.

It is a very useful library when we want to look for things in the charts and be able to zoom and pan around them, or when we want to share them and let other people explore the data.

We start by importing the library and defining the file in which the chart will be saved:


from bokeh.plotting import figure, output_file, save
output_file('data_science_popularity.html')

We draw the chart we want and save it to the file:


p = figure(title='data science', x_axis_label='Mes', y_axis_label='data science')
p.line(df['Mes'], df['data science'], legend_label='popularity', line_width=2)
save(p)


We can also add several charts to a single file:


from bokeh.layouts import gridplot
output_file('multiple_graphs.html')
s1 = figure(width=250, height=250, title='data science')
s1.circle(df['Mes'], df['data science'], size=10, color='navy', alpha=0.5)
s2 = figure(width=250, height=250, x_range=s1.x_range, y_range=s1.y_range, title='machine learning') #share both axis ranges
s2.triangle(df['Mes'], df['machine learning'], size=10, color='red', alpha=0.5)
s3 = figure(width=250, height=250, x_range=s1.x_range, title='deep learning') #share only one axis range
s3.square(df['Mes'], df['deep learning'], size=5, color='green', alpha=0.5)
p = gridplot([[s1, s2, s3]])
save(p)
save(p)


Altair

In my opinion, Altair doesn't bring anything new to what we've already covered with the other libraries, so I won't discuss it in depth. I mention it because we may find some specific chart in its example gallery that can help us.

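For completeness, here is a minimal sketch of what an Altair version of our line chart might look like (not from the original article; it assumes the same DataFrame df loaded from temporal.csv):

import altair as alt

#Line chart of one term over time, saved as an interactive HTML file
chart = alt.Chart(df).mark_line().encode(
    x=alt.X('Mes:T', title='Date'),
    y=alt.Y('data science:Q', title='Popularity')
)
chart.save('altair_chart.html')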

Folium

Folium is a library that lets us draw maps, add markers, and plot data on them. Folium lets us choose the map tile provider, which determines the style and quality of the map. In this article, for simplicity, we will only use OpenStreetMap as the map provider.

Working with maps is quite involved and deserves an article of its own. Here we'll just cover the basics and draw a few maps with the data we have.

Let's start with the basics: a simple map with nothing on it.


import folium
m1 = folium.Map(location=[41.38, 2.17], tiles='openstreetmap', zoom_start=18)
m1.save('map1.html')


This generates an interactive map file in which we can pan and zoom freely.

We can add markers to the map:


m2 = folium.Map(location=[41.38, 2.17], tiles='openstreetmap', zoom_start=16)
folium.Marker([41.38, 2.176], popup='<i>You can use whatever HTML code you want</i>', tooltip='click here').add_to(m2)
folium.Marker([41.38, 2.174], popup='<b>You can use whatever HTML code you want</b>', tooltip='dont click here').add_to(m2)
m2.save('map2.html')


In the resulting interactive map file, you can click on the markers.

The dataset provided at the beginning contains country names and the popularity of the AI terms. A quick look shows that some countries are missing one of these values; we will drop those countries to keep things simple. Then we'll use Geopandas to convert the country/region names into coordinates that can be drawn on a map.


from geopandas.tools import geocode
df2 = pd.read_csv('mapa.csv')
df2.dropna(axis=0, inplace=True)
df2['geometry'] = geocode(df2['País'], provider='nominatim')['geometry'] #It may take a while because it downloads a lot of data.
df2['Latitude'] = df2['geometry'].apply(lambda l: l.y)
df2['Longitude'] = df2['geometry'].apply(lambda l: l.x)


Now that the data are encoded as latitude and longitude, let's show them on a map. We'll start with a bubble map, drawing a circle for each country. Its size depends on the popularity of the term, and its color is red or green depending on whether the popularity exceeds a certain value.


m3 = folium.Map(location=[39.326234, -4.838065], tiles='openstreetmap', zoom_start=3)
def color_producer(val):
    if val <= 50:
        return 'red'
    else:
        return 'green'
for i in range(0, len(df2)):
    folium.Circle(location=[df2.iloc[i]['Latitude'], df2.iloc[i]['Longitude']], radius=5000*df2.iloc[i]['data science'], color=color_producer(df2.iloc[i]['data science'])).add_to(m3)
m3.save('map3.html')


Which library should you use, and when?

With so many libraries available, how do you choose? The quick answer: whichever library lets you easily create the chart you need.

In the initial phases of a project, use Pandas and Pandas profiling to make quick visualizations and understand the data. If you need to visualize more information, you can use the simple charts found in matplotlib, such as scatter plots or histograms.

In the advanced stages of a project, we can search the galleries of the main libraries (Matplotlib, Seaborn, Bokeh, Altair) for charts that we like and that suit the project. These charts can be used to convey information in reports, to build interactive reports, to search for specific values, and so on.

Copyright notice
This article was created by [Python cola]. Please include a link to the original when reposting. Thank you.
https://pythonmana.com/2021/04/20210406165550195p.html
