Python data visualization: the complete practical guide

Python cola 2021-04-06 16:57:41

Hello everyone. Today let's look at the main Python libraries for data visualization and the types of charts we can make with them. We'll also see which library is recommended for each case and what makes each one unique.

We'll start with the most basic visualization, looking at the data directly. Then we'll move on to charts, and finally we'll make interactive charts.


Datasets

We will use two datasets for the visualizations in this article. They can be downloaded from the link below.

Data sets

These datasets contain internet search popularity data for three terms related to artificial intelligence (data science, machine learning and deep learning), extracted from a search engine.

The dataset consists of two files, temporal.csv and mapa.csv. In this tutorial we will mostly use the first one, which contains the popularity of the three terms over time (from 2004 to 2020). I also added a categorical variable (1 and 0) to demonstrate charts that use categorical variables.

mapa.csv contains popularity data by country/region. We'll use it for the maps at the end.

Pandas

Before introducing more complex methods, let's start with the most basic way of visualizing data. We will only use pandas to look at the data and get an idea of how it is distributed.

The first thing to do is look at a few rows. This shows which columns the data contains, what information they hold, and how the values are encoded.

import pandas as pd
df = pd.read_csv('temporal.csv')
df.head(10) #View first 10 data rows


With the describe command, we can see how the data is distributed: maximum, minimum, mean, and so on.

df.describe()
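The following minimal sketch shows what describe reports without needing the article's temporal.csv. It runs on a small made-up DataFrame whose column names mirror the article and whose values are invented:

```python
import pandas as pd

# Hypothetical stand-in for temporal.csv: three popularity scores per month
df_demo = pd.DataFrame({
    'data science': [10, 20, 30, 40],
    'machine learning': [5, 15, 25, 35],
    'deep learning': [1, 2, 3, 4],
})

# count, mean, std, min, quartiles and max for every numeric column
summary = df_demo.describe()
print(summary.loc[['min', 'mean', 'max']])
```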



With the info command, we can see the type of data each column contains. A column may look numeric when we inspect it with head, but later rows may contain strings. In that case pandas stores the whole column as strings.

df.info()
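This minimal sketch shows how such a column can be detected and converted. It assumes a hypothetical column in which one later value is a non-numeric string ('<1'). pd.to_numeric with errors='coerce' turns the values it cannot parse into NaN, which makes them easy to find:

```python
import pandas as pd

# Hypothetical column: numeric-looking strings plus one value that is not a number
df_demo = pd.DataFrame({'deep learning': ['3', '5', '8', '<1']})
print(df_demo.dtypes)  # not a numeric dtype: the column is stored as strings

# Convert to numbers; unparseable entries become NaN so they can be inspected
numeric = pd.to_numeric(df_demo['deep learning'], errors='coerce')
bad_rows = df_demo[numeric.isna()]
print(bad_rows)
```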


By default, pandas limits the number of rows and columns it displays. This can bother many programmers, because we usually want to see all the data.

With pandas' display options, we can raise these limits and see the whole dataset. Use this carefully with large datasets, otherwise they may fail to display.
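The exact commands are not shown here. A minimal sketch using pandas' standard option API (pd.set_option, pd.option_context, pd.reset_option) could look like the following. The limits 500 and 50 are arbitrary example values:

```python
import pandas as pd

# Remove the row/column display limits globally (careful with large DataFrames)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Alternatively, raise the limits only inside one block of code
with pd.option_context('display.max_rows', 500, 'display.max_columns', 50):
    inside = pd.get_option('display.max_rows')  # 500 only while inside the with-block

# Go back to pandas' default limits
pd.reset_option('display.max_rows')
pd.reset_option('display.max_columns')
```

option_context is handy in notebooks because the previous settings come back automatically when the block ends.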


With Pandas styling, we can get more information out of a table when we look at it. First, we define a format dictionary so that numbers are displayed clearly: a fixed number of decimals, dates and times in a given format, percentages, currency, and so on. Don't panic: this only changes how the data is displayed, not the data itself, so it won't cause problems later.

To give an example of each type, I added currency and percentage symbols, even though they make no sense for this data.

format_dict = {'data science':'${0:,.2f}', 'Mes':'{:%m-%Y}', 'machine learning':'{:.2%}'}
#We make sure that the Month column has datetime format
df['Mes'] = pd.to_datetime(df['Mes'])
#We apply the style to the visualization
df.head(10).style.format(format_dict)
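The format dictionary uses Python's standard format-spec mini-language. Styler.format applies each string to the cells of its column (as far as I know, by calling the string's format method on each value). This quick sketch uses plain Python values to show what each entry produces:

```python
from datetime import datetime

# Same format strings as the article's format_dict
format_dict = {'data science': '${0:,.2f}', 'Mes': '{:%m-%Y}', 'machine learning': '{:.2%}'}

currency = format_dict['data science'].format(1234.5)            # thousands separator, 2 decimals
percent = format_dict['machine learning'].format(0.456)           # fraction shown as a percentage
month = format_dict['Mes'].format(datetime(2004, 1, 1))           # month-year
print(currency, percent, month)  # $1,234.50 45.60% 01-2004
```

The percentage format multiplies by 100. That is why it makes little sense for this dataset, whose values are already on a 0-100 scale.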

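We can use color to highlight the maximum and minimum values.

format_dict = {'Mes':'{:%m-%Y}'} #Simplified format dictionary with values that do make sense for our data
df.head(10).style.format(format_dict).highlight_max(color='darkgreen').highlight_min(color='#ff0000')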


We use color gradients to display data values .

df.head(10).style.format(format_dict).background_gradient(subset=['data science', 'machine learning'], cmap='BuGn')


We can also use bars to display data values .

df.head().style.format(format_dict).bar(color='red', subset=['data science', 'deep learning'])


We can also combine the functions above to produce more complex visualizations.

df.head(10).style.format(format_dict).background_gradient(subset=['data science', 'machine learning'], cmap='BuGn').highlight_max(color='yellow')


Pandas Profiling

Pandas Profiling is a library that generates an interactive report from our data. The report shows the distribution of the data, the data types, and possible problems. It's very easy to use: with just three lines we can generate a report that can be sent to anyone, including people who can't program.

from pandas_profiling import ProfileReport
prof = ProfileReport(df)
prof.to_file(output_file='output.html') #Writes the interactive report to an HTML file


Matplotlib

Matplotlib is the most basic library for visualizing data as charts. It covers most of the charts we can think of. Being basic doesn't mean it isn't powerful, and many of the other libraries we'll discuss are built on it.

A Matplotlib chart has two main parts. The axes are the lines that delimit the chart area. The figure is the container in which we draw the axes, the titles, and anything outside the axes area. Now let's create the simplest chart:
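The pyplot calls in this article act on the "current" figure and axes implicitly. Matplotlib's object-oriented API makes the figure/axes split explicit. This minimal sketch uses hypothetical monthly values and the non-interactive Agg backend so it runs without a display. The output file name is an arbitrary example:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly data standing in for temporal.csv
months = pd.date_range('2004-01-01', periods=6, freq='MS')
values = [10, 12, 15, 14, 18, 21]

fig, ax = plt.subplots(figsize=(6, 3))          # one Figure containing one Axes
ax.plot(months, values, label='data science')   # the line is drawn on the Axes
ax.set_title('Popularity of AI terms by date')  # the title belongs to the Axes
ax.legend()
fig.savefig('simple_plot.png')                  # saving is a Figure-level operation
```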

import matplotlib.pyplot as plt
plt.plot(df['Mes'], df['data science'], label='data science') #The parameter label is to indicate the legend. This doesn't mean that it will be shown, we'll have to use another command that I'll explain later.


We can plot multiple variables in the same chart and compare them.

plt.plot(df['Mes'], df['data science'], label='data science')
plt.plot(df['Mes'], df['machine learning'], label='machine learning')
plt.plot(df['Mes'], df['deep learning'], label='deep learning')


It's not clear which variable each color represents. We'll improve the chart by adding a legend and a title.

plt.plot(df['Mes'], df['data science'], label='data science')
plt.plot(df['Mes'], df['machine learning'], label='machine learning')
plt.plot(df['Mes'], df['deep learning'], label='deep learning')
plt.title('Popularity of AI terms by date')
plt.legend() #Shows the legend built from each line's label


If you are running Python from a terminal or a script, call plt.show() after defining the chart with the functions above. If you are using Jupyter Notebook, add %matplotlib inline at the beginning of the notebook and run it before making the charts.

We can make several charts in one figure. This is very useful for comparing charts, or for sharing several chart types in a single image.

fig, axes = plt.subplots(2,2)
axes[0, 0].hist(df['data science'])
axes[0, 1].scatter(df['Mes'], df['data science'])
axes[1, 0].plot(df['Mes'], df['machine learning'])
axes[1, 1].plot(df['Mes'], df['deep learning'])


We can draw each variable with a different line or marker style:

plt.plot(df['Mes'], df['data science'], 'r-') #red solid line
plt.plot(df['Mes'], df['data science'] * 2, 'bs') #blue squares
plt.plot(df['Mes'], df['data science'] * 3, 'g^') #green triangles
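A format string such as 'g^' is shorthand for the color, marker and linestyle keyword arguments. This small sketch uses made-up points and the Agg backend, and shows that both spellings produce the same style:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: no window needed
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [1, 3, 2, 4]

fig, ax = plt.subplots()
# Format string: green color, triangle-up markers, no connecting line
short, = ax.plot(x, y, 'g^')
# The same style spelled out with keyword arguments
explicit, = ax.plot(x, y, color='g', marker='^', linestyle='None')
```

The keyword form is more verbose but easier to read, and it accepts colors that have no single-letter code.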


Now let's look at some examples of the different charts Matplotlib can make. Let's start with a scatter plot:

plt.scatter(df['data science'], df['machine learning'])


Bar chart example:

plt.bar(df['Mes'], df['machine learning'], width=20)


Histogram example :

plt.hist(df['deep learning'], bins=15)
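bins=15 splits the range of the data into 15 equal-width intervals. plt.hist also returns the counts and bin edges it computed. This short sketch checks them on random made-up scores, seeded so the run is reproducible:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: no window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = rng.integers(0, 101, size=200)  # hypothetical popularity scores from 0 to 100

# counts: number of values per bar; edges: the 16 boundaries of the 15 bars
counts, edges, patches = plt.hist(values, bins=15)
print(len(counts), len(edges))  # 15 16
```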


We can add text to the chart, giving its position in the same units as the plotted data. The text can even include special characters written in TeX syntax.

We can also add annotations with arrows that point to specific points on the chart.

plt.plot(df['Mes'], df['data science'], label='data science')
plt.plot(df['Mes'], df['machine learning'], label='machine learning')
plt.plot(df['Mes'], df['deep learning'], label='deep learning')
plt.title('Popularity of AI terms by date')
plt.text(x=pd.Timestamp('2010-01-01'), y=80, s=r'$\lambda=1, r^2=0.8$') #Coordinates use the same units as the graph (dates on the x axis)
plt.annotate('Notice something?', xy=(pd.Timestamp('2014-01-01'), 30), xytext=(pd.Timestamp('2006-01-01'), 50), arrowprops={'facecolor':'red', 'shrink':0.05})


Seaborn

Seaborn is a library built on Matplotlib. It gives us nicer-looking charts, and functions that make complex chart types with a single line of code.

We import the library and initialize the chart style with sns.set(). Without this command, charts keep the default Matplotlib style. Let's show one of the simplest charts, a scatter plot:

import seaborn as sns
sns.set()
sns.scatterplot(x='Mes', y='data science', data=df)


We can show more than two variables in the same chart by using color and size. We also draw a separate chart for each value of the categorical column:

sns.relplot(x='Mes', y='deep learning', hue='data science', size='machine learning', col='categorical', data=df)


One of the most popular Seaborn charts is the heat map. It is usually used to show the correlations between all the variables in a dataset:

sns.heatmap(df.corr(numeric_only=True), annot=True, fmt='.2f') #numeric_only skips the date column
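The heat map only draws a matrix that pandas computes with DataFrame.corr, which uses Pearson correlation by default. This sketch uses invented data to show the matrix itself. In pandas 2.x, numeric_only=True is needed so that a date column such as 'Mes' is skipped instead of raising an error:

```python
import pandas as pd

# Hypothetical data: 'machine learning' rises with 'data science', 'deep learning' falls
df_demo = pd.DataFrame({
    'Mes': pd.date_range('2004-01-01', periods=5, freq='MS'),
    'data science': [1, 2, 3, 4, 5],
    'machine learning': [2, 4, 6, 8, 10],
    'deep learning': [5, 4, 3, 2, 1],
})

corr = df_demo.corr(numeric_only=True)  # 3x3 matrix; the date column is excluded
print(corr.round(2))
```

Values near 1 or -1 appear as the darkest cells in the heat map, and the diagonal is always 1.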


Another very popular chart is the pair plot, which shows the relationships between all pairs of variables. Use it carefully with large datasets. Every data point is drawn once in each cell of a grid with one row and one column per variable, so processing time grows quickly as the number of columns increases.
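The cost grows with the square of the number of columns, because sns.pairplot lays out a k × k grid with one panel per pair of numeric columns. The helper below is hypothetical (it is not part of Seaborn) and only counts the panels:

```python
from itertools import product


def pairplot_cells(columns):
    """Hypothetical helper: the (row, column) variable pairs of a full pair-plot grid."""
    return list(product(columns, repeat=2))


cols = ['data science', 'machine learning', 'deep learning']
print(len(pairplot_cells(cols)))                    # 9 panels for 3 columns
print(len(pairplot_cells(cols + ['categorical'])))  # 16 panels for 4 columns
```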



Now let's make a pair plot broken down by the value of the categorical variable.

sns.pairplot(df, hue='categorical')


The joint plot is very useful. It shows the scatter plot of two variables together with the histogram of each, so we can see how they are distributed:

sns.jointplot(x='data science', y='machine learning', data=df)


Another interesting chart is the violin plot:

sns.catplot(x='categorical', y='data science', kind='violin', data=df)


As with Matplotlib, we can create several charts in a single figure:

fig, axes = plt.subplots(1, 2, sharey=True, figsize=(8, 4))
sns.scatterplot(x="Mes", y="deep learning", hue="categorical", data=df, ax=axes[0])
axes[0].set_title('Deep Learning')
sns.scatterplot(x="Mes", y="machine learning", hue="categorical", data=df, ax=axes[1])
axes[1].set_title('Machine Learning')


Bokeh

Bokeh is a library for generating interactive charts. We can export them to an HTML file and share them with anyone who has a web browser.

It's very useful when we want to explore a chart by zooming in and moving around it. It also helps when we want to share charts and let others explore the data.

Let's first import the library and define the file that will hold the chart:

from bokeh.plotting import figure, output_file, save
output_file('data_science.html') #File name chosen for this example

We draw what we need and save it to the file:

p = figure(title='data science', x_axis_label='Mes', y_axis_label='data science', x_axis_type='datetime')
p.line(df['Mes'], df['data science'], legend_label='popularity', line_width=2)
save(p)


Add multiple charts to a single file:

from bokeh.layouts import gridplot
output_file('multiple_plots.html') #File name chosen for this example
s1 = figure(width=250, height=250, x_axis_type='datetime', title='data science')
s1.scatter(df['Mes'], df['data science'], marker='circle', size=10, color='navy', alpha=0.5)
s2 = figure(width=250, height=250, x_axis_type='datetime', x_range=s1.x_range, y_range=s1.y_range, title='machine learning') #share both axis range
s2.scatter(df['Mes'], df['machine learning'], marker='triangle', size=10, color='red', alpha=0.5)
s3 = figure(width=250, height=250, x_axis_type='datetime', x_range=s1.x_range, title='deep learning') #share only one axis range
s3.scatter(df['Mes'], df['deep learning'], marker='square', size=5, color='green', alpha=0.5)
p = gridplot([[s1, s2, s3]])
save(p)


Altair

In my opinion, Altair doesn't add anything new compared with the libraries we've already discussed, so I won't cover it in depth. I mention it because its example gallery may contain specific charts that could help us.


Folium

Folium is a library for making maps, adding markers, and plotting data on them. It lets us choose the map provider, which determines the style and quality of the map. For simplicity, in this article we will only use OpenStreetMap as the map provider.

Working with maps is quite complex and deserves an article of its own. Here we only look at the basics and draw a few maps with the data we have.

Let's start with the basics: a simple map with nothing on it.

import folium
m1 = folium.Map(location=[41.38, 2.17], tiles='openstreetmap', zoom_start=18)
m1.save('map1.html')


This generates an interactive file for the map, in which you can move and zoom freely.

We can add markers to the map :

m2 = folium.Map(location=[41.38, 2.17], tiles='openstreetmap', zoom_start=16)
folium.Marker([41.38, 2.176], popup='<i>You can use whatever HTML code you want</i>', tooltip='click here').add_to(m2)
folium.Marker([41.38, 2.174], popup='<b>You can use whatever HTML code you want</b>', tooltip='dont click here').add_to(m2)
m2.save('map2.html')


You can open the interactive map file and click on the markers.

The dataset provided at the beginning contains country names and the popularity of the AI terms. A quick look shows that some countries are missing one of these values, so to keep things simple we will drop those countries. Then we will use Geopandas to convert the country/region names into coordinates that can be drawn on a map.

from geopandas.tools import geocode
df2 = pd.read_csv('mapa.csv')
df2.dropna(axis=0, inplace=True)
df2['geometry'] = geocode(df2['País'], provider='nominatim', user_agent='ai-terms-map')['geometry'] #May take a while: each country is looked up online. The user_agent value is an arbitrary example; Nominatim asks clients to identify themselves.
df2['Latitude'] = df2['geometry'].apply(lambda l: l.y)
df2['Longitude'] = df2['geometry'].apply(lambda l: l.x)
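dropna(axis=0) removes every row that has at least one missing value. That is how the countries with an incomplete record disappear. This small sketch runs on a hypothetical slice of mapa.csv: the values are invented, and the 'País' column name follows the article.

```python
import numpy as np
import pandas as pd

# Hypothetical slice of mapa.csv: France lacks a 'data science' value
df2_demo = pd.DataFrame({
    'País': ['Spain', 'France', 'Chile'],
    'data science': [40, np.nan, 75],
    'machine learning': [30, 20, 60],
})

clean = df2_demo.dropna(axis=0)  # axis=0: drop rows (not columns) containing any NaN
print(clean['País'].tolist())    # ['Spain', 'Chile']
```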


Now that we have encoded the data as latitude and longitude, let's show it on the map. We'll start with a bubble map that draws a circle for each country. The circle's size depends on the popularity of the term. Its color is red for a popularity of 50 or less and green above 50.

m3 = folium.Map(location=[39.326234,-4.838065], tiles='openstreetmap', zoom_start=3)
def color_producer(val):
    if val <= 50:
        return 'red'
    return 'green'
for i in range(0,len(df2)):
    folium.Circle(location=[df2.iloc[i]['Latitude'], df2.iloc[i]['Longitude']], radius=5000*df2.iloc[i]['data science'], color=color_producer(df2.iloc[i]['data science'])).add_to(m3)
m3.save('map3.html')
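The sizing and coloring rules can be checked without drawing a map. This minimal sketch uses two hypothetical rows with made-up coordinates and scores, and builds the arguments each folium.Circle would receive. folium.Circle takes its radius in metres, so each bubble's radius is proportional to popularity:

```python
import pandas as pd


def color_producer(val):
    """Same rule as the map above: red for popularity <= 50, green otherwise."""
    return 'red' if val <= 50 else 'green'


# Hypothetical rows in the shape of df2 after geocoding
df2_demo = pd.DataFrame({
    'País': ['Spain', 'Chile'],
    'Latitude': [40.4, -33.4],
    'Longitude': [-3.7, -70.6],
    'data science': [40, 75],
})

# Arguments each folium.Circle would get, computed without creating the map
circles = [
    {'location': [row['Latitude'], row['Longitude']],
     'radius': 5000 * row['data science'],  # metres
     'color': color_producer(row['data science'])}
    for _, row in df2_demo.iterrows()
]
print(circles)
```

Keeping this logic separate from the drawing code makes it easy to tune the scale factor or threshold before rendering the map.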


When to use which library?

With so many libraries, how do we choose? The short answer: the one that lets you create the chart you need most easily.

In the early stages of a project, use Pandas and Pandas Profiling for quick visualizations that help you understand the data. If you need to see more, simple Matplotlib charts such as scatter plots or histograms are enough.

In later stages of a project, we can browse the galleries of the main libraries (Matplotlib, Seaborn, Bokeh, Altair) for charts that we like and that suit the project. These charts can present information in a report, power interactive reports, help look up specific values, and so on.

This article was written by [Python cola]. Please include a link to the original when reposting. Thanks.
