Compiled by: Almost Human. Author: Aaron Frederick
Python users inevitably run into this question when making a chart: which visualization package is both good-looking and practical? Whenever attractive charts appeared in previous articles, some readers left messages in the background asking which tools were used to make them. Below, the author introduces eight visualization packages implemented in Python, some of which can also be used from other languages. Give them a try and see which one you like.
There are many ways to create graphics in Python, but which is best? Before visualizing anything, first clarify what the figure is for: do you want a first look at the distribution of the data? Do you want to impress people when you present? Or do you just need a middle-of-the-road figure to show someone internally?
This article introduces some commonly used Python visualization packages, including their strengths and weaknesses and the scenarios each one suits. It covers only 2D charts, leaving room for a later article on 3D charts and dashboards, although many of the packages covered here do support 3D charts and dashboards.
Matplotlib, Seaborn and Pandas
There are several reasons to group these three packages together. First, Seaborn and Pandas are built on top of Matplotlib: when you use Seaborn, or df.plot() in Pandas, you are actually running code that others wrote with Matplotlib. As a result, the charts look similar in styling, and the syntax for customizing them is very similar.
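A minimal sketch of that relationship, assuming a recent pandas with Matplotlib installed and using made-up data: DataFrame.plot() returns a Matplotlib Axes, so ordinary Matplotlib calls customize the result. Seaborn's axes-level functions, such as sns.scatterplot(), also return a Matplotlib Axes, which is why the same approach applies there.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, not from the article
df = pd.DataFrame({"season": [2015, 2016, 2017, 2018],
                   "salary": [4.1, 4.6, 5.2, 5.9]})

# Pandas plotting delegates to Matplotlib and hands back a Matplotlib Axes...
ax = df.plot(x="season", y="salary", kind="line")

# ...so the usual Matplotlib methods customize a Pandas chart
ax.set_title("Median salary by season")
ax.set_ylabel("Salary ($M)")
```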
When it comes to these visualization tools, three words come to mind: Exploratory Data Analysis. These packages are great for a first look at the data, but they fall short when it comes to presentations.
Matplotlib is a lower-level library, but it supports an incredible degree of customization (so don't simply rule it out for presentations!). Still, other tools are better suited to presenting.
Matplotlib also offers style selection, which emulates popular looks such as ggplot2 and xkcd. Here are some example charts I made with Matplotlib and related tools:
While working with NBA salary data, I wanted to find the teams with the highest median salary. To show the results, I color-coded each team's salary in a bar chart, to show which team a player should join to be paid best.
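A rough sketch of that kind of chart, assuming a player-level DataFrame with hypothetical team and salary columns (the names and numbers below are invented, not the author's data): compute each team's median salary, sort it, and give every bar its own color from a qualitative colormap.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical player-level salary data; teams and values are made up
salaries = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B", "C", "C"],
    "salary": [1.0, 9.0, 3.0, 4.0, 20.0, 6.0, 8.0],
})

# Median salary per team, highest first
medians = salaries.groupby("team")["salary"].median().sort_values(ascending=False)

# One distinct color per team, taken from the tab10 colormap by index
colors = plt.cm.tab10(list(range(len(medians))))

fig, ax = plt.subplots()
ax.bar(medians.index, medians.values, color=colors)
ax.set_xlabel("Team")
ax.set_ylabel("Median salary ($M)")
ax.set_title("Median salary by team")
```

Using the median rather than the mean keeps a single large contract (the 20.0 on team B) from dominating the ranking.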
The second chart is a Q-Q plot of the residuals from a regression experiment. Its main purpose is to show how to make a useful chart in as few lines as possible, even if it is not especially pretty.
import matplotlib.pyplot as plt
import scipy.stats as stats
# model2 is a fitted regression model; X_test and y_test are held-out data defined elsewhere
log_resid = model2.predict(X_test)-y_test
stats.probplot(log_resid, dist="norm", plot=plt)
plt.title("Normal Q-Q plot")
plt.show()
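Because model2, X_test and y_test are not defined in the snippet above, here is a self-contained variant that swaps in synthetic, normally distributed residuals (an assumption standing in for model2.predict(X_test) - y_test). scipy.stats.probplot draws the sample points and a least-squares reference line onto whatever Axes it is given, and also returns the quantiles and the fit.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats

# Synthetic stand-in residuals (not the article's model output)
rng = np.random.default_rng(0)
residuals = rng.normal(loc=0.0, scale=1.0, size=200)

fig, ax = plt.subplots()
# Plots ordered residuals against theoretical normal quantiles plus a fitted line;
# returns ((theoretical quantiles, ordered values), (slope, intercept, r))
(osm, osr), (slope, intercept, r) = stats.probplot(residuals, dist="norm", plot=ax)
ax.set_title("Normal Q-Q plot")
```

Points that hug the red line (r close to 1) suggest roughly normal residuals; curved tails would suggest otherwise.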
In the end, Matplotlib and its related tools prove very efficient, but they are not the best tools for presentations.
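The style selection mentioned earlier in this section can be tried with a sketch like the one below, assuming a reasonably recent Matplotlib: "ggplot" is one of Matplotlib's built-in style sheets, and plt.xkcd() switches on the hand-drawn xkcd look. Both are used as context managers, so each style applies only inside its with-block.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

# Built-in "ggplot" style sheet (grey panel background, white grid)
with plt.style.context("ggplot"):
    fig_gg, ax_gg = plt.subplots()
    ax_gg.plot(x, y)
    ax_gg.set_title("ggplot style")
    gg_facecolor = ax_gg.get_facecolor()

# xkcd sketch style; missing comic fonts only trigger warnings when drawing
with plt.xkcd():
    fig_xk, ax_xk = plt.subplots()
    ax_xk.plot(x, y)
    ax_xk.set_title("xkcd style")
```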
ggplot(2)
You may ask: "Aaron, ggplot is the most commonly used visualization package in R, but aren't you writing about Python packages?" People have implemented ggplot2 in Python, replicating everything from its styling to its syntax.
From everything I have seen, it looks and feels just like ggplot2, with the added benefit that it depends on the Pandas Python package. However, Pandas recently deprecated some methods, which makes this Python version incompatible with newer Pandas releases.
If you want to use the real ggplot from R in Python (it looks, feels and has the same syntax, apart from the dependencies), I discuss that in another article.
In other words, if you must use ggplot in Python, you have to install Pandas version 0.19.2, but I advise against downgrading Pandas just to use a weaker plotting package.
ggplot2 (and, I think, Python's ggplot) matters because it builds images with a "grammar of graphics". The basic premise is that you instantiate a figure and then add different features to it; that is, you can style the title, axes, data points and trend lines separately.
Here is a simple example of ggplot code, written in R's ggplot2 syntax. We first instantiate the figure with ggplot, setting the aesthetic mappings and the data, and then add points, a theme, and axis and title labels.
#All Salaries
ggplot(data=df, aes(x=season_start, y=salary, colour=team)) +
geom_point() +
theme(legend.position="none") +
labs(title = 'Salary Over Time', x='Year', y='Salary ($)')
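The R code above builds the chart layer by layer. The toy below mimics that idea in plain Python with Matplotlib; the Plot, Points and Labels classes are hypothetical, written only for illustration, and are not the API of the Python ggplot package or of ggplot2. Each + returns a new plot with one more layer, and the title and labels are styled independently of the points.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd


class Plot:
    """Hypothetical toy: data, an aesthetic mapping, and a list of layers."""

    def __init__(self, data, **aes):
        self.data = data
        self.aes = aes
        self.layers = []

    def __add__(self, layer):
        # Adding a layer returns a new Plot and leaves the original untouched
        new = Plot(self.data, **self.aes)
        new.layers = self.layers + [layer]
        return new

    def draw(self):
        fig, ax = plt.subplots()
        for layer in self.layers:
            layer.apply(ax, self.data, self.aes)
        return ax


class Points:
    """Geometry layer: scatter the columns mapped to x and y."""

    def apply(self, ax, data, aes):
        ax.scatter(data[aes["x"]], data[aes["y"]])


class Labels:
    """Annotation layer: title and axis labels, independent of the geometry."""

    def __init__(self, title="", x="", y=""):
        self.title, self.x, self.y = title, x, y

    def apply(self, ax, data, aes):
        ax.set_title(self.title)
        ax.set_xlabel(self.x)
        ax.set_ylabel(self.y)


# Made-up data using the column names from the R example
df = pd.DataFrame({"season_start": [2015, 2016, 2017], "salary": [4.0, 4.5, 5.1]})
base = Plot(df, x="season_start", y="salary")
p = base + Points() + Labels("Salary Over Time", "Year", "Salary ($)")
ax = p.draw()
```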
Bokeh
Bokeh is beautiful. Conceptually, Bokeh is similar to ggplot in that both build images with a grammar of graphics, but Bokeh has a user-friendly interface for producing professional graphics and dashboards. To illustrate, here is code for a bar chart based on the 538 Masculinity Survey dataset:
import pandas as pd
from bokeh.plotting import figure
from bokeh.io import show
# is_masc is a one-hot encoded dataframe of responses to the question:
# "Do you identify as masculine?"
#Dataframe Prep
counts = is_masc.sum()
resps = is_masc.columns
#Bokeh
p2 = figure(title='Do You View Yourself As Masculine?',
x_axis_label='Response',
y_axis_label='Count',
x_range=list(resps))
p2.vbar(x=resps, top=counts, width=0.6, fill_color='red', line_color='black')
show(p2)
#Pandas
counts.plot(kind='bar')
Survey results plotted with Bokeh
The red bar chart shows the responses to the 538 survey question "Do you view yourself as masculine?". The Bokeh part of the code, from the figure() call through vbar(), builds an elegant, professional bar chart of response counts: the font sizes, y-axis ticks and formatting are all sensible.
Most of the code I wrote labels the axes and title and adds color and borders to the bars. When making attractive, expressive figures, I prefer Bokeh, because it does much of the styling for us.
The same data plotted with Pandas
The blue chart comes from the single Pandas line above, counts.plot(kind='bar'). Both bar charts show the same values but serve different purposes. In an exploratory setting, one line of Pandas is a convenient way to look at the data, but Bokeh's styling is far more powerful.
Every convenience Bokeh provides would have to be customized by hand in matplotlib, including the angle of the x-axis labels, background grid lines, y-axis ticks and fonts (size, italics, bold). The figure below shows some random trends with a higher degree of customization, using a legend and different colors and line styles.
Bokeh is also a great tool for building interactive dashboards.
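To get a feel for the hand customization Bokeh spares you, here is a Matplotlib sketch that sets, one by one, several of the items listed above: title font size and weight, italic axis labels, rotated x-axis labels and background grid lines. The categories and counts are placeholders, not the survey data.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Placeholder categories and counts
categories = ["A", "B", "C", "D"]
counts = [5, 9, 3, 7]

fig, ax = plt.subplots()
ax.bar(categories, counts, color="red", edgecolor="black", width=0.6)

# Each of these has to be set explicitly in Matplotlib
ax.set_title("Example counts", fontsize=14, fontweight="bold")  # title size and weight
ax.set_xlabel("Response", fontstyle="italic")                  # italic axis labels
ax.set_ylabel("Count", fontstyle="italic")
ax.tick_params(axis="x", labelrotation=45)                     # angle of the x-axis labels
ax.yaxis.grid(True, linestyle="--", alpha=0.5)                 # background grid lines
ax.set_axisbelow(True)                                         # draw the grid behind the bars
fig.tight_layout()
```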
Plotly
Plotly is very powerful, but setting it up and creating graphics with it takes a lot of time, and it is not intuitive. After working with Plotly for most of a morning, I had accomplished almost nothing and went straight to lunch. All I had produced was a bar chart without axis labels and a "scatter plot" whose lines I could not remove. Some things to watch out for when starting with Plotly:
Installation requires an API key and registering an account, not just a pip install;
The data and layout objects Plotly plots with are unique to it, and not intuitive;
The figure layout did not work for me (40 lines of code that did nothing!)
But it also has advantages, and every drawback in the setup has a solution:
You can edit figures both on the Plotly website and in a Python environment;
It supports interactive graphics and dashboards;
Plotly partners with Mapbox, so you can create custom maps;
Its graphics have great potential overall.
Here is the code I wrote with this package:
import plotly.graph_objs as go
import plotly.plotly as py  # online plotting API of plotly 3.x (moved to chart_studio.plotly in plotly 4)

#plot 1 - barplot
# **note** - in the original run the layout lines did nothing and tripped no errors:
# passing layout= to py.iplot() next to a bare data list appears to be ignored,
# so data and layout are bundled into a go.Figure below
data = [go.Bar(x=team_ave_df.team,
               y=team_ave_df.turnovers_per_mp)]
layout = go.Layout(
    title=go.layout.Title(
        text='Turnovers per Minute by Team',
        xref='paper',
        x=0
    ),
    xaxis=go.layout.XAxis(
        title=go.layout.xaxis.Title(
            text='Team',
            font=dict(
                family='Courier New, monospace',
                size=18,
                color='#7f7f7f'
            )
        )
    ),
    yaxis=go.layout.YAxis(
        title=go.layout.yaxis.Title(
            text='Average Turnovers/Minute',
            font=dict(
                family='Courier New, monospace',
                size=18,
                color='#7f7f7f'
            )
        )
    ),
    autosize=True,
    hovermode='closest')
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='jupyter-plot', sharing='public', fileopt='overwrite')

#plot 2 - attempt at a scatterplot
data = [go.Scatter(x=player_year.minutes_played,
                   y=player_year.salary,
                   mode='markers',  # markers only; with many points Plotly otherwise draws lines
                   marker=go.scatter.Marker(color='red',
                                            size=3))]
layout = go.Layout(title="test",
                   xaxis=dict(title='why'),
                   yaxis=dict(title='plotly'))
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='jupyter-plot2', sharing='public')
Bar chart of the average turnovers per minute for each NBA team.
Scatter plot of salary versus minutes played in the NBA.
Overall, the out-of-the-box styling looks great, but many times I failed to change the axis labels even after copying the documentation word for word. Still, the figure below shows Plotly's potential, and why I spent hours on it:
Some examples from the Plotly page
Pygal
Pygal is less well known. Like other common plotting packages, it builds images with a grammar-of-graphics style framework. Because its plotting goals are fairly modest, it is a relatively simple package. Using Pygal is straightforward:
Instantiate the chart;
Format it using the chart's attributes;
Add data to the chart with figure.add().
The main problem I ran into with Pygal was rendering. I had to use the render_to_file option and then open the file in a web browser to see what I had just built.
In the end it was worth it, because the charts are interactive and the styling is pleasing and easy to customize. In short, the package looks good, but creating and rendering the files is cumbersome.
Networkx
Although Networkx is based on matplotlib, it is still an excellent solution for graph analysis and visualization. Graphs and networks are not my area of expertise, but Networkx makes it quick and easy to show the connections in a network graphically. Below are different representations I built of a simple graph, plus some code for drawing a small Facebook network from Stanford SNAP data.
I color-coded each node by its number (1 to 10) with the following code:
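import matplotlib.pyplot as plt
import networkx as nx
# G is the simple graph built earlier; node_color gives each node its index,
# which the coolwarm colormap turns into a color
options = {
    'node_color' : range(len(G)),
    'node_size' : 300,
    'width' : 1,
    'with_labels' : False,
    'cmap' : plt.cm.coolwarm
}
nx.draw(G, **options)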
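The snippet above relies on a graph G built elsewhere. A self-contained version, assuming a hypothetical 10-node cycle graph in place of the author's graph, looks like this: each node's index becomes its color value, and the coolwarm colormap maps those values to colors.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import networkx as nx

# Hypothetical 10-node graph standing in for the article's simple graph
G = nx.cycle_graph(10)

options = {
    "node_color": list(range(len(G))),  # one numeric value per node...
    "cmap": plt.cm.coolwarm,            # ...mapped to a color by the colormap
    "node_size": 300,
    "width": 1,                         # edge line width
    "with_labels": False,
}
fig, ax = plt.subplots()
nx.draw(G, ax=ax, **options)
```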
The code used to visualize the sparse Facebook graph mentioned above is as follows:
import itertools
import networkx as nx
import matplotlib.pyplot as plt
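# Each line of a SNAP .circles file holds a circle name followed by its member node IDs
with open('data/facebook/1684.circles', 'r') as f:
    circles = [line.split() for line in f]
# Drop the circle name and keep the integer node IDs of each circle
network = []
for circ in circles:
    cleaned = [int(val) for val in circ[1:]]
    network.append(cleaned)
G = nx.Graph()
for v in network:
    G.add_nodes_from(v)
# Connect every pair of nodes that share a circle
edges = [itertools.combinations(net, 2) for net in network]
for edge_group in edges:
    G.add_edges_from(edge_group)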
options = {
'node_color' : 'lime',
'node_size' : 3,
'width' : 1,
'with_labels' : False,
}
nx.draw(G, **options)
This graph is very sparse, and Networkx shows the sparseness by maximizing the spacing between clusters.
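To experiment without downloading the SNAP files, the same circle-to-graph logic can run on a tiny inline sample in the .circles layout (the circle names and node IDs below are made up). Every pair of nodes in a circle is connected, so each circle becomes a clique; nx.density() summarizes sparseness in one number (0 means no edges, 1 means every pair of nodes is connected).

```python
import itertools

import networkx as nx

# Hypothetical sample in the SNAP .circles layout: circle name, then member node IDs
sample = """circle0 1 2 3
circle1 4 5
circle2 6 7 8 9"""


def build_graph(lines):
    """Connect every pair of nodes that share a circle."""
    G = nx.Graph()
    for line in lines:
        members = [int(v) for v in line.split()[1:]]  # drop the circle name
        G.add_nodes_from(members)
        G.add_edges_from(itertools.combinations(members, 2))
    return G


G = build_graph(sample.splitlines())
density = nx.density(G)  # existing edges / possible edges
clusters = nx.number_connected_components(G)
```

In this toy sample the circles do not overlap, so each one is its own connected cluster.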
There are many data visualization packages, and no single one is the best. I hope this article has shown how different styling tools and code suit different situations.
Original article: https://towardsdatascience.com/reviewing-python-visualization-packages-fa7fe12e622b