Hello everyone! Today let's take a look at the main Python libraries for data visualization and the types of charts you can make with them. We'll also see which library is recommended for each case and what makes each one unique.

We'll start with the most basic kind of visualization, looking at the data directly, then move on to charts, and finally build interactive charts.

## Data sets

The datasets contain internet search popularity data for three terms related to artificial intelligence (data science, machine learning, and deep learning), extracted from a search engine.

The dataset consists of two files, temporal.csv and mapa.csv. In this tutorial we'll mostly use the first one, which contains the popularity of the three terms over time (from 2004 to 2020). In addition, I added a categorical variable (1 and 0) to demonstrate charts that use categorical variables.

mapa.csv contains popularity data by country/region. We'll use it for the maps in the final visualizations.

## Pandas

Before introducing more complex methods, let's start with the most basic way of visualizing data. We will only use pandas to look at the data and understand how it is distributed.

The first thing to do is look at a few rows to see which columns they contain, what information they hold, how the values are encoded, and so on.

```python
import pandas as pd

df = pd.read_csv('temporal.csv')
df.head(10)  # View the first 10 rows
```

With the describe command, we'll see how the data is distributed: maximum, minimum, mean, and so on.

```python
df.describe()
```

With the info command, we'll see the type of data each column contains. We may find a column that looks numeric when inspected with head, but if later rows contain values in string format, the whole column is stored as strings.

```python
df.info()
```
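
A minimal sketch of the problem described above, using a small made-up CSV (not the real temporal.csv): the first rows look numeric, but a single non-numeric entry further down is enough for pandas to store the whole column as strings. `pd.to_numeric(..., errors='coerce')` is one standard way to convert it back; entries it cannot parse become NaN.

```python
import io

import pandas as pd

# Made-up CSV: the first rows look numeric, but a later row holds "<1".
csv_text = "Mes,data science\n2004-01,12\n2004-02,15\n2004-03,<1\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))                   # looks numeric at a glance
raw_dtype = df['data science'].dtype
print(raw_dtype)                    # not a numeric dtype: values are strings

# Convert to numbers; anything that can't be parsed (here "<1") becomes NaN.
df['data science'] = pd.to_numeric(df['data science'], errors='coerce')
print(df.dtypes)
```

Whether to coerce such values to NaN or map them to a number (for example treating "<1" as 0) depends on what the values mean in your data.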

By default, pandas limits the number of rows and columns it displays. This bothers many programmers, because we often want to see all of the data.

With these commands we can raise those limits and display the whole dataset. Use this option carefully with large datasets, since printing them in full can be very slow or may fail to display.

```python
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
```
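
If you only need the full output occasionally, a scoped alternative is `pd.option_context`, which applies the same settings only inside a `with` block and restores the previous values on exit, so the large limits don't stay switched on for later, bigger DataFrames. A minimal sketch with a made-up DataFrame:

```python
import pandas as pd

# Made-up wide DataFrame: 3 rows x 40 columns.
wide = pd.DataFrame([[i * j for j in range(40)] for i in range(3)])

with pd.option_context('display.max_rows', 500,
                       'display.max_columns', 500,
                       'display.width', 1000):
    print(wide)  # all 40 columns are printed inside the block

# Outside the block, the previous display limits are back in effect.
print(pd.get_option('display.max_columns'))
```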

With pandas styling, we can get more out of a table when we look at it. First, we define a format dictionary so that numbers are displayed clearly (a given number of decimals, dates and hours in a particular format, percentages, currency, etc.). Don't worry: this only changes how the data is displayed, not the data itself, so it won't cause any problems later on.

To show an example of each type, I added currency and percentage symbols, even though they don't make any sense for this data.
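
The code for this step isn't shown here, so the following is only a minimal sketch of the idea. It assumes the column names of temporal.csv used elsewhere in this article and builds two rows of made-up values so it runs without the file. In a notebook you would pass the dictionary to `df.style.format(format_dict)` to get a styled HTML table; to keep the sketch runnable as a plain script (the Styler needs Jinja2 to render), the same format strings are applied here through `DataFrame.to_string(formatters=...)`.

```python
import pandas as pd

# Made-up stand-in for a few rows of temporal.csv.
df = pd.DataFrame({
    'Mes': pd.to_datetime(['2004-01-01', '2004-02-01']),
    'data science': [12.0, 15.5],
    'machine learning': [0.1234, 0.5],
    'deep learning': [3.0, 4.0],
})

# Column -> format string. The currency and percentage choices are purely
# illustrative, as in the text above.
format_dict = {
    'Mes': '{:%m-%Y}',
    'data science': '${:,.2f}',
    'machine learning': '{:.2%}',
}

# Notebook version: df.style.format(format_dict)
# Plain-text version: turn each format string into a one-argument callable.
text = df.to_string(formatters={col: fmt.format for col, fmt in format_dict.items()})
print(text)
```

Either way, `df` itself still holds plain floats and timestamps; only the displayed text changes.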

## Seaborn

Seaborn is a library built on top of Matplotlib. Basically, it gives us nicer-looking charts and functions that produce complex types of charts with just one line of code.

We import the library and initialize the chart style with sns.set(). Without this command, the charts would keep Matplotlib's default style. Let's start with one of the simplest charts, a scatter plot:

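```python
import seaborn as sns

sns.set()  # apply Seaborn's default theme
# Recent Seaborn versions require keyword arguments for x and y
sns.scatterplot(x='Mes', y='data science', data=df)
```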

We can show more than two variables in the same chart by using color and size. We also make a separate chart for each value of the categorical column:

```python
sns.relplot(x='Mes', y='deep learning', hue='data science', size='machine learning', col='categorical', data=df)
```

One of the most popular charts Seaborn offers is the heatmap. It is commonly used to show the correlations between all the variables in a dataset:

```python
# numeric_only=True skips non-numeric columns such as a date column
sns.heatmap(df.corr(numeric_only=True), annot=True, fmt='.2f')
```

Another very popular chart is the pair plot, which shows the relationship between every pair of variables. Use this function carefully with large datasets: it draws a grid with one panel for every pair of columns, so each data point is plotted many times, and processing time grows rapidly (roughly with the square of the number of columns) as the data gains dimensions.

```python
sns.pairplot(df)
```
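
To make that cost concrete, here is a back-of-the-envelope helper (hypothetical, not part of Seaborn). It assumes the default `pairplot` layout: a k x k grid for k numeric columns, with a univariate plot on each diagonal panel and a scatter plot of all n rows in each of the k(k - 1) off-diagonal panels.

```python
def pairplot_cost(n_rows: int, n_cols: int) -> tuple[int, int]:
    """Return (panels, scatter markers) for a default pairplot of n_rows x n_cols."""
    panels = n_cols * n_cols              # full k x k grid
    off_diagonal = n_cols * (n_cols - 1)  # panels that are scatter plots
    return panels, off_diagonal * n_rows  # every scatter panel draws every row


for k in (3, 6, 12):
    print(k, pairplot_cost(n_rows=10_000, n_cols=k))
# 3 (9, 60000)
# 6 (36, 300000)
# 12 (144, 1320000)
```

Doubling the number of columns roughly quadruples the work, which is why plotting a random sample of rows (`df.sample`) or choosing a few columns with the `vars` parameter of `sns.pairplot` is a common workaround for large datasets.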

Now let's make a pair plot broken down by the values of the categorical variable:

```python
sns.pairplot(df, hue='categorical')
```

The joint plot is very useful: it shows the scatter plot of two variables together with the histogram of each, so we can see how they are distributed:

```python
sns.jointplot(x='data science', y='machine learning', data=df)
```

Another interesting chart is the violin plot:

```python
sns.catplot(x='categorical', y='data science', kind='violin', data=df)
```

Just as with Matplotlib, we can create multiple charts in a single figure:

```python
import matplotlib.pyplot as plt

# One row, two side-by-side plots sharing the y-axis
fig, axes = plt.subplots(1, 2, sharey=True, figsize=(8, 4))
sns.scatterplot(x='Mes', y='deep learning', hue='categorical', data=df, ax=axes[0])
axes[0].set_title('Deep Learning')
sns.scatterplot(x='Mes', y='machine learning', hue='categorical', data=df, ax=axes[1])
axes[1].set_title('Machine Learning')
```

## Bokeh

Bokeh is a library for generating interactive charts. We can export them to HTML files and share them with anyone who has a web browser.

It's a very useful library when we want to explore a chart, zooming in and moving around it to find things. It's also useful when we want to share charts so that others can explore the data themselves.

Let's first import the library and define the file that will hold the plot:

```python
from bokeh.plotting import figure, output_file, save

output_file('data_science_popularity.html')
```

We draw what we need and save it to the file:

```python
p = figure(title='data science', x_axis_label='Mes', y_axis_label='data science')
# legend_label replaces the older legend= argument in current Bokeh versions
p.line(df['Mes'], df['data science'], legend_label='popularity', line_width=2)
save(p)
```

We can also put multiple charts in a single file:

```python
from bokeh.layouts import gridplot

output_file('multiple_graphs.html')

s1 = figure(width=250, height=250, title='data science')
s1.scatter(df['Mes'], df['data science'], marker='circle', size=10, color='navy', alpha=0.5)

# Share both axis ranges with s1
s2 = figure(width=250, height=250, x_range=s1.x_range, y_range=s1.y_range, title='machine learning')
s2.scatter(df['Mes'], df['machine learning'], marker='triangle', size=10, color='red', alpha=0.5)

# Share only the x-axis range with s1
s3 = figure(width=250, height=250, x_range=s1.x_range, title='deep learning')
s3.scatter(df['Mes'], df['deep learning'], marker='square', size=5, color='green', alpha=0.5)

p = gridplot([[s1, s2, s3]])
save(p)
```

## Altair

In my opinion, Altair doesn't bring anything new compared with the libraries we've already discussed, so I won't go into it in depth. I still want to mention it, because its example gallery may contain specific charts that can help us.

## Folium

Folium is a library that lets us create maps, add markers, and plot data on them. Folium lets us choose the map provider, which determines the style and quality of the map. In this article, for simplicity, we'll only use OpenStreetMap as the map provider.

Working with maps is a complex topic that deserves a more in-depth look. Here we'll just cover the basics and draw a few maps with the data we have.

Let's start with the basics: a simple map with nothing on it.

```python
import folium

m1 = folium.Map(location=[41.38, 2.17], tiles='openstreetmap', zoom_start=18)
m1.save('map1.html')
```

This generates an interactive HTML file for the map, in which you can move and zoom freely.

We can add markers to the map:

```python
m2 = folium.Map(location=[41.38, 2.17], tiles='openstreetmap', zoom_start=16)
# Popups accept arbitrary HTML; tooltips appear on hover
folium.Marker([41.38, 2.176], popup='<i>You can use whatever HTML code you want</i>', tooltip='click here').add_to(m2)
folium.Marker([41.38, 2.174], popup='<b>You can use whatever HTML code you want</b>', tooltip='dont click here').add_to(m2)
m2.save('map2.html')
```

In the resulting interactive map file, you can click the markers.

In the dataset provided at the beginning, we have country names and the popularity of the AI terms. After a quick look, you'll notice that some countries are missing one of these values. To keep things simple, we'll drop those countries. Then we'll use GeoPandas to convert the country/region names into coordinates that can be drawn on a map.

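```python
from geopandas.tools import geocode

df2 = pd.read_csv('mapa.csv')
df2.dropna(axis=0, inplace=True)  # drop countries with any missing value

# Geocode country names with Nominatim (via geopy). This can take a while,
# since it sends one request per country name. Recent geopy versions require a
# custom user_agent for Nominatim; 'ai-popularity-tutorial' is just a placeholder.
df2['geometry'] = geocode(df2['País'], provider='nominatim',
                          user_agent='ai-popularity-tutorial')['geometry']
df2['Latitude'] = df2['geometry'].apply(lambda point: point.y)
df2['Longitude'] = df2['geometry'].apply(lambda point: point.x)
```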

Now that we've encoded the data as latitude and longitude, let's show it on the map. We'll start with a bubble map, drawing a circle for each country. The size of each circle depends on the popularity of the term, and its color is red or green depending on whether the popularity is above a threshold (50).

```python
m3 = folium.Map(location=[39.326234, -4.838065], tiles='openstreetmap', zoom_start=3)

def color_producer(val):
    """Red for popularity up to 50, green above it."""
    if val <= 50:
        return 'red'
    else:
        return 'green'

# Column names must match those created in the geocoding step above
for _, row in df2.iterrows():
    folium.Circle(
        location=[row['Latitude'], row['Longitude']],
        radius=5000 * row['data science'],  # radius in meters
        color=color_producer(row['data science']),
    ).add_to(m3)

m3.save('map3.html')
```

## Which library should you use, and when?

With so many libraries, how do you choose? The quick answer: use the library that lets you create the chart you need most easily.

In the early stages of a project, we use pandas and its analysis functions to make quick visualizations and understand the data. If we need to visualize more information, we can use simple charts such as scatter plots or histograms, which are available in Matplotlib.
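
For that quick first look, pandas' own plotting methods (a thin wrapper around Matplotlib) are often enough. A minimal sketch, assuming the column names of temporal.csv used earlier and using made-up values so it runs without the file; the non-interactive `Agg` backend is selected only so the script also works without a display.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for temporal.csv.
df = pd.DataFrame({
    'Mes': pd.date_range('2004-01-01', periods=6, freq='MS'),
    'data science': [10, 12, 15, 14, 18, 21],
    'machine learning': [5, 7, 6, 9, 11, 13],
})

# Line chart of both terms over time.
ax_line = df.plot(x='Mes', y=['data science', 'machine learning'])

# Scatter plot of one term against the other.
ax_scatter = df.plot.scatter(x='data science', y='machine learning')

# One histogram per selected column (returns an array of Axes).
hist_axes = df.hist(column=['data science', 'machine learning'])

plt.close('all')  # free the figures once we're done with them
```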

In the more advanced stages of a project, we can search the galleries of the main libraries (Matplotlib, Seaborn, Bokeh, Altair) for charts we like that suit the project. These charts can be used to present information in reports, build interactive reports, look up specific values, and so on.

