Introduction to pandas

Pan Chuang AI 2020-11-17 14:38:43

Author: Billy Fetzner | Compiled by: VK | Source: Towards Data Science

In my opinion, if you have clicked on this page, you probably have a lot of data to analyze and are looking for the best and most efficient way to solve some of your data problems. The answer to your question may well be Pandas.

Getting started with Pandas

Because Pandas is so popular, it has its own conventional abbreviation, so whenever you import Pandas into Python, use the following alias:

import pandas as pd

The main use of the Pandas package is the data frame

The Pandas API defines a data frame as:

Two-dimensional, size-mutable, potentially heterogeneous tabular data. The data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects. It is the primary Pandas data structure.

Basically, this means that your data is held in a tabular format, arranged in rows and columns.

Data frames are very useful because they provide an efficient way to view your data and then manipulate it however you want.

Rows can be easily referenced by the index, which is the leftmost set of numbers in the data frame. The index is zero-based unless you specify a name for each row. Columns can likewise be referenced by column name (for example, "Track Name") or by their position in the data frame. Referencing rows and columns is covered in more detail later in this article.

Creating data frames

There are several ways to create a Pandas data frame:

  1. Importing data from a .csv file (or another file type, such as Excel or a SQL database)

  2. From a list

  3. From a dictionary

  4. From a numpy array

  5. Others

Usually, you will load data into a Pandas data frame from a .csv file or some other data source rather than building it from scratch, because typing it in by hand would take a very long time, depending on how much data you have. Here is a quick, simple example from a Python dictionary:

import pandas as pd
dict1 = {'Exercises': ['Running','Walking','Cycling'],
'Mileage': [250, 1000, 550]}
df = pd.DataFrame(dict1)

In the resulting data frame, the dictionary keys ("Exercises" and "Mileage") become the corresponding column headings. The dictionary values, which are lists in this example, become the individual data points. "Running", the first item in the "Exercises" list, lands in the first row of the first column, and 250 lands in the first row of the second column. You will also notice that, because I did not specify labels for the index of the data frame, the rows are automatically labeled 0, 1, and 2.
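As a minimal sketch of the other creation routes listed earlier, the same table can also be built from a list of rows or from a numpy array. The variable names below are only illustrative, and the last data frame shows what happens when index labels are supplied explicitly.

```python
import numpy as np
import pandas as pd

# Same data as the dictionary example, built from a list of rows.
rows = [["Running", 250], ["Walking", 1000], ["Cycling", 550]]
df_from_list = pd.DataFrame(rows, columns=["Exercises", "Mileage"])

# From a numpy array; column names have to be supplied separately.
mileage = np.array([[250], [1000], [550]])
df_from_array = pd.DataFrame(mileage, columns=["Mileage"])

# Passing index= replaces the default 0, 1, 2 row labels.
df_labeled = pd.DataFrame(
    {"Mileage": [250, 1000, 550]},
    index=["Running", "Walking", "Cycling"],
)

print(df_from_list)
```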

However, as I said before, the most likely way to create a Pandas data frame is from a csv or other type of file that you import in order to analyze the data. This is easily done with the following:

df = pd.read_csv("file_location.../file_name.csv")

pd.read_csv is a very powerful and versatile function that is useful in many ways, depending on how you want to import your data. If the csv file already has a header or an index attached to it, you can specify that during the import. To fully understand pd.read_csv, I suggest you look at the Pandas API documentation.
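To illustrate the header and index options, here is a small sketch that reads CSV text from an in-memory buffer instead of a file on disk. The CSV contents and the "id" column are invented for the example.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; the contents are invented for illustration.
csv_text = "id,Exercises,Mileage\na,Running,250\nb,Walking,1000\n"

# header=0 takes the column names from the first line;
# index_col="id" uses that column as the row labels.
df = pd.read_csv(io.StringIO(csv_text), header=0, index_col="id")
print(df)
```

With a real file, the path string would simply replace the `io.StringIO(...)` argument.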

The first things to do

Now that you have this huge data set loaded, you have to look at it and see what it looks like. As the person analyzing the data, you first need to become familiar with the data set and really understand what is going on in it. I like to get to know my data with four methods.

  1. .head() & .tail()
  2. .info()
  3. .describe()
  4. .sample()

.head() shows the first 5 rows of the data frame with all of their columns, so that you can quickly get a sense of what the data looks like. You can also pass a number to the method to show a different number of rows.

.tail() shows only the last 5 rows.

From these two quick methods, I get a general idea of the column names and what the data looks like, from just a small sample of the data set. These methods are especially useful for a data set like the Spotify one, which has more than 3 million rows: you can display part of the data set and get a feel for it quickly, without your computer taking a long time to render everything.
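A short sketch of both methods follows. The data frame here is a hypothetical ten-row stand-in for the Spotify data, with invented values.

```python
import pandas as pd

# Hypothetical stand-in for the Spotify data; values are invented.
raw_song = pd.DataFrame({
    "Position": range(1, 11),
    "Track Name": [f"Song {i}" for i in range(1, 11)],
    "Streams": [1000 - 50 * i for i in range(10)],
})

first_rows = raw_song.head()     # first 5 rows by default
first_three = raw_song.head(3)   # first 3 rows
last_rows = raw_song.tail()      # last 5 rows by default
print(first_three)
```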

.info() is also very useful: it shows me all the columns, their data types, and whether they contain any null data points. Passing null_counts=True asks for the non-null counts to be displayed explicitly.
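Here is a minimal sketch on an invented three-row data frame with one missing value. Recent pandas releases spell the keyword `show_counts` rather than `null_counts`, so the call below relies on the defaults, which already show the counts for a small data frame.

```python
import pandas as pd

# Invented sample with one missing track name.
raw_song = pd.DataFrame({
    "Position": [1, 2, 3],
    "Track Name": ["Song A", None, "Song C"],
    "Streams": [900.0, 850.0, 800.0],
})

# Prints column names, non-null counts, and dtypes to stdout.
raw_song.info()

# The same null information, computed directly per column.
null_counts = raw_song.isna().sum()
print(null_counts)
```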

If you have columns made up entirely of integers or floats (namely 'Position' and 'Streams'), then .describe() is a useful method that helps you understand the data set better, because it displays a range of descriptive statistics about those columns.
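The sketch below, on invented data, shows that only the numeric columns are summarized by default, and that the result is itself a data frame that can be indexed.

```python
import pandas as pd

# Invented sample: two numeric columns and one text column.
raw_song = pd.DataFrame({
    "Position": [1, 2, 3, 4],
    "Streams": [400, 300, 200, 100],
    "Region": ["ec", "ec", "us", "us"],
})

# Only numeric columns are summarized by default.
stats = raw_song.describe()
print(stats)

# Individual statistics can be read back by row and column label.
mean_streams = stats.loc["mean", "Streams"]
```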


Finally, .sample() lets you randomly sample the data frame and check whether any of your operations have incorrectly changed something in the data set. It also gives you a good idea of what the data set contains when you first explore it.
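A small sketch of random sampling on invented data follows. The `random_state` argument is optional and is used here only so that the draw is repeatable.

```python
import pandas as pd

# Invented ten-row sample.
raw_song = pd.DataFrame({
    "Position": range(1, 11),
    "Streams": range(1000, 0, -100),
})

# n sets how many rows to draw; random_state makes the draw repeatable.
subset = raw_song.sample(n=3, random_state=0)
print(subset)
```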


While exploring and preparing a data set for analysis, I use these methods all the time. Whenever I change the data in a column, rename a column, or add or delete rows or columns, I quickly run at least one of the five methods above to make sure the changes were made the way I intended.

Selecting rows or columns

Great, now you know how to look at a data set as a whole, but often you only want to look at a few columns or rows and leave out the rest.

.loc[] and .iloc[]

These two methods do this in different ways, depending on how you are able to refer to a specific row or column.

If you know the label of a row or column, use .loc[].

If you know the integer position (index) of a row or column, use .iloc[].

If you know both, just pick your favorite.

So, back to the Spotify data set. You can view the "Track Name" column with either .loc[] or .iloc[]. With .loc[] you use the column's label, so I would write the following:

raw_song.loc[:,'Track Name']

The colon right after the opening bracket specifies which rows I am referring to; because I want all rows of the "Track Name" column, I use ":".

I get the same output with .iloc[], except that this time I have to specify the integer position of the "Track Name" column rather than its name.
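A minimal sketch of the .iloc[] version follows. The data frame is a hypothetical miniature of the Spotify data, and its column order is an assumption, so the position is looked up with `columns.get_loc` instead of being hard-coded.

```python
import pandas as pd

# Hypothetical miniature of the data set; column order is an assumption.
raw_song = pd.DataFrame({
    "Position": [1, 2],
    "Track Name": ["Song A", "Song B"],
    "Artist": ["Artist X", "Artist Y"],
})

by_label = raw_song.loc[:, "Track Name"]

# Look up the column's integer position rather than hard-coding it.
col_pos = raw_song.columns.get_loc("Track Name")
by_position = raw_song.iloc[:, col_pos]

print(by_label.equals(by_position))
```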


.loc[] and .iloc[] work the same way on rows, except that in this case, because the row labels and row positions are identical, the two calls look exactly the same.
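The difference becomes visible once the row labels no longer match the row positions. The sketch below uses an invented data frame whose index labels are 10, 20, and 30.

```python
import pandas as pd

raw_song = pd.DataFrame(
    {"Track Name": ["Song A", "Song B", "Song C"]},
    index=[10, 20, 30],  # labels deliberately differ from positions
)

row_by_label = raw_song.loc[20]      # row whose index label is 20
row_by_position = raw_song.iloc[1]   # second row, counting from 0

# Slices differ too: .loc includes the end label,
# while .iloc excludes the end position.
label_slice = raw_song.loc[10:20]
position_slice = raw_song.iloc[0:1]
```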


Another simple way to get part of a DataFrame is to use [] and specify the column names inside the square brackets.


If you pass a single column name inside one set of square brackets, you get a Pandas Series.
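The following sketch, on invented data, contrasts single and double square brackets.

```python
import pandas as pd

raw_song = pd.DataFrame({
    "Track Name": ["Song A", "Song B"],
    "Artist": ["Artist X", "Artist Y"],
    "Streams": [900, 850],
})

one_column = raw_song["Track Name"]                  # single brackets: Series
as_frame = raw_song[["Track Name"]]                  # double brackets: DataFrame
two_columns = raw_song[["Track Name", "Streams"]]    # several columns at once

print(type(one_column).__name__, type(as_frame).__name__)
```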


Adding rows and columns to a data frame

Using what we learned about .loc[], we can use it, or .insert, to add a row or column to a data frame.

Adding rows

If you add a row with .loc[], you can only add it at the bottom of the data frame. Specifying any index that already exists in the data frame overwrites the data currently in that row with the data you are inserting.

raw_song.loc[3441197] = [0,'hello','bluemen',1,"", '2017-02-05','ec']
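Both behaviors can be seen on a small invented data frame: a new label appends a row, while an existing label overwrites one.

```python
import pandas as pd

raw_song = pd.DataFrame({
    "Track Name": ["Song A", "Song B"],
    "Streams": [900, 850],
})

# A label that does not exist yet appends a new row at the bottom.
raw_song.loc[2] = ["hello", 1]

# An existing label overwrites that row instead of inserting above it.
raw_song.loc[0] = ["replaced", 0]
print(raw_song)
```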

You can also use .loc[] to add a column to the data frame.

raw_song.loc[:,'new_col'] = 0

Besides this, there are two other ways to add new columns to a data frame.

The insert method lets you specify where in the data frame to put the column. It takes 3 arguments: the index at which to place the column, the name of the new column, and the value to use as the column's data.
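A minimal sketch of the three arguments, on an invented data frame:

```python
import pandas as pd

raw_song = pd.DataFrame({
    "Track Name": ["Song A", "Song B"],
    "Streams": [900, 850],
})

# insert(loc, column, value): position, new column name, column data.
# It modifies the data frame in place and returns None.
raw_song.insert(1, "new_col", 0)
print(raw_song.columns.tolist())
```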


The second way to add a column to a data frame is to name the new column with [] and set it equal to the new data, which makes it part of the data frame.

raw_song['new_col'] = 0

This way, I cannot specify the position of the new column, but it is another useful way to do it.

Deleting rows and columns from a data frame

If you want to get rid of some rows or columns, it is simple: just drop them with .drop.

Just specify the axis to drop along (0 for rows, 1 for columns) and the name of the row or column to drop, and you are good to go!
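The sketch below drops one row and one column from an invented data frame. Note that `drop` returns a new data frame by default, so the result has to be assigned to a name.

```python
import pandas as pd

raw_song = pd.DataFrame({
    "Track Name": ["Song A", "Song B", "Song C"],
    "Streams": [900, 850, 800],
    "new_col": [0, 0, 0],
})

# drop returns a new data frame; the original is left unchanged.
without_row = raw_song.drop(1, axis=0)           # drop the row labeled 1
without_col = raw_song.drop("new_col", axis=1)   # drop a column by name
```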


Renaming the index or columns

If you want to replace the data frame's index with one of its other columns, use .set_index and give the column name in the parentheses. If, however, you know exactly what you want to name the index labels, use the .rename method.
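Both approaches are sketched here on an invented data frame. The new labels "first" and "second" are placeholders.

```python
import pandas as pd

raw_song = pd.DataFrame({
    "Track Name": ["Song A", "Song B"],
    "Streams": [900, 850],
})

# Promote an existing column to be the index.
by_track = raw_song.set_index("Track Name")

# Rename individual index labels with an {old: new} mapping.
relabeled = raw_song.rename(index={0: "first", 1: "second"})
```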


To rename a column, specify in the .rename method the column to rename and, inside the {}, the name you want to give it, just like renaming an index.
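A minimal sketch of renaming a column, with "track_name" as a placeholder for the new name:

```python
import pandas as pd

raw_song = pd.DataFrame({
    "Track Name": ["Song A", "Song B"],
    "Streams": [900, 850],
})

# columns= takes an {old name: new name} mapping and returns a new data frame.
renamed = raw_song.rename(columns={"Track Name": "track_name"})
print(renamed.columns.tolist())
```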


Iterating over a data frame

Often, when you process the data in a data frame, you need to change it in some way and iterate over all the values in the data frame. The easiest way is a for loop over the data frame's built-in .iterrows() method:

for index, row in raw_song.iterrows():
    # Manipulate the data here; each row arrives as a Series
    pass
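As a concrete sketch of such a loop on invented data, the following doubles each stream count. The column-wise version at the end is shown for comparison, since it gives the same numbers without an explicit loop.

```python
import pandas as pd

raw_song = pd.DataFrame({
    "Track Name": ["Song A", "Song B"],
    "Streams": [900, 850],
})

# iterrows yields (index label, row as a Series) pairs.
totals = {}
for index, row in raw_song.iterrows():
    totals[row["Track Name"]] = row["Streams"] * 2

# For simple arithmetic, a column-wise operation does the same work.
doubled = raw_song["Streams"] * 2
```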

Writing a data frame to a file

Once you have finished all operations on the data frame, it is time to export it so that it can be sent elsewhere. This is like importing a data set from a file, only in reverse. Pandas can write data frames to many different file types, but the most common is a csv file.
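The export step can be sketched with `to_csv`. An in-memory buffer is used here so the example does not touch the disk; a file path (for example a hypothetical "output.csv") could be passed instead.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "Exercises": ["Running", "Walking"],
    "Mileage": [250, 1000],
})

# index=False leaves the 0, 1, 2... row labels out of the output.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```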


Now you know the basics of Pandas and data frames. These are very powerful tools in the data analysis toolbox.

