Hands-on pandas: learn to play with data (1) -- pandas data structures explained with examples

Look at the future 2021-04-06 17:05:17

About pandas

While writing a small project yesterday, I wanted to use pandas to write data to Excel, and it turned out that the pandas tutorial I had written earlier was really rubbish.
So I decided to rewrite it.

The pandas founder's explanation of pandas

On the pandas official website (Python Data Analysis Library), there is a passage in which pandas founder Wes McKinney explains pandas. From the founder's point of view, we can directly understand the main characteristics and development direction of this Python data analysis library.

McKinney lists 9 features in total. Let's go through them one by one.

1. Reading and writing tabular data is very fast. (From my own comparison of Excel and pandas, you can bet pandas won't crash...) In his presentation, reading 489,597 rows by 6 columns of data takes just 0.9 s.
2. Time series processing. Often used in financial applications.
3. Data alignment. Basic operations can be performed on data with different indexes.
4. Handling missing data.
5. Grouping operations. For example, the groupby we used earlier on the Titanic data.
6. Hierarchical indexing.
7. Merging and joining data.
8. Pivot tables.
9. Data summarization and analysis.
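
To make a few of these features concrete, here is a minimal sketch of alignment, missing-data handling, and grouping (features 3 to 5). The labels, column names, and values are invented for illustration and do not come from McKinney's presentation.

```python
import numpy as np
import pandas as pd

# Data alignment: values are matched on index labels, not on position.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])
total = a + b  # labels present in only one Series become NaN

# Missing data: fill the gaps (dropna() would remove them instead).
filled = total.fillna(0)

# Grouping: split rows by a key column and aggregate each group.
# The column names here are made up for this illustration.
df = pd.DataFrame({"group": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
sums = df.groupby("group")["value"].sum()

print(total)
print(filled)
print(sums)
```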

The popularity of pandas

The reason pandas is this popular has something to do with all of you here!


pandas for data analysis

pandas supports the full development workflow of a data analysis project.


Introduction to pandas data structures

When I learned pandas before, the course started with reading data, then moved on to processing, and only got to data structures at the very end. I'm not just saying this, but I really don't know how the teacher arranged the class.

pandas works with the following data structures:

Series
DataFrame
Panel

To tell the truth, I have never used the third one.

Or, to put it another way: a Series is one-dimensional, a DataFrame is two-dimensional, and a Panel is three-dimensional.

Data structure  Dimensions  Description
Series          1           1D labeled homogeneous array; size-immutable.
DataFrame       2           General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.
Panel           3           General 3D labeled, size-mutable array.

Series

A Series is a one-dimensional array structure with homogeneous data. (Put bluntly, it is an array.)

Generate a Series:

import numpy as np
import pandas as pd
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)
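
As a small follow-up to "homogeneous data", the sketch below checks what the Series above actually stores. Because np.nan is a float, the integers in the list are stored as float64 too, so the whole Series has a single dtype.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

print(s.dtype)         # one dtype for the whole Series
print(s.index)         # the default integer index
print(s.isna().sum())  # how many values are missing
```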

DataFrame

As I already said, a DataFrame is simply a two-dimensional array.

Let's look at the ways to generate a DataFrame. Embarrassingly, my previous series never fully covered the content of this section.

Generate a DataFrame from a dictionary of Series-like objects:

df = pd.DataFrame(
    {
        'A': 1.,
        'B': pd.Timestamp('20130102'),
        'C': pd.Series(1, index=list(range(4)), dtype='float32'),
        'D': np.array([3] * 4, dtype='int32'),
        'E': pd.Categorical(["test", "train", "test", "train"]),
        'F': 'foo'
    }
)

The resulting object looks like this:

     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo

This is equivalent to inserting the data column by column.
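
Since every column was supplied separately, each one can keep its own type. A quick way to see this is to print the dtypes of the frame built above; the sketch repeats the construction so that it runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)

print(df.dtypes)  # one dtype per column
print(df.shape)   # scalars such as 'A' and 'F' are repeated for every row
```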

What if you want to insert row by row?

Pass the rows as nested lists (np.nan from NumPy marks the missing values):

df = pd.DataFrame([[1,5,8],[2,np.nan,np.nan],[2,3,np.nan],[np.nan,np.nan,np.nan]])

That does it:

 0 1 2
0 1.0 5.0 8.0
1 2.0 NaN NaN
2 2.0 3.0 NaN
3 NaN NaN NaN
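
The same row-wise idea also works with an actual NumPy array. The sketch below is only an illustration: the column names x, y, z are hypothetical and are there just to show that the columns do not have to stay numbered.

```python
import numpy as np
import pandas as pd

# Each inner list is one row; np.nan marks a missing value.
arr = np.array([[1, 5, 8], [2, np.nan, np.nan], [2, 3, np.nan]])

# The column names are hypothetical, chosen only for this example.
df = pd.DataFrame(arr, columns=["x", "y", "z"])

print(df)
print(df.isna().sum())  # missing values per column
```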

pandas data structure methods in detail

Series

A pandas Series can be created using the following constructor:

pandas.Series(data, index, dtype, copy)

Parameters:

data: The data can take various forms, such as an ndarray, a list, or a constant.
index: Index values must be hashable and the same length as the data; they do not have to be unique. Defaults to np.arange(n) if no index is passed.
dtype: The data type. If omitted, the data type is inferred.
copy: Whether to copy the data. The default is False.
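
Here is a minimal sketch of the index and dtype parameters in action. The labels and values are invented for illustration. It also shows that index labels are allowed to repeat, in which case selecting a repeated label returns every match.

```python
import pandas as pd

# index supplies the labels; dtype forces the element type.
s = pd.Series([1, 2, 3], index=["x", "y", "z"], dtype="float64")
print(s)

# Labels may repeat; selecting a repeated label returns every match.
dup = pd.Series([10, 20, 30], index=["a", "a", "b"])
print(dup["a"])
```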

Create a Series

Create an empty Series: s = pd.Series()


Create a Series from an ndarray:

data = np.array(['a','b','c','d'])
s = pd.Series(data)
0 a
1 b
2 c
3 d
dtype: object

If the data is an ndarray, any index passed must have the same length. If no index is passed, the index defaults to range(n).

data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
100 a
101 b
102 c
103 d
dtype: object

Create a Series from a dictionary:

data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
a 0.0
b 1.0
c 2.0
dtype: float64

A dictionary can be passed as input. If no index is specified, the dictionary keys are used as the index, in insertion order on current pandas versions (older versions sorted them). If an index is passed, the values corresponding to the index labels are pulled out of the dictionary.

data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','d','a'])
b 1.0
c 2.0
d NaN
a 0.0
dtype: float64

The index order is preserved, and the missing element is filled with NaN (Not a Number).
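
Once a NaN has appeared this way, you usually want to find it, drop it, or replace it. A small sketch using the same data; the fill value -1.0 is an arbitrary placeholder picked for this example.

```python
import pandas as pd

data = {"a": 0.0, "b": 1.0, "c": 2.0}
s = pd.Series(data, index=["b", "c", "d", "a"])

print(s.isna())        # True only for the label 'd'
print(s.dropna())      # drop the missing entry
print(s.fillna(-1.0))  # or replace it with a placeholder value
```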


Create a Series from a scalar:

s = pd.Series(5, index=[0, 1, 2, 3])
0 5
1 5
2 5
3 5
dtype: int64

Access a Series

Access data by position:

s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print(s.iloc[0])   # first element by position
print(s.iloc[:3])  # first three elements

Retrieve data by label (index):

s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print(s['a'])
print(s[['a','c','d']])
print(s['f']) # raises KeyError: 'f' is not in the index
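
If a missing label should not stop the program, there are a few common ways around the exception. The following is a sketch of three options; which one fits depends on your code, and the default value 0 is just an example.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# Option 1: catch the KeyError raised for a missing label.
try:
    value = s["f"]
except KeyError:
    value = None

# Option 2: Series.get returns a default instead of raising.
fallback = s.get("f", default=0)

# Option 3: test for membership in the index first.
has_f = "f" in s.index

print(value, fallback, has_f)
```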

DataFrame

A pandas DataFrame can be created using the following constructor:

pandas.DataFrame(data, index, columns, dtype, copy)

Parameters:

data: The data can take various forms, such as an ndarray, a Series, a map, a list, a dict, a constant, or another DataFrame.
index: The row labels. Optional; if no index is passed, it defaults to np.arange(n).
columns: The column labels. Optional; if no column labels are passed, it defaults to np.arange(n).
dtype: The data type of each column.
copy: Whether to copy the data. The default is False.
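
To see these parameters together, here is a minimal sketch that passes data, index, columns, and dtype in one call. The row and column names are hypothetical.

```python
import numpy as np
import pandas as pd

values = np.arange(6).reshape(3, 2)  # [[0, 1], [2, 3], [4, 5]]
df = pd.DataFrame(
    values,
    index=["r1", "r2", "r3"],   # row labels
    columns=["left", "right"],  # column labels
    dtype="float64",            # one dtype applied to every column
)

print(df)
print(df.index.tolist(), df.columns.tolist())
```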

Create a DataFrame

Create an empty DataFrame: df = pd.DataFrame()


Create a DataFrame from a list:

data = [1,2,3,4,5]
df = pd.DataFrame(data)
0
0 1
1 2
2 3
3 4
4 5
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
Name Age
0 Alex 10
1 Bob 12
2 Clarke 13
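data = [['Alex',10],['Bob',12],['Clarke',13]]
# Cast only the numeric column; forcing dtype=float on the whole frame can fail on the names in newer pandas
df = pd.DataFrame(data,columns=['Name','Age']).astype({'Age': float})
     Name   Age
0    Alex  10.0
1     Bob  12.0
2  Clarke  13.0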

Create a DataFrame from a dict of ndarrays / lists:

data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42

Create a DataFrame from a list of dictionaries:

data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
a b c
0 1 2 NaN
1 5 10 20.0
df = pd.DataFrame(data, index=['first', 'second'])
a b c
first 1 2 NaN
second 5 10 20.0

A list of dictionaries can be passed as input data to create a DataFrame. The dictionary keys become the column names by default.
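
The columns parameter can also be combined with a list of dictionaries to pick which keys become columns. In this sketch the name 'z' is deliberately not a key in any of the dictionaries, so that column comes out entirely NaN.

```python
import pandas as pd

data = [{"a": 1, "b": 2}, {"a": 5, "b": 10, "c": 20}]

# Keep only the listed keys; 'z' is not a key in any dict, so it is all NaN.
df = pd.DataFrame(data, index=["first", "second"], columns=["a", "z"])

print(df)
```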


Create a DataFrame from a dictionary of Series:

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4

Access a DataFrame

Column operations

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df['one'])

Add a column:

df['three']=pd.Series([10,20,30],index=['a','b','c'])
print(df)
df['four']=df['one']+df['three']
print(df)
# No suspense here, just look at the final output
one two three four
a 1.0 1 10.0 11.0
b 2.0 2 20.0 22.0
c 3.0 3 30.0 33.0
d NaN 4 NaN NaN

Delete a column:

del df['one']
print(df)
df.pop('two')
print(df)
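
Both del and pop change df in place. If you would rather keep the original frame untouched, drop(columns=...) returns a new DataFrame instead. A small sketch with made-up data that also shows pop handing back the removed column:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3], "two": [4, 5, 6], "three": [7, 8, 9]})

# pop removes the column in place and returns it as a Series.
popped = df.pop("two")

# drop returns a new DataFrame and leaves df itself unchanged.
smaller = df.drop(columns=["one"])

print(popped.tolist())
print(df.columns.tolist())
print(smaller.columns.tolist())
```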

Row operations

Select by label:

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.loc['b'])
one 2.0
two 2.0
Name: b, dtype: float64

Select by integer position:

print(df.iloc[2])
print(df[2:4])
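
One detail that is easy to trip over: a label slice with loc includes its end label, while a position slice with iloc excludes its stop position. A minimal sketch with a single made-up column:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3, 4]}, index=["a", "b", "c", "d"])

by_label = df.loc["b":"c"]   # label slice: includes both 'b' and 'c'
by_position = df.iloc[1:3]   # position slice: rows 1 and 2, stop excluded
single = df.loc["b", "one"]  # one cell, by row label and column label

print(by_label)
print(by_position)
print(single)
```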

Add rows:

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
df = pd.concat([df, df2])  # DataFrame.append was removed in pandas 2.0
print(df)
a b
0 1 2
1 3 4
0 5 6
1 7 8
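
Notice that the row labels 0 and 1 each appear twice in the output above. If you want fresh labels instead, pd.concat accepts ignore_index=True. A minimal sketch (pd.concat is used here because it also works on pandas 2.0 and later, where DataFrame.append is no longer available):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])

# ignore_index=True discards the old labels and renumbers the rows from 0.
combined = pd.concat([df, df2], ignore_index=True)

print(combined)
```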

Delete rows:
Use an index label to drop rows from a DataFrame. If the label is duplicated, multiple rows are dropped.

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
df = pd.concat([df, df2]) # Notice the row labels above; DataFrame.append was removed in pandas 2.0
df = df.drop(0)
print(df)
a b
1 3 4
1 7 8

In the example above, two rows were dropped because both of them carry the same label 0.
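
If you only meant to drop one of those rows, one possible approach (a sketch, not the only way) is to give every row a unique label first with reset_index(drop=True) and then drop by the new label.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])
stacked = pd.concat([df, df2])  # labels are 0, 1, 0, 1

# Give every row its own label, then drop exactly one row.
unique = stacked.reset_index(drop=True)  # labels are now 0, 1, 2, 3
result = unique.drop(0)

print(result)
```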


Panel

A Panel can be created using the following constructor (note that Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so the code in this section only runs on older versions):

pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)

Parameters:

data: The data can take various forms, such as an ndarray, a Series, a map, a list, a dict, a constant, or another DataFrame.
items: axis=0
major_axis: axis=1
minor_axis: axis=2
dtype: The data type of each column.
copy: Whether to copy the data. The default is False.

Create a Panel

A Panel can be created in several ways:

From ndarrays
From a dictionary of DataFrames

I won't say much about this structure; after all, I have never really used it.


Create from a 3D ndarray:

data = np.random.rand(2,4,5)
p = pd.Panel(data)
print(p)
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 5 (minor_axis)
Items axis: 0 to 1
Major_axis axis: 0 to 3
Minor_axis axis: 0 to 4

From a dictionary of DataFrame objects:

data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
        'Item2' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print(p)
# Same form as above, with dimensions 2 (items) x 4 (major_axis) x 3 (minor_axis)

Create an empty Panel:

p = pd.Panel()
<class 'pandas.core.panel.Panel'>
Dimensions: 0 (items) x 0 (major_axis) x 0 (minor_axis)
Items axis: None
Major_axis axis: None
Minor_axis axis: None

Select data from a Panel

data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
        'Item2' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print(p['Item1'])
0 1 2
0 0.488224 -0.128637 0.930817
1 0.417497 0.896681 0.576657
2 -2.775266 0.571668 0.290082
3 -0.400538 -0.144234 1.110535

Use minor_xs (the output below is column 1 of each item, which is a cross-section along the minor axis):

print(p.minor_xs(1))
Item1 Item2
0 -0.128637 -1.047032
1 0.896681 -0.557322
2 0.571668 0.431953
3 -0.144234 1.302466
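
On current pandas versions, where Panel no longer exists, one common substitute is a DataFrame with a MultiIndex. The following is a sketch of that different technique, not a Panel API: it stacks the same dictionary of DataFrames with pd.concat and then takes cross-sections with xs. The values are random, so no output is shown.

```python
import numpy as np
import pandas as pd

data = {
    "Item1": pd.DataFrame(np.random.randn(4, 3)),
    "Item2": pd.DataFrame(np.random.randn(4, 2)),
}

# Stack the frames; the dict keys become the outer level of a row MultiIndex.
stacked = pd.concat(data)

item1 = stacked.xs("Item1", level=0)  # all rows that belong to Item1
row1 = stacked.xs(1, level=1)         # row 1 of every item

print(stacked.shape)
print(item1)
print(row1)
```

Item2 has only two columns, so its third column is filled with NaN after stacking.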

Quick reference for basic methods

Basic Series attributes and methods

Attribute or method  Description
axes                 Returns a list of the row axis labels.
dtype                Returns the dtype of the object.
empty                Returns True if the Series is empty.
ndim                 Returns the number of dimensions of the underlying data, which is 1 by definition.
size                 Returns the number of elements in the underlying data.
values               Returns the Series as an ndarray.
head()               Returns the first n rows.
tail()               Returns the last n rows.
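
A quick sketch that touches each entry in the table above on a small Series. The numbers are arbitrary.

```python
import pandas as pd

s = pd.Series([4, 8, 15, 16, 23, 42])

print(s.axes)     # a list containing the index
print(s.dtype)
print(s.empty)
print(s.ndim)
print(s.size)
print(s.values)
print(s.head(2))  # first 2 rows
print(s.tail(2))  # last 2 rows
```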

Basic DataFrame attributes and methods

Attribute or method  Description
T                    Transposes rows and columns.
axes                 Returns a list with the row axis labels and the column axis labels as its only members.
dtypes               Returns the dtypes in this object.
empty                True if the NDFrame is entirely empty (no items), that is, if any of the axes has length 0.
ndim                 Number of axes / array dimensions.
shape                Returns a tuple representing the dimensionality of the DataFrame.
size                 Number of elements in the NDFrame.
values               NumPy representation of the NDFrame.
head()               Returns the first n rows.
tail()               Returns the last n rows.
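
And the same kind of sketch for a DataFrame, reusing a few of the names and ages from the earlier examples:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "Jack", "Steve"], "Age": [28, 34, 29]})

print(df.T)       # rows and columns swapped
print(df.axes)    # [row labels, column labels]
print(df.dtypes)
print(df.empty, df.ndim, df.shape, df.size)
print(df.values)
print(df.head(2))  # first 2 rows
print(df.tail(1))  # last row
```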



I'm starving, so that's all for today. I'm in a hurry to grab food in the canteen...


Copyright notice
This article was written by [Look at the future]. Please include the original link when reposting. Thank you.
https://pythonmana.com/2021/04/20210406170401254o.html
