Hands-on pandas: playing with data (1) -- pandas data structures explained with examples

Look at the future 2021-04-06 17:05:17



About pandas

Yesterday, while writing a small project, I wanted to use pandas to write data to Excel, and discovered that the pandas tutorial I had written earlier was really rubbish.
So I decided to write a new one.

The pandas founder's description of pandas

On the official pandas website (Python Data Analysis Library) there is a passage in which pandas founder Wes McKinney describes pandas. Seeing it from the founder's point of view is a direct way to understand the main features and development direction of this Python data analysis library.

McKinney lists 9 features in total. Let's go through them one by one.

1. Reading and writing tabular data is very fast. (From my own comparison of Excel and pandas: you can bet pandas won't crash...) In his presentation, reading 489597 rows by 6 columns of data takes just 0.9s.
2. Time series processing, often used in financial applications.
3. Data alignment. Basic operations can be carried out between data sets whose labels are aligned differently.
4. Handling missing data.
5. Grouping operations, such as the groupby we used earlier on the Titanic data.
6. Hierarchical indexing.
7. Merging and joining data.
8. Pivot tables.
9. Data summarization and analysis.
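
A few of the features in this list (alignment, missing data, grouping) can be tried in a handful of lines. This is only a minimal sketch: the Series labels and the passenger-class numbers are made up for illustration and are not taken from the Titanic data set.

```python
import numpy as np
import pandas as pd

# Alignment: values are matched by index label, not by position.
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])
total = s1 + s2            # labels found in only one Series become NaN

# Missing data: fill the gaps that the alignment left behind.
filled = total.fillna(0)

# Grouping: mean fare per passenger class (made-up numbers).
df = pd.DataFrame({"pclass": [1, 1, 3, 3], "fare": [80.0, 60.0, 8.0, 12.0]})
mean_fare = df.groupby("pclass")["fare"].mean()

print(total, filled, mean_fare, sep="\n\n")
```

Labels "a" and "d" exist in only one of the two Series, so their sums are NaN until fillna replaces them.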

The popularity of pandas


pandas owes this popularity to all of you here!!!


pandas for data analysis

pandas fully supports the whole workflow of a data analysis project.


Introduction to pandas data structures

When I learned pandas before, the course started with reading data, then moved on to processing, and only got around to data structures at the very end. I don't mean to complain, but I really don't know how the teacher arranged that class.

pandas works with the following data structures:

Series
DataFrame
Panel

To be honest, I have never worked with the third one.

Put another way: a Series is one-dimensional, a DataFrame is two-dimensional, and a Panel is three-dimensional.

Data structure   Dimensions   Description
Series           1            1D labeled homogeneous array; size immutable.
DataFrame        2            General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.
Panel            3            General 3D labeled, size-mutable array.
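
The dimension and typing claims in this table can be checked directly. A small sketch, with column names and values invented for the example:

```python
import pandas as pd

s = pd.Series([1.5, 2.5, 3.5])
df = pd.DataFrame({"name": ["x", "y"], "score": [1.0, 2.0]})

print(s.ndim, df.ndim)   # number of dimensions of each structure
print(s.dtype)           # a Series has a single dtype
print(df.dtypes)         # a DataFrame has one dtype per column

# A DataFrame is size-mutable: a column can be added in place.
df["passed"] = [True, False]
print(df.shape)
```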

Series

A Series is a one-dimensional array structure holding homogeneous data. (Put bluntly, it's an array.)

Create a Series:

import numpy as np
import pandas as pd
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)
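
To see what that Series looks like from the inside, the sketch below inspects the same values. It only uses standard Series attributes; the comments describe what each call reports.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

print(s.dtype)          # the NaN forces the integers to be stored as floats
print(list(s.index))    # default index: 0 .. n-1
print(s.isna().sum())   # number of missing values
print(s.mean())         # NaN is skipped by default: (1+3+5+6+8) / 5
```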

DataFrame

As already mentioned, a DataFrame is simply a two-dimensional array.

Let's look at the ways to create a DataFrame. Embarrassingly, the previous series never fully covered the content of this section.

Create a DataFrame from a dictionary of objects that can be converted to Series:

df = pd.DataFrame(
    {
        'A': 1.,
        'B': pd.Timestamp('20130102'),
        'C': pd.Series(1, index=list(range(4)), dtype='float32'),
        'D': np.array([3] * 4, dtype='int32'),
        'E': pd.Categorical(["test", "train", "test", "train"]),
        'F': 'foo'
    }
)

The resulting object looks like this:

     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo

This amounts to inserting the data column by column.

What if you want to insert it row by row?

Pass the data in as a list of rows (np.nan marks the missing values):

df = pd.DataFrame([[1,5,8],[2,np.nan,np.nan],[2,3,np.nan],[np.nan,np.nan,np.nan]])

That does it.

     0    1    2
0  1.0  5.0  8.0
1  2.0  NaN  NaN
2  2.0  3.0  NaN
3  NaN  NaN  NaN
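
The same row-by-row construction also works when the rows come from an actual numpy array. In this sketch the row and column labels (r1, r2, x, y, z) are invented for illustration:

```python
import numpy as np
import pandas as pd

# Each row of the 2D array becomes one row of the DataFrame.
rows = np.array([[1, 5, 8], [2, 3, 4]])
df = pd.DataFrame(rows, index=["r1", "r2"], columns=["x", "y", "z"])

print(df)
print(df.loc["r2", "y"])   # one cell, addressed by row label and column label
```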

pandas data structure methods in detail

Series

A pandas Series can be created with the following constructor:

pandas.Series(data, index, dtype, copy)

Parameters:

data: The data, in various forms such as an ndarray, a list, or a constant.
index: Index values must be hashable and the same length as the data; they do not have to be unique. If no index is passed, it defaults to 0, 1, ..., n-1 (like np.arange(n)).
dtype: The data type. If omitted, the data type is inferred.
copy: Whether to copy the input data. The default is False.
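
As a rough illustration of how these four parameters fit together, the sketch below passes all of them explicitly. The labels x, y, z and the variable name raw are made up for the example.

```python
import numpy as np
import pandas as pd

raw = np.array([1, 2, 3])

# data, index, dtype and copy all passed explicitly.
s = pd.Series(data=raw, index=["x", "y", "z"], dtype="float64", copy=True)

# Because the data was copied, changing the source array afterwards
# does not change the Series.
raw[0] = 99
print(s)
```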

Creating a Series

Create an empty Series: s = pd.Series()


Create a Series from an ndarray:

data = np.array(['a','b','c','d'])
s = pd.Series(data)
print(s)
0    a
1    b
2    c
3    d
dtype: object

If the data is an ndarray, any index that is passed must have the same length. If no index is passed, the index defaults to range(n).

data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
print(s)
100    a
101    b
102    c
103    d
dtype: object

Create a Series from a dictionary:

data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data)
print(s)
a    0.0
b    1.0
c    2.0
dtype: float64

A dictionary can be passed as input. If no index is specified, the dictionary keys become the index; current pandas keeps them in the dictionary's insertion order (old versions sorted them). If an index is passed, the values for the labels in that index are pulled out of the dictionary.

data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data,index=['b','c','d','a'])
print(s)
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64

The index order is preserved, and the missing element is filled with NaN (Not a Number).
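
Once a label without a matching dictionary key has turned into NaN, it can be located, dropped, or filled. A minimal sketch using the same dictionary; the fill value -1.0 is an arbitrary choice for the example:

```python
import pandas as pd

data = {"a": 0.0, "b": 1.0, "c": 2.0}
s = pd.Series(data, index=["b", "c", "d", "a"])

missing = s[s.isna()].index.tolist()   # labels that had no key in the dictionary
cleaned = s.dropna()                   # drop the missing entries
filled = s.fillna(-1.0)                # or replace them with a placeholder

print(missing)
print(cleaned)
print(filled)
```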


Create a Series from a scalar:

s = pd.Series(5, index=[0, 1, 2, 3])
print(s)
0    5
1    5
2    5
3    5
dtype: int64

Accessing a Series

Access data by position:

s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print(s.iloc[0])   # first element; plain s[0] on a label index is deprecated in recent pandas
print(s.iloc[:3])  # first three elements

Retrieve data by label (index):

s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print(s['a'])
print(s[['a','c','d']])
print(s['f'])  # raises KeyError: there is no label 'f'
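
Because looking up a missing label raises an exception, it can help to separate positional access, label access, and "safe" access explicitly. The following is a sketch of one way to do that; the fallback value -1 is just an example.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

first = s.iloc[0]                   # by position
a_value = s.loc["a"]                # by label
fallback = s.get("f", default=-1)   # missing label: return the default instead of raising

# Catch the exception explicitly when a missing label is an error case.
try:
    s.loc["f"]
except KeyError as err:
    print("missing label:", err)

print("f" in s)   # membership is tested against the index labels
```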

DataFrame

A pandas DataFrame can be created with the following constructor:

pandas.DataFrame(data, index, columns, dtype, copy)

Parameters:

data: The data, in various forms such as an ndarray, Series, map, list, dictionary, constant, or another DataFrame.
index: The row labels. Optional; if no index is passed, it defaults to np.arange(n).
columns: The column labels. Optional; if no column labels are passed, they default to np.arange(n).
dtype: The data type of each column.
copy: Whether to copy the input data. The default is False.
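
The effect of index, columns and dtype is easiest to see next to the defaults. A small sketch with invented labels (r1..r3, x, y):

```python
import numpy as np
import pandas as pd

values = np.arange(6).reshape(3, 2)   # 3 rows, 2 columns

# Explicit row labels, column labels and dtype.
df = pd.DataFrame(data=values, index=["r1", "r2", "r3"],
                  columns=["x", "y"], dtype="float64")

# With nothing but the data, rows and columns are numbered from 0.
default = pd.DataFrame(values)

print(df)
print(list(default.index), list(default.columns))
```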

Creating a DataFrame

Create an empty DataFrame: df = pd.DataFrame()


Create a DataFrame from a list:

data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)
   0
0  1
1  2
2  3
3  4
4  5

data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)
     Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13

data = [['Alex',10],['Bob',12],['Clarke',13]]
# Convert only Age to float; dtype=float on the constructor would also try to convert the Name strings
df = pd.DataFrame(data,columns=['Name','Age']).astype({'Age': float})
print(df)
     Name   Age
0    Alex  10.0
1     Bob  12.0
2  Clarke  13.0

Create a DataFrame from a dict of ndarrays / lists:

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28,34,29,42]}
df = pd.DataFrame(data)
print(df)
    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print(df)
        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42

Create a DataFrame from a list of dictionaries:

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)
   a   b     c
0  1   2   NaN
1  5  10  20.0
df = pd.DataFrame(data, index=['first', 'second'])
print(df)
        a   b     c
first   1   2   NaN
second  5  10  20.0

A list of dictionaries can be passed as input data to create a DataFrame. By default, the dictionary keys become the column names.
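
The columns argument can also be combined with a list of dictionaries to pick which keys end up as columns. In this sketch the name b1 is deliberately a key that does not exist in the data, to show what happens in that case:

```python
import pandas as pd

data = [{"a": 1, "b": 2}, {"a": 5, "b": 10, "c": 20}]

# Keep only keys 'a' and 'b'; key 'c' is ignored.
df1 = pd.DataFrame(data, index=["first", "second"], columns=["a", "b"])

# 'b1' is not a key in any dictionary, so that column is all NaN.
df2 = pd.DataFrame(data, index=["first", "second"], columns=["a", "b1"])

print(df1)
print(df2)
```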


Create a DataFrame from a dictionary of Series:

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4

Accessing a DataFrame

Column operations

Select a column:

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df['one'])

Adding columns:

df['three']=pd.Series([10,20,30],index=['a','b','c'])
print(df)
df['four']=df['one']+df['three']
print(df)
# No need to dwell on the first output; just look at the final one
   one  two  three  four
a  1.0    1   10.0  11.0
b  2.0    2   20.0  22.0
c  3.0    3   30.0  33.0
d  NaN    4    NaN   NaN

Deleting columns:

del df['one']  # delete a column in place
print(df)
df.pop('two')  # remove a column and return it
print(df)

Row operations

Select by label:

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.loc['b'])
one    2.0
two    2.0
Name: b, dtype: float64

Select by integer position:

print(df.iloc[2])
print(df[2:4])
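
Label-based and position-based row selection are easy to mix up, especially when slicing. The sketch below puts them side by side on a frame shaped like the one above (rebuilt here from plain lists so the block runs on its own):

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0, None], "two": [1, 2, 3, 4]},
    index=["a", "b", "c", "d"],
)

by_label = df.loc["c"]          # row whose label is 'c'
by_position = df.iloc[2]        # third row, whatever its label is

label_slice = df.loc["b":"c"]   # label slices include both endpoints
position_slice = df.iloc[1:3]   # position slices exclude the stop position

print(by_label.equals(by_position))
print(label_slice.equals(position_slice))
```

Here both comparisons hold because 'c' is the third label, and the slice 'b' to 'c' covers positions 1 and 2.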

Adding rows:

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
df = pd.concat([df, df2])  # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
print(df)
   a  b
0  1  2
1  3  4
0  5  6
1  7  8

Deleting rows:
Use an index label to delete rows from a DataFrame. If the label is duplicated, multiple rows will be deleted.

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
df = pd.concat([df, df2])  # note the duplicated row labels in the result
df = df.drop(0)
print(df)
   a  b
1  3  4
1  7  8

In the example above, two rows are deleted because both of them carry the same label 0.
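
If deleting both rows is not what you want, one option is to give the combined frame fresh, unique labels before dropping. This sketch uses pd.concat with ignore_index=True for that; it is one possible approach, not the only one.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])

# ignore_index=True renumbers the rows 0..3, so every label is unique.
combined = pd.concat([df, df2], ignore_index=True)

# Now drop(0) removes only the first row.
result = combined.drop(0)

print(combined)
print(result)
```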


Panel

Note: Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so the code in this section only runs on older versions.

Panels can be created using the following constructor:

pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)

Parameters:

data: The data, in various forms such as an ndarray, Series, map, list, dictionary, constant, or another DataFrame.
items: axis=0
major_axis: axis=1
minor_axis: axis=2
dtype: The data type of each column.
copy: Whether to copy the data. The default is False.

Creating a Panel

Panels can be created in several ways:

From ndarrays
From a dictionary of DataFrames

I won't say much about this module; after all, I really haven't used it.


Create from a 3D ndarray:

data = np.random.rand(2,4,5)
p = pd.Panel(data)
print(p)
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 5 (minor_axis)
Items axis: 0 to 1
Major_axis axis: 0 to 3
Minor_axis axis: 0 to 4

Create from a dictionary of DataFrame objects:

data = {'Item1': pd.DataFrame(np.random.randn(4, 3)),
        'Item2': pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print(p)
# The output has the same form as above

Create an empty Panel:

p = pd.Panel()
print(p)
<class 'pandas.core.panel.Panel'>
Dimensions: 0 (items) x 0 (major_axis) x 0 (minor_axis)
Items axis: None
Major_axis axis: None
Minor_axis axis: None

Selecting data from a Panel

data = {'Item1': pd.DataFrame(np.random.randn(4, 3)),
        'Item2': pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print(p['Item1'])
          0         1         2
0  0.488224 -0.128637  0.930817
1  0.417497  0.896681  0.576657
2 -2.775266  0.571668  0.290082
3 -0.400538 -0.144234  1.110535

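Select with minor_xs (the output below holds column 1 of each item, which is what minor_xs(1) returns; major_xs(1) would instead return row 1 of each item):

print(p.minor_xs(1))
      Item1     Item2
0 -0.128637 -1.047032
1  0.896681 -0.557322
2  0.571668  0.431953
3 -0.144234  1.302466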
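
Since Panel is no longer available in current pandas (it was removed in version 0.25), the same kind of 3D data is usually kept in a DataFrame with a MultiIndex. The following is a sketch of that replacement, not the Panel API itself; the level names item and row and the seeded random generator are choices made for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
frames = {
    "Item1": pd.DataFrame(rng.standard_normal((4, 3))),
    "Item2": pd.DataFrame(rng.standard_normal((4, 2))),
}

# Stack the frames; the dictionary keys become the outer level of the row index.
stacked = pd.concat(frames, names=["item", "row"])

item1 = stacked.loc["Item1"]          # one whole item, similar to p['Item1']
row1 = stacked.xs(1, level="row")     # row 1 of every item

print(stacked)
print(item1)
print(row1)
```

Item2 has only two columns, so its third column is NaN in the stacked frame.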

Basic methods quick reference

Basic Series attributes and methods

Attribute or method   Description
axes                  Returns a list of the row axis labels.
dtype                 Returns the dtype of the object.
empty                 Returns True if the Series is empty.
ndim                  Returns the number of dimensions of the underlying data, by definition 1.
size                  Returns the number of elements in the underlying data.
values                Returns the Series as an ndarray.
head()                Returns the first n rows.
tail()                Returns the last n rows.
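
Each entry in this table can be tried on a small Series. A quick sketch with made-up values:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

print(s.axes)      # list containing the row index
print(s.dtype)     # dtype of the values
print(s.empty)     # whether the Series has no elements
print(s.ndim)      # always 1 for a Series
print(s.size)      # number of elements
print(s.values)    # the underlying ndarray
print(s.head(2))   # first 2 rows
print(s.tail(2))   # last 2 rows
```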

Basic DataFrame attributes and methods

Attribute or method   Description
T                     Transposes rows and columns.
axes                  Returns a list whose only members are the row axis labels and the column axis labels.
dtypes                Returns the dtypes in this object.
empty                 True if the NDFrame is entirely empty (no items), that is, if any axis has length 0.
ndim                  Number of axes / array dimensions.
shape                 Returns a tuple representing the dimensions of the DataFrame.
size                  Number of elements in the NDFrame.
values                Numpy representation of the NDFrame.
head()                Returns the first n rows.
tail()                Returns the last n rows.
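
The DataFrame entries can be tried the same way. A quick sketch on a small frame with made-up names and ages:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "Jack", "Steve"], "Age": [28, 34, 29]})

print(df.T)         # rows and columns swapped
print(df.axes)      # [row index, column index]
print(df.dtypes)    # one dtype per column
print(df.empty)     # whether any axis has length 0
print(df.ndim)      # always 2 for a DataFrame
print(df.shape)     # (rows, columns)
print(df.size)      # rows * columns
print(df.values)    # the data as a 2D ndarray
print(df.head(2))   # first 2 rows
print(df.tail(1))   # last row
```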



I'm starving, so that's it for today. I need to hurry and grab food at the canteen...


Copyright notice
This article was written by [Look at the future]. Please include a link to the original when reposting. Thank you.
https://pythonmana.com/2021/04/20210406170401254o.html
