## Introduction to Python data analysis (8): Pandas statistical calculation and description

Squirrels love biscuits 2021-04-07 16:28:20

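Sample code:

```
import numpy as np
import pandas as pd

arr1 = np.random.rand(4, 3)  # 4x3 array of random floats in [0, 1)
pd1 = pd.DataFrame(arr1, columns=list('ABC'), index=list('abcd'))
f = lambda x: '%.2f' % x  # format each value to two decimal places
# Note: applymap is deprecated since pandas 2.1; DataFrame.map replaces it there
pd2 = pd1.applymap(f).astype(float)
pd2
```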

Running results:

```
      A     B     C
a  0.87  0.26  0.67
b  0.69  0.89  0.17
c  0.94  0.33  0.04
d  0.35  0.46  0.29
```
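Because `np.random.rand` draws new values on every run, the numbers above will differ from session to session. A minimal sketch, assuming you want to reproduce the later results exactly, builds `pd2` directly from the values printed above instead of from random data:

```python
import pandas as pd

# Values copied from the printed table, so every later result can be reproduced
pd2 = pd.DataFrame(
    [[0.87, 0.26, 0.67],
     [0.69, 0.89, 0.17],
     [0.94, 0.33, 0.04],
     [0.35, 0.46, 0.29]],
    columns=list('ABC'),
    index=list('abcd'),
)
print(pd2)
```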

# Common statistical calculation

### sum, mean, max, min…

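`axis=0` (the default) computes the statistic down each column, giving one result per column; `axis=1` computes it across each row, giving one result per row.

`skipna` excludes missing values (NaN) from the calculation; the default is `True`.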

Sample code:

```
pd2.sum()                # default axis=0: sums all rows of each column and returns a Series
pd2.sum(axis='columns')  # sums all columns of each row
pd2.idxmax()             # index label of the maximum value in each column; pass axis='columns' to get the column label of the maximum in each row
```

Running results:

```
A    2.85
B    1.94
C    1.17
dtype: float64

a    1.80
b    1.75
c    1.31
d    1.10
dtype: float64

A    c
B    b
C    a
dtype: object
```
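The effect of `axis` and `skipna` is easier to see side by side. The sketch below rebuilds the frame from the values printed earlier and inserts one NaN by hand (the missing value is an assumption added for illustration; it is not in the original data):

```python
import numpy as np
import pandas as pd

pd2 = pd.DataFrame(
    [[0.87, 0.26, 0.67],
     [0.69, 0.89, 0.17],
     [0.94, 0.33, 0.04],
     [0.35, 0.46, 0.29]],
    columns=list('ABC'),
    index=list('abcd'),
)

col_means = pd2.mean()                    # axis=0: one value per column
row_max = pd2.max(axis=1)                 # axis=1: one value per row
row_argmax = pd2.idxmax(axis='columns')   # column label of each row's maximum

# Introduce a missing value to compare skipna=True and skipna=False
with_nan = pd2.copy()
with_nan.loc['a', 'A'] = np.nan
skipped = with_nan['A'].sum()                  # NaN is ignored (default skipna=True)
propagated = with_nan['A'].sum(skipna=False)   # NaN propagates into the result

print(row_max)
print(row_argmax)
print(skipped, propagated)
```

With `skipna=False`, a single NaN in a column makes the whole result NaN, which can be useful when missing data should not be silently ignored.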

# Common statistical description

### `describe`: generate multiple summary statistics at once

Sample code:

`pd2.describe()  # summary statistics for each numeric column`

Running results:

```
              A        B         C
count  4.000000  4.00000  4.000000
mean   0.712500  0.48500  0.292500
std    0.263613  0.28243  0.271585
min    0.350000  0.26000  0.040000
25%    0.605000  0.31250  0.137500
50%    0.780000  0.39500  0.230000
75%    0.887500  0.56750  0.385000
max    0.940000  0.89000  0.670000
```

Sample code:

```
# Percentage change: (current - previous) / previous
pd2.pct_change()  # change from the previous row within each column; pass axis='columns' for the change from the previous column within each row
```

Running results:

```
          A         B         C
a       NaN       NaN       NaN
b -0.206897  2.423077 -0.746269
c  0.362319 -0.629213 -0.764706
d -0.627660  0.393939  6.250000
```
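To connect these summaries to the single-statistic methods, the sketch below checks that rows of `describe()` agree with `mean()` and `quantile()`, and that `pct_change()` matches the manual formula `current / previous - 1` built with `shift`. The frame is rebuilt from the values printed earlier; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

pd2 = pd.DataFrame(
    [[0.87, 0.26, 0.67],
     [0.69, 0.89, 0.17],
     [0.94, 0.33, 0.04],
     [0.35, 0.46, 0.29]],
    columns=list('ABC'),
    index=list('abcd'),
)

summary = pd2.describe()

# Each row of the summary matches the corresponding single-statistic method
same_mean = np.allclose(summary.loc['mean'].to_numpy(), pd2.mean().to_numpy())
same_q25 = np.allclose(summary.loc['25%'].to_numpy(), pd2.quantile(0.25).to_numpy())

# pct_change compares each row with the one before it
manual = pd2 / pd2.shift(1) - 1
same_pct = np.allclose(
    manual.to_numpy(), pd2.pct_change().to_numpy(), equal_nan=True
)

print(same_mean, same_q25, same_pct)
```

The first row of `pct_change()` is NaN because row `a` has no previous row to compare against.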

### Common statistical description methods
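A few of the descriptive methods pandas provides are sketched below on a frame rebuilt from the values printed earlier. The selection is illustrative and assumed for this example, not the author's own list.

```python
import pandas as pd

pd2 = pd.DataFrame(
    [[0.87, 0.26, 0.67],
     [0.69, 0.89, 0.17],
     [0.94, 0.33, 0.04],
     [0.35, 0.46, 0.29]],
    columns=list('ABC'),
    index=list('abcd'),
)

median = pd2.median()               # middle value of each column
variance = pd2.var()                # sample variance (ddof=1) of each column
std = pd2.std()                     # sample standard deviation (ddof=1)
running_total = pd2.cumsum()        # cumulative sum down each column
col_range = pd2.max() - pd2.min()   # spread between largest and smallest value

print(median)
print(std)
print(running_total)
```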

https://pythonmana.com/2021/04/20210407161919430k.html