## Common Python and NumPy functions for data processing and visualization in machine learning

qinjianhuang 2020-11-13 05:26:11

### np.tile()

np.tile() builds a new array by repeating the input array a given number of times along the rows and along the columns.

import numpy as np
m1 = np.array([1, 2, 3, 4])
# Repeat twice along the rows and once along the columns
print(np.tile(m1, (2, 1)))
print("===============")
# Repeat once along the rows and twice along the columns
print(np.tile(m1, (1, 2)))
print("===============")
# Repeat twice along the rows and twice along the columns
print(np.tile(m1, (2, 2)))

Output:

D:\Python\python.exe E:/ML_Code/test_code.py
[[1 2 3 4]
 [1 2 3 4]]
===============
[[1 2 3 4 1 2 3 4]]
===============
[[1 2 3 4 1 2 3 4]
 [1 2 3 4 1 2 3 4]]
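
A common use of np.tile() in machine learning code is to repeat one vector so that it lines up with every row of a data set, for example before computing differences in a nearest-neighbour search. The sketch below uses made-up sample data (`samples` and `query` are hypothetical names, not from the original post) and also shows that NumPy broadcasting gives the same result without an explicit copy.

```python
import numpy as np

# Hypothetical data: three 2-D sample points and one query point.
samples = np.array([[1.0, 1.0], [2.0, 2.0], [4.0, 5.0]])
query = np.array([1.0, 2.0])

# Repeat the query once per sample row so that both arrays have shape (3, 2).
tiled = np.tile(query, (samples.shape[0], 1))
diff_tiled = tiled - samples

# Broadcasting stretches the query across the rows automatically.
diff_broadcast = query - samples

print(tiled.shape)                                  # (3, 2)
print(np.array_equal(diff_tiled, diff_broadcast))   # True
```

Whether to tile or to rely on broadcasting is mostly a readability choice; broadcasting avoids allocating the repeated copy.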

### sum()

sum() adds up the elements of an array. For arrays with two or more dimensions, the axis parameter selects the direction of the sum: axis=0 sums down each column, and axis=1 sums across each row.

import numpy as np
m1 = np.array([1, 2, 3, 4])
# Sum all elements of the 1-D array
print(sum(m1))
m2 = np.array([[6, 2, 2, 4], [1, 2, 4, 7]])
# axis=0: sum down each column
print(m2.sum(axis=0))
# axis=1: sum across each row
print(m2.sum(axis=1))

Output:

D:\Python\python.exe E:/ML_Code/test_code.py
10
[ 7  4  6 11]
[14 14]
Process finished with exit code 0
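
One detail that is easy to miss: Python's built-in sum() and NumPy's sum behave differently on a 2-D array. The following sketch reuses the array `m2` from the example above; the other variable names are only illustrative.

```python
import numpy as np

m2 = np.array([[6, 2, 2, 4], [1, 2, 4, 7]])

# The built-in sum() iterates over the first axis, so it adds the rows together.
builtin_result = sum(m2)

# np.sum() without an axis adds every element.
total = np.sum(m2)

# keepdims=True keeps the summed axis with length 1, which helps with broadcasting.
row_sums = m2.sum(axis=1, keepdims=True)

print(builtin_result)    # [ 7  4  6 11]
print(total)             # 28
print(row_sums.shape)    # (2, 1)
```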

### shape and reshape

import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.shape)
b = np.reshape(a, 6)
print(b)
# -1 tells NumPy to infer that dimension from the size of the array
c = np.reshape(a, (3, -1)) # the -1 is inferred to be 2
print(c)

Output:

D:\python-3.5.2\python.exe E:/ML_Code/test_code.py
(2, 3)
---
[1 2 3 4 5 6]
---
[[1 2]
 [3 4]
 [5 6]]
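
The -1 placeholder only works when the remaining dimensions divide the total number of elements evenly. A minimal sketch, reusing the array `a` from the example above (the other names are illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# -1 on its own flattens the array to one dimension.
flat = a.reshape(-1)

# (-1, 1) produces a single column; the number of rows is inferred as 6.
column = a.reshape(-1, 1)

# 6 elements cannot be split into 4 equal rows, so NumPy raises ValueError.
try:
    a.reshape(4, -1)
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(flat.shape)       # (6,)
print(column.shape)     # (6, 1)
print(reshape_failed)   # True
```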


### numpy.random.rand

import numpy as np
# Create an array of the given shape, filled with random samples from a uniform distribution over [0, 1)
print(np.random.rand(3))
print(np.random.rand(2, 2))


Output:

D:\python-3.5.2\python.exe E:/ML_Code/test_code.py
[ 0.03568079  0.68235136  0.64664722]
---
[[ 0.43591417  0.66372315]
 [ 0.86257381  0.63238434]]
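
Because the values are random, each run prints different numbers. When reproducible results are needed, one option is to fix the seed first. The sketch below uses np.random.seed() with an arbitrary seed value of 42, and shows one way to rescale the samples to another interval; the interval [-1, 1) is an assumption chosen for illustration.

```python
import numpy as np

# Fixing the seed makes the generated sequence repeatable.
np.random.seed(42)
first = np.random.rand(2, 3)
np.random.seed(42)
second = np.random.rand(2, 3)
print(np.array_equal(first, second))   # True

# Rescale samples from [0, 1) to [low, high).
low, high = -1.0, 1.0
scaled = low + (high - low) * np.random.rand(1000)
print(scaled.min() >= low, scaled.max() < high)   # True True
```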


### zip()

zip() takes iterable objects as arguments and packs their corresponding elements into tuples. In Python 2 it returns a list of these tuples; in Python 3 it returns an iterator over them.
If the iterables have different numbers of elements, the result is as long as the shortest one. With the * operator, the zipped tuples can be unpacked back into separate sequences.

import numpy as np
a1 = np.array([1, 2, 3, 4])
a2 = np.array([11, 22, 33, 44])
z = zip(a1, a2)
print(list(z))


Output:

D:\Python\python.exe E:/ML_Code/test_code.py
[(1, 11), (2, 22), (3, 33), (4, 44)]
Process finished with exit code 0
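
The truncation to the shortest input and the unzipping with * can be sketched as follows. The lists `names` and `scores` are made-up example data.

```python
names = ["a", "b", "c"]
scores = [90, 85]

# The result stops at the shortest input, so "c" is dropped.
pairs = list(zip(names, scores))
print(pairs)                     # [('a', 90), ('b', 85)]

# zip(*pairs) reverses the operation and yields one tuple per original input.
unzipped_names, unzipped_scores = zip(*pairs)
print(unzipped_names)            # ('a', 'b')
print(unzipped_scores)           # (90, 85)

# In Python 3 a zip object is an iterator, so it can only be consumed once.
z = zip(names, scores)
first_pass = list(z)
second_pass = list(z)
print(second_pass)               # []
```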


Note: in Python 3 and later, zip() returns an iterator rather than a list, so it must be wrapped in list() to show all results at once. Otherwise, print() shows only the object's representation (this is not an error):

<zip object at 0x01FB2E90>

### Matrix operations

import numpy as np
# Generate a random 3x4 matrix
myRand = np.random.rand(3, 4)
print(myRand)
# Generate a 3x3 identity matrix
myEye = np.eye(3)
print(myEye)
# np.asmatrix stands in for the older mat alias, which NumPy 2.x no longer provides
# The last entry is 10 rather than 9 so that the matrix is invertible
myMatrix = np.asmatrix([[1, 2, 3], [4, 5, 6], [7, 8, 10]])
# Sum all elements of the matrix
print(np.sum(myMatrix))
# Calculate the determinant of the matrix
print(np.linalg.det(myMatrix))
# Calculate the inverse of the matrix
print(np.linalg.inv(myMatrix))
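
The rank and the determinant are different quantities: np.linalg.det() returns the determinant, while np.linalg.matrix_rank() returns the rank. The sketch below, with illustrative variable names, contrasts a singular matrix such as [[1, 2, 3], [4, 5, 6], [7, 8, 9]] with an invertible one, since an inverse is only meaningful when the determinant is non-zero.

```python
import numpy as np

# A singular matrix: the third row is 2 * (second row) - (first row).
singular = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rank = np.linalg.matrix_rank(singular)
det = np.linalg.det(singular)
print(rank)                  # 2
print(abs(det) < 1e-9)       # True (zero up to rounding error)

# Changing one entry makes the matrix invertible (determinant -3).
invertible = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 10]])
inverse = np.linalg.inv(invertible)

# Multiplying a matrix by its inverse should give the identity matrix.
check = invertible @ inverse
print(np.allclose(check, np.eye(3)))   # True
```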


Note:

import numpy as np
# np.asmatrix stands in for the older mat alias, which NumPy 2.x no longer provides
vector1 = np.asmatrix([[1, 2], [1, 1]])
vector2 = np.asmatrix([[1, 2], [1, 1]])
vector3 = np.array([[1, 2], [1, 1]])
vector4 = np.array([[1, 2], [1, 1]])
# For NumPy matrix objects, "*" follows the rules of matrix multiplication
print(vector1 * vector2)
# For NumPy matrix objects, dot also follows the rules of matrix multiplication
print(np.dot(vector1, vector2))
# For NumPy arrays, "*" multiplies the elements one by one
print(vector3 * vector4)
# For NumPy arrays, dot follows the rules of matrix multiplication
print(np.dot(vector3, vector4))

Output:

D:\python-3.5.2\python.exe D:/PyCharm/py_base/py_numpy.py
[[3 4]
 [2 3]]
---
[[3 4]
 [2 3]]
---
[[1 4]
 [1 1]]
---
[[3 4]
 [2 3]]
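
For plain arrays, the @ operator (Python 3.5 and later) is another way to write matrix multiplication and avoids the need for matrix objects. A minimal sketch with the same values as above; the names `a` and `b` are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [1, 1]])
b = np.array([[1, 2], [1, 1]])

# "*" on arrays is element-wise.
elementwise = a * b
print(elementwise.tolist())    # [[1, 4], [1, 1]]

# "@" on arrays is matrix multiplication, equivalent to np.dot for 2-D inputs.
product = a @ b
print(product.tolist())        # [[3, 4], [2, 3]]
print(np.array_equal(product, np.dot(a, b)))   # True
```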


### Vector operations

The Euclidean distance between two n-dimensional vectors $A(x_{11},x_{12},x_{13},\ldots,x_{1n})$ and $B(x_{21},x_{22},x_{23},\ldots,x_{2n})$ is:

$$d_{12}=\sqrt{\sum_{k=1}^{n}(x_{1k}-x_{2k})^2}$$

In the form of vector operations:

$$d_{12}=\sqrt{(A-B)(A-B)^T}$$

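import numpy as np
# Calculate the Euclidean distance between two vectors
# (np.asmatrix stands in for the older mat alias, which NumPy 2.x no longer provides)
vector1 = np.asmatrix([1, 2])
vector2 = np.asmatrix([3, 4])
print(np.sqrt((vector1 - vector2) * ((vector1 - vector2).T)))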
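
The same distance can also be computed on plain 1-D arrays. The sketch below shows three equivalent routes for the vectors (1, 2) and (3, 4): the element-wise sum of squares, the dot-product form, and np.linalg.norm(). The variable names are illustrative.

```python
import numpy as np

v1 = np.array([1, 2])
v2 = np.array([3, 4])
diff = v1 - v2

# Sum-of-squares form: sqrt(sum_k (x_1k - x_2k)^2)
d_sum = np.sqrt(np.sum(diff ** 2))

# Vector form: sqrt((A - B)(A - B)^T); for 1-D arrays this is a dot product.
d_dot = np.sqrt(np.dot(diff, diff))

# np.linalg.norm computes the Euclidean (L2) norm by default.
d_norm = np.linalg.norm(diff)

print(np.isclose(d_sum, d_dot), np.isclose(d_dot, d_norm))   # True True
print(np.isclose(d_norm, np.sqrt(8)))                        # True
```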

### Probability and statistics

import numpy as np
arrayOne = np.array([[1, 2, 3, 4, 5], [7, 4, 3, 3, 3]])
# Calculate the mean of the first row
mv1 = np.mean(arrayOne[0])
# Calculate the mean of the second row
mv2 = np.mean(arrayOne[1])
# Calculate the standard deviation of the first row
dv1 = np.std(arrayOne[0])
# Calculate the standard deviation of the second row
dv2 = np.std(arrayOne[1])
print(mv1)
print(mv2)
print(dv1)
print(dv2)
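
NumPy can also compute these statistics for every row or column in one call through the axis parameter. The sketch below reuses the array `arrayOne`; the other names are illustrative. Note that np.std() divides by n by default (population standard deviation), while ddof=1 divides by n - 1 (sample standard deviation).

```python
import numpy as np

arrayOne = np.array([[1, 2, 3, 4, 5], [7, 4, 3, 3, 3]])

# axis=1 gives one value per row, axis=0 one value per column.
row_means = arrayOne.mean(axis=1)
col_means = arrayOne.mean(axis=0)
print(row_means.tolist())    # [3.0, 4.0]
print(col_means.tolist())    # [4.0, 3.0, 3.0, 3.5, 4.0]

# Population standard deviation (default, ddof=0) per row.
pop_std = arrayOne.std(axis=1)

# Sample standard deviation (ddof=1) per row.
sample_std = arrayOne.std(axis=1, ddof=1)

print(np.allclose(pop_std, [np.sqrt(2.0), np.sqrt(2.4)]))      # True
print(np.allclose(sample_std, [np.sqrt(2.5), np.sqrt(3.0)]))   # True
```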