A 20,000-word crash course in Python, NumPy, and Pandas. Worth bookmarking!

Python learning and data mining 2021-10-28 17:53:56

Python is an easy language to learn and quick to get started with. This long article is a one-stop crash course in Python, NumPy, and Pandas. A Python technical discussion group is listed at the end of the article, and you are welcome to join. If you like the article, please give it a thumbs-up and bookmark it.

Setting up the environment

Let's start by installing Python and setting up a working environment.

Choosing a Python version

There are two major Python lines in circulation, 2.x and 3.x. Because 2.x is no longer maintained, use 3.x as your main version.

Choosing an IDE

There are many popular editors for Python, such as Sublime Text and Notepad++, but I recommend the following two.

PyCharm: a cross-platform Python development tool. Besides the usual debugging, syntax highlighting, and code completion, it also ships with connectors for several databases, so you can work with a database while debugging without hunting for separate database clients.

Jupyter: a web-based interactive editor. You run a small piece of code at a time and see the result immediately, which is very convenient while you are debugging code.

Installing Python

Linux and macOS often ship with their own Python, which on older systems is a 2.x release such as Python 2.6. To get 3.x there you may need to compile and install it yourself; if you are not comfortable with basic Linux operations, Windows is the easier choice.

On Windows, download the .exe installer from the official Python website and click Next through the installer to complete the installation.

Python basic syntax

Hello World

Everyone has had this experience: when learning any language, the first step is to print Hello World. Let's see how to output Hello World in Python.

print("Hello World")
sum = 1 + 2
print("sum = %d" %sum)
>>>
Hello World
sum = 3

The print function prints to the console, sum = 1 + 2 declares a variable and assigns it a value, and %d is a placeholder that is replaced by an integer when the string is formatted.
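
The %d placeholder is only one way to format a string. As a small sketch (the variable names are made up for illustration), the same text can also be built with str.format or, on Python 3.6 and later, with an f-string:

```python
# Three equivalent ways to build the same string.
total = 1 + 2

old_style = "sum = %d" % total          # printf-style, as in the article
with_format = "sum = {}".format(total)  # str.format
with_fstring = f"sum = {total}"         # f-string, Python 3.6+

print(old_style)
print(with_format)
print(with_fstring)
```

All three lines print the same text; f-strings are usually the easiest to read in new code.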

Data types and variables

list

list1 = ["1", "2", "test"]
print(list1)
list1.append("hello")
print(list1)
>>>
['1', '2', 'test']
['1', '2', 'test', 'hello']

list is a built-in Python data type. It is an ordered collection, and you can add and remove elements at any time.
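
To make "add and remove elements at any time" concrete, here is a minimal sketch using a made-up fruits list. It only uses the standard list methods append, insert, pop, and remove:

```python
# Adding and removing list elements; the names here are illustrative.
fruits = ["apple", "banana"]

fruits.append("cherry")      # add to the end
fruits.insert(0, "avocado")  # add at a given position
removed = fruits.pop()       # remove and return the last element
fruits.remove("banana")      # remove the first matching value

print(fruits)
print(removed)
```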

Tuples

tuple1 = ("zhangsan", "lisi")
print(tuple1[0])
>>>
zhangsan

tuple is very similar to list, but a tuple cannot be modified once it has been initialized.
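
A quick way to see this for yourself is to try assigning to an element. The sketch below (the variable names are illustrative) catches the TypeError that Python raises:

```python
point = ("zhangsan", "lisi")

try:
    point[0] = "wangwu"   # tuples do not support item assignment
    modified = True
except TypeError as exc:
    modified = False
    print("cannot modify a tuple:", exc)
```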

Dictionaries

dict1 = {"name1": "zhangsan", "name2": "lisi", "name3": "wangwu"}
dict1["name1"]
>>>
'zhangsan'

Python's built-in dictionary type is dict (short for dictionary), known in other languages as a map. It stores data as key-value pairs and offers very fast lookups.
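
Beyond reading a value by key, a few other dictionary operations come up constantly. A minimal sketch, with a made-up scores dictionary:

```python
scores = {"zhangsan": 80, "lisi": 85}

scores["wangwu"] = 90               # add a new key
scores["lisi"] = 88                 # overwrite an existing key
missing = scores.get("zhaoliu", 0)  # default instead of a KeyError
has_lisi = "lisi" in scores         # membership test on keys

print(scores, missing, has_lisi)
```

Using get with a default is a common way to avoid the KeyError that scores["zhaoliu"] would raise.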

Sets

s = set([1, 2, 3])
print(s)
>>>
{1, 2, 3}

set is similar to dict: it is also a collection of keys, but it stores no values. Because keys cannot repeat, a set contains no duplicate elements.
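
Two things sets are typically used for are removing duplicates and set algebra. A small sketch with illustrative values:

```python
raw = [1, 2, 2, 3, 3, 3]
unique = set(raw)            # duplicates collapse

a = {1, 2, 3}
b = {2, 3, 4}
common = a & b               # intersection
either = a | b               # union

print(sorted(unique), sorted(common), sorted(either))
```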

Variables

The concept of a variable is basically the same as a variable in a junior-high algebra equation, except that in a computer program a variable is not limited to numbers: it can hold any data type.

a = 1
a = 3
print(a)
>>>
3

Conditionals

age = 30
if age >= 18:
    print('your age is', age)
    print('good')
else:
    print('you do not belong here')
>>>
your age is 30
good

if ... else ... is a classic conditional statement. if is followed by a conditional expression; if it is true, the indented statements that follow are executed, otherwise the statements after else are executed. Also note that Python uses indentation to delimit code blocks, usually four spaces or one tab. Don't mix the two.

Loop statement

names = {"zhangsan", "lisi", "wangwu"}
for name in names:
    print(name)
>>>
lisi
zhangsan
wangwu

names is a set, which is an iterable object. In the for loop, name is assigned each element of names in turn. Because a set is unordered, the order of the output may differ from run to run.

sum = 0
n = 99
while n > 0:
    sum = sum + n
    n = n - 2
print(sum)
>>>
2500

Inside the loop, the variable n keeps decreasing until it becomes -1, at which point the while condition is no longer satisfied and the loop exits.
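
The same sum of odd numbers can be written without managing the counter by hand. This sketch (variable names are illustrative) computes it both ways, once with while and once with the built-in sum over a range:

```python
# Sum of the odd numbers 99, 97, ..., 1 computed two ways.
total_while = 0
n = 99
while n > 0:
    total_while += n
    n -= 2

total_range = sum(range(99, 0, -2))  # range counts down from 99 in steps of 2

print(total_while, total_range)
```

Note that the sketch uses total_while rather than sum as a variable name, so the built-in sum function stays available.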

Advanced features

Slicing

L = ['zhangsan', 'lisi', 'wangwu', 'zhaoliu']
print(L[1])
print(L[1:3])
>>>
lisi
['lisi', 'wangwu']

In Python, all indexes start at 0, and slices are half-open intervals: the start index is included and the end index is excluded.
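
A few more slice forms are worth trying on the same list. In this sketch, the start or end can be omitted, negative indexes count from the end, and a third value sets the step:

```python
L = ['zhangsan', 'lisi', 'wangwu', 'zhaoliu']

first_two = L[:2]        # start omitted: begins at index 0
last_two = L[-2:]        # negative index counts from the end
every_other = L[::2]     # third value is the step
reversed_copy = L[::-1]  # a step of -1 walks backwards

print(first_two, last_two, every_other, reversed_copy)
```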

Iteration

Lists, tuples, and dictionaries are all iterable objects and can be traversed with a for loop.

L = ['zhangsan', 'lisi', 'wangwu', 'zhaoliu']
D = {"zhangsan": 1, "lisi": 2, "wangwu": 3, "zhaoliu": 4}
for l in L:
    print(l)
print('\n')
for k, v in D.items():
    print("key:", k, ", value:", v)
>>>
zhangsan
lisi
wangwu
zhaoliu


key: zhangsan , value: 1
key: lisi , value: 2
key: wangwu , value: 3
key: zhaoliu , value: 4

For dictionaries, use items() to traverse keys and values at the same time.

Functions

Calling functions

Python has many useful built-in functions that we can call directly.

>>> abs(100)
100
>>> abs(-20)
20
>>> abs(12.34)
12.34
>>> max(1, 2)
2
>>> max(2, 3, 1, -5)
3

When a function is called with invalid arguments, the program raises an exception.
All of Python's built-in functions are listed here:
https://docs.python.org/zh-cn/3/library/functions.html
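
To see what "raises an exception" looks like in practice, here is a minimal sketch. safe_abs is a hypothetical helper written for this example, not a built-in; it catches the TypeError that abs raises for a string argument:

```python
def safe_abs(value):
    """Return abs(value), or None when the argument type is invalid."""
    try:
        return abs(value)
    except TypeError as exc:
        print("bad argument:", exc)
        return None

ok = safe_abs(-20)
bad = safe_abs("a")   # abs() does not accept a str

print(ok, bad)
```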

Defining functions

In Python, a function is defined with the def statement: write the function name, parentheses, the parameters inside the parentheses, and a colon, then write the function body in an indented block. The return value is given by the return statement.

def add(num1, num2):
    return num1 + num2

result = add(1, 2)
print(result)
>>>
3

The code defines a function named add that takes two parameters and returns their sum. Once a function is defined, call it by writing its name followed by (). If the function returns a value, you can assign it to a variable.
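
Parameters can also have default values and can be passed by name. A short sketch, where power is a made-up function for illustration:

```python
def power(base, exponent=2):
    """Raise base to exponent; exponent defaults to 2."""
    return base ** exponent

squared = power(3)              # uses the default exponent
cubed = power(3, exponent=3)    # keyword argument overrides it

print(squared, cubed)
```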

Modules

Using modules

Python has many useful built-in modules. Once Python is installed, they can be used right away.

import time

def sayTime():
    now = time.time()
    return now

nowtime = sayTime()
print(nowtime)
>>>
1566550687.642805

Use import to bring in a module; after that you can call the functions and variables the module provides.

A module is simply a collection of tools. Of course, you can write tools of your own and collect them into your own module for later use.

A module we write ourselves generally has a directory structure like this

mytest
├─ __init__.py
├─ test1.py
└─ test2.py

Now the two tool files test1 and test2 can be referenced and called from other files

import mytest.test1
mytest.test1

You should have noticed the __init__.py file. This file can be empty; a folder that contains __init__.py is a "package". To reference the files as shown above, the folder should contain an __init__.py file.

Installing third-party modules

In Python, third-party modules are installed with the package management tool pip.

Generally, third-party libraries are registered on the official Python Package Index (pypi.org, formerly pypi.python.org). To install one, you first need to know its name, which you can look up on the library's website or by searching PyPI. For example, Pillow is registered under the name Pillow, so the command to install it is:

pip install Pillow

Object-oriented programming

Classes and instances

The most important concepts in object-oriented programming are the class and the instance. Keep in mind that a class is an abstract template, such as a Student class, while instances are the concrete "objects" created from the class. Every object has the same methods, but the data each one holds may differ.

In Python, a class is defined with the class keyword

class Student(object):
    pass

After defining the class , You can instantiate this class

zhangsan = Student()
zhangsan.age = 20
print(Student)
print(zhangsan)
print(zhangsan.age)
>>>
<class '__main__.Student'>
<__main__.Student object at 0x00EA7350>
20

Here the variable zhangsan is an instance of the class Student. We also bound an attribute age to zhangsan and assigned it a value.

Keep in mind the three basic elements of object orientation: abstraction, encapsulation, and inheritance. If these ideas don't mean much to you yet, that's fine; you will get a feel for them as you keep learning.
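
To illustrate "same methods, different data", here is a minimal sketch of a Student class with an __init__ method and one ordinary method. The attribute names and the is_adult method are assumptions made for this example:

```python
class Student:
    """Hypothetical Student with data set in __init__ and one method."""

    def __init__(self, name, age):
        self.name = name   # instance data differs per object
        self.age = age

    def is_adult(self):
        return self.age >= 18


zhangsan = Student("zhangsan", 20)
lisi = Student("lisi", 16)
print(zhangsan.is_adult(), lisi.is_adult())
```

Both objects share the is_adult method, but each returns a result based on its own age.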

I/O programming

Reading files is an operation you will use all the time. In Python, the open function opens a file

f = open('/Users/tanxin/test.txt', 'r')
f.read()
f.close()

The mode 'r' means read. Having opened the file, we use the read function to read its contents and finally call close to close the file.
A file must be closed after use, because file objects take up operating system resources and the operating system can only keep a limited number of files open at the same time.

Use with to open files more conveniently

with open('/Users/tanxin/test.txt', 'r') as f:
    print(f.read())

The with statement takes care of calling close for us.

Besides read, there are also the readline() and readlines() functions. readline() reads one line at a time; readlines() reads everything at once and returns a list of lines.
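
Here is a small sketch of the difference between readline() and readlines(). So that it runs anywhere, it first writes a throwaway file into a temporary directory; the file name demo.txt is made up for the example:

```python
import os
import tempfile

# Write a small throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("first\nsecond\nthird\n")

with open(path, "r") as f:
    first_line = f.readline()   # one line, newline included
    rest = f.readlines()        # the remaining lines as a list

print(repr(first_line), rest)
```

Because readline() advances the file position, the later readlines() call returns only the lines that have not been read yet.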

Regular expressions

Regular expressions are a big subject that could fill a book of their own, so this is only a brief introduction.

Python provides the re module for working with regular expressions

import re
str1 = "010-56765"
res = re.match(r'(\d{3})-(\d{5})', str1)
print(res)
print(res.group(0))
print(res.group(1))
print(res.group(2))
>>>
<re.Match object; span=(0, 9), match='010-56765'>
010-56765
010
56765

The match() method checks whether the pattern matches at the start of the string. If it matches, it returns a Match object; otherwise it returns None.
Combined with the group method, it is an effective way to extract substrings.
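
Since match() may return None, it is worth checking the result before calling group. The sketch below wraps that check in a hypothetical split_number helper and also shows re.search, which scans the whole string instead of matching only at the start:

```python
import re

pattern = r'(\d{3})-(\d{5})'

def split_number(text):
    """Return (area, number), or None when text does not match."""
    m = re.match(pattern, text)
    if m is None:          # match() returns None on failure
        return None
    return m.group(1), m.group(2)

good = split_number("010-56765")
bad = split_number("tel: 010-56765")            # match() anchors at the start
found = re.search(pattern, "tel: 010-56765")    # search() scans the whole string

print(good, bad, found.group(0))
```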

Introduction to the requests library

requests is a very commonly used HTTP library. We will use it a lot in the later web-crawler lessons.

import requests
r = requests.get('https://www.baidu.com')
r = requests.post('http://test.com/post', data={'key': 'value'})
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get("http://test.com/get", params=payload)

At this point r is a response object, from which we can get the relevant information

url = 'https://www.baidu.com'
r = requests.get(url)
r.text  # Get the response content as text
r.content  # Get the response content as bytes
r.encoding = "utf-8"  # Change the encoding
html = r.text  # Get the web page content
binary_content = r.content  # Get binary data
raw = requests.get(url, stream=True)  # Stream the response to access the raw content
headers = {'user-agent': 'my-test/0.1.1'}  # Custom request headers
r = requests.get(url, headers=headers)
cookies = {"cookie": "# your cookie"}  # Send cookies
r = requests.get(url, cookies=cookies)

This has been only a brief introduction to Python syntax; learning more will take more effort. But as the saying goes, nothing in the world is difficult for those willing to climb. Don't stay at the beginner stage: practice regularly on problem-solving sites such as LeetCode or an online judge, since working through problems trains both your programming thinking and your algorithm skills.

NumPy

NumPy is not only the most widely used library for scientific computing in Python, it is also the foundation of libraries such as SciPy and Pandas. It provides more advanced and efficient data structures and is designed specifically for scientific computing.

NumPy is usually used together with SciPy (Scientific Python) and Matplotlib (a plotting library). This combination is widely used as a replacement for MATLAB; it is a powerful scientific computing environment that helps us learn data science or machine learning with Python.

The ndarray object

One of NumPy's most important features is its N-dimensional array object, ndarray. It is a collection of elements of the same type, indexed starting from 0.

Internal composition of an ndarray

  • A pointer to the data (a block of data in memory or in a memory-mapped file)

  • The data type, or dtype, which describes the fixed-size cell that holds each value in the array

  • A shape tuple giving the size of each dimension

  • A strides tuple, whose integers give the number of bytes to step over to reach the next element along each dimension

You will get a feel for these concepts as you go on.

To create an ndarray, just call NumPy's array function

import numpy as np
a = np.array([1, 2, 2])
b = np.array([[1, 2], [5, 5], [7, 8]])
b[1,1]=10
print(a.shape)
print(b.shape)
print(a.dtype)
print(b)
>>>
(3,)
(3, 2)
int32
[[ 1  2]
 [ 5 10]
 [ 7  8]]

Import the numpy library and call the array function to create an ndarray.
To create a one-dimensional array, just pass in a list. To create a multidimensional array, nest lists: each inner list becomes one row of the array.
An element of the array can be read or modified by index, as in b[1,1].
The shape attribute gives the shape (size) of the array; b, for example, is an array with three rows and two columns.
The dtype attribute gives the data type of the array's elements. The default integer type depends on the platform, so you may see int64 instead of int32.

Data types

NumPy supports more data types than Python's built-in types. Here are some common ones

Name      Description
bool_     Boolean data type (True or False)
int_      Default integer type
int32     Integer (-2147483648 to 2147483647)
uint32    Unsigned integer (0 to 4294967295)
float32   Single-precision floating point: 1 sign bit, 8 exponent bits, 23 mantissa bits
float64   Double-precision floating point: 1 sign bit, 11 exponent bits, 52 mantissa bits

Data type objects (dtype)

Data type objects let us create arrays that match the data structure we expect

numpy.dtype(object, align, copy)
  • object: the object to be converted to a data type object

  • align: if True, pads the fields so the layout resembles a C struct

  • copy: whether to copy the dtype object; if False, the result is a reference to a built-in data type object

Use dtype to create a structured array

mydtype = np.dtype({

'names': ['name', 'age', 'sex'],
'formats': ['S32', 'i4', 'S32']
})
persons = np.array([
('zhangsan', 20, 'man'),
('lisi', 18, 'woman'),
('wangwu', 30, 'man')
],
dtype=mydtype)
print(persons)
>>>
[(b'zhangsan', 20, b'man') (b'lisi', 18, b'woman') (b'wangwu', 30, b'man')]

First, the dtype function defines a structured type; then the array function builds the array, with the dtype parameter set to the type we defined.

Array attributes

The number of dimensions of a NumPy array is called its rank: a one-dimensional array has rank 1, a two-dimensional array has rank 2, and so on.

In NumPy, each linear array is called an axis, that is, a dimension. For example, a two-dimensional array is a one-dimensional array whose elements are themselves one-dimensional arrays. The first axis runs along the outer array and the second axis runs along the arrays inside it. The number of axes, the rank, is the number of dimensions of the array.

Many operations accept an axis argument. For a two-dimensional array, axis=0 means operating along axis 0, which produces one result per column; axis=1 means operating along axis 1, which produces one result per row.
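
The axis argument is easiest to understand with a concrete array. A minimal sketch, using a small made-up 2 x 3 array and the sum method:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = a.sum(axis=0)   # collapse axis 0: one result per column
row_sums = a.sum(axis=1)   # collapse axis 1: one result per row

print(a.ndim, a.shape)
print(col_sums)
print(row_sums)
```

One way to remember it: the axis you name is the one that disappears from the result's shape.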

The more important ndarray attributes are listed below

Attribute   Description
ndim        Rank, that is, the number of axes or dimensions
shape       Dimensions of the array
size        Total number of elements in the array
dtype       Type of the elements
itemsize    Size of each element, in bytes

Create a special array

An empty array

x = np.empty([3,2], dtype=int)
print(x)
>>>
[[0 0]
[0 0]
[0 0]]

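The numpy.empty method creates an array with the specified shape and data type whose elements are uninitialized. The values are whatever happened to be in memory, so they are not guaranteed to be zero as in the output above.

Array of zeros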

zero1 = np.zeros(5)
zero2 = np.zeros(4, dtype=int)
print(zero1)
print(zero2)
>>>
[0. 0. 0. 0. 0.]
[0 0 0 0]

Array of ones

one1 = np.ones(3)
one2 = np.ones(4, dtype=float)
print(one1)
print(one2)
>>>
[1. 1. 1.]
[1. 1. 1. 1.]

Creating an array from existing data

numpy.asarray creates an array from a list, a tuple, or a multidimensional array

list1 = [1, 3, 5]
tuple1 = (1, 2, 3)
one = np.ones((2,3), dtype=int)
array1 = np.asarray(list1)
array2 = np.asarray(tuple1)
array3 = np.asarray(one)
print(array1)
print(array2)
print(array3)
>>>
[1 3 5]
[1 2 3]
[[1 1 1]
[1 1 1]]

numpy.frombuffer reads a buffer as a stream and converts it into an array

str1 = b"Hello world"
buffer1 = np.frombuffer(str1, dtype='S1')
print(buffer1)
>>>
[b'H' b'e' b'l' b'l' b'o' b' ' b'w' b'o' b'r' b'l' b'd']

numpy.fromiter creates an array from an iterable object

range1 = range(5)
iter1 = np.fromiter(range1, dtype=int)
print(iter1)
>>>
[0 1 2 3 4]

numpy.arange, Create an array from a range of values

myarray1 = np.arange(5)
print(myarray1)
>>>
[0 1 2 3 4]

numpy.linspace builds an array that forms an arithmetic sequence

myarray2 = np.linspace(1,9,5)
print(myarray2)
>>>
[1. 3. 5. 7. 9.]

Array operations

Slicing and indexing

The contents of an ndarray can be accessed and modified by index or slice, just like slicing a Python list.

An ndarray can be indexed with subscripts 0 to n-1. A slice object can be built with the built-in slice function by setting the start, stop, and step parameters, and used to cut a new array out of the original one.

a = np.arange(10)
print(a)
s = slice(2, 7, 2)  # From index 2 up to (but not including) index 7, with a step of 2
print(a[s])
>>>
[0 1 2 3 4 5 6 7 8 9]
[2 4 6]

You can also slice with colons (:)

a = np.arange(10)
print(a)
b = a[2:7:2]  # From index 2 up to (but not including) index 7, with a step of 2
print(b)
>>>
[0 1 2 3 4 5 6 7 8 9]
[2 4 6]

Modifying array shape

numpy.reshape changes the shape of an array without changing its data

a = np.arange(6)
print(" The original array :", a)
b = a.reshape(3, 2)
print(" Array after transformation :", b)
>>>
The original array : [0 1 2 3 4 5]
Array after transformation : [[0 1]
[2 3]
[4 5]]

numpy.ndarray.flat is an iterator over the array's elements, so each element can be processed in turn

a = np.arange(9).reshape(3,3)
print (' The original array :')
for row in a:
    print (row)
# To handle every element in the array, use the flat attribute, which is an array element iterator:
print (' Array after iteration :')
for element in a.flat:
    print (element)
>>>
The original array :
[0 1 2]
[3 4 5]
[6 7 8]
Array after iteration :
0
1
2
3
4
5
6
7
8

Transposing arrays

numpy.transpose swaps the dimensions of an array

a = np.arange(10).reshape(2, 5)
print(a)
b = a.transpose()
print(b)
>>>
[[0 1 2 3 4]
[5 6 7 8 9]]
[[0 5]
[1 6]
[2 7]
[3 8]
[4 9]]

Joining arrays

numpy.concatenate joins two or more arrays along an existing axis; the arrays must have the same shape except along that axis

a = np.array([[1,2],[3,4]])
print (' The first array :')
print (a)
b = np.array([[5,6],[7,8]])
print (' The second array :')
print (b)
# The dimensions of the two arrays are the same
print (' Along axis 0 Concatenate two arrays :')
print (np.concatenate((a,b)))
print (' Along axis 1 Concatenate two arrays :')
print (np.concatenate((a,b),axis = 1))
>>>
The first array :
[[1 2]
[3 4]]
The second array :
[[5 6]
[7 8]]
Along axis 0 Concatenate two arrays :
[[1 2]
[3 4]
[5 6]
[7 8]]
Along axis 1 Concatenate two arrays :
[[1 2 5 6]
[3 4 7 8]]

Splitting arrays

numpy.split splits an array into subarrays

a = np.arange(9)
print (' The first array :')
print (a)
print (' Divide the array into three equal sized subarrays :')
b = np.split(a,3)
print (b)
print (' Divide the position indicated in the one-dimensional array :')
b = np.split(a,[4,7])
print (b)
>>>
The first array :
[0 1 2 3 4 5 6 7 8]
Divide the array into three equal sized subarrays :
[array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]
Divide the position indicated in the one-dimensional array :
[array([0, 1, 2, 3]), array([4, 5, 6]), array([7, 8])]

There are also functions for adding and removing array elements

Function   Description
resize     Returns a new array with the specified shape
append     Appends values to the end of an array
insert     Inserts values along the given axis before the given index
delete     Deletes a subarray along an axis and returns the new array
unique     Finds the unique elements of an array
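
The functions in the table can be tried on a small one-dimensional array. This is only a sketch with made-up values; note that each call returns a new array and leaves the original unchanged:

```python
import numpy as np

a = np.array([1, 2, 2, 3])

appended = np.append(a, [4, 5])      # returns a new array; a is unchanged
inserted = np.insert(a, 1, 9)        # insert 9 before index 1
deleted = np.delete(a, 0)            # drop the element at index 0
uniques = np.unique(a)               # sorted unique values
resized = np.resize(a, (2, 3))       # repeats a to fill the new shape

print(appended, inserted, deleted, uniques, resized, sep='\n')
```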

NumPy statistical operations

Calculating the maximum and minimum

numpy.amin() calculates the minimum of the array along the specified axis

numpy.amax() calculates the maximum of the array along the specified axis

a = np.array([[3,7,5],[8,4,3],[2,4,9]])
print (' An array is :')
print (a)
print (' call amin() function :')
print (np.amin(a,1))
print (' Call again amin() function :')
print (np.amin(a,0))
print (' call amax() function :')
print (np.amax(a))
print (' Call again amax() function :')
print (np.amax(a, axis = 0))
>>>
An array is :
[[3 7 5]
[8 4 3]
[2 4 9]]
call amin() function :
[3 3 2]
Call again amin() function :
[2 4 3]
call amax() function :
9
Call again amax() function :
[8 7 9]

When axis is not specified, the maximum or minimum is taken over the entire array.
axis = 0 operates on each column: the array is treated as the columns [3, 8, 2], [7, 4, 4], [5, 3, 9], and the largest or smallest value of each is chosen.
axis = 1 operates on each row: the array is treated as the rows [3, 7, 5], [8, 4, 3], [2, 4, 9].

axis is not easy to grasp at first, so it is worth spending extra time here practicing until it makes sense.

numpy.ptp calculates the difference between the maximum and minimum values of the array elements (peak to peak)

a = np.array([[3,7,5],[8,4,3],[2,4,9]])
print (' Our array is :')
print (a)
print (' call ptp() function :')
print (np.ptp(a))
print (' Along axis 1 call ptp() function :')
print (np.ptp(a, axis = 1))
print (' Along axis 0 call ptp() function :')
print (np.ptp(a, axis = 0))
>>>
Our array is :
[[3 7 5]
[8 4 3]
[2 4 9]]
call ptp() function :
7
Along axis 1 call ptp() function :
[4 5 7]
Along axis 0 call ptp() function :
[6 3 6]

numpy.percentile calculates a percentile, the value below which a given percentage of the observations fall

Understanding percentiles: the p-th percentile is a value such that at least p% of the data items are less than or equal to it and at least (100 - p)% of the data items are greater than or equal to it.

For example, suppose a student scores 80 on a Chinese test. If that score is exactly the 80th percentile of all students' scores, then the student scored higher than about 80% of the students, and about 20% scored higher.

a = np.array([[10, 7, 4], [3, 2, 1]])
print (' An array is :')
print (a)
print (' call percentile() function :')
# The 50th percentile is the median of the sorted values of a
print (np.percentile(a, 50))
# axis=0: compute down each column
print (np.percentile(a, 50, axis=0))
# axis=1: compute across each row
print (np.percentile(a, 50, axis=1))
# Keep the number of dimensions unchanged
print (np.percentile(a, 50, axis=1, keepdims=True))
>>>
An array is :
[[10 7 4]
[ 3 2 1]]
call percentile() function :
3.5
[6.5 4.5 2.5]
[7. 2.]
[[7.]
[2.]]

numpy.median calculates the median of the array elements

a = np.array([[10, 7, 4], [3, 2, 1]])
print (' An array is :')
print (a)
print(np.median(a))
>>>
An array is :
[[10  7  4]
 [ 3  2  1]]
3.5

As you can see, percentile with p equal to 50 is the median.

numpy.mean calculates the arithmetic mean

a = np.array([[10, 7, 4], [3, 2, 1]])
print (' An array is :')
print (a)
print(np.mean(a))
>>>
An array is :
[[10  7  4]
 [ 3  2  1]]
4.5

numpy.average calculates the weighted average

a = np.array([1,2,3,4])
print (' An array is :')
print (a)
print (' call average() function :')
print (np.average(a))
wts = np.array([4,3,2,1])
print (' Call again average() function :')
print (np.average(a,weights = wts))
>>>
An array is :
[1 2 3 4]
call average() function :
2.5
Call again average() function :
2.0

Standard deviation and variance

Standard deviation measures how far a set of data is spread around its mean. It is the arithmetic square root of the variance.

Variance is the average of the squared differences between each sample value and the mean of all sample values.

print (np.std([1,2,3,4]))
print (np.var([1,2,3,4]))
>>>
1.118033988749895
1.25
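
The two definitions above can be checked by hand. This sketch computes the variance and standard deviation of the same four numbers in plain Python and compares them with NumPy's results:

```python
import math
import numpy as np

data = [1, 2, 3, 4]

mean = sum(data) / len(data)
variance = sum((x - mean) ** 2 for x in data) / len(data)  # population variance
std = math.sqrt(variance)

print(variance, std)
print(np.var(data), np.std(data))  # NumPy's defaults use the same population formulas
```

By default NumPy divides by n, which matches the definition given here. Passing ddof=1 divides by n - 1 instead, which gives the sample variance.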

NumPy sorting

Sorting in NumPy takes a single line of code: just call the sort function.

numpy.sort(a, axis, kind, order)

Quicksort is used by default. The kind parameter accepts quicksort, mergesort, and heapsort, which stand for quick sort, merge sort, and heap sort. axis defaults to -1, which sorts along the last axis; axis=0 sorts down each column and axis=1 sorts across each row. The order parameter applies when the array has fields: you can name the fields to sort by.

a = np.array([[3,7],[9,1]])
print (' An array is :')
print (a)
print (' call sort() function :')
print (np.sort(a))
print (' Sort by column :')
print (np.sort(a, axis = 0))
print (' Sort by row :')
print (np.sort(a, axis = 1))
>>>
An array is :
[[3 7]
[9 1]]
call sort() function :
[[3 7]
[1 9]]
Sort by column :
[[3 1]
[9 7]]
Sort by row :
[[3 7]
[1 9]]

Pandas

In data analysis we usually use Pandas for data cleaning. In real work, the data we get is often messy: null values, duplicate values, invalid values, and the like all interfere with the analysis, so we need to clean the data step by step. Data cleaning is a very important step in data analysis and also a very tedious one. Once you have mastered the Pandas library, though, it is like holding a sword that cuts through iron as if it were mud, and data cleaning becomes far more efficient.

Data structures

Pandas has two main data structures, Series and DataFrame, which represent a one-dimensional sequence and a two-dimensional table, respectively.

Dimensions   Name        Description
1            Series      A one-dimensional array of a single type with labels (by default an integer RangeIndex; labels may repeat). It is a collection of scalars and also the building block of a DataFrame.
2            DataFrame   A two-dimensional labeled table of variable size, with potentially heterogeneous columns.

Series

A Series is a fixed-length, ordered, dictionary-like sequence. It is equivalent to two ndarrays, one holding the index and one holding the values.

import pandas as pd
s = pd.Series(data, index=index)

Here data can be one of the following types:

  • a Python dict

  • an ndarray

  • a scalar, such as 4

The default value of index is the increasing integer sequence 0, 1, 2, ...

Specifying an index

s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
print(s)
>>>
a -0.595567
b -0.201314
c 1.516812
d 0.102395
e -1.009924
dtype: float64

Without specifying an index

s1 = pd.Series(['a', 'b', 'c', 'd'])
print(s1)
>>>
0 a
1 b
2 c
3 d
dtype: object

Creating a Series from a dictionary

d = {'a': 1, 'b': 2, 'c': 3}
s2 = pd.Series(d)
print(s2)
>>>
a 1
b 2
c 3
dtype: int64

DataFrame

A DataFrame is a two-dimensional data structure. It can be thought of as a spreadsheet, an SQL table, or a dictionary of Series objects.

d = {"Chinese": [80, 85, 90], "Math": [85, 70, 95], "English": [90, 95, 90]}
df1 = pd.DataFrame(d)
print(df1)
df2 = pd.DataFrame(d, index=['zhangsan', 'lisi', 'wangwu'])
print(df2)
print(df2.columns, df2.index)
>>>
Chinese Math English
0 80 85 90
1 85 70 95
2 90 95 90
Chinese Math English
zhangsan 80 85 90
lisi 85 70 95
wangwu 90 95 90
Index(['Chinese', 'Math', 'English'], dtype='object') Index(['zhangsan', 'lisi', 'wangwu'], dtype='object')

Selecting data from a DataFrame by index

Operation                          Syntax          Result type
Select a column                    df[col]         Series
Select a row by label              df.loc[label]   Series
Select a row by integer position   df.iloc[loc]    Series
Slice rows                         df[5:10]        DataFrame
Select rows by a Boolean vector    df[bool_vec]    DataFrame

Code

print(df2['Chinese'], '\n')
print(df2.loc['zhangsan'], '\n')
print(df2.iloc[-1], '\n')
print(df2[0:2], '\n')
print(df2[df2>85], '\n')
>>>
zhangsan    80
lisi        85
wangwu      90
Name: Chinese, dtype: int64

Chinese    80
Math       85
English    90
Name: zhangsan, dtype: int64

Chinese    90
Math       95
English    90
Name: wangwu, dtype: int64

          Chinese  Math  English
zhangsan       80    85       90
lisi           85    70       95

          Chinese  Math  English
zhangsan      NaN   NaN       90
lisi          NaN   NaN       95
wangwu       90.0  95.0       90

Basic use

Read / Save the data

Reading data

df = pd.read_csv("test.csv")
print(df.head())
print('\n')
print(type(df))
>>>
name age score
0 zhangsan 30.0 80.0
1 lisi 20.0 NaN
2 wangwu 25.0 100000.0
3 zhaoliu NaN 32.0
4 maqi 33.0 60.0
<class 'pandas.core.frame.DataFrame'>

Save the data

df.to_csv('my.csv')
df.to_excel('my.xlsx')

View the data

print(df.index, '\n')
print(df.columns, '\n')
print(df.to_numpy(), '\n')
print(df.describe())
>>>
RangeIndex(start=0, stop=5, step=1)

Index(['name', 'age', 'score'], dtype='object')

[['zhangsan' 30.0 80.0]
 ['lisi' 20.0 nan]
 ['wangwu' 25.0 100000.0]
 ['zhaoliu' nan 32.0]
 ['maqi' 33.0 60.0]]

             age          score
count   4.000000       4.000000
mean   27.000000   25043.000000
std     5.715476   49971.337211
min    20.000000      32.000000
25%    23.750000      53.000000
50%    27.500000      70.000000
75%    30.750000   25060.000000
max    33.000000  100000.000000

describe is a very commonly used function. It gives an overview of the data and helps you understand it.

Sort

Sort by axis

print(df.sort_index(axis=1, ascending=False))
>>>
score name age
0 80.0 zhangsan 30.0
1 NaN lisi 20.0
2 100000.0 wangwu 25.0
3 32.0 zhaoliu NaN
4 60.0 maqi 33.0

Sort by value

print(df.sort_values(by='score'))
>>>
name age score
3 zhaoliu NaN 32.0
4 maqi 33.0 60.0
0 zhangsan 30.0 80.0
2 wangwu 25.0 100000.0
1 lisi 20.0 NaN

Missing values

Viewing missing values

print(df.isnull(),'\n')
print(df.isnull().any())
>>>
    name    age  score
0  False  False  False
1  False  False   True
2  False  False  False
3  False   True  False
4  False  False  False

name     False
age       True
score     True
dtype: bool

This makes it easy to see which columns contain null values.

Deleting / filling null values

df1 = df.copy()
print(df1, '\n')
print(df1.dropna(how='any'), '\n')
print(df1.fillna(value=50))
>>>
       name   age     score
0  zhangsan  30.0      80.0
1      lisi  20.0       NaN
2    wangwu  25.0  100000.0
3   zhaoliu   NaN      32.0
4      maqi  33.0      60.0

       name   age     score
0  zhangsan  30.0      80.0
2    wangwu  25.0  100000.0
4      maqi  33.0      60.0

       name   age     score
0  zhangsan  30.0      80.0
1      lisi  20.0      50.0
2    wangwu  25.0  100000.0
3   zhaoliu  50.0      32.0
4      maqi  33.0      60.0
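
Filling every gap with the same constant is rarely what you want. As a hedged sketch, fillna also accepts a dictionary mapping column names to fill values, so each column can get a value that suits it. The DataFrame below is made-up data that only mirrors the shape of the article's test.csv:

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the article's test.csv.
df = pd.DataFrame({
    "name": ["zhangsan", "lisi", "wangwu"],
    "age": [30.0, np.nan, 25.0],
    "score": [80.0, 60.0, np.nan],
})

# Fill each numeric column with a value that suits it,
# instead of one constant for the whole table.
filled = df.fillna({"age": df["age"].mean(), "score": 0})

print(filled)
```

Here the missing age becomes the mean of the known ages, while the missing score becomes 0. Which fill value is appropriate depends on the data and the analysis.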

Common operations

Renaming columns

df1.rename(columns={'name': 'student'}, inplace=True)
print(df1)
>>>
student age score
0 zhangsan 30.0 80.0
1 lisi 20.0 NaN
2 wangwu 25.0 100000.0
3 zhaoliu NaN 32.0
4 maqi 33.0 60.0

Deleting columns / rows

df1 = df1.drop(columns=['age'])
print(df1, '\n')
df1 = df1.drop(index=[1])
print(df1)
>>>
    student     score
0  zhangsan      80.0
1      lisi       NaN
2    wangwu  100000.0
3   zhaoliu      32.0
4      maqi      60.0

    student     score
0  zhangsan      80.0
2    wangwu  100000.0
3   zhaoliu      32.0
4      maqi      60.0

Removing duplicate values

df = df.drop_duplicates()  # Remove duplicate rows

Modifying the data type

df1['score'].astype('str')

Applying functions with apply

apply applies a function to the data.

df2 = df1['score'].apply(lambda x: x * 2)
print(df2)
>>>
0 160.0
2 200000.0
3 64.0
4 120.0
Name: score, dtype: float64

The code above produces the same values as

list(map(lambda x: x*2, df1['score']))
>>>
[160.0, 200000.0, 64.0, 120.0]

As you can see, apply is a concise and convenient function that quickly applies a function to each element.
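
For simple arithmetic, pandas can also operate on the whole column at once, and apply becomes most useful when the logic per element is custom. A small sketch with made-up scores; the pass/fail threshold of 60 is an assumption for illustration:

```python
import pandas as pd

scores = pd.Series([80.0, 100000.0, 32.0, 60.0], name="score")

doubled_apply = scores.apply(lambda x: x * 2)   # element by element
doubled_vector = scores * 2                     # vectorized arithmetic

# apply is most useful when no vectorized operation exists,
# e.g. mapping each score to a label with custom logic.
labels = scores.apply(lambda x: "pass" if x >= 60 else "fail")

print(doubled_vector.tolist(), labels.tolist())
```

Both ways of doubling give the same Series; the vectorized form is usually faster on large data.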

Value counts (histogram)

The so-called histogram here is the value_counts function. It shows how many distinct values a column contains and how many times each one occurs.

print(df, '\n')
df3 = df.fillna(60)
df3.loc[5] = ['qianba', 20, 80]  # Add a new row
print(df3['score'].value_counts())
>>>
name age score
0 zhangsan 30.0 80.0
1 lisi 20.0 NaN
2 wangwu 25.0 100000.0
3 zhaoliu NaN 32.0
4 maqi 33.0 60.0

60.0 2
80.0 2
32.0 1
100000.0 1
Name: score, dtype: int64
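
value_counts has a couple of handy parameters. The sketch below uses a stand-alone Series with the unfilled score values plus the extra 80 (an assumption for illustration) to show counting NaN and getting proportions instead of counts:

```python
import numpy as np
import pandas as pd

scores = pd.Series([80, np.nan, 100000, 32, 60, 80], name='score')

counts = scores.value_counts()                # NaN is left out by default
with_nan = scores.value_counts(dropna=False)  # NaN is counted as its own value
shares = scores.value_counts(normalize=True)  # fractions of the non-null values

print(counts, '\n')
print(with_nan, '\n')
print(shares)
```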

Table merging and grouping

Merge

1. Use concat to join two pandas objects

print(df3, '\n')
df4 = df3.copy()
df3 = pd.concat([df3, df4], ignore_index=True)
print(df3)
>>>
name age score
0 zhangsan 30.0 80.0
1 lisi 20.0 60.0
2 wangwu 25.0 100000.0
3 zhaoliu 60.0 32.0
4 maqi 33.0 60.0
5 qianba 20.0 80.0

name age score
0 zhangsan 30.0 80.0
1 lisi 20.0 60.0
2 wangwu 25.0 100000.0
3 zhaoliu 60.0 32.0
4 maqi 33.0 60.0
5 qianba 20.0 80.0
6 zhangsan 30.0 80.0
7 lisi 20.0 60.0
8 wangwu 25.0 100000.0
9 zhaoliu 60.0 32.0
10 maqi 33.0 60.0
11 qianba 20.0 80.0
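
The ignore_index=True argument is what renumbers the rows from 0 to 11 in the result. A short sketch with two small made-up tables (hypothetical data) shows what happens with and without it:

```python
import pandas as pd

a = pd.DataFrame({'name': ['zhangsan', 'lisi'], 'score': [80, 60]})
b = pd.DataFrame({'name': ['wangwu'], 'score': [100000]})

kept = pd.concat([a, b])                           # original labels are kept: 0, 1, 0
renumbered = pd.concat([a, b], ignore_index=True)  # fresh labels: 0, 1, 2

print(kept, '\n')
print(renumbered)
```

Duplicate index labels are legal, but they make label-based lookups such as loc return more than one row, so renumbering is usually the safer choice.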

2. Use the merge function

Join on a column

left = pd.DataFrame({'key': ['foo', 'bar', 'loo'], 'lval': [1, 2, 3]})
right = pd.DataFrame({'key': ['foo', 'bar', 'roo'], 'rval': [3, 4, 5]})
print(left, '\n')
print(right, '\n')
print(pd.merge(left, right, on='key'))
>>>
key lval
0 foo 1
1 bar 2
2 loo 3

key rval
0 foo 3
1 bar 4
2 roo 5

key lval rval
0 foo 1 3
1 bar 2 4

Inner join (inner): the intersection of the keys

print(pd.merge(left, right, how='inner'))
>>>
key lval rval
0 foo 1 3
1 bar 2 4

Left, right, and outer joins work the same way. Try them yourself and see how the results differ.
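
If you want a starting point, here is a minimal sketch that runs the three other join types on the same left and right tables. Keys with no match on the other side get NaN in the missing column.

```python
import pandas as pd

left = pd.DataFrame({'key': ['foo', 'bar', 'loo'], 'lval': [1, 2, 3]})
right = pd.DataFrame({'key': ['foo', 'bar', 'roo'], 'rval': [3, 4, 5]})

left_join = pd.merge(left, right, on='key', how='left')    # every key from left
right_join = pd.merge(left, right, on='key', how='right')  # every key from right
outer_join = pd.merge(left, right, on='key', how='outer')  # union of both key sets

for label, frame in [('left', left_join), ('right', right_join), ('outer', outer_join)]:
    print(label)
    print(frame, '\n')
```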

Grouping

Grouping means splitting the data into groups according to some criterion, applying a function to each group independently, and then combining the results into a single data structure.

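df = pd.DataFrame({
    'A': ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
    'B': ['one', 'two', 'three', 'one', 'two', 'two'],
    'C': [1, 2, 3, 4, 5, 6]})
print(df, '\n')
# numeric_only=True keeps only the numeric column C, matching the output below
print(df.groupby('A').sum(numeric_only=True), '\n')
print(df.groupby('B').sum(numeric_only=True))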
>>>
A B C
0 foo one 1
1 bar two 2
2 bar three 3
3 foo one 4
4 foo two 5
5 foo two 6

C
A
bar 5
foo 16

C
B
one 5
three 3
two 13
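
To make the split, apply, combine idea concrete, the sketch below does by hand what groupby('A') followed by a sum does for column C. This is for illustration only; in practice you would simply call the built-in method.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
    'B': ['one', 'two', 'three', 'one', 'two', 'two'],
    'C': [1, 2, 3, 4, 5, 6],
})

totals = {}
# Split: iterating over a groupby yields (group key, sub-DataFrame) pairs.
for key, group in df.groupby('A'):
    # Apply: sum column C within this group only.
    totals[key] = group['C'].sum()

# Combine: collect the per-group results into one Series.
manual = pd.Series(totals, name='C')
print(manual)
```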

You can also group by multiple columns

print(df.groupby(['A', 'B']).sum())
>>>
           C
A   B
bar three  3
    two    2
foo one    5
    two   11
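
sum is only one of the functions you can apply to each group. As a small sketch on the same table, agg computes several statistics for column C in one call:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
    'B': ['one', 'two', 'three', 'one', 'two', 'two'],
    'C': [1, 2, 3, 4, 5, 6],
})

# One row per group in A, one column per aggregation function.
summary = df.groupby('A')['C'].agg(['sum', 'mean', 'count'])
print(summary)
```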

Draw a simple chart

pandas can also draw charts. Its plotting functions rely on matplotlib.

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2018', periods=1000))
print(ts, '\n')
ts = ts.cumsum() # Cumulative sum
ts.plot()
>>>
2018-01-01 1.055229
2018-01-02 0.101467
2018-01-03 -2.083537
2018-01-04 1.178102
2018-01-05 -0.084247
...
2020-09-22 -4.316770
2020-09-23 -0.823494
2020-09-24 0.215199
2020-09-25 1.094516
2020-09-26 0.285788
Freq: D, Length: 1000, dtype: float64

Out[94]:
<matplotlib.axes._subplots.AxesSubplot at 0x4742270>
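
In Jupyter the chart appears right under the cell. When running a plain script, one option is to save the chart to a file instead. The sketch below assumes matplotlib is installed; the fixed random seed, the chart title, and the output file name are all choices made for this example, not part of the original code.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no window is opened
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the curve is reproducible
ts = pd.Series(rng.standard_normal(1000),
               index=pd.date_range('1/1/2018', periods=1000))
ts = ts.cumsum()

ax = ts.plot(title='Cumulative sum of random values')

# Hypothetical output location: a PNG file in the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), 'ts_cumsum.png')
ax.get_figure().savefig(out_path)
print(out_path)
```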

That's all for today. Long enough, isn't it? Original writing takes effort, so please give this article a like.



Copyright notice
This article was written by [Python learning and data mining]. Please include a link to the original when reprinting. Thank you.
https://pythonmana.com/2021/10/20211013005427691i.html
