Linear regression is one of the fundamental techniques in statistics and machine learning. In economics, computer science, the social sciences and other fields, whether for statistical analysis, machine learning or scientific computing, there are plenty of opportunities to use linear models. I suggest learning it first and then moving on to something more complicated.
This article mainly shows, step by step, how to implement linear regression in Python. The mathematical derivation of linear regression, how linear regression works, and how parameter selection improves the regression model will be explained later.
Regression
Regression analysis is one of the most important fields in statistics and machine learning. Many regression methods are available, and linear regression is probably one of the most important and widely used among them. It is also one of the simplest, and one of its main advantages is that its results are very easy to interpret. The main types of regression are:
Simple linear regression
Multiple linear regression
Polynomial regression
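As a rough illustration of how these three types differ, the sketch below builds the input matrix for each one with NumPy. The arrays x1, x2 and y are made-up values rather than data from this article, and np.linalg.lstsq stands in for whichever fitting routine is actually used. The point is only that all three models are linear in their coefficients and differ in the columns they use.

```python
import numpy as np

# Illustrative inputs (hypothetical values, not the article's dataset).
x1 = np.array([0.0, 1.0, 2.0, 3.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0])
ones = np.ones_like(x1)  # column for the intercept b0

# Simple linear regression: one input, model y = b0 + b1*x1
X_simple = np.column_stack([ones, x1])

# Multiple linear regression: several inputs, model y = b0 + b1*x1 + b2*x2
X_multiple = np.column_stack([ones, x1, x2])

# Polynomial regression: powers of one input, model y = b0 + b1*x1 + b2*x1**2
X_poly = np.column_stack([ones, x1, x1**2])

# Synthetic response with known coefficients b0 = 1 and b1 = 2.
y = 1.0 + 2.0 * x1

# Ordinary least squares on the simple model recovers those coefficients.
coef, *_ = np.linalg.lstsq(X_simple, y, rcond=None)
print(coef)  # approximately [1. 2.]
```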
How to implement linear regression in Python
Packages used
NumPy
NumPy is a fundamental scientific computing package for Python. It supports many high-performance operations on one-dimensional and multi-dimensional arrays.
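A minimal sketch of what array operations look like in practice. The arrays a and b hold made-up values chosen only for illustration.

```python
import numpy as np

# A one-dimensional and a two-dimensional array (illustrative values).
a = np.array([1.0, 2.0, 3.0])
b = np.array([[1.0, 2.0], [3.0, 4.0]])

# Operations apply to the whole array at once, without explicit Python loops.
a_scaled = a * 2.0            # multiply every element by 2
b_col_means = b.mean(axis=0)  # mean of each column

print(a_scaled)
print(b_col_means)
```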
scikit-learn
scikit-learn is a widely used Python machine learning library built on NumPy and some other packages. It provides tools for data preprocessing, dimensionality reduction, regression, classification, clustering and more.
statsmodels
If you want to implement linear regression and need functionality beyond the scope of scikit-learn, use statsmodels. It can be used to estimate statistical models, perform statistical tests, and so on.
Simple linear regression with scikit-learn
1. Import the packages and classes used
import numpy as np
from sklearn.linear_model import LinearRegression
2. Create data
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])
Now we have two arrays: the input x (the regressor) and the output y (the response). Let's take a look:
>>> print(x)
[[ 5]
 [15]
 [25]
 [35]
 [45]
 [55]]
>>> print(y)
[ 5 20 14 32 22 38]
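As you can see, x is two-dimensional and y is one-dimensional. scikit-learn expects the input as a two-dimensional array with one row per observation and one column per input variable, because more complicated models have more than one input and therefore more than one coefficient. That is what the call to .reshape((-1, 1)) on x is for.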
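The effect of .reshape((-1, 1)) can be checked by comparing array shapes. In this small sketch, x_flat is a name introduced here for the array before reshaping. The -1 tells NumPy to infer the number of rows, and the 1 asks for a single column.

```python
import numpy as np

# The same values as x above, before reshaping (x_flat is an illustrative name).
x_flat = np.array([5, 15, 25, 35, 45, 55])

# -1 lets NumPy infer the number of rows; 1 requests exactly one column.
x = x_flat.reshape((-1, 1))

print(x_flat.shape)  # (6,)
print(x.shape)       # (6, 1)
```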
3. Build a model
Create an instance of the class LinearRegression, which will represent the regression model:
model = LinearRegression()
Now start fitting the model. First, call the .fit() method to obtain the optimal