# Machine learning | Univariate regression model: a practical Python case

Data Studio 2021-09-15 07:22:59

Hello everyone, I am cloud king!

The official account 「data STUDIO」 accepts paid contributions on a long-term basis. See the account menu 【Cloud house】-【contribute】 for the submission guidelines!

This article was contributed by my friend Cai ge, owner of the official account "You can call me brother CAI". A game operator, he taught himself Python to make his work easier. His account has so far accumulated 100 original articles covering Python basics, pandas data analysis, data visualization, and Python web crawlers. You are welcome to follow it and study with brother Cai.

Our hands-on case uses data on beer sales and temperature to explore how temperature affects beer sales. In reality beer sales depend on more than temperature, but this case considers temperature only.

Regression analysis that involves only two variables is called univariate regression analysis. Its main task is to estimate one of two related variables from the other. The variable being estimated is the dependent variable, denoted Y; the variable used to make the estimate is the independent variable, denoted X. Regression analysis looks for a mathematical model Y=f(X) so that Y can be computed from X by a function. When Y=f(X) is a linear equation, the analysis is called univariate linear regression. The equation can be written as Y=A+BX, and the constant term A and the regression coefficient B can be determined from sample data by the least squares method or other methods.
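To make the least squares step concrete, here is a minimal sketch of the closed-form estimates of A and B. The sample values are made up for illustration and are not the article's beer data, and the function name `fit_line` is hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Return (A, B) minimizing the sum of squared residuals of y - (A + B*x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x
    B = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means
    A = y_mean - B * x_mean
    return A, B

# Made-up illustration data lying exactly on y = 2 + 3x
x_demo = [0.0, 1.0, 2.0, 3.0]
y_demo = [2.0, 5.0, 8.0, 11.0]
A, B = fit_line(x_demo, y_demo)
print(A, B)
```

Because the demo points lie exactly on a line, the estimates recover that line's intercept and slope; with noisy data they give the line with the smallest sum of squared residuals.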

### 1. Import the libraries

Here we need the three staples `numpy`, `pandas` and `matplotlib`, the scientific computing package `scipy`, the statistical modeling library `statsmodels`, and `seaborn`.

```
# Import the libraries
import numpy as np
import pandas as pd
import scipy as sp
from scipy import stats
from matplotlib import pyplot as plt
import seaborn as sns
sns.set()
# Libraries for estimating statistical models
import statsmodels.formula.api as smf
import statsmodels.api as sm
```

### 2. Load the data and draw a joint distribution plot

```
# Read the case data
```

Case data
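The loading code and the data file are not shown in this article. For readers who want to run the later snippets without the original file, the following is a minimal sketch that builds a synthetic stand-in with the same two column names, `temperature` and `beer`. The numbers are randomly generated under assumed parameters and are not the article's data, so the results will differ from those reported here.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in, NOT the article's data set.
# All parameters below (sample size, range, slope, noise) are assumptions.
rng = np.random.default_rng(seed=0)
n_samples = 30
temperature = rng.uniform(5, 35, size=n_samples)
noise = rng.normal(0, 5, size=n_samples)
beer = pd.DataFrame({
    "temperature": temperature,
    "beer": 35 + 0.8 * temperature + noise,
})
print(beer.head())
```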

#### Draw a joint distribution plot

```
# Draw a joint distribution plot
sns.jointplot(x = "temperature", y = "beer",
              data = beer,
              color = 'black'
              )
```

Joint distribution

The plot shows that sales tend to be higher when the temperature is higher.

### 3. Mathematical modeling

We build a univariate regression model, `Y=A+BX`, where X is the temperature and Y is the sales volume. A and B are the values to be determined: A is the constant term and B is the regression coefficient.

If B is not 0, beer sales can be considered related to temperature. If B is positive, sales rise as the temperature rises; if B is negative, the opposite holds.

Once A and B are determined, we can predict sales from the temperature.

With the basic model decided, we build it with the `ols` function and fit it with the `fit` method.

```
# Modeling and fitting
lm_model = smf.ols(formula = "beer ~ temperature",
                   data = beer).fit()
```
• `ols` stands for the least squares method; the full name is `ordinary least squares`
• `"beer ~ temperature"` specifies that the explanatory variable of the model is `temperature` and the dependent variable is `beer`
• `fit` runs the fitting process and estimates the parameters A and B automatically
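What fitting this formula amounts to can be sketched without statsmodels. The formula `beer ~ temperature` includes an intercept term by default, so the model has a column of ones next to the temperature column, and least squares solves for A and B. The sketch below uses `numpy.linalg.lstsq` on made-up numbers (not the article's data); it illustrates the idea and is not statsmodels' internal implementation.

```python
import numpy as np

# Made-up illustration values, not the article's data
temperature = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
beer_sales = np.array([42.0, 47.0, 49.0, 55.0, 57.0])

# Design matrix: a column of ones (intercept A) and the temperature column (slope B)
X = np.column_stack([np.ones_like(temperature), temperature])

# Solve min ||X @ coef - beer_sales||^2 for coef = [A, B]
coef, _, _, _ = np.linalg.lstsq(X, beer_sales, rcond=None)
A, B = coef
print(f"A = {A:.3f}, B = {B:.3f}")
```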

We then print the results (**OLS model details**) with the `summary` method.

```
# OLS model details
lm_model.summary()
```

OLS model details

In the `OLS` model details above, the `Intercept` and `temperature` rows in the second part are our A and B.

`coef` gives the estimated values of A and B, and `std err` is the standard error of each coefficient. These are followed by the `t` value, the `p` value for the null hypothesis that the coefficient is 0, and the lower and upper limits of the 95% confidence interval.

The smaller the p value, the stronger the evidence that the temperature coefficient differs significantly from 0, that is, that temperature and sales are related.

We also see that the coefficient `B` is 0.7654, which is greater than 0: the higher the temperature, the more beer is sold.
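The columns of the coefficient table are linked by simple formulas: the `t` value is the coefficient divided by its standard error, the `p` value comes from a t distribution with `Df Residuals` degrees of freedom, and the confidence interval is the coefficient plus or minus a t quantile times the standard error. A minimal sketch for the slope, using made-up numbers rather than the article's data:

```python
import numpy as np
from scipy import stats

# Made-up illustration values, not the article's data
x = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
y = np.array([42.0, 47.0, 49.0, 55.0, 57.0])
n = len(x)

# Least squares estimates of slope B and intercept A
sxx = np.sum((x - x.mean()) ** 2)
B = np.sum((x - x.mean()) * (y - y.mean())) / sxx
A = y.mean() - B * x.mean()

# Residual variance uses n - 2 degrees of freedom (two estimated parameters)
residuals = y - (A + B * x)
df_resid = n - 2
sigma2 = np.sum(residuals ** 2) / df_resid

std_err = np.sqrt(sigma2 / sxx)                    # standard error of B
t_value = B / std_err                              # test statistic for H0: B = 0
p_value = 2 * stats.t.sf(abs(t_value), df_resid)   # two-sided p value
t_crit = stats.t.ppf(0.975, df_resid)              # quantile for a 95% interval
ci_low, ci_high = B - t_crit * std_err, B + t_crit * std_err

print(f"B={B:.4f}, std err={std_err:.4f}, t={t_value:.2f}, p={p_value:.4f}")
print(f"95% CI: [{ci_low:.4f}, {ci_high:.4f}]")
```

If the confidence interval for B does not contain 0, the p value is below 0.05, which is the same conclusion read two ways.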

Notes on the other information in the `OLS` model details:

• `Dep. Variable`: name of the dependent variable
• `Model`/`Method`: the model is fitted by the least squares method
• `Date`: modeling date
• `No. Observations`: sample size
• `Df Residuals`: sample size minus the number of estimated parameters
• `Df Model`: number of explanatory variables used
• `Covariance Type`: covariance type, `nonrobust` by default
• `R-squared`/`Adj. R-squared`: coefficient of determination and adjusted coefficient of determination
• `F-statistic`/`Prob (F-statistic)`: results of the analysis of variance
• `Log-Likelihood`: maximized log likelihood
• `AIC`: Akaike information criterion
• `BIC`: Bayesian information criterion

#### Coefficient of determination

The coefficient of determination here is 0.504. It is the share of the total variation in the data that the model can explain. How should we understand that?

Without a regression model, the mean is our best estimate, and the variation is measured, as in the sample variance, by the sum of squares of (sample value - mean). We call this the total variation.

With a regression model, we can predict the result for a given value of the independent variable. The sum of squares of (sample value - predicted value) is then the variation that the model cannot explain, called the residual sum of squares. A perfect model that predicts every observation exactly would leave an unexplained variation of 0.

The coefficient of determination is the explained variation divided by the total variation. The higher it is, the more of the variation the model explains and the better the regression model fits.
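The definition above can be written as `R² = 1 - residual sum of squares / total sum of squares`. A minimal sketch with made-up numbers (not the article's data) and a hypothetical helper name `r_squared`:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: share of total variation explained."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_total = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    ss_resid = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    return 1.0 - ss_resid / ss_total

# Made-up observations with mean 50
y_obs = [42.0, 47.0, 49.0, 55.0, 57.0]

# A perfect model leaves no unexplained variation
r2_perfect = r_squared(y_obs, y_obs)

# Always predicting the mean explains none of the variation
r2_mean_only = r_squared(y_obs, [50.0] * 5)

print(r2_perfect, r2_mean_only)
```

The two calls mark the ends of the scale: a perfect model scores 1 and the mean-only baseline scores 0, so the 0.504 reported here sits roughly halfway between them.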

### 4. Predict with the model

Once the parameters of the univariate regression model are determined, we can make predictions directly with the `predict` method.

```
# Fitted values of the univariate regression model
beer['predict_beer'] = lm_model.predict()
```

To predict sales at a particular temperature, you can do this:

```
# Predict sales at a temperature of 30
lm_model.predict(pd.DataFrame({"temperature": [30]}))
''' Output
0    57.573043
dtype: float64
'''
```
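For a single temperature the prediction is simply A + B × 30. As a rough check, the reported slope 0.7654 and the predicted value 57.573043 imply an intercept of about 34.61. This value is back-calculated here from rounded numbers and is not quoted from the model summary; the helper name `predict_sales` is hypothetical.

```python
# Values taken from the article: the rounded slope and the prediction at temperature 30
B = 0.7654
predicted_at_30 = 57.573043

# Back out the intercept from prediction = A + B * temperature
# (approximate, because B is rounded to four decimals)
A_implied = predicted_at_30 - B * 30

def predict_sales(temperature):
    """Point prediction from the fitted line Y = A + B * X."""
    return A_implied + B * temperature

print(round(A_implied, 3))
print(round(predict_sales(30), 6))
```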

We plot the actual and fitted values together (the former as scatter points, the latter as a straight line).

```
# Use a font that can display Chinese characters
plt.rcParams['font.sans-serif'] = ['SimHei']
x = beer.temperature
y1 = beer.beer
y2 = beer.predict_beer
plt.plot(x, y1, 'o', c='r', label='Raw data')
plt.plot(x, y2, label='Univariate regression model')
plt.legend()
```

### 5. Draw the regression line

In fact, `sns.lmplot` can draw the regression line directly.

```
sns.lmplot(x = "temperature", y = "beer",
           data = beer,
           scatter_kws = {"color": "black"},
           line_kws = {"color": "black"}
           )
```

Because a univariate regression model involves only one independent variable, it is a relatively simple case. In real life we more often meet multivariable regression models, which we will cover in a follow-up. To get the complete data for this article, like and "watch" the article, then reply 「210903」 in the background of this official account.