Machine learning | Univariate regression model: a practical Python case

Data Studio 2021-09-15 07:22:59
machine learning univariate regression model


Hello everyone, I am Cloud King!

This article was contributed by my friend Brother Cai, owner of the official account "You can call me brother CAI". A game operator, he taught himself Python to make his work easier. His account has so far accumulated 100 original articles covering Python basics, pandas data analysis, data visualization, and Python web crawlers. You are welcome to follow it and study with Brother Cai.

Our hands-on case uses data on beer sales and temperature to explore the effect of temperature on beer sales. In practice, beer sales depend on more factors than temperature, but this case considers temperature only.


Regression analysis that involves only two variables is called univariate regression analysis. Its main task is to estimate one of two related variables from the other. The variable being estimated is the dependent variable, denoted Y; the variable used to make the estimate is the independent variable, denoted X. Regression analysis looks for a mathematical model Y=f(X) so that Y can be computed from X by a function. When Y=f(X) is a linear equation, the model is called univariate linear regression. The equation can be written as Y=A+BX, and the constant term A and the regression coefficient B can be determined from sample data by the least squares method or other methods.
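As a minimal sketch of how least squares fixes A and B, the snippet below applies the closed-form estimates B = sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2) and A = mean(Y) - B * mean(X) to a small made-up sample. The numbers and the helper name fit_line are illustrative assumptions, not the beer data used later.

```python
import numpy as np

def fit_line(x, y):
    """Return (A, B) of the least-squares line Y = A + B*X."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of X and Y divided by the variation of X
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means
    a = y_mean - b * x_mean
    return a, b

# Made-up sample that lies exactly on Y = 1 + 2X
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
a, b = fit_line(x, y)
print(a, b)
```

Because the sample lies exactly on a line, the estimates recover its intercept and slope; with noisy data they give the line that minimizes the sum of squared residuals.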

1. Import the libraries

Here we need numpy, pandas, and matplotlib (the usual trio), plus the scientific computing package scipy, the statistical modeling library statsmodels, and seaborn.

# Import the libraries
import numpy as np
import pandas as pd
import scipy as sp
from scipy import stats
from matplotlib import pyplot as plt
import seaborn as sns
sns.set()  # apply seaborn's default plot theme
# Libraries for estimating statistical models
import statsmodels.formula.api as smf
import statsmodels.api as sm

2. Load the data and draw a joint distribution plot

Load data

# Read case data
beer = pd.read_csv("beer.csv")
beer.head()

Case data

Draw a joint distribution plot

# Draw a joint distribution plot
sns.jointplot(x = "temperature", y = "beer",
              data = beer,
              color = 'black'
              )

Joint distribution

As the plot shows, sales tend to be higher when the temperature is higher.
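To put a number on this visual impression, one option is the Pearson correlation coefficient. The sketch below uses a made-up sample in place of beer.csv (the values are assumptions for illustration only), so only the call carries over to the real data.

```python
import numpy as np

# Made-up temperatures and sales standing in for the real columns
temperature = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
beer_sales = np.array([40.0, 48.0, 47.0, 55.0, 60.0])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
# entry is the correlation between the two variables
r = np.corrcoef(temperature, beer_sales)[0, 1]
print(round(r, 3))
```

A value close to 1 indicates a strong positive linear relationship. With the DataFrame loaded above, beer["temperature"].corr(beer["beer"]) should give the same kind of number.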

3. Mathematical modeling

We build a univariate regression model, Y=A+BX, where X is the temperature and Y is the sales volume. A and B are the values to be determined: A is the constant term (intercept) and B is the regression coefficient.

If B is not 0, beer sales can be considered related to temperature. If B is positive, higher temperatures go with higher beer sales; if B is negative, the opposite holds.

Once A and B are determined, we can predict sales from the temperature.

OLS regression

After settling on the basic model, we build it with the ols function and fit it with the fit function.

# Modeling and fitting
lm_model = smf.ols(formula = "beer ~ temperature",
                   data = beer).fit()
  • ols is the abbreviation of ordinary least squares
  • "beer ~ temperature" means that the independent variable of the model is temperature and the dependent variable is beer
  • fit runs the fitting process and automatically estimates the parameters A and B

Next, we print the results with the summary function (OLS model details).

# OLS model details
lm_model.summary()

OLS model details

In the OLS model details above, the Intercept and temperature rows in the second part correspond to our A and B.

coef gives the estimated values of A and B, and std err is the standard error of each coefficient. These are followed by the t value, the p value for the null hypothesis that the coefficient is 0, and the lower and upper limits of the 95% confidence interval.

The smaller the p value, the stronger the evidence that the temperature coefficient differs significantly from 0, that is, that temperature and sales are related.

Here the coefficient B is 0.7654, which is greater than 0: the higher the temperature, the more beer is sold.
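The numbers in the summary table can also be read programmatically from the fitted results object. The following is a small sketch on randomly generated stand-in data (an assumption, not the article's dataset); the attribute names params, bse, tvalues, pvalues and the method conf_int are part of the statsmodels results object.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up stand-in for the beer data (not the article's dataset)
rng = np.random.default_rng(1)
df = pd.DataFrame({"temperature": rng.uniform(5, 35, size=30)})
df["beer"] = 30 + 0.8 * df["temperature"] + rng.normal(0, 5, size=30)

result = smf.ols("beer ~ temperature", data=df).fit()

print(result.params)    # coef column: estimates of A and B
print(result.bse)       # std err column
print(result.tvalues)   # t column
print(result.pvalues)   # P>|t| column
ci = result.conf_int(alpha=0.05)  # 95% confidence limits
print(ci)
```

Here conf_int returns a table whose columns 0 and 1 hold the lower and upper limits for each coefficient.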

Other information in the OLS model details

  • Dep. Variable: name of the dependent variable
  • Model/Method: the model is ordinary least squares
  • Date: modeling date
  • No. Observations: sample size
  • Df Residuals: sample size minus the number of estimated parameters
  • Df Model: number of independent variables used
  • Covariance Type: covariance type, nonrobust by default
  • R-squared/Adj. R-squared: coefficient of determination and adjusted coefficient of determination
  • F-statistic/Prob (F-statistic): results of the analysis of variance
  • Log-Likelihood: maximized log-likelihood
  • AIC: Akaike information criterion
  • BIC: Bayesian information criterion

Coefficient of determination

The coefficient of determination here is 0.504. It is the proportion of the total variation that the model can explain. How should we understand that?

Without a regression model, the mean is our best estimate, and the variation is measured by the sum of squares of (sample value - mean), called the total variation. With a regression model, we can predict the result for a specific value of the independent variable. The sum of squares of (sample value - predicted value), called the residual sum of squares, is the variation that the model cannot explain. If a perfect model predicted every observation exactly, the unexplained variation would be 0. The coefficient of determination is the explained variation divided by the total variation, that is, 1 - residual sum of squares / total variation. The higher the coefficient of determination, the more of the variation the model explains and the better the regression model fits.
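The definition can be checked by hand. The sketch below fits a line with np.polyfit on a made-up sample (the values are assumptions, not the beer data) and computes the coefficient of determination from the total variation and the residual sum of squares.

```python
import numpy as np

# Made-up sample (not the beer data)
x = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
y = np.array([40.0, 48.0, 47.0, 55.0, 60.0])

# Least-squares line; for degree 1, np.polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
y_pred = intercept + slope * x

ss_total = np.sum((y - y.mean()) ** 2)  # total variation
ss_resid = np.sum((y - y_pred) ** 2)    # variation the model cannot explain
r_squared = 1 - ss_resid / ss_total
print(round(r_squared, 3))
```

For a univariate linear regression with a constant term, this value equals the square of the Pearson correlation between x and y, which gives a quick cross-check.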

4. Predict with the model

After the parameters of the univariate regression model are determined, we can make predictions directly with the predict function.

# Fitted values of the univariate regression model
beer['predict_beer'] = lm_model.predict()
beer.head()

If you want to predict sales at a given temperature, you can do this:

# Predict sales at a temperature of 30
lm_model.predict(pd.DataFrame({"temperature":[30]}))
''' Output
0    57.573043
dtype: float64
'''
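What predict does here is evaluate A + B * X with the estimated coefficients. A minimal sketch, using randomly generated stand-in data (an assumption, not the article's dataset), compares the two:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up stand-in for the beer data (not the article's dataset)
rng = np.random.default_rng(2)
df = pd.DataFrame({"temperature": rng.uniform(5, 35, size=30)})
df["beer"] = 30 + 0.8 * df["temperature"] + rng.normal(0, 5, size=30)

result = smf.ols("beer ~ temperature", data=df).fit()

# Prediction from the fitted model
new_data = pd.DataFrame({"temperature": [30]})
predicted = result.predict(new_data)

# The same prediction computed by hand as A + B * X
a = result.params["Intercept"]
b = result.params["temperature"]
by_hand = a + b * 30

print(predicted.iloc[0], by_hand)
```

The two printed numbers should agree up to floating-point rounding.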

We plot the actual values together with the estimated values (the former as scattered points, the latter as a straight line).

# Font setting for displaying Chinese labels
plt.rcParams['font.sans-serif'] = ['SimHei']
x = beer.temperature
y1 = beer.beer
y2 = beer.predict_beer
plt.plot(x, y1, 'o', c='r', label='Raw data')
plt.plot(x, y2, label='Univariate regression model')
plt.legend()

5. Draw the regression line

In fact, sns.lmplot can draw the scatter plot and the regression line directly. By default it also shades a 95% confidence band around the line.

sns.lmplot(x = "temperature", y = "beer",
           data = beer,
           scatter_kws = {"color": "black"},
           line_kws = {"color": "black"}
           )

Because a univariate regression model involves only one independent variable, it is a relatively simple case. In real life we more often meet multivariable regression models, which we will cover in a follow-up. To get the complete data for this article, like and "watch" this post, then reply 「210903」 in the background of this official account.

This article is from the WeChat official account data STUDIO (jim_learning), author: Only an elder brother

Original publication time : 2021-09-03

Copyright notice
This article was created by [Data Studio]. Please include a link to the original when reprinting. Thank you.
https://pythonmana.com/2021/09/20210909125824231b.html
